Python 2.7 or 3.5 - which one to choose for Data Science?

Hi everyone,

I just switched from R to Python and I am amazed by the power of Python in comparison with R (I think because of the data structures that Python supports, like tuple, set, …).

Anyway, I installed Python 3.5, but in most of the tutorials on the internet, people use Python 2.7. This trips me up with errors, from the simple one where print must be called with () to the harder one where the GridSearchCV function doesn’t support the dict parameter.

I am now confused and not sure which of the two is better. Do you think it is better to use Python 2.7?

There is no right or wrong answer when choosing between these versions. Normally, I suggest finding the best tutorial, course, or reference that works for you and going ahead with whatever version it was written for. Before I compare the two versions of the language, it makes sense to quickly understand why these two versions exist in the first place.

##Why do we have 2 versions of the same language which are incompatible?
Like all good things, programming languages evolve over time. Most of the time, new releases are compatible with previous versions. But imagine what you would do if you knew that some features of the language needed to change to make it better, and that the change could not be backward compatible. That is what happened with Python.

In this case, Guido van Rossum (the original creator of the Python language) decided to clean up Python 2.x properly, with less regard for backwards compatibility than is the case for new releases in the 2.x range. The most drastic improvement is the better Unicode support (with all text strings being Unicode by default) as well as a saner bytes / Unicode separation. Don’t worry if you don’t understand the last part!
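If the bytes / Unicode part is unclear, a small sketch may help. This is only an illustration for Python 3; the variable names `text` and `data` are made up for the example.

```python
# Python 3: text (str) and raw bytes are two distinct types.
text = "caf\u00e9"             # str: four Unicode characters, the last one is "é"
data = text.encode("utf-8")    # bytes: the UTF-8 encoded form of the same text

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 ("é" takes two bytes in UTF-8)

# Going back from bytes to text needs an explicit decode.
assert data.decode("utf-8") == text

# Python 3 refuses to mix the two types instead of converting silently.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```

In Python 2.7 a plain string literal is a byte string, and Unicode text needs a separate `unicode` type, which is the separation Python 3 cleaned up.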

Several aspects of the core language were adjusted to make it easier for newcomers to learn and to be more consistent with the rest of the language. So, in summary, the language had to bite the bullet, and it was better to do it sooner rather than later. That is what led to the co-existence of two parallel versions of the same language.
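To make a few of those adjustments concrete, here is a minimal sketch of differences that commonly trip people up when following 2.7 tutorials on 3.5. It is written for Python 3, and the `params` dictionary is just a made-up example, not taken from any particular tutorial.

```python
# Python 3 behaviour; comments note what Python 2.7 does instead.

# 1. print is a function, so the parentheses are required.
#    In Python 2.7, `print "hello"` (without parentheses) is also valid.
print("hello", "world")

# 2. The / operator always performs true division.
#    In Python 2.7, 3 / 2 between two integers gives 1.
half = 3 / 2          # 1.5
floor_half = 3 // 2   # 1 (floor division is spelled // in both versions)
print(half, floor_half)

# 3. dict.keys() returns a view object rather than a list.
#    Wrap it in list() or sorted() when an actual list is needed.
params = {"C": [1, 10], "kernel": ["linear", "rbf"]}
key_list = sorted(params.keys())
print(key_list)
```

If you need to stay on 2.7 for a while, putting `from __future__ import print_function, division` at the top of a file gives you the Python 3 behaviour for the first two points.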

Here are the pros and cons of choosing one version over another:

##Benefits of version 2.7
Version 2.7 is the most popular version of Python to date. It has been tried and tested extensively by people across the globe and is compatible with most of the existing code in use today. This means that if you want to rely on a lot of third-party libraries for your work, you have a better chance of finding them for 2.7 today than for 3.5. The same applies to finding help for any roadblocks you might hit.

The other general scenario is that if your application relies on legacy code, you should stick with 2.7, but that doesn’t look to be a concern here.

##Cons of version 2.7 (or benefits of Python 3.5)
Python 3.5 is the future of Python. No new development is planned for Python 2.7, and the version is scheduled to be phased out by 2020. So, if you stick with 2.7, you might miss out on any advancements in the language. For example, if some of the libraries are rewritten to reduce execution time, Python 3.5 is going to benefit from it but 2.7 is not.

If you are starting afresh and can find tutorials on 3.5, then I would recommend going with that. On the other hand, if you are using legacy code or depend on a lot of third-party tools, you should stick with 2.7.

Hope this helps



Hi Kunal,
I just switched to Python 2.7 and hope that will help me a lot in my data science experience.
