Discussions for article "A Complete Tutorial to Learn Data Science with Python from Scratch"



Hi All,

The article “A Complete Tutorial to Learn Data Science with Python from Scratch” is quite old now, and you might not get a prompt response from the author.

Please post your queries here so that they can be resolved.

A brief description of the article:

This article gives a step-by-step guide for beginners who wish to start their journey in data science using Python. It includes an introduction to Python, Python libraries and data structures. Furthermore, the three most common ML algorithms (logistic regression, decision tree and random forest) are explained and implemented in this tutorial.



Much of the code in that tutorial has either become obsolete or doesn’t work as the author explains.

Kindly look into it, as this can be very confusing for beginners.

Thank you!


Hi @vibhuk16,

Thanks for letting us know. The code has been updated.

Happy learning!!


The code needs a minor update: the cross_validation module is deprecated, so
from sklearn.cross_validation import KFold
should change to
from sklearn.model_selection import KFold

Also, n_folds has been renamed to n_splits:
kf = KFold(n_splits=5)
and
for train, test in kf:
should change to
for train, test in kf.split(data[predictors]):
(I am not sure whether we should pass data[predictors] or some other value, but it runs without errors.)
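For reference, here is a minimal sketch of a classification_model function ported to the sklearn.model_selection API. It is not the article's exact code. The body is reconstructed around the signature classification_model(model, data, predictors, outcome) used in this thread, and the toy DataFrame and its column names (x1, x2, y) are made up purely for illustration. sklearn.cross_validation was deprecated in scikit-learn 0.18 and removed in 0.20, so on current versions only the model_selection import works. Note that kf.split(X) only uses X to count rows. The row indices it yields are then used to slice every predictor column.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold  # replaces sklearn.cross_validation


def classification_model(model, data, predictors, outcome):
    """Sketch: fit a model, report training accuracy and 5-fold CV accuracy."""
    # Fit on the full data set and report accuracy on that same data
    model.fit(data[predictors], data[outcome])
    predictions = model.predict(data[predictors])
    accuracy = metrics.accuracy_score(data[outcome], predictions)
    print("Accuracy : {0:.3%}".format(accuracy))

    # n_folds is now n_splits, and the row count is no longer passed in
    kf = KFold(n_splits=5)
    error = []  # name kept from the thread; holds per-fold accuracy scores
    # split() yields integer row-index arrays for each train/test fold
    for train, test in kf.split(data[predictors]):
        train_predictors = data[predictors].iloc[train, :]
        train_target = data[outcome].iloc[train]
        model.fit(train_predictors, train_target)
        # score() returns mean accuracy on the held-out fold
        error.append(model.score(data[predictors].iloc[test, :],
                                 data[outcome].iloc[test]))
    cv_score = np.mean(error)
    print("Cross-Validation Score : {0:.3%}".format(cv_score))

    # Refit on all rows so the model ends up trained on the full data set
    model.fit(data[predictors], data[outcome])
    return accuracy, cv_score


# Hypothetical toy data: two predictors and a binary outcome
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({"x1": rng.rand(100), "x2": rng.rand(100)})
df_demo["y"] = (df_demo["x1"] + df_demo["x2"] > 1).astype(int)
acc, cv = classification_model(LogisticRegression(), df_demo, ["x1", "x2"], "y")
```

Returning the two scores is an addition for convenience. The prints mirror what the thread's version reports.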


Hi @nadeeshtv,

Thanks for pointing it out. We will update the article accordingly.


I am still getting this error:

TypeError Traceback (most recent call last)
in ()
2 model = LogisticRegression()
3 predictor_var = ['Credit_History']
----> 4 classification_model(model, df,predictor_var,outcome_var)

in classification_model(model, data, predictors, outcome)
20 #Perform k-fold cross-validation with 5 folds
---> 21 kf = KFold(data.shape[0], n_splits=5)
23 error = []

TypeError: __init__() got multiple values for argument 'n_splits'
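This error seems to come from mixing the two APIs. The old sklearn.cross_validation.KFold took the number of samples as its first argument (KFold(n, n_folds=5)). The newer sklearn.model_selection.KFold takes n_splits as its first parameter instead. So KFold(data.shape[0], n_splits=5) supplies n_splits twice: once positionally and once by keyword. Here is a minimal reproduction, assuming a recent scikit-learn. The value 100 just stands in for data.shape[0].

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 100  # stands in for data.shape[0]

# Old-style call: the row count lands in the n_splits slot, then
# n_splits is given again by keyword, which raises a TypeError
try:
    KFold(n_rows, n_splits=5)
    old_style_failed = False
except TypeError as exc:
    old_style_failed = True
    print("TypeError:", exc)

# New-style call: only the number of folds goes to the constructor;
# the data (and therefore the row count) is passed to split() instead
kf = KFold(n_splits=5)
folds = list(kf.split(np.zeros((n_rows, 2))))
print(len(folds), "folds,", len(folds[0][1]), "test rows in the first fold")
```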


I am getting a similar error here.
Please post if you find a fix.



I got it to work using nadeeshtv's advice!

I changed the code to:

kf = KFold(n_splits=5)
error = []
for train, test in kf.split(data[predictors]):

Since I am still very new to Python, I can’t guarantee that this is the correct code to use. I can only tell you that with this code the program ran without errors and gave the same result as in the original tutorial.

EDIT: I just realized that this doesn’t work properly after all: it seems to take only the first predictor variable into account instead of all the predictor variables in the list. If anyone knows how to resolve this, please let me know!
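One way to check this: KFold.split only looks at the number of rows in whatever you pass it, and it yields row indices. The predictor columns are selected afterwards by data[predictors].iloc[train, :]. The sketch below uses a made-up DataFrame with hypothetical columns a, b and c. It compares the folds produced from one column with those produced from all columns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical data frame with three predictor columns and 10 rows
df_demo = pd.DataFrame(np.arange(30).reshape(10, 3), columns=["a", "b", "c"])
predictors = ["a", "b", "c"]

kf = KFold(n_splits=5)
folds_one_col = list(kf.split(df_demo[["a"]]))
folds_all_cols = list(kf.split(df_demo[predictors]))

# The row indices are the same no matter how many columns go into split()
same_folds = all(
    np.array_equal(tr1, tr2) and np.array_equal(te1, te2)
    for (tr1, te1), (tr2, te2) in zip(folds_one_col, folds_all_cols)
)
print("Identical folds:", same_folds)

# Slicing the training rows still keeps every predictor column
train_idx, test_idx = folds_all_cols[0]
train_block = df_demo[predictors].iloc[train_idx, :]
print(train_block.shape)  # (8, 3): 8 training rows, all 3 predictors
```

If the results still look as though only one predictor is used, it may be worth checking the predictor_var list that is passed to classification_model.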


Hi @nadeeshtv,

My model is working perfectly fine with cross_validation and n_folds. What is the error that you get?


Hi @abhijitaradhye, @data_crat

Please use from sklearn.cross_validation import KFold together with n_folds. I copy-pasted the code from the article, and here are the results: