Cross Validation in Data Science Life Cycle

r
machine_learning

#1

Hi All,

I am a newbie in Data Science. I would like to know when we should use cross-validation techniques in a Data Science project.
For example, I understand the k-fold cross-validation technique is used to iterate over the entire dataset, choosing different training and test sets to find the combination that gives the maximum accuracy, and then we calculate the RMSE. So I would like to know: do we use the cross-validation technique only to choose the training and test split that gives the maximum accuracy?

Thank you
Nagu.


#2

Hi @nagu2487,

We use cross validation to check whether our algorithm would perform well on new, unseen data (i.e., the test set). What we generally do is hold out random samples from the training data (called a validation set) and check our algorithm’s accuracy on them. So our aim in k-fold CV is not to choose the best split of the data, but to get the model that generalizes best.
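To make the mechanics concrete, here is a minimal sketch of k-fold CV in plain Python. A toy mean-predictor stands in for a real model, and the function and variable names are just illustrative, not from any particular library:

```python
import math
import random

def k_fold_rmse(ys, k=5, seed=0):
    """Estimate generalization error with k-fold cross validation.

    Each fold is held out once as a validation set while a model
    (here a trivial mean predictor, for simplicity) is fit on the
    remaining folds. Returns the RMSE of every fold and the average.
    """
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)       # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds

    fold_rmses = []
    for i in range(k):
        val_idx = set(folds[i])
        train_y = [ys[j] for j in indices if j not in val_idx]
        # "Train" the toy model: predict the mean of the training targets.
        prediction = sum(train_y) / len(train_y)
        # Evaluate on the held-out validation fold.
        sq_errors = [(ys[j] - prediction) ** 2 for j in val_idx]
        fold_rmses.append(math.sqrt(sum(sq_errors) / len(sq_errors)))

    return fold_rmses, sum(fold_rmses) / k

ys = [2.0 * x for x in range(20)]
per_fold, avg = k_fold_rmse(ys, k=5)
print(avg)  # one number summarizing expected error on unseen data
```

Note that the average of the k fold scores is the output, not the "best" fold: every observation gets used for validation exactly once, which is the whole point.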

Hope it helps!


#3

Great, thank you so much, Sir. My understanding is that once the initial model development is done and the model is productionised, when newer samples are fed to our model, the cross-validation code we already put in place during development will check the accuracy of our model on the newer dataset. Kindly help me confirm whether my understanding is right.


#4

Hi @nagu2487,

Read [this](http://www.analyticsvidhya.com/blog/2015/11/improve-model-performance-cross-validation-in-python-r/) article for a clearer understanding of the concept.

On a side note… please don’t call me sir, it makes me queasy :stuck_out_tongue:


#5

@jalFaizy: Thank you so much, buddy, I got it :grin: