Linear Regression



I was going through a few linear regression models. Can someone explain the relevance of "random_state" when splitting the data into a training set and a test set?

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=0)

What is the consequence of setting different values for random_state?


I think it has to do with how the rows for the smaller test set are picked at random. It is not an on/off switch, though: 0 and 1 are just two different seeds, and each one gives a different but repeatable shuffle. Setting it to zero does not make the split take the data entries in sequence; that only happens when shuffle=False is passed. Different seeds may give slightly different results depending on the data.
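One way to check what random_state=0 actually does is to split a small ordered array and look at which rows land in the test set. This is only a rough sketch, assuming scikit-learn and NumPy are installed; the toy arrays are made up for illustration and are not from the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 10 rows labelled 0..9 in order.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# random_state=0 is a seed, so the rows are still shuffled before the split.
_, _, _, y_test_seed0 = train_test_split(X, y, test_size=0.30, random_state=0)

# shuffle=False is the setting that keeps the original row order.
_, _, _, y_test_ordered = train_test_split(X, y, test_size=0.30, shuffle=False)

print("random_state=0:", y_test_seed0)
print("shuffle=False: ", y_test_ordered)
```

With shuffle=False the test set is simply the last three rows, while the seeded split picks rows from anywhere in the array.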


Hello Vishal,

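random_state is the seed used by the random number generator that shuffles the data before it is split.

Setting it to a fixed integer gives you exactly the same records in the train and test sets on every run, which makes the split reproducible. If you leave it unset (None), each run can give a different split.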
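To see the reproducibility in practice, here is a small sketch, assuming scikit-learn and NumPy are installed. The toy data and the helper name held_out_rows are hypothetical, chosen only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 20 rows labelled 0..19.
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)

def held_out_rows(seed):
    """Return the labels that land in the test set for a given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test

# The same seed reproduces the same split; a different seed gives another one.
same_seed_matches = np.array_equal(held_out_rows(0), held_out_rows(0))
other_seed_matches = np.array_equal(held_out_rows(0), held_out_rows(42))

print("seed 0 vs seed 0 identical: ", same_seed_matches)
print("seed 0 vs seed 42 identical:", other_seed_matches)
```

The particular value (0, 42, or anything else) has no special meaning. What matters is keeping it fixed when you want to compare models on the same split.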

Hope this helps.

Jagdish Joshi