Difference in train and test values




Below are the shapes of my train and test datasets:

X_train.shape, y_train.shape, X_test.shape
((548, 7), (548,), (548, 6))

After running a Linear Regression, I get the following error:
ValueError: shapes (548,6) and (7,) not aligned: 6 (dim 1) != 7 (dim 0)

What is the reason behind it, and how do I correct it?


hello @ASHISH_17

I suppose you have not removed the target variable from your X_train: it has 7 columns while X_test has only 6. Remove that column and try again.
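
For illustration, here is a minimal sketch of that fix, assuming the data sits in pandas DataFrames and the target column is called `target`. The column names and values are made up for the example and are not taken from the original post.

```python
import pandas as pd

# Hypothetical data: six feature columns plus a target column in train.
train = pd.DataFrame({f"f{i}": range(5) for i in range(6)})
train["target"] = [10, 20, 30, 40, 50]
test = pd.DataFrame({f"f{i}": range(3) for i in range(6)})

# Pull the target out first, then drop it from the feature matrix.
y_train = train["target"]
X_train = train.drop(columns=["target"])

# Select the same columns, in the same order, for the test set.
X_test = test[X_train.columns]

print(X_train.shape, y_train.shape, X_test.shape)
```

After the drop, X_train and X_test have the same number of columns, which is what the model needs at prediction time.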

Hope this helps.


Thanks for the reply. I found the mistake.

Can you tell me how to change the ‘object’ dtype to ‘int’?
The column is a combination of a string and an int, in the form id100001.

Would splitting the ‘id’ prefix from the integer part work?
Or is there an alternative?



Yeah, the best option is to extract the part after ‘id’ and then convert it to the ‘int’ dtype.
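
One way this could look in pandas, as a sketch that assumes the column is named `id` and every value starts with the literal prefix "id" followed only by digits (the sample values and the new column name `id_num` are made up):

```python
import pandas as pd

# Hypothetical ids in the form described above.
df = pd.DataFrame({"id": ["id100001", "id100002", "id100003"]})

# Remove the literal "id" prefix, then cast the remaining digits to int.
df["id_num"] = df["id"].str.replace("id", "", regex=False).astype(int)

print(df["id_num"].tolist())  # [100001, 100002, 100003]
```

If some values do not follow this pattern, `astype(int)` will raise an error, so it is worth checking the column before converting.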


Hi @shubham.jain,

How do I correct this error?

ValueError: shapes (622764,10) and (11,) not aligned: 10 (dim 1) != 11 (dim 0)

I am pasting the features that I have used in the train and test -



Trip duration is the target variable.

Kindly help me get out of this dilemma.



The feature and feature_cols lists that you have created should contain the same features for modelling. As far as I can see, you have included ‘id’ in feature but not in feature_cols, which is the reason behind the error.

So remove ‘id’ from the feature list as it would not be a useful feature.
Hope this will solve your problem.
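
A simple way to avoid this kind of mismatch is to define the feature list once and use it for both train and test. The sketch below assumes pandas DataFrames. Apart from ‘id’ and the trip duration target, the column names are hypothetical, since the original feature lists are not shown here.

```python
import pandas as pd

# Hypothetical frames; "trip_duration" is the target and exists only in train.
train = pd.DataFrame({
    "id": ["id1", "id2"],
    "passenger_count": [1, 2],
    "distance": [1.5, 3.0],
    "trip_duration": [300, 900],
})
test = pd.DataFrame({
    "id": ["id3"],
    "passenger_count": [1],
    "distance": [2.0],
})

# One shared feature list: no "id" and no target.
feature_cols = ["passenger_count", "distance"]

X_train, y_train = train[feature_cols], train["trip_duration"]
X_test = test[feature_cols]

# Sanity check before fitting: identical columns in identical order.
assert list(X_train.columns) == list(X_test.columns)
print(X_train.shape, X_test.shape)
```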


Hey @shubham.jain, I have removed the ‘id’ variable and successfully got the result.

I wanted to know: can’t we use a different number of features for the train and test sets?


No, the train and test sets must have the same features, in the same order.
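
The reason is that a fitted linear model keeps one coefficient per training feature, and prediction is a dot product between the test matrix and that coefficient vector. This rough NumPy sketch reproduces the mismatch using made-up arrays with the shapes from the first error above:

```python
import numpy as np

coef = np.ones(7)           # stands in for a model fitted on 7 features
X_test = np.ones((548, 6))  # test set with only 6 features

message = None
try:
    np.dot(X_test, coef)    # 6 columns cannot line up with 7 coefficients
except ValueError as err:
    message = str(err)

print(message)
```

The dot product is only defined when the number of test columns equals the number of coefficients, so the shapes have to agree.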


Thanks a lot for helping me @shubham.jain

I am facing another problem.
I have grouped the pickup datetime by day of the year, so in total there are 180 days.
But instead of days ranging from 1 to 180, I am getting days from -128 to 127.

Can you tell me what could be the problem?

I have used dt.dayofyear.

I was getting the correct order of days, but when I drew a line plot and compared it with the test data, it showed days from -128 to 127.