Creating new features in the bike sharing problem

kaggle
feature_engineering
bikesharing

#1

Hello,
I’m working on the bike sharing problem on Kaggle. My hypothesis is that the higher the temperature and the lower the humidity, the more people prefer to use bikes. So I created the variable:
data$newfeature = data$temp/data$humidity

Though this feature has very high importance in my model, the model’s accuracy has decreased slightly.

I think the problem here is that temperature and humidity are on different scales. Should I scale them? If so, how? What is the best way, for this problem, to create a feature that captures my hypothesis?

Regards


#2

Hi @B.Rabbit,

One way I can think of to create the new feature is an “if-then” kind of rule. You can bin temperature and humidity into 3 categories each: High, Medium, Low. Then use something like “if temp is High and humidity is High, then New Feature = 1”, and so on. The new feature will take one of at most 9 values, corresponding to the different combinations of temperature and humidity.
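
A rough sketch of that binning idea in R, in case it helps. The cut points are an assumption on my part: here each variable is split at its terciles, but any thresholds that make sense for the data would work. The helper name `bin3` and the toy values are made up for illustration; only the column names `temp` and `humidity` come from the question.

```r
# Toy data standing in for the competition data frame (values are made up).
data <- data.frame(
  temp     = c(5.0, 12.3, 18.9, 24.6, 30.1, 35.2),
  humidity = c(90, 75, 60, 45, 30, 20)
)

# Hypothetical helper: split a numeric vector into three ordered bins
# at its terciles (assumed cut points, not prescribed by the competition).
bin3 <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
  cut(x, breaks = breaks, labels = c("Low", "Medium", "High"),
      include.lowest = TRUE)
}

data$temp_bin     <- bin3(data$temp)
data$humidity_bin <- bin3(data$humidity)

# Combine the two bins into a single factor with at most 3 x 3 = 9 levels,
# e.g. "High_Low" means high temperature and low humidity.
data$newfeature <- interaction(data$temp_bin, data$humidity_bin, sep = "_")
```

Note that `cut()` stops with an error if the tercile breaks are not unique, which can happen when a column has many tied values; in that case fixed thresholds chosen by hand may be the simpler option.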

Hope this makes sense. All the best.