Creating new features in the bike sharing problem



I’m working on the bike sharing problem on Kaggle. My hypothesis is that the higher the temperature and the lower the humidity, the more people prefer to use bikes. So I created the variable:
data$newfeature = data$temp/data$humidity

Though this feature has very high importance in my model, the model’s accuracy has decreased slightly.

I think the problem is that temperature and humidity are on different scales. Should I scale them? If so, how? What is the best way to create a feature that captures my hypothesis for this problem?



Hi @B.Rabbit,

One way I can think of to create the new feature is to use an “if-then” kind of rule. You can bin temperature and humidity into 3 categories each: High, Medium, and Low. Then use something like “if temp is High and humidity is High, then New Feature = 1”, and so on. The new feature will take at most 9 distinct values, one for each combination of temperature and humidity bins.
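
A rough sketch of that binning in R might look like the following. The toy data frame, the helper name bin3, the column names temp_bin and humidity_bin, and the choice of tertiles as cut points are all assumptions for illustration; fixed thresholds chosen from domain knowledge would work just as well.

```r
# Toy data standing in for the Kaggle training set (values are made up).
data <- data.frame(
  temp     = c(5, 12, 20, 28, 35, 9),
  humidity = c(90, 40, 55, 30, 75, 20)
)

# Hypothetical helper: bin a numeric vector into Low / Medium / High.
# Tertile cut points are an assumption; cut() will fail if they are not unique.
bin3 <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
  cut(x, breaks = breaks, labels = c("Low", "Medium", "High"),
      include.lowest = TRUE)
}

data$temp_bin     <- bin3(data$temp)
data$humidity_bin <- bin3(data$humidity)

# Combine the two binned variables into a single factor.
# With 3 levels each, the result has at most 9 levels (temp level first).
data$newfeature <- interaction(data$temp_bin, data$humidity_bin, sep = "_")

print(table(data$newfeature))
```

Because the result is a factor rather than a ratio, the different scales of temperature and humidity no longer matter, and there is no division by a humidity of zero.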

Hope this makes sense. All the best.