I am a beginner in analytics. I am trying to predict household electricity usage from external and internal temperature and other weather conditions. I have built a regression model, but I am getting a very low R-squared value, around 14%. What are some good practices to improve accuracy? What initial steps should I follow?
It simply means your independent variables do not carry enough features or information. Figure out what other features you can add as independent variables that affect the dependent variable.
What your 14% R-squared shows is that, beyond the utilities actually using electricity, there is a connection between temperature/weather and consumption.
14% may sound low, but weather has only a small effect on the actual electricity reading, because the direct contributor to the readings is the utilities in use at the time.
One way to create features is to look at how the weather-based independent variables relate to consumption and build factors or features around that relationship.
You can try ridge and lasso regression, that is, add regularization to improve the model's performance.
You can also transform the variables, for example with a log or exponential transform.
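If you want to see what regularization does, here is a minimal sketch of ridge regression using its closed-form solution in NumPy. The data is synthetic, and the names `ridge_fit`, `r_squared`, `outdoor`, `indoor` and `usage` are made up for illustration; lasso has no closed form and is not shown.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + alpha * I) w = Xc'yc for the coefficients w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r_squared(y, y_pred):
    """Share of the variance in y explained by y_pred."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the household data (assumed, not real readings).
rng = np.random.default_rng(0)
outdoor = rng.uniform(-5, 35, size=500)
indoor = 20 + 0.1 * outdoor + rng.normal(0, 0.5, size=500)
usage = 5 + 0.05 * outdoor + rng.normal(0, 2, size=500)
X = np.column_stack([outdoor, indoor])

# alpha = 0 is ordinary least squares; larger alpha shrinks the coefficients.
for alpha in (0.0, 1.0, 100.0):
    w, b = ridge_fit(X, usage, alpha)
    print(alpha, w, round(r_squared(usage, X @ w + b), 3))
```

Keep in mind that the R-squared on the training data can only stay the same or go down as `alpha` grows. Regularization is meant to help on unseen data, so compare the models on a held-out set.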
I don't think these will help you much, though. Reasons:
1: The features may not give the model enough information to learn a good fit.
2: The relationship between the features and the target may not be linear, in which case I don't think linear regression will work.
What you can try:
You can try tree-based models. Build a tree-based model with default parameters; if it performs better than the regression, your problem is better suited to tree-based models.
First, with an R-squared of 14% you are not explaining much of the variance. Did you do an analysis of variance, or of leverage? Electricity consumption goes through very extreme periods, for example Christmas in Europe. You can exclude those periods and rebuild your model; another way is to check the leverage and exclude the high-leverage points.
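A rough sketch of the leverage check, assuming a single temperature predictor and synthetic data. Leverage is the diagonal of the hat matrix and flags observations whose predictor values are unusual. The `leverage` helper and the 2p/n cutoff (a common rule of thumb) are my own choices here, not something from your dataset.

```python
import numpy as np


def leverage(X):
    """Diagonal of the hat matrix H = A (A'A)^-1 A', where A = [1, X]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(X)), X])
    # With the reduced QR factorization A = QR, H = QQ', so the diagonal
    # of H is the row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)


# Synthetic temperatures with one deliberately extreme reading.
rng = np.random.default_rng(1)
temp = rng.normal(15, 5, size=200)
temp[0] = 60.0

h = leverage(temp)
p = 2                          # fitted parameters: intercept + slope
threshold = 2 * p / len(temp)  # rule-of-thumb cutoff (an assumption)
keep = h <= threshold          # mask of points to keep when refitting

print("dropped", int((~keep).sum()), "of", len(temp), "points")
```

You would then refit the regression on the rows where `keep` is true and compare the R-squared with the original fit.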
As for features: is weather linear with consumption? No! In Europe and Australia the relationship is quadratic, so try the square of temperature.
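To illustrate the point about the squared term, here is a small sketch on synthetic data where usage is U-shaped in temperature (more heating when cold, more cooling when hot). The 18-degree turning point, the coefficients and the noise level are invented for the example.

```python
import numpy as np


def r_squared(y, y_pred):
    """Share of the variance in y explained by y_pred."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data: usage rises on both sides of an assumed 18-degree minimum.
rng = np.random.default_rng(42)
temp = rng.uniform(-5, 35, size=1000)
usage = 10 + 0.05 * (temp - 18) ** 2 + rng.normal(0, 2, size=1000)

# Fit usage ~ temp, then usage ~ temp + temp^2.
linear_coefs = np.polyfit(temp, usage, deg=1)
quadratic_coefs = np.polyfit(temp, usage, deg=2)

r2_linear = r_squared(usage, np.polyval(linear_coefs, temp))
r2_quadratic = r_squared(usage, np.polyval(quadratic_coefs, temp))

print("linear R^2:   ", round(r2_linear, 3))
print("quadratic R^2:", round(r2_quadratic, 3))
```

On data shaped like this, the straight-line fit leaves most of the variance unexplained while the model with the squared term captures it. Whether your own data behaves this way is something a scatter plot of usage against temperature will show quickly.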
Hope this helps.
Please make sure your dataset follows the linear regression assumptions.
The low R-squared could be due to:
- Too few variables or too little data
- Multicollinearity among the X variables, or non-linearity between the X and Y variables
- Non-constant variance
What you can try:
- Try to get more data
- Run a Shapiro-Wilk test to see whether your residuals are normally distributed; if not, try different data transformations
- Try a neural network model if you have enough data, since it can account for interactions between the variables
- Try out different models (e.g. random forests, ensembles, generalized linear models) to see which one suits your dataset
Would you like to share the dataset you are working on and the analysis where you are stuck?