Help to Improve Multiple Regression Model

black_friday

#1

Hi, I’m a newbie to data science. I’m trying to improve my multiple regression model but I’m running out of ideas.

Here’s the structure of the Black Friday training data:

str(train)
'data.frame': 550068 obs. of 10 variables:
 $ Gender                    : Factor w/ 2 levels "F","M": 1 1 1 1 2 2 2 2 2 2 ...
 $ Age                       : Factor w/ 7 levels "0-17","18-25",..: 1 1 1 1 7 3 5 5 5 3 ...
 $ Occupation                : Factor w/ 21 levels "0","1","2","3",..: 11 11 11 11 17 16 8 8 8 21 ...
 $ City_Category             : Factor w/ 3 levels "A","B","C": 1 1 1 1 3 1 2 2 2 1 ...
 $ Stay_In_Current_City_Years: Factor w/ 5 levels "0","1","2","3",..: 3 3 3 3 5 4 3 3 3 2 ...
 $ Marital_Status            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 2 ...
 $ Product_Category_1        : Factor w/ 20 levels "1","2","3","4",..: 3 1 12 12 8 1 1 1 1 8 ...
 $ Product_Category_2        : Factor w/ 18 levels "0","2","3","4",..: 1 6 1 14 1 2 8 15 16 1 ...
 $ Product_Category_3        : Factor w/ 16 levels "0","3","4","5",..: 1 12 1 1 1 1 15 1 1 1 ...
 $ Purchase                  : int 8370 15200 1422 1057 7969 15227 19215 15854 15686 7871 ...

Here’s my simple model:
fit1 <- lm(Purchase ~ ., data = train)
Since all the independent variables are categorical, I’m not sure how to go about improving this model, given that the data set is fairly large and the factors have a lot of levels.

Please help!


#2

Hi @Rakesh_Kumar1,

Whenever you try to improve the accuracy of a machine learning model, it is worth working through this checklist.

Regarding categorical variables, you can use methods such as

  • Convert categorical values to numbers, i.e. label-encode them.
  • Combine multiple categorical levels into fewer, broader classes. For example, you can group countries into continents.
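
As a rough illustration of the two ideas above, here is a minimal R sketch. It uses a small made-up data frame (train_toy) rather than the real training data, and the band names "Young", "Middle" and "Senior" are hypothetical groupings chosen only for the example. Keep in mind that lm() treats an integer code as a numeric predictor, so label encoding assumes a roughly linear effect across the codes and is only sensible for levels with a natural order, such as Age.

```r
# Toy data standing in for the Black Friday set (values are made up).
age_levels <- c("0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+")
train_toy <- data.frame(
  Age = factor(
    c("0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+", "26-35"),
    levels = age_levels
  ),
  Purchase = c(8370L, 15200L, 1422L, 1057L, 7969L, 15227L, 19215L, 15854L)
)

# 1. Label encoding: replace each level with its integer position.
#    Age has a natural order, so the codes 1..7 preserve it.
train_toy$Age_code <- as.integer(train_toy$Age)

# 2. Combining levels: merge the 7 age groups into 3 broader bands.
#    Assigning a named list to levels() maps old levels to new ones.
train_toy$Age_band <- train_toy$Age
levels(train_toy$Age_band) <- list(
  Young  = c("0-17", "18-25"),
  Middle = c("26-35", "36-45"),
  Senior = c("46-50", "51-55", "55+")
)

# Fewer levels means fewer dummy variables in the regression.
nlevels(train_toy$Age)       # 7
nlevels(train_toy$Age_band)  # 3

# Either derived column can then be used in place of the original factor.
fit_band <- lm(Purchase ~ Age_band, data = train_toy)
fit_code <- lm(Purchase ~ Age_code, data = train_toy)
```

The same pattern could be tried on high-cardinality factors such as Occupation or the Product_Category columns, but which levels belong together is a judgment call that depends on the data.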

Refer to the resource on how to handle categorical values.

I also suggest you go through this article on the winners’ approach to the Black Friday problem.


#3