Need Help Improving Accuracy of Model

data_science
feature_engineering
python

#1

This is what I followed:

  1. Loaded the dataset and separated the features from the target variable.
  2. Separated the numeric, categorical and ordinal variables.
  3. Imputed numeric variables with the median (using groupby) and categorical/ordinal variables with the most frequent value.
  4. Encoded categorical variables with one-hot encoding (pd.get_dummies) and ordinal variables with a label encoder.
  5. Used GridSearchCV to tune the hyperparameters of LogisticRegression, RandomForest, SVM, KNN and XGBoost. The highest accuracy was 0.784 with XGBoost; LogisticRegression, SVM and RandomForest gave 0.77.

I’m new to data science and have completed DataCamp courses and read Analytics Vidhya blog posts. I have spent a considerable amount of time on this problem, but the accuracy is not increasing. I tried standardizing and scaling features for the appropriate algorithms and using a subset of the most significant features. Any help will be appreciated.
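
For anyone following the same steps, here is a minimal sketch of that workflow wrapped in a scikit-learn Pipeline, so that imputation, encoding and scaling are refit inside every cross-validation fold. It is not the code used in this post: it swaps in SimpleImputer, OneHotEncoder and OrdinalEncoder for the groupby median, pd.get_dummies and label encoder, tunes only LogisticRegression, and runs on synthetic data with hypothetical column names (income, loan_amount, area, education).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "income": rng.normal(5000, 1500, n),
    "loan_amount": rng.normal(150, 40, n),
    "area": rng.choice(["urban", "rural", "semiurban"], n),
    "education": rng.choice(["low", "medium", "high"], n),
})
# Target is derived before any values are hidden.
y = (df["income"] / 40 > df["loan_amount"]).astype(int)

# Hide some values so the imputers have work to do.
df.loc[rng.choice(n, 20, replace=False), "income"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "area"] = np.nan

numeric = ["income", "loan_amount"]
nominal = ["area"]
ordinal = ["education"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("nom", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), nominal),
    ("ord", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(categories=[["low", "medium", "high"]])),
    ]), ordinal),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularization strength only, to keep the sketch short.
search = GridSearchCV(model, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="accuracy")
search.fit(df, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the preprocessing inside the pipeline means the medians and most frequent values are computed from the training folds only, so the cross-validated accuracy is less likely to be optimistic.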

#2

Hi @kakashi

There are various things which you can try:

Feature engineering

Ensembling
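
As a rough illustration of ensembling, here is a minimal sketch that compares a soft-voting ensemble against its individual members using cross-validation. The dataset is synthetic (make_classification), and the choice of members and their settings are assumptions made for the example, not a recommendation for this particular problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)

# Three different model families; scaling is applied where it matters.
members = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
]

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(estimators=members, voting="soft")

# Score each member and the ensemble with the same 5-fold cross-validation.
scores = {name: cross_val_score(est, X, y, cv=5).mean() for name, est in members}
scores["ensemble"] = cross_val_score(ensemble, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

An ensemble tends to help most when its members make different mistakes; if all members misclassify the same rows, the combined score will usually stay close to the best single model.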

Also, you can do some visualization beforehand to get insights that can be useful in creating new features.

Further, you can read some winners’ approaches here to get more ideas.


Hope this helps.
Shubham


#3

Thanks Shubham. I tried ensembling, but the accuracy is the same as with logistic regression. I ensembled 6 models. I’m going to just move on to the next problem and check out the winners’ code once the competition is over.


#4

Hello, I am new to data science and I have a problem loading the dataset. Could somebody please walk me through it?


#5

Hi @dayomitchell

You can refer to this article.

Cheers
Shubham


#6

Hi Shubham, thanks a lot, well appreciated.



#7

Hi,

Is an accuracy of 83.22% good for this problem? I tried feature selection and ensembling as suggested by Shubham here but can’t improve the accuracy beyond 83.22%. I am thinking of submitting the code now and moving on to another problem.

Thanks,
Himanshu


#8

I’ve tried the 3 techniques shown in the introductory learning: linear regression, decision tree & random forest, filling in the missing values with the mean. Each of them gave 77~78% accuracy.

What is your strategy to achieve >80% accuracy?

  1. Change the method used to fill in missing values?
  2. Try different columns (predictors) for prediction?
  3. Explore other techniques for prediction? (I would appreciate it if you shared the technique you use.)
  4. Explore parameters in linear regression/ decision tree/ random forest?

I have been trying #1 & #2, but I am still stuck at <78%.
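
For option #1, one way to check whether the fill-in method matters at all is to score each imputation strategy with the same model and the same cross-validation folds. The sketch below is only an illustration under assumptions: the data is synthetic, roughly 10% of the values are hidden at random, and the random forest settings are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)

# Hide roughly 10% of the feature values at random.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

# Candidate ways of filling in the missing values.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "median + missing flags": SimpleImputer(strategy="median", add_indicator=True),
}

results = {}
for name, imputer in imputers.items():
    # The imputer sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(imputer, RandomForestClassifier(n_estimators=100, random_state=0))
    results[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {results[name]:.3f}")
```

If the scores come out nearly identical, the imputation method is probably not what is holding the accuracy back, and time may be better spent on new features or other models.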

Thanks.