Date Your Data: How to improve model accuracy



Can anyone give me pointers on how to improve my accuracy for this hackathon? (Mine is only 0.54.)
Steps I have followed so far:

  1. Joined the internship data to the train data.
  2. Created features from the student data, such as the number of distinct profiles and distinct work experiences per student, and then reduced my data frame to one row per student.
  3. Joined my student data with the data from the first step.
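
The steps above could be sketched in pandas roughly as follows. This is only an illustration on toy data: the column names (`Internship_ID`, `Student_ID`, `Profile`, `Experience_Type`, `Internship_Duration`, `Is_Shortlisted`) are assumptions, not necessarily the names in the competition files.

```python
import pandas as pd

# Toy stand-ins for the train, internship and student files.
# All column names here are assumed for illustration.
train = pd.DataFrame({
    "Internship_ID": [10, 10, 11],
    "Student_ID": [1, 2, 1],
    "Is_Shortlisted": [0, 1, 0],
})
internship = pd.DataFrame({
    "Internship_ID": [10, 11],
    "Internship_Duration": [3, 6],
})
student = pd.DataFrame({
    "Student_ID": [1, 1, 2],
    "Profile": ["Web Dev", "Marketing", "Web Dev"],
    "Experience_Type": ["internship", "job", "internship"],
})

# Collapse the student table to one row per student before joining,
# otherwise the merge would duplicate training rows.
student_agg = (
    student.groupby("Student_ID")
    .agg(
        n_profiles=("Profile", "nunique"),
        n_exp_types=("Experience_Type", "nunique"),
    )
    .reset_index()
)

# Left joins keep every training row; validate= raises an error
# if the right-hand table unexpectedly has duplicate keys.
full = (
    train.merge(internship, on="Internship_ID", how="left", validate="many_to_one")
    .merge(student_agg, on="Student_ID", how="left", validate="many_to_one")
)
```

Checking that `len(full) == len(train)` after the joins is a cheap way to confirm that no rows were duplicated or dropped.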

Once I had the complete data, I tried developing models (LR, RF, XGBoost) using intuitive features that could help a candidate get shortlisted. I also created variables such as a duration check (duration given by the student < duration required by the company), a part-time check, etc. But all this effort did not improve my accuracy.
I also used the SMOTE package to address the class imbalance, which didn't help much either.

Am I going wrong somewhere? Please assist.


Hi @sowmiyanm,

Here are the things you can do -

  • Treating outliers: there are a lot of inconsistencies in the data, which you will find if you look for them

  • Reducing the number of categories in high-cardinality categorical variables

  • Creating a lot of new features based on business understanding of the problem

  • Doing some text analytics on the internship profile and student profile

  • Optimizing and tuning your models

  • Finally, ensembling
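
As one illustration of the category-reduction point above, rare levels can be folded into a single "Other" label before creating dummy variables. This is a minimal sketch; the `collapse_rare` helper, the `min_count` threshold and the sample city names are made up for the example.

```python
from collections import Counter


def collapse_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with one label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical location column with two rare levels.
locations = ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Kochi"]
collapsed = collapse_rare(locations, min_count=2)
```

In practice the counts should probably be computed on the training data only and the same mapping reused on the test data, so that both end up with the same set of levels.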

Hope this helps.




I am getting an extremely low score of 0.46 (second to last on the leaderboard). Can you give me some pointers to improve it?
I have tried logistic regression and random forest separately. Both give a very poor score of 0.46.

I have only used the train.csv file. I converted the categorical features to dummy variables, which gave a dataset of 179 features on which I trained the logistic regression and random forest models.

How do I use the other CSV files, the internship and student data? Can someone guide me there?




Hi @ajayram198

Building new features will be the solution in this case. Usually, when you have multiple files, you can get more information about the context. For example, the UG score for the student is not in train! Skill matching is also a topic you should look at, I think. I did not do well in this competition either; I am sure other people have better ideas.
In a few words: feature engineering :)
Do not worry, you will do better next time.
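
The skill-matching idea mentioned above could be turned into a numeric feature along these lines. This is a rough sketch using a Jaccard overlap between token sets; the `skill_overlap` helper and the example strings are hypothetical, and the real skill fields may need more careful cleaning.

```python
import re


def tokens(text):
    """Lower-case the text and split it into a set of skill-like tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def skill_overlap(required, offered):
    """Jaccard similarity between required and offered skill tokens."""
    req, off = tokens(required), tokens(offered)
    if not req or not off:
        return 0.0
    return len(req & off) / len(req | off)


# Hypothetical skills text from an internship and from a student.
score = skill_overlap("Python, SQL, Excel", "python sql tableau")
```

A score of 0 means no shared skills and 1 means identical skill sets, so the value can be fed to the model directly as one more column.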


Hi @Lesaffrea,

Is the hackathon still open? Can we still submit solutions? I would like to keep learning how to improve my model and what further feature engineering I can do.


You could have a look at the winners' code that was shared.