Use this category for discussions related to the contest The Data Identity: Student DataFest 2018 Hackathon, which starts on 15th May 2018. Feel free to share your approach and ask your questions here.

For more information, visit:

I think your account is not verified for DataFest 2018. I had the same problem earlier, but after verifying my account it works fine.

Hello. I can’t join the Slack chat; it shows me this error: “already_in_team”. What can I do?

Thanks!

There’s a channel for Student DataFest; join it if you haven’t already (the channel name is **studentdatafest2018**).

Otherwise, the error itself is clear enough: it says you are already a member.

…

You can get the unique values yourself with this short snippet:

```
import numpy as np

# df is assumed to be the contest data, already loaded as a pandas DataFrame
np.unique(df['education'])
```

`Output`

`'Bachelors', 'High School Diploma', 'Masters', 'Matriculation', 'No Qualification'`

Could you share any ideas on how to improve the model’s accuracy?

I have already tried scaling and parameter tuning.

It’s not so much about feature selection as about feature engineering. Adding new features is often the best way to increase both the diversity and the quality of models.

Simply tuning the parameters won’t take you to #1 so easily, since everyone can do that.

`Don't bother predicting if you can't validate that your model is learning`

So, focus on feature engineering, as this is what ML…
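To make the validation advice concrete, here is a minimal sketch that compares cross-validated accuracy before and after adding one engineered feature. It assumes scikit-learn; the synthetic data, the logistic regression model and the product feature are all illustrative stand-ins, not the contest data or anyone's actual solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validated accuracy on the original features.
base_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Hypothetical engineered feature: the product of the first two columns.
X_new = np.column_stack([X, X[:, 0] * X[:, 1]])
new_scores = cross_val_score(model, X_new, y, cv=5, scoring="accuracy")

print(f"baseline: {base_scores.mean():.3f}, with new feature: {new_scores.mean():.3f}")
```

If the score with the new feature is not reliably higher across folds, the feature is probably not helping, whatever the leaderboard says on a single submission.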

Hi all,

Here is the benchmark solution (Python) for The Data Identity hackathon to get you all started with the problem:

Happy learning!!

Hi @deepam,

You can use the mean, median or mode to impute the missing values in the dataset. For categorical variables you can use mode and for numerical variables you can use mean or median.
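The mean/median/mode approach can be sketched in pandas as follows. The toy DataFrame and its column names are assumptions for illustration; substitute the columns from the contest data.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are illustrative, not the contest schema.
df = pd.DataFrame({
    "education": ["Bachelors", None, "Masters", "Bachelors"],
    "age": [25.0, np.nan, 40.0, 31.0],
})

# Categorical column: fill with the most frequent value (mode).
df["education"] = df["education"].fillna(df["education"].mode()[0])

# Numerical column: fill with the median (or use .mean() instead).
df["age"] = df["age"].fillna(df["age"].median())
```

The median is usually the safer choice when a numerical column is skewed or has outliers, since the mean gets pulled towards extreme values.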

To learn advanced methods for treating missing values in a dataset, you can refer to the article below:

I am using a decision tree algorithm, and I am getting decimal values as classes after prediction. Initially there are two classes, ‘0’ and ‘1’, but after prediction I get four decimal values: ‘0.56’, ‘0.58’, ‘0.84’ and ‘0.79’.

How can this be solved?
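One possible cause, assuming scikit-learn is being used (the post does not say): a regression tree, or the probability output of a classification tree, returns per-leaf averages rather than class labels, so a small tree gives only a handful of distinct decimals. The sketch below uses synthetic data to show the difference; it is a guess at the problem, not a diagnosis of the actual code.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic binary-classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# A shallow regressor predicts the mean of y in each leaf, i.e. decimals.
reg_pred = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y).predict(X)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
label_pred = clf.predict(X)              # class labels: 0 or 1
proba_pred = clf.predict_proba(X)[:, 1]  # probability of class 1

# If only probabilities are available, threshold them to get labels.
labels_from_proba = (proba_pred >= 0.5).astype(int)
```

With a tree of depth 2 there are at most four leaves, which would match seeing exactly four distinct decimal values.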

There are leaks in this dataset. When present, leaks are generally exploited in most data competitions. Can leaks be used here as well?