Welcome to Practice Problem: Recommendation Engine




This will be the official thread for any discussion related to the practice problem. Feel free to ask questions, share approaches and learn.

Which algorithm is best suited for the recommendation system problem?

Where is the user data?


You can find user_data inside the zipped train file.
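For anyone else who can't find it: a minimal sketch using Python's standard `zipfile` module to list and extract the user_data file from the train archive. The archive name `train.zip` and the `user_data` keyword are assumptions; check the actual names in your download.

```python
import zipfile


def find_members(zip_source, keyword="user_data"):
    """Return archive member names that contain `keyword`.

    `zip_source` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist() if keyword in name]


def extract_members(zip_source, names, target_dir="."):
    """Extract only the listed members into `target_dir`."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in names:
            archive.extract(name, path=target_dir)
```

For example, `extract_members("train.zip", find_members("train.zip"))` would extract every member whose name contains `user_data` into the current directory.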


Oh, it wasn't showing up before; only the train_submission file was downloading.


What is the difference between rating and max rating?


Hi, I am getting the error "please check with test data, all ids are not available". I have checked that I am submitting all 66555 test IDs in the correct format, but I am still unable to submit.
Please resolve this.
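One way to narrow this down before uploading is to compare the submission IDs against the test IDs directly. A minimal sketch in plain Python; the ID column name `ID` and the file names are assumptions, and the grader's exact checks are not described in this thread, so this only looks for missing, unexpected and duplicated IDs (after stripping stray whitespace).

```python
import csv
from collections import Counter


def read_ids(path, column="ID"):
    """Read one ID column from a CSV file, stripping stray whitespace.

    The column name "ID" is an assumption; use the header in your files.
    """
    with open(path, newline="") as f:
        return [row[column].strip() for row in csv.DictReader(f)]


def check_submission_ids(test_ids, submission_ids):
    """Compare submission IDs with test IDs.

    Returns IDs missing from the submission, IDs not present in the test
    file, and IDs that appear more than once in the submission.
    """
    expected = set(test_ids)
    counts = Counter(submission_ids)
    submitted = set(counts)
    return {
        "missing": sorted(expected - submitted),
        "unexpected": sorted(submitted - expected),
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }
```

Calling `check_submission_ids(read_ids("test.csv"), read_ids("submission.csv"))` (hypothetical file names) should return three empty lists for a complete submission.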


Can you explain the difference between the “submission_count” and “problem_solved” columns?



problem_solved only counts the total number of problems a user has participated in. However, since you can submit multiple times for the same problem, submission_count should always be greater than or equal to problem_solved.
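Under that description, a quick sanity check on user_data is that no user has more problems than submissions. A minimal sketch, assuming each row is a dict with `user_id`, `submission_count` and `problem_solved` keys (the `user_id` key name is an assumption); any IDs it returns would be worth reporting.

```python
def users_violating_count_rule(users):
    """Return user IDs whose submission_count is below problem_solved.

    Each row is a dict; values may be strings (as read from CSV) or ints.
    """
    return [
        u["user_id"]
        for u in users
        if int(u["submission_count"]) < int(u["problem_solved"])
    ]
```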


Hi All,

I am facing a problem with the test data. I have merged the user_data, problem_data and training data using an inner join. I can now train the model, but I have no idea how to use the test data, because the Attempts_Range column is empty and that is the column we have to predict.

Could you please give me a clue on the above issue?

Thanks and Regards,


Hi Ankur, to train the model, separate the feature variables from the target and train your model on the features.

Then use this model to predict the outcome from the features of the test dataset.

I would also recommend going through this article to learn the basics of participating in a machine learning hackathon.
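The workflow described above (split features from target, fit on train, predict on test) can be sketched in plain Python. The feature column names `rating` and `submission_count` are hypothetical, and the 1-nearest-neighbour classifier is swapped in only to keep the example self-contained; it is not a recommendation for this problem, and in practice you would likely use a library model such as one from scikit-learn.

```python
import math


def split_features_target(rows, feature_cols, target_col=None):
    """Turn dict rows into numeric feature vectors and, if given, targets.

    Test rows have no target, so call this without `target_col` for them.
    """
    X = [[float(r[c]) for c in feature_cols] for r in rows]
    y = [r[target_col] for r in rows] if target_col is not None else None
    return X, y


class NearestNeighbour:
    """Minimal 1-nearest-neighbour classifier, used only to illustrate
    fitting on training features and predicting on test features."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X_new):
        preds = []
        for x in X_new:
            # Euclidean distance to every training row; take the closest label.
            dists = [math.dist(x, xi) for xi in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds
```

With training rows that contain Attempts_Range, you would call `split_features_target(train_rows, features, "Attempts_Range")` to train, then `split_features_target(test_rows, features)` to get the test features to predict on.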


Hello, can you please explain the contribution column? Also, is it okay if I merge the two data frames ‘train_submission’ and ‘user_data’?
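Merging should be fine as long as you keep every row of train_submission. A minimal sketch of a left join in plain Python, assuming both files share a `user_id` key (an assumption) and user_data has one row per user; in pandas the equivalent would be `train.merge(users, on="user_id", how="left")`. Unmatched rows get `None` for the user_data columns instead of being dropped.

```python
def left_join(left_rows, right_rows, key):
    """Attach right-hand columns to each left row with the same `key`.

    Every left row is kept; if there is no match, the right-hand columns
    are set to None (like NaN in a pandas left join). If both sides share a
    column name other than `key`, the left value is kept.
    """
    lookup = {r[key]: r for r in right_rows}
    # All right-hand column names, in first-seen order.
    right_cols = list(dict.fromkeys(c for r in right_rows for c in r if c != key))
    merged = []
    for row in left_rows:
        combined = dict(row)
        match = lookup.get(row[key], {})
        for col in right_cols:
            combined.setdefault(col, match.get(col))
        merged.append(combined)
    return merged
```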


The testing data is showing the error "cannot convert strings to float". I used label encoders, but it still doesn't work. What should I do?


Make sure you do not have any missing values in the dataset you are trying to label encode.
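Besides missing values, a string-to-float error can also come from a test column that still holds strings, or from a test category that was never seen when the encoder was fitted (scikit-learn's `LabelEncoder`, for example, raises an error on unseen labels). A minimal sketch in plain Python that replaces missing values with a placeholder and builds one mapping from the train and test versions of a column together; the `__missing__` placeholder is an assumption.

```python
import math

MISSING = "__missing__"  # placeholder for empty values (an assumption)


def clean(value):
    """Map None, NaN and empty strings to the MISSING placeholder."""
    if value is None or value == "":
        return MISSING
    if isinstance(value, float) and math.isnan(value):
        return MISSING
    return str(value)


def fit_label_encoding(*columns):
    """Build one category -> integer code mapping from several columns,
    e.g. the train and test versions of the same feature, so categories
    that appear only in the test data still get a code."""
    categories = sorted({clean(v) for column in columns for v in column})
    return {category: code for code, category in enumerate(categories)}


def transform(column, mapping):
    """Encode a column with a mapping built by fit_label_encoding."""
    return [mapping[clean(v)] for v in column]
```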


Is it normal for the test data to contain missing values after merging the different files?
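Some missing values are expected when a test user or problem has no matching row in user_data or problem_data and you merge with a left join; an inner join would silently drop those test rows instead, which can leave test IDs out of the submission. A minimal sketch for counting missing values per column and filling numeric columns with the training median; median imputation is just one common choice, not something specified for this problem.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(rows):
    """Count missing values per column across a list of dict rows."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + int(is_missing(value))
    return counts


def impute_with_train_median(train_rows, test_rows, numeric_cols):
    """Fill missing numeric values in both sets, in place, with the median
    of the non-missing training values (raises if a column has none)."""
    for col in numeric_cols:
        observed = [r[col] for r in train_rows if not is_missing(r.get(col))]
        fill = statistics.median(observed)
        for row in train_rows + test_rows:
            if is_missing(row.get(col)):
                row[col] = fill
    return train_rows, test_rows
```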


Can someone explain what last_online_time_seconds and registration_time_seconds mean? The description says they are the time the user was last online and the time the user registered, but somehow they look like the total time the user spent online rather than timestamps. Please help me understand this.



Can anyone explain what the contribution field in user_data actually signifies? It also contains negative entries.



As I understand it, both are given as Unix timestamps (seconds elapsed since 1970-01-01 00:00:00 UTC).

For example, for user ID ‘user_3311’:

last_online_time_seconds = 1504111645 and registration_time_seconds = 1466686436
Converting them from Unix timestamps gives the following (these times are in IST, UTC+05:30; in UTC they are 16:47:25 and 12:53:56 respectively):
last_online_time_seconds = 2017-08-30 22:17:25
registration_time_seconds = 2016-06-23 18:23:56

The condition below is always true in user_data:
last_online_time_seconds > registration_time_seconds

Hope this helps.
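The conversion above can be reproduced with Python's `datetime` module. In this sketch, the `IST` offset is used only to match the values quoted above, and `account_age_days` is a hypothetical derived feature: the difference between the two timestamps is a duration, which may be closer to the "total time" impression when comparing users.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # UTC+05:30


def to_datetime(seconds, tz=timezone.utc):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=tz)


def account_age_days(registration_seconds, last_online_seconds):
    """Days between registration and last time online (hypothetical feature)."""
    return (last_online_seconds - registration_seconds) / 86400
```

With pandas, `pd.to_datetime(df["last_online_time_seconds"], unit="s")` performs the same conversion for a whole column, interpreting the values as UTC.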


Hi All,

This is a very late question given the start date of the competition, but could anyone please help me understand the difference between “multiple attempts” (the response variable) and “multiple submissions” (one of the variables in the user_data dataset)? How are attempts captured in the first place? For example, are attempts based on logins?



Did you figure this out? I don't understand it either.


My assumption is that it is a metric of how much each user contributed to understanding or solving problems by participating in the discussion forums.