This data set was used for practice in the article: http://www.analyticsvidhya.com/blog/2016/05/h2o-data-table-build-models-large-data-sets
The article demonstrates the use of data.table and H2O to build models on large data sets. These packages work efficiently and help a user overcome machine memory limitations. The article already covers the approach in detail.
You can download the data set and practice along with me. A one-time login is required to download the data.
Below is the complete problem statement and data used in the article:
A retail company, “ABC Private Limited”, wants to understand customer purchase behaviour (specifically, purchase amount) across various products of different categories. They have shared a purchase summary of various customers for selected high-volume products from last month.
The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category), and the total purchase_amount from last month.
Now, they want to build a model to predict the purchase amount of a customer for various products, which will help them create personalized offers for customers on different products.
Your model's performance will be evaluated on the basis of your predictions of the purchase amount for the test data (test.csv), which contains the same kind of data points as the training data except for the purchase amount. Your submission needs to be in the format shown in “SampleSubmission.csv”.
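Submissions are scored on the root mean squared error (RMSE). RMSE is a very common, general-purpose error metric: it is the square root of the average squared difference between the predicted and actual values, so a lower score is better.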
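If you want to check your score locally before submitting, here is a minimal sketch of how RMSE can be computed. It is written in plain Python rather than the R packages used in the article, and the function name `rmse` and the sample numbers are made up for illustration; they are not taken from the data set or from the official scoring script.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Square each prediction error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical purchase amounts, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
score = rmse(actual, predicted)
print(round(score, 2))
```

Because the errors are squared before averaging, one large miss (the 30 above) pulls the score up more than several small ones would.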
Note: This thread will expire on 15th May 2016.
Edit: This thread has expired now. Data is no longer available for download.
Edit: Data set is available for download again.