I want to do predictive modelling on a kaggle data set having 29 million observations. When I try to apply KNN or Logistic Regression in python my screen freezes, after checking System Monitor, I found that RAM was full and swap space is also getting pretty much used up.
After searching a little bit online I have found out following solutions that are possible :
Upgrading my laptop with RAM and/or SSD(currently I have 8gb RAM and 1Tb HDD).
Parallel computing using PyCuda with Nvidia (I have Nvdia gt 740m).
Dividing data into small chunks then fitting the algorithm on all the individual chunks.
Using a new language or libraries (like Julia, PySpark, H2O etc.)
Please suggest me the best possible solution, If you have any other solution feel free to suggest.
Thanks in advance