I have around 100 GB of log data in CSV format and want to do exploratory analysis on it. Since pandas loads data into memory, I am looking for alternatives. I have tried GraphLab's SFrame on my machine with 8 GB of RAM, but it takes too long to process a subset of the data. Would a Spark DataFrame or an MPP database be a better alternative?
Can you please suggest the best approach for handling this amount of data? Also, since the data set is large, which visualization libraries can be used to visualize it?