How to proceed with analysis from a single data file (192 variables/columns) and without proper information about the variables?


#1

How should I proceed with the analysis of a single data file (192 variables/columns) when I have no proper information about the variables?


#2

Have you tried the Rattle GUI? Please give it a try:

http://rattle.togaware.com

Also try the summary, describe and summarize commands, along with boxplots, plots and histograms, to get a first look at each variable.
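As a rough illustration of what such a per-column summary gives you, here is a minimal sketch in plain Python (standard library only). It is not Rattle or the R commands named above, just an analogue. The column names v1..v3 and the tiny inline sample are made up for the example; in practice you would feed it a sample of rows drawn from the real file.

```python
import csv
import io
import statistics


def profile_columns(rows):
    """Summarise each column of a list of dict rows (as produced by csv.DictReader)."""
    profile = {}
    for name in rows[0].keys():
        raw = [r[name] for r in rows]
        # Empty strings and None are counted as missing values.
        present = [v for v in raw if v not in ("", None)]
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except ValueError:
                pass
        info = {
            "missing": len(raw) - len(present),
            "distinct": len(set(present)),
        }
        # Treat the column as numeric only if every non-missing value parses.
        if present and len(numeric) == len(present):
            info.update(
                kind="numeric",
                min=min(numeric),
                max=max(numeric),
                mean=statistics.fmean(numeric),
                median=statistics.median(numeric),
            )
        else:
            info["kind"] = "categorical"
        profile[name] = info
    return profile


# Hypothetical sample standing in for a few rows drawn from the real file.
SAMPLE_CSV = """v1,v2,v3
1,a,10.5
2,b,
3,a,12.5
4,c,11.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_columns(rows)
for name, info in profile.items():
    print(name, info)
```

Columns with many missing values, a single distinct value, or an unexpectedly high distinct count (possible IDs) are usually the first ones to look at more closely.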

There is no shortcut to doing analysis: how to proceed depends on the data itself. Which variables to ignore, which to split into new variables and which to keep as is all depends on the data.

To get a more coherent answer, always share a few details about the data (its size, the number of rows, what it contains) rather than the number of columns alone.


#3

@whystatistics

The only time I have come across this situation is in competitive modeling, which is usually not what happens in real-world projects. Usually, I advise understanding the domain and the data fully before doing any modeling.

If you are in competitive modeling or have a lot of masked data, then the only alternative is to explore the variables, come up with hypotheses, quickly categorize which variables are significant through crude models, and then create a refined model.
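One very crude way to do that first screening, sketched in plain Python: rank each numeric variable by the absolute Pearson correlation with the target. This is only a stand-in for the "crude models" mentioned above (it picks up linear relationships only), and the names var_a, var_b, var_c and the values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def rank_by_correlation(columns, target):
    """Return (name, |r|) pairs, strongest linear association first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical masked variables and target, for illustration only.
columns = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "var_c": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8]

ranking = rank_by_correlation(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Variables near the top are candidates for the refined model; those near the bottom may still matter through non-linear effects or interactions, so treat the ranking as a starting point and not as a final selection.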

Hope this helps,
Kunal


#4

Hi Ajay, the data size is 1 TB. It has 192 columns and 200 million records (rows). It contains customer base information, transaction-related information and billing information.