Reading a wide dataset in R


#1

I originally had a wide dataset of about 18,000 columns (and about 80 rows) that I am trying to read into R. It was stored in an Excel sheet, which unfortunately has a limit of only 16,384 columns. Hence, whenever I execute `dim(train_set)`, I get:
[1] 83 16384

i.e. 1,600+ columns are getting cut off, and this would badly affect the accuracy of the predictions. How can I read all the columns into R?
Your suggestions are much appreciated. Thanks so much!


#2

@shaw38
Hi there,

I understand the problem you are facing. I have been through almost the same thing myself.

So here is what I want you to do. Excel has a hard limit of 16,384 columns per worksheet, so save the first 16,384 columns in “sheet 1” of your workbook and the remaining 1,616 columns in “sheet 2” of the same workbook.

Now let's say your workbook name is “Shaw” and the sheets are unnamed.

```
library(openxlsx)

# Read each half of the data from its own sheet
data_frame_1 <- read.xlsx("…/Shaw.xlsx", sheet = 1)
data_frame_2 <- read.xlsx("…/Shaw.xlsx", sheet = 2)

# Combine the two halves column-wise into one data frame
final_data_frame <- cbind(data_frame_1, data_frame_2)
```
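
To double-check that nothing was lost in the split, here is a quick sanity check using the objects defined above:

```
# Both results should match the original ~18,000 columns
ncol(data_frame_1) + ncol(data_frame_2)
ncol(final_data_frame)
```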

Voila!!

Note: I have used the openxlsx package instead of the more common xlsx package, because openxlsx handles very large workbooks much more efficiently than xlsx.

Hope this helps!!

Neeraj


#3

If your data is stored in a CSV file with all the columns, reading it won't be a problem in R, since R itself is not subject to Excel's 16,384-column limit.
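
For example, a minimal sketch, assuming the full data has been exported to a hypothetical file named train_set.csv:

```
# Base R reads every column; Excel's 16,384-column ceiling does not apply here
train_set <- read.csv("train_set.csv")
dim(train_set)  # should now report all ~18,000 columns
```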