How to join two data tables in R?

dataexploration
r
data.table
data_wrangling
merge

#1

Hi,

I have a huge data set which I imported using fread function of data.table. It has a ID column and has 125 million rows.
I have another data set which is like 1000 rows with the ID column only. I want to filter the former data set as per the IDs in the latter one. Can you suggest me some ways to achieve this?

using merge to inner join times out. I tried %n% to get the row numbers also fails. Any smarter way?

Thanks
Nitish


#2

Try data.table. However, if you are near the limit of RAM available, it might be better to do the merging outside of R, in a SQL environment for example.

Another alternative is to use something like SparkR.


#3

Hi Nitish,

While asking such question, please try to provide a reproducible example which can help AV community users to answer your questions quickly.

Well, in your case, you should join the two data sets using data.table way. It should work faster than native merge function.
Assuming,
DT1 has 125 million rows and ID column
DT2 has only ID column

setkey(DT1, ID)
setkey(DT2, ID)
DT1[DT2, nomatch=NA] #give NA for IDs not found

Hope this helps!

Regards
Manish


#4

@nitish_dydx

I hope Below Three Links will solve your problem
Link1
Link2
Link3