While joining two data.tables in R, do we need to specify key in both or not?




While following this tutorial on data.table


In the joining data.tables section they directly join X to another data.table DT and say that as X does not have a key, so its first column is used as its key.

but when I use the code,

X = data.table(c(“b”,“c”),foo=c(4,2))
V1 foo
1: b 4
2: c 2
DT[i = X]

it gives me an error->

Error in [.data.table(DT, i = X) :
When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,…) first, see ?setkey.

So do we need to specify keys in each data.table before joining them or not?


Hello @Aditya_Sharma,

It is always a good practice to specify the keys by which to join tables(whether in R,SQL etc.).
And this error will be there if you do not set the key in case of data tables.But if the keys are set then the error will not be thrown.
So as you can see you have to specify keys twice if you do not use the by statement while joining.
Hope this helps!! :grinning:

dt1 <- data.table(x = 1:3,w = c("x","y","z"))
merge(dt,dt1,by = "x")