How to merge datasets on nearest values in R

r
data_wrangling
merge

#1

I have 2 datasets, which I want to join on nearest values. The algorithm should first search exact number (floating point) and if it is unable to find, it should merge the data on the nearest value.

For example:

Dataset 1:

Column_A       Column_B
1                     2.23
2                     9.12
3                     1.21

Dataset 2:

Column_X      Column_B
Asia               2.242
Africa             1.209
Australia          9.051

If I join the 2, I need the following outcome:

Column_A           Column_B           Column_X
1                     2.23            Asia
2                     9.12            Australia
3                     1.21            Africa

Please help


#2

you can use multiple round functions to come closer to value. then use multiple merge functions to merge the values

round(2.2344,3)
[1] 2.234
round(2.2344,2)
[1] 2.23
round(2.2344,1)
[1] 2.2


#3

however data.table has a beautiful way of doing this

setkey(x, z)
setkey(y, zP)
y[x, roll=“nearest”]


#4

Try:

ans <- vapply(df1$Column_B, function(x) x-df2$Column_B, numeric(3))
indx <- apply(abs(ans), 2, which.min)
cbind(df1, df2[indx, ][-2])
  Column_A Column_B  Column_X
1        1     2.23      Asia
3        2     9.12 Australia
2        3     1.21    Africa

We first find the difference between each value in data frame 1 to each value in data frame 2. This gives us a 3 x 3 matrix of which each column represents one value from df1. The second line searches for the minimum absolute value distance between data frames. Finally, we merge the two objects based on the index of closest values.