How to merge the data files in R or SAS?


#1

Hi guys,
I need help with the "date with your data" hackathon: how can we merge the student and internship information into the train/test data sets using a primary key, in R or SAS?

P.S. I tried merging, but some IDs are duplicated with different locations, which is why I am asking.


#2

Hi @shyam

Use the dplyr package and its inner_join() function, for example inner_join(x, y, by = "Internship_ID"), where x and y are the two data frames to merge.

Alain


#3

You can easily merge the Internship data with the train/test data using the "left_join" function of the "dplyr" package.
E.g.: left_join(train, internship, by = "Internship_ID")

However, this cannot be done directly with the Student data, since the Student ID has duplicate entries there, and a join would repeat each train/test row once per matching student row.
One option is to remove the duplicate entries and then use the method above. Another option is to transform the data so that each Student ID has a single row.
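
Here is a rough sketch of both options on toy data, just to show the shape of it. The column names Student_ID and Location and all the values are made up for illustration; the real files may use different names, and which row to keep (or how to collapse) is a choice you have to make for your own data.

```r
library(dplyr)

# Toy data: Student_ID, Location and the values are hypothetical.
train <- data.frame(
  Internship_ID = c(1, 2),
  Student_ID    = c(10, 11)
)
student <- data.frame(
  Student_ID = c(10, 10, 11),
  Location   = c("Delhi", "Mumbai", "Pune"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the first row per student, then join.
student_first <- distinct(student, Student_ID, .keep_all = TRUE)
merged_first  <- left_join(train, student_first, by = "Student_ID")

# Option 2: collapse all rows of a student into one row, then join.
student_one_row <- student %>%
  group_by(Student_ID) %>%
  summarise(
    n_locations = n(),
    locations   = paste(unique(Location), collapse = "; ")
  )
merged_collapsed <- left_join(train, student_one_row, by = "Student_ID")
```

Either way the merged data keeps one row per train row. Option 1 throws away the extra locations, while option 2 keeps them in a combined column.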

Hope this helps!


#4

If you know SQL, then the best way would be to use the sqldf function from the sqldf package.

# SQLDF
library(sqldf)
mergedData <- sqldf("select * from train inner join internship on train.Internship_ID = internship.Internship_ID")

If you are not familiar with SQL, then you can use the base R merge function or the dplyr join functions mentioned by @sonny.
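
For the merge route, a minimal sketch on toy data could look like this. Only Internship_ID comes from this thread; the other column names and values are invented for the example.

```r
# Toy data: the label and Category columns are hypothetical.
train <- data.frame(
  Internship_ID = c(1, 2, 3),
  label         = c(0, 1, 0)
)
internship <- data.frame(
  Internship_ID = c(1, 2),
  Category      = c("Web", "Marketing")
)

# Inner join: keeps only rows whose key appears in both tables.
inner <- merge(train, internship, by = "Internship_ID")

# Left join: keeps every row of train, with NA where internship has no match.
left <- merge(train, internship, by = "Internship_ID", all.x = TRUE)
```

With all.x = TRUE no train rows are dropped, which is usually what you want when the train/test rows all have to be scored.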