Outer Join in Pandas

pandas
python
jupyter_notebook

#1

Hi,

I’ve two pandas dataframes and have to perform an outer join on these two dataframes but I am not getting desired results. So what I want is when I perform join, missing data in one dataframe should be filled by data in other dataframe if it exists and key matches. And if there are observations with new keys than they should be appended at the bottom of the output dataframe.

I’ve attached the screenshots here as to know how dataframes look like. I’m using below code for performing outer join but its not giving me desired result i.e. there is some information in second dataframe which is not getting populated in first dataframe (wherever it’s missing) after performing join.

ColNames = list(df_second.columns.difference(df_first.columns))
ColNames.append(‘EmpID’)
merged_df = pd.merge(df_first,df_second[ColNames],how=‘outer’,on=[‘EmpID’])

I’ve also attached the output dataframe screenshot.

I’ll really appreciate if you guys can help me here.

Regards,
Kailash Negi


#2

I have got an intuition about what you are asking but I am not 100% sure. Is it the column ‘Name’ the one you think should be more populated? Could you provide with an expected output so I can help you out?


#3

Hi @kailash_negi,
Try to use name also as joining column. Hope you will get the desired result.
Below is the sample code:-
pd.merge(df_first, df_second, how=‘outer’, on=[‘Emp_ID’,‘Name’])

You are getting missing values for name because you are considering names only from first data frame.
Also you not need to remove any column from second data frame as you are doing in your code.

Thanks,
Gaurav