Pandas left merge and sqldf left join giving different results

sqldf
pandas
dataframe

#1

I’ve two dataframes with same shape. I’m doing left merge operation with pandas on both dataframes using

oppmaster_india1 = pd.merge(opp_master_india, email_ind[['Customer ID', 'Email']], on='Customer ID', how='left')

& left join using

oppmaster_india3 = sqldf("select [opp_master_india].*, [email_ind].[Email] as 'Email' from opp_master_india left join email_ind on [opp_master_india].[Customer ID] = [email_ind].[Customer ID]")

but the shape of resultant dataframes in both operations is coming different. Any idea, why?

In[247]: oppmaster_india3.shape

Out[247]: (3772516, 14)

In[248]: oppmaster_india1.shape

Out[248]: (3772644, 14)


#2

Hi Piyush,

I think it is because of Null values in email_ind for customer id.


#3

Hi Karthik, will you please be more specific? AFAIK, both left merge and left join will contain these null values until & unless we are using distinct or drop_duplicates