Data Merging in Python



Hey all,

I have a DataFrame like:

df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'],
                              'D': ['D2', 'D3', 'D4', 'D5']}

And I want to create a ‘Key’ column

'key': ['B2D2','B2D3','B2D4'....'B3D2','B3D3'......'B5D5']

Please anyone tell me how to do in python

Thank You


You have mentioned D4 in your output, Where is D4 in input? Can you post input and expected output correctly?


Thank you, i have corrected the input and expected output.


Achieving this output would require a Cartesian product between column B and D. In the below code I have created two separate data frames for demo and make it easy for understanding. You can optimize the solution.

import pandas as pd
import numpy as np
df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'],'D': ['D2', 'D3', 'D4', 'D5']})

df1 = pd.DataFrame(df['B'])
df2 = pd.DataFrame(df['D'])
df1['key1'] = pd.Series(0, index=df1.index)
df2['key2'] = pd.Series(0, index=df2.index)

df3= pd.DataFrame(df1.set_index('key1').join(df2.set_index('key2')) ) 

df3['key'] = df3['B'] + df3['D']



Conveting each column into np arrays will let you concatenate the contents by adding them elementwise. Then you convert it to a pandas series and declare it as the new column key.

import pandas as pd
import numpy as np

df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'], 'D': ['D2', 'D3', 'D4', 'D5']})
df['key'] = pd.Series(np.array(df.B) + np.array(df.B), index = df.index)


Hey Tito,
Yes, we can do . But it will not give the desire output. Converting into nparray and concatenate, it gives :

key(output) : B2D2, B3D3, B4D4, B5D5

And desire output what i need(as per @ mr. gksharma) is:

key(output) : B2D2, B2D3, B2D4, B2D5,B3D2,B3D3,B3D4…B5D5