Data Merging in Python

python

#1

Hey all,

I have a DataFrame like:

df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'],
                              'D': ['D2', 'D3', 'D4', 'D5']}

And I want to create a ‘Key’ column

'key': ['B2D2','B2D3','B2D4'....'B3D2','B3D3'......'B5D5']

Please anyone tell me how to do in python

Thank You


#2

You have mentioned D4 in your output, Where is D4 in input? Can you post input and expected output correctly?


#3

Thank you, i have corrected the input and expected output.


#4

Achieving this output would require a Cartesian product between column B and D. In the below code I have created two separate data frames for demo and make it easy for understanding. You can optimize the solution.

import pandas as pd
import numpy as np
df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'],'D': ['D2', 'D3', 'D4', 'D5']})

df1 = pd.DataFrame(df['B'])
df2 = pd.DataFrame(df['D'])
df1['key1'] = pd.Series(0, index=df1.index)
df2['key2'] = pd.Series(0, index=df2.index)

df3= pd.DataFrame(df1.set_index('key1').join(df2.set_index('key2')) ) 

df3['key'] = df3['B'] + df3['D']

df3

#5

Conveting each column into np arrays will let you concatenate the contents by adding them elementwise. Then you convert it to a pandas series and declare it as the new column key.

import pandas as pd
import numpy as np

df = pd.DataFrame({'B': ['B2', 'B3', 'B4', 'B5'], 'D': ['D2', 'D3', 'D4', 'D5']})
df['key'] = pd.Series(np.array(df.B) + np.array(df.B), index = df.index)


#6

Hey Tito,
Yes, we can do . But it will not give the desire output. Converting into nparray and concatenate, it gives :

key(output) : B2D2, B3D3, B4D4, B5D5

And desire output what i need(as per @ mr. gksharma) is:

key(output) : B2D2, B2D3, B2D4, B2D5,B3D2,B3D3,B3D4…B5D5