Create features in Python

Hi Team,

I have a column called field that stores data like the following:

{'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T2'}, {'code': 'b', 'data': 'A18'}]}

How can I restructure this data so that each code becomes a separate column with its data as the value?

Is this a pandas.DataFrame? Please create a minimal working DataFrame that we can simply copy, and show the expected result for your example as a DataFrame or table. A row-per-subfield result would need to repeat tag, ind1, ind2 for every code and data pair. In a DataFrame you could try .explode() or .apply().
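
The .explode() idea mentioned above could look roughly like the sketch below. It assumes the column is named field, as in the question, and that every cell is already a Python dict (not a string); the variable names wide, long and parts are only illustrative.

```python
import pandas as pd

# Hypothetical one-row frame; the column name 'field' follows the question.
df = pd.DataFrame({
    'field': [
        {'tag': '050', 'ind1': '0', 'ind2': '0',
         'subfields': [{'code': 'a', 'data': 'F74.T2'},
                       {'code': 'b', 'data': 'A18'}]},
    ]
})

# Spread the outer dict keys into columns: tag, ind1, ind2, subfields.
wide = pd.DataFrame(df['field'].tolist(), index=df.index)

# One row per subfield; tag, ind1, ind2 are repeated on each row.
long = wide.explode('subfields').reset_index(drop=True)

# Split each {'code': ..., 'data': ...} dict into its own two columns.
parts = pd.DataFrame(long['subfields'].tolist(), index=long.index)
long = long.drop(columns='subfields').join(parts)

print(long)
```

This gives one row per subfield with the columns tag, ind1, ind2, code, data, which may or may not be the shape the question is after.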


If you want to convert each cell to

{'tag': '050', 'ind1': '0', 'ind2': '0', 'a': 'F74.T2', 'b': 'A18'}

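then you can use apply() with a function that modifies each dict in place:

def convert(cell):
    # Copy each subfield's data into the dict under its code, e.g. cell['a'] = 'F74.T2'
    for item in cell['subfields']:
        cell[item['code']] = item['data']
    del cell['subfields']  # remove the original nested list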

Minimal working example

import pandas as pd

data = {
    'X': [
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T2'}, {'code': 'b', 'data': 'A18'}]},
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T3'}, {'code': 'b', 'data': 'B18'}]},
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T4'}, {'code': 'b', 'data': 'C18'}]},
    ], 
    'Y': ['D','E','F'], 
    'Z': ['G','H','I']
}

df = pd.DataFrame(data)

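def convert(cell):
    # Mutates the dict in place: adds one key per subfield code.
    for item in cell['subfields']:
        cell[item['code']] = item['data']
    del cell['subfields']

# convert() returns None, so the return value of apply() is ignored;
# the dicts stored in df['X'] are changed in place.
df['X'].apply(convert)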

print(df['X'][0])
print(df['X'][1])
print(df['X'][2])

Result:

{'tag': '050', 'ind1': '0', 'ind2': '0', 'a': 'F74.T2', 'b': 'A18'}
{'tag': '050', 'ind1': '0', 'ind2': '0', 'a': 'F74.T3', 'b': 'B18'}
{'tag': '050', 'ind1': '0', 'ind2': '0', 'a': 'F74.T4', 'b': 'C18'}
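
If changing the dicts in place is not wanted, a variant that builds a new dict and assigns the result back is sketched below. The function name flatten is made up for this example, and the frame is shortened to one row.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [
        {'tag': '050', 'ind1': '0', 'ind2': '0',
         'subfields': [{'code': 'a', 'data': 'F74.T2'},
                       {'code': 'b', 'data': 'A18'}]},
    ]
})

def flatten(cell):
    # Copy every key except 'subfields', then add one key per subfield code.
    flat = {key: value for key, value in cell.items() if key != 'subfields'}
    for item in cell['subfields']:
        flat[item['code']] = item['data']
    return flat

original = df['X'][0]             # keep a reference to the untouched dict
df['X'] = df['X'].apply(flatten)  # replace each cell with its flattened copy

print(df['X'][0])
```

Here the original dict still has its subfields key, and only the column is replaced.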

If you want to create columns a, b

        a    b
0  F74.T2  A18
1  F74.T3  B18
2  F74.T4  C18

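then you can use apply() with

def convert(row):
    # Collect code -> data pairs from the nested list into a new dict.
    result = dict()
    for item in row['subfields']:
        result[item['code']] = item['data']
    del row['subfields']  # also removes 'subfields' from the dict in column X
    return result

to create a new column, subfields: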

                     subfields
0  {'a': 'F74.T2', 'b': 'A18'}
1  {'a': 'F74.T3', 'b': 'B18'}
2  {'a': 'F74.T4', 'b': 'C18'}

Next, you can use apply() with pd.Series to turn that column into a new DataFrame:

        a    b
0  F74.T2  A18
1  F74.T3  B18
2  F74.T4  C18

and finally you can use .join() to add these columns to the original DataFrame.

Minimal working example

import pandas as pd

data = {
    'X': [
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T2'}, {'code': 'b', 'data': 'A18'}]},
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T3'}, {'code': 'b', 'data': 'B18'}]},
        {'tag': '050', 'ind1': '0', 'ind2': '0', 'subfields': [{'code': 'a', 'data': 'F74.T4'}, {'code': 'b', 'data': 'C18'}]},
    ], 
    'Y': ['D','E','F'], 
    'Z': ['G','H','I']
}

df = pd.DataFrame(data)

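def convert(row):
    # Collect code -> data pairs from the nested list into a new dict.
    result = dict()
    for item in row['subfields']:
        result[item['code']] = item['data']
    del row['subfields']  # also removes 'subfields' from the dict in column X
    return result

# New column holding one {'a': ..., 'b': ...} dict per row.
df['subfields'] = df['X'].apply(convert)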
print(df[['subfields']])

new_columns = df['subfields'].apply(pd.Series)
print(new_columns)

df = df.join(new_columns)
print(df)

Result:

                                          X  Y  Z                    subfields       a    b
0  {'tag': '050', 'ind1': '0', 'ind2': '0'}  D  G  {'a': 'F74.T2', 'b': 'A18'}  F74.T2  A18
1  {'tag': '050', 'ind1': '0', 'ind2': '0'}  E  H  {'a': 'F74.T3', 'b': 'B18'}  F74.T3  B18
2  {'tag': '050', 'ind1': '0', 'ind2': '0'}  F  I  {'a': 'F74.T4', 'b': 'C18'}  F74.T4  C18
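
If the intermediate subfields column is not needed and column X should stay unchanged, the same a and b columns can probably be built in one step. This is only a sketch with a shortened frame; subfields_to_dict is a made-up name. Note that if the same code appears twice in one record, the later value overwrites the earlier one.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [
        {'tag': '050', 'ind1': '0', 'ind2': '0',
         'subfields': [{'code': 'a', 'data': 'F74.T2'}, {'code': 'b', 'data': 'A18'}]},
        {'tag': '050', 'ind1': '0', 'ind2': '0',
         'subfields': [{'code': 'a', 'data': 'F74.T3'}, {'code': 'b', 'data': 'B18'}]},
    ],
    'Y': ['D', 'E'],
})

def subfields_to_dict(cell):
    # Build a new dict and leave the original cell untouched.
    return {item['code']: item['data'] for item in cell['subfields']}

# A list of dicts becomes a DataFrame with one column per key.
new_columns = pd.DataFrame(df['X'].map(subfields_to_dict).tolist(), index=df.index)

result = df.join(new_columns)
print(result[['Y', 'a', 'b']])
```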

Thank you!

© Copyright 2013-2021 Analytics Vidhya