Hi @B.Rabbit,
I recently took part in a competition in which I had to solve a similar problem.
The approach was:
- Find all the unique values of each ID column across the dataframes
- Build one list per ID column holding those unique values (all_h_ids, all_d_ids and all_i_ids in the code)
- Loop through the lists with nested loops and append every possible combination to a new list
- Convert the new list to a dataframe
Here's the code (written in Python):
About the code below:
// Dataframes
- df1 = sub
- df2 = revenue
- df3 = profile
- df4 = projected
// Columns to find combinations of
- Hospital_ID
- District_ID
- Instrument_ID
import pandas

# Unique Hospital_IDs across all four dataframes, in first-seen order
resulting_list = sub.Hospital_ID.unique().tolist()
resulting_list.extend(x for x in revenue.Hospital_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in profile.Hospital_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.Hospital_ID.unique().tolist() if x not in resulting_list)
all_h_ids = resulting_list

# Unique District_IDs across all four dataframes
resulting_list = sub.District_ID.unique().tolist()
resulting_list.extend(x for x in revenue.District_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in profile.District_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.District_ID.unique().tolist() if x not in resulting_list)
all_d_ids = resulting_list

# Unique Instrument_IDs (profile is not used for this column)
resulting_list = sub.Instrument_ID.unique().tolist()
resulting_list.extend(x for x in revenue.Instrument_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.Instrument_ID.unique().tolist() if x not in resulting_list)
all_i_ids = resulting_list
# Build every (Hospital_ID, District_ID, Instrument_ID) combination
df = []
for i in all_h_ids:
    for j in all_d_ids:
        for k in all_i_ids:
            df.append([i, j, k])
df = pandas.DataFrame(df, columns=['Hospital_ID', 'District_ID', 'Instrument_ID'])
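
If it helps, the same idea can be written more compactly. This is only a sketch, not the code I used in the competition: the helper names `unique_values` and `all_combinations` are made up for illustration, and the four small dataframes are toy data standing in for the real ones. It collects the unique values with `pandas.concat` plus `drop_duplicates`, and replaces the three nested loops with `itertools.product`, which yields the combinations in the same order.

```python
from itertools import product

import pandas as pd


def unique_values(frames, column):
    """Unique values of `column` across `frames`, in first-seen order.

    Frames that do not have the column are skipped (hypothetical helper).
    """
    parts = [frame[column] for frame in frames if column in frame.columns]
    return pd.concat(parts, ignore_index=True).drop_duplicates().tolist()


def all_combinations(frames, columns):
    """Dataframe with every combination of the unique values of `columns`."""
    levels = [unique_values(frames, column) for column in columns]
    # product() iterates like nested for-loops: the last column varies fastest
    return pd.DataFrame(list(product(*levels)), columns=columns)


# Toy data (assumed shapes; profile has no Instrument_ID in this example)
sub = pd.DataFrame({"Hospital_ID": [1, 2], "District_ID": [10, 10], "Instrument_ID": ["a", "b"]})
revenue = pd.DataFrame({"Hospital_ID": [2, 3], "District_ID": [10, 11], "Instrument_ID": ["b", "c"]})
profile = pd.DataFrame({"Hospital_ID": [3, 4], "District_ID": [11, 12]})
projected = pd.DataFrame({"Hospital_ID": [1], "District_ID": [10], "Instrument_ID": ["a"]})

combos = all_combinations(
    [sub, revenue, profile, projected],
    ["Hospital_ID", "District_ID", "Instrument_ID"],
)
print(combos.head())
```

Keep in mind that the result has one row per combination, so its size is the product of the three unique counts. With many IDs per column that grows quickly.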