How do I create a data frame with all possible combinations of three other data frames of 1 column each in R?

sapply
r
loopfunction
for

#1

Hello,

I have three data frames (say df1, df2 and df3), each with one column and a different number of rows. I need to create a new data frame in R whose rows contain all possible combinations of the values in df1, df2 and df3. How do I do it?

Regards


#2

You should use the expand.grid function.
Type ?expand.grid for more details. It takes vectors, so pass each data frame's column, e.g. expand.grid(df1[[1]], df2[[1]], df3[[1]]).
If you need only unique values, use the unique function in combination with expand.grid.


#3

Hi @B.Rabbit,

I recently took part in a competition in which I had to solve a similar problem.

The approach was:

  • Find all the unique elements of each dataframe
  • Create arrays a1, a2, a3 containing the unique elements from these dataframes
  • Loop through the arrays with nested loops and append each combination to a new array
  • Convert the new array to a dataframe

Here's the code (written in Python):

(About the code below)

// dataframes

  • df1 = sub
  • df2 = revenue
  • df3 = profile
  • df4 = projected

// Values to find combinations of

  • Hospital_ID
  • District_ID
  • Instrument_ID

import pandas

# Unique Hospital_ID values across all four dataframes, in first-seen order
resulting_list = sub.Hospital_ID.unique().tolist()
resulting_list.extend(x for x in revenue.Hospital_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in profile.Hospital_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.Hospital_ID.unique().tolist() if x not in resulting_list)
all_h_ids = resulting_list

# Unique District_ID values across all four dataframes
resulting_list = sub.District_ID.unique().tolist()
resulting_list.extend(x for x in revenue.District_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in profile.District_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.District_ID.unique().tolist() if x not in resulting_list)
all_d_ids = resulting_list

# Unique Instrument_ID values (profile is not used here)
resulting_list = sub.Instrument_ID.unique().tolist()
resulting_list.extend(x for x in revenue.Instrument_ID.unique().tolist() if x not in resulting_list)
resulting_list.extend(x for x in projected.Instrument_ID.unique().tolist() if x not in resulting_list)
all_i_ids = resulting_list

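# Build every (Hospital_ID, District_ID, Instrument_ID) combination with nested loops
df = []
for i in all_h_ids:
    for j in all_d_ids:
        for k in all_i_ids:
            df.append([i, j, k])

# Convert the list of rows to a dataframe
df = pandas.DataFrame(df, columns=['Hospital_ID', 'District_ID', 'Instrument_ID'])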
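
For what it's worth, the same two steps (order-preserving unique values, then all combinations) can probably be written more compactly with the standard library. This is only a sketch: ordered_union is a hypothetical helper name, and the sample lists are made-up stand-ins for the ID columns, not the competition data.

```python
from itertools import product


def ordered_union(*columns):
    """Return the distinct values across all columns, in first-seen order."""
    seen = {}
    for column in columns:
        for value in column:
            seen.setdefault(value, None)  # dicts keep insertion order
    return list(seen)


# Hypothetical sample values standing in for the ID columns
hospital_ids = ordered_union([1, 2, 2], [2, 3])
district_ids = ordered_union(["A", "B"], ["B"])
instrument_ids = ordered_union(["x"], ["y", "x"])

# product() yields the same rows, in the same order, as three nested for loops
rows = list(product(hospital_ids, district_ids, instrument_ids))
```

The resulting rows can be passed to pandas.DataFrame(rows, columns=[...]) in the same way as the list built by the nested loops above.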