Better Way of creating a dataframe in Pandas

pandas
dataframe
python

#1

Hi,

I need to create a dataframe in pandas, and I have the following options:

Option 1:
Initialize a dummy data frame using:
df = pd.DataFrame(index=range(4),columns=['A','B','C','D'])

Then fill in each value using df.iloc[…,…]

Option 2:
Create a dictionary containing the required columns and rows. Then create a dataframe using:
df = pd.DataFrame(dict)

Option 1 needs no additional space for a dictionary, and it avoids the extra computation of converting the dictionary to a dataframe.
In option 2, I think indexing a dictionary is faster than indexing a dataframe, so adding values one by one should take less time.
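
To make the two options concrete, here is a minimal sketch of both building the same 4x4 frame. The cell values and the names `columns`, `n_rows`, `data`, `df1` and `df2` are placeholders for illustration, not part of the original problem.

```python
import pandas as pd

columns = ["A", "B", "C", "D"]
n_rows = 4

# Option 1: pre-allocate an empty frame, then fill it cell by cell with .iloc
df1 = pd.DataFrame(index=range(n_rows), columns=columns)
for i in range(n_rows):
    for j in range(len(columns)):
        df1.iloc[i, j] = i * len(columns) + j  # placeholder value

# Option 2: build a plain dict of column lists first, then convert once
data = {
    col: [i * len(columns) + j for i in range(n_rows)]
    for j, col in enumerate(columns)
}
df2 = pd.DataFrame(data)

print(df1)
print(df2)
```

Both frames hold the same values. It may be worth checking `df1.dtypes` and `df2.dtypes` afterwards, since a frame that starts out empty does not necessarily end up with the same column types as one built from a dict.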

Which option is computationally better in case of:
(a) Fixed number of records (will allow initialization of dataframe indices)
(b) Dynamic number of records (a new row has to be added to dataframe every time)

I think option 1 for scenario (a) and option 2 for scenario (b). But I’m not sure if my reasoning is justified. Please share some thoughts.

Thanks,
Aarshay


#3

Hi @Aarshay,

I generally avoid option 1 because it takes more time than option 2. Building a dictionary first costs far less time than writing values directly into a dataframe one at a time.
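
A rough way to check this on your own machine is to time both approaches with `timeit`. This is only a sketch: the frame size `N_ROWS`, the cell values, and the function names are assumptions chosen for illustration, and the actual numbers will depend on your pandas version and hardware.

```python
import timeit

import pandas as pd

COLUMNS = ["A", "B", "C", "D"]
N_ROWS = 200  # hypothetical size; increase it to see how each approach scales


def fill_with_iloc():
    # Option 1: pre-allocate, then assign every cell through .iloc
    df = pd.DataFrame(index=range(N_ROWS), columns=COLUMNS)
    for i in range(N_ROWS):
        for j in range(len(COLUMNS)):
            df.iloc[i, j] = i + j
    return df


def build_from_dict():
    # Option 2: fill a dict of column lists, then convert to a frame once
    data = {col: [i + j for i in range(N_ROWS)] for j, col in enumerate(COLUMNS)}
    return pd.DataFrame(data)


t_iloc = timeit.timeit(fill_with_iloc, number=3)
t_dict = timeit.timeit(build_from_dict, number=3)
print(f"iloc fill: {t_iloc:.4f}s, dict build: {t_dict:.4f}s")
```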

I would consider option 1 only when space complexity matters more than time complexity (e.g. Big Data).
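
For scenario (b), where the number of records is not known in advance, one variant of option 2 is to collect each record as a dict in a plain Python list and build the dataframe once at the end. This is a sketch under that assumption; the loop below just stands in for records arriving one at a time, and the values are made up.

```python
import pandas as pd

rows = []  # plain Python list; appending to it is cheap

for i in range(5):  # stand-in for records arriving one at a time
    rows.append({"A": i, "B": i * 2, "C": i * 3, "D": i * 4})

# Convert once, after all records have been collected
df = pd.DataFrame(rows, columns=["A", "B", "C", "D"])
print(df)
```

The dataframe is only created once, so none of the per-row cost of growing a dataframe is paid inside the loop.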