Why does a feature created in a temp dataframe get created in the original dataframe as well?

pandas
dataframe

#1

Hello,

I am using the following technique to add a new feature/field.

for temp_df in full_data_df:
    temp_df["Family"] = temp_df["Sibsp"]+temp_df["Parch"]+1

full_data_df already exists.

My questions:

  1. Why does the Family feature/field get added to full_data_df as well, since I am only adding it to temp_df?

  2. How can I check whether a dataframe is a reference to the original object?

  3. In the for loop, does the dataframe (full_data_df) pass its data to temp_df one series at a time?

thanks


#2

Hi @mohitlearns,

temp_df is not a different dataframe, it is a variable. For example, if I write,

   for i in range(0,10):

this means that, for every i in the range 0 to 9, it will perform the operation in the loop body. So i takes the values 0, 1, 2, 3, …, 9. You will have to change the code accordingly.

Could you make it clearer? What do you mean by checking whether the dataframe is a reference to the original object?

What have you defined as temp_df? Is it an empty dataframe?


#3

Thanks for your reply!

Just a quick clarification required.

 train=pd.read_csv("train.csv",header=0,dtype={"PassengerId":np.int64,"Age":np.float64})

 test=pd.read_csv("test.csv",header=0,dtype={"PassengerId":np.int64,"Age":np.float64})  

 full_data=[train,test]

for dataset in full_data:
    dataset["Family_Size"]=dataset["SibSp"]+dataset["Parch"]+1

now if i perform the following…

type(dataset)

…the response i get from python is :

 pandas.core.frame.DataFrame

Can you shed some light on how it is a variable and not a dataframe?

Secondly, I am still not clear on how adding a feature to "dataset" creates a feature in "full_data".


#4

Hi @mohitlearns,

This gives a list. I suppose you want full_data to be a dataframe. If so, use the line of code below:

   full_data=pd.concat([train,test],ignore_index=True)
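
To illustrate what the concat line does, here is a minimal sketch. The two small frames `train_df` and `test_df` are made-up stand-ins for the CSV files, and the name `combined` is only for illustration. It suggests that pd.concat builds a new dataframe, so a column added to the result does not appear in the frames it was built from.

```python
import pandas as pd

# Made-up stand-ins for train.csv and test.csv (values invented for illustration).
train_df = pd.DataFrame({"SibSp": [1, 0], "Parch": [0, 2]})
test_df = pd.DataFrame({"SibSp": [3], "Parch": [1]})

# concat returns a new DataFrame with the rows of both inputs stacked.
combined = pd.concat([train_df, test_df], ignore_index=True)

# The new column is added to `combined` only.
combined["Family_Size"] = combined["SibSp"] + combined["Parch"] + 1

print(len(combined))
print("Family_Size" in train_df.columns)
print("Family_Size" in test_df.columns)
```

Whether you want this or the list approach depends on whether train and test should each end up with the new column.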

Another question, why are you doing this? Do you want to create another dataset and add a column to it?

You have written for dataset in full_data. Since full_data is a list, the loop takes each element of the list in turn (here train, then test) and performs the operation you assigned. As in the example mentioned previously:

for i in range(0,5):
     ...  # operation assigned

it will take i=0, i=1, i=2 and so on; in simpler words, it replaces i with each value in the range. Here it takes each dataset in full_data in turn, and dataset refers to the same object that is stored in full_data, so the operation works on that object itself.
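
To make the looping behaviour concrete, here is a minimal sketch using two tiny made-up frames (`train_df` and `test_df`, hypothetical stand-ins for the CSV files). It suggests that looping over a list hands back the very same DataFrame objects, one per iteration, whereas looping over a single DataFrame yields its column labels and not rows or Series.

```python
import pandas as pd

# Made-up stand-ins for train.csv and test.csv (values invented for illustration).
train_df = pd.DataFrame({"SibSp": [1, 0], "Parch": [0, 2]})
test_df = pd.DataFrame({"SibSp": [3], "Parch": [1]})

full_data = [train_df, test_df]  # a plain list holding references to both frames

same_object = []
for dataset in full_data:
    # `dataset` is just another name for train_df, then test_df; no copy is made.
    same_object.append(dataset is train_df or dataset is test_df)
    dataset["Family_Size"] = dataset["SibSp"] + dataset["Parch"] + 1

# Iterating over one DataFrame yields its column labels.
labels = [label for label in train_df]

print(same_object)
print("Family_Size" in train_df.columns, "Family_Size" in test_df.columns)
print(labels)
```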


#5

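You must use .copy() if you want a separate dataframe. Without it, the new name refers to the same object as the original, so a column added through one name shows up under the other as well.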
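
A minimal sketch of the difference, with made-up data and hypothetical names (`original`, `alias`, `independent`). The `is` operator is one way to check whether two names refer to the same object, which is what question 2 asked about. Note that `is` only compares object identity; it does not tell you whether two distinct frames share underlying data.

```python
import pandas as pd

original = pd.DataFrame({"SibSp": [1, 0], "Parch": [0, 2]})  # made-up values

alias = original               # a second name for the same object
independent = original.copy()  # a new object with its own data

# Added through the alias, so it lands in `original` too.
alias["Family"] = alias["SibSp"] + alias["Parch"] + 1

# Added to the copy only.
independent["Other"] = 0

print(alias is original)
print(independent is original)
print(list(original.columns))
```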