Python error: "cannot reindex from a duplicate axis"

pandas
dataframe
python

#1

Hello,

I’m facing a problem in python wherein I’m getting the error “cannot reindex from a duplicate axis”.

I have 2 data sets which I have concatenated into 1 using:
data = pd.concat([data_train,data_test])

Now I’m trying to set a value in a specific cell of the combined dataframe using:
data.loc[i,"Col 1"] = x

This statement results in the error. However, I don’t get an error when I use:
data[i,"Col 1"] = x

The pandas documentation (http://pandas.pydata.org/pandas-docs/stable/indexing.html) suggests that the second option is not recommended and may not yield the correct result.
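A small sketch of why the second form does not raise: plain [] with two keys is read as one column key, the tuple (i, "Col 1"), so the assignment adds a new column instead of writing to a cell. The frame and the values of i and x below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe
df = pd.DataFrame({"Col 1": [1, 2, 3]})
i, x = 0, 99

# [] receives the tuple (i, "Col 1") and treats it as a single column label
df[i, "Col 1"] = x

new_columns = list(df.columns)            # "Col 1" plus the tuple (0, "Col 1")
original_values = df["Col 1"].tolist()    # the original column is untouched
added_values = df.iloc[:, 1].tolist()     # x broadcast down the new column
```

So the statement runs, but it does not do what data.loc[i, "Col 1"] = x is meant to do.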

Please help me figure out where I’m going wrong.

Thanks,
Aarshay



#2

@Aarshay -

For label-based indexing on DataFrame rows, I use the special indexing field ix. It lets you select a subset of the rows and columns of a DataFrame with NumPy-like notation plus axis labels.

You can use the ix field to access any element of a data frame. (Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so this only works on older versions.)

data.ix[i, 'Col 1']
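A sketch of the same kind of lookup with loc (by label) and iloc (by position), which work on current pandas versions. The frame and its index labels below are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels
data = pd.DataFrame({"Col 1": [10, 20, 30]}, index=["a", "b", "c"])

value_by_label = data.loc["b", "Col 1"]   # row label, column label
value_by_position = data.iloc[1, 0]       # row position, column position

# Assignment through loc writes to the existing cell
data.loc["b", "Col 1"] = 25
updated = data.loc["b", "Col 1"]
```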

Hope this helps!

Regards,
Hinduja


#3

@hinduja1234 -

Thanks for your kind response.

This is definitely another way of indexing, but I think the error was somewhere else. I figured out that while concatenating the data frames, the index of each dataframe was carried over as-is, which resulted in multiple rows having the same index label. That is what leads to the error: “cannot reindex from a duplicate axis”

I found a simple fix:
data = pd.concat([data_train,data_test], ignore_index=True)

This ignores the index of the original dataframes and creates new indices.
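A minimal sketch of the difference, using two made-up frames in place of data_train and data_test. It only shows the duplicated labels and the fresh index. Whether a given .loc assignment actually raises on the duplicated version depends on the pandas version and on what is being assigned.

```python
import pandas as pd

# Hypothetical stand-ins for the two data sets; both get the default 0..n-1 index
data_train = pd.DataFrame({"Col 1": [1, 2, 3]})
data_test = pd.DataFrame({"Col 1": [4, 5]})

# Default concat keeps each frame's own labels, so 0 and 1 appear twice
dup = pd.concat([data_train, data_test])
dup_labels = list(dup.index)
has_duplicates = dup.index.has_duplicates

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
data = pd.concat([data_train, data_test], ignore_index=True)
new_labels = list(data.index)

# Each label now refers to exactly one row
data.loc[3, "Col 1"] = 99
```

If the original labels carry no meaning, data.reset_index(drop=True) after a plain concat gives the same result.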

I would love to discuss your thoughts on this.

Cheers,
Aarshay