LabelEncoder() - How to reverse it?


#1

Hi all,

I am currently working through the practice problem in the Hackathon, Experiments with Data, and I am a little confused about the last part.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# encode every categorical column in train
for var in categorical_variables:
    train[var] = le.fit_transform(train[var])
# encode the categorical columns in test (the last entry of categorical_variables is skipped)
for var in categorical_variables[:len(categorical_variables)-1]:
    test[var] = le.fit_transform(test[var])

Here we have converted the categorical variables to numeric codes, which is fine.
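For reference, here is a minimal, self-contained sketch (with made-up values) of what the encoding step above does:

from sklearn.preprocessing import LabelEncoder

# toy column, values made up purely for illustration
values = ['<=50K', '>50K', '<=50K', '>50K']

le_demo = LabelEncoder()
codes = le_demo.fit_transform(values)

print(codes)             # [0 1 0 1]
print(le_demo.classes_)  # ['<=50K' '>50K'] -- the index in this array is the numeric code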

Now, in the test DataFrame, I am creating a new column 'Income.Group' and assigning the predicted values to that column.

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100, max_features='sqrt')
model.fit(train[independent_variable], train[dependent_variable])
predictions_train = model.predict(train[independent_variable])
predictions_test = model.predict(test[independent_variable])
test['Income.Group'] = predictions_test
test.to_csv('D:/AnalyticsVidya/Workshop/ttt.csv')

Now when I open the output CSV file, the values are in numeric format (which is expected, since we converted the DataFrame with LabelEncoder).

But if I want to convert the numeric codes back into the original categories, how do I do that? Basically, how do I reverse the transformation done by LabelEncoder()?


#2

Hi @rajiv2806,

To reverse the encoding done by LabelEncoder, it provides a method specifically for this task called inverse_transform.

The code would look as follows:

le = LabelEncoder()
for var in categorical_variables:
    train[var] = le.fit_transform(train[var])

###
### after model building and testing step
###

predictions_test = le.inverse_transform(predictions_test)
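Note that the loop above re-fits the same le object for every column, so after the loop le only remembers the mapping of the last column it was fit on. If you want to be able to reverse each column independently, one option is to keep a separate encoder per column, for example in a dict. A rough sketch, reusing the train / categorical_variables names from your code and assuming 'Income.Group' is the name of the target column in train:

from sklearn.preprocessing import LabelEncoder

encoders = {}
for var in categorical_variables:
    encoders[var] = LabelEncoder()
    train[var] = encoders[var].fit_transform(train[var])

###
### after model building and testing step
###

# reverse the predictions with the encoder that was fit on the target column
predictions_test = encoders['Income.Group'].inverse_transform(predictions_test)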

#3

Thank you!

Now I am not clear about the output of the DecisionTreeClassifier.

The code below is for the train dataset:

model = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100, max_features='sqrt')
model.fit(train[independent_variable], train[dependent_variable])
predictions_train = model.predict(train[independent_variable])
print(le.inverse_transform(predictions_train)[:100])

The output is as expected, like:

['<=50K' '>50K' '<=50K' '<=50K' '>50K' '>50K' '<=50K' '<=50K' '<=50K' '>50K']

But when I do the same for the test dataset:

predictions_test = model.predict(test[independent_variable])
print(le.inverse_transform(predictions_test)[:10])

The output looks like this:

['Others' 'Others' 'Others' 'Others' 'Others' 'Others' 'Others' 'Others' 'Others' 'Others']

I suppose I am making some mistake here. Can you look into it and suggest edits?
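For reference, printing le.classes_ shows which labels the encoder is currently using (a quick check I could add):

print(le.classes_)  # the labels from the column le was last fit on; inverse_transform maps codes back into this array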