Abnormal score (1.0) using random forest and SVM



I am new to analytics and have been working on the Loan Prediction III problem. I built models with random forest and SVM using the features below, but the score I am getting is 1.0. Please help me understand what I am missing and where.

Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, TotalIncome, EMI, ratio

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(x_train, y_train)
y_pred = random_forest.predict(x_test)
random_forest.score(x_train, y_train)



I think one of two issues could be causing the problem:

  • Training on the target variable. This may be the case if you are effectively mapping the output to itself. To check, make sure the columns you train on exclude the target variable.
  • Validating the model on the training set itself. Note that random_forest.score(x_train, y_train) measures accuracy on the same rows the model was fitted on, so a score of 1.0 there is not surprising. I would suggest creating a validation set that is completely separate from the training set. Refer here
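
Both checks can be sketched roughly as follows. This is only an illustration on synthetic data, not the actual loan dataset: the column values, the 20% label noise, and the 75/25 split are assumptions made up for the example. It confirms the target is not among the training columns and compares the score on the training rows with the score on held-out rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "ApplicantIncome": rng.normal(5000, 1500, n),
    "LoanAmount": rng.normal(150, 40, n),
    "Credit_History": rng.integers(0, 2, n),
})
# Noisy target: follows Credit_History, with about 20% of the labels flipped.
flip = rng.random(n) < 0.2
df["Loan_Status"] = np.where(flip, 1 - df["Credit_History"], df["Credit_History"])

# Check 1: the target must not be among the training columns.
target = "Loan_Status"
X = df.drop(columns=[target])
y = df[target]
assert target not in X.columns

# Check 2: hold out rows that the model never sees during fitting.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

train_score = model.score(X_tr, y_tr)   # accuracy on rows used for fitting
val_score = model.score(X_val, y_val)   # accuracy on held-out rows
print("train accuracy:", train_score)
print("validation accuracy:", val_score)
```

The validation accuracy is the number to judge the model by. A random forest with fully grown trees will usually score at or near 1.0 on its own training rows regardless of how well it generalizes.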

If the above things do not help, could you post the code here so that others can help you?



df = pd.read_csv('/resources/data/Loan prediction/train.csv')
df_test = pd.read_csv('/resources/data/Loan prediction/test.csv')

x_train = df.drop(['Loan_Status', 'Loan_ID', 'LoanAmount_log', 'TotalIncome_log'], axis=1)

As you can see, I am training on the train dataframe (df) and testing on the test dataframe (df_test), which are separate from each other. However, when I apply inverse_transform(ypred), I get these results:

array(['Semiurban', 'Semiurban', 'Semiurban', 'Semiurban', 'Semiurban',
'Semiurban', 'Rural', 'Semiurban', 'Semiurban', 'Semiurban',
'Rural', 'Rural', 'Semiurban', 'Semiurban', 'Semiurban',…)
I was expecting Loan_Status values here, not Property_Area values.
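
One possible cause, though the encoding code is not shown in this thread so this is only a guess, is that a single LabelEncoder object was fitted on several columns in turn. The encoder then only remembers the classes of the last column it was fitted on (here apparently Property_Area), and inverse_transform decodes with those. A minimal sketch of keeping one encoder per column, using a tiny made-up dataframe and a hypothetical `encoders` dictionary:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny made-up frame; only the column names mirror the loan data.
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Rural"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Fit a separate encoder for each column and keep it for decoding later.
encoders = {}
for col in ["Loan_Status", "Property_Area"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

# Stand-in for the output of model.predict(...): encoded Loan_Status values.
y_pred = df["Loan_Status"].to_numpy()

# Decode with the encoder that was fitted on the target column.
decoded = encoders["Loan_Status"].inverse_transform(y_pred)
print(decoded)
```

Decoding with encoders["Property_Area"] instead would return area names, which matches the symptom described above.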