Credit Rating Prediction Model


#1

Aim: To predict companies' Credit Rating

Training Data: Internal data with financial numbers and financial ratios spanning 3 years
Training Data Target Variable: Credit Rating with 20 discrete values
Training Data Remarks: Contains missing data

Scoring Data: External data from various data sources with financial numbers and financial ratios spanning 3 years
Scoring Data Remarks: Contains more missing data than the Training Data, depending on the data source

Which method should I use to predict credit rating? Logistic regression comes to mind first. However, there are missing values in both the training and scoring data, so a lot of imputation would be needed and the model may not be accurate. I would accept predicting Credit Rating in 3 groups:

Group 1: A to F
Group 2: G to L
Group 3: M to V

rather than predicting the 20 discrete values from A to V.
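As a minimal sketch of the 3-group idea, assuming each rating is stored as a single uppercase letter from A to V, the mapping could be written in Python as below (the function and constant names are hypothetical):

```python
import string

# Assumption: ratings are single letters ordered from A (first) to V (last)
RATINGS = string.ascii_uppercase[: string.ascii_uppercase.index("V") + 1]

# Group boundaries from the post: A to F, G to L, M to V
GROUP_BOUNDS = [("A", "F", 1), ("G", "L", 2), ("M", "V", 3)]


def rating_to_group(rating: str) -> int:
    """Map a letter rating to one of the three coarse rating groups."""
    rating = rating.strip().upper()
    if rating not in RATINGS:
        raise ValueError(f"unknown rating: {rating!r}")
    for low, high, group in GROUP_BOUNDS:
        # Single-letter strings compare alphabetically
        if low <= rating <= high:
            return group
    raise ValueError(f"rating {rating!r} is not covered by any group")
```

With pandas, something like `df["rating"].map(rating_to_group)` would add the group label, where `df` and the `rating` column are placeholders for your own data.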

I think achieving good accuracy with a 20-value model will be challenging.

Can anyone advise me?


#2

Hi @johnalytics,

I would be glad to help. First, I would advise you to understand the data and its sources. For example, some missing values can be inferred directly as 0. Also check whether mean imputation works. Additionally, I recommend exploring gradient boosting machine (GBM) tree models, as they handle classification tasks very well out of the box.
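A minimal pandas sketch of the two imputation ideas above, assuming you already know which columns can treat a gap as 0. The column names `short_term_debt` and `current_ratio` and the helper `impute` are hypothetical:

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame, zero_cols: list[str]) -> pd.DataFrame:
    """Fill gaps with 0 where a missing value means 'none', else with the column mean."""
    out = df.copy()
    # Columns where domain knowledge says a missing entry really means zero
    out[zero_cols] = out[zero_cols].fillna(0)
    # Remaining numeric columns fall back to mean imputation
    other = [c for c in out.select_dtypes(include="number").columns if c not in zero_cols]
    out[other] = out[other].fillna(out[other].mean())
    return out


# Hypothetical example rows
raw = pd.DataFrame({
    "short_term_debt": [100.0, np.nan, 50.0],
    "current_ratio": [1.2, np.nan, 1.8],
})
clean = impute(raw, zero_cols=["short_term_debt"])
print(clean)
```

This sketch takes the means from whatever frame it is given. For the scoring data you would normally reuse the means learned from the training data (for example with scikit-learn's `SimpleImputer` fitted on the training set) rather than recomputing them for each external source.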

 I can accept predicting Credit Rating into 3 groups:

Group 1: A to F
Group 2: G to L
Group 3: M to V

I like the idea of splitting the ratings into groups; it is a good way to approach this problem. However, I would caution you to check the counts of each group as well. Suppose A to F accounts for 90% of cases, G to L for 8%, and M to V for only 2%: the model would then be biased towards Group 1, because it can score high accuracy by mostly predicting the majority group.
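A quick way to run the count check, sketched with pandas and scikit-learn. The 90/8/2 split below is the hypothetical example from this reply, not real data, and balanced sample weights are just one common way to counter the imbalance:

```python
import pandas as pd
from sklearn.utils.class_weight import compute_sample_weight


def group_shares(groups: pd.Series) -> pd.Series:
    """Fraction of rows in each rating group, ordered by group label."""
    return groups.value_counts(normalize=True).sort_index()


# Hypothetical labels matching the 90% / 8% / 2% example
y = pd.Series([1] * 90 + [2] * 8 + [3] * 2, name="rating_group")
shares = group_shares(y)
print(shares)

# "balanced" weights each row by n_samples / (n_classes * rows_in_its_class),
# so every group contributes the same total weight during training
weights = compute_sample_weight(class_weight="balanced", y=y)
```

The resulting `weights` can be passed as `sample_weight` to the `fit` method of estimators that accept it, such as scikit-learn's `GradientBoostingClassifier`.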
Hope this answered your query.


#3

Hi Shaz13,

Thanks for the quick reply.

I will do a count check of the groups you mentioned and will leave the missing values as they are. Can GBM handle missing values well?


#4

Hi @johnalytics,

XGBoost can handle missing values natively, but you'll have to impute missing values when using GBM.
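For the GBM case, one way to keep imputation and model together is a scikit-learn pipeline. This is a sketch assuming scikit-learn's `GradientBoostingClassifier` is the GBM in question; the random data stands in for the financial ratios, `build_gbm_pipeline` is a hypothetical helper, and `add_indicator=True` appends a flag column for each feature that had missing values during fitting, so the model can still learn from the fact that a value was missing:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


def build_gbm_pipeline(random_state: int = 0):
    """Mean-impute missing values (plus missing-value flags), then fit a GBM."""
    return make_pipeline(
        SimpleImputer(strategy="mean", add_indicator=True),
        GradientBoostingClassifier(random_state=random_state),
    )


# Synthetic stand-in data: 4 numeric features, 3 rating groups (1, 2, 3)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = np.digitize(X_train[:, 0], [-0.5, 0.5]) + 1
X_train[rng.random(X_train.shape) < 0.10] = np.nan  # roughly 10% missing

# Scoring data with a higher missing rate, as described in the question
X_score = rng.normal(size=(50, 4))
X_score[rng.random(X_score.shape) < 0.25] = np.nan  # roughly 25% missing

model = build_gbm_pipeline().fit(X_train, y_train)
predictions = model.predict(X_score)
```

The imputer learns its means from the training data only, so the same values are applied to every scoring source. XGBoost, by contrast, accepts `np.nan` directly and learns at each split which branch missing values should follow.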