Aim: To predict companies' Credit Ratings
Training Data: Internal data with financial numbers and financial ratios spanning 3 years
Training Data Target Variable: Credit Rating with 20 discrete values
Training Data Remarks: Contains missing data
Scoring Data: External data from various data sources with financial numbers and financial ratios spanning 3 years
Scoring Data Remarks: Contains more missing data than the Training Data; how much more depends on the data source
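Before choosing an imputation strategy, one way to make "depending on data source" concrete is to tabulate the share of missing fields per source. This is a minimal standard-library sketch, assuming each company-year row is a dict with a `source` key and `None` for a missing value; the field names, source names, and rows are hypothetical placeholders, not the actual data.

```python
from collections import defaultdict


def missing_rate_by_source(rows, fields):
    """Return {source: fraction of the listed fields that are missing}.

    A field counts as missing if it is None or absent from the row.
    """
    missing = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        src = row["source"]
        for field in fields:
            total[src] += 1
            if row.get(field) is None:
                missing[src] += 1
    return {src: missing[src] / total[src] for src in total}


# Hypothetical sample rows; replace with the real training and scoring data.
FIELDS = ["revenue", "debt_ratio", "current_ratio"]
rows = [
    {"source": "internal", "revenue": 120.0, "debt_ratio": 0.40, "current_ratio": 1.5},
    {"source": "internal", "revenue": 80.0, "debt_ratio": None, "current_ratio": 1.1},
    {"source": "vendor_a", "revenue": 95.0, "debt_ratio": None, "current_ratio": None},
    {"source": "vendor_b", "revenue": None, "debt_ratio": 0.55, "current_ratio": 0.9},
]

for src, rate in sorted(missing_rate_by_source(rows, FIELDS).items()):
    print(f"{src}: {rate:.0%} missing")
```

Breaking the same rates down per field (rather than pooling all fields) would show whether a source is missing a few specific ratios or is sparse across the board.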
Which method should I use to predict Credit Rating? Logistic regression comes to mind first. However, there are missing values in both the training and scoring data, so a lot of imputation would be needed and the model may not be accurate. I would accept predicting Credit Rating as one of 3 groups:
Group 1: A to F
Group 2: G to L
Group 3: M to V
rather than predicting one of the 20 discrete values from A to V.
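The 3-group target above can be derived from the letter ratings with a simple lookup. A minimal sketch, assuming each rating is stored as a single letter from A to V (if the real scale uses modifiers or skips letters, the boundaries would need adjusting); `rating_group` is a hypothetical helper name.

```python
def rating_group(rating: str) -> int:
    """Map a single-letter Credit Rating (A to V) to group 1, 2 or 3."""
    r = rating.strip().upper()
    # Reject anything that is not exactly one letter in the A..V range.
    if len(r) != 1 or not ("A" <= r <= "V"):
        raise ValueError(f"unexpected rating: {rating!r}")
    if r <= "F":
        return 1  # Group 1: A to F
    if r <= "L":
        return 2  # Group 2: G to L
    return 3      # Group 3: M to V


ratings = ["A", "F", "G", "L", "M", "V"]
print([rating_group(r) for r in ratings])  # [1, 1, 2, 2, 3, 3]
```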
I expect that achieving good accuracy with a 20-class model will be challenging.
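One way to test that expectation is to cross-validate the same imputation plus logistic regression pipeline on both the 20-class target and the 3-group target. The sketch below uses scikit-learn on synthetic data; the data, the roughly 20% missing-at-random rate, and the cut points between groups are all made-up assumptions. Imputation here is median imputation with missing-value indicator columns (`SimpleImputer(add_indicator=True)`), which is one option among several.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: 600 companies, 8 financial ratios.
n_samples, n_features = 600, 8
X = rng.normal(size=(n_samples, n_features))

# Illustrative latent creditworthiness: a linear mix of the ratios plus noise.
latent = X @ rng.normal(size=n_features) + rng.normal(size=n_samples)

# 20 ordered rating classes (0..19) cut at quantiles of the latent score.
edges = np.quantile(latent, np.linspace(0, 1, 21)[1:-1])
fine = np.digitize(latent, edges)

# 3 coarse groups (0, 1, 2); these cut points are hypothetical placeholders.
coarse = np.digitize(fine, [6, 12])

# Blank out about 20% of entries completely at random to mimic missing data.
X[rng.random(X.shape) < 0.2] = np.nan


def make_model():
    # Imputation is fitted inside each CV fold, so no information leaks
    # from the held-out fold into the imputed values.
    return make_pipeline(
        SimpleImputer(strategy="median", add_indicator=True),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )


acc_fine = cross_val_score(make_model(), X, fine, cv=5).mean()
acc_coarse = cross_val_score(make_model(), X, coarse, cv=5).mean()
print(f"20-class accuracy: {acc_fine:.2f}")
print(f"3-group accuracy:  {acc_coarse:.2f}")
```

The two accuracies are not directly comparable: with balanced classes, guessing gives about 1/20 on the fine target and about 1/3 on the coarse one, so part of any gap comes simply from having fewer classes.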
Can anyone advise me?