Is concordance the best way to predict logistic regression model reliability?

concordance
logistic
statistics

#1

Hi,

I have built a logistic regression model, and I understand that the strength of a logistic model depends heavily on the share of concordant pairs. Is it possible for a model with higher concordance to have less predictive power than a model with lower concordance?

Thx,
Imran


#2

Imran,

Concordance is just one of the measures of goodness of fit for a logistic regression. On its own it cannot tell you much about how good the model is. You need to use it along with other measures.

Let us first understand what concordance is and how it is calculated.
Say you are predicting whether a customer will default on a loan. The outcome is the probability of default. You build a logistic regression to predict it, and let us say these are the results for a hypothetical case:

Customer_id   Default   Probability
A             1         0.76
B             0         0.23
C             1         0.33
D             0         0.63

Here, Default is the actual outcome observed in the past and Probability is the prediction from your logistic regression. Next, you split the population by default (0 or 1), forming 2 groups:

Group 1 (default = 1): A & C
Group 2 (default = 0): B & D

Next, you create pairs by picking one data point from Group 1 and one data point from Group 2, and compare the predicted probabilities within each pair. Here are the pairs you will create in this case:

Pair 1: A & B
Pair 2: A & D
Pair 3: C & B
Pair 4: C & D

Each pair falls into one of three categories:

  1. The predicted probability for the customer who defaulted is higher than for the one who did not. This means the model is correctly assigning a higher probability to the higher-risk customer. This is a concordant pair. In this example, the first 3 pairs are concordant.
  2. The predicted probability for the customer who defaulted is lower than for the one who did not. This means the model is wrongly assigning a higher probability to the lower-risk customer. This is a discordant pair. The 4th pair in this example is discordant.
  3. The third possibility is that both data points have equal probability. This is a tied pair.
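
If it helps to see the pairing as code, here is a minimal Python sketch of the counting described above. The `records` list just copies the hypothetical table from this post, and `count_pairs` is a name I made up for illustration, not a function from any library.

```python
from itertools import product

# Hypothetical data copied from the table above:
# (customer_id, actual default, predicted probability)
records = [("A", 1, 0.76), ("B", 0, 0.23), ("C", 1, 0.33), ("D", 0, 0.63)]

def count_pairs(records):
    """Count concordant, discordant and tied pairs (illustrative helper)."""
    # Split the predicted probabilities by actual outcome.
    defaulted = [p for _, y, p in records if y == 1]
    not_defaulted = [p for _, y, p in records if y == 0]

    concordant = discordant = tied = 0
    # Every pair takes one point from each group.
    for p_default, p_no_default in product(defaulted, not_defaulted):
        if p_default > p_no_default:
            concordant += 1   # defaulter scored higher: model ranks the pair correctly
        elif p_default < p_no_default:
            discordant += 1   # defaulter scored lower: model ranks the pair wrongly
        else:
            tied += 1         # equal scores: model cannot separate the pair
    return concordant, discordant, tied

concordant, discordant, tied = count_pairs(records)
print(concordant, discordant, tied)  # 3 1 0
```

Note that the number of pairs grows as (number of defaulters) x (number of non-defaulters), so on a large dataset you would likely want a rank-based method instead of this brute-force loop.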

Now, there are various ways to measure quality of model:

  1. Somers' D: This is (% concordant pairs - % discordant pairs). A higher Somers' D indicates a better model.
  2. Gamma, tau-a and c: These are other measures, in addition to Somers' D, that use various combinations of these pairs.
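
As a rough sketch, these four measures can be computed from the pair counts as below. I am assuming the commonly used definitions here (your software may define them slightly differently, so check its documentation), and `rank_measures` is a made-up helper name. The inputs are the counts from the 4-customer example: 3 concordant, 1 discordant, 0 tied, 4 observations.

```python
def rank_measures(n_concordant, n_discordant, n_tied, n_obs):
    """Rank-based measures from pair counts (assumed common definitions)."""
    n_pairs = n_concordant + n_discordant + n_tied
    diff = n_concordant - n_discordant
    return {
        # Share of concordant pairs minus share of discordant pairs.
        "somers_d": diff / n_pairs,
        # Like Somers' D, but tied pairs are left out of the denominator.
        "gamma": diff / (n_concordant + n_discordant),
        # Denominator is all possible pairs of observations, N(N-1)/2.
        "tau_a": diff / (0.5 * n_obs * (n_obs - 1)),
        # Concordant share, with each tied pair counted as half.
        "c": (n_concordant + 0.5 * n_tied) / n_pairs,
    }

# Counts from the hypothetical 4-customer example in this post.
measures = rank_measures(n_concordant=3, n_discordant=1, n_tied=0, n_obs=4)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

For the example this gives Somers' D = 0.5, Gamma = 0.5, tau-a of about 0.333 and c = 0.75. With no tied pairs, Somers' D and Gamma are equal. They only diverge once ties appear.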

So, it is quite possible to have high concordance together with high discordance in one model, and lower concordance and lower discordance but a significant share of tied pairs in another. You would end up making a half-baked decision if you only looked at concordance.
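
To make that concrete, here is a small sketch with two made-up models. The percentages are purely hypothetical numbers I picked for illustration, and Somers' D and c are computed with the commonly used definitions (concordant share minus discordant share, and concordant share plus half the tied share).

```python
# Hypothetical pair percentages for two models (made-up numbers).
models = {
    "model_1": {"concordant": 70, "discordant": 30, "tied": 0},
    "model_2": {"concordant": 65, "discordant": 5, "tied": 30},
}

results = {}
for name, pct in models.items():
    total = pct["concordant"] + pct["discordant"] + pct["tied"]
    # Somers' D: concordant share minus discordant share.
    somers_d = (pct["concordant"] - pct["discordant"]) / total
    # c: concordant share, with each tied pair counted as half.
    c = (pct["concordant"] + 0.5 * pct["tied"]) / total
    results[name] = (somers_d, c)
    print(f"{name}: concordant={pct['concordant']}%  "
          f"Somers' D={somers_d:.2f}  c={c:.2f}")
```

Here model_1 has the higher concordance (70% vs 65%), yet model_2 comes out ahead on both Somers' D (0.60 vs 0.40) and c (0.80 vs 0.70), because it has far fewer discordant pairs.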

You can also look at some other metrics used for classification problems here: