I am currently studying ways to fill in missing values in a variable, and I learned that the right approach depends on the type of the variable. For a nominal variable we can use the mode, and for a quantitative variable we can use the mean or median. While studying this, I came across a question: can the median be used to fill in missing values in an ordinal variable?
Let's take an example to try to answer this:
Suppose you have a dataframe of 100 users who have rated a movie as:
Very Bad (1), Bad (2), OK (3), Good (4), Very Good (5).
If we replace missing values with the median here, and the median happens to be 3, we are simply assigning the OK rating to all such records, which may not be a good idea. Also, the median is normally used for numeric data, and ordinal data is not truly numeric: the codes 1 to 5 are only labels for ordered categories. So I don’t think we should use the median here.
Also, say there are only four options (1, 2, 3, 4): the median can then fall between two categories (for example 2.5), which is not a valid rating, so we cannot use it here.
So for ordinal variables we can use the mode, that is, the rating that most people have given.
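As a rough sketch of the point above, here is mode imputation in plain Python using only the standard library. The ratings list is made up for illustration, and `None` is assumed to mark a missing value. The last lines show why the ordinary median can be awkward on a four-level scale, and that `statistics.median_low` at least returns a value that exists in the data.

```python
from statistics import median, median_low, mode

# Hypothetical ratings on the 1-5 scale; None marks a missing value.
ratings = [5, 4, 4, None, 2, 4, 1, None, 3, 4]

# Work only with the ratings that were actually observed.
observed = [r for r in ratings if r is not None]

# The mode is the most frequent observed rating.
fill_value = mode(observed)

# Replace each missing entry with the mode.
imputed = [fill_value if r is None else r for r in ratings]
print(imputed)

# With an even number of values the ordinary median can fall
# between two categories, which is not a valid rating.
four_level_sample = [1, 2, 3, 4]
print(median(four_level_sample))      # midpoint of 2 and 3
print(median_low(four_level_sample))  # lower of the two middle values
```

If the data lives in a pandas dataframe, the same idea applies column by column; the list version is used here only to keep the sketch dependency-free.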
Hope this helps!!
The median is generally used for continuous variables. For discrete data, a better approach is to use the mode.
Hope this helps!!
You have an interesting question, and the earlier answers address it. The only point I would add concerns using a single statistic to fill in missing values. If the number of missing values is small relative to your sample, it can work, in the sense that the distribution of your variable will not change much, but that is not always the case.
In many cases you will create a peak around the statistic you have chosen. For ordinal or categorical variables there are a few methods to avoid this; if you have the time and interest, you can look at using a multinomial method to do the imputation.
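To make the peak concrete, here is a small sketch with made-up data (70 observed ratings, 30 missing). It compares filling every gap with the mode against drawing each missing value at random from the observed category frequencies. The random draw is only a stand-in I chose for illustration: it can be seen as a multinomial draw with no predictors, and it is not a full multinomial model, which would also use the other columns.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 70 observed ratings plus 30 missing ones (None).
observed = [1] * 10 + [2] * 15 + [3] * 20 + [4] * 15 + [5] * 10
ratings = observed + [None] * 30

counts = Counter(observed)
levels = sorted(counts)
weights = [counts[level] for level in levels]

# Single-value imputation: every gap receives the mode.
mode_value = counts.most_common(1)[0][0]
mode_filled = [mode_value if r is None else r for r in ratings]

# Random-draw imputation: each gap receives a rating sampled in
# proportion to how often that rating was observed.
n_missing = ratings.count(None)
draws = iter(random.choices(levels, weights=weights, k=n_missing))
random_filled = [next(draws) if r is None else r for r in ratings]

share_before = counts[mode_value] / len(observed)
share_mode = Counter(mode_filled)[mode_value] / len(mode_filled)
share_random = Counter(random_filled)[mode_value] / len(random_filled)

print(f"share of rating {mode_value} before imputation: {share_before:.2f}")
print(f"share after mode imputation:   {share_mode:.2f}")
print(f"share after random imputation: {share_random:.2f}")
```

In this toy data the mode's share jumps from 20 of 70 observed values to 50 of 100 after mode imputation, which is the peak described above. The random draw will usually stay closer to the original share, although the exact figure depends on the seed.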
Hope this helps you understand the limits of using the mode or mean for imputation.
Have a good day