What do you mean by mis-matched factor levels?


#1

Hi, I have some doubts about the blog post that you wrote on R programming. Before I jump on to the doubts directly, I would like to thank you for creating such a useful and comprehensive article on R for people who want to learn R.
Coming to my doubts, they are as follows:

  1. What is the difference between a categorical and a continuous variable?
  2. What is the difference between a categorical and a factor variable?
  3. Can you please elaborate more on the basics of these classifications of variables?
  4. At one place, you have mentioned:
    > summary(train)

Here are some quick inferences drawn from variables in train data set:

Item_Fat_Content has mis-matched factor levels.

What did you mean by “Item_Fat_Content has mis-matched factor levels”? Please explain.

Thanks,
Anindo


#2

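Imagine categorical variables as playing the piano and continuous variables as playing the violin: a piano has a fixed set of distinct keys, while a violin can produce any pitch in between.
Categorical variables have distinct levels and may or may not have a natural ordering (see nominal and ordinal variables for more info).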

In R, a factor is the data type used to store a categorical variable, so in this context the two terms mean the same thing. Example: consider the variable ‘heat_level’. It contains the following distinct values: High, Medium, Low. So there are 3 levels in the variable, and they have a natural order where High > Medium > Low.
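To make the heat_level example concrete, here is a minimal sketch in R. The values in the vector are made up purely for illustration; only the three level names come from the example above.

```r
# Hypothetical data for illustration only
heat_level <- c("High", "Low", "Medium", "Low", "High")

# Unordered (nominal) factor: R picks the level order itself
heat_nominal <- factor(heat_level)
nlevels(heat_nominal)                 # 3

# Ordered (ordinal) factor: state the order explicitly, lowest first
heat_ordinal <- factor(heat_level,
                       levels = c("Low", "Medium", "High"),
                       ordered = TRUE)
is.ordered(heat_ordinal)              # TRUE
heat_ordinal[1] > heat_ordinal[2]     # TRUE, because High > Low
```

Comparisons such as `>` only make sense on the ordered version; on an unordered factor R returns NA with a warning.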

As far as mis-matched factor levels go, I'm still trying to figure it out. That’s what got me here.


#3

unique(train$Item_Fat_Content)

[1] Low Fat Regular low fat LF reg
Levels: LF low fat Low Fat reg Regular

I figure this is what it means: even though 5 levels exist in the variable, there are actually only 2 distinct categories (Low Fat, Regular). The other levels (LF, low fat, reg) are just inconsistent spellings of those two.
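If that reading is right, one possible way to clean it up is to merge the five levels into two. This is only a sketch: the `fat` vector below is a made-up stand-in for train$Item_Fat_Content, and the mapping of LF and low fat to Low Fat, and reg to Regular, is my assumption about what the spellings mean.

```r
# Hypothetical stand-in for train$Item_Fat_Content (not the real data)
fat <- factor(c("Low Fat", "Regular", "low fat", "LF", "reg", "Regular"))
nlevels(fat)   # 5

# Assigning a named list to levels() merges old levels:
# each name is a new level, each value lists the old levels it absorbs
levels(fat) <- list("Low Fat" = c("LF", "low fat", "Low Fat"),
                    "Regular" = c("reg", "Regular"))

levels(fat)    # "Low Fat" "Regular"
table(fat)     # 3 observations in each level
```

On the real data the same assignment would be applied to train$Item_Fat_Content directly, after checking with table() that no other spellings are present.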