I’m dealing with tabular datasets where it’s hard to tell whether an integer column is numeric or categorical. My main consideration is the accuracy of the model I am building (no deep learning). So I’m wondering whether I can treat the integer column as both numeric (use it as-is) and categorical (one-hot encode it, or use a decision tree with set-based splits), i.e. feed both representations of the column to the model at the same time and let it figure out which features are suitable.
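Concretely, I mean something like the following pandas sketch (the column name `rooms` and the toy data are just for illustration): keep the raw integer column and also append a one-hot copy of it.

```python
import pandas as pd

# Toy frame with an ambiguous integer column: "rooms" could be a
# count (numeric) or a small set of discrete levels (categorical).
df = pd.DataFrame({"rooms": [1, 2, 3, 2, 1],
                   "target": [0, 1, 1, 0, 0]})

# Keep the raw integer column AND add a one-hot-encoded copy of it,
# so the model sees both representations side by side.
dummies = pd.get_dummies(df["rooms"], prefix="rooms_cat")
X = pd.concat([df[["rooms"]], dummies], axis=1)

print(X.columns.tolist())
# ['rooms', 'rooms_cat_1', 'rooms_cat_2', 'rooms_cat_3']
```

The resulting `X` would then be passed to the classifier alongside `target`.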
My question is: are there scenarios where this multiple-representation approach makes sense, and scenarios where it does not? If so, how does that relate to the model being trained and to the bias-variance tradeoff, for instance logistic regression (high bias) versus a random forest (high variance)? Are there established theories or best practices that show the advantages/disadvantages of doing this? I’m asking in the context of classification problems.