Dealing with special values in a prediction problem


#1

Suppose a ratio variable is present (say, the payment-balance ratio, PB, in a typical credit risk scorecard problem). This variable takes numeric values, but there will be some cases where the balance equals 0, so the ratio is undefined. For such cases the model developer must assign some special value (say -12345678) to the variable. The PB variable therefore takes values such as 1.25, 89.95 and 105.56 along with -12345678. For a decision tree (CHAID) based model, the developer can make a separate node for this special value, but I wonder how we can treat it in other algorithms (logistic regression or any machine learning technique).

I have used the weight of evidence technique (with a binning approach), which basically turns the raw values into a categorical variable (by grouping values with similar outcome rates). But what if I want to use the raw values instead? How should dummy variables be used in this case? Does anybody have a clear-cut idea about this problem? Thanks in advance.
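
For illustration, one common way to keep the raw values while still flagging the special case is to split PB into two columns: an indicator (dummy) for the special value, and a cleaned numeric column where the special value is replaced by a constant. The sketch below is only an assumption about how this could look, not a method taken from the thread; the names encode_pb, pb_clean and pb_is_special and the fill value 0.0 are hypothetical. In a linear predictor b0 + b1*pb_clean + b2*pb_is_special, the special cases get their own offset through b2, so the choice of fill value only shifts b2.

```python
# Sentinel from the post: assigned when balance is 0 and the ratio is undefined.
SPECIAL = -12345678


def encode_pb(pb_values, special=SPECIAL, fill=0.0):
    """Split raw PB values into a cleaned numeric column and a dummy column.

    pb_is_special is 1 where the sentinel appears, 0 otherwise.
    pb_clean keeps the raw value, with the sentinel replaced by `fill`
    (hypothetical choice; any constant works once the dummy is in the model).
    """
    pb_is_special = [1 if v == special else 0 for v in pb_values]
    pb_clean = [fill if v == special else float(v) for v in pb_values]
    return pb_clean, pb_is_special


raw = [1.25, 89.95, SPECIAL, 105.56]
pb_clean, pb_is_special = encode_pb(raw)

# Each row: (raw value, cleaned value, special-value dummy)
for row in zip(raw, pb_clean, pb_is_special):
    print(row)
```

Both columns would then be passed to the logistic regression (or other learner) together; passing pb_clean alone would treat the special cases as if their ratio really were the fill value.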


#2

In R, won't simply declaring the variable as a factor take care of creating the dummies when fitting the logistic regression? R will treat the variable as a factor and each distinct value as a level.
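
To make concrete what "each value as a level" implies, here is a rough Python imitation of treatment (dummy) coding. It is not R itself, and the helper dummy_code is hypothetical. It assumes the usual convention that the first sorted level is the reference and every other distinct value gets its own 0/1 column.

```python
def dummy_code(values):
    """Imitate factor-style dummy coding: one 0/1 column per non-reference level."""
    levels = sorted(set(values))          # distinct values become the levels
    reference, others = levels[0], levels[1:]
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return reference, others, rows


raw = [1.25, 89.95, -12345678, 105.56, 1.25]
reference, columns, rows = dummy_code(raw)

print("reference level:", reference)
print("dummy columns:  ", columns)
for value, row in zip(raw, rows):
    print(value, row)
```

With this coding the special value does get its own level, but so does every other distinct PB value, so the number of dummy columns grows with the number of distinct ratios in the data.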