I have an XGBoost regression model that performs reasonably well, with decent cross-validation scores. The dataset has around 500,000 rows and 30 features (a mix of categorical, continuous and binary), and my test set is 250,000 rows. The target variable contains absolutely no negative values, yet predicting on the test set yields around 4,000 negative predictions, and around 7,000 on the training set. Why is this?
Should this happen?
Are there parameters that can be tuned to prevent this?
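For reference, here is a minimal sketch of the kind of check I am describing. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in for my actual XGBoost model (the boosting mechanism is analogous) and a synthetic non-negative target; all names, shapes and parameters are illustrative, not my real setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for my data: a strictly non-negative target
# with many values at or near zero (sizes are illustrative only).
X = rng.normal(size=(5000, 5))
y = np.maximum(0.0, X[:, 0] + 0.3 * rng.normal(size=5000))  # target >= 0 by construction

# Default loss is squared error, as in a typical XGBoost regression setup.
model = GradientBoostingRegressor(random_state=0)
model.fit(X, y)

# Count how many fitted predictions fall below zero even though
# the target itself never does.
preds = model.predict(X)
n_negative = int((preds < 0).sum())
print(f"min target: {y.min():.3f}, "
      f"min prediction: {preds.min():.3f}, "
      f"negative predictions: {n_negative}")
```

The exact count of negative predictions will vary with the data and seed; the point is only that nothing in a squared-error boosted ensemble constrains its output to the target's range.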