Logistic regression oversampling



In the case of a rare event, i.e. when the probability of the event is < 2%, we need to oversample the data,
which means keeping the rare events as they are and reducing the non-events, which does not affect the model.
I understand this theoretically, but how do I implement it in practice in SAS?



You just need to split the dataset into two parts (events and non-events), take a random sample from the non-events, and then append the two parts back together.

Once you have done so, you can go ahead and apply logistic regression.
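
The steps above could look roughly like this in SAS. This is only a sketch: the dataset name work.full, the 0/1 target variable event, the predictors x1 and x2, and the 10% sampling rate are all made-up placeholders, so swap in your own.

```sas
/* Hypothetical names: work.full (input data), event (1 = rare event, */
/* 0 = non-event), x1 and x2 (predictors). Replace with your own.     */

/* 1. Split the data into events and non-events */
data events nonevents;
    set work.full;
    if event = 1 then output events;
    else output nonevents;
run;

/* 2. Take a simple random sample of the non-events       */
/*    (10% is an arbitrary rate chosen for illustration)  */
proc surveyselect data=nonevents out=nonevents_sample
    method=srs samprate=0.10 seed=12345;
run;

/* 3. Append the sampled non-events to all of the events */
data model_data;
    set events nonevents_sample;
run;

/* 4. Fit the logistic regression on the combined data */
proc logistic data=model_data;
    model event(event='1') = x1 x2;
run;
```

Keep in mind that the event rate in model_data is higher than in the original data, so the intercept, and therefore the predicted probabilities, reflect the sampled event rate rather than the original one.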

This article provides more details: http://www.analyticsvidhya.com/blog/2014/01/logistic-regression-rare-event/

Hope this helps