How to handle an unbalanced dataset




I have a dataset in which the response variable (Loan Default) is highly skewed (2600 Yes, 260 No).
I have tried several methods, such as random forest, decision tree, and logistic regression, but the results are not encouraging. I read somewhere about oversampling, in which I build a dataset from the 260 No’s plus around 260 records selected from the part of the data having Default = Yes, then combine them and apply logistic regression or a decision tree. The results are marginally better (AUC = 0.56) than when I use the real dataset (AUC = 0.5), but is this the right way to go about this problem?
Can someone please help me deal with this problem?
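
For reference, the balanced subset described above (all 260 No rows plus about 260 randomly chosen Yes rows) is usually called random undersampling of the majority class. Here is a minimal sketch in plain Python. The function name undersample_majority, the "default" key, and the "income" feature are made up for illustration, and the rows are random toy data, not the real loan records.

```python
import random


def undersample_majority(rows, label_key="default", seed=0):
    """Return a class-balanced subset: every class is cut down to the
    size of the smallest class by random sampling without replacement."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    # The smallest class fixes how many rows are kept per class.
    n_keep = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_keep))
    rng.shuffle(balanced)
    return balanced


# Toy stand-in for the loan data (hypothetical feature values).
rng = random.Random(1)
rows = [{"income": rng.random(), "default": "Yes"} for _ in range(2600)]
rows += [{"income": rng.random(), "default": "No"} for _ in range(260)]

balanced = undersample_majority(rows)
print(len(balanced))
```

The model (logistic regression, decision tree, and so on) would then be fitted on balanced instead of rows.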



You can oversample using nearest neighbours and bootstrapping; see the Synthetic Minority Over-sampling Technique (SMOTE). A SMOTE function is available in an R package, if R is the language you use for data mining.
Be careful not to oversample too much. For example, if you wanted 2600 Yes and the same number of No, that would be far too many synthetic records: more than doubling the minority class could lead to issues. I do not know your data, but I would start with 520 No and 520 Yes and then adjust after the first trial.
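
To make the 520/520 idea concrete, here is a rough sketch of the interpolation step behind SMOTE, in plain Python. It is a simplified illustration, not the R SMOTE function mentioned above. The name smote_like, the choice of k = 5, and the two numeric features are assumptions, and the points are random toy data.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each placed at a random position on
    the segment between a minority point and one of its k nearest
    minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Brute-force neighbour search inside the minority class only.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # 0 keeps the base point, values near 1 approach the neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy stand-in: 260 "No" and 2600 "Yes" records with two numeric features.
rng = random.Random(42)
no_rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(260)]
yes_rows = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(2600)]

# Double the minority class (260 real + 260 synthetic = 520) and
# sample the majority class down to the same size.
no_balanced = no_rows + smote_like(no_rows, n_new=260)
yes_balanced = rng.sample(yes_rows, 520)
print(len(no_balanced), len(yes_balanced))
```

Changing n_new and the size of the Yes sample is the "adjust after the first trial" step. Categorical features would need different handling than the straight-line interpolation used here.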

Hope this helps.