Over/Undersampling for fraud analytics


Does anyone have any documents, videos, or other material on the steps a data analyst needs to perform to build a model that identifies potentially fraudulent transactions in a bank transaction file? Say the input dataset contains only 2% fraudulent transactions: how can we use oversampling, and what steps should we follow to build the model?


Hey @arnabitsme

For dealing with unbalanced classes, you can refer to the following resources:

Hope this helps.


Because fraudulent instances make up only a small percentage of the data, a model that predicts every instance as non-fraud will still achieve very high accuracy (98% if only 2% of the transactions are fraudulent). But such a model will not help in finding fraudulent cases.
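To make this concrete, here is a minimal sketch using made-up labels (1,000 hypothetical transactions with 2% fraud, matching the question). It only illustrates why accuracy is misleading here and why recall on the fraud class is worth checking; the numbers are not from any real dataset.

```python
# Hypothetical toy labels: 1,000 transactions, 2% fraud (1 = fraud, 0 = non-fraud).
labels = [1] * 20 + [0] * 980

# A "model" that always predicts non-fraud.
predictions = [0] * len(labels)

# Accuracy: share of all transactions labeled correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall on the fraud class: share of actual frauds that were caught.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_frauds = sum(labels)
recall = true_positives / actual_frauds

print(f"accuracy = {accuracy:.2f}")
print(f"recall on fraud = {recall:.2f}")
```

The accuracy looks excellent while not a single fraud is caught, so metrics such as recall and precision on the fraud class are more informative for this kind of problem.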

You could use a one-class classification method. It tries to identify examples of a specific class by learning from a training set that contains only majority-class examples. The model learns the normal pattern, that is, the non-fraud cases. When fraud cases then appear in the test set, they deviate from that pattern and produce the highest error, which is how they get flagged.
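As a rough illustration of the one-class idea, the sketch below fits a very simple "normal pattern" (per-feature mean and standard deviation) on non-fraud examples only, then flags anything that deviates too far from it. Everything here is an assumption for illustration: the two features (amount and hour), the synthetic data, and the 99% threshold. It is a simple distance-based stand-in, not a specific library algorithm; in practice you would more likely reach for something like scikit-learn's `OneClassSVM` or `IsolationForest`.

```python
import random
import statistics

random.seed(0)

def make_transactions(n, mean_amount, mean_hour):
    """Synthetic (amount, hour) pairs; purely hypothetical data."""
    return [(random.gauss(mean_amount, 10.0), random.gauss(mean_hour, 2.0))
            for _ in range(n)]

# Train on the majority (non-fraud) class only.
normal_train = make_transactions(500, 50.0, 14.0)
# Test set mixes unseen normal transactions with fraud-like ones.
normal_test = make_transactions(100, 50.0, 14.0)
fraud_test = make_transactions(20, 400.0, 3.0)

def fit(rows):
    """Learn the 'normal pattern': mean and standard deviation per feature."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds

def anomaly_score(row, means, stds):
    """Sum of squared z-scores: larger means further from the normal pattern."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds))

means, stds = fit(normal_train)

# Threshold chosen so that roughly 99% of the training data counts as normal.
train_scores = sorted(anomaly_score(r, means, stds) for r in normal_train)
threshold = train_scores[int(0.99 * len(train_scores))]

def is_fraud(row):
    return anomaly_score(row, means, stds) > threshold

flagged_fraud = sum(is_fraud(r) for r in fraud_test)
flagged_normal = sum(is_fraud(r) for r in normal_test)

print(f"fraud flagged:  {flagged_fraud} of {len(fraud_test)}")
print(f"normal flagged: {flagged_normal} of {len(normal_test)}")
```

The synthetic fraud here is deliberately very different from normal behavior, so it is easy to catch. Real fraud is usually much closer to normal transactions, and the threshold then becomes a trade-off between missed frauds and false alarms.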