Does anyone have any documents, videos, or knowledge about the steps a data analyst needs to perform to create a model that identifies potentially fraudulent transactions in a bank transaction file? Say the input dataset has only 2% fraud transactions: how can we use oversampling, and what steps should we follow to complete the model?
For dealing with unbalanced classes, you can refer to the following resources:
Hope this helps.
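As a rough illustration of the oversampling asked about in the question, here is a minimal sketch of random oversampling in plain Python. The function name random_oversample, the toy "amount" feature, and the 80/20 split are all assumptions made for the example, not part of any particular library. The one point it tries to show is the order of the steps: split into train and test first, then oversample only the training part, so duplicated fraud rows never leak into the test set.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size.

    Hypothetical helper for illustration; assumes a binary 0/1 label.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    n_majority = len(rows) - len(minority)
    # Nothing to do if there is no minority row to copy or classes are already balanced.
    if not minority or len(minority) >= n_majority:
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(n_majority - len(minority))]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)

# Toy data (made up): 1000 transactions, 2% fraud, one feature "amount".
rng = random.Random(42)
data = [([rng.gauss(50, 10)], 0) for _ in range(980)]
data += [([rng.gauss(90, 10)], 1) for _ in range(20)]
rng.shuffle(data)

# Step 1: split BEFORE oversampling so the test set keeps the real class ratio.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
X_train = [x for x, _ in train]
y_train = [y for _, y in train]

# Step 2: oversample the training set only.
X_bal, y_bal = random_oversample(X_train, y_train)

print("fraud in train before:", sum(y_train), "of", len(y_train))
print("fraud in train after: ", sum(y_bal), "of", len(y_bal))
```

After this, the usual steps would be to fit a classifier on X_bal and y_bal and to evaluate it on the untouched test split with metrics such as precision and recall rather than accuracy.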
Because only a small percentage of instances are fraudulent, a model that predicts every instance as non-fraud will still achieve very high accuracy (98% when 2% of transactions are fraud). But such a model will not help in finding fraudulent cases.
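To make that concrete, here is a small sketch using the 2% fraud rate from the question. The labels are made up for illustration. A "model" that always answers non-fraud gets high accuracy but catches none of the fraud cases, which is why recall on the fraud class is a more useful number here.

```python
# Made-up labels: 20 fraud (1) and 980 non-fraud (0) transactions, i.e. 2% fraud.
y_true = [1] * 20 + [0] * 980

# A trivial "model" that predicts non-fraud for every transaction.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the fraud class: fraction of actual fraud cases that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(accuracy)  # 0.98
print(recall)    # 0.0
```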
You could use a one-class classification method. It tries to identify examples of a specific class by learning from a training set that contains only majority-class examples. The model learns the normal pattern, that is, the non-fraud cases. When it is then given fraud cases in the test set, it produces its highest errors on them, which is how they are flagged.
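The idea can be sketched with a deliberately simple stand-in for a one-class model: learn the per-feature mean and standard deviation from non-fraud rows only, and score new rows by how far they sit from that profile. This is only an assumption-laden toy (the two features, the synthetic data, and the 99th-percentile threshold are invented for the example); in practice a library one-class model would normally be used instead.

```python
import random
import statistics

def fit_normal_profile(rows):
    """Learn per-feature mean and standard deviation from non-fraud rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 if a feature is constant, to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds

def anomaly_score(row, profile):
    """Sum of squared z-scores: larger means further from the normal pattern."""
    means, stds = profile
    return sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stds))

# Synthetic non-fraud training data: [amount, transactions_per_day] (made up).
rng = random.Random(0)
normal_train = [[rng.gauss(50, 10), rng.gauss(3, 1)] for _ in range(1000)]

profile = fit_normal_profile(normal_train)

# Choose the threshold so that about 1% of normal training rows would be flagged.
train_scores = sorted(anomaly_score(r, profile) for r in normal_train)
threshold = train_scores[int(0.99 * len(train_scores))]

def is_suspicious(row):
    return anomaly_score(row, profile) > threshold

print(is_suspicious([52.0, 3.1]))    # close to the learned normal pattern: False
print(is_suspicious([400.0, 15.0]))  # far from it: True
```

The threshold trades false alarms against missed fraud, so it would usually be tuned on a labelled validation set if one is available.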