OCR Output vs. Ground Truth: Population and Error Identification



Hello everyone!

I would like to build a deep learning model that predicts error behaviour from OCR output and the corresponding ground truth.
The model should learn from the given historical data and detect the errors present in a sentence. It would also be nice if the system suggested valid corrections.

To achieve this, I am first going to create a population (dataset) consisting of the system predictions paired with the ground truth, and then train a model to learn the error behaviour.
The core objectives of this model are:

  1. Error detection
  2. Error correction (optional)
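
To make the population step concrete, here is a rough sketch of how I imagine pairing OCR output with ground truth to get per-token error labels. It uses Python's standard `difflib.SequenceMatcher` for the alignment. The function name `label_ocr_tokens`, the whitespace tokenisation, and the `(token, is_error, correction)` row layout are only my assumptions for illustration, not a fixed design.

```python
from difflib import SequenceMatcher


def label_ocr_tokens(ocr_text, truth_text):
    """Align OCR tokens with ground-truth tokens.

    Returns a list of (ocr_token, is_error, correction) rows, where
    is_error is 1 if the OCR token differs from the aligned ground truth.
    """
    # Assumption: simple whitespace tokenisation is good enough for a sketch.
    ocr_tokens = ocr_text.split()
    truth_tokens = truth_text.split()

    matcher = SequenceMatcher(a=ocr_tokens, b=truth_tokens, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # OCR token matches the ground truth: label as correct.
            rows.extend((tok, 0, tok) for tok in ocr_tokens[i1:i2])
        elif tag in ("replace", "delete"):
            # OCR token is wrong (or spurious); the aligned ground-truth span
            # is kept as a coarse correction (empty string for a deletion).
            correction = " ".join(truth_tokens[j1:j2])
            rows.extend((tok, 1, correction) for tok in ocr_tokens[i1:i2])
        # "insert" means ground-truth tokens missing from the OCR output;
        # there is no OCR token to label, so this sketch skips them.
    return rows


rows = label_ocr_tokens("Tbe quick brovvn fox", "The quick brown fox")
for row in rows:
    print(row)
```

Rows like these could serve as training data: the `is_error` column for objective 1 (a sequence-labelling task) and the `correction` column for objective 2.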

Can anyone suggest which algorithm would be a good choice for this problem, and how I can maximise the error detection rate? Any suggestions would be greatly appreciated!

Thank you