Data Quality: Identifying Critical Data Elements


Hi All

As we are all aware, banks and pharma customers are subject to stringent regulatory compliance requirements. Many of these regulations require customers to re-examine their overall data quality and data governance processes. We have built a robust framework to measure and monitor data quality, and the cost of quality, for each critical data element, mapped to the source applications that are part of specific business processes. The key question that keeps coming up is: is there a way to auto-detect the critical data elements? The reason for asking is that the source system landscape in any bank or large enterprise is massive, so the number of candidate data elements is huge.

So my questions are: has anyone explored techniques for identifying such critical data elements? Would feature selection even be applicable in this case? Has anyone looked at the data quality challenge and applied machine learning to solve it?