Resources for data preparation requested


When discussing analytics projects, the typical statement is that the data preparation phase is 80 to 90% of the effort. At the same time, how to actually perform this step seems to be the area that is “glossed over”. I realize that the data preparation process is obviously project-specific, but I have to believe that there are some general resources on this topic that provide examples and starting recommendations on the data cleaning process.

As I work on our university’s analytics curriculum, I am therefore looking for any material (web sites, books, software, etc.) that I can use as resources as well as provide to my students for teaching both undergraduate and graduate classes. In addition to these types of resources, any recommendations on where else I can post this question are also appreciated.


Hi @jflatto

A good reference could be “Exploratory Data Mining and Data Cleaning” by T. Dasu and T. Johnson (Wiley).
Chapter 4, about data quality, is really worth reading. To my knowledge this is the only book that gives an overview of the problems of data cleaning.

Hope this helps.




You can refer to some of the articles written on Analytics Vidhya:

There are other articles on the topic as well.

Hope this helps.