How can I get actionable insights faster from a data set?

data_preparation
cleaning
data_science

#1

Hi friends,

When working on any data science problem, the bulk of your time is spent on data cleaning and exploration. Preparing data for analysis, which includes extracting, cleaning and manipulating it, takes almost 60 to 70 percent of the time. After these stages, you have less time left to generate meaningful insights and develop more powerful models.

Can you suggest any methods that help reduce data cleaning or preparation time? It would definitely help me improve as a data scientist.

Mark


#2

Hi Mark,

What I can suggest is to write some basic, reusable code for EDA so that you don’t have to break a sweat reinventing the wheel each time. You can also try Shiny Radiant:
http://vnijs.github.io/blog/2015/05/introducing-radiant.html

Using Radiant for basic EDA and data slicing and dicing, without writing a single line of code, can reduce the time you spend on data munging and let you focus more on the modelling aspect.
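
As a rough illustration of the "basic reusable EDA code" idea, here is a minimal sketch in Python using only the standard library. The function name `summarize` and the record layout (a list of dicts with scalar values) are assumptions made for this example, not part of any particular library.

```python
import statistics
from collections import Counter


def summarize(rows):
    """Return a per-column summary for a list of dict records.

    Hypothetical helper: assumes each row is a dict of scalar values.
    """
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v is not None and v != ""]
        info = {
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if present and len(numeric) == len(present):
            # Purely numeric column: report basic statistics.
            info.update(
                min=min(numeric),
                max=max(numeric),
                mean=statistics.mean(numeric),
                median=statistics.median(numeric),
            )
        else:
            # Otherwise report the three most frequent values.
            info["top"] = Counter(present).most_common(3)
        summary[col] = info
    return summary


if __name__ == "__main__":
    sample = [
        {"age": 30, "city": "Pune"},
        {"age": 40, "city": "Pune"},
        {"age": None, "city": "Delhi"},
    ]
    for column, info in summarize(sample).items():
        print(column, info)
```

Once a helper like this exists, the first look at any new data set becomes a single call, and you can extend it over time with whatever checks you find yourself repeating.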

Hope this helps.

Regards,
Aayush Agrawal


#3

Hi Mark,

Aayush has made a very valid point.

I would like to add that you can use GraphLab for exploration as it allows the user to derive insights with very few lines of code. You can get an intro to GraphLab by reading the following article:
http://www.analyticsvidhya.com/blog/2015/12/started-graphlab-python/

Regarding data cleaning, it is difficult to provide a generic solution. If you deal with a specific kind of data and can standardize the cleaning steps, it might be possible to automate them.
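
To make the idea of standardized, automated cleaning steps concrete, here is a minimal sketch in plain Python. Every name in it (`strip_whitespace`, `blank_to_none`, `to_number`, `run_pipeline`) and the sample field names are hypothetical; the real steps would depend on your data.

```python
def strip_whitespace(row):
    """Trim leading and trailing whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def blank_to_none(row):
    """Replace empty strings with None so missing values look the same."""
    return {k: (None if v == "" else v) for k, v in row.items()}


def to_number(fields):
    """Build a step that converts the given fields to float.

    Values that cannot be parsed become None.
    """
    def step(row):
        out = dict(row)
        for field in fields:
            value = out.get(field)
            if isinstance(value, str):
                try:
                    out[field] = float(value)
                except ValueError:
                    out[field] = None
        return out
    return step


def run_pipeline(rows, steps):
    """Apply each cleaning step, in order, to every row."""
    cleaned = []
    for row in rows:
        for step in steps:
            row = step(row)  # each step returns a new dict
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": "  Mark ", "age": " 42 "},
        {"name": "Aayush", "age": "n/a"},
    ]
    steps = [strip_whitespace, blank_to_none, to_number(["age"])]
    for row in run_pipeline(raw, steps):
        print(row)
```

Keeping each step as a small function means the same list of steps can be reused on every new batch of that kind of data, and a step can be added or reordered without touching the others.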

Cheers,
Aarshay