Noob needs help with data preprocessing, feature selection and feature engineering

machine_learning
data_wrangling
data_science
python

#1

I have completed all the courses on DataCamp. I can use NumPy, pandas, scikit-learn, Matplotlib and Bokeh, i.e. I know the syntax and what it does. But I'm still clueless about when to use what: for example, when to scale or normalize, when to impute the mean or the median for missing values (and how that will affect the algorithm), or which algorithm to prefer when there are multiple choices. In short, how do I develop intuition? Whenever I download a data set, I have no idea how I should prepare the data for training. Also, how much should I know about the algorithms? I only have a high-level understanding of most of them; do I need to understand them at a mathematical level? I'm in my final year of engineering, so I can spare time for learning. By next June I want to be more than comfortable with data analytics in Python and have a basic knowledge of Hadoop.
I'm in desperate need of guidance. Any help will be appreciated. Thank you.
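
To make the question above concrete, here is a minimal sketch of the kind of choice being asked about, assuming scikit-learn is installed. The toy column is made up for illustration and is not from any real data set. It shows how a single outlier pulls the mean-imputed value far from the median-imputed one, and what standardization does afterwards.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature column (made-up values): one missing entry and one outlier.
X = np.array([[1.0], [2.0], [3.0], [np.nan], [100.0]])

# Fill the missing entry with the column mean and with the column median.
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)
median_imputed = SimpleImputer(strategy="median").fit_transform(X)

mean_fill = mean_imputed[3, 0]      # mean of the observed values 1, 2, 3, 100
median_fill = median_imputed[3, 0]  # median of the observed values 1, 2, 3, 100

# Standardize the median-imputed column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(median_imputed)

print("mean fill:", mean_fill)
print("median fill:", median_fill)
print("scaled column:", scaled.ravel())
```

The mean fill is dragged toward the outlier while the median fill stays near the bulk of the data, which is one reason the median is often suggested for skewed columns. Scaling tends to matter for distance-based or gradient-based models (k-nearest neighbours, SVMs, regularized linear models) and much less for tree-based models.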


#2

Here is a link to a book


I hope this book will clear up all your doubts. I recently had a chance to read it, and it's a good one.
Yes, you need to understand the mathematics at least at a coarse level, because on the job you will sometimes need to modify an algorithm or build a new one, though only if you work in product development.
Thanks