Wandering In Data Science

machine_learning
data_science
statistics

#1

Hi All,

I am trying hard to become a data scientist, and for the last two years I have been struggling to cover all the background knowledge a data scientist needs. There are heaps of strategies on Google for landing a data scientist role, but I am still struggling and don't feel confident enough to apply for a job.

I started diving into data science topics without a sound math/statistics background, e.g. machine learning: classification and regression (decision trees, random forests, simple/logistic regression), etc. While learning these, I kept getting stuck on background concepts such as overfitting vs underfitting, biased vs unbiased estimators, probability vs odds, data distributions (normal vs non-normal), finding outliers, data transformation, etc.

Now I have jumped into Bayesian statistics, and the reason is to cover maximum likelihood estimation (MLE). I was trying to understand the KNN algorithm, where some of the data were not normally distributed, and the author mentioned that KNN works fine with normally distributed data, which makes sense. His data transformed easily to a normal distribution with a simple log transformation. Some data I came across did not become normally distributed under a log transformation, so I turned to the Box-Cox transformation. While reading about Box-Cox, I found that one method used for it is MLE (Maximum Likelihood Estimation), and to cover maximum likelihood I jumped into Bayesian statistical inference. Now, rather than just focusing on the Bayesian background needed for MLE, I am determined to learn all the ins and outs of Bayesian statistics, and this is not the first time I have jumped from one area to another.
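The Box-Cox step described above can be made concrete with a short sketch. The transform is y = (x^λ - 1)/λ for λ ≠ 0 and y = log x for λ = 0, so the log transform is just the special case λ = 0. MLE picks the λ under which the transformed data look most like a normal sample; it only needs the likelihood, not a prior. This is plain Python with a coarse grid search standing in for a proper optimiser, and the function names and example data are hypothetical. In practice, `scipy.stats.boxcox` estimates λ by maximum likelihood for you.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of a single positive value x for a given lambda."""
    if lam == 0:
        return math.log(x)  # lambda = 0 is exactly the log transform
    return (x ** lam - 1) / lam


def boxcox_loglik(data, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal.

    The normal mean and variance are replaced by their MLEs (variance divided
    by n); the (lam - 1) * sum(log x) term is the Jacobian of the transform.
    """
    n = len(data)
    y = [boxcox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return (lam - 1) * sum(math.log(x) for x in data) - n / 2 * math.log(var)


def estimate_lambda(data, grid=None):
    """Grid-search MLE of lambda (a simple stand-in for a real optimiser)."""
    if any(x <= 0 for x in data):
        raise ValueError("Box-Cox requires strictly positive data")
    if grid is None:
        # lambda in [-2, 2] with step 0.01; i / 100 makes 0.0 an exact grid point
        grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(data, lam))


if __name__ == "__main__":
    # Hypothetical right-skewed data: exp() of symmetric values,
    # i.e. data that a plain log transform makes symmetric.
    zs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
    data = [math.exp(z) for z in zs]
    print(f"estimated lambda = {estimate_lambda(data)}")
```

For this example the estimate lands at λ = 0, i.e. the log transform. For data where the log transform is not enough, the same search simply settles on a different λ, which is why Box-Cox generalises the log approach.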
This is how I am diving into this world: once I am done with one topic, I jump to another, but to learn a whole topic I have to complete the full series of background material behind it. I thought I should ask seniors who can give me some mentorship on my learning curve: is this normal, or am I wandering too much here and there? In short, please share strategies and point out what I am doing right and wrong. When should I apply for a job?

Thanks

Sufyan


#2

Hey
This is one of the most common issues faced by the data science community. Data science is a vast subject, and it is extremely difficult for anyone to cover every topic in detail. I usually prefer to cover the basics well enough that I can explain the logic behind an algorithm, and I avoid going too deep into any single technique or algorithm. I do go deep into techniques that are widely used, and into techniques that can give better solutions when tuned. Optimising an algorithm requires understanding its basics and logic, so in general you can avoid going much deeper than that.

That said, going deep into a topic will certainly make you a master of it. So it depends on what you want to be: “Jack of all trades or master of one”.


#3

Hello.

In my opinion, you should consider doing a dedicated master's degree in ‘data science’ or, better yet, a master's in statistics, to gain the confidence you need. A strategy I like is replicating examples from books in different languages (R, SAS, Stata…).

Cheers!


#4

I think you should try implementing the concepts you have learned in a programming language of your choice, like Python or R. Data science is huge and interdisciplinary, and most of these concepts are built upon years of research. Pick any topic, like neural networks, and you will find many books and research papers to read, and some new breakthrough in deep learning happens almost every month. I also find myself in a similar wandering situation, but the following approaches have helped me:

  1. Focus: Focus on a specific area of expertise. For example, if you are inclined towards Natural Language Processing, spend some time on NLP only before jumping to another topic.
  2. Solving Problems: This is the most effective way to reinforce your learning and build confidence. There are plenty of datasets and data science hackathon problems available on the internet. Applying the concepts you have learned to these problems also makes learning fun, and if you eventually want to enter the industry, doing these small projects in Python or R is the most effective route.
  3. Courses: There are plenty of courses on Coursera on data science topics like Machine Learning, Statistics, the Data Mining specialization, etc. You can learn by googling as well, but a course teaches a subject in a step-by-step, methodical way, so you will not wander. Courses also have checkpoints, assignments, quizzes and discussion forums to ensure effective learning. Such a certification will boost your confidence, and you can also highlight it when applying for a job.

I hope this helps. All the best!


#5

Hi!

I’m in the same situation and have been for more than a year now. First of all, thanks for asking this question here. Through this discussion, I also learned that ‘wandering’ is part of learning data science.

I read this article on AV (I think you have already read it too). It gave me some clarity.

I completed Andrew Ng’s certification last year and am now completing some certifications in Python. I will soon complete some certifications in basic statistics. Meanwhile, I’m also trying to build a data science portfolio. For the portfolio, I will first replicate machine learning solutions already implemented on several data science blogs; then, after getting some idea of how people arrive at a solution to a particular problem, I will start implementing my own solutions to different data science problems.

I hope the link and my learning path help you! Best wishes for your career 🙂

BR, Thileepan


#6

Thanks guys,

@mohitatav and all the other guys,

Focusing on a specific area makes sense. What about Bayesian statistics? I am concentrating mostly on Bayesian statistics.