A Sr. Statistician needs guidance to build Data Science Skills

sas
data_science

#1

To begin with, something about myself: I am a senior statistician, almost 50 years of age, with over 20 years of experience in statistics. I have a master's degree in statistics from the University of Toronto and have been working in data-driven businesses in senior leadership roles (Director level and above) around the world. I have been using SAS for over 20 years and hold the SAS Advanced Programmer certification (note: I have never used R, though I have really wanted to).
Having said that, I really want to develop and improve my credentials/skills in data science (and/or big data analytics), ending with certifications. Here I would like your help and guidance to understand the following points:

  1. What exactly is the difference between data science and big data analytics (i.e., what are the commonalities and differences between these two intertwined domains)?

  2. For a senior statistician, which one is the more apt area to grow skills into? I.e., which one adds more value quickly (which could be considered quick wins, with the rest taken on as knowledge/skills grow richer), keeping in mind that I don't have much time/energy?

  3. Please confirm, or otherwise, my understanding that data science is perhaps the more logical path for someone like me to grow skills into.

  4. Please also guide me on the correct way forward, i.e., the step-by-step courses someone like me should take (should I start with R, Python, Hadoop, MapReduce, NoSQL?), and which courses are considered foundation-level courses for one or both domains (i.e., data science and big data).


#2

@alisq786 It's great to see such curiosity about improving data science skills from a senior person like you.

I will try to answer a few of your questions… I hope it will be helpful.

  1. What exactly is the difference between data science and big data analytics (i.e., what are the commonalities and differences between these two intertwined domains)?
    Ans: From my understanding, data science is a general term that includes multiple things, such as machine learning, big data, web analytics, statistics, etc. Now, when you ask about the difference between data science and big data analytics, in simple language I would say big data analytics is a sub-part of data science.
    Big data tools like Hadoop, MapReduce, Pig, NoSQL, etc. are used for data storage, fast processing and quick response. These tools are in demand today, but in the future something else may be in demand, whereas the algorithms behind these tools, which are part of data science, will remain the same.

  2. For a senior statistician, which one is the more apt area to grow skills into? I.e., which one adds more value quickly (which could be considered quick wins, with the rest taken on as knowledge/skills grow richer), keeping in mind that I don't have much time/energy?
    Ans: Being a statistician, I would say data science is an appropriate field for you, as roughly 60% of the work, the basic statistical concepts, is already done. You only have to work on the remaining 40%, which includes gaining some programming and SQL skills and learning the new big data tools. All of this will take time, but it's all up to you.

  3. Please confirm, or otherwise, my understanding that data science is perhaps the more logical path for someone like me to grow skills into.
    Ans: Yes, you are right; I would recommend the same.

  4. Please also guide me on the correct way forward, i.e., the step-by-step courses someone like me should take (should I start with R, Python, Hadoop, MapReduce, NoSQL, …?), and which courses are considered foundation-level courses for one or both domains (i.e., data science and big data).
    Ans: Below is a basic skeleton of steps you can follow:

      1. Learn the basic statistical concepts used in data science.
      2. Learn R and/or Python (note: if you have good programming skills, I would recommend Python; otherwise, go for R).
      3. Learn big data tools like Hadoop, MapReduce, Hive, Pig, NoSQL, etc.
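
Since MapReduce comes up in step 3, here is a minimal sketch of the idea in plain Python, to illustrate the point that the tools may change while the algorithm stays the same. It is only a single-process simulation of the map, shuffle and reduce phases for a word count (an assumption made for illustration); it does not use Hadoop's actual API, and the function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


# Map phase: each input line becomes (key, value) pairs independently,
# so on a cluster these calls could run in parallel on different machines.
def map_phase(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Shuffle phase: group every value emitted for the same key together.
def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: collapse each key's list of values into one result.
def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data tools change", "the algorithms stay the same"]
    print(word_count(docs))
```

On a real cluster the framework runs many mappers and reducers in parallel and moves the shuffled data between machines; the per-record map and reduce logic stays about this small.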

#3

Sir,

Let me start with two of the easiest-to-understand articles on the difference between the approaches to data science and big data.

You are basically a data scientist. I assume, based on your experience, that you are an expert in statistics. You can build models, fit regression lines and statistically estimate the probability of something occurring or not. That is exactly the skill of looking at data.

You should be in the consulting layer and hardly need to worry about learning new skills. You should be guiding teams to use whatever tools they can, helping them understand the insights from the data while building models, and working closely with business heads to understand the meaning of that data (more like feature engineering).

With your experience in SAS, it will take hardly any effort to learn R or Python, which are basically free versions of what SAS has been doing all these years, plus some improvements.

Most importantly, your skills should be around consulting and interpreting what the technicians generate with different tools, to help the business take the right decisions.

Having said that, it won't take you long to get a hold of R or Python (I would say start with Python directly) and Spark, using the enormous resources available online.

We would be happy to learn various statistical interpretations from you. Please say you are in Chennai or Bangalore, and I will find a way to learn more from you :slight_smile: :slight_smile:

Thanks
S.Vivek


#4

Thanks, Vivek, for your kind and encouraging words; I feel happy and flattered. However, most of my work has been in the domain of modelling and forecasting (regression, econometrics, time series, etc.). I am working here in Dubai, unfortunately not in India.
What I understand from your comments is that the most likely first step for someone like me would be to start learning Python or R, and that you recommend Python over R, right?
Again, thanks for your very motivating comment; I appreciate it :slight_smile:


#5

Thanks, Saurabh; your comments are spot on and very helpful. I truly appreciate it :))
My take on your three recommendations is below. Please comment on my understanding (in parentheses beside your points) and whether it all looks logical to you. Thanks.

  1. Learn basic statistical concepts used in data science. (I assume this will not be needed for me and I'll take it as it goes; however, if possible, please list the most commonly used basic statistical concepts.)
  2. Learn R and/or Python (note: if you have good programming skills, I would recommend Python; otherwise, go for R). (Given my skills in SAS and SQL, it seems Python would be the logical first step for me? If you agree, please help me with the way forward in learning Python, i.e., where to start and which book or course would be most helpful. Note: given my daily routine, I prefer self-paced online learning.)
  3. Learn big data tools like Hadoop, MapReduce, Hive, Pig, NoSQL, etc. (As you recommended, I will take this on once I have some hold on Python, right? Again, it would be helpful if you could recommend the path forward for learning, i.e., where to start, which book or course, etc.)

Lastly, you didn't include machine learning in your three points. Or is it embedded in point 3?

Thanks again, Saurabh, for your kind and quick response; much appreciated :)))

P.S. Saurabh, I'm wondering whether you got this message.


#6

@alisq786 Sorry for the late response, as I needed some time.
Let me give you a detailed explanation, one by one.

  1. List of basic statistical concepts:
    Descriptive statistics
    Summary statistics
    Population and sampling
    Distributions (e.g., discrete, continuous, normal)
    Central limit theorem
    Parametric / non-parametric tests
    Types of test (e.g., one-tailed, two-tailed, Z-test, t-test, F-test)
    Chi-square, ANOVA, MANOVA
    Hypothesis testing
    Goodness of fit

And the list goes on.
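To connect a few items from this list to code, here is a small Python sketch that uses only the standard library `statistics` module: a descriptive summary with the same columns PROC MEANS prints by default (N, mean, standard deviation, minimum, maximum), and Welch's two-sample t statistic with its degrees of freedom. The function names and the sample data are hypothetical, for illustration only.

```python
import math
import statistics


def describe(sample: list[float]) -> dict[str, float]:
    """N, mean, std dev, min and max, like PROC MEANS' default output."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "std": statistics.stdev(sample),  # sample standard deviation (divisor n - 1)
        "min": min(sample),
        "max": max(sample),
    }


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Two-sample t statistic without assuming equal variances (Welch's test).

    Returns the t statistic and the Welch-Satterthwaite degrees of freedom.
    """
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    group_a = [12.1, 11.8, 12.6, 13.0, 12.4]  # made-up illustrative data
    group_b = [11.2, 11.9, 11.5, 11.0, 11.7]
    print(describe(group_a))
    print(welch_t(group_a, group_b))
```

If SciPy is installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic and also returns the p-value, which the standard library alone does not provide.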

  2. Yes, I agree with you. Then you should go for Python.
    Books you could refer to:
    Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido
    Mastering Python for Data Science by Samir Madhavan
    On Analytics Vidhya we have a list of books which can help you. Below is the link; please have a look:
    https://www.analyticsvidhya.com/blog/2016/10/18-new-must-read-books-for-data-scientists-on-r-and-python/

  3. Yes, you are right about the big data part: first get some hands-on experience with Python.

  4. Lastly, you didn't include machine learning in your three points. Or is it embedded in point 3?
    Ans: I haven't missed it; it is incorporated in point 2, where you should learn machine learning concepts and how to implement them effectively in Python.

Machine learning concepts include:
Linear regression
Logistic regression
Clustering
Decision trees
Time series
SVM
Text mining
NLP
Neural networks
Deep learning, etc.
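
Clustering is probably the item on this list least familiar from a classical statistics background, so here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It assumes 2-D numeric points, initialises the centroids from randomly chosen input points, and stops once the centroids stop moving; the function name and test points are illustrative, not taken from any library.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 100,
           seed: int = 0) -> tuple[list[Point], list[int]]:
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, k=2))
```

In practice you would normally use a library implementation such as scikit-learn's `KMeans`, which adds smarter initialisation (k-means++) and multiple restarts to reduce the risk of a poor local optimum.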

Happy to help :slight_smile:


#7

Thanks again, Saurabh, for your spot-on advice; I truly appreciate it :slight_smile:
A couple of quick related questions:

  • I was wondering: your list of machine learning concepts includes many statistical concepts, i.e., linear regression, logistic regression, time series, etc. Why is that?
  • Just out of curiosity, what is Mahout, and what is its relation to data science?

Thanks again for your time and generous help :))


#8

@alisq786 Yes, you are right. I have included many statistical concepts under machine learning because these concepts are used to train the model (i.e., let the machine learn) during the model-building phase, whereas other statistical concepts are used for understanding the data and testing the model. Hence statistics and machine learning go hand in hand.
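
The split described above (statistics used to fit the model, and other statistics used to test it) can be sketched in a few lines of Python: the regression coefficients are estimated on a training subset only, and the model is then judged by its error on held-out rows it never saw. This is a minimal sketch with synthetic data; `split_train_test`, `fit_ols` and `rmse` are hypothetical helpers written here, not functions from a library.

```python
import math
import random


def split_train_test(xs, ys, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of rows for testing (hypothetical helper)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


def fit_ols(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [float(i) for i in range(100)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 5) for x in xs]  # synthetic data, true slope 2
    x_tr, y_tr, x_te, y_te = split_train_test(xs, ys)
    a, b = fit_ols(x_tr, y_tr)  # "training": estimate coefficients on training rows only
    preds = [a + b * x for x in x_te]  # "testing": predict rows the fit never saw
    print(f"intercept={a:.2f} slope={b:.2f} test RMSE={rmse(y_te, preds):.2f}")
```

The hold-out step is the main habit machine learning adds on top of classical regression: the test RMSE estimates how the model will do on new data, rather than how well it fits the data it was estimated on.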

What is Mahout?
Ans: I have very limited knowledge of Apache Mahout. The only thing I know about it is that it is a platform used to build scalable machine learning models. Personally, I haven't used it yet, hence I don't have any in-depth knowledge of it.
Thanks for asking me about Mahout :slight_smile: and increasing my curiosity.
I will learn more about Mahout, and if I get any interesting insights about it, I will definitely share them :slight_smile:
Thanks a lot :slight_smile:


#9

Thanks, Saurabh, for your time and generous help; I truly appreciate it :))))
Please stay in touch and have a wonderful life ahead!