Maths use in Data Science and ML

Good afternoon everyone!

I am approaching my next task at the university (Mathematics) and I would really appreciate your help.

I am looking into which branches/areas of maths are most commonly used by data scientists. I’ve seen a similar topic, but I would like to know what you, personally, use/need most often. I am not a data scientist myself (yet?), but I am aware that opinions differ on whether a proper mathematical background is necessary at all. Pardon my rather crude selection of mathematical areas, but this is the ‘set’ of skills that is mentioned most often. The options are as follows:

If you’ve got a spare second, I would really appreciate your help. Of course, any comments are more than welcome. What do you, personally, find most useful or most fundamental?

Thanks in advance!

Data Science, at its fundamental level, uses Machine Learning to make predictions.

In Machine Learning there are four areas into which problems fall:

  • Supervised Learning: Here you are trying to predict something from labeled examples.
    (1) Spam classification - predicting whether an email is spam or not
    (2) Stock price prediction - predicting the next day's price of a stock given data from previous days

  • Unsupervised Learning: Here you try to find patterns in data.
    Example: Finding the best places to put new telephone towers. Here we use something called clustering, which groups the data so that the distance between clusters is maximized and the distance within each cluster is minimized.

  • Reinforcement Learning: Example - You want to create an autoplayer for chess. The end goal is known (to win), and each move can earn a reward or a punishment based on how it changes the chances of winning.

  • Learning to Learn (more commonly called representation learning): Learning features and higher-level representations of data. Example: CNNs, where you try to derive optimal representations of images.
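To make the clustering example from the list above a bit more concrete, here is a minimal sketch using k-means, which is just one clustering technique. Note that k-means directly minimizes only the within-cluster distance. The function name, the customer coordinates, and the starting centers are all made up for illustration.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points; `centers` holds the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, clusters

# Hypothetical customer locations forming two obvious groups.
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
             (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
towers, groups = kmeans(customers, centers=[(0.0, 0.0), (10.0, 10.0)])
print(towers)  # one suggested tower position per cluster
```

Each returned center is the mean of the points assigned to it, so it can be read as a candidate tower location for that group.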

Now if you look at the four areas, you will notice that we are trying either to predict something or to optimize (maximize or minimize) something. Prediction inherently boils down to Probability and Statistics, and optimization (maximization/minimization) to Calculus. Needless to say, once you have Calculus, functions get involved.
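As a rough illustration of how Calculus drives optimization, here is a small sketch of gradient descent, which repeatedly steps against the derivative to find a minimum. The function f(x) = (x - 3)^2, the learning rate, and the number of steps are arbitrary choices for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Move a small step in the direction that decreases the function.
        x -= learning_rate * grad(x)
    return x

# Illustrative function: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
# Its minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # should be very close to 3
```

In real models the same idea is applied to a loss function of many parameters, with the derivative replaced by a gradient vector.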

Also note that machine learning generally involves working with millions of data points, each with thousands of features. To structure this information mathematically we need Linear Algebra and matrices, and the Calculus is then done on those matrices.
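One way to see Linear Algebra and Calculus working together is least-squares linear regression, chosen here purely as an illustration. Assume n data points with d features each are stacked into a matrix X, the targets into a vector y, and the model weights into a vector w (these symbols are introduced only for this example):

```latex
X \in \mathbb{R}^{n \times d}, \qquad y \in \mathbb{R}^{n}, \qquad w \in \mathbb{R}^{d}

L(w) = \frac{1}{2n} \, \lVert X w - y \rVert^{2}

\nabla_{w} L(w) = \frac{1}{n} \, X^{\top} (X w - y)
```

The loss is written with a single matrix-vector product instead of a sum over millions of rows, and its gradient is again a matrix expression. That gradient is exactly what an optimizer would step against.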

So in short these are the fundamental mathematical things you will need in order to become a good data scientist:

  1. Linear Algebra
  2. Calculus (of several variables and matrices)
  3. Probability and Statistics
© Copyright 2013-2019 Analytics Vidhya