Learn mathematics for data science and machine learning.


by Hadrien Jean 

A lot of people are moving into data science from various backgrounds. Fortunately, you can find your place in the field without a Ph.D. in machine learning. There are great libraries that let you implement complex algorithms in a few lines of code. I love this top-down approach: you start by building something, and when you encounter difficulties, you dive a bit deeper into the theory (you can read about this in a great post by Rachel Thomas).

However, you’ll see that having an idea of what’s under the hood can be a huge boost to your skills in data science and machine learning. I’ll try to show you my approach to succeeding on this journey, which rests on two crucial points:

  1. Target the right concepts to learn, at the right level of detail
  2. Go back and forth between practice and theory using code and visualization

1. Target the right concepts to learn, at the right level of detail

It can be hard at first to select the topics to learn without being overwhelmed by the many theoretical notions you need to digest. Furthermore, many great resources in machine learning and data science are not written for people without a math background. For instance, if you decide to look at seminal machine learning books like “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, or “Pattern Recognition and Machine Learning” by Christopher Bishop, you might find them hard to approach with no math background. You’ll find a lot of concepts, notations, and steps that are not explained.

One solution is to identify the math you need for the specific data science and machine learning topics you want to understand, and then study it with general math resources. The issue is that these general resources might go too far (too detailed, for instance: you don’t want to become a mathematician, just to get more insight into the data science tools you’re using) or not go exactly in the direction you need.

For this reason, I worked for more than a year on this question, gathering the math topics required for data science and machine learning. The result is my book, Essential Math for Data Science, which I just released.

In this book, I’ll introduce you to the major math topics for data science:

  • Calculus

Example: Cost function applied to linear regression.

  • Statistics and probability theory

Example: Joint probability distributions.

  • Linear algebra

Example: Scalars, vectors, matrices, and tensors.

The goal is to explain the steps in detail so that even people with a limited math background can follow along.
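To give a flavor of the first example in the list (a cost function applied to linear regression), here is a minimal sketch in plain Python. It is not taken from the book: the function name `mse_cost` and the toy data are assumptions made up for illustration, and the mean squared error is used as one common choice of cost function.

```python
def mse_cost(slope, intercept, xs, ys):
    """Mean squared error of the line y = slope * x + intercept on the data."""
    errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in errors) / len(xs)


# Toy data generated by y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A line that matches the data exactly has zero cost.
perfect_cost = mse_cost(2.0, 1.0, xs, ys)

# A line that underestimates every point has a larger cost.
worse_cost = mse_cost(1.0, 0.0, xs, ys)

print(perfect_cost, worse_cost)
```

The lower the cost, the better the line fits the data, which is why fitting a linear regression can be framed as finding the parameters that minimize this function.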

2. Go back and forth between practice and theory using code and visualization

As for my own story, I landed a job as a machine learning scientist after a Ph.D. in Cognitive Science. I had some experience building machine learning projects, but I really needed to sharpen my theoretical understanding. I discovered that coding was a great way to learn the math behind the algorithms I was using. I find it especially useful for staying motivated when learning abstract notions.

In Essential Math for Data Science, my goal was to take a practical approach, using concrete examples and also a lot of code. Instead of showing proofs and theorems, I want to give insights and intuition about the topics. For this purpose, code and visualizations are the perfect tools. The book is thus particularly suited for people with a programming background (or an affinity for coding).

There is one hands-on project at the end of each chapter where practical notions from data science and machine learning like gradient descent, regularization, or Bayesian inference are developed.
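As an illustration of how code can make one of these notions concrete, here is a minimal sketch of gradient descent fitting a line by minimizing the mean squared error. It is not an excerpt from the book's projects: the function name `gradient_descent`, the learning rate, the number of steps, and the toy data are all assumptions chosen for this example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = slope * x + intercept by gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction errors with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step in the direction that decreases the cost.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated by y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = gradient_descent(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

Changing `learning_rate` and watching how quickly (or whether) the parameters settle is a simple way to build intuition about the derivative-based update before reading the theory.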

Example from the hands-on project of Chapter 06: Polynomial regression.

If you’re interested, find more details about the project here.

You can also benefit from a big discount using this offer code: wandb20!
