Author: Brendan Martin, Founder of LearnDataSci

Most Recommended Data Science and Machine Learning Books by Top Master's Programs

See the most popular books assigned in Master's programs from top universities

How these books were found

After more than 15 hours of researching and logging the materials assigned in Master's programs, I found the following books to be the most recommended to graduate students in those programs. Since data scientists can come from many backgrounds, the Master's degrees considered were in applied math, statistics, computer science, machine learning, and data science.

Specifically, the following programs were explored:

  • Master's in Machine Learning — Carnegie Mellon University
  • Master's in Statistics — Stanford University
  • Master's in Computer Science, specializing in Artificial Intelligence — Stanford University
  • Master's in Computer Science — Georgia Tech
  • Master's in Data Science — Harvard University
  • Master's in Computational Science and Engineering — Harvard University
  • Master's in Data Science — Columbia University

Due to the amount of time it takes to wade through degree requirements, course codes, and catalogs, this article will continue to evolve as I gather more data.

For each book below, I've included an example of how the author(s) introduce Linear Regression, one of the most basic machine learning algorithms. If you're a beginner in data science, this should give you some insight into the sort of math background each book requires.

Without further ado, here are the most assigned and recommended books from top universities.

Most Recommended Books

#1 The Elements of Statistical Learning: Data Mining, Inference and Prediction (“ESL”)

Amazon or Free
Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman

This book was either the assigned textbook or recommended reading in every Masters program I researched. Due to its advanced nature, you’ll find that book #5 in this list — An Introduction to Statistical Learning with Applications in R (ISLR) — was written as a more accessible version, and even includes exercises in R.

It’s usually recommended for beginners in data science to master the content in ISLR before moving to ESL, where you’ll get a more theoretical background. Just mastering ISLR is often enough for data analyst roles.

Overall, ESL takes an applied, frequentist approach, as opposed to the Bayesian approach of the next book. The exercises are not only challenging but also very useful for anyone interested in machine learning research. Fortunately, solutions to the exercises are freely available online.

To get an idea of the math required, Linear Regression is introduced like so:

We have an input vector $X^T = (X_1, X_2, X_3,...,X_p)$ and want to predict a real-valued output $Y$. The linear regression model has the form $$f(X) = \beta_0 + \sum_{j=1}^p X_j \beta_j$$
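
To make that formula concrete, here's a minimal NumPy sketch (my own illustration, not code from ESL) that generates toy data and fits $\beta_0$ and the $\beta_j$ by ordinary least squares; all names and numbers here are made up for the example:

```python
# Fit f(X) = beta_0 + sum_j X_j * beta_j by ordinary least squares.
# Toy data and variable names are illustrative, not from ESL.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                           # n observations, p inputs
X = rng.normal(size=(n, p))             # each row is an input vector X^T
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.5 + X @ beta_true + rng.normal(scale=0.1, size=n)  # real-valued output Y

# Prepend a column of ones so the intercept beta_0 is estimated jointly
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("estimated (beta_0, beta_1, ..., beta_p):", beta_hat.round(2))
```

ESL derives this least-squares solution in closed form; the `lstsq` call above is just a numerically stable way of computing it.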

#2 Pattern Recognition and Machine Learning (“PRML”)

Amazon or Free
Author: Christopher Bishop

Recommended in almost every Master's program surveyed, this book usually comes up second after ESL in course syllabi. PRML is a great resource for understanding the Bayesian derivations of classical machine learning algorithms.

Although PRML is very clear and rich in diagrams, you'll need advanced calculus, linear algebra, and optimization knowledge to get its full benefit. Many of the derivations skip intermediate steps, so it's important to work through each step on your own for a solid understanding.

Unlike the applied approach of ESL, PRML is more theoretical. Here's how Linear Regression is introduced by Bishop:

$$y(\boldsymbol{x}, \boldsymbol{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j (\boldsymbol{x})$$ where $\phi_j(\boldsymbol{x})$ are known as basis functions
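
To see what basis functions buy you, here's a short sketch (mine, not Bishop's code) that fits this model with polynomial basis functions $\phi_j(x) = x^j$ on noisy sinusoidal data, similar in spirit to PRML's running toy example:

```python
# Linear regression with basis functions: y(x, w) = w_0 + sum_j w_j * phi_j(x).
# Polynomial bases phi_j(x) = x**j are one illustrative choice, not the only one.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=50)                              # scalar inputs
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=50)  # noisy targets

M = 4                                             # number of basis functions
Phi = np.column_stack([x**j for j in range(M)])   # design matrix; phi_0 = 1
w = np.linalg.pinv(Phi) @ t                       # maximum-likelihood weights
print("w_0 ... w_{M-1}:", w.round(2))
```

The model stays linear in the weights $w_j$ even though it's nonlinear in $x$, which is exactly why the same least-squares machinery still applies.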

Luckily, Bishop has also authored solutions to the exercises labeled “www” in the book, making this book a possibility for self-study. Those solutions are freely available online as a PDF.

#3 Machine Learning: A Probabilistic Perspective (“MLAPP”)

Amazon
Author: Kevin P. Murphy

MLAPP is another book recommended in almost every program; syllabi usually assign either it or PRML. Considered to be more comprehensive and relevant than PRML, MLAPP is a very dense and broad encyclopedic guide to machine learning.

It's a great resource for graduate courses, but since it's not freely available and the solutions manual can only be purchased by professors, it's a little more closed off than the others on this list and is not recommended for self-study. Also, if you're a beginner in machine learning, this textbook isn't an ideal starting point.

Here's how Linear Regression is introduced:

$$y(x) = \boldsymbol{w}^T\boldsymbol{x}+\epsilon = \sum_{j=1}^D w_jx_j+\epsilon$$ where $w^Tx$ represents the inner or scalar product between the input vector $x$ and the model's weight vector $w$, and $\epsilon$ is the residual error between our linear predictions and the true response.

Within the next couple of lines, Murphy redefines this in probabilistic terms like so:

...we can rewrite the model in the following form: $$ p(y\vert\boldsymbol{x}, \boldsymbol{\theta}) = \mathcal{N}(y\vert\mu(\boldsymbol{x}), \sigma^2(\boldsymbol{x}))$$ This makes it clear that the model is a conditional probability density.
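
If it helps to see that equivalence numerically, here's a sketch (mine, not Murphy's code) showing that under the Gaussian noise model $y \sim \mathcal{N}(\boldsymbol{w}^T\boldsymbol{x}, \sigma^2)$, the ordinary least-squares fit is exactly the maximum-likelihood estimate:

```python
# Under y ~ N(w^T x, sigma^2) with fixed sigma, maximizing the Gaussian
# log-likelihood is equivalent to minimizing squared error.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0])
sigma = 0.3
y = X @ w_true + rng.normal(scale=sigma, size=n)

w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares == Gaussian MLE

def log_likelihood(w):
    resid = y - X @ w
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - resid @ resid / (2 * sigma**2)

print("MLE weights:", w_mle.round(2))
print("log-likelihood at MLE vs. at true w:",
      round(log_likelihood(w_mle), 1), round(log_likelihood(w_true), 1))
```

The probabilistic reading gives you the same fit, but with an explicit noise model attached, which is what Murphy builds on throughout the book.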

Without a more advanced math foundation, it's easy to get lost in the notation when reading this book on your own.

#4 Deep Learning

Amazon or Free
Authors: Ian Goodfellow, Yoshua Bengio, Aaron Courville

Unlike the previous two books, this textbook opens with a nice general survey of math and machine learning methods. There are many concrete examples, and the math is much simpler than in MLAPP and PRML.

For example, Linear Regression is introduced like so:

Let $\hat{y}$ be the value that our model predicts $y$ should take on. We define the output to be $$\hat{y} = \boldsymbol{w}^T\boldsymbol{x}$$ where $\boldsymbol{w} \in \mathbb{R}^n$ is a vector of parameters [and $\boldsymbol{x} \in \mathbb{R}^n$ is a vector of inputs]
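
Since this is the deep learning book, it seems fitting to sketch the fit with gradient descent rather than a closed-form solve (my own illustration, not code from the book):

```python
# Fit y_hat = w^T x by gradient descent on mean squared error,
# the same optimization style used to train neural networks.
import numpy as np

rng = np.random.default_rng(3)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([0.5, -1.5, 2.0])
y = X @ w_true + rng.normal(scale=0.05, size=n)

w = np.zeros(d)          # parameter vector w, initialized at zero
lr = 0.1                 # learning rate
for _ in range(500):
    y_hat = X @ w                     # model predictions
    grad = 2 * X.T @ (y_hat - y) / n  # gradient of the MSE w.r.t. w
    w -= lr * grad                    # gradient-descent update

print("learned w:", w.round(2))
```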

This notation is much more straightforward for beginners, and very similar to how both the next book, ISLR, and Andrew Ng's famous Machine Learning course on Coursera present it.

Overall, this book serves as a good reference and a starting point for digging deeper elsewhere, but it isn't comprehensive by any means. There's not much direct application, so you won't gain much insight into how to actually implement neural networks, but it's a good high-level complement to deep learning courses, which Andrew Ng has also created.

#5 An Introduction to Statistical Learning with Applications in R ("ISLR")

Amazon or Free
Authors: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani

I'll start out by saying that this is a fantastic book. ISLR is usually recommended in the first course of programs built specifically for data science, which makes a lot of sense given how the book is structured.

Although not a thick book by any means, it’s derived from the #1 book, The Elements of Statistical Learning, and comprehensively covers the fundamentals every data scientist should know.

Not only is it extremely clear and accessible to those with a basic undergrad math background, but it also takes a very applied approach. Every chapter comes with exercises in R that let you apply the concepts you're learning directly to data.

Furthermore, the authors of the book created an accompanying online course, which follows each chapter and is totally free.

For comparison, here’s how ISLR introduces Linear Regression:

Mathematically, we can write this linear relationship as $$ Y \approx \beta_0 + \beta_1X$$ You might read "$\approx$" as "is approximately modeled as". We will sometimes describe [this equation] by saying that we are regressing $Y$ on $X$ (or $Y$ onto $X$).
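
To see those two coefficients estimated in code, here's a short sketch of mine in Python (ISLR's own labs and exercises use R) applying the closed-form least-squares estimates for the slope and intercept:

```python
# Simple linear regression Y ≈ beta_0 + beta_1 * X via the closed-form
# least-squares estimates. Toy data; ISLR itself works in R.
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=60)
Y = 3.0 + 0.7 * X + rng.normal(scale=1.0, size=60)

# beta_1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
# beta_0 = y_bar - beta_1 * x_bar
beta_1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta_0 = Y.mean() - beta_1 * X.mean()
print(f"Y ≈ {beta_0:.2f} + {beta_1:.2f} X")
```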

As you can see, ISLR is much more beginner-friendly. Each statistical/machine learning concept is introduced just like this, without heavy notation, and in a very approachable way.

Let me know your thoughts

Have you read any of the books listed? Did you use any of these in a course? What did you think?

I'm going to continue compiling books I find in course syllabi from top universities and frequently update this article, but I would also love to know what you all think about each of these.

If a book you've found particularly helpful wasn't mentioned, leave a comment and let me know!

