Books

The best books on Data Science, Big Data, Data Mining, Machine Learning, Python, R, SQL, NoSQL and more.

Data Mining and Machine Learning
Data Mining: Practical Machine Learning Tools and Techniques

Ian H. Witten & Eibe Frank, 2005

Offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations.

Learning Languages
Learn Python the Hard Way
Languages: Python

Zed A. Shaw, 2013

This is a free sample of Learn Python 2 The Hard Way with 8 exercises and Appendix A available for you to review.

Distributed Computing Tools
Cloudera Impala
Languages: SQL

John Russell, 2014

Learn about Cloudera Impala, an open source project that is opening up the Apache Hadoop software stack to a wide audience of database analysts, users, and developers.

Interviews with Data Scientists
The Data Analytics Handbook

Brian Liou, Tristan Tao, & Declan Shener, 2015

A free handbook series released by Leada to help promote data analytics literacy.

Learning Languages
Learning with Python 3
Languages: Python

Peter Wentworth, Jeffrey Elkner, Allen B. Downey, & Chris Meyers, 2012

Introduction to computer science using the Python programming language. It covers the basics of computer programming in the first part while later chapters cover basic algorithms and data structures.

Learning Languages
Dive Into Python 3
Languages: Python

Mark Pilgrim, 2009
Mark Pilgrim is a developer advocate for open source and open standards.

This is a hands-on guide to Python 3 and its differences from Python 2. Each chapter starts with a real, complete code sample, picks it apart and explains the pieces, and then puts it all back together in a summary at the end.

Data Science in General
School of Data Handbook

School of Data, 2015

The School of Data Handbook is a companion text to the School of Data. It functions much like a traditional textbook, providing the detail and background theory that support the School of Data courses and challenges.

Distributed Computing Tools
Hadoop Tutorial as a PDF

Tutorials Point
Online Learning Resource

An introduction to Hadoop, an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. Hadoop is designed to scale up from single servers to thousands of machines.

Learning Languages
Python Practice Book
Languages: Python

Anand Chitipothu, 2014
Anand conducts Python training classes on a semi-regular basis in Bangalore, India.

This book is prepared from the training notes of Anand Chitipothu.

Forming Data Science Teams
Understanding the Chief Data Officer

Julie Steele
Director of Communications at Silicon Valley Data Science

To manage today's flood of available data, a number of high-profile corporations have adopted a new position in addition to existing CTOs and CIOs: the Chief Data Officer, or CDO.
