Pulled from the web, here is our collection of the best free books on Data Science, Big Data, Data Mining, Machine Learning, Python, R, SQL, NoSQL and more.
If you’re looking for even more learning materials, be sure to also check out an online data science course through our comprehensive courses list.
Looking for more books? Go back to our main books page.
Note that while every book here is provided for free, consider purchasing the hard copy if you find any particularly helpful. In many cases you will find Amazon links to the printed version, but bear in mind that these are affiliate links, and purchasing through them will help support not only the authors of these books, but also LearnDataSci. Thank you for reading, and thank you in advance for helping support this website.
This book was developed for the Certificate of Data Science program at Syracuse University’s School of Information Studies.
The School of Data Handbook is a companion text to the School of Data. Its function is something like a traditional textbook – it will provide the detail and background theory to support the School of Data courses and challenges.
Learn how to use a problem's "weight" against itself. Learn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at all.
A free handbook series released by Leada to help promote data analytics literacy.
To manage today's flood of available data, a number of high-profile corporations have adopted a new position in addition to existing CTOs and CIOs: the Chief Data Officer, or CDO.
Intro to Hadoop - An open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines.
MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed by Google...
'Hadoop illuminated' is the open source book about Apache Hadoop™. It aims to make Hadoop knowledge accessible to a wider audience, not just to the highly technical.
This book describes Python, an open-source general-purpose interpreted programming language available for a broad range of operating systems. This book describes primarily version 2, but does at times reference changes in version 3.
Practical programming for total beginners. In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand. No prior programming experience is required.
"Invent Your Own Computer Games with Python" teaches you computer programming in the Python programming language. Each chapter gives you the complete source code for a new game and teaches the programming concepts from these examples.
This book is designed to introduce students to programming and computational thinking through the lens of exploring data. You can think of Python as your tool to solve problems that are far beyond the capability of a spreadsheet.
This book is prepared from the training notes of Anand Chitipothu.
This is a simple book for learning the Python programming language, intended for programmers who are new to Python.
This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code.
The aim of this Wikibook is to be the place where anyone can share his or her knowledge and tricks on R. It is supposed to be organized by task but not by discipline. We try to make a cross-disciplinary book, i.e. a book that can be used by all.
This is a simple introduction to time series analysis using the R statistics software.
The R Manuals.
I (Dani) started teaching the introductory statistics class for psychology students offered at the University of Adelaide, using the R statistical package as the primary tool. These are my own notes for the class which were trans-coded to book form.
This book is NOT introductory. The emphasis of this text is on the practice of regression and analysis of variance. The objective is to learn what methods are available and more importantly, when they should be applied.
The first truly practical introduction to modern statistical methods for ecology. In step-by-step detail, the book teaches ecology graduate students and researchers everything they need to know to analyze their own data using the R language.
My intent is to present a relatively brief, non-jargony overview of how practicing epidemiologists can apply some of the extremely powerful spatial analytic tools that are easily available to them.
This tutorial will give you a quick start to SQL. It covers most of the topics required for a basic understanding of SQL and to get a feel of how it works.
Three of CouchDB’s creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications.
Essentials of the MongoDB system. Starting with creating a MongoDB database, you'll learn how to make collections and interact with their data, how to build a console application to interact with binary and image collection data, and much more.
Get started with O'Reilly's Graph Databases and discover how graph databases can help you manage and query highly connected data.
Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts in social media mining.
A guide to practical data mining, collective intelligence, and building recommendation systems by Ron Zacharski. This work is licensed under a Creative Commons license.
The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers many more cutting-edge data mining topics.
This book illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments.
For final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models.
Modeling with Data offers a useful blend of data-driven statistical methods and nuts-and-bolts guidance on implementing those methods. --Pat Hall, founder of Translation Creation
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you concepts behind neural networks and deep learning.
This book is composed of 9 chapters introducing advanced text mining techniques. The techniques range from relation extraction to methods for under-resourced or less-resourced languages.
Applications and Strategies for Human-in-the-loop Machine Learning.
Learning and Intelligent Optimization (LION) is the combination of learning from data and optimization applied to solve complex and dynamic problems. Learn about increasing the automation level and connecting data directly to decisions and actions.
Comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one- or two-semester undergraduate or graduate-level courses in Artificial Intelligence.
This is a textbook aimed at junior to senior undergraduate students and first-year graduate students. It presents artificial intelligence (AI) using a coherent framework to study the design of intelligent computational agents.
Think Bayes is an introduction to Bayesian statistics using computational methods. The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.
This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics.
This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, and much more.
Suitable for either a service course for non-statistics graduate students or for statistics majors. Unlike most texts for the one-term grad/upper level course on experimental design, this book offers a superb balance of both analysis and design.
The foundations for inference are provided using randomization and simulation methods. Once a solid foundation is formed, a transition is made to traditional approaches, where the normal and t distributions are used for hypothesis testing and...
This book provides an historically-informed overview through a wide range of topics, from the evolution of commodity supercomputing and the simplicity of big data technology, to the ways conventional clouds differ from Hadoop analytics clouds.
If you want a basic understanding of computer vision’s underlying theory and algorithms, this hands-on introduction is the ideal place to start. You’ll learn techniques for object recognition, 3D reconstruction, stereo imaging, augmented reality, etc.
The probability and statistics cookbook is a succinct representation of various topics in probability theory and statistics. It provides a comprehensive mathematical reference reduced to its essence, rather than aiming for elaborate explanations.
This book gives a self-contained treatment of linear algebra with many of its most important applications. It is very unusual if not unique in being an elementary book which does not neglect arbitrary fields of scalars and the proofs of the theorems.
This text has been written in clear and accurate language that students can read and comprehend. The author has minimized the number of explicitly stated theorems and definitions, in favor of dealing with concepts in a more conversational manner.