Pulled from the web, here is a our collection of the best, free books on Data Science, Big Data, Data Mining, Machine Learning, Python, R, SQL, NoSQL and more.
If you’re looking for even more learning materials, be sure to also check out an online data science course through our comprehensive courses list.
Looking for more books? Go back to our main books page.
Note that while every book here is provided for free, consider purchasing the hard copy if you find any particularly helpful. In many cases you will find Amazon links to the printed version, but bear in mind that these are affiliate links, and purchasing through them will help support not only the authors of these books, but also LearnDataSci. Thank you for reading, and thank you in advance for helping support this website.
Instantly find the books you are looking for, just start typing below.Comma delimit (e.g.,Python,Clustering)
Comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one or two-semester, undergraduate or graduate-level courses in Artificial Intelligence.
Learning and Intelligent Optimization (LION) is the combination of learning from data and optimization applied to solve complex and dynamic problems. Learn about increasing the automation level and connecting data directly to decisions and actions.
This book provides an historically-informed overview through a wide range of topics, from the evolution of commodity supercomputing and the simplicity of big data technology, to the ways conventional clouds differ from Hadoop analytics clouds.
Challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which you can use on you own personal media
This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.
If you want a basic understanding of computer vision’s underlying theory and algorithms, this hands-on introduction is the ideal place to start. You’ll learn techniques for object recognition, 3D reconstruction, stereo imaging, augmented reality, etc
A guide to practical data mining, collective intelligence, and building recommendation systems by Ron Zacharski. This work is licensed under a Creative Commons license.
For final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models.
The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers many more cutting-edge data mining topics.
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.
"Essential reading for students of electrical engineering and computer science; also a great heads-up for mathematics students concerning the subtlety of many commonsense questions." Choice
Essential reading for students and practitioners, this book focuses on practical algorithms used to solve key problems in data mining, with exercises suitable for students from the advanced undergraduate level and beyond.
Modeling with Data offers a useful blend of data-driven statistical methods and nuts-and-bolts guidance on implementing those methods. --Pat Hall, founder of Translation Creation
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you concepts behind neural networks and deep learning.
illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments.
Applications and Strategies for Human-in-the-loop Machine Learning.
A clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts in social media mining
This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language.
This book was developed for the Certificate of Data Science pro- gram at Syracuse University’s School of Information Studies.
Learn how to use a problem's "weight" against itself. Learn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at all.
The School of Data Handbook is a companion text to the School of Data. Its function is something like a traditional textbook – it will provide the detail and background theory to support the School of Data courses and challenges.
Create and publish your own interactive data visualization projects on the Web—even if you have little or no experience with data visualization or web development. It’s easy and fun with this practical, hands-on introduction.
MapReduce  is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed by Google...
'Hadoop illuminated' is the open source book about Apache Hadoop™. It aims to make Hadoop knowledge accessible to a wider audience, not just to the highly technical.
Intro to Hadoop - An open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines.
To manage today's flood of available data, a number of high-profile corporations have adopted a new position in addition to existing CTOs and CIOs: the Chief Data Officer, or CDO.
A free handbook series released by Leada to help promote data analytics literacy.
‘A Byte of Python’ is a free book on programming using the Python language. It serves as a tutorial or guide to the Python language for a beginner audience. If all you know about computers is how to save text files, then this is the book for you.
Useful tools and techniques for attacking many types of R programming problems, helping you avoid mistakes and dead ends. With ten+ years of experience programming in R, the author illustrates the elegance, beauty, and flexibility at the heart of R.
This is a simple introduction to time series analysis using the R statistics software.
Practical programming for total beginners. In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand-no prior programming experience required.
The first truly practical introduction to modern statistical methods for ecology. In step-by-step detail, the book teaches ecology graduate students and researchers everything they need to know to analyze their own data using the R language.
"Invent Your Own Computer Games with Python" teaches you computer programming in the Python programming language. Each chapter gives you the complete source code for a new game and teaches the programming concepts from these examples.
I (Dani) started teaching the introductory statistics class for psychology students offered at the University of Adelaide, using the R statistical package as the primary tool. These are my own notes for the class which were trans-coded to book form.
This book is NOT introductory. The emphasis of this text is on the practice of regression and analysis of variance. The objective is to learn what methods are available and more importantly, when they should be applied.
This book is designed to introduce students to programming and computational thinking through the lens of exploring data. You can think of Python as your tool to solve problems that are far beyond the capability of a spreadsheet.
This is a simple book to learn the Python programming language, it is for the programmers who are new to Python.
This book is prepared from the training notes of Anand Chitipothu.
This book describes Python, an open-source general-purpose interpreted programming language available for a broad range of operating systems. This book describes primarily version 2, but does at times reference changes in version 3.
The aim of this Wikibook is to be the place where anyone can share his or her knowledge and tricks on R. It is supposed to be organized by task but not by discipline. We try to make a cross-disciplinary book, i.e. a book that can be used by all.
This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code.
My intent is to present a relatively brief, non-jargony overview of how practicing epidemiologists can apply some of the extremely powerful spatial analytic tools that are easily available to them.
The R Manuals.
This text has been written in clear and accurate language that students can read and comprehend. The author has minimized the number of explicitly state theorems and definitions, in favor of dealing with concepts in a more conversational manner.
This book gives a self- contained treatment of linear algebra with many of its most important applications. It is very unusual if not unique in being an elementary book which does not neglect arbitrary fields of scalars and the proofs of the theorems
The probability and statistics cookbook is a succinct representation of various topics in probability theory and statistics. It provides a comprehensive mathematical reference reduced to its essence, rather than aiming for elaborate explanations.
Three of CouchDB’s creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications.
Get started with O'Reilly's Graph Databases and discover how graph databases can help you manage and query highly connected data.
Essentials of the MongoDB system. Starting with creating a MongoDB database, you'll learn how to make collections and interact with their data, how to build a console application to interact with binary and image collection data, and much more.
This tutorial will give you a quick start to SQL. It covers most of the topics required for a basic understanding of SQL and to get a feel of how it works.
MongoDB is an open source NoSQL database, easily scalable and high performance. It retains some similarities with relational databases which, in my opinion, makes it a great choice for anyone who is approaching the NoSQL world.
Suitable for either a service course for non-statistics graduate students or for statistics majors. Unlike most texts for the one-term grad/upper level course on experimental design, this book offers a superb balance of both analysis and design.
This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, and much more.
This is a textbook aimed at junior to senior undergraduate students and first-year graduate students. It presents artificial intelligence (AI) using a coherent framework to study the design of intelligent computational agents.
The foundations for inference are provided using randomization and simulation methods. Once a solid foundation is formed, a transition is made to traditional approaches, where the normal and t distributions are used for hypothesis testing and...
This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics.
Think Bayes is an introduction to Bayesian statistics using computational methods. The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.