
80 Best Data Science Books That Are Worth Reading

Data science is one of the most popular fields right now, and many people are looking for a way into the industry. I recently read an article listing some great data science books that may help, so I have summarized those books here with brief introductions so you can choose the ones you’d like to read. Some of them are available online; for most of the others, you will probably need to search on Amazon.

Part I: Data Scientist Core Skills

Data Science

1. The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists
Twenty-five industry experts share their advice in this handbook. It is very helpful for beginners.

2. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data. This guide also helps you understand many data-mining techniques in use today.

3. Doing Data Science: Straight Talk from the Frontline
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Math

4. Multivariate Calculus
https://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/index.htm

5. Linear Algebra
https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm

Probability and Statistics

6. Introduction to Probability, Statistics, and Random Processes
This book introduces probability, statistics and stochastic processes to students. It can be used by both students and practitioners in engineering, various sciences, finance, and other related fields. It provides a clear and intuitive approach to these topics while maintaining mathematical accuracy. You can also find courses and videos online.
https://www.probabilitycourse.com

7. OpenIntro Statistics
The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. Their inaugural effort is OpenIntro Statistics. Corresponding courses and videos can be found at:
https://www.openintro.org

8. Statistical Inference
This textbook for first-year graduate students discusses both theoretical statistics and the practical applications of theoretical developments. It also includes a large number of exercises covering theory and applications.

9. Applied Linear Statistical Models
Applied Linear Statistical Models is a long-established, authoritative text and reference on statistical modeling. The fifth edition makes greater use of computing and graphical analysis throughout, without sacrificing concepts or rigor. In general, it uses larger data sets in examples and exercises, and relies on software to automate methods where that is possible without loss of understanding.

10. An Introduction to Generalized Linear Models

11. All of Statistics: A Concise Course in Statistical Inference
This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines.

12. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
In this book, Efron and Hastie give a comprehensive introduction to statistics in the big data era.

13. Statistics in a Nutshell: A Desktop Quick Reference

14. Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis

15. Think Bayes: Bayesian Statistics in Python
A brief introduction to Bayesian statistics using Python.
https://www.greenteapress.com/thinkbayes/thinkbayes.pdf

16. Bayesian Methods for Hackers
More advanced tutorials on Bayesian statistics using Python.
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

17. Practical Statistics for Data Scientists: 50 Essential Concepts
This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
You can find it here: https://github.com/andrewgbruce/statistics-for-data-scientists

Machine Learning

18. An Introduction to Statistical Learning: with Applications in R
Undoubtedly a good book. Everyone in the field has probably heard of it.
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about

19. Applied Predictive Modeling
Applied Predictive Modeling covers the overall predictive modeling process. A must-read before an interview or starting a job.

20. Python Machine Learning
Python Machine Learning Second Edition now includes the popular TensorFlow deep learning library. The scikit-learn code has also been fully updated to include recent improvements and additions to this versatile machine learning library.

21. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies
A comprehensive introduction to the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications.

22. Real-World Machine Learning
This book tells you how to use machine learning to solve real-world problems. I strongly recommend that all data scientists read it before an internship or work.

23. Learning From Data
Explains various machine learning theories that many books don’t mention, such as the VC dimension.
https://work.caltech.edu/telecourse.html

24. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. ESL, as it is widely known, is well suited to browsing and to consulting as a reference.

25. Pattern Recognition and Machine Learning
The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, something no other machine learning book did when it was published.

Data Mining

26. Principles of Data Mining
A basic introduction to data mining, with extensive coverage of association rules.

27. Introduction to Data Mining
Presents fundamental concepts and algorithms for those learning data mining for the first time.

28. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
Uses practical examples to show how data mining can help generate revenue from customers.

SQL

29. SQL Cookbook: Query Solutions and Techniques for Database Developers
This cookbook points out many common pitfalls in SQL queries and provides query code for every popular database.

R

30. R in Action
The book begins by introducing the R language, including the development environment. Focusing on practical solutions, it also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R.

31. R for Data Science
This and the next two books are by Professor Hadley Wickham. R for Data Science introduces key tools for doing data science with R.

32. R Packages
R Packages teaches good software engineering practices for R, using packages for bundling, documenting, and testing your code.

33. Advanced R
Advanced R helps you master R as a programming language, teaching you what makes R tick.

Python

34. Think Python
This hands-on guide takes you through the language one step at a time. It begins with basic programming concepts, and then moves on to functions, recursion, data structures, and object-oriented design. It’s suitable for beginners.

35. Fluent Python
Author Luciano Ramalho takes you through Python’s core language features and libraries and shows you how to make your code shorter, faster, and more readable at the same time.

36. Python for Probability, Statistics, and Machine Learning
This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas.

37. Python Data Science Handbook
A very comprehensive handbook about using Python to solve data science problems.
https://github.com/jakevdp/PythonDataScienceHandbook

Data Scientist Interview

38. Data Science Interviews Exposed
Data Science Interviews Exposed offers data science career advice and REAL interview questions to help you get a six-figure salary job!

39. Cracking the PM Interview: How to Land a Product Manager Job in Technology
In the US, many data scientists work closely with product teams, and some are even employed as product managers. This book is about PM interviews, which makes it valuable to data scientists as well.

Algorithm

40. Grokking Algorithms: An illustrated guide for programmers and other curious people
Grokking Algorithms is a fully illustrated and friendly guide that teaches you how to apply common algorithms to the practical problems you face every day as a programmer.

41. Problem Solving with Algorithms and Data Structures Using Python
The study of algorithms and data structures is critical to understanding what computer science is all about. 

42. Algorithms in a Nutshell: A Practical Guide
An algorithm guide for quick review.

Handbook

43. The Data Science Handbook
A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline.

Web Scraping and Data Wrangling

44. Web Scraping with Python: Collecting Data from the Modern Web
With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once. In fact, simply using Octoparse can fulfill your web scraping needs.

45. Data Wrangling with Python: Tips and Tools to Make Your Life Easier
This book teaches you how to clean messy raw data and wrangle it into the form you need.

46. Regular Expressions Cookbook
Regular expressions can be annoying, but you have to face them. Use this book to look up the regular expressions you need.

Data Visualization and Storytelling

47. Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations
This practical guide shows you how to use Tableau Software to convert raw data into compelling data visualizations that provide insight or allow viewers to explore the data for themselves.

48. Interactive Data Visualization for the Web: An Introduction to Designing with D3
This fully updated and expanded second edition takes you through the fundamental concepts and methods of D3, the most powerful JavaScript library for expressing data visually in a web browser.

49. Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data
With this hands-on guide, author Kyran Dale teaches you how to build a basic DataViz toolchain with best-of-breed Python and JavaScript libraries—including Scrapy, Matplotlib, Pandas, Flask, and D3—for crafting engaging, browser-based visualizations.

50. Storytelling with Data: A Data Visualization Guide for Business Professionals
This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative and compelling story.

A/B Testing

51. A/B Testing: The Most Powerful Way to Turn Clicks Into Customers

52. Designing with Data: Improving the User Experience with A/B Testing 

Part II: Data Science Advanced Skills

The following books are recommended for those who wish to become a Saiyan among data scientists.

Neural Network and Deep Learning

53. Make Your Own Neural Network
This guide takes you on a fun, step-by-step journey, starting with very simple ideas and gradually building up an understanding of how neural networks work.

54. Deep Learning
An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

55. Hands-On Machine Learning with Scikit-Learn and TensorFlow
This practical book shows you how to use simple and efficient tools to implement programs capable of learning from data.

Information Theory

56. Data Science and Information Theory
This is an article that introduces the importance of Information Theory in the data science field.

57. Information Theory: A Tutorial Introduction
In this richly illustrated book, accessible examples are used to introduce information theory in terms of everyday games like ‘20 questions’ before more advanced topics are explored.

58. Information, Entropy, Life and the Universe: What We Know and What We Do Not Know
If you are interested in exploring the world of information, entropy, and probability, or just the world in general, this is a great place to start. Arieh Ben-Naim takes readers through a detailed unfolding of these topics, with numerous common examples that make the difficult concepts easier to grasp.

Causal Inference

59. Causal Inference in Statistics: A Primer
Judea Pearl presents a book ideal for beginners in statistics, providing a comprehensive introduction to the field of causality.

60. Field Experiments: Design, Analysis, and Interpretation
A brief, authoritative introduction to field experimentation in social science.

Sampling

61. Sampling
Sampling provides an up-to-date treatment of both classical and modern sampling design and estimation methods, along with sampling methods for rare, clustered, and hard-to-detect populations.

Convex Optimization

62. Convex Optimization
A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. 

Growth Analytics

63. Lean Analytics: Use Data to Build a Better Startup Faster (Lean Series)
Written by Alistair Croll (Coradiant, CloudOps, Startupfest) and Ben Yoskovitz (Year One Labs, GoInstant), the book lays out practical, proven steps to take your startup from initial idea to product/market fit and beyond.

64. Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity
Web Analytics 2.0 provides specific recommendations for creating an actionable strategy, applying analytical techniques correctly, solving challenges such as measuring social media and multichannel campaigns, achieving optimal success by leveraging experimentation, and employing tactics for truly listening to your customers.

Text Mining and Natural Language Processing

65. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit
This book offers a highly accessible introduction to natural language processing, which supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.
E-book: https://www.nltk.org/book/

66. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data
Text Analytics with Python teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem.

67. Introduction to Information Retrieval
Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts.
E-book: https://nlp.stanford.edu/IR-book/

Anomaly Detection

68. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection
It is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution.

69. Outlier Analysis
This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities.

Recommender Systems

70. Recommender Systems: The Textbook
This book comprehensively covers the topic of recommender systems, which provides personalized recommendations of products or services to users based on their previous searches or purchases.

Social Network Analysis

71. Network Science
This pioneering textbook, spanning a wide range of topics from physics to computer science, engineering, economics, and the social sciences, introduces network science to an interdisciplinary audience.

72. Social and Economic Networks
In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics.

73. Social Network Analysis for Startups: Finding connections on the social web
You’ll learn concepts and techniques for recognizing patterns in social media, political groups, companies, cultural trends, and interpersonal networks.

Time Series Analysis and Forecasting

74. Practical Time Series Forecasting with R: A Hands-On Guide
The book introduces popular forecasting methods and approaches used in a variety of business applications. It offers clear explanations, practical examples, and end-of-chapter exercises and cases. 

75. Forecasting: Principles and Practice
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.

Reinforcement Learning and Artificial Intelligence

76. Reinforcement Learning: An Introduction
Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field’s intellectual foundations to the most recent developments and applications.

77. Artificial Intelligence: A Modern Approach
Artificial Intelligence: A Modern Approach, 3e offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one- or two-semester undergraduate or graduate-level courses in artificial intelligence.

Part III: Leisure Reading

78. Soft Skills: The software developer’s life manual
Soft Skills: The software developer’s life manual is a unique guide, offering techniques and practices for a more satisfying life as a professional software developer.

79. The Healthy Programmer: Get Fit, Feel Better, and Keep Coding
This is an excellent book for any professional who sits too much for the job. It contains informative suggestions to improve your health in ways that fit into your busy days. 

80. Exposing the Magic of Design
This book offers a way of thinking about complicated, multifaceted problems with a repeatable degree of success. Design synthesis methods can be applied to businesses to produce new and compelling products and services, or these methods can be applied by the government to change society.

81. Thinking, Fast and Slow

82. Naked Statistics: Stripping the Dread from the Data
Perhaps the most interesting statistics textbook you will ever find.

83. Uncertainty: The Soul of Modeling, Probability & Statistics
This book presents a philosophical approach to probability and probabilistic thinking.
