Data is everywhere, and we’re collecting and analyzing more of it than ever before.
If you want to get ahead in the fast-moving world of data science, then immersing yourself in the essentials – whether the basics of theory or practical application – is a real help in becoming a better (and more informed) data scientist.
Wherever you are on the path to data science mastery, reading what current practitioners, commentators, and researchers in the field have to say is one of the best ways to get a handle on what’s happening on the front lines of data and to improve your overall knowledge.
Data science is, by its very nature, a broad subject area. It spans everything from statistics and the handling of large sets of information to coding, the interpretation and presentation of results, and even the ethical and moral considerations of working with different types of data.
With all of this in mind, the following books on data science form my essential reading list for those at different stages of their data science odyssey and represent the books which I have found most useful in my own journey so far and frequently recommend to others.
Data Science Books for Beginners
Whether you want an overview of the basics of big data, are looking to get acquainted with the essentials of statistics, or want to know how to get started with coding for data science, the following books are those I recommend to beginners.
If you’re looking for one of the very best (and clearest) introductions to the field of data science in book-form, then it’s hard to do better than this one from the MIT Press Essential Knowledge series.
Data Science brings together the practical principles, concepts, and processes of the field, alongside where it came from and where it’s headed next. It provides excellent coverage of the subject both for those completely new to it and for those already well versed in the field.
Covering everything from the history of data science through to the fundamental concepts of machine learning, the uses of data science today, and the kinds of ethical challenges which need to be considered, this book is an excellent, concise intro to the subject of working with data today and in the future.
What I love about the books in the Very Short Introduction range is that they’re small enough to throw in a jacket pocket or bag when you need to run out the door in the morning, while still managing to cover the essential elements of complex topics, written by leading thinkers in each subject area.
Dawn Holmes’ intro to big data is no exception, and I often grab this book (or Margaret Boden’s intro to AI from the same series) if I’m heading out and know I’ll need to quickly grab a fact or piece of information on the go.
Beyond the practical side of things though, this book is an excellent primer on the topic of big data and provides a comprehensive overview of the essentials that is both accessible to beginners and incredibly useful as a resource when you need to refresh your memory.
A specialist in Bayesian networks, data mining, and machine learning at UC Santa Barbara, Dawn Holmes is well placed to cover the essential elements of the big data world we’re all a part of.
Cathy O’Neil and Rachel Schutt are authorities in data science and mathematics, and so are well positioned to provide a thorough overview of what data science entails, including what it is and what it isn’t.
Doing Data Science is based on lectures from Columbia University’s Introduction to Data Science class and includes chapters covering topics ranging from statistics and exploratory data analysis, through to algorithms, data wrangling, logistic regression, and data visualization.
It should be noted that this book is designed for people who have some exposure to related math concepts (statistics, linear algebra, probability) and some coding experience. That said, it makes an equally good introduction to the field of data science for those who don’t (yet) have any of these skills – after all, by the time we’re done, you’re going to have the essentials of all of these down anyway!
If you want to cut through the hype and find out what carrying out practical data science in the real world looks like, then this book by two data scientists working on the front lines is essential reading.
Following a trend common to many books on big data, Brian Clegg’s introductory analysis highlights why we need a full and clear understanding of how and why the copious amounts of data we now generate are being used.
Covering the uses of big data in areas as diverse as science and medicine on the one hand and recommender systems on Netflix on the other, this book provides a good overview of just how pervasive data has become and, with this in mind, why it’s so crucial that we educate ourselves on the responsibility we all have to understand and manage this new resource.
Big data has revolutionary potential.
On the one hand, it offers the possibility of solving fundamental problems we all face, creating a society that benefits all and elevates humanity.
On the other, big data has the potential to be seriously manipulated by corporations and states alike, leading down a path that ends in totalitarian conditions, exploitation, and general misery for billions.
This book lays out the current facts about big data alongside its promise and risks, making it a valuable read if you want to inform yourself on the introductory elements of this subject.
This is the second book I recommend from Cathy O’Neil (see Doing Data Science above for the first), and it gives great insight into the increasing use (and implications) of algorithms in our everyday lives.
Given the author’s knowledge in the fields of data science and math, O’Neil is well placed to talk through both the incredible power and the potential risks associated with the combination of big data and sophisticated mathematical models.
What makes this such a useful read if you’re interested in understanding the role of data in all of our lives is that it exposes how much of the data analyzed today is handled in a way that is flawed or manipulated. It highlights the fact that while the statistics and data never lie, the way in which they’re used, interpreted, and presented can often produce a far from accurate end result.
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are – Seth Stephens-Davidowitz
Seth Stephens-Davidowitz was formerly a data scientist at Google, and in this book he argues that many of our previously held notions about human interaction were based on a false assumption – that nobody lies.
Exploring how the inundation of data (mostly from the internet) has turned this notion on its head, the author sets out to discover the truth by showing that our new ability to examine and analyze people’s online data trails means that, for the first time ever, we can know what people are actually thinking, wanting, and doing.
If you want to gain a good understanding of the way data is being used (for good and bad) in the real world, then this eye-opening analysis of the role of big data in our everyday lives is a great place to start.
When it comes to data science, statistics is a very big deal.
For many people, however (although not Elle Knows Machines readers, obviously), the ‘S word’ can send shivers running down the spine.
In this no-fear, no-nonsense approach to the subject of stats, Dartmouth professor Charles Wheelan does away with many of the technicalities that can send people running and focuses instead on the essential elements behind statistical analysis.
Providing an approachable introduction to key statistical concepts such as regression analysis, correlation, and inference, Naked Statistics is a great stepping stone into the basics of arguably the key discipline underpinning data science and is a valuable resource if you’re looking to bolster your journey into the world of statistics and data.