Data science is one of the most sought-after skill sets out there right now, with demand for people who have those skills far outstripping supply.
This is good news for those who’ve gained, are in the process of gaining, or are planning to gain those data analysis skills.
But what exactly is data science, what does it entail, and what makes a data scientist?
If you’re looking to get started in the dizzying world of data science but don’t know where to begin, read on to find out what data science is, what it really means, what you need to succeed in the field, and why it’s all the rage right now.
What is data science?
Chances are you’d already heard about data science before arriving here. In fact, I’d be surprised if you hadn’t, since the buzz around the field has reached crazy levels in recent years.
If you only listened to the hype you might think that data scientists have nearly superhuman abilities that can solve pretty much every.single.problem that exists in the universe (spoiler alert: they can’t).
Everyone is so excited about data science that Harvard Business Review even called data scientist “the sexiest job of the 21st century”.
Data has become kind of a big deal and, by extension, those who know how to work with large volumes of it are so hot right now.
This rush for all things data science is in part due to the massive influx of data (aka “big data”) that now exists in the world and that we’re all producing every single day.
Understandably, this has led businesses (traditionally in the tech industry) to realize that being able to effectively and accurately analyze large quantities of data has the potential for very real (and very big) gains. To recruit the best and brightest employees who could help them capitalize on this surge of data, companies began offering six-figure starting salaries for job titles like data scientist and data analyst.
It’s not just companies that are interested in big data, of course.
High demand for data science has also been fueled by researchers, analysts, and academics who are equally excited by the potential of big data (and small data too!) and want to be able to leverage technology to answer very real questions. Combined with advances in computing power, this has all led to something exciting – what exactly that is, though, is up for debate.
So, while data science is now an established career path, the details about what it actually is are still a bit fuzzy.
Let’s break it down to find out.
A quick internet search to help answer the question “what is data science?” yields a lot of contradictory answers filled with so many overused buzzwords that they are basically useless and often leave you with more questions than answers.
For instance, how big is big data? Is data science even a science? Or, on an even more fundamental level, what is data? And perhaps the most important question of all – why should I care about data science?
Part of the reason there is so much confusion around data science is that it isn’t well-defined.
As Cathy O’Neil and Rachel Schutt discuss in their must-read book on the topic, Doing Data Science: Straight Talk from the Frontline, part of the problem is that there “is no well-defined body of knowledge, and there’s no canonical corpus. It’s popularized and celebrated in the press and media, but there’s no ‘authority’ to push back on erroneous or outrageous accounts. There’s a lot of overlap with various other subjects as well; it could become redundant with a machine learning textbook, for example.”
Despite all the hype and the ambiguous, hard-to-pin-down definitions, data science is definitely something, even if it is only an area of interest that is undergoing some sort of paradigm shift.
But what exactly is data science?
On the most basic level, data science is about using scientific methods to solve real problems with data.
Let’s start by breaking down the name to see if we can come to a suitable definition for our own use.
Merriam-Webster defines the two components of data science as:
Definition of data
1: factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation
Definition of science
1: the state of knowing : knowledge as distinguished from ignorance or misunderstanding
So, a useful way to think of data science could be:
The process of collecting factual information, measurements, or statistics for reference or analysis in order to reach a state of knowledge.
We could then extend this further to bring it in line with the technologies shaping the modern world, so that it includes vast amounts of information (big data) and predictive modelling and forecasting using things like machine learning and increasingly smart AI.
It’s in the name, right? Data + science = data science.
More broadly, data science is a multidisciplinary field that brings together practitioners and specialists from a number of different areas, generally sitting at the overlap between statistics and math, computer science, and specific subject-area expertise (this could be anything from business or chemistry to zoology or history).
You might have already seen Drew Conway’s data science Venn diagram circulating around the interwebs – it does a pretty good job of illustrating data science’s multidisciplinary nature.
Mike Driscoll offers a pretty fun definition of data science IRL:
Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.
But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few care about non-Euclidean distance metrics.
And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a tab-delimited file into R if their job depended on it.
Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible.
A simpler description of what a data scientist does comes from Mike Loukides:
Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.
Sounds pretty fun, right?
Data science: the basics
Although the term “data scientist” was only ‘officially’ coined in 2008, the field is actually pretty old, building on the work that statisticians have been doing for years.
It might seem like “machine learning algorithms were just invented last week and data was never ‘big’ until Google came along”, but that just isn’t the case.
Everything that is being developed now is based on decades, or even centuries, of work done by all of the statisticians, computer scientists, engineers, mathematicians, and other scientists who have already paved the way.
With this in mind, perhaps a better way to think about data science today is to consider the basic definition and remit of the subject alongside the contemporary environment and technologies that now allow us to take it to the next level.
Sort of like a data science 2.0.
Ultimately, there’s nothing new about fundamental statistical methods or carrying out a linear regression, and programming languages have been around for decades now. What is new is the way in which these different disciplines are leveraging technology and coming together to allow us to process, analyze, and understand the masses of data that we now have access to.
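To make the “nothing new” point concrete, here is a minimal sketch of a linear regression in plain Python, using nothing but the classic least-squares formulas. The function name fit_line and the toy numbers are made up purely for illustration; in practice you would probably reach for a statistics library rather than writing this by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data invented for illustration: the points lie exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```

The math here is centuries old. What has changed is that the same idea can now be run over millions of rows in seconds.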