The Data Science Debate

Written for STAT 503 after reading sections 2.1-2.2 of ISLR and these two blog posts by Vincent Granville.

To begin, I am going to be very honest and say that I am thoroughly bored of the whole “Is data science statistics? / Is statistics data science?” question. And I’ve only considered myself a statistician for a couple of years! In my mind, the answer is obvious: sometimes. The phrase “data science” is still so vague and loosely defined that there are many subfields of statistics, computer science, economics, mathematics, engineering, political science, etc. that could all earn the label “data science.” It is obvious to me that data mining and statistical learning are two such fields, as well as econometrics and machine learning, and dozens more that I don’t know about.

The crux of the data science debate, however, seems to lie at the intersection of statistics and computer science. I’ve read many memorable posts on the topic by Hadley Wickham, Andrew Gelman, and now Vincent Granville, and I’ve read many more that were not as memorable. (Someone should do some data science on all the blog posts about data science. Or is that too meta?) The debate seems to stem from the fact that computer scientists and statisticians are each trying to claim data science as their own, when in reality, it’s both of those subjects and many more. As with all shiny new toys, the kids immediately try to snatch it up and play keep-away from the others. The idea of “data science” (as its own field) is the shiny new academic toy of the 21st century, and computer scientists and statisticians have spent too many years wrestling over it.

Perhaps I’m being naive, but all this fighting seems completely nonsensical to me. When Newton and Leibniz were inventing calculus over 300 years ago, I’m sure people debated whether it was actually geometry or physics or astronomy, and both men were constantly claiming that the other had stolen his work. (And they didn’t even have the Internet!) I see data science as I imagine many academics saw calculus in the 17th century: it’s a new field with applications in any number of other fields, and some key elements of it already exist in other fields. I think the challenge for the future, if we wish to continue talking about this nebulous field of “data science,” is to define it on its own, leaving room for much more interdisciplinary cooperation and allowing previously disparate pieces of computer science, statistics, etc. to be freely absorbed into this new field. For now, however, I really just think the bickering should stop.

Sam Tyner-Monroe, Ph.D.
Managing Director, Responsible AI

I am an applied statistician and data scientist, with a wide range of skills and experiences. I’m passionate about using data to make a difference.
