The Biggest Mistakes Made by Data Scientists - InformationWeek

07:00 AM
Xin Heng, Senior Director of Data, Punchh

The Biggest Mistakes Made by Data Scientists

While the tools may change, the mistakes stay the same. Here are four common issues that IT leaders should be aware of when managing data science teams.

Image: metamorworks -

In 2019, companies looking to gain an edge on competitors and insight into customers and trends have come to rely more heavily on data scientists to inform their business decisions. A good data scientist is invaluable to a company with any online presence. Data scientists assess and interpret complex information and build machine learning models.

Data volume keeps growing, and the skill and effort needed to build data-driven initiatives are keeping pace with that growth. Mistakes can have huge consequences, and while the tools may change, the mistakes stay the same. Over the course of my career I’ve seen every permutation of these common mistakes, and my hope here is to help you identify and avoid them within your own teams.

Mistake #1: Lack of coding skills

This one may seem obvious, but you would be amazed at how many people feel data science is a career completely removed from the practice of coding. At its core, data science is, and always has been, about building a model in code, often in a long script. The quality of that script (or lack thereof) has far-reaching consequences, from scalability to the robustness of the model once it goes into production.

An excellent data scientist must also be a good programmer. My rule is: a senior data scientist must possess a mid-level software engineer’s coding skill and a mid-level data scientist should be on par with a junior software engineer.
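As a rough illustration of what that engineering bar can look like (a sketch, not the author's code), the training step below is split into small, typed functions that can each be unit-tested, instead of living in one long script. The single-feature least-squares model and the drop-incomplete-rows cleaning rule are hypothetical stand-ins, and only the Python standard library is used.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LinearModel:
    """A one-feature linear model: y = slope * x + intercept."""

    slope: float
    intercept: float

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.slope * x + self.intercept for x in xs]


def clean(rows: Sequence[Tuple[Optional[float], Optional[float]]]) -> List[Tuple[float, float]]:
    """Drop rows with a missing feature or target (a hypothetical cleaning rule)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit(rows: Sequence[Tuple[float, float]]) -> LinearModel:
    """Fit ordinary least squares on one feature, failing loudly on bad input."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to fit a line")
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    if var_x == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


def train(raw_rows: Sequence[Tuple[Optional[float], Optional[float]]]) -> LinearModel:
    """The whole step, composed from pieces that can each be tested on their own."""
    return fit(clean(raw_rows))


model = train([(1.0, 3.0), (2.0, 5.0), (None, 9.0), (3.0, 7.0)])
print(model)                 # LinearModel(slope=2.0, intercept=1.0)
print(model.predict([4.0]))  # [9.0]
```

Because `clean` and `fit` are separate, a bad input (a single row, or a constant feature) raises a clear error in the function responsible for it rather than surfacing as a mysterious failure deep inside a production script.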

Mistake #2: Lack of defensive mindset

The adage goes “the best offense is a good defense,” and while sports rarely overlap with code, in this case the saying is apt. Teams need to adopt the mindset of asking: “How wrong can the model be on a bad day?”

A single mistake can have financial and legal consequences for the company. If you don’t test and retest your code with a defensive mindset, it will almost certainly contain errors.

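In machine learning, teams commonly report performance metrics such as precision, RMSE (root mean squared error), and MAE (mean absolute error). These summarize performance across a whole dataset, so they can look healthy even when the model fails badly on individual cases. They are no replacement for defensive testing.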
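The point about averages can be made concrete with a small sketch (an illustration, not the author's code). With hypothetical data in which 99 predictions each miss by 1 and one misses by 50, the MAE is 1.49 while the worst single miss is 50. The `find_bad_predictions` helper and the `max_allowed_error` threshold are hypothetical; in practice the threshold would reflect how large a miss the business can tolerate on a bad day.

```python
import math
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average size of a miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: weights large misses more, but is still an average."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def find_bad_predictions(
    y_true: Sequence[float], y_pred: Sequence[float], max_allowed_error: float
) -> List[int]:
    """Return indices of predictions that are non-finite or miss by more than the limit.

    A NaN prediction would slip past a plain `error > limit` check, because any
    comparison involving NaN is False, so finiteness is tested explicitly.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [
        i
        for i, (t, p) in enumerate(zip(y_true, y_pred))
        if not math.isfinite(p) or abs(t - p) > max_allowed_error
    ]


# Hypothetical data: 99 predictions miss by 1, one misses by 50.
y_true = [10.0] * 100
y_pred = [11.0] * 99 + [60.0]

print(f"MAE  = {mae(y_true, y_pred):.2f}")   # MAE  = 1.49
print(f"RMSE = {rmse(y_true, y_pred):.2f}")  # RMSE = 5.10
print(find_bad_predictions(y_true, y_pred, max_allowed_error=10.0))  # [99]
```

Both averages look tolerable, yet one customer received a prediction that was off by a factor of five; only the per-prediction check flags it.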

Mistake #3: Poor use of time on data cleansing

Throughout my career, I have trusted my data science teams’ data exploration skills, and I have rarely seen a data scientist make a data mistake. They have all been smart and prudent.

I have, however, seen numerous cases where they spent several weeks examining the data while putting off building the end-to-end ML software. That is too much time spent on data cleansing at the expense of building the end-to-end flow.

I see a huge difference in this respect between a computer science-trained data scientist and a physics-trained data scientist. I come from physics, but I strongly prefer the “let’s write some code” approach.

Until you build the whole ship, there will be many unforeseen holes that can sink you later. I would also expect project managers to have little patience for troubleshooting numerous errors late in a project; they need something to show product leaders by fixed deadlines.
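One way to build the ship first is a skeleton pipeline in which every stage exists but is as simple as possible, so integration problems surface on day one and each stage can be improved later. This is a sketch under stated assumptions: the hard-coded rows, the stage names (`load`, `clean`, `train`, `evaluate`), and the mean-predicting baseline are hypothetical placeholders, not a prescribed design.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def load() -> List[Row]:
    """Placeholder source: hard-coded rows stand in for the real data feed."""
    return [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": 4.0},
        {"x": None, "y": 5.0},  # a dirty row, to exercise the cleaning stage
        {"x": 3.0, "y": 6.0},
    ]


def clean(rows: List[Row]) -> List[Row]:
    """First-pass cleaning: drop incomplete rows. Refine once the flow runs end to end."""
    return [r for r in rows if r["x"] is not None and r["y"] is not None]


def train(rows: List[Row]) -> Callable[[float], float]:
    """Baseline model that always predicts the mean target; swap in a real model later."""
    baseline = mean(r["y"] for r in rows)

    def predict(x: float) -> float:
        return baseline

    return predict


def evaluate(model: Callable[[float], float], rows: List[Row]) -> float:
    """Mean absolute error of the model on the given rows."""
    return mean(abs(model(r["x"]) - r["y"]) for r in rows)


def run_pipeline() -> float:
    """Wire every stage together so integration problems show up immediately."""
    rows = clean(load())
    model = train(rows)
    return evaluate(model, rows)


print(run_pipeline())
```

Scoring the baseline on its own training rows, as here, is only a smoke test that the stages connect; a real pipeline would evaluate on held-out data. The value of the skeleton is that deeper data cleansing can then be prioritized by its measured effect on the end-to-end result.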

Mistake #4: Time wasted on studying individual models

When a data scientist spends too much time studying individual models, he or she can lose sight of how the models should interact with each other. A dynamic pricing project can easily affect an ad bidding project, which doesn’t normally know the price the user who clicks the ad will be offered. Questions like this belong to senior data scientists and their managers.
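To illustrate the coupling, here is a deliberately simplified sketch in which an ad bidding function reads the price from a dynamic pricing model. Everything in it is hypothetical: the markup-based `PricingModel`, the linear demand curve in `conversion_rate`, and the break-even bidding rule in `max_bid_per_click`. It shows how a bid computed against a stale price assumption goes wrong once the pricing model changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingModel:
    """Hypothetical dynamic pricing model: applies a markup to the base cost."""

    markup: float

    def price(self, base_cost: float) -> float:
        return base_cost * (1.0 + self.markup)


def conversion_rate(price: float) -> float:
    """Hypothetical demand curve: chance a clicker buys, falling linearly with price."""
    return max(0.0, 0.5 - price / 100.0)


def max_bid_per_click(price: float, base_cost: float) -> float:
    """Break-even bid: P(purchase | price) * profit per purchase."""
    return conversion_rate(price) * (price - base_cost)


def bid_for(pricing: PricingModel, base_cost: float) -> float:
    """Compute the bid from the pricing model's current price, not a hard-coded one."""
    return max_bid_per_click(pricing.price(base_cost), base_cost)


base_cost = 20.0
print(bid_for(PricingModel(markup=0.5), base_cost))  # about 2.0 (price 30)
print(bid_for(PricingModel(markup=2.0), base_cost))  # 0.0 (price 60)
```

Under these assumptions, a bidder that still assumed the old price of 30 would keep paying about 2.0 per click after the pricing model moved the price to 60, where the demand curve predicts no sales at all. Wiring the two models together, and testing them together, catches that.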

For the data a company collects to prove useful, action needs to be taken on it. It’s up to the data scientist to help his or her company move through digital transformation by monitoring, testing, performing robust analytics, and building machine learning infrastructure to improve business practices and solve problems. By helping your data scientists address the points above, you enable them to better support the company.

Xin Heng is VP of Data at Punchh, Inc., in San Mateo, California, where his team's primary responsibility is to build world-class data solutions to drive the growth of both Punchh and its business partners. Prior to joining Punchh, Heng was the Head of Data Science at StubHub and Data Science Manager at Uber. He holds a Ph.D. in electrical engineering from the California Institute of Technology and a Master of Financial Engineering from the Walter Haas School of Business at the University of California, Berkeley. His Twitter handle: @xheng123
