The Trouble with Data About Data - InformationWeek

Commentary | Lisa Morgan | 9/21/2016 08:00 AM

It's time to accept the fact that so much of the data we see is biased, whether intentionally or not. So what can you do about it?

Two people looking at the same analytical result can come to different conclusions. The same goes for how data is collected and presented. A couple of experiences underscore how the data about data, even from authoritative sources, may not be as accurate as the people working on the project, or its audience, believe. You guessed it: Bias can turn a well-meaning, "objective" exercise into a subjective one. In my experience, the most insidious thing about bias is the failure to recognize or acknowledge it.

The Trouble with Research

I can't speak for all types of research, but I'm very familiar with what happens in the high-tech industry. Some of it involves considerable primary and secondary research, and some of it involves one or the other.

Let's say we're doing research about analytics. The scope of our research will include a massive survey of a target audience (because a larger sample seems to promise statistical significance). The target respondents will be a subset of subscribers to a mailing list or individuals chosen from multiple databases based on predefined criteria. Our errors here will most likely include sampling bias (a non-random sample) and selection bias (aka cherry-picking).
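A bigger survey does not fix a non-random sample. The minimal Python sketch below illustrates this under purely hypothetical assumptions. In an invented population, 30% of professionals use some analytics tool. Tool users are far more likely than non-users to join an invented mailing list. The sketch compares a large survey drawn only from that list with a much smaller random sample of the whole population.

```python
import random


def simulate(seed: int = 42):
    """Compare a large list-based survey with a small random sample.

    All numbers here are hypothetical, chosen only to illustrate sampling bias.
    """
    rng = random.Random(seed)

    # Hypothetical population: about 30% of 100,000 people use the tool.
    population = [rng.random() < 0.30 for _ in range(100_000)]
    true_rate = sum(population) / len(population)

    # Hypothetical mailing list: users subscribe 60% of the time,
    # non-users only 10%, so the list over-represents users.
    subscribers = [p for p in population if rng.random() < (0.60 if p else 0.10)]

    # "Massive" survey drawn only from the mailing list.
    big_biased = rng.sample(subscribers, 20_000)
    # Much smaller survey drawn at random from the whole population.
    small_random = rng.sample(population, 500)

    return (
        true_rate,
        sum(big_biased) / len(big_biased),
        sum(small_random) / len(small_random),
    )


if __name__ == "__main__":
    true_rate, biased, unbiased = simulate()
    print(f"true rate:           {true_rate:.3f}")
    print(f"20,000 from list:    {biased:.3f}")
    print(f"500 at random:       {unbiased:.3f}")
```

When you run it, the 20,000-response survey lands far from the true rate, and the 500-response random sample lands close to it. A larger sample shrinks random error, but it does nothing about how the sample was chosen.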

The survey respondents will receive a set of questions that someone has to define and structure. That someone may have a personal agenda (confirmation bias), may be privy to an employer's agenda (funding bias), and/or may choose a subset of the original questions (potentially selection bias).

The survey will be supplemented with interviews of analytics professionals who, demographically speaking, represent the audience we survey. However, they will have certain distinguishing attributes, such as a high profile of their own or a job at a high-profile company (selection bias). We likely won't be able to use everything a person says, so we'll omit some of it: selection bias and confirmation bias combined.

We'll also do some secondary research that bolsters our position -- selection bias and confirmation bias, again.

Then we'll combine the results of the survey, the interviews, and the secondary research. Not all of it will be usable, because some of it is too voluminous, some is irrelevant, and some contradicts our position. Rather than disclosing any of that as part of the research, we'll simply omit those pieces: selection bias and confirmation bias, again. We can also design the report's data visualizations so they underscore our points (and misrepresent the data).
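A truncated y-axis is one familiar way a chart can exaggerate a point. The short Python sketch below is a hypothetical illustration and is not taken from any specific report. It computes how much taller one bar is drawn than another when the axis starts above zero. The function name `apparent_ratio` and the example values are invented for this illustration.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn bar heights when the y-axis begins at axis_start.

    Hypothetical helper: a bar's drawn height is its value minus the
    axis start, so raising the axis start inflates visible differences.
    """
    if not (a > axis_start and b > axis_start):
        raise ValueError("both values must lie above the axis start")
    return (a - axis_start) / (b - axis_start)


if __name__ == "__main__":
    # Hypothetical survey result: 52% vs. 48%, a modest gap.
    print(apparent_ratio(52, 48))      # axis starts at 0
    print(apparent_ratio(52, 48, 45))  # axis truncated at 45
```

With the axis starting at zero, 52 versus 48 produces bars roughly 8% apart. If the axis starts at 45, the first bar is drawn more than twice as tall as the second, even though the underlying numbers have not changed.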

We Need to Improve, Desperately

Bias is not something that happens only to other people. It happens to everyone, consciously or unconsciously, because it is natural. Rather than dismissing it, it's prudent to acknowledge the tendency, try to identify which types of bias may be involved and why, and correct for them where possible.

I recently worked on a project for which I conducted some interviews. Before I began, someone in a position of authority said, "This point is [this] and I doubt anyone will say different." Really? I couldn't believe my ears. Personally, I find assumptions like that harmful because, unlike hypotheses, they leave no room for disproof or differing opinions.

Meanwhile, I received a research report. One takeaway was that vendors are failing to deliver "what end customers want most." The accompanying infographic shows, on average, that 15.5% of end customers want what 59% of vendors don't provide. For me, at least, that statistic raised more questions than it answered on several levels, and I know I won't get access to the raw data.

My overarching point is that bias is rampant, and burying our heads in the sand only makes matters worse. Ethically speaking, I think we need to do more as an industry.
