Being able to trust an autonomous system goes beyond ensuring that data is accurate; it also calls for understanding the origins of that data.

Guest Commentary

May 2, 2018


On September 26, 1983, Soviet watch officer Stanislav Petrov was monitoring a satellite system when it suddenly indicated that the U.S. had launched a nuclear missile at the Soviet Union. Although protocol dictated that Petrov notify Soviet leaders — who would likely order an immediate counterattack — he didn’t.

Why not? Because after considering the satellite system’s warning data within its larger context — experts believed that any preemptive attack from the U.S. would be massive, with bomber and attack support — Petrov wasn’t convinced the alerts were true. By questioning the validity of the data, he saved the world from nuclear disaster. (The cause of the erroneous warning? Soviet satellites had mistaken sunlight reflecting off clouds for a missile launch.)

Today’s global economy runs on live information, and companies everywhere are betting big on advances in data-hungry technologies. In 2017 alone, investments in AI and the Internet of Things were expected to reach $12.5 billion and $800 billion, respectively. Yet without an accompanying push for data veracity, these investments could easily become a sucker’s bet.

While the decisions that businesses make aren’t about launching nuclear missiles, companies that don’t establish the veracity, or accuracy, of their data leave themselves vulnerable to business insights and decisions that are questionable at best and corrupted at worst. As businesses spend heavily to determine what they can get out of data-driven insights and technologies, they also need to invest in what’s going into them. You know the saying: “garbage in, garbage out.”

As more organizations push toward fully autonomous decision-making, the risks around poor data veracity grow, with critical implications for business and society. Consider the state of Indiana, which changed its automated system for flagging individuals who might be registered to vote in more than one state: instead of submitting flagged individuals for additional review, as it had previously done, the system began removing them from voter rolls immediately. The change amplified data veracity risks; the system generated inaccurate fraud alerts 99% of the time, and many legally registered voters were wrongly removed from the rolls.

But companies needn’t accept the risks of poor data. They can address this vulnerability by building a “data intelligence” practice, drawing from existing capabilities and focusing on three key data tenets: provenance, or verifying the history of data throughout its life cycle; context, or considering the circumstances around its use; and integrity, or securing and maintaining data.
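
To make the provenance tenet concrete, here is a minimal sketch, in Python, of a tamper-evident provenance record. It is a hypothetical illustration, not any particular vendor’s product: each entry’s hash covers the previous entry, so rewriting any step of a dataset’s history breaks the chain.

```python
import hashlib
import json
from datetime import datetime, timezone

def record_step(chain, actor, action, payload):
    """Append a life-cycle entry whose hash covers the previous entry,
    so tampering with any earlier step breaks the chain."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    entry = {
        "actor": actor,              # who touched the data (illustrative field)
        "action": action,            # e.g. "ingested", "transformed"
        "payload_digest": hashlib.sha256(payload).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)
    return chain

def verify_chain(chain):
    """Re-derive every hash; a single altered entry fails verification."""
    prev_hash = "genesis"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if body["prev_hash"] != prev_hash or digest != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

chain = record_step([], "ingest-service", "ingested", b"raw sensor batch 42")
record_step(chain, "etl-job-7", "transformed", b"cleaned sensor batch 42")
print(verify_chain(chain))         # True
chain[0]["action"] = "fabricated"  # rewrite history...
print(verify_chain(chain))         # False: tampering detected
```

A production system would add cryptographic signatures and durable storage, but the core idea of hash-chaining life-cycle events is the same.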

Businesses don’t have to start from scratch; some of the key elements of a data intelligence practice involve ramping up existing efforts: embedding and enforcing data integrity and security throughout the organization, while adapting existing investments in cybersecurity and data science to address data veracity issues.

The basics, however, will take companies only part of the way. Grading data also requires understanding the “behavior” around it: every piece of data originates, and is subsequently used and maintained, in characteristic patterns. By building the capability to track this behavior, companies can give their cybersecurity and risk management systems a baseline of expected behavior against which anomalies stand out.
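
As a sketch of what such a baseline might look like, here is a simple rolling statistical check in Python; the window size, warm-up length, and z-score threshold are illustrative assumptions, not recommendations.

```python
import random
import statistics
from collections import deque

class BehaviorBaseline:
    """Flag readings that deviate sharply from a data source's
    recent history -- a crude stand-in for 'expected behavior'."""

    def __init__(self, window=100, z_threshold=4.0, warmup=30):
        self.history = deque(maxlen=window)  # rolling window of readings
        self.z_threshold = z_threshold       # std-devs considered anomalous
        self.warmup = warmup                 # minimum history before judging

    def check(self, value):
        anomalous = False
        if len(self.history) >= self.warmup:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0 and abs(value - mean) / stdev > self.z_threshold:
                anomalous = True
        self.history.append(value)
        return anomalous

# Simulated sensor stream: stable readings, then one wild outlier.
stream = [random.gauss(20.0, 0.5) for _ in range(200)] + [95.0]
baseline = BehaviorBaseline()
for reading in stream:
    if baseline.check(reading):
        print(f"Reading {reading:.1f} deviates from historical norms")
```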

Some companies are already doing this. Siemens offers its oil and gas customers anomalous behavior detection for their industrial systems, comparing aggregate data generated from sensors onboard its industrial equipment with historical norms and trends. And SpaceX uses a consensus-based system to mitigate risks around data veracity: each Dragon Capsule uses six computers, operating in pairs, to validate calculations, with each pair checking its calculations against the others’; the spacecraft only proceeds when at least two pairs return the same result.
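
SpaceX’s flight software is proprietary, but the consensus principle itself is easy to sketch: run redundant computations and proceed only when enough of them agree. Here is a toy Python version, with the agreement threshold as an assumed parameter.

```python
from collections import Counter

def consensus(pair_results, min_agreement=2):
    """Return the majority result only if at least `min_agreement`
    redundant computations agree; otherwise report no consensus."""
    value, count = Counter(pair_results).most_common(1)[0]
    return value if count >= min_agreement else None

# Three redundant pairs compute a maneuver value; one pair disagrees
# (say, after a radiation-induced fault), but consensus still holds.
print(consensus([14.02, 14.02, 17.96]))  # -> 14.02
print(consensus([14.02, 17.96, 3.11]))   # -> None: halt and recompute
```

The design trades hardware cost for confidence: a single corrupted computation cannot silently steer the decision.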

A data intelligence practice also must consider data within available context, as Petrov did when he realized that the attack alert didn’t fit with accepted knowledge. Thomson Reuters, for instance, has developed an algorithm that uses streams of real-time data from Twitter to help journalists classify, source, fact-check, and debunk rumors faster than before. And Google is using machine learning to remove apps with overreaching permissions from its Play Store.

By using tools to monitor behavior and context around data’s provenance, organizations can mitigate risks and begin to address issues that might be incentivizing deceit in the first place. Individual instances of manipulated data might have minimal impact, but a bevy of deceptions can skew business outcomes. Researchers at the University of Warwick have studied the way some Uber drivers organize simultaneous sign-offs to cause a shortage of drivers — and trigger surge pricing. Knowing that they’re participating in systems managed by algorithms, these drivers are trying to make the system work in their favor — at the expense of Uber’s efficiency.

Dynamic pricing algorithms also demonstrate the growing need for companies to understand the motives for disclosing, or disguising, data. For instance, product reviews on Amazon became subject to data manipulation when third-party sellers began paying people to submit fake reviews to inflate their product and seller ratings. Amazon’s response? Giving more weight to verified reviews from customers who had definitively purchased the item, and banning reviews from people who received free or discounted products outside its curated review program.
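
Amazon’s actual weighting is not public, but the principle of discounting unverified reviews is simple to demonstrate. In this hypothetical Python sketch, the 0.25 weight for unverified reviews is an assumed value chosen for illustration.

```python
def weighted_rating(reviews, unverified_weight=0.25):
    """Average star ratings, counting unverified reviews at a
    fraction of the weight given to verified purchases."""
    total = weight_sum = 0.0
    for stars, verified in reviews:
        w = 1.0 if verified else unverified_weight
        total += stars * w
        weight_sum += w
    return total / weight_sum if weight_sum else None

# Three fake 5-star reviews (unverified) against two real purchases.
reviews = [(5, False), (5, False), (5, False), (2, True), (3, True)]
naive = sum(stars for stars, _ in reviews) / len(reviews)
print(f"naive mean:    {naive:.2f}")                     # 4.00
print(f"weighted mean: {weighted_rating(reviews):.2f}")  # 3.18
```

Even this crude scheme blunts the payoff of buying fake reviews, which is the behavior the incentive change was meant to discourage.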

The presence of bad data isn’t always due to malicious intent; it could simply be a sign that a process isn’t working as intended. Uncovering processes that inadvertently incentivize deceit is a key step toward improving the truthfulness of data and helping ensure it is trustworthy enough to drive critical decisions in the future.

Because data is the lifeblood of digital companies, ensuring its veracity becomes a cornerstone of strong leadership. Failure to do so can have grave consequences. The question is: Where is your Stanislav Petrov?

Michael Biltz is a managing director at Accenture Labs, responsible for leading Accenture’s annual Technology Vision process. He defines Accenture’s perspective on the future of technology, looking beyond current conversations about IoT, social, cloud, mobility, and big data to focus on how technology will impact the way we work and live.

 

About the Author(s)

Guest Commentary

The InformationWeek community brings together IT practitioners and industry experts with IT advice, education, and opinions. We strive to highlight technology executives and subject matter experts and use their knowledge and experiences to help our audience of IT professionals in a meaningful way. We publish Guest Commentaries from IT practitioners, industry analysts, technology evangelists, and researchers in the field. We are focusing on four main topics: cloud computing; DevOps; data and analytics; and IT leadership and career development. We aim to offer objective, practical advice to our audience on those topics from people who have deep experience in these topics and know the ropes. Guest Commentaries must be vendor neutral. We don't publish articles that promote the writer's company or product.

