Commentary | Seth Grimes | 5/11/2010 11:46 AM

How to Define Accuracy in Analytics for Business

For information retrieval and analytics purposes, we need a broad definition of "accuracy."

Esteban Kolsky, a customer-strategies research analyst, has blogged for semantic-technologies vendor Attensity on How Accuracy in Analytics Matters for Businesses. It's a thought-provoking article, yet a central statement of his calls for exploration: that "The only way to measure accuracy is by comparing the results of the computer analysis to similar analysis done by humans." There's more to accuracy, and more to computer analysis, than you might think.

Kolsky's topic is social-media analytics. His focus is the subjective content of on-line text: feelings rather than facts. Subjective content -- attitude, opinion, and even emotion -- differs from objective fact. Subjectivity is uniquely human, often situational, and culturally linked. No two people will agree 100% of the time on any matter of opinion or attitude; no two people "pick the same" (in Kolsky's words), even for a classification as seemingly simple as positive/negative/neutral/mixed sentiment polarity. Scientific studies and practical tests I've seen suggest that people agree at an 80%-90% rate on sentiment classification.
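To make that agreement figure concrete, here is a minimal sketch of how human-human agreement on sentiment polarity could be measured. The annotator labels below are hypothetical, and the Cohen's kappa statistic (which discounts agreement expected by chance) is my own addition for illustration, not a method Kolsky or the cited studies necessarily used:

```python
from collections import Counter

# Hypothetical sentiment labels assigned by two human annotators
# to the same ten tweets (values are illustrative, not real data).
LABELS = ["positive", "negative", "neutral", "mixed"]
annotator_a = ["positive", "negative", "neutral", "positive", "mixed",
               "negative", "positive", "neutral", "negative", "positive"]
annotator_b = ["positive", "negative", "positive", "positive", "mixed",
               "neutral", "positive", "neutral", "negative", "positive"]

def percent_agreement(a, b):
    """Fraction of items on which the two annotators picked the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[l] * counts_b[l] for l in LABELS) / (n * n)
    return (observed - expected) / (1 - expected)

print(f"Percent agreement: {percent_agreement(annotator_a, annotator_b):.0%}")  # 80%
print(f"Cohen's kappa:     {cohens_kappa(annotator_a, annotator_b):.2f}")       # 0.71
```

On this toy data the raw agreement lands at 80%, squarely in the range the studies report; kappa shows how much of that is beyond chance.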

Given measured rates of human-human disagreement, and with the age of intelligent(-seeming) machines looming, is "Did the computer pick the same label a human would've picked?" -- which human? -- the only, or even the best, accuracy criterion? Surely there's much to be learned from comparing, or even working from the consensus of, different machine methods: contrasting and compiling machine-machine results.
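As one sketch of what working from a machine-machine consensus could look like in practice -- the classifier outputs here are hypothetical -- consider a simple majority vote across sentiment engines, with complete disagreements flagged for human review:

```python
from collections import Counter

def consensus_label(predictions, fallback="mixed"):
    """Majority vote across several classifiers' labels for one document.

    Returns the most common label; if no label wins a strict majority,
    the document gets the fallback label and can be routed to a human.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes > len(predictions) / 2 else fallback

# Hypothetical outputs from three different sentiment engines for one tweet.
print(consensus_label(["positive", "positive", "neutral"]))  # -> positive
print(consensus_label(["positive", "negative", "neutral"]))  # -> mixed
```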

Further, implicit in Kolsky's analysis, as I read it, is an incomplete understanding of "accuracy." Any definition of accuracy that looks primarily at precision -- in this context the same as "correctness," and just one of three components of accuracy -- is itself incomplete.

Kolsky focuses on the task of determining the sentiment of "a specific word or combination of words," on "the computer's perception that a tweet or blog post has positive or negative inclination." His definition would cover very discrete tasks adequately -- taking the SAT, reading single blog posts or tweets -- but competitive businesses cannot afford such over-focused insularity: they must concern themselves with a huge swathe of social and news media. So what of the other two components of information retrieval-and-analysis accuracy, recall and relevance?

"Recall" is the proportion of pertinent material that is retrieved. On the recall front, there's no contest: machines can operate 24/7, they can parse material in and across multiple human languages (where no one person can handle more than a handful), and they can sift through vast volumes of material very quickly. The machines win hands-down.

As for relevance, well, I won't argue that machines perform better than humans at rank-ordering lists to suit differing business or other criteria. I will argue, however, that machines can outperform humans at discovering obscure or even hidden relationships in large volumes of data. That ability is what data mining is all about: fitting models to data for predictive purposes. Those models may be hard to understand -- they lack explanatory transparency -- but we use them nonetheless because they work. Relationships are key to social-network analysis, as are measure-driven models for quantities such as impact, velocity, and authority. These quantities may factor into relevance. And relevance matters -- alongside precision and recall -- to a complete accuracy picture.
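As a rough illustration of how such measure-driven quantities might be folded into a relevance score used for rank-ordering, here is a small sketch; the weights and the impact/velocity/authority values are purely hypothetical, not any vendor's formula:

```python
def relevance_score(impact, velocity, authority, weights=(0.5, 0.3, 0.2)):
    """Combine normalized (0-1) social-media measures into one relevance score.

    The weighting scheme is illustrative only; a real deployment would tune
    the weights against the business criteria at hand.
    """
    w_impact, w_velocity, w_authority = weights
    return w_impact * impact + w_velocity * velocity + w_authority * authority

# Hypothetical posts scored and rank-ordered by relevance.
posts = {
    "post_a": relevance_score(impact=0.9, velocity=0.2, authority=0.4),
    "post_b": relevance_score(impact=0.3, velocity=0.8, authority=0.9),
}
for post, score in sorted(posts.items(), key=lambda kv: kv[1], reverse=True):
    print(post, round(score, 2))
```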

Finally, Kolsky's focus on the link between accuracy (however defined) and the business bottom line is spot-on. He recommends removing biases from analytics, improving accuracy, and looking at multiple customer-data sources and cross-referencing them. These are important steps that can quantifiably contribute to meeting cost, efficiency, profitability, satisfaction, and other business goals. Accuracy in analytics does indeed matter for businesses.


If you'd like to further explore information retrieval and analytics methods and applications, consider attending the 6th annual Text Analytics Summit, slated for May 25-26 in Boston. I'll reprise my role as chair and teach a pre-summit Introduction to Text Analytics the afternoon of May 24.
