TDWI Selection Bias: It Depends Whom You Ask

Seth Grimes, Contributor

February 20, 2008

The saying "There are three kinds of lies: lies, damned lies, and statistics" is attributed to Benjamin Disraeli, and it's nicely illustrated in a couple of Intelligent Enterprise reality-check photos from this week's TDWI conference.

Check out the TDWI Image Gallery photos posted by Intelligent Enterprise Editor-in-Chief Doug Henschen. Executive Summit attendees used the "dots method" to identify Important BI Technologies and Biggest BI Challenges. You get simple histograms showing what's hot and what's not. And you get clear illustrations of "selection bias": not TDWI's fault, but an effect to keep in mind when you assess formal and informal research findings.

The Important BI Technologies results: 1 Dashboards/Scorecards; 2 Predictive Analytics; 3 Operational BI; and 4 BI Portals. My favorite topic, Text Analytics, is way down the list in a third tier. That's the reality check. (I hope this won't jeopardize the Text Analytics session planned for the August TDWI Executive Summit.) Open Source BI is even lower on the list.

These results aren't lies; no, they're statistics. Selection bias comes into play because a BI-themed session will attract folks grappling with technologies and solutions that are, well, in the solid BI mainstream. If you're out in front looking at text analytics, a BI-themed summit is probably not the place for you. Similarly, as I have reported, it's Java developers and not IT execs who are most interested in Open Source BI. It's a good guess that text-analytics types and OSBI types tend to self-select away from TDWI.
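To put rough numbers on that self-selection effect, here is a minimal sketch. Every figure in it is hypothetical and chosen only for illustration; none comes from TDWI or from the dots results, and the group names and the `attendee_share` helper are my own assumptions. It takes an imagined population of practitioners, assigns each group an assumed probability of showing up at a BI-themed summit, and works out what the room looks like compared with the population at large.

```python
# Hypothetical numbers, for illustration only; none come from TDWI.
# Share of each interest group in an imagined wider practitioner population.
population_share = {
    "mainstream_bi": 0.50,
    "text_analytics": 0.30,
    "open_source_bi": 0.20,
}

# Assumed probability that a member of each group attends a BI-themed summit.
attend_prob = {
    "mainstream_bi": 0.20,
    "text_analytics": 0.05,
    "open_source_bi": 0.02,
}


def attendee_share(population_share, attend_prob):
    """Return each group's share among attendees.

    Each group is weighted by (population share x attendance probability),
    then the weights are normalized so they sum to 1.
    """
    weights = {g: population_share[g] * attend_prob[g] for g in population_share}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


shares = attendee_share(population_share, attend_prob)
for group, share in shares.items():
    print(f"{group}: population {population_share[group]:.0%}, attendees {share:.0%}")
```

Under these made-up inputs, mainstream BI folks are half of the population but roughly 84% of the room, so any poll of the room will rank their concerns first. Nobody lied; the sample simply isn't the population.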

TDWI recognizes this effect. Philip Russom's 2007 report, BI Search and Text Analytics: New Additions to the BI Technology Stack, includes the explanation:

In an Internet survey conducted in late 2006, TDWI asked each respondent to estimate "the approximate percentages for structured, semi-structured, and unstructured data across your entire organization." Averaging the responses to the survey puts structured data in first place at 47%, trailed by unstructured (31%) and semi-structured data (22%). Even if we fold semi-structured data into the unstructured data category, the sum (53%) falls far short of the 80-85% mark claimed by other research organizations. The discrepancy is probably due to the fact that TDWI surveyed data management professionals who deal mostly with structured data and rarely with unstructured data. All survey populations have a bias, as this one does from daily exposure to structured data.

You'll find similar, similarly understandable examples of selection bias in the Biggest BI Challenges dots results, where #1 is "Gaining consensus on data definitions." Yup, that sounds like a BI/DW manager or exec speaking; the rest of us are stuck in our cubes, waiting for the execs to get back from TDWI in Vegas, keen to consensusize us.

About the Author(s)

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
