Search for Meaning - InformationWeek


Beyond searching and hyperlinks, Web collaborations get to the heart of the matter: meaning.

I've seen suggestions that Google Analytics will kill established business intelligence (BI) vendors, and that Google Base content hosting will accelerate a trend Craigslist started and snuff newspaper classified advertising for good. But these and other new services are not that threatening (or promising, depending on your point of view). Google's hope is chiefly to sell more online ads. To do that, it has created services that expand beyond indexing and search into managing and adding value to content.

Where is added value most needed? It's too hard for information consumers to get the answers they want. Search is still dumb, and despite reams of research, dreams of a sophisticated Semantic Web — with a common syntax that will enable software-agent bots to communicate, book flights and otherwise do your bidding without human assistance — remain unrealized. Google, Flickr and other firms are responding to opportunities created by shortcomings of the first-generation Web. Google Analytics aims to complement and extend Google's money-making AdWords and AdSense, which match ads to searches and content. Google Base is less an innovative means of publishing than a way to ensure personally published content will be found by searchers. These services are part of a belated effort by a host of software and service providers to enhance usability and underpin a still-chaotic labyrinth with machine-processable meaning.

Individuals are willing participants in this effort. I'm fascinated by collaboratively authored content: by mash-ups that display geolocated user data on maps, by tagging that attaches keywords to everything from blogs to photos to user pages on social-networking sites, by Wikis and, especially, Wikipedias that collect knowledge by consensus rather than by fiat. These collaborations create interconnectedness that goes beyond what's possible with hyperlinks and relevance greater than you'll find in algorithmically ranked lists of search results. They present a sense of the Web as a whole greater than the sum of a few million servers.
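The tagging described above amounts, mechanically, to a simple inverted index: each user-supplied keyword maps to the set of items carrying it, so searchers can intersect tags instead of guessing query terms. A minimal sketch in Python — the item names and tags are invented for illustration, not drawn from any real service:

```python
from collections import defaultdict

# Invented sample data: item -> user-supplied tags (a folksonomy)
tagged_items = {
    "photo_123": ["sunset", "beach", "vacation"],
    "blog_post_9": ["vacation", "travel"],
    "photo_456": ["beach", "surfing"],
}

# Build the inverted index: tag -> set of items carrying that tag
index = defaultdict(set)
for item, tags in tagged_items.items():
    for tag in tags:
        index[tag].add(item)

# Look up items by one tag, or by the intersection of two
print(sorted(index["beach"]))                      # all items tagged "beach"
print(sorted(index["beach"] & index["vacation"]))  # items tagged with both
```

The set intersection is the point: tags supplied by many independent users compose into queries no single author planned for.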

These collaborations fill the gaps left by conventional authoring tools and search. Text mining is supposed to bridge that content-meaning gap, and the articles I've written on the topic prove I'm a big fan. Your choice of search engine will help you find those articles, but only if you search on "text mining" and my name, pick through the hits returned and give the promising-looking articles a quick read. Sorry, software that can grok value-laden concepts such as being "a big fan" — software that identifies and extracts and weighs opinions and offers up highlights, TiVo style — isn't ready for prime time.
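The opinion extraction the column says isn't ready for prime time can at least be caricatured with a lexicon lookup: count positive and negative words and take the difference. The word lists and scoring below are invented for illustration and stand in for what real text-mining systems do with linguistic parsing, statistics and machine learning:

```python
# Tiny invented sentiment lexicon; real systems are far more elaborate.
POSITIVE = {"fan", "great", "love", "promising"}
NEGATIVE = {"dumb", "threatening", "chaotic", "unrealized"}

def naive_sentiment(text: str) -> int:
    """Score text by lexicon hits: positive count minus negative count."""
    words = [w.strip('.,!?"').lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_sentiment("I'm a big fan of text mining"))  # one positive hit
print(naive_sentiment("Search is still dumb"))          # one negative hit
```

A sketch this crude misses negation, sarcasm and context entirely — which is precisely the gap between keyword matching and software that can "grok" being a big fan.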

Forty years after Joseph Weizenbaum demonstrated natural-language conversation with the ELIZA computer program, and half a century since Alan Turing posed his famous test of artificial intelligence, figures I've seen suggest that a well-tuned text-mining system will give you 85- to 90-percent accuracy — B+ marks, and that at high cost. The theory is that a combination of linguistic and statistical analysis and machine learning will go where no machine has gone before. Yet Turing's statement, "We can only see a short distance ahead, but we can see plenty there that needs to be done," remains true.

Collaboratively authored, networked and manually tagged content is a user-driven response to search shortcomings, and it conveniently provides enterprises grist for the information mill. "Total information awareness" was a Defense Department dream that's now an enterprise imperative. Enterprises most need, and can best afford, part-way-there solutions such as monitoring news and user-generated content and then using text mining to extract sentiment. It behooves organizations to pursue these solutions because network effects mean that news and opinions travel farther and faster than ever before (following Metcalfe's Law, which holds that the value of a network increases as the square of the number of connected nodes). Quick response in the name of reputation management is mandatory.
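Metcalfe's Law, cited above, is easy to make concrete: doubling a network's nodes roughly quadruples its value, which is why opinion spreads so much faster as audiences grow. A back-of-the-envelope sketch with arbitrary node counts:

```python
def metcalfe_value(nodes: int) -> int:
    """Metcalfe's Law: network value grows as the square of connected nodes."""
    return nodes ** 2

# Doubling the connected audience roughly quadruples the network's value
print(metcalfe_value(1_000))  # 1,000,000
print(metcalfe_value(2_000))  # 4,000,000
assert metcalfe_value(2_000) == 4 * metcalfe_value(1_000)
```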

Web creator Tim Berners-Lee foresaw that the second-generation Web would be bound together by semantic interoperability. Poor usability and findability have fed the demand for machine-exploitable meaning. That meaning is being created from the bottom up: by text mining and content hosting, and by end-user collaborations such as mash-ups and Wikis, tagging and linking — by analytics and by intention.
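The machine-exploitable meaning Berners-Lee envisioned is conventionally expressed as subject-predicate-object triples. A toy illustration in Python — the statements are invented, and real Semantic Web data would use RDF serializations and vocabularies, not bare tuples:

```python
# Invented subject-predicate-object statements, RDF-style
triples = [
    ("GoogleBase", "hostsContentFor", "publishers"),
    ("Wikipedia", "authoredBy", "consensus"),
    ("photo_123", "taggedWith", "beach"),
]

def objects_of(subject: str, predicate: str) -> list:
    """Query the triple store: objects matching a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("photo_123", "taggedWith"))  # ['beach']
```

Software agents that can answer such queries across many sites are the still-unrealized promise the column describes.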

Seth Grimes is a principal of Alta Plana Corp., a Washington, D.C.-based consultancy specializing in large-scale analytic computing systems. Write to him at [email protected].
