Semantic search is like porn: I'm pretty sure I'll know it when I see it. So when semantic search upstart Truevert came by for a visit, I got all googly (I think I might have even screamed "yahoo"). The Truevert system, powered by OrcaTec's discovery toolkit, is narrowly defined around green, but it's definitely an eye-opening, fresh approach to an elusive problem.
Here is Part 1 of our video discussion with Truevert, including a demonstration of the technology.
Here's Part 2, where we discuss competitors (namely Powerset, now owned by Microsoft) and the nature of other ontological approaches to semantic search.
To be fair, whenever I hear about the semantic Web, I think of a magic, omniscient elf scurrying around squillions of sites, assigning meaning based on the context of, well, everything. So my expectations are high. But frankly, so is my disappointment with traditional search, even if it's changed how most of us view and use the Web. On the one hand, I don't want to become a concatenation expert, but neither do I want Aunt Millie's musings on managing her household budget when I search Google for microfinance. These seem to be my only two choices for better results.
OrcaTec co-founder Herbert Roitblat began by saying that ontology, often thought of as the way toward a semantic Web, is flawed. (He also began by saying that Google's PageRank is a popularity contest.) There are lots of ways to categorize and almost no agreement, and the people designing these schemas are not the same people looking for the information.
Even if you're precise in your search terms on a normal search engine, Roitblat summarized, you're really narrowing by exclusion rather than precision. If you enter "green toilets" in Yahoo in an attempt to find more energy-efficient commodes, you instead find toilets colored avocado or sea-foam green.
A true semantic-based approach trusts a context, rather than a categorization. OrcaTec started Truevert with a more vertical approach, namely "green." So everything gets searched through that filter. It uses Yahoo BOSS to gather Web search results, but it then re-ranks those results based on its own language model, which it derives from the association and context of words in 6,000 green-tagged documents in Delicious (building that model, Roitblat says, takes less than 15 minutes on a mere laptop). Google's terms of service, Roitblat says, don't allow re-ranking of pages the way Truevert does it.
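To make the re-ranking idea concrete, here's a toy sketch in Python. It is not Truevert's or OrcaTec's actual method, which wasn't shown in detail; it simply assumes a smoothed word-frequency model that compares a small "green" corpus against a generic one, then sorts result snippets by how green their vocabulary looks. The function names (`build_model`, `score`, `rerank`) and the sample documents are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_model(domain_docs, background_docs):
    """Map each word to the log-odds of seeing it in the domain corpus
    versus the background corpus, using add-one smoothing."""
    domain = Counter(t for doc in domain_docs for t in tokenize(doc))
    background = Counter(t for doc in background_docs for t in tokenize(doc))
    vocab = set(domain) | set(background)
    domain_total = sum(domain.values()) + len(vocab)
    background_total = sum(background.values()) + len(vocab)
    return {
        word: math.log((domain[word] + 1) / domain_total)
        - math.log((background[word] + 1) / background_total)
        for word in vocab
    }


def score(model, text):
    """Average log-odds over the words in text; unknown words count as 0."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(model.get(t, 0.0) for t in tokens) / len(tokens)


def rerank(model, results):
    """Sort result snippets so the most domain-like ones come first."""
    return sorted(results, key=lambda r: score(model, r), reverse=True)


# Hypothetical stand-ins for the green-tagged training documents
# and for ordinary, non-green Web text.
green_docs = [
    "low flow toilets save water and energy",
    "energy efficient appliances reduce water use",
]
generic_docs = [
    "avocado green bathroom color paint",
    "sea foam green toilet color for retro bathroom",
]

# Hypothetical snippets, in the order a general-purpose engine returned them.
results = [
    "avocado green toilet in retro color",
    "green toilets that save water and energy",
]

model = build_model(green_docs, generic_docs)
reranked = rerank(model, results)
for snippet in reranked:
    print(round(score(model, snippet), 3), snippet)
```

In this toy setup the water-saving snippet moves ahead of the avocado-colored one, because words like "water" and "energy" show up in the green corpus while "color" and "retro" don't. A real system would presumably look at word associations and context rather than single-word counts, and would train on thousands of documents instead of four.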
Roitblat says the company chose green because it wanted to start out doing some good, but also because it's a category people can easily understand. The same approach can be applied to any vertical. You could even apply it to enterprise content management, given that most corporations have their own jargon -- you just train the engine on the documents that you index.
You can also imagine that more precise search results would mean better ad matching, and with it a decent amount of ad revenue.
Truevert competes with a growing list of other new players, like Hakia, Powerset, and Thomson/Reuters Calais. Microsoft recently purchased Powerset. I haven't talked with any of these companies. Yet. I'm sure they'll find me.