The GATE Way to Open Source Text Analytics

Hamish Cunningham is benevolent dictator of the GATE team, "researching human language computation." Their work is realized in a highly capable, open-source, text-analysis platform, the General Architecture for Text Engineering. Hamish's replies to my questions regarding Text Analytics Opportunities and Challenges for 2010 hold many insights about text analytics and open source...

Seth Grimes, Contributor

February 18, 2010

As part of a recent solution-provider survey, I posed the question, "What do you see as the 3 (or fewer) most important text-analytics technology, solution, or market challenges [or opportunities] in 2010?" to Hamish Cunningham, Research Professor of Internet Computing at the University of Sheffield (UK). Hamish is benevolent dictator of the GATE team, "researching human language computation." Their work is realized in a highly capable, open-source, text-analysis platform, the General Architecture for Text Engineering. I've used it myself!

Hamish is effectively GATE's CEO. While GATE work is funded in part by a number of sponsors and partners, Hamish is not beholden to VCs or shareholders and is decidedly uncorporate: he'll tell you what he really thinks and what he's up to, openly. Check out his blog, Computing Text. One thing he's doing is shifting the GATE team to focus on users and support, by nurturing the GATE community and via a number of carefully conceived industry alliances. (Disclosure: I am a paid consultant to a GATE partner, Matrixware, a Vienna-based services firm that is working with the University of Sheffield and Bulgarian semantic-technologies developer Ontotext to build GATE into a commercially friendly product suite. My enthusiasm for GATE led to the consulting assignment, not the reverse.)

Hamish's reply to my Text Analytics Opportunities and Challenges for 2010 question didn't comfortably fit the model I had in mind for my article, but it's full of insight all the same. Here's what Hamish had to say on GATE and text-analytics futures:

A decade ago, at the start of the naughties, the majority of text-analysis systems came from research labs. At the start of the tennies, we can look back on an explosive growth of startups in the area, followed by acquisitions and consolidation, and latterly the arrival of a healthy market supporting a variety of commercial offerings. The drivers of this expansion included:

  • The Web. (Yawn! I routinely skip the first paragraph of the papers in my field these days, as they all start by making this point.) More recently and more interestingly, social networking.

  • Cost cutting. Replacing costly market research departments with cheap(er) text-mining departments has become the basis of a whole family of text analysis products that mine the "voice of the customer."

So, what prospects for the tennies? More growth, partly because the drivers that arose in the naughties are still with us, but also because of two new factors:

  • Maturity of open source text mining. Replacing costly proprietary software licenses with open source is a trend which we've seen in many other areas. A big sticking point for text analytics hitherto has been the lack of a Red Hat or a Canonical to provide enterprise-level support and training, but that's changing now as more companies sign up to support open source, better training programs become available and so on.

  • Fear and trembling at the data suppliers. The pressure from Google in this area is relentless ("We'll give away the data that you sell to drive use of our tools"). The data suppliers can see that they have to offer a higher level of service in order to hold onto their customer base, and text mining is pretty much the only game in town. So, for example, Thomson buys ClearForest, sinks large amounts of resource into "Open" Calais, etc. etc.

In GATE's case we're also now seeing faster growth in demand that we attribute to our repositioning in 2009. We have new commercial partners who have funded a raft of new features, we have new products to complement the long-standing developer-oriented offering, and a new training and certification program. The tennies look like being a busy decade.

I asked Hamish if he could quantify the increase in interest that he mentioned. His response:

It seems to be something like 3 new commercial-walkins per week instead of the previous rate of 1 a month (though today [January 26] is only Tuesday and we've had several already this week...) A week ago I did this summary (confidential) of the week's new contacts:

  • (top-3 insurance corporation), looking for GATE support with SLA

  • (major US IT contractor), an existing ClearForest customer, looking for GATE training

  • (SME), a startup doing sentiment stuff for marketing, looking for GATE Teamware

  • (big IT corporation), leading CAD supplier, looking for terminology extraction for translation

The week after was similarly productive so the new message seems to be working.

Hamish added final thoughts yesterday:

The interest rate shows no signs of slowing BTW; and various other positive indicators have come my way. It really seems like text analysis and semantics are taking off! Strange. I feel like asking people what's wrong with them... but then I've been waiting for this for 15 years.

Good things come to those who wait (and work to make them happen)!

About the Author(s)

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
