BI on Content Feeds, a.k.a. Continuous (Twitter) Transformation - InformationWeek

12/8/2008
11:13 PM
Seth Grimes



The rapid pace and high volume of Twitter messaging have upped the stakes for BI on content feeds. BI on content feeds: that would be stuff like monitoring and mining sentiment from social media for reputation and brand management, which you can do with text analytics on RSS and Atom feeds and Web pages. I wrote in September on a leading-edge implementation at Thomson Reuters. But Twitter messaging is both faster and, given social-network mediation, more focused: instant messaging text gone public. One approach to making sense of the flow is the CEPish application of continuous transformations that the folks behind SQLstream recently showed me.

CEP is complex-event processing, in-memory analysis of data and event streams via continuous queries. CEP is an uncomfortable category for some vendors, who would prefer to focus on applications or distinctive capabilities. SQLstream's forte is continuous data integration and transformation; heavy-duty analytics would be done against a data warehouse. SQLstream's open-source underpinnings, close links with BI vendor Pentaho, and integration with the open-source Mondrian relational OLAP tool are other distinguishing elements. (I'll write more on these points below, after the screenshots.) The Mondrian integration in particular allows SQLstream to support real-time OLAP by feeding stream-computed aggregates to a backend data warehouse with as-needed invalidation of the Mondrian cache. Lastly, the company has worked to stay true to standards such as SQL:2003, SQL/MED for access to external databases, and XMI, and to use freely available tools, such as the Eclipse platform, where possible.
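To make the real-time OLAP idea concrete, here is a minimal Python sketch of stream-computed aggregates landing in a warehouse table while only the affected cache entries are invalidated. It does not use SQLstream or Mondrian; the `AggregateStore` class, its methods, and the brand keys are hypothetical names invented for illustration.

```python
class AggregateStore:
    """Hypothetical stand-in for a warehouse aggregate table plus an OLAP result cache."""

    def __init__(self):
        self.table = {}        # key -> running aggregate (the "warehouse")
        self.cache = {}        # key -> cached query result (the "OLAP cache")
        self.cache_misses = 0  # counts how often a query had to go to the table

    def upsert(self, key, delta):
        """Apply a stream-computed increment and invalidate only that key's cache entry."""
        self.table[key] = self.table.get(key, 0) + delta
        self.cache.pop(key, None)

    def query(self, key):
        """Serve from the cache when possible, otherwise read the table and cache the result."""
        if key not in self.cache:
            self.cache_misses += 1
            self.cache[key] = self.table.get(key, 0)
        return self.cache[key]
```

The design point is that an update to one aggregate leaves cached results for every other aggregate untouched, which is what "as-needed" invalidation suggests; how Mondrian itself scopes invalidation is not shown here.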

Continuous transformation is about replacing batch ETL processing with on-the-fly data acquisition, aggregation/processing, and action/DW-loading. Continuous transformation reduces latency, the lag between data arrival and availability for analysis. Business logic is programmed with SQL (SQLstream uses views to construct a processing pipeline and INSERT INTO, extended for streams, to export data), with adapters coded in C or Java for data input and output, plus user-defined functions (UDFs) and transformations (UDXes). And that's where Twitter content acquisition comes into play.
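The acquire, aggregate, and load stages described above can be sketched in a few lines of Python. This is an illustration of the general pattern, not SQLstream's SQL or adapter API: generators stand in for views, a list stands in for the warehouse, and the names `Event`, `acquire`, `tumbling_counts`, and `load` are all hypothetical. The sketch assumes events arrive in timestamp order.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    ts: int      # arrival time in seconds (hypothetical field)
    source: str  # feed name, e.g. "rss" or "twitter"


def acquire(raw):
    """Input-adapter stand-in: turn raw (ts, source) tuples into events one at a time."""
    for ts, source in raw:
        yield Event(ts, source)


def tumbling_counts(events, width):
    """Emit (window_start, per-source counts) each time a fixed-width window closes."""
    window_start = None
    counts = Counter()
    for ev in events:
        start = ev.ts - ev.ts % width
        if window_start is None:
            window_start = start
        if start != window_start:
            # A later window has begun, so the previous one is complete.
            yield window_start, dict(counts)
            window_start, counts = start, Counter()
        counts[ev.source] += 1
    if counts:
        # Flush the last, possibly partial, window when the input ends.
        yield window_start, dict(counts)


def load(aggregates, warehouse):
    """Output-adapter stand-in: append each finished aggregate to the warehouse list."""
    for window_start, counts in aggregates:
        warehouse.append((window_start, counts))
```

Chaining the stages as `load(tumbling_counts(acquire(raw), 10), warehouse)` makes each aggregate available as soon as its window closes, rather than after a nightly batch run, which is the latency reduction the paragraph above describes.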

SQLstream certainly isn't unique in the ability to harvest RSS and Atom feeds, but so far it's the only tool I've seen that consumes Twitter messages, via an API. (I wrote a bit on twitter-BI recently.) Here's a screenshot that shows it in action, followed by two that show the SQL table-like definition of a Twitter feed and the SQL to work from it.

Three content feeds in a SQLstream studio interface (built on Eclipse): [SQLstream screenshot]
A stream defined like a table, using SQL/MED for Twitter-data definition: [SQLstream screenshot]
A continuous-transformation query, filtering on a regular-expression/keyword match: [SQLstream screenshot]
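Since the screenshots don't reproduce here, a rough Python analogue of the filtering query in the last caption may help. It is a sketch only: the actual SQLstream SQL is not shown in this text, and the `keyword_filter` function, the message dictionaries, and their `user` and `text` fields are assumptions made for illustration.

```python
import re


def keyword_filter(messages, pattern):
    """Lazily pass through only the messages whose text matches the pattern.

    Each message is assumed to be a dict with a "text" field (hypothetical schema).
    Matching is case-insensitive.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for msg in messages:
        if rx.search(msg["text"]):
            yield msg
```

Because the function is a generator, it evaluates each message as it arrives and never needs the whole feed in memory, which is roughly what distinguishes a continuous query from a one-shot SELECT against a stored table.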

SQLstream is only one example of a CEP(ish) tool that consumes content feeds. I asked folks on the CEP-Interest list to tell me about other examples. Siva Kumar Tangudu let me know that "Gnip makes it easy to consume content feeds. It supports twitter, digg, delicious, etc."

Marco Seiriö replied, "A year ago or so I did a demo on RuleML 2007 where we showed ruleCore processing a feed from flickr. We had this rule which triggered when there were a large number of photos posted in any area. Then we lit up a marker on a Google map to show that there is currently high posting activity in that area. The idea was to show off our location aware event processing and show how we could use location from a feed of geotagged images to trigger rules."
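A rule of the kind Seiriö describes might look something like the following Python sketch. It is not ruleCore's rule language or API; the `HotspotRule` class, the `grid_cell` helper, the grid-based notion of "area", and the threshold and window parameters are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict, deque


def grid_cell(lat, lon, size):
    """Map a coordinate to a square grid cell of `size` degrees (hypothetical area key)."""
    return (math.floor(lat / size), math.floor(lon / size))


class HotspotRule:
    """Fire when an area receives at least `threshold` posts within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # area -> timestamps still inside the window

    def observe(self, ts, area):
        """Record one post and return True if the area is currently a hotspot."""
        q = self._recent[area]
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q) >= self.threshold
```

A caller would pass each geotagged photo through `grid_cell` and then `observe`, lighting up the map marker whenever `observe` returns True. The sketch assumes timestamps arrive in non-decreasing order per area.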

And Alexandre Vasseur wrote me about use of the Esper open-source CEP engine by DataComplex "to power their SaaS based offering on the Amazon EC2 cloud; I think now have a twitter feed support," which I have not been able to verify.

But back to SQLstream and its Mondrian and Pentaho connections. One of the company leads is Julian Hyde, who created Mondrian. I first "met" him four years ago when I was researching an article on open-source BI. Julian subsequently signed on with Pentaho, but as a part-timer. He also helped create Eigenbase, "an extensible open-source platform for building specialized data management systems in a wide variety of application spaces." SQLstream is built on Eigenbase, as is LucidDB, the open-source DBMS that back-ends the LucidEra SaaS BI platform. There's a lot to like and admire in this work.

SQLstream and other, similar products are turning content feeds into BI. Those feeds are now one more type of source that enterprises can and should consider in the quest for competitive advantage.
