Aster nCluster Builds on Open Source PostgreSQL - InformationWeek


Software // Information Management
Commentary
7/15/2008
04:18 PM
Seth Grimes

Aster nCluster Builds on Open Source PostgreSQL


I've written about the "category error" of looking at open source primarily as targeting end-user replacement of BI applications and established data warehouse platforms. I've long seen that OS's greatest BI/DW contribution has instead been in enabling developers to build BI into line-of-business applications and create specialized analytical tools. I'm more convinced than ever of this assessment, even as OS-BI vendors have launched improvements that target enterprise end users. On the DW front, the launch of Aster nCluster supports my point.

nCluster starts with PostgreSQL. According to Mayank Bawa, CEO and co-founder of Aster Data Systems, nCluster uses PostgreSQL as a data store on each node of a hardware cluster. Aster-built distributed database technology coordinates the nodes to deliver shared-nothing, massively parallel database processing (MPP). According to Bawa, nCluster relies on "a series of patent-pending algorithms and processes that optimize the placement, partitioning, balancing, replication, and querying across a cluster of intelligent nodes." Bawa calls PostgreSQL "a very stable foundation/abstraction on which we build our algorithms."
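
For readers who want a concrete picture of "shared-nothing," here is a minimal sketch in Python. It is not Aster's method, which the company describes only as patent-pending. It assumes plain hash partitioning and a single coordinator, and it uses in-memory SQLite databases as stand-ins for the per-node PostgreSQL stores. The names `make_node`, `node_for`, `load`, and `count_by_page`, the `events` table, and the four-node cluster size are all hypothetical.

```python
import sqlite3
import zlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def make_node():
    # Each "node" owns a private store; SQLite stands in for PostgreSQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, page TEXT)")
    return conn


def node_for(key, num_nodes=NUM_NODES):
    # Stable hash partitioning: the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def load(nodes, rows):
    # Route every row to the single node that owns its partition key.
    for user_id, page in rows:
        nodes[node_for(user_id, len(nodes))].execute(
            "INSERT INTO events VALUES (?, ?)", (user_id, page))


def count_by_page(nodes):
    # Scatter: each node aggregates only its own partition.
    # Gather: the coordinator merges the partial counts.
    # A real MPP system would run the per-node queries in parallel;
    # this sketch visits the nodes one after another.
    total = Counter()
    for conn in nodes:
        for page, n in conn.execute(
                "SELECT page, COUNT(*) FROM events GROUP BY page"):
            total[page] += n
    return dict(total)


nodes = [make_node() for _ in range(NUM_NODES)]
rows = [("u1", "home"), ("u2", "home"), ("u3", "music"),
        ("u1", "music"), ("u4", "home")]
load(nodes, rows)
result = count_by_page(nodes)
print(result)
```

No node ever reads another node's rows, and only the small per-page partial counts travel to the coordinator. That traffic between nodes and coordinator is the part Bawa says nCluster optimizes.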

PostgreSQL is, of course, a free-standing, open-source RDBMS. As I wrote back in June, a variety of organizations have taken advantage of its hyperfree, "Do with me what you will," BSD open-source license to, variously, build it up and strip it down. On the one hand, we have EnterpriseDB, whose aim seems to be to deliver a better PostgreSQL than PostgreSQL.org does, in the form of an enterprise-ready distribution with a set of integrated, open- and closed-source extensions. On the other, we have companies such as ParAccel, Netezza, and Greenplum that have taken those portions of the source code they need and stripped out the rest, building out from those PostgreSQL components into robust solutions for large-scale data warehousing. Those latter two companies have company in Dataupia and Truviso, and more power to 'em.

I asked Aster what differentiates nCluster from more established MPP systems such as Greenplum's, which also runs on commodity hardware. Bawa replied that "nCluster is different in that it efficiently optimizes network bandwidth for distributed analytics." I can't say his elaboration was satisfying, but here's more:

If you look at the reference architectures of several alternatives, you will see that many tend to emphasize $/TB of disk (by using nodes with a large number of disks), at the expense of ... key metrics that relate to query performance and analytics. In contrast, the Aster nCluster achieves a much higher ratio of processing power and memory to disk, which is enabled by our network optimizations. With a more efficient network, we are able to spread our work across more nodes, which keeps those query performance ratios much more attractive.

Bawa pointed me for technical detail to a blog write-up by David Cheriton, an Aster investor, who leads the Distributed Systems Group at Stanford University.

Aster lists MySpace as a production customer with a 100-node cluster hosting over 100 TB of data with a terabyte of data added each day. The company claims other, not yet announced paying customers that include advertising networks, recommendation engines, and other social-networking companies.

Not every OS-reliant data warehousing vendor will succeed as a free-standing company. I guarantee we'll see vendor consolidation in the next year, even as new entrants emerge. Nonetheless, nCluster is yet more proof of the enormous value PostgreSQL (not even considering open-source MySQL, MonetDB, LucidDB, and Ingres) has to offer the data warehousing world.
