ParAccel announced top TPC-H benchmark performance numbers with Sun at the end of October, beating out the former leaders in both performance and price-performance. Not by a little, but by four times in performance with a big drop in cost. I haven't seen much discussion of these results.
The fact that a little startup like ParAccel can enter the market with a database to support business intelligence that beats the TPC-H results of all the major vendors on both performance and price should wake people up, particularly when the performance increase is so large and comes with a significant decrease in cost.
What's different about ParAccel's database? They're using a columnar data store rather than the row-oriented data storage model most vendors use. This results in significant IO reductions and allows for more effective use of compression. Their model is similar to (but not the same as) Sybase IQ. The TPC-H numbers demonstrate the difference pretty clearly.
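To make the IO and compression argument concrete, here is a minimal sketch in Python. It is not ParAccel's storage engine; the table, its three columns, the fixed-width encoding and every value in it are assumptions made up for illustration. It lays the same data out by row and by column, then counts how many bytes a SUM(price) query has to read in each layout and how well each layout compresses with zlib.

```python
import struct
import zlib

# Hypothetical order table: (order_id, region, price). All values are made up.
REGIONS = ["EAST", "WEST", "NORTH", "SOUTH"]
rows = [(i, REGIONS[i % 4], 100 + (i % 10)) for i in range(10_000)]

# Row-oriented layout: the fields of each record are stored together
# (4-byte int, 5-byte padded string, 4-byte int = 13 bytes per record).
row_store = b"".join(
    struct.pack("<i5si", order_id, region.encode(), price)
    for order_id, region, price in rows
)

# Column-oriented layout: each column is stored contiguously on its own.
column_store = {
    "order_id": struct.pack(f"<{len(rows)}i", *(r[0] for r in rows)),
    "region": b"".join(struct.pack("<5s", r[1].encode()) for r in rows),
    "price": struct.pack(f"<{len(rows)}i", *(r[2] for r in rows)),
}

def sum_price_row_store(data):
    """SUM(price) over the row layout: every whole record must be read."""
    total = 0
    for _order_id, _region, price in struct.iter_unpack("<i5si", data):
        total += price
    return total, len(data)

def sum_price_column_store(columns):
    """SUM(price) over the column layout: only the price column is read."""
    data = columns["price"]
    total = sum(value for (value,) in struct.iter_unpack("<i", data))
    return total, len(data)

def compression_ratio(data):
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

row_total, row_bytes = sum_price_row_store(row_store)
col_total, col_bytes = sum_price_column_store(column_store)

print("bytes read, row layout:   ", row_bytes)
print("bytes read, column layout:", col_bytes)
print("compression ratio, row layout:  ", round(compression_ratio(row_store), 1))
print("compression ratio, price column:", round(compression_ratio(column_store["price"]), 1))
```

Both queries return the same total, but the column layout touches only the 4-byte price values instead of every 13-byte record. A column of similar values also tends to compress far better than interleaved records, which is the effect described above. Real engines use more specialized encodings than zlib, so treat the ratios as directional only.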
It's surprising that there aren't more columnar storage engines out there, particularly since the storage layout is invisible to users yet offers such a significant performance advantage for query workloads. People always ask about Vertica whenever I mention "columnar database." Vertica has been floundering around for quite a while now with very little to show in the way of accomplishments.
The other interesting element in ParAccel is that they run in a shared-nothing configuration, which most in the VLDB arena agree is the only way to scale to very large data volumes. It also makes scaling more cost-effective, which is why appliance vendors like Netezza, new database vendors like Greenplum and the old guard at Teradata are all running shared-nothing architectures (these are all row-oriented data stores).
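For readers who haven't worked with shared-nothing systems, the idea can be sketched in a few lines of Python. This is a toy model, not how ParAccel or any of the vendors above implement it: the node count, the modulo placement rule and the sample rows are all hypothetical. Each node owns its own slice of the data, aggregates it without touching any other node's storage, and a coordinator merges the small partial results.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

def node_for(order_id, num_nodes=NUM_NODES):
    """Placement rule (an assumption): distribute rows by key modulo node count."""
    return order_id % num_nodes

def partition(rows, num_nodes=NUM_NODES):
    """Split the table so each node owns a disjoint slice of the rows."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[0], num_nodes)].append(row)
    return nodes

def local_aggregate(node_rows):
    """Runs on one node: SUM(price) GROUP BY region over local rows only."""
    totals = defaultdict(int)
    for _order_id, region, price in node_rows:
        totals[region] += price
    return dict(totals)

def merge(partials):
    """Runs on the coordinator: combine the per-node partial sums."""
    merged = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] += subtotal
    return dict(merged)

# Made-up sample data: (order_id, region, price).
regions = ["EAST", "WEST", "NORTH"]
rows = [(i, regions[i % 3], 10 + i) for i in range(1_000)]

partials = [local_aggregate(node_rows) for node_rows in partition(rows)]
distributed_result = merge(partials)
single_node_result = local_aggregate(rows)

print(distributed_result == single_node_result)
```

Because no node reads another node's disks or memory, adding nodes adds IO and CPU capacity together, which is the cost-effective scaling argument made above.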
ParAccel is offering three modes of deployment - a straight database, a virtual appliance, and a preconfigured hardware appliance (though I don't have a feel for how "appliancey" their offering is).
Appliances are helping to overcome the resistance to alternative databases. Demand for BI performance is exceeding the ability of traditional platforms to keep pace. The problem with one standard database for both transaction and analysis workloads is the constantly rising data volumes and users repeating the mantra of "faster queries."
Enterprise IT has been trying to consolidate database vendors for years, but data warehousing workloads add complexity to the traditional database model. Over time, the database connection and SQL standards have improved, along with database manageability, to make having multiple databases less of a concern in IT.
We already understand that different schema designs are required. The special requirements that BI and analytics bring to the database are leading people to the realization that different database platforms can make sense.
The TPC-H announcement by ParAccel shows that a different database is viable, just as Netezza and DATAllegro showed in the appliance space.
Personally, I'm not a big fan of TPC benchmarks because they don't relate well to real-world performance or configurations. However, they are useful for determining what databases or hardware-database combinations are in roughly the same class, and how realistic it is to run them at that configuration.
One big problem is the optimization for one of the two TPC-H metrics (performance or price-performance). Vendors run different configurations for these two numbers to get the best metric, which means the configuration that's best in performance may be completely unreasonable from a price perspective. For example, few people are going to blow $11 million on hardware for a 3 TB data warehouse configuration.
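A small worked example shows why the two metrics can point at different configurations. TPC-H reports performance as composite queries per hour (QphH) and price-performance as total system price divided by QphH. The two configurations below and all of their numbers are invented for illustration; they are not published results for any vendor.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    name: str
    qphh: float             # composite queries per hour (higher is better)
    total_price_usd: float  # total priced configuration

    @property
    def price_performance(self):
        """Dollars per QphH (lower is better)."""
        return self.total_price_usd / self.qphh

# Hypothetical configurations with made-up numbers.
results = [
    BenchmarkResult("big-iron config", qphh=100_000, total_price_usd=11_000_000),
    BenchmarkResult("modest config", qphh=40_000, total_price_usd=1_000_000),
]

best_performance = max(results, key=lambda r: r.qphh)
best_price_performance = min(results, key=lambda r: r.price_performance)

for r in results:
    print(f"{r.name}: {r.qphh:,.0f} QphH, ${r.price_performance:,.2f} per QphH")
print("top performance:      ", best_performance.name)
print("top price-performance:", best_price_performance.name)
```

In this made-up case the expensive configuration wins on raw performance at $110 per QphH, while the modest one wins on price-performance at $25 per QphH. Neither number alone tells you which one you would actually buy.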
ParAccel and Sun's benchmarks were run at the 100GB, 300GB and 1TB scale factors and hold the top slot in all three. They did this with the same hardware configuration. That's unusual in the TPC-H, as is holding the top slots for both performance and price-performance.
I haven't heard from them about whether they will run the 3TB, 10TB or 30TB configurations. I suspect price-performance won't be as impressive at those scales because they may need to shift from internal storage to more costly external storage arrays and bump up the cost.
By the way, the Sun-ParAccel benchmark was run on Linux. Go penguins!
Mark Madsen is president of Third Nature, a consulting and research firm focused on business intelligence, data integration and data management. He is a principal author of Clickstream Data Warehousing and speaks about data warehousing and emerging technology. Write him at [email protected].