I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla's home base of SLAC, which used to stand for "Stanford Linear Accelerator Center"). Things I learned included that eBay's 6.5-petabyte Greenplum database has turned into a >10-petabyte Teradata database, which will grow 2.5x further in size...

Curt Monash, Contributor

October 6, 2010


I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla's home base of SLAC, which used to stand for "Stanford Linear Accelerator Center"). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included:

  • eBay has thrown out Greenplum. (Edit: Oliver Ratzesberger responds that the "thrown out" part "could not be further from the truth. The answer to a casual question over lunch was: Do you still use vendor XYZ? And my response was a simple 'No.' ... we have simply selected a different vendor for V2 or our Singularity project... " See comments here for more detail.) eBay's 6.5-petabyte Greenplum database has turned into a >10-petabyte Teradata database, which will grow 2.5x further in size soon.

    • Specifically, Oliver told me there are 8 petabytes of spinning disk, with 80% compression. So that's 40 petabytes before you multiply by a reducing factor to cover mirroring, temp space, and so on. My low end for that factor would be 25-28%; my high end would be 35-40%; either way, we're talking about >10 petabytes of true user data.

    • The 8 petabytes of spinning disk are headed to 20 petabytes next year.

    • Oliver gave the impression that Greenplum got thrown out more for reliability reasons than performance. (While eBay saw a major performance difference between Teradata and Greenplum, Oliver previously indicated he was inclined to attribute this more to specific Sun Thumper hardware/storage choices than to software.)

  • That database, called "Singularity," has some interesting aspects -- notably, a character field that's a string of name-value pairs -- on which you can do views and so on for virtual tables -- in a table that otherwise has dozens of conventional relational columns.

    • The system ingests log data in the form of lots and lots of name-value pairs.

    • The most commonly found ones go into columns in the usual way.

    • The rest are strung together into, well, a character string.

    • Teradata has developed some features for eBay that make it easier to index, query, etc. on that character string of name-value pairs.

  • eBay's more EDW-like (Enterprise Data Warehouse) multi-petabyte Teradata database continues to grow, with the main system apparently up to 4.5 petabytes from the previous 2.5. I took the opportunity to ask what kinds of data marts (virtual or otherwise) were spun out in practice.

    • In Oliver's ranking, #1 was derived data based on other data already in the data warehouse.

    • #2 was other data within eBay that had never been put into the data warehouse in the first place.

    • #3 was data truly from outside eBay.

    • Todd Walter chimed in to point out that at other Teradata customers, who perhaps didn't have as fully fleshed out an EDW, #1 and #2 could be reversed.

  • eBay sees Hadoop as an interesting tool for certain special purposes.

    • eBay likes Hadoop for certain tasks such as image analysis. (Edit: And analysis of search results.)

    • eBay doesn't like Hadoop for anything that requires data movement, such as a join.

    • Similarly, eBay doesn't like HBase.

  • eBay is enamored of the idea of doing "social networking around analytics."

    • This is something that has been built but not rolled out yet.

    • It seems more focused on actual business intelligence than on the underlying data, unlike Greenplum Chorus, which seems more focused on the databases themselves.

    • Since it hasn't been rolled out yet, we don't know which (if any) of activity streams, forums, or whatever will actually get significant adoption.
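
For anyone who wants to check the sizing arithmetic on the Singularity database above, here is a minimal sketch. It assumes "80% compression" means stored bytes are 20% of the original (which is how 8 petabytes of disk becomes 40), and the function and parameter names are mine, purely for illustration.

```python
def user_data_estimate(disk_pb, compression, usable_fraction):
    """Estimate true user data (in PB) from raw spinning disk.

    Assumption: `compression` of 0.80 means the stored data occupies
    20% of its uncompressed size. `usable_fraction` is the reducing
    factor covering mirroring, temp space, and so on.
    """
    uncompressed_pb = disk_pb / (1.0 - compression)
    return uncompressed_pb * usable_fraction


# Low-end and high-end reducing factors quoted in the text.
for fraction in (0.25, 0.28, 0.35, 0.40):
    estimate = user_data_estimate(8, 0.80, fraction)
    print(f"{fraction:.0%}: {estimate:.1f} PB")
```

With 8 petabytes of disk this gives roughly 10 to 16 petabytes of user data across the range of factors, consistent with the ">10 petabytes" figure. Feeding in next year's 20 petabytes of disk under the same assumptions scales every result by 2.5.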

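The Singularity layout described above (common name-value pairs in ordinary columns, the rest strung together in one character field) can be sketched roughly as follows. This is not eBay's or Teradata's actual format: the key names, the `nv_pairs` column name, and the URL-style `name=value&name=value` encoding are all assumptions made for illustration.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical set of frequently seen names that get their own columns.
COMMON_KEYS = ("user_id", "page", "timestamp")


def to_row(pairs):
    """Split one log record into conventional columns plus an overflow string."""
    row = {key: pairs.get(key) for key in COMMON_KEYS}
    rest = {k: v for k, v in pairs.items() if k not in COMMON_KEYS}
    # The less common pairs are strung together into a single character field.
    row["nv_pairs"] = urlencode(sorted(rest.items()))
    return row


def nv_lookup(row, name):
    """Pull one value back out of the overflow string, or None if absent."""
    return dict(parse_qsl(row["nv_pairs"])).get(name)


record = {
    "user_id": "42",
    "page": "search",
    "timestamp": "2010-10-05T12:00:00",
    "browser": "firefox",
    "experiment": "B",
}
row = to_row(record)
print(row["nv_pairs"])
print(nv_lookup(row, "experiment"))
```

A view that exposes `nv_lookup(row, "experiment")` as if it were a column is the rough analogue of the "virtual tables" idea: the rare names never need a schema change, at the cost of parsing the string at query time.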

About the Author(s)

Curt Monash

Contributor

Curt Monash has been an industry, product, and/or stock analyst since 1981, specializing in the areas of database management, application development tools, online services, and analytic technologies.
