The State of In-Database SAS Analytics - InformationWeek


06:43 AM
Curt Monash

The State of In-Database SAS Analytics

Which MPP DBMS vendors are supporting in-database SAS data mining and what's the big deal anyway? A SAS product manager I recently spoke with addressed these and other questions.

I routinely am briefed way in advance of products' introductions. For that reason and others, it can be hard for me to keep straight what's been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute's multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly last week with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is:

  • On Teradata, SAS is shipping in-database scoring today. SAS also is shipping a limited amount of in-database modeling on Teradata, the count recently having gone up from 4 "procs" to 10.
  • On Netezza Twinfin(i), SAS is shipping in-database scoring, and this was recently announced. I can't actually find much evidence of this announcement by searching the Web or the SAS website, but Michelle was pretty clear on the point even so. Further confusing matters, SAS' website seems to say in-database scoring is supported on Netezza's old generation of products but not its latest one, even though SAS CTO Keith Collins told me exactly the opposite would be true.
  • On Aster Data nCluster, SAS will ship in-database scoring by the end of 2010. If I understood correctly, this will be for "limited" rather than "general" availability, but Michelle framed that as a distinction without a difference. I.e., if you want to buy in-database SAS scoring on Aster nCluster, you'll be able to.
  • (More) in-database SAS modeling is expected on all of Teradata, Netezza Twinfin(i), and Aster Data nCluster in the vague future. (The concept of 2011/2012 came into play.)
  • SAS/Teradata integration, developed first, involved more hand-coding. SAS has subsequently developed some kind of a more general parallelism/in-database capability, akin to what it has in the DBMS-less grid, that either is or isn't a good match for DBMS vendors' native way of supporting parallel processing. (Obviously, I'm still pretty unclear on this part.)
  • SAS technology is a good fit for Aster Data's MapReduce-centric way of doing parallelism.

I also took the opportunity to ask Michelle a question I've had a heck of a time getting answered: What's the big deal about in-database data mining scoring anyway? After all, the most common form of in-database data mining scoring is just to take a weighted sum of specific fields in a row, where the weights are the regression coefficients. You can do that in generic SQL, with performance that superficially should be at least as good as that for any alternative strategy. Michelle's answers seemed to be twofold:
  • There are other kinds of scoring too -- neural networks, etc.
  • Coding the scoring in SQL isn't that easy. Michelle gave the example of a specific user (default Netezza reference account, with initials resembling mine) that spent 400 hours writing and testing something you now get for free with SAS/Netezza integration.
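
To make the "weighted sum in generic SQL" point concrete, here is a minimal sketch of regression scoring pushed into a database. Everything specific in it is an assumption for illustration: the `customers` table, its columns, the intercept and weights, and the helper names `scoring_sql` and `score_in_database` are hypothetical, and Python's built-in `sqlite3` merely stands in for an MPP DBMS. It is not the code SAS generates for any of the platforms above.

```python
import sqlite3

# Hypothetical regression model: an intercept plus one coefficient per column.
INTERCEPT = 0.5
WEIGHTS = {"age": 0.02, "income_k": 0.01, "num_orders": 0.3}


def scoring_sql(table, key, intercept, weights):
    """Build a SELECT that scores each row as intercept + sum(weight * column).

    Table and column names are interpolated directly, so this sketch assumes
    they come from a trusted model definition, not from user input.
    """
    terms = " + ".join(f"({w!r} * {col})" for col, w in weights.items())
    return f"SELECT {key}, {intercept!r} + {terms} AS score FROM {table}"


def score_in_database(conn, sql):
    """Run the scoring query inside the database and return {key: score}."""
    return dict(conn.execute(sql).fetchall())


# A tiny in-memory table standing in for a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, age REAL, income_k REAL, num_orders REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, 40, 50, 2), (2, 25, 30, 0)],
)

sql = scoring_sql("customers", "id", INTERCEPT, WEIGHTS)
scores = score_in_database(conn, sql)
print(sql)
print(scores)
```

Even in this toy form, the query has to be regenerated and retested whenever the model's variables or coefficients change, and a weighted sum does not cover other model types such as neural networks, which is consistent with both of Michelle's answers.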

Edit: In response to this post, SAS wrote in with further clarification about in-database and/or MPP SAS.
