The Largest Data Warehouse In The World? - InformationWeek


04:43 PM


A project of the Windber Research Institute to combine clinical information with volumes of scientific data about genes and proteins will collect as much as 50 terabytes of data every nine months.

Windber Research Institute, an advanced biomedical research facility, is assembling a massive data warehouse that combines clinical information with volumes of scientific data about genes and proteins. The aim is to help understand the causes of, and find cures for, breast cancer, human reproductive cancers, and cardiovascular disease.

The data warehouse is an ambitious effort to pull clinical and scientific data into a single system, giving researchers an unprecedented opportunity to study the relationship between genes, proteins, and disease. The database will collect as much as 50 terabytes of data every nine months and over time could become the largest data warehouse in the world.

"No one has put all this information onto a single database platform," says Dr. Richard Somiari, chief operating officer and chief scientific officer at the institute, based in Windber, Pa. The system is based on data-warehouse hardware and software from NCR Corp.'s Teradata division. Details about the project are being disclosed this week at Teradata's user conference in Seattle.

Clinical data from patients at Windber Medical Center, with which the research institute is affiliated, has already been loaded into the data warehouse. That includes data from tissue biopsies (each of which adds 166 Mbytes of data to the system), family histories, radiology (including X-ray images) and histopathology data, and patient DNA, RNA, and protein information.
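To put the quoted figures in proportion, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB), which the article does not specify, and the function names are illustrative only. The "biopsy equivalent" is purely a scale comparison: the 50-terabyte figure covers all data types, not biopsies alone.

```python
# Back-of-the-envelope sizing from the figures quoted in the article.
# Assumption (not stated in the article): decimal units, 1 TB = 1,000,000 MB.
TB_PER_PERIOD = 50       # upper bound quoted: as much as 50 terabytes...
MONTHS_PER_PERIOD = 9    # ...every nine months
MB_PER_BIOPSY = 166      # each tissue biopsy adds 166 Mbytes
MB_PER_TB = 1_000_000


def tb_per_year(tb=TB_PER_PERIOD, months=MONTHS_PER_PERIOD):
    """Annualized growth implied by 'tb' terabytes every 'months' months."""
    return tb * 12 / months


def biopsies_per_tb(mb_per_biopsy=MB_PER_BIOPSY):
    """Whole biopsy records that fit in one terabyte."""
    return MB_PER_TB // mb_per_biopsy


def biopsy_equivalent(tb=TB_PER_PERIOD, mb_per_biopsy=MB_PER_BIOPSY):
    """How many biopsy-sized records 'tb' terabytes would correspond to."""
    return tb * MB_PER_TB // mb_per_biopsy


if __name__ == "__main__":
    print(f"Implied growth: about {tb_per_year():.1f} TB per year")
    print(f"Biopsy records per TB: {biopsies_per_tb()}")
    print(f"50 TB in biopsy-sized records: {biopsy_equivalent()}")
```

Under these assumptions the upper bound works out to roughly 67 terabytes a year, and one nine-month increment is on the order of 300,000 biopsy-sized records.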

The next step will be to add data from other research databases, including DNA data from GenBank, protein data from the Swiss-Prot database in Europe, metabolic pathway data from Kyoto University's KEGG (Kyoto Encyclopedia of Genes and Genomes) database, and protein-protein interaction data from the DIP (Database of Interacting Proteins) database at UCLA.

Linking this basic research data with clinical information will allow researchers at Windber to examine multiple variables when investigating the causes of disease, Somiari says. The goals are to develop new strategies for managing patient conditions, discover new "markers" that help doctors diagnose diseases much earlier, and ultimately develop cures for the diseases.
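The kind of linkage described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the table names, column names, and placeholder values below are hypothetical, and it uses Python's built-in sqlite3 module rather than the Teradata system the institute actually runs. It shows a clinical record being joined to a research annotation through a shared gene identifier.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two hypothetical tables."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Hypothetical clinical table: one row per patient finding.
        CREATE TABLE clinical (patient_id TEXT, gene TEXT, diagnosis TEXT);
        -- Hypothetical research table: gene-to-pathway annotation.
        CREATE TABLE annotation (gene TEXT, pathway TEXT);
        INSERT INTO clinical VALUES
            ('P1', 'GENE_A', 'condition_x'),
            ('P2', 'GENE_B', 'condition_y');
        INSERT INTO annotation VALUES
            ('GENE_A', 'pathway_1'),
            ('GENE_B', 'pathway_2');
        """
    )
    return con


def linked_rows(con):
    """Join clinical findings to research annotations on the gene column."""
    return con.execute(
        "SELECT c.patient_id, c.diagnosis, c.gene, a.pathway "
        "FROM clinical c JOIN annotation a ON a.gene = c.gene "
        "ORDER BY c.patient_id"
    ).fetchall()


if __name__ == "__main__":
    for row in linked_rows(build_demo()):
        print(row)
```

Once both kinds of data sit in one system, a single query can relate a patient's diagnosis to pathway-level research data, which is the multi-variable view Somiari describes.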

Windber chose the Teradata system because of its scalability and parallel processing capabilities, says Nick Jacobs, the institute's president and CEO. He adds that Windber sought the same kind of technology that Wal-Mart and other commercial companies use to build their own massive data warehouses. The system uses analysis tools from Amersham Biosciences, Genomax Technologies, and Spotfire. Partners in the project include the U.S. Army's Walter Reed Army Medical Center, universities such as the University of Pennsylvania and Creighton University, and research institutes in the U.S., Europe, and Japan.
