Talend Takes on High-Volume Data Integration - InformationWeek

MPx suite incorporates MapReduce architecture and parallel processing to handle up to one million records per second.

It's not every company that needs to handle data integration at speeds of up to one million records per second. But there are more than a few telcos, financial services firms, retailers, medical researchers and others that handle super-high volumes of data every day. Enter Talend Integration Suite MPx, a new, highly scalable data integration product incorporating MapReduce architecture and massively parallel processing to extract, transform and load vast data sets within tight time constraints.

Based on the open-source Talend Integration Suite, MPx adds two sets of features to support extreme scalability. First, Talend's FileScale technology is said to use a MapReduce architecture to perform arithmetic functions and sort, filter, merge, aggregate and transform data with optimized performance on supported hardware platforms.

"FileScale lets you take advantage of the entire [hardware] stack -- multi-CPU architectures and multicore processors -- to execute extremely fast operations on data sets," says Yves de Montcheuil, Talend's vice president of marketing. "The product also uses MapReduce, which is the technology Google uses to process Internet search queries very rapidly. The sorting, aggregation, calculation and transformation of data [during integration] are not that different than the processing Google does on Web page indexes."
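The map/shuffle/reduce pattern de Montcheuil describes can be illustrated with a minimal, single-process Python sketch. It is not Talend's FileScale implementation. The record layout (a region key with an amount value) and all function names are hypothetical, chosen only to show how sort, aggregate and average operations decompose into map and reduce phases.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical record layout: (region, amount). Not a Talend data structure.
Record = tuple[str, float]
Partial = tuple[float, int]  # (partial sum, record count)


def map_phase(records: Iterable[Record]) -> list[tuple[str, Partial]]:
    """Map: emit one (key, (amount, 1)) pair per input record."""
    return [(region, (amount, 1)) for region, amount in records]


def shuffle(pairs: list[tuple[str, Partial]]) -> dict[str, list[Partial]]:
    """Shuffle: group intermediate values by key, as a MapReduce framework would."""
    groups: dict[str, list[Partial]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[Partial]]) -> dict[str, tuple[float, int, float]]:
    """Reduce: combine partials per key into (total, count, average), keys in sorted order."""
    result = {}
    for key in sorted(groups):
        total = sum(s for s, _ in groups[key])
        count = sum(c for _, c in groups[key])
        result[key] = (total, count, total / count)
    return result


def map_reduce_aggregate(records: Iterable[Record]) -> dict[str, tuple[float, int, float]]:
    """Run the three phases end to end."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0), ("north", 1.0)]
    for region, (total, count, avg) in map_reduce_aggregate(sample).items():
        print(f"{region}: total={total} count={count} avg={avg}")
```

Running it prints one line per region in sorted order, starting with `east: total=30.0 count=2 avg=15.0`. The map phase emits (sum, count) pairs rather than averages because sums and counts can be combined in any order across parallel workers, whereas an average of averages is generally wrong.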

Second, MPx is said to employ multiple levels of massive parallelization, breaking data sets down into many parallel processing streams and exploiting parallel database loaders.

"Once you've processed the data extremely quickly and need to load into, say, Teradata or Oracle, MPx lets you take advantage of their multithreaded loaders," de Montcheuil explains.
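The partition-then-process approach behind these parallel streams, followed by a parallel load step, can be approximated with Python's standard `concurrent.futures` module. This is a hedged sketch, not Talend's engine: the hash-partitioning scheme, the stream count (`NUM_STREAMS`), the `transform` rule and the `load_batch` stand-in for a database's multithreaded loader are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_STREAMS = 4  # hypothetical degree of parallelism

Record = tuple[str, float]


def partition(records: list[Record], num_streams: int = NUM_STREAMS) -> list[list[Record]]:
    """Hash-partition records by key so every record with a given key lands in one stream.

    crc32 is used instead of hash() because Python's str hash is randomized per process.
    """
    streams: list[list[Record]] = [[] for _ in range(num_streams)]
    for key, value in records:
        streams[crc32(key.encode("utf-8")) % num_streams].append((key, value))
    return streams


def transform(stream: list[Record]) -> list[Record]:
    """Per-stream work: drop negative values and round the rest to two decimals."""
    return [(key, round(value, 2)) for key, value in stream if value >= 0]


def run_pipeline(records: list[Record]) -> tuple[list[Record], int]:
    """Transform each partition in parallel, then load the results with parallel workers."""
    target: list[Record] = []  # stands in for a database table
    lock = threading.Lock()

    def load_batch(batch: list[Record]) -> int:
        # Hypothetical stand-in for one thread of a multithreaded bulk loader.
        with lock:
            target.extend(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=NUM_STREAMS) as pool:
        transformed = list(pool.map(transform, partition(records)))
        loaded = sum(pool.map(load_batch, transformed))
    return target, loaded


if __name__ == "__main__":
    rows = [("a", 1.234), ("b", -1.0), ("c", 2.0), ("a", 3.0)]
    table, count = run_pipeline(rows)
    print(count, sorted(table))
```

With the sample rows, three records survive the filter and are loaded. Threads keep the sketch short, but CPython's global interpreter lock means CPU-bound transforms would not actually speed up this way; an engine aiming at the throughput described here would rely on processes or native threads across cores.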

MPx will compete with high-end data-integration vendors including Ab Initio, with its Co>Operating System, and IBM, which offers DataStage PX. In contrast to MPx, which can process integrations developed on the standard Talend platform, DataStage PX is not compatible with conventional DataStage integration routines, de Montcheuil asserts.

MPx-supported hardware platforms include 32-bit and 64-bit Windows servers, Solaris and OpenSolaris (SPARC and Intel x86), IBM AIX, HP-UX and 32-bit and 64-bit Linux servers. Benchmark tests on a high-end but far-from-exotic Sun Blade X6270 server, with two 2.26 GHz quad-core Xeon 5520 processors and 24 GB of RAM, reportedly produced strong results: sorting, aggregation and averaging ran at 200,000 to 400,000 records per second when reading data from disk, and at up to one million records per second when processing data in memory.

"These speeds were achieved on a single server with a dual CPU, and it was done with standard MPx software that was not fine-tuned for the data processed," de Montcheuil points out.
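To put the reported rates in perspective, a quick back-of-the-envelope calculation converts records per second into wall-clock time. The rates are the benchmark figures quoted above; the 100-million-record job size is a hypothetical example, and the calculation assumes throughput stays constant for the entire run.

```python
# Benchmark rates reported in the article, in records per second.
REPORTED_RATES = {
    "disk, low end": 200_000,
    "disk, high end": 400_000,
    "in memory": 1_000_000,
}


def seconds_to_process(num_records: int, records_per_second: int) -> float:
    """Wall-clock time for a job, assuming constant throughput for the whole run."""
    return num_records / records_per_second


if __name__ == "__main__":
    workload = 100_000_000  # hypothetical job size: 100 million records
    for label, rate in REPORTED_RATES.items():
        secs = seconds_to_process(workload, rate)
        print(f"{label}: {secs:,.0f} s (about {secs / 60:.1f} min)")
```

For the hypothetical job, this works out to 500 seconds and 250 seconds at the two disk-based rates and 100 seconds in memory. Real jobs seldom sustain peak throughput end to end, so these are best-case figures rather than estimates.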

According to Talend, its open-source integration software has more than 250,000 users, and its commercially supported software has 500 licensed corporate customers with thousands of users.

Talend Integration Suite MPx is available immediately and is said to cost about $100,000 for a typical deployment. Pricing depends on the number of users.
