Big Data Hot Job: Data Engineer - InformationWeek


9/10/2014
12:10 PM

Big data teams need data engineers as well as data scientists, says Trifacta's CTO. Here's how to distinguish between the two.


It's been said many times that the key to a successful big data strategy is to hire a team of data-savvy individuals, each with a particular skill set, rather than trying to find a single data scientist adept at multiple disciplines, including computer science, mathematics, and domain expertise.

Make sure one of your hires is a data engineer, said Sean Kandel, cofounder and chief technical officer of Trifacta, a big data software startup whose data transformation platform enables less technical users to quickly visualize and analyze large and varied data sets. Trifacta's business partners in the Hadoop space include Cloudera, Hortonworks, and Tableau Software.

In a phone interview with InformationWeek, Kandel explained how the role of data engineer is essential to the success of an organization's big data effort. To understand what a data engineer does, it's important to distinguish the position from that of data scientist.

"In many companies, the data engineer is responsible for setting up systems and processes that other data workers -- including in many cases data scientists -- need to use and rely on to be successful to work with data,"  said Kandel.


A lot of the data engineer's work is focused on building out systems, architectures, and platforms. 

"The data engineer will look at [ways] to take insights and operationalize them so that you can have day-to-day impacts on your business," Kandel said.

He added: "In a lot of organizations, data engineers are oftentimes responsible for finding data that's relevant for analysis... in a way that's meaningful and suitable for that specific task." In addition, they're in charge of integrating data from a variety of sources.


Data scientists often have engineering backgrounds, too, but their work is generally geared toward discovering new insights or building models. A data scientist sometimes fills the role of data engineer as well, although that approach may not deliver the best ROI.  

On a data science team, however, individual roles aren't always set in stone. Team members may perform duties based on their individual skills, background, availability, and other factors.

"A lot of times it's fairly fluid," Kandel said. "You see teams of people working together" rather than performing rigidly defined tasks.

Trifacta's data transformation tools are designed to help simplify the data engineer's job of culling relevant data from a number of different sources, he added.

"A lot of times that's still done today through writing code and scripting languages like Pig, Hive, or Python. Our tools enable data engineers to quickly perform those types of data transformations in a much more visual, graphical interface, while still getting all of the benefits, such as scale, that you'd get by writing code by hand."

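For readers unfamiliar with the hand-scripted approach Kandel describes, the following is a minimal Python sketch of that kind of transformation work: cleaning records from two sources and joining them for analysis. Everything in it is an assumption made for illustration (the sample data, the column names, and the helper functions `load_customers`, `total_spend`, and `integrate`); it does not represent Trifacta's tools or any particular company's pipeline.

```python
import csv
import io

# Hypothetical raw exports from two source systems (invented sample data).
CRM_CSV = """customer_id,signup_date,country
 101 ,2014-01-15,us
102,2014-02-03,DE
"""

ORDERS_CSV = """cust,amount_usd
101,19.99
101,5.01
102,42.50
103,7.00
"""


def load_customers(text):
    """Parse the CRM export, normalizing IDs and country codes."""
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        cust_id = int(row["customer_id"].strip())
        customers[cust_id] = {
            "signup_date": row["signup_date"].strip(),
            "country": row["country"].strip().upper(),
        }
    return customers


def total_spend(text):
    """Sum order amounts per customer, in integer cents to avoid float drift."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        cust_id = int(row["cust"].strip())
        cents = round(float(row["amount_usd"]) * 100)
        totals[cust_id] = totals.get(cust_id, 0) + cents
    return totals


def integrate(customers, totals):
    """Join the two sources on customer ID, keeping only known customers."""
    return [
        {"customer_id": cid, **info, "total_cents": totals.get(cid, 0)}
        for cid, info in sorted(customers.items())
    ]


records = integrate(load_customers(CRM_CSV), total_spend(ORDERS_CSV))
for record in records:
    print(record)
```

Even in this toy case, most of the code deals with inconsistencies between sources (stray whitespace, mixed-case codes, mismatched column names, orders with no matching customer) rather than with analysis itself.
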
Then again, a data scientist may be handling these tasks instead.

"In some organizations you'll see data scientists perform things that might be done [elsewhere] by data engineers," Kandel said. "But when you look across an organization -- at all of the different use cases and how quickly use cases for data are popping up -- it really requires a dedicated team or role that's focused on enabling multiple end users to work with data quickly."


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
pfretty,
User Rank: Ninja
9/12/2014 | 10:59:34 AM
Data culture
When it comes down to it, the more organizations embrace and leverage data to enhance operations and improve customer experiences (top two goals according to an IDG SAS survey), the more important it is to spread the focus and knowledge around to all the information users. I think there are benefits to having data scientists and data engineers, but where data is actually applied is far closer to the front line. These are the decision makers who need the skills and tools the most.

Peter Fretty

 
Gigi3,
User Rank: Ninja
9/11/2014 | 1:07:12 AM
Re: Puffed-up title or truly different skills?
"I've seen some cases of people just upgrading their titles without upgrading their skills. But people who have truly graduated to data engineer can manage Hadoop clusters, handle data processing on the platform, and identify, move and, perhaps, cleanse and normalize subsets of data of interest for deeper analysis by the data scientists."

Doug, for career growth, updating your skills is more important than your title. A title only implies responsibility, and it can change at any moment. Moreover, a job can be carried out through subordinates. But skills matter, and they stay important for as long as you are in the industry.
Gigi3,
User Rank: Ninja
9/11/2014 | 1:03:02 AM
Data Scientist
"It's been said many times that the key to a successful big data strategy is to hire a team of data-savvy individuals, each with a particular skill set, rather than trying to find a single data scientist adept at multiple disciplines, including computer science, mathematics, and domain expertise."

Jeff, that's a good option. People with different skill sets can then work together toward a single goal.
Laurianne,
User Rank: Author
9/10/2014 | 4:24:59 PM
Re: Puffed-up title or truly different skills?
Self-upgrading of titles happens when a field is hot.
Doug Henschen,
User Rank: Moderator
9/10/2014 | 12:43:38 PM
Puffed-up title or truly different skills?
In the legacy enterprise realm this data engineer/data scientist split would be akin to the data-management types (DBAs and ETL/data-integration professionals) versus the data analysts and analytics professionals (handling BI and data mining, respectively).

I've seen some cases of people just upgrading their titles without upgrading their skills. But people who have truly graduated to data engineer can manage Hadoop clusters, handle data processing on the platform, and identify, move and, perhaps, cleanse and normalize subsets of data of interest for deeper analysis by the data scientists. Data scientists, meanwhile, can write algos and develop data-driven applications from scratch whereas old-school data miners are more likely familiar with SAS, SPSS and perhaps R-based algorithms that can be called and tested in building models on supported workbenches or studios.

The old skills are still in demand, but the new skills are much rarer and sought after by pioneering data-driven organizations exploiting varied and high-scale data types not stored in old-school data warehouses. 