Trulia transformed its data stack to accommodate real-time high-volume data collection, and to provide customers with recommendations. Now, to gain greater reliability and elasticity, the organization plans to migrate its data operations to AWS.

Jessica Davis, Senior Editor

December 13, 2016

(Image: Andy Dean Photography/Shutterstock)

Digital transformations are top of mind for many traditional enterprises today as they look to replicate some of the practices that have made upstarts like Uber and Netflix successful.

Real estate web site Trulia isn't that old by corporate standards. Founded in 2004, Trulia could be considered a digital native, since its primary presence has always been on the web. But a lot has changed since 2004. For instance, the iPhone was introduced just three years after Trulia was founded. And the widespread use of mobile phones and apps, along with the consumer behavioral information they generate, has created both headaches and opportunities for data engineers, IT organizations, and business analysts.

So when Deep Varma joined Trulia as VP of Data Engineering about two and a half years ago (the same year that Zillow announced its acquisition of Trulia), his charter was to transform Trulia to be more data-driven -- to use that data to be proactive rather than defensive.

"Our goal remains the same -- how do we provide an amazing experience to our consumer," he said. "With more consumer growth and more engagement of consumers, we were collecting so much more data." Varma brought years of experience at companies like Yahoo, IBM, and a host of startups, to the job.

The ingredients to make that data-driven transformation were already in place, Varma told InformationWeek in an interview. The company had started using big data technology about 6 years ago, including Hadoop and Java.

[Trulia parent Zillow explained its data stack at a recent Strata + Hadoop event. Read Zillow Uses Analytics, Machine Learning To Disrupt With Data.]

But Trulia's new strategy to use that data proactively meant going beyond just providing a real estate search service to consumers. Varma's team wanted to provide more of a recommendation engine, giving consumers the personalized results they wanted before those consumers had even searched for them.

"We have built our own recommender system which surfaces listings to the consumer," Varma told InformationWeek. "We have built click-through models, which helps us measure their success."

Specifically, Trulia is looking to leverage AI and computer vision "to provide unique insights to consumers at their fingertips," Varma said.

In this case, computer vision refers to the effort to train computers to think and act like human beings when it comes to visual information, Varma said. It includes image recommendation systems. To help get Trulia to this corporate vision, Varma augmented Trulia's existing stack of big data technologies.

Since Varma joined Trulia, his team of more than 70 data engineers, data scientists, software engineers, and DevOps pros has introduced Apache Kafka, Apache Spark, and microbatching for real-time processing. The organization also uses NoSQL data stores such as Redis, along with Apache Solr for search. The team has transitioned from Python to Cython, which enables writing C extensions for Python. It has also implemented Presto, a distributed SQL query engine, and has migrated processing from CPUs (central processing units) to GPUs (graphics processing units), Varma said.
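For readers unfamiliar with microbatching, the idea is to process a continuous stream of events in small groups rather than one at a time or in large periodic jobs. The following is a minimal illustrative sketch in plain Python, not Trulia's implementation. The function name `micro_batches` and the sample click events are hypothetical, and batches are formed by count to keep the example deterministic, whereas stream processors such as Spark Streaming typically form them by time interval.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of events into small batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


# Hypothetical click events; in production these might arrive from a queue.
events = [{"listing_id": i % 3, "clicked": i % 2 == 0} for i in range(7)]
batch_sizes = [len(b) for b in micro_batches(events, batch_size=3)]
print(batch_sizes)  # [3, 3, 1]
```

Each batch can then be handed to a downstream step, such as updating counters or scoring listings, which trades a small amount of latency for much lower per-event overhead.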

Around the time Varma joined, Trulia created its own colocated data center. But in 2016, the company embarked on another big change -- migrating to AWS. Varma said Trulia is looking to gain the scalability, reliability, elasticity, and innovation that come with a cloud-based system. That means moving the operation of the data center away from Trulia's internal IT and into an infrastructure-as-a-service outsourcing scenario.

"We want to keep innovating faster rather than depending on operational people," Varma said.  The goal is to eventually move the entire data engineering operation to Amazon's cloud, but that will take time, he said.

"I believe for a while we will have a hybrid solution," he said. "These are not simple systems when you have millions of consumers."

The biggest challenge in transforming to a data-driven business has been scaling the systems, Varma said. Other challenges have been aligning the teams to move in the right direction, building the personalization platform, and thinking from a consumer point of view.

"I think 2016 has been an amazing foundational year," Varma said. "Our challenges will be greater in 2017 when we are going to make sure our personalization platform extends our footprint across all product development."

Varma's plan for the future of the product includes augmented reality, too, all part of the goal of "making our consumer's experience amazing."

About the Author(s)

Jessica Davis

Senior Editor

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology. Follow her on Twitter: @jessicadavis.
