Adding external data sources to your analytics and machine learning initiatives can provide new dimensions of insights. Here are some sources of data you can tap.

Jessica Davis, Senior Editor

April 12, 2019

Finding data for your analytics and machine learning initiatives has generally not been a problem for most organizations. Enterprise organizations collect data as an operational part of doing business. There are transactions, customer records, ERP, CRM, financials, human capital management, and more. Your organization has gathered metrics from website visits and marketing email responses. You already have plenty of data that can fuel your data, analytics, or machine learning initiatives.

But if you are using the same data sources you've always used, you may not be getting the range and dimensions of insights that could be available to you. It may be time to consider going beyond those traditional in-house sources to tap alternatives.

"Your largest sources of data aren't those you own," said Lydia Clougherty Jones, a research director in the Gartner data and analytics group, speaking at the recent Gartner Data and Analytics Summit in Orlando, Florida. "They are the ones that are out there in the data ecosystem."

There's a whole world of unexplored data sources out there, if you haven't yet made use of anything from outside your organization. A Gartner survey from about a year ago revealed that just under half of organizations were tapping external data sources.

Jones groups alternative data sources into seven categories -- enterprise, dark, open, web, social, partner, and syndicated. Here's how she defines each of those.

Enterprise data is actually the kind of data you already have -- data about customers, suppliers, partners, and employees that is already readily accessible. This could be transactional data or manufacturing supply chain data.

Dark data is also data that is already available internally to your organization. This is data that was used for a single purpose and then forgotten about or archived. It includes emails, contracts, documents, multimedia, system logs, or other intellectual property.

"Parsing, tagging, linking, or otherwise structuring or extracting usable data from these sources can offer the greatest immediate opportunity," said Jones. One potential analytics use case for this data is helping to identify insurance fraud.
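As a rough illustration of what "parsing" and "structuring" dark data can look like in practice, here is a minimal Python sketch that turns raw system log lines into structured records and tallies activity per user. The log format, the field names, and the sample lines are all hypothetical and invented for this example; real logs would need their own pattern.

```python
import re
from collections import Counter

# Hypothetical log format, assumed purely for illustration:
# "<date> <time> <LEVEL> user=<name> action=<name>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"user=(?P<user>\S+) "
    r"action=(?P<action>\S+)"
)


def parse_log_lines(lines):
    """Turn raw log lines into dictionaries, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


def count_actions_by_user(records):
    """Count how often each (user, action) pair appears in the parsed records."""
    return Counter((record["user"], record["action"]) for record in records)


# Invented sample data; not taken from any real system.
sample_lines = [
    "2019-04-01 09:15:02 INFO user=alice action=claim_submitted",
    "2019-04-01 09:17:45 INFO user=alice action=claim_submitted",
    "2019-04-01 10:02:11 WARN user=bob action=login_failed",
    "malformed line that the parser should ignore",
]

records = parse_log_lines(sample_lines)
action_counts = count_actions_by_user(records)
```

Once the records are structured this way, they can be counted, filtered, or joined with other data. For example, an analyst could look for unusually frequent claim submissions from a single user.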

Open data is another alternative data source. Jones said governments have begun opening their data up to the public as a matter of principle or mandate. There are an estimated 10 million such data sets available worldwide. These data sets can include data about the economy, labor, the population, health and welfare, citizen services, infrastructure, and more. This data may also have commercial value, particularly if you combine it with your own data or other external data sources. A potential use case here is retail chains leveraging this data to determine the best locations for new stores.
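To make the idea of combining open data with your own data more concrete, the following Python sketch joins a population table, of the kind a government open data portal might publish, with in-house sales totals by postal area. It then ranks areas by revenue per 1,000 residents. Every figure, postal code, and function name here is a made-up assumption for illustration, and a real site-selection analysis would weigh many more factors.

```python
# Hypothetical open data: residents per postal area (invented figures).
open_population = {"10001": 21000, "10002": 81000, "10003": 56000}

# Hypothetical in-house data: annual revenue per postal area (invented figures).
# Area 10003 is missing because the chain has no sales there yet.
in_house_sales = {"10001": 4200.0, "10002": 6500.0}


def combine_population_and_sales(population, sales):
    """Join the two data sets on postal area and compute revenue per 1,000 residents."""
    combined = {}
    for area, residents in population.items():
        if residents <= 0:
            continue  # skip areas with no usable population figure
        revenue = sales.get(area, 0.0)
        combined[area] = {
            "residents": residents,
            "revenue": revenue,
            "revenue_per_1k": revenue * 1000 / residents,
        }
    return combined


def rank_underserved_areas(combined):
    """Return postal areas ordered from lowest to highest revenue per 1,000 residents."""
    return sorted(combined, key=lambda area: combined[area]["revenue_per_1k"])


combined = combine_population_and_sales(open_population, in_house_sales)
ranking = rank_underserved_areas(combined)
```

In this sketch, areas at the front of `ranking` have many residents relative to current sales, which is one simple signal that a location might be worth a closer look.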

Web data is data that you scrape from websites, often to track the activities of competitors, partners, suppliers, and others, Jones said. You may want to track a competitor's pricing, for instance, or keep track of their job postings. Jones noted that there is a growing marketplace of web content harvesting tools including Connotate, Mozenda, Kofax, Import.io, and DeiXTo.

Social media data is another growing source of data for organizations. This can include content from posts on Twitter, Facebook, LinkedIn, Instagram, Pinterest, YouTube, Reddit, blogs, review sites, and more. Organizations can use these sources to focus on consumer sentiment and trends and get a better sense of brand awareness and consumer engagement. Plus, they can monitor their reputation. Tools to help include Meltwater, Clarabridge, Synthesio, Brandwatch and Zoho, among others.

Partner data comes from suppliers and resellers and may include data about sales, inventory, capacity, forecasts, product or equipment specifications, and customers. Jones said that many companies give this data away already, but some are considering charging for it or using it to barter.

Syndicated data is the final source of the seven. This data comes from data brokers or marketplaces and could include consumer data, financial data, weather data, images, market intelligence, product master or reference data, and industry-specific data, according to Jones. She said there are thousands of data brokers now and the market for data exchanges is just getting started. These will connect buyers and sellers of proprietary data in the years to come.

Jones recommends that enterprises set up a practice for identifying and procuring data sources. Such a team within the larger organization can navigate legal questions, ownership, and rights. It will also have the expertise to determine the value of the data.

"Identify the range of internal and external data sources of value to your organization," Jones said. "Explore the variety of potential use cases for these data sources."

About the Author(s)

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology. Follow her on Twitter: @jessicadavis.
