Graph-Based AI Enters the Enterprise Mainstream


Graph AI is becoming fundamental to anti-fraud, sentiment monitoring, market segmentation, and other applications where complex patterns must be rapidly identified.

James Kobielus, Tech Analyst, Consultant and Author

February 16, 2021



Artificial intelligence (AI) is one of the most ambitious, amorphous, and comprehensive visions in the history of automated information systems.

Fundamentally, AI’s core approach is to model intelligence -- or represent knowledge -- so that it can be executed algorithmically in general-purpose or specialized computing architectures. AI developers typically build applications through an iterative process of constructing and testing knowledge-representation models to optimize them for specific outcomes.

AI’s advances move in broad historical waves of innovation, and we’re on the cusp of yet another. Starting in the late 1950s, the first generation of AI was predominantly anchored in deterministic rules for a limited range of expert systems applications in well-defined solution domains. In the early years of this century, AI’s next generation came to the forefront, grounded in statistical models -- especially machine learning (ML) and deep learning (DL) -- that infer intelligence from correlations, anomalies, and other patterns in complex data sets.

Graph data is a key pillar of the post-pandemic “new normal”

Building on but not replacing these first two waves, AI’s future focuses on graph modeling. Graphs encode intelligence in the form of models that describe the linked contexts within which intelligent decisions are executed. They can illuminate the shifting relationships among users, nodes, applications, edge devices and other entities.

Graph-shaped data forms the backbone of our “new normal” existence. Graph-shaped business problems encompass any scenario in which one is more concerned with relationships among entities than with the entities in isolation. Graph modeling is best suited to complex relationships that are flattened, federated, and distributed, rather than hierarchically patterned.

Graph AI is becoming fundamental to anti-fraud, influence analysis, sentiment monitoring, market segmentation, engagement optimization, and other applications where complex patterns must be rapidly identified.

We find applications of graph-based AI anywhere there are data sets that are intricately connected and context-sensitive. Common examples include:

Mobility data, for which graphs can map the “intelligent edge” of shifting relationships among linked users, devices, apps, and distributed resources;
Social network data, for which graphs can illuminate connections among people, groups, and other shared content and resources;
Customer transaction data, for which graphs can show interactions between customers and items for the purpose of recommending products of interest, as well as detect shifting influence patterns among families, friends, and other affinity groups;
Network and system log data, for which connections between source and destination IP addresses are best visualized and processed as graph structures, making this technology very useful for anti-fraud, intrusion detection, and other cybersecurity applications;
Enterprise content management data, for which semantic graphs and associated metadata can capture and manage knowledge among distributed virtual teams;
Scientific data, for which graphs can represent the physical laws, molecular structures, biochemical interactions, metallurgic properties, and other patterns to be used in engineering intelligent and adaptive robotics;
The Internet of Things (IoT), for which graphs can describe how the “things” themselves -- such as sensor-equipped endpoints for consumer, industrial, and other uses -- are configured in nonhierarchical grids of incredible complexity.
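To make the log-data example above concrete, here is a minimal sketch of turning source/destination IP pairs into a directed graph and counting connections per node. The IP addresses and the function names (`build_graph`, `fan_out`, `fan_in`) are hypothetical and chosen only for illustration. A real deployment would use a graph database rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical log records as (source IP, destination IP) pairs.
# These values are made up for illustration only.
LOG_RECORDS = [
    ("10.0.0.1", "10.0.0.9"),
    ("10.0.0.2", "10.0.0.9"),
    ("10.0.0.1", "10.0.0.7"),
    ("10.0.0.3", "10.0.0.9"),
    ("10.0.0.1", "10.0.0.8"),
]


def build_graph(records):
    """Return a directed adjacency map: source IP -> set of destination IPs."""
    graph = defaultdict(set)
    for src, dst in records:
        graph[src].add(dst)
    return graph


def fan_out(graph):
    """Count the distinct destinations contacted by each source."""
    return {src: len(dsts) for src, dsts in graph.items()}


def fan_in(graph):
    """Count the distinct sources that contacted each destination."""
    counts = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            counts[dst] += 1
    return dict(counts)


graph = build_graph(LOG_RECORDS)
out_counts = fan_out(graph)
in_counts = fan_in(graph)
```

Even this toy version hints at why the graph view is useful for intrusion detection: a source with unusually high fan-out, or a destination with unusually high fan-in, stands out as a structural pattern rather than as a property of any single log row.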

Graph AI is coming fast to enterprise data analytics

Graphs enable great expressiveness in modeling, but also entail considerable computational complexity and resource consumption. We’re seeing more enterprise data analytics environments that are designed and optimized to support extreme-scale graph analysis.

Graph databases are a key pillar of this new order. They provide APIs, languages, and other tools that facilitate the modeling, querying, and writing of graph-based data relationships. And they have been coming into enterprise cloud architecture over the past two to three years, especially since AWS launched Neptune and Microsoft Azure launched Cosmos DB, each of which introduced graph-based data analytics to its cloud customer base.

Riding on the adoption of graph databases, graph neural networks (GNNs) are an emerging approach that leverages statistical algorithms to process graph-shaped data sets. From an R&D standpoint, however, GNNs are not entirely new. Research in this area has been ongoing since the early ’90s, focused on fundamental data science applications in natural language processing and other fields with complex, recursive, branching data structures.
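For readers who want a feel for what processing graph-shaped data involves, the sketch below shows one round of neighborhood mean aggregation, the message-passing step that many GNN layers build on. It is deliberately simplified: it omits the learned weight matrices and nonlinearities that a trainable GNN layer would apply. The three-node graph, the feature values, and the function name `mean_aggregate` are assumptions made for illustration.

```python
def mean_aggregate(features, neighbors):
    """One round of message passing without learned weights.

    Each node's new feature vector is the element-wise mean of its own
    vector and the vectors of its neighbors.
    """
    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in neighbors.get(node, [])]
        dim = len(own)
        updated[node] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return updated


# Hypothetical toy graph: a - b - c (undirected, stored as neighbor lists).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

h1 = mean_aggregate(features, neighbors)
```

After one round, each node's vector blends in information from its immediate neighbors. Repeating the step lets information travel further across the graph, which is how a GNN comes to reflect a node's wider context rather than the node in isolation.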

GNNs are not to be confused with the computational graphs of tensor operations of which ML/DL algorithms are composed. In a fascinating trend under which AI is helping to build AI, ML/DL tools such as neural architecture search and reinforcement learning are increasingly being used to optimize computational graphs for deployment on edge devices and other target platforms. Indeed, it’s probably a matter of time before GNNs are themselves used to optimize GNNs’ structures, weights, and hyperparameters in order to drive more accurate, speedy, and efficient inferencing over graph data.

In the new cloud-to-edge world, AI platforms will increasingly be engineered for GNN workloads that are massively parallel, distributed, in-memory, and real-time. Already, GNNs are driving some powerful commercial applications.

For example, Alibaba has deployed GNNs to automate product recommendations and personalized searches in its e-commerce platform. Apple, Amazon, Twitter, and other tech firms apply ML/DL to knowledge graph data for question answering and semantic search. Google’s PageRank models facilitate contextual relevance searches across collections of linked webpages that are modeled as graphs. And Google’s DeepMind unit is using GNNs to enable computer vision applications to predict what will happen over an extended time given a few frames of a video scene, without needing to code the laws of physics.
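As an illustration of ranking linked webpages modeled as a graph, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not a description of Google's production system. The page names, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank across all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
links = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
}
ranks = pagerank(links)
```

In this toy graph the "home" page ends up with the highest score because both other pages link to it, which is the intuition behind link-based relevance: a page matters more when other pages point to it.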

A key recent milestone in the mainstreaming of GNNs was AWS’ December 2020 release of Neptune ML. This new cloud service automates modeling, training, and deployment of artificial neural networks on graph-shaped data sets. It automatically selects and trains the best ML model for the workload, enabling developers to expedite the generation of ML-based predictions on graph data. Sparing developers from needing to have ML expertise, Neptune ML supports easy development of inferencing models for classifying and predicting nodes and links in graph-shaped data.

Neptune ML is designed to accelerate GNN workloads while achieving high predictive accuracy, even when processing graph data sets incorporating billions of relationships. It uses Deep Graph Library (DGL), an open-source library that AWS launched in December 2019 in conjunction with its SageMaker data-science pipeline cloud platform. First released on GitHub in December 2018, DGL is an open-source Python library for fast modeling, training, and evaluation of GNNs on graph-shaped data sets.

When using Neptune ML, AWS customers pay only for cloud resources used, such as the Amazon SageMaker data science platform, Amazon Neptune graph database, Amazon CloudWatch application and infrastructure monitoring tool, and Amazon S3 cloud storage service.

Graph AI will demand an increasing share of cloud computing resources

Graph analysis is still outside the core scope of traditional analytic databases and even beyond the ability of many Hadoop and NoSQL platforms. Graph databases are a young but potentially huge segment of enterprise big data analytics architectures.

However, that doesn't mean you have to acquire a new database in order to do graph analysis. You can, to varying degrees, execute graph models on a wide range of existing enterprise databases. That’s an important reason why enterprises can begin to play with GNNs now without having to shift right away to an all-new cloud computing or database architecture. Or they can trial AWS’ Neptune ML and other GNN solutions that we expect other cloud computing powerhouses to roll out this year.

If you’re a developer of traditional ML/DL, GNNs can be an exciting but challenging new approach to work with. Fortunately, ongoing advances in network architectures, parallel computation, and optimization techniques, as evidenced by AWS’ evolution of its Neptune offerings, are bringing GNNs more fully into the enterprise cloud AI mainstream.

Over the coming two to three years, GNNs will become a standard feature of most enterprise AI frameworks and DevOps pipelines. Bear in mind, though, that as graph-based AI is adopted by enterprises everywhere for their most challenging initiatives, it will prove to be a resource hog par excellence.

GNNs already operate at a massive scale. Depending on the amount of data, the complexity of models, and the range of applications, GNNs can easily become huge consumers of processing, storage, I/O bandwidth, and other big-data platform resources. If you're driving the results of graph processing into real-time applications, such as anti-fraud, you’ll need an end-to-end low-latency graph database.

GNN sizes are sure to grow by leaps and bounds. That’s because enterprise graph AI initiatives will undoubtedly become increasingly complex, the range of graph data sources will continually expand, workloads will jump by orders of magnitude, and low-latency requirements will become more stringent.

If you’re serious about evolving your enterprise AI into the age of graphs, you’re going to need to scale your cloud computing environment on every front. Before long, it will become common for GNNs to process graphs consisting of trillions of nodes and edges. All-in-memory massively parallel graph-database architectures will be de rigueur for graph AI applications. Cloud database architectures will evolve to enable faster, more efficient discovery, processing, querying, and analysis of an ever-widening range of graph data types and formats.
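A rough back-of-the-envelope calculation shows why in-memory graphs at this scale demand distributed architectures. The sketch below estimates the raw size of a bare edge list. The 8-byte node IDs and the 1 TiB of RAM per machine are assumptions picked for illustration, and the estimate ignores indexes, edge properties, node features, and replication, all of which would push real requirements higher.

```python
def edge_list_bytes(num_edges, bytes_per_node_id=8):
    """Raw size of an edge list that stores two node IDs per edge.

    Assumes fixed-width IDs and no indexes, properties, or overhead.
    """
    return num_edges * 2 * bytes_per_node_id


def machines_needed(total_bytes, ram_per_machine_bytes):
    """Smallest number of machines whose combined RAM holds total_bytes."""
    return -(-total_bytes // ram_per_machine_bytes)  # ceiling division


TRILLION = 10**12
TIB = 2**40  # assumed RAM per machine, purely hypothetical

size = edge_list_bytes(TRILLION)      # bytes for one trillion edges
machines = machines_needed(size, TIB)
```

Under these assumptions, one trillion edges occupy 16 trillion bytes before any overhead, which already exceeds the memory of a single 1 TiB machine many times over.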

Conceivably, as quantum AI platforms gain adoption in this decade, GNNs could become their showcase applications.
