Data visualization is undergoing a revolution, making complex data sets easier to understand and helping both experienced and inexperienced analysts form better conclusions and takeaways from those numbers.
A notable side effect of increased capabilities for data visualization is a push toward more complex modes of data collection and processing; if we’re able to understand complex data sets without needing substantial training or experience, we can apply those data processing standards to more areas.
Enter high dimensional analytics
In the era of big data, we’ve been able to collect and store more data points than ever before. Rather than relying on simple bits of information about key demographics and behaviors, we have access to hundreds, and sometimes thousands, of variables related to a given problem or outcome. For example, in medical research, characteristics including genetic predispositions, lifestyle factors, and demographic information may all play a role in whether a patient develops a condition (and how they respond to treatment). Each of these hundreds of variables may interact with any of the others, making it impractical to rely on simple correlational analysis of variable pairs or triplets.
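To get a rough sense of the scale involved, the number of possible pairs and triplets can be counted directly. The sketch below assumes a study with 500 variables, a hypothetical figure chosen only for illustration, and `count_interactions` is an illustrative helper rather than part of any library.

```python
from math import comb

def count_interactions(num_variables, group_size):
    """Number of distinct groups of `group_size` variables (order ignored)."""
    return comb(num_variables, group_size)

# Hypothetical study with 500 variables (an assumed figure, not from the article).
num_variables = 500
pairs = count_interactions(num_variables, 2)
triplets = count_interactions(num_variables, 3)
print(f"{pairs:,} pairs, {triplets:,} triplets")
```

Even at this modest size, there are more than a hundred thousand pairs and more than twenty million triplets to examine.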
It's difficult to imagine anything in more than three dimensions, but for computers, it’s relatively easy. In physics and computer science, mathematical models can be used to make calculations in higher dimensions, sometimes hundreds of dimensions, allowing us to crunch the numbers and uncover patterns. There’s only one significant obstacle to making this practical: visualizing the results.
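As a small illustration of how routine these calculations are for a computer, the sketch below measures the straight-line distance between two points in 100 dimensions using Python's standard library. The points themselves are made up for the example.

```python
import math

# Two made-up points in a 100-dimensional space (illustrative values only).
dimensions = 100
origin = [0.0] * dimensions
point = [1.0] * dimensions

# math.dist computes the Euclidean distance for any number of dimensions,
# as long as both points have the same length.
distance = math.dist(origin, point)
print(f"Distance in {dimensions} dimensions: {distance:.3f}")
```

The same call works unchanged for three dimensions or three hundred; only our ability to picture the result changes.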
Visualizing high dimensions
The simplest model of data visualization is also the first one most of us are introduced to: the bar graph, in which one set of variables is plotted on the horizontal x-axis, and another is plotted on the vertical y-axis. This is highly effective, but only extends to two dimensions of data.
Researchers have developed multiple techniques to push the limits of what we can visualize, and most of them focus on reducing the number of presentable dimensions, in some way, to three or four. It’s exceedingly difficult for humans to think conceptually in dimensions beyond what we’re familiar with (three spatial dimensions and one time-like dimension), so the solution is to find a way to efficiently translate high-dimensional findings into those dimensions. Sometimes, that means using analytics to filter out the “noise” within the variables, reducing them to only what’s most important. Other times, that means clustering variables together.
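One widely used reduction technique is principal component analysis (PCA). The article does not name a specific method, so PCA is an assumed stand-in here. The sketch below, in plain Python on synthetic data, finds the three directions of greatest variance with power iteration and projects six-variable records onto them. All function names and the data are hypothetical.

```python
import random

def center(rows):
    """Subtract each column's mean so every variable averages to zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(X):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(X), len(X[0])
    return [[sum(row[a] * row[b] for row in X) / (n - 1) for b in range(d)]
            for a in range(d)]

def top_component(C, rng, iters=200):
    """Power iteration: approximate the dominant eigenvector of C."""
    d = len(C)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along direction v.
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue

def pca_project(rows, k=3, seed=0):
    """Project rows onto their k approximate principal components."""
    rng = random.Random(seed)
    X = center(rows)
    C = covariance(X)
    d = len(C)
    components = []
    for _ in range(k):
        v, lam = top_component(C, rng)
        components.append(v)
        # Deflation: remove the direction just found before searching again.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[j] * v[j] for j in range(d)) for v in components] for x in X]

# Synthetic data: six observed variables driven by three hidden factors
# of decreasing strength, plus a little noise (assumed for illustration).
data_rng = random.Random(42)
rows = []
for _ in range(200):
    z1, z2, z3 = data_rng.gauss(0, 10), data_rng.gauss(0, 5), data_rng.gauss(0, 2)
    rows.append([z + data_rng.gauss(0, 0.1) for z in (z1, z1, z2, z2, z3, z3)])

projected = pca_project(rows, k=3)
print(len(projected), "points reduced from", len(rows[0]),
      "to", len(projected[0]), "dimensions")
```

In practice a numerical library would normally handle this step; the hand-rolled version is only meant to show that the reduction amounts to finding a few directions that capture most of the variation and discarding the rest.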
So how do three- and four-dimensional projections work? In three dimensions, you can add a third axis, perpendicular to both x and y, known as the z-axis, to turn your graph into a three-dimensional representation. Virtual systems allow for more in-depth interaction with these projections, especially when you layer in elements of augmented reality, allowing participants to see individual data points in a three-dimensional cross section the way they might see fish in an aquarium. If you use the progression of time to layer in a fourth dimension, you can introduce even more complexity.
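As a rough sketch of the time-as-fourth-dimension idea, each record can carry three spatial coordinates plus a timestamp, and the records can be grouped into per-timestep "frames" that a three-dimensional viewer would show in sequence. The record layout and the `frames_by_time` helper are assumptions for illustration; no particular visualization tool is implied.

```python
from collections import defaultdict

def frames_by_time(records):
    """Group (x, y, z, t) records into time-ordered frames of (x, y, z) points."""
    frames = defaultdict(list)
    for x, y, z, t in records:
        frames[t].append((x, y, z))
    return [(t, frames[t]) for t in sorted(frames)]

# Made-up measurements: three spatial coordinates and a timestep.
records = [
    (1.0, 2.0, 0.5, 0),
    (1.5, 2.5, 0.7, 0),
    (1.2, 2.1, 0.9, 1),
    (1.8, 2.4, 1.1, 1),
    (1.4, 2.0, 1.3, 2),
]

frames = frames_by_time(records)
for t, points in frames:
    print(f"t={t}: {len(points)} points")
```

Each frame is an ordinary three-dimensional scatter of points, so playing the frames in order adds the fourth dimension without asking the viewer to picture four axes at once.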
As an illustrative practical example, Google developers have used high dimensional analytics and visualization experimentally to “teach” a computer the meaning of language. Rather than giving the system any information about how words relate to each other, researchers “fed” it millions of examples of writing, and the system started mapping relationships in high dimensions to associate different types of words with one another. Researchers then used simplified three-dimensional models to visualize different areas of its findings, realizing it had successfully grouped words of similar meanings. For example, words that describe colors were grouped together, and words that describe numbers were grouped together.
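The grouping described above can be illustrated with a toy example. The four-dimensional vectors below are invented by hand and are not taken from Google's system or any trained model; real word vectors typically have hundreds of dimensions. The sketch uses cosine similarity, a common way to compare such vectors, to find each word's closest neighbor.

```python
import math

# Hand-made toy vectors (hypothetical values, for illustration only).
word_vectors = {
    "red":   [0.90, 0.10, 0.00, 0.20],
    "blue":  [0.80, 0.20, 0.10, 0.10],
    "green": [0.85, 0.15, 0.05, 0.15],
    "one":   [0.10, 0.90, 0.20, 0.00],
    "two":   [0.20, 0.80, 0.10, 0.10],
    "three": [0.15, 0.85, 0.15, 0.05],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_word(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

for word in word_vectors:
    print(word, "->", nearest_word(word, word_vectors))
```

With these toy values, each color word's nearest neighbor is another color and each number word's nearest neighbor is another number, which mirrors the kind of grouping the researchers observed.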
Challenges for high dimensional data visualization
Before you get too excited about being able to “see” how your customers change over time, or how productive your employees are, you should know there are some key limits and challenges for high dimensional data visualization.
Still, high dimensional data is our greatest asset in learning from data sets with hundreds of variables (or more). Once we learn to visualize it effectively, we’ll be able to intuit conclusions far more easily and naturally.