Nvidia CEO Teases ‘Revolutionary’ AI PDF Reader Release

As data-hungry artificial intelligence continues to place high demands on enterprises, Nvidia will soon release a reader that makes the ubiquitous PDF file format easier for digital ingestion, a development experts say could be an AI game changer.

Shane Snider, Senior Writer, InformationWeek

October 24, 2024

Nvidia CEO Jensen Huang runs out on stage to address the crowd at the Gartner IT Symposium/Xpo 2024. Photo by Shane Snider

Nvidia CEO Jensen Huang slyly dropped a big piece of news during his keynote Tuesday at Gartner’s IT Symposium/Xpo. The GPU-maker plans to release an AI-enabled PDF (portable document format) reader.

“Nvidia is just about to announce a revolutionary PDF reader … PDF, as you know, is really, really difficult to understand for an AI … so there’s a whole bunch of things you can do there,” Huang said during the talk. An Nvidia spokesperson declined to comment beyond Huang’s statement.

Adobe created the proprietary PDF format in 1993, and in 2008 it became an open standard. The format was meant to serve as a universal electronic document format, and it was quickly adopted by many users. Everyone from students and researchers to institutions, businesses, and governments likely has a huge stockpile of PDF documents.

In 2015, Adobe estimated that there could be as many as 2.5 trillion PDF files in existence.

That’s a lot of data that could be very useful for machine learning and AI. While tools exist that can extract images and text from the format, AI has been unable to directly scrape information from PDFs. If Nvidia has accomplished that goal, the implications for institutions and businesses could be monumental, and the result could be a massive leap forward for data science and machine learning.


For a company, synthesizing data is a crucial step toward AI adoption, and mountains of usable data could be bound up in PDF format.

The Value of PDF to CDOs

PDFs are currently categorized as unstructured data, which is not immediately usable for use cases like business analytics. Enterprises are currently racing to synthesize both structured and unstructured data for multiple uses as they adopt GenAI.

While it’s possible with tools like GPT-4, large language models in general have difficulty synthesizing PDFs without major mistakes and hallucinations. There are techniques to extract the data, but they are time-consuming and labor-intensive. An effective and speedy solution for PDF synthesis would save major cost and time in using this unstructured data.

Allison Sagraves, the former CDO of M&T Bank who now runs her own consultancy, sees tremendous value in advancements in AI’s capabilities to read PDF files.

“We’re on the verge of a ‘Let there be light’ moment in the world of data,” she tells InformationWeek in an email interview. “The most valuable information we possess is often ‘dark data’ -- the vast, untapped insights hidden within documents, contracts, filings, and financial statements. This overlooked data has the power to reshape how we understand everything, from customer behavior to market risks and emerging opportunities. Until now, much of it has been in the shadows.”


Other companies have made gains in unstructured data. Salesforce recently released a product to sift through unstructured data via its newly launched Data Cloud Vector Base. But further advancements could push faster results that lead to a better return on investment.

“With cutting-edge AI, we’re beginning to shed light on these obscured insights,” Sagraves says. “Nvidia is at the forefront, developing sophisticated models capable of drawing connections across entire document ecosystems. The potential impact is profound … The capabilities we’ve long dreamed of -- like true hyper-personalization at scale -- are now within reach.”

Beyond Enterprise

While the implications of a truly revolutionary advancement in unstructured data capabilities are exciting to enterprises, the impact on humanity at large shouldn’t be understated, experts say.

Disha Harjani, a GenAI consultant, formerly with Shutterstock and Adobe, says a new technology that would easily read PDFs would be beneficial to businesses and the general public alike.

"This can lead to more technology that could ingest large text documents and parse them out by header, footer, body, main point of the paper, and more,” she tells InformationWeek in an email interview. “This could help build products that can 'templatize' various types of writing projects or even products for industries that have enormous historical archives, such as medical and anthropological."


About the Author

Shane Snider

Senior Writer, InformationWeek

Shane Snider is a veteran journalist with more than 20 years of industry experience. He started his career as a general assignment reporter and has covered government, business, education, technology and much more. He was a reporter for the Triangle Business Journal, Raleigh News and Observer and most recently a tech reporter for CRN. He was also a top wedding photographer for many years, traveling across the country and around the world. He lives in Raleigh with his wife and two children.
