Data Debate

Security or privacy? Darpa's Total Information Awareness program tests the boundaries

John Foley, Editor, InformationWeek

March 4, 2003


More details about one of the world's most ambitious--and controversial--database projects will be released this week when the Defense Advanced Research Projects Agency issues a report to Congress on its Total Information Awareness program. The initiative has been a point of contention for privacy groups and others worried that Darpa or other government agencies will collect personal information on U.S. citizens in a national security effort to thwart terrorists through sophisticated data-analysis and collaboration techniques. If Darpa can overcome the privacy concerns, some experts say, its bleeding-edge prototype system just might work. "We're at the cusp of actually being able to do it," says data-warehousing consultant Bill Inmon, citing advances in storage, database, and microprocessor technologies, accompanied by continuing declines in hardware costs, that make such a large-scale undertaking conceivable.

Total Information Awareness represents a data-management challenge as bold as it is controversial. The goal, as Darpa's Information Awareness Office (headed by John Poindexter, former national security adviser in the Reagan administration) envisions it, is to sift through vast quantities of data contained in government and business databases, detect suspicious patterns of activity, identify the shady characters behind those actions, then find them before they can do any harm. The five-year research and development project would test the limits of database integration and scalability and require breakthroughs in language translation, pattern matching, and agency-to-agency collaboration. "It is anticipated this will require revolutionary new technology," Darpa said last year in a document soliciting bids from potential suppliers.

How would it work? According to public comments made by Darpa officials and information on the agency's Web site, the goal is to create an architecture capable of building a huge database that gets populated, with the help of automated processes, from existing databases. A database crawler would be used to recognize the data structures of source databases, facilitating the flow of data. Data-mining algorithms would be run against the collected data, and models would be generated in an attempt to predict terrorist behavior. Collaboration tools would link experts together for quick action when a terrorist's "signature" is suspected.
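The crawler idea can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not Darpa's design: it uses SQLite from the Python standard library, and the helper names (`discover_schema`, `crawl_into`, `build_demo`), the table names, and the sample rows are all hypothetical. Each source database's structure is read from its own catalog, and every row is copied into one central table tagged with where it came from.

```python
import json
import sqlite3


def discover_schema(conn):
    """Return {table_name: [column names]} by reading SQLite's own catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        schema[table] = [row[1] for row in
                         conn.execute(f'PRAGMA table_info("{table}")')]
    return schema


def crawl_into(central, source, source_name):
    """Copy every row of every source table into the central store as JSON."""
    copied = 0
    for table, cols in discover_schema(source).items():
        for row in source.execute(f'SELECT * FROM "{table}"'):
            record = dict(zip(cols, row))
            central.execute(
                "INSERT INTO records (source, table_name, payload) "
                "VALUES (?, ?, ?)",
                (source_name, table, json.dumps(record)))
            copied += 1
    central.commit()
    return copied


def build_demo():
    """Build two hypothetical source databases and crawl both into one store."""
    airline = sqlite3.connect(":memory:")
    airline.execute("CREATE TABLE bookings (passenger TEXT, flight TEXT)")
    airline.execute("INSERT INTO bookings VALUES ('A. Example', 'XX100')")

    pharmacy = sqlite3.connect(":memory:")
    pharmacy.execute("CREATE TABLE purchases (item TEXT, quantity INTEGER)")
    pharmacy.executemany("INSERT INTO purchases VALUES (?, ?)",
                         [("bandages", 2), ("aspirin", 1)])

    central = sqlite3.connect(":memory:")
    central.execute(
        "CREATE TABLE records (source TEXT, table_name TEXT, payload TEXT)")
    total = (crawl_into(central, airline, "airline")
             + crawl_into(central, pharmacy, "pharmacy"))
    return central, total
```

The sketch stores each row as a JSON payload so that sources with different columns can share one table. A production system would need far more than this, including type mapping, access control, and incremental updates.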

"One goal is to develop ways of treating the worldwide, distributed legacy databases as if they were one centralized database," Poindexter said last year. Total Information Awareness was launched in fiscal 2003 with $10 million in initial funding, but related projects predate the terrorist attacks of Sept. 11, 2001. Darpa will spend an estimated $240 million on the combined projects from 2001 through 2003. Prototypes of the system would be turned over to other Department of Defense agencies for adoption.

Illustration by Michael Morgenstern
While the project relies heavily on experimental technologies, such as "awareness-enabled coordination software" for intelligence analysts to be developed by Telcordia Technologies Inc., Darpa is also borrowing technologies and best practices from the business world. Documents obtained in February by the nonprofit Electronic Privacy Information Center under the Freedom of Information Act show that Darpa plans to use off-the-shelf collaboration software from Groove Networks Inc., instant messaging, Microsoft's Office applications on the desktop, Web services to tie data and applications together, an information portal for users, and a services-based architecture to hold everything together. The agency is soliciting the help of some two dozen developers, integrators, and researchers, including Booz Allen Hamilton, Lockheed Martin Information Systems, Raytheon, SAIC (and its Hicks & Associates subsidiary), and Xerox's Palo Alto Research Center.


"There are definite parallels" between Total Information Awareness and business-technology infrastructures, says Michael Helfrich, Groove's VP of applied technology. Darpa latched onto Groove's technology early. When Groove disclosed the first shipments of its software in the spring of 2001, Poindexter was quoted in the press release and hinted at a "full-scale project" to come. Darpa uses Groove software, which joins teams of people in a secure, shared application over the Web, to "swarm" experts together after upstream technologies have detected patterns in data that may be of concern. (Groove's software is being supplied to Darpa by other vendors.)

Darpa has tasked Veridian Corp. with implementing technologies that support cross-organizational teams of intelligence analysts developing terrorist-threat models and response scenarios. The technologies that will come into play include Veridian software, data objects based on XML's Policy Markup Language, software agents that link Groove work spaces with server-based data and services, Web-services standards, and IBM's Cynefin knowledge-management framework.

Work done on behalf of Darpa is already finding its way into commercial products. Groove's support for Web-services standards and its "smart" agents, both developed at Darpa's behest, are available to Groove's business customers, Helfrich says. And Xerox's PARC is developing a privacy appliance for Total Information Awareness that, if all goes as planned, will eventually lead to a commercial product, researcher Teresa Lunt says. The device will make it harder for intelligence agencies or other government users to get personally identifiable information--names or addresses, for example--from private-sector databases.
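How PARC's appliance works internally is not described here. As a rough illustration of the general idea only, the Python sketch below swaps in one common technique, keyed hashing (pseudonymization), to replace identifying fields with opaque tokens before a record leaves a private-sector database. The field list, the `pseudonymize` function, and the sample record are assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as personally identifiable.
PII_FIELDS = {"name", "address"}


def pseudonymize(record, secret_key):
    """Return a copy of record with PII fields replaced by keyed-hash tokens.

    secret_key is a bytes value held by the data owner. Without it, a token
    cannot be recomputed from a guessed name or address.
    """
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"),
                              hashlib.sha256)
            # Truncated hex token: stable for the same input and key.
            masked[field] = digest.hexdigest()[:16]
        else:
            masked[field] = value
    return masked


sample = {"name": "A. Example", "address": "1 Main St", "item": "aspirin"}
masked_sample = pseudonymize(sample, b"demo-key-not-for-real-use")
```

Because the same input and key always yield the same token, analysts could still link records belonging to one person without seeing who that person is. That also illustrates the critics' worry: whoever holds the key can strip the protection away.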

Photo of Michael Helfrich by Jason Grow
Before any of that happens, though, Darpa will have to convince Congress it can pull off the project without trampling civil liberties. Congress passed a law earlier this year that gave Darpa three months to submit a report detailing how funds would be spent on the project, provide an R&D schedule, and explain how Darpa intends to deal with privacy implications. Mihir Kshirsagar, an analyst with the Electronic Privacy Information Center, which opposes the Total Information Awareness project, says part of the concern is that any security layers used by Darpa to protect data "can always be stripped away."

Darpa has tried in recent months to defuse concerns. The project "is not an attempt to build a 'supercomputer' to snoop into the private lives or track the everyday activities of American citizens," the agency writes in a Q&A posted on its Web site. It adds, "All TIA research complies with all privacy laws, without exception."

Darpa researchers reached a milestone last month when they completed the first set of test data to be used in Total Information Awareness. Speaking at a conference on data privacy, Lt. Col. Doug Dyer, a Darpa program manager, described the test data as "synthetic," or artificial in nature. Darpa also has indicated it could use public-domain data from the media in its experimental system and says that real-world intelligence data from other government agencies, such as the FBI, might eventually be used, too.

The prototype won't scan "irrelevant" personal information about Americans, such as medical records, Dyer says, but it might consider records of over-the-counter drug purchases, which could indicate planning of a bioterrorist attack. Tests so far have resulted in "a large number of false positives," he says.
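A bit of arithmetic shows why false positives tend to pile up whenever the thing being searched for is rare. The Python sketch below is a worked example with invented numbers, not figures from Darpa's tests: the population size, the number of real threats, and both rates are assumptions chosen only to show the effect, and `expected_alerts` is a hypothetical helper.

```python
def expected_alerts(population, actual_threats, hit_rate, false_alarm_rate):
    """Return (true positives, false positives, precision) for a detector.

    hit_rate is the share of real threats that get flagged;
    false_alarm_rate is the share of innocent people flagged by mistake.
    """
    innocents = population - actual_threats
    true_positives = actual_threats * hit_rate
    false_positives = innocents * false_alarm_rate
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Assumed, illustrative inputs: 1 million people, 10 real threats,
# a detector that catches 99% of threats and wrongly flags 1% of everyone else.
tp, fp, precision = expected_alerts(1_000_000, 10, 0.99, 0.01)
print(f"true hits: {tp:.1f}, false alarms: {fp:.1f}, precision: {precision:.4%}")
```

Under these assumed inputs, the detector raises roughly 10,000 false alarms for about 10 genuine hits, so fewer than one alert in a thousand points at a real threat, even though the detector is wrong about any individual only 1% of the time.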

False positives, and how intelligence agencies might respond to them, are among the things that worry some people about the project, setting the backdrop for this week's report to Congress.

-- with Aaron Ricadela



About the Author(s)

John Foley

Editor, InformationWeek

John Foley is director, strategic communications, for Oracle Corp. and a former editor of InformationWeek Government.
