Data.gov Heads For Overhaul

The government is looking to make significant improvements to its open data repository, adding features for developers and consumers and prodding agencies to participate.

J. Nicholas Hoover, Senior Editor, InformationWeek Government

December 9, 2009

In documents circulating among federal agencies and released to the public on Tuesday, the Office of Management and Budget has laid out plans to move Data.gov out of its "beta" phase and into "government-wide execution," as federal CIO Vivek Kundra put it in an interview last week.

Released at the same time as the Obama administration's wider Open Government Directive, a memo and draft concept-of-operations document encourage agencies to post more data on Data.gov, with an eye toward ensuring data is machine-readable, high-quality, and useful while also protecting privacy and security interests.

"Our key principles focus on making sure that we democratize as much data as possible and that that data is targeted towards high-value datasets," Kundra said.

The White House has repeatedly held out the government data portal as a hallmark of its open government strategy. Until now, though, broadly written policy missives from President Obama and the Office of Management and Budget have encouraged federal agencies to be more open, but there has been little formal guidance on exactly how agencies should use Data.gov as a vehicle for transparency.

In many ways, it shows. Though Data.gov now houses more than 110,000 individual datasets, almost all of those are geodata on administrative and political boundaries. Of the non-geodata raw data feeds, 411 of 728, or more than half, are toxics release inventory datasets. To be fair, Data.gov also houses 353 data tools, many of which contain large numbers of datasets themselves, but much of that data is locked up inside those tools in non-machine-readable formats. Meanwhile, many federal agencies that presumably hold a wealth of useful data have posted very few datasets on Data.gov.

Now, however, OMB is setting a formalized policy, and has begun asking the public for its input via a non-government Web site powered by crowdsourcing platform IdeaScale.

The new formal policy on Data.gov isn't just high-level guidance without teeth, either. OMB plans to rate federal agencies on their participation, tracking qualitative and quantitative metrics on everything from the number of datasets each agency publishes, to citizen ratings of that data, to how well agencies attach metadata to their datasets. OMB will rate itself, too, by tracking usage metrics and measuring feedback.

Data.gov has already gone from 7 staffers working largely at the personal direction of Kundra to more than 200 points of contact across the federal government. The concept-of-operations document further formalizes those roles, and instructs or encourages agencies to set up training, participate in Data.gov working groups to create best practices, and establish Data.gov "data stewards' advisory groups." Structured data from a number of other government data Web sites, such as USAspending.gov and FBO.gov, will be integrated into Data.gov. In addition to the requirements of the forthcoming guidance, the Open Government Directive requires federal agencies to publish at least three "high value" datasets on Data.gov by the end of January.

OMB recently launched a self-service data publishing tool for agencies called the Dataset Management System, and plans to overhaul Data.gov with new features, though the timeline is unclear. For users, the overhaul will add collaboration tools, more opportunity for feedback, and improved search. The search tool will be integrated with USASearch.gov and could improve usability through features like top queries and tag clouds. Data.gov will also include a new hierarchical topic structure, user tagging, and the ability to search within datasets for keywords. Eventually, OMB plans to add geospatial search capabilities and potentially a data visualization platform.

For agencies, the overhaul will add shared data hosting and metadata storage services, a performance tracking system, and potentially an audit tool to help agencies evaluate their data management practices and integrate already public data into Data.gov. For developers, OMB will create multiple APIs and a way for developers to submit usage statistics and feedback to the government to help improve Data.gov.

Kundra also plans to begin issuing prize money to developers who have the best ideas. Some early open government application development prizes, such as Apps for D.C., paid out lump sums to winners only to see the developers later move on to other things. By contrast, the government will "move away from a one-time prize to a way of running operations," presumably by creating business relationships with the prize winners to keep making their tools more useful.

Finally, OMB plans to launch a pilot of forward-looking semantic Web features at semantic.data.gov. The pilot will incorporate "semantically enabled techniques within the sites and the datasets" by leveraging cross-domain data models and "curating" that data à la Wolfram Alpha, helping users produce meaningful results and new, user-created data computations.
