Experts: Government Data Unseen Online
By Swartz, Nikki
During a December U.S. Senate panel hearing, experts urged lawmakers to make the content of government databases more accessible to the public by linking them to mainstream search engines such as Google, Yahoo, and Microsoft. Currently, government websites such as USA.gov enable online users to search within the site for specific information, but their content is not always accessible via search engines, making the information in government databases hard to find. For example, a Google search for “small farm loans” brings up commercial offers for loans and government loan statistics, but does not include data on federal government programs designed to help fund small farmers, Ari Schwartz, deputy director of the Center for Democracy and Technology (CDT), told the Senate Committee on Homeland Security and Governmental Affairs. Similarly, a search for “New York radiation” does not find basic FEMA and Department of Homeland Security information about current conditions and monitoring.
“Unquestionably, the E-Government Act has changed the way that the public interacts with the government,” Schwartz said. “Unfortunately, despite the availability of an easy technological fix, many key governmental information sources remain ‘hidden in plain sight,’ from the very search engines that the public is most likely to use.”
In December, CDT and OMB (Office of Management and Budget) Watch released a joint report on the availability of government data on commercial search engines. The report, “Hiding in Plain Sight: Why Important Government Information Cannot Be Found Through Commercial Search Engines” (www.cdt.org/righttoknow/search/Searchability.pdf), reveals that vital government information seems “invisible” to millions of Americans who are combing the Internet and looking for answers via the most popular search engines. According to the report: “Many federal agencies operate websites that are simply not configured to enable access through popular search engines. These websites don’t allow search engines to ‘crawl’ them, an industry term for indexing online content, and sometimes even block sites from being found by search engines.”
Part of the problem is the sheer volume of data produced by the government, said John Lewis Needham, manager of public sector content partnerships at Google, who told the committee of data buried on some 2,000 federal sites. “It is hard to disseminate efficiently.”
One solution, Needham said, is to adopt the Sitemap Protocol, a technical standard developed by Google that helps an agency list, or map, all of its web pages and database records. That data map can then be read and indexed by search engines. Implementation is free and takes, at most, a few days, he said.
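To illustrate what such a data map might look like, here is a minimal sketch in Python that writes a sitemap file in the XML format defined by the Sitemap Protocol. The agency domain, the record URLs, and the function name `build_sitemap` are hypothetical and chosen only for illustration; a real agency would generate the list of URLs from its own database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string for (location, last-modified) pairs.

    `urls` is an iterable of (loc, lastmod) tuples; lastmod may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Optional date the page or record last changed (YYYY-MM-DD).
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical database records that are otherwise reachable only
# through an on-site search form.
records = [
    ("https://agency.example.gov/loans/record?id=101", "2008-01-15"),
    ("https://agency.example.gov/loans/record?id=102", None),
]

sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

Saving the output as a file such as sitemap.xml on the site gives a crawler a direct list of record URLs it would otherwise have no links to follow.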
In 2006, Google, Microsoft, and Yahoo jointly announced their support for the standard, which also has been deployed by at least six state governments. According to Needham, Google has already implemented Sitemap for several agencies, including the Office of Scientific & Technical Information, the Government Accountability Office, Library of Congress, the National Agricultural Library, the National Archives and Records Administration, and GovBenefits.gov.
Recently, the Office of Personnel Management announced that it would make 60,000 job vacancies in its database available to commercial search engines, according to a PC Magazine report.
Sen. Joe Lieberman (I-Conn.), who chairs the committee, introduced a bill (S. 2321) in November 2007 that would extend the E-Government Act of 2002 for five more years and require government agencies to link their databases to commercial search engines within one year. The bill also would enable members of Congress to post Congressional Research Service reports on their websites, which would be free to the public.
In the meantime, the CDT/OMB Watch report suggests ways federal agencies can help ensure content is more searchable, including:
* Adopting an information policy that makes public accessibility of online content and resources a priority
* Creating sitemaps of content on their sites, with special attention given to materials stored in databases and accessible only through drop-down menus
* Ensuring that robots.txt files are used in the least restrictive way possible.
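
As a rough illustration of the last recommendation, the following Python sketch uses the standard library's `urllib.robotparser` to compare how a crawler would treat the same page under two robots.txt files. The file contents, the page URL, the crawler name "ExampleBot," and the helper `can_crawl` are all hypothetical and are not taken from the report.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A blanket rule that hides the entire site from every crawler.
RESTRICTIVE = """\
User-agent: *
Disallow: /
"""

# A narrower rule that blocks only a (hypothetical) administrative area.
PERMISSIVE = """\
User-agent: *
Disallow: /admin/
"""

page = "https://agency.example.gov/programs/small-farm-loans"

blocked = can_crawl(RESTRICTIVE, page)
allowed = can_crawl(PERMISSIVE, page)
print("restrictive file allows crawl:", blocked)
print("permissive file allows crawl:", allowed)
```

Under these assumptions, the blanket rule keeps the program page out of search indexes, while the narrower rule leaves it crawlable and still shields the administrative area.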
Copyright ARMA International Mar/Apr 2008
(c) 2008 Information Management Journal. Provided by ProQuest Information and Learning. All rights Reserved.
