October 3, 2005
Yahoo backs new digital book group
By Eric Auchard
SAN FRANCISCO (Reuters) - Yahoo Inc. is set on Monday to unveil a consortium that aims to make books, audio and video more easily accessible online, while addressing publishers' objections about the dangers to copyright.
The non-profit Internet Archive, libraries at the University of California and the University of Toronto and technology suppliers Hewlett-Packard Co. and Adobe Systems Inc. are among the founders of the group.
The organization, known as the Open Content Alliance (OCA), plans to create a unified storehouse of both public domain and copyrighted materials, hosted by the Internet Archive.
This potentially vast library would be searchable and freely available to anyone, whether individual Web surfers or commercial sites, its promoters said.
"The goal is to really spur the expansion of (books, audio and video) being made available online through this alliance," said David Mandelbrot, Yahoo's vice president of search content -- in charge of licensing the media featured on Yahoo's site.
The Yahoo-backed consortium poses a challenge to Google Inc., which has been working for the past year on an ambitious project to scan the contents of five of the world's great academic libraries to make the books freely available online -- unless copyright holders first object.
The Google and Yahoo projects are just the latest in a long list of projects to digitize academic collections. The pioneering Project Gutenberg, which scans literary works in the public domain, has been underway since the early 1970s.
The University of Pennsylvania has put some 20,000 books online at http://onlinebooks.library.upenn.edu/new.html/, while Cornell University offers a digital archive of 1,500 home economics books.
The San Francisco-based Internet Archive, together with the library of Carnegie Mellon University, is working on a project to make 1 million books available online (http://www.archive.org/details/millionbooks).
AN UMBRELLA ORGANIZATION
Initially, several OCA members will work on digitizing some 18,000 works of American literature that have been defined as the "canon collection" by the University of California.
These works -- which include many of the writings of Mark Twain, Henry James and Edgar Allan Poe -- will begin appearing on the OCA site by the end of this year, with the entire collection set to be online by the end of next year.
The European Archive and the National Archive in Britain have also signed on as founders of the OCA, but are determining what material to contribute, Mandelbrot said.
High-tech publisher O'Reilly Media Inc. and the Prelinger Archives, a library of industrial films, are also taking part.
"What took them so long?" asked Tara Calishain, publisher of the Web search newsletter ResearchBuzz.com based in Raleigh, North Carolina. "There are already so many universities digitizing their own collections."
Calishain said the race by Google and Yahoo to outdo each other in supporting digital book projects reflects a growing focus on quality instead of quantity in Web search.
What matters now is which search site has the most unique content, not which has the biggest database, she said.
While similar in many respects, the OCA differs from Google's project by accepting copyrighted material only from publishers who "opt in" to the program.
"We are only including copyrighted content with the express permission of the copyright holder," Mandelbrot said.
Google's program excludes material only from publishers who contact it to "opt out" -- a policy that has drawn opposition from commercial publishers and led to a lawsuit by the Authors Guild and several of its member authors.
Google was surprised to learn of the consortium. But in a statement released over the weekend, a spokesman said, "We welcome efforts to make information accessible to the world."
Yahoo's Mandelbrot said all materials stored on the OCA can be searched and used by any other site, including Google. "We would be very excited to have Google participate and contribute to the Open Alliance," he said.
Yahoo will supply its search technology for use on the OCA archive. It also plans to make the contents of the OCA digital media archive searchable through its own Yahoo search site.
Academic and commercial publishers praised the concept.
"The initiative seems to respect the rights of creators to determine how their works will be used, and this has been our concern and objective all along," Pat Schroeder, CEO of the Association of American Publishers, said in a phone interview.
Gary Price, an analyst with SearchEngineWatch, envisions a literature professor being able to build a custom search system of OCA's archive that would have students link only to the contents of specific books assigned by the professor.
"Not only is the material in the database open but also the database itself," said Price, who was briefed on Yahoo's plan.