Good and Evil in the Garden of Digitization

By Koehler, Wallace

Google and Fair Use

In a January 2008 Searcher article, Beth Ashmore and Jill Grogg discuss the book digitization projects of the Open Content Alliance (OCA) and Amazon. They point out that Amazon and OCA did not invent book digitization and acknowledge the dragon in the corner – Google Book Search. As important and interesting as all of Google’s projects are, the company, by virtue of its size and leverage and because of its digitization model, has assumed dinosaur proportions with Google Book Search. An interesting history of its activities and the reactions to them appears in an article by Ken Auletta in a January 2008 issue of the New Yorker. Auletta’s interview with Eric Schmidt, Google’s CEO, is intriguing. Schmidt recognized his critics and acknowledged that Google would be challenged, in part, because of its size. Auletta concludes his article with a quote from Schmidt: “What kills a company is not competition but arrogance. We control our fate.”

Google’s practices have raised the occasional hackle in the copyright and fair use domains. Even back in 2003 (see Olsen), questions were being asked as to whether Google’s caching policies represented copyright violations. In our book, Fundamentals of Information Studies, June Lester and I raise the question – as do many others – of whether Google’s book digitization project pushes the very limits of fair use (2007: 303-4). I strongly suspect that fair use doctrine is insufficiently elastic to withstand Google’s assault, at least as copyright and fair use are now understood. And in what some have seen as a quixotic exercise and others as a necessary challenge to “Googlepower,” Siva Vaidhyanathan weighed in against Google Book Search back when it was still called Google Print. He worried that a finding against Google could destabilize intellectual property concerns. He questioned the right of Google and its university and public library partners to exercise fair use so broadly.

Win, lose, or draw, court decisions about the Google Book Project could have a chilling effect on the very concept of retrospective digitization (First Monday 2007):

[W]hat I’m afraid of is that Google will certainly lose in court, and what will happen is courts will generate an indelicate view of fair use, a highly restricted view of fair use and will ultimately rein in a lot of future experiments. That’s problem number one and that’s the legal problem I have with Google’s experiment.

The Google book digitization project has caused something of a furor, perhaps even a firestorm, in the realm of intellectual property management. This issue is not solely for lawyers and academics; it can touch all of us in the information professions. On the one hand, Google may well provide researchers, users, and readers with an ever-widening and invaluable resource. I just downloaded Thomas Greenwood’s 1902 Edward Edwards. The “meatspace” copy behind this particular PDF version comes from the University of Michigan. Thank you, Google. Go, Wolverines.

On the other hand, it also may mean that a single for-profit economic entity could gain effective centralized control over much of the world’s information. Google’s intent may be quasi-altruistic today, but, in the absence of oversight and regulation, that intent could morph into an Orwellian vision.

Centralization of Knowledge

The Google project to copy, digitize, and render documents to the world in snippets, if copyrighted, or full-text, if public domain, is the most recent manifestation of a long-held desire to centralize knowledge. Denis Diderot and other French Encyclopedists of the 18th century and Paul Otlet and Henri La Fontaine in the early 20th century sought to develop what H. G. Wells called a “World Brain” in 1938. The urge traces back as far as the 3rd century B.C. with the Library at Alexandria and echoes in the development of national libraries in the 18th and 19th centuries. Sir Thomas Bodley can perhaps be credited with the idea of depositing newly published books at a central library, through the library he established at Oxford in the early 17th century. Sir Anthony Panizzi gave teeth to book deposit in the mid-19th century. In the U.S., Thomas Jefferson revitalized and doubled the size of the Library of Congress after its destruction by the British in the War of 1812.

Of course, the Google project is not the first one to digitize books or other documents. Important services such as Westlaw and LexisNexis have provided digital access together with sophisticated indexing to legal and government documents for a quarter-century or more. Other online data vendors and database aggregators have provided similar services to a wide range of clients for almost half a century, each year with greater and more sophisticated service. These services, such as Dialog, Ovid, STN, and a growing array of specialized Thomson and OCLC products, all offer access to specialized literatures with sophisticated search and retrieval features at a price.

Books have been digitized and made available over the internet by a variety of producers. Perhaps first and best known of these is Project Gutenberg. Drama EServer provides play scripts. Many academic libraries provide access to a variety of collections (e.g., the University of Pennsylvania’s Online Books Page).

The idea of digitizing “books” is, of course, not original to Google. What is original to Google is the taking of “knowledge products” still in copyright without the permission of the copyright holder. This taking is sometimes shrugged off because (a) copyright holders of so-called orphaned works may be difficult to identify or contact and (b) identifiable copyright holders have the right to opt out of digitization. This taking, according to Google, is acceptable under fair use provisions because Google proposes to show users only snippets from the digitized collection, selected according to the end user’s keyword query.

Intellectual Property

Is Google’s argument valid? To answer that question, we need to turn to the law and to history. First question: What is the law on fair use? That depends in large part on where you are. Is Google a U.S. company? Will its takings from Oxford University be governed by U.K. law, U.S. law, or international law?

The history of copyright is an interesting one. The first copyright law as such was the British Statute of Anne of 1710. Prior to the Statute of Anne, publishing was regulated by patents and licenses granted by the state. The first publication license was granted to a Venetian publisher of classical authors in 1469. Licenses had two primary purposes: censorship and legal deposit.

The Statute of Anne represented an important shift in European thinking on intellectual property. It individualized ownership of intellectual products. Before Gutenberg, copying was common practice. Historians of the book have shown that there was little regard for authorship and that scholars and other authors often “borrowed” quite liberally from the works of others, often with the aid of scissors and a paste pot. Miguel de Cervantes Saavedra was unprotected in early 17th-century practice and law from others who sought to profit from his intellectual creation, Don Quixote de la Mancha. Two hundred years later, important literary figures such as Charles Dickens, Victor Hugo, and Edgar Allan Poe campaigned for bilateral and multilateral copyright protection. The U.S. and the U.K. did not enter into a bilateral copyright treaty until 1891. The U.S. did not become party to a multilateral copyright agreement until 1952. And it was not until 1989 that the U.S. became a party to the Berne Convention, which first came into effect in 1886.

The Law and Fair Use

Copyright coverage and its exceptions, such as fair use, are complex. Fair use as a term of practice has moved from little or no regulation over the use of the intellectual property of others to an interesting morass of sometimes conflicting ideas and jurisdictions. Today, fair use is defined in the U.S. by the U.S. Copyright Act of 1976 and by federal court decisions that have interpreted it. The Act provides a four-part test for fair use. Under section 107:

In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

2. the nature of the copyrighted work;

3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

4. the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

In the U.S., defining case law includes Basic Books, Inc. v. Kinko’s Graphics Corp. (1991); Maxtone-Graham v. Burtchaell (1987); Encyclopaedia Britannica Educational Corp. v. Crooks (1982); and American Geophysical Union v. Texaco, Inc. (1994, 1995). The Basic Books case found that Kinko’s infringed copyright when it created student course packs without appropriate payment of royalties. In Burtchaell, the court found that extensive quotations from another work did not per se represent an unfair taking. The use of appropriately cited material was, in fact, a fair use of copyrighted material. In the Encyclopaedia Britannica Educational Corp. case, the court held that wholesale copying and distribution of educational television programming, even for purely educational purposes, was beyond fair use and an illegal use of copyrighted material. In the Texaco case, the court found that the making of multiple copies of journal articles for distribution to Texaco researchers from a single subscription was an unfair taking of copyrighted materials.

The Purpose and Character test is an interesting one. It includes not only whether the fair use taking is for economic purposes but also why the taking was done. Thus Purpose and Character justifications include criticism, parody, and artistic expression. I believe that Google cannot claim that its purposes include criticism, parody, or artistic expression. And since it has an economic interest in the project (although, to Google’s credit, it has never included its standard advertising as part of Google Book Search results – yet), its purposes are not purely altruistic.

The Nature of the Copyrighted Work test is concerned with the motive behind the creation of the copyrighted work. It also addresses the kind of work taken. Scholarly publications may have, by their very nature, a different level of protection than entertainment. “Sweat of the brow” is also important. Is the original work merely a collection of facts or does it represent an interpretative effort? In the U.S., the Feist case requires some intellectual effort, not mere labor (Feist Publications v. Rural Telephone Service Co., 499 U.S. 340 [1991]). According to the Desktop Marketing rule in Australia, the mere effort of compiling a list (sweat of the brow) is sufficient to confer copyright protection (Federal Court of Australia, Desktop Marketing Systems Pty Ltd v Telstra Corporation Limited [2002] FCAFC 112, 15 May 2002).

As Google proposes to copy everything, the Nature of the Copyrighted Work test would seem to work against it.

The Amount Taken test rests on the argument that the taking of very small parts of a copyrighted work is fair use. Google proposes to take all and offer little. The taking of all is not a small part. Again, a problem?

The Market Impact test addresses the economic damage a fair use taking might have on the copyright holder. Under U.S. law, market impact can be claimed as a fair use consideration. If a taking adversely impacts the value of the infringed work, then an unfair use may result. American law differentiates between commercial and noncommercial takings. In “noncommercial” actions, the plaintiff (copyright holder) must demonstrate the damage. But in “commercial” actions, the burden lies on the defendant. Google has a commercial interest in providing snippets of books. To what extent keeping sticky eyeballs stuck to Google constitutes a commercial interest, most likely only the Supreme Court will tell.

I Am Not a Lawyer

Here comes the standard disclaimer: I am not a lawyer. Both tradition and the law guide us in our understanding of copyright and fair use. The Google initiative will have far-reaching implications for the definition and use of intellectual property. In one sense, Google’s project is a welcome one and represents an interesting return to the way the works of others were treated and taken in 16th-century Europe and still are in some parts of the world today. Remember that the U.S., the largest producer and exporter of intellectual property today, was not an enthusiastic participant in global intellectual property regulation until very recently.

Yet many of us are uncomfortable with the Google plan. It looks like an unfair taking of intellectual property. Google tells us that it will only serve up snippets. But we need to remember that Google is serving up snippets from a complete copy of the original. That “whole thing” may well be a copy of a work still in copyright. Though most of the library partners for Google Book Search limit participation to public domain content, some – e.g., the University of Michigan – have provided Google digitizers access to in-copyright material as well.

Second, and far more importantly, according to Tom Turvey (2006:1), partnership head for Google Book Search, Google wants to be the information provider for the world. Given its size and economic leverage, will Google become the only important repository of the world’s literature? What are the implications for information stewardship if Google were to be successfully hacked? Or could Google someday become the instrument of an Orwellian plot in which the collection is edited at will to serve another purpose? What will become of the collection if Google were to fail? Recall the quote from Eric Schmidt at the beginning of this article.

I’m of two minds on the Google project. I like having access to “the literature” online, though I still turn to Project Gutenberg for the classics. I like the ability to search documents. We can acquire most materials fairly easily by purchase – online booksellers have made this virtually painless. Libraries and services such as interlibrary loan are excellent but not immediate.

The Google project represents a new ripple for copyright practice and law, maybe a new tsunami. Laws and customs change as the needs of societies change. So perhaps the Google project is simply a giant nudge to the future. It has us thinking and discussing intellectual property with a new enthusiasm. Is the project legal? I don’t think so, but in the end it will be for legislatures and particularly courts in many places to work it out. Is it good for us as information professionals or as citizens? It has the potential of both good and evil. That is for us all to work out.

Doom and Gloom

In an editorial discussion about this article, Barbara Quint and I debated the likelihood that Google might fail or fall prey to some sinister happenstance. Perhaps, as she feels, these are unlikely in the third millennium. But we must remember we are thinking in historical time. Many human institutions have come and gone or been redefined. Who is to say, except perhaps for Walter Miller (A Canticle for Leibowitz), how the human record will be maintained into the fourth millennium?


Beth Ashmore and Jill Grogg, “The Race to the Shelf Continues. The Open Content Alliance and,” Searcher, vol. 16, no. 1, January 2008, pp. 18-23, 55-56.

Ken Auletta, “The Search Party: Google Squares Off With Its Capitol Hill Critics,” New Yorker, Jan. 14, 2008. Accessed Jan. 18, 2008.

First Monday, “Siva Vaidhyanathan” First Monday Podcast Transcript September 2007 [ transcripts/transcripts_siva07.html]. Accessed Jan. 18, 2008.

June Lester and Wallace Koehler, Fundamentals of Information Studies: Understanding Information and Its Environment, 2nd ed. New York: Neal-Schuman, 2007.

Stephanie Olsen, “Google Cache Raises Copyright Concerns,” C/net News.Com, 2003. Accessed Jan. 16, 2008.

Tom Turvey, “A Perspective of Google Book Search. Viewpoint: Google Offers Its Side,” Information Today, vol. 23, no. 1, January 2006, pp. 1, 25.

Siva Vaidhyanathan, “Siva in Chronicle of Higher Ed: A Risky Gamble with Google,” 2005 [ 002445.html].

H. G. Wells, World Brain, Methuen & Co. Ltd., 1938.


Wallace Koehler


Master of Library and Information Science Program

Odum Library, Valdosta State University

Copyright Information Today, Inc. Jun 2008

(c) 2008 Searcher. Provided by ProQuest Information and Learning. All rights Reserved.