Could a university chemistry department routinely scan copyrighted scientific journals in their entirety to create an electronic, searchable database that puts their contents at the fingertips of professors and students, so that they could use the data compiled by others in perfectly appropriate ways in their own scholarship? I think the answer is clearly “no.” As a result, most universities will subscribe, for a fee, to commercial databases that provide access to such materials under license from the publishers. Is the answer different if the English department wishes to scan all newly published novels in their entirety, so that scholars engaged in “Digital Humanities” can more readily “understand individual texts, the connections between texts, and the evolution of literary language”? That would seem to be the implication of an interesting but ultimately unavailing brief recently filed by a group of such scholars as amici curiae (friends of the court) in the Google Library Project case.
As noted here previously, the Google Library Project case has moved out of settlement mode and is now wending its way toward what promises to be a landmark fair use ruling. Although the plaintiff-authors complain that Google’s initial scanning and digitization of in-copyright books constitutes copyright infringement, the real gravamen of the case is Google’s display of up to three “snippets” of a book (roughly 1/8th of a page each) in response to each search query. As the plaintiffs argue in their brief seeking summary judgment on the fair use issue: “With multiple searches, a single user of Google’s search engine can see multiple snippets from the same book, and all of Google’s users through their collective searches can view over time the substantial majority of that book.”
Given the emphasis on the snippets display, the Digital Humanities scholars’ amicus brief may be somewhat academic, no pun intended, in the larger scheme of things. They argue that mass digitization of texts for the purpose of compiling non-expressive “metadata” about the texts is fair use. (Examples of metadata they give include “word frequencies, syntactic patterns, and thematic markers in the metadata-enriched context of author nationality, author gender, and time period.”) The brief first argues, at unnecessary length, that the end product of such research does not infringe the copyrights of the underlying works because it does not copy any expressive content. Thus, the J.K. Rowling sentences “Goblin-made armor does not require cleaning, simple girl. Goblins’ silver repels mundane dirt, imbibing only that which strengthens it” are not infringed when a Digital Humanities scholar writes that Rowling’s sentences “contain twenty words, and other than ‘Goblin’ no word is repeated.”
The brief is rather cursory and unpersuasive when it comes to the more salient question in the Google case: is it a fair use of the underlying works to create the digital database that can be mined for such nuggets? The amici argue that because the type of metadata mining they engage in “creates value by facilitating the advancement of our collective knowledge” and is itself noninfringing, the mass digitization that is required to carry out such research should be considered fair use. This argument proves too much. Why, under this rationale, wouldn’t it also be a fair use to make unauthorized hard copies of thousands of texts, even whole libraries, for purposes of creating traditional, noninfringing works of scholarship? The Oxford English Dictionary, for instance, shows that human beings unaided by computers can produce some rather astounding feats of scholarly concordance. Were the readers who compiled it entitled to free books for this purpose?
As the Digital Humanities scholars would discover by mining the public domain records of courts and Congress, the history of copyright law is replete with instances of special pleading by parochial interests for free use of copyrighted works to advance some public good. (My book, Unfair to Genius, draws upon such sources to recount the nascent broadcasting industry’s pleas for “free music.”) Such pleas have mostly been turned aside, and the invisible hand (sometimes with a regulatory assist) has usually fashioned a workable accommodation.