Tuesday, March 29, 2011

HathiTrust/Summon Deal Increases Search Access to In-Copyright Works

By Josh Hadro Mar 28, 2011

With the Google Books corpus on hold, many have turned their attention to other possible venues of research access for students and scholars, including the HathiTrust digital archive. Initially built upon a foundation of Google book scans, HathiTrust (based at the University of Michigan, Ann Arbor) has grown to encompass more than eight million volumes from a variety of sources. Now, a partnership with Serials Solutions (a ProQuest business unit) presents a new option for academic libraries seeking to give researchers an entrée into a massive collection of research.

The Summon integration is set to go live this summer, and will allow subscribing institutions to link their print holdings to the heavy-duty search indexing that's been done by the HathiTrust for works in its collection. Asked how tight the integration between the two would be, Michael Gersch, ProQuest Senior Vice President and General Manager, Serials Solutions, told LJ a Hathi-enabled Summon setup "will search the full text of HathiTrust volumes and point users to the correct place for their institution. That could be a book on the shelf, an ebook through a vendor such as ebrary, or an open access source (freely available in the case of public domain works) version such as Hathi itself or Project Gutenberg."

The HathiTrust corpus is already searchable via two interfaces: a WorldCat prototype built by OCLC and the native HathiTrust interface, which is slated to be upgraded this summer as well. With a Summon implementation, librarians can also choose to include the entire HathiTrust search index, or just the public domain materials.

Most recently published works ...

From: Library Journal.com 3/28/11

Wednesday, March 23, 2011

Open Book Alliance applauds rejection of Google Books settlement

The New York Federal District Court’s rejection of the Google Book Settlement is a victory for the public interest and for competition in the literary and Internet ecosystems. The U.S. Department of Justice and the State Attorneys General who fought to protect consumers and competition should be applauded. Judge Denny Chin’s reasoned and thoughtful analysis was worth the wait.

In his decision, Judge Chin confirmed that the proposed settlement “would give Google a de facto monopoly over unclaimed works” and concluded that the proposed settlement “is not fair, adequate, and reasonable.”

In his conclusion, Judge Chin gave voice to the authors and creators who have long opposed this proposed settlement by urging the parties to consider revising the settlement to an “opt-in” structure. While opt-in is a preferred structure, the Open Book Alliance (OBA) believes it requires complex changes to the proposed settlement and would not address the severe antitrust and privacy problems that the court describes in the decision.

“The ruling ratifies the objections of a diverse cross-section of voices who stood up to Google and its partners – from the Justice Department and State Attorneys General to authors and independent publishers to consumer and privacy advocates and members of the academic and library communities,” said Gary Reback, Counsel to the OBA. “We urge the Justice Department to remain vigilant and continue in its role as a leader in protecting consumers and competition from an entrenched monopoly in online search.”

The Open Book Alliance looks forward to participating in a collaborative process that will focus on developing an open digital public library created to serve the public interest that respects the rights of creators while promoting innovation and competition.

From the Open Book Alliance web page (http://www.openbookalliance.org/), 3/22/11

Friday, March 18, 2011

MARC Record Guide for Monograph Aggregators

From a message by Ms. Kate Harcourt, Director for Original and Special Materials Cataloging at Columbia University, sent to the OCLC Cataloging list on March 10:

"The Program for Cooperative Cataloging published a guide for vendors. I think vendors will only improve the quality of records if we insist on better records. Please share this guide with your vendors as it maps out in detail how to construct a good quality MARC record."


This is the second edition, updated in fall 2010. Yael Mandelstam and George Prager are among the four authors. Maybe we all should send the guide to the legal publishers who produce record sets!

Thursday, March 10, 2011

Talis exits the library automation industry to focus on the Semantic Web

In a move that allows Talis to concentrate on its growing semantic web business, which has been its strategic focus in recent years, the company has divested its library automation unit to Capita Group. In a transaction valued at about $32 million that closed on March 3, 2011, Talis Information Limited, a business unit of Talis Group, was acquired by Capita Group, a large UK-based outsourcing company. Capita Group ranks as one of the largest UK companies specializing in business process outsourcing and professional services. Talis, based in Birmingham, England, provides its Alto ILS to about 109 public and academic libraries in the United Kingdom. Its library automation products have not been implemented in libraries outside the UK, but the company recently opened a subsidiary in the United States to develop and support products related to its semantic web technologies. See Library Technology Guides for details on this transaction.


Wednesday, March 9, 2011

Breaking Down Link Rot: The Chesapeake Project Legal Information Archive's Examination of URL Stability

This article by Sarah Rhodes focuses on the highly significant impact of "link rot" among titles harvested through the Chesapeake Project. "Link rot" refers to the loss or removal of content at a particular Uniform Resource Locator (URL) over time. When an attempt is made to open a documented link, either different or irrelevant information has replaced the expected content, or else the link is found to be broken, typically expressed by a 404 or "not found" error message. This is not an uncommon occurrence; web-based materials often disappear as URLs change and web sites are changed, updated, or deleted.
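To make the definition concrete, here is a minimal Python sketch of how a single link check might be sorted into the outcomes described above. It is an illustration only, not the method used in the article: the function names, the three labels, and the idea of comparing a SHA-256 digest of the live page against a digest of the archived copy are all assumptions. An exact byte comparison is also crude, since a page can change trivially without losing its content.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a page body (hypothetical helper)."""
    return hashlib.sha256(body).hexdigest()


def classify_link(status_code, current_body, archived_digest):
    """Classify one link check (labels are assumptions, not the study's).

    status_code     -- HTTP status returned for the original URL,
                       or None if the request failed outright
    current_body    -- bytes currently served at the URL
    archived_digest -- digest of the copy captured at harvest time
    """
    # No response, or a 4xx/5xx status such as 404: the link is broken.
    if status_code is None or status_code >= 400:
        return "broken"
    # The URL resolves, but it no longer serves the archived content.
    if content_digest(current_body) != archived_digest:
        return "content changed"
    return "stable"


archived = content_digest(b"original report text")
outcomes = [
    classify_link(404, b"", archived),
    classify_link(200, b"unrelated landing page", archived),
    classify_link(200, b"original report text", archived),
]
```

Under this sketch, both "broken" and "content changed" would count as link rot.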

In an effort to quantify both the progress and relevance of the Chesapeake Project, an evaluation of the project's efforts has been conducted on a regular basis. Among the parameters used to evaluate the project, project participants have measured the prevalence of link rot among the original URLs for titles preserved in the archive, an analysis designed to demonstrate both the need for the project within the law library community and the instability of open access, web-published law- and policy-related materials.

This article analyzes these evaluations in order to answer the following questions:

1. What percentage of original URLs are impacted by link rot within two years of being harvested and archived, based on a sample of titles harvested through the Chesapeake Project from 2007–2008?

2. What percentage of original URLs representing the entire digital archive collection are currently impacted by link rot, based on a sample of all titles harvested through the Chesapeake Project from 2007–2010, compared to samples from previous years?

3. What are the top-level domains (such as .gov, .com, .org, or .us) of original URLs that are most impacted by link rot?

4. What are the file format types (such as PDFs, X/HTML web pages, or Microsoft Word documents) of original URLs that are most impacted by link rot?
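The questions above come down to computing percentages over a sample of checked URLs, overall and then grouped by top-level domain or file format. The Python sketch below shows one way such a tally could be done. The sample data, URLs, and function names are hypothetical and are not drawn from the Chesapeake Project's evaluations.

```python
from collections import defaultdict
from os.path import splitext
from urllib.parse import urlparse

# Hypothetical sample: (original URL, True if the link has rotted).
SAMPLE = [
    ("http://www.example.gov/reports/budget.pdf", False),
    ("http://www.example.gov/old/page.html", True),
    ("http://www.example.org/policy/brief.pdf", True),
    ("http://www.example.org/policy/summary.doc", False),
    ("http://www.example.com/news/item.html", True),
    ("http://www.example.gov/law/statute.pdf", False),
]


def top_level_domain(url):
    """Return the last label of the host name, e.g. '.gov'."""
    host = urlparse(url).hostname or ""
    return "." + host.rsplit(".", 1)[-1] if "." in host else host


def file_format(url):
    """Return the lower-cased file extension of the URL path."""
    ext = splitext(urlparse(url).path)[1].lower()
    return ext or "(none)"


def rot_rate_by(sample, key):
    """Percentage of rotted URLs within each group produced by key(url)."""
    totals = defaultdict(int)
    rotted = defaultdict(int)
    for url, is_rotted in sample:
        group = key(url)
        totals[group] += 1
        if is_rotted:
            rotted[group] += 1
    return {g: 100.0 * rotted[g] / totals[g] for g in totals}


# Overall rate for the sample, then the breakdowns in questions 3 and 4.
overall = 100.0 * sum(r for _, r in SAMPLE) / len(SAMPLE)
by_tld = rot_rate_by(SAMPLE, top_level_domain)
by_format = rot_rate_by(SAMPLE, file_format)
```

Questions 1 and 2 differ only in which titles go into the sample: those harvested in a given period, or a sample drawn from the whole collection.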

The study explored the stability of URLs for legal, government, and policy-related web resources selected for preservation and harvested from the web for inclusion in the Chesapeake Project, which was initiated in late February 2007. The results demonstrate that among the original URLs from which content was harvested for the Chesapeake Project, link rot has increased steadily over time.

The results of this study are not meant to be broadly applicable or to provide a representation of link rot throughout the universe of web resources; rather, this study paints a portrait of the vulnerability of the original sources for the collections archived by the Chesapeake Project, while also providing insight into the vulnerability of law- and policy-related web resources selected by experienced law librarians from seemingly stable open-access web sites hosted by reputable organizations and state and federal governments.