Friday, April 13, 2018

Long-term Survival of PDF/A Files

PDF/A is widely marketed and regarded as a preservation file format. However, a recently published article, “PDF/A Considered Harmful for Digital Preservation” by Marco Klindt, serves as a prudent reminder that the PDF/A file format is not, in itself, a comprehensive solution for preservation.

For digital information to exist in the long term, the data that comprise the information content need to remain discoverable, machine readable, and renderable for human consumption. If preserving digital content means that we are planning for the potential reuse of data, a computer needs to be able to read and extract this information in the future. However, a significant challenge in preserving PDF files lies in the gap between what the human eye can read and what a machine can interpret and extract.

PDF/A is intended to serve as the long-term archival version of PDF files. However, as the author notes, while PDF/A is marketed and widely adopted as a preservation format, “comprehensive policies regarding the use of PDF in archives seem to be rare” and “using PDF/A as a container for files complicates preservation workflows and might be considered an additional risk.” PDF documents preserve the visual appearance, structure, and format of the original document, but this comes at a potential cost to the reusability of data. A PDF/A document created at Level A (accessible) conformance is designed to improve a document’s accessibility through the use of tagging to mark up the structure and content of a document, which in turn should help support both visibility and reuse. However, in its current version, PDF/A-3 still presents multiple challenges.
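As a rough illustration of what conformance levels look like in practice: a PDF/A file declares its part and conformance level in its XMP metadata using the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below only reads that declaration from raw bytes. The function name is hypothetical, the sketch assumes the metadata is stored as uncompressed text, and a declaration is merely a claim that says nothing about whether the file actually conforms.

```python
import re
from typing import Optional

# Match the XMP properties in either element form (<pdfaid:part>1</pdfaid:part>)
# or attribute form (pdfaid:part="1").
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([ABUabu])')


def declared_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. 'PDF/A-1a'), or None.

    Hypothetical helper: it scans raw bytes for the XMP declaration and
    does not validate the file in any way.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level
```

To try it on a real file, pass the file's bytes, for example `declared_pdfa_level(open(path, "rb").read())`. Checking the claim itself is a job for a validator such as veraPDF.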

Klindt discusses the risks and shortcomings through observations of existing inadequacies and challenges with the creation and reuse of PDF/A documents. The risks identified here undermine confidence in the suitability of PDF/A for long-term preservation. A few of the challenges discussed include impediments to text and content extraction in addition to information loss during the creation and conversion process. It is worth noting that while the author acknowledges that PDF/A validation issues have been largely addressed by the creation of veraPDF, an open-source PDF/A validator, he argues that validation is a “necessary condition” but does not mitigate risks to future reuse of content.

An understanding of these risks and shortcomings of PDF/A for preservation purposes underscores the need for comprehensive strategies and policies at the institutional level to safeguard digital content within a flawed archival solution. There are a number of useful, previously published resources on PDF/A cited in this article, including the National Digital Stewardship Alliance (NDSA) report on The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions and a National Information Standards Organization (NISO) Information Standards Quarterly article, “Preserving the Grey Literature Explosion: PDF/A and the Digital Archive”. The PDF/A-4 standard is expected to be published sometime in 2018.

Tuesday, April 3, 2018

Revisions to Legal Requirements and Program Regulations (LRPR)


The current guide for Federal Depository Library Program (FDLP) libraries was designed to be an easier-to-use consolidation of the various rules and regulations that relate to depository library requirements. Issued in 2011, the Legal Requirements and Program Regulations of the Federal Depository Library Program (or LRPR) has been showing its age. However, with changes to Title 44 on the horizon (see https://www.fdlp.gov/about-fdlp/23-projects/3353-title-44-revision for more information), there has been reluctance to completely rewrite this guide.

Because of this, I was a little surprised to see an email from the “FDLP Webmaster” announcing a revised edition of LRPR. The changes in this 2018 revision are minor, to be sure. They boil down to:
  • Rescindment of regulation 10 (tangible item selection requirement)
  • Updated FDLP email list information
  • Inclusion of regional discard policy
  • References to new FDLP decals
None of these changes are groundbreaking, but it always helps to have the most current information at hand when dealing with FDLP requirements. Get the latest edition at https://www.fdlp.gov/requirements-guidance/legal-requirements.

Wednesday, March 28, 2018

Project COUNTER publishes librarian's guide to COUNTER release 5

COUNTER release 5 is scheduled to replace COUNTER release 4 by January 2019. In preparation for this change, Project COUNTER has issued The friendly guide to release 5 for librarians.

COUNTER release 5 represents a revision of the COUNTER code of practice to better reflect changing reporting needs. Types of reports have been simplified to both assist providers in becoming COUNTER compliant and provide librarians with "usage statistics that are credible, consistent and comparable." Page 19 of the guide provides a table comparing Release 5 reports with covered Release 4 reports. Several sample usage scenarios are provided.

The COUNTER code of practice provides consistent guidelines for publishers reporting electronic resource usage, assisting librarians in evaluation of their usage.

Monday, March 26, 2018

Getting to Know TS Librarians: Lindsey Carpino



1. Introduce yourself (name & position).
Hello, my name is Lindsey Carpino and I am the Digital Resource Analyst at Sidley Austin LLP in Chicago, IL. I am newer to the Tech Services side as in the more recent past I was on the research team here at Sidley. I also have academic law librarianship experience as a Reference Assistant at both Loyola University Chicago School of Law and Northwestern University School of Law Pritzker Legal Research Center. 

2. Does your job title actually describe what you do? Why/why not?
I would say yes and no. While I do organize and provide access to our electronic resources, I do so much more than that. I also work on updating our practice collaboration sites, helped with the redesign of our new library intranet page, collaborate with IT in developing a database for all of our resources, assist with resource access issues and more.  

3. What are you reading right now?
I just read The Mothers by Brit Bennett for my book club. I am looking forward to reading All We Ever Wanted by Emily Giffin, who is my favorite author. Emily is a lawyer turned novelist. Maybe one day she will feature a law librarian as a character. 

4. If you could work in any library (either a type of library or a specific one), what would it be? Why?
I really think law librarianship is the perfect fit for me since I have both a J.D. and M.L.I.S. If I did not have this background, I would be interested in working in an art museum library like the Art Institute of Chicago. In college, I studied political science and art history and went toward the law side. I continue to have a love of art.

5. You suddenly have a free day at work, what project would you work on?
So many projects, so little time ☺ If I had a free day at work, I would devote even more time to going through our resource database.

Wednesday, February 28, 2018

Using the Google Translate App to Assist in Cataloging: A Brief Case Study

As Technical Services staff sizes have shrunk over the past several years, so has in-house expertise with foreign languages. While our library collects materials mainly in English, with some Spanish, we do still get the occasional item in a language no one in the library speaks or reads.

This happened recently when we were given a gift copy of a book from a Japanese publisher. An accompanying letter, written in English, said that the book contained a reprint of an article by one of our faculty members. The letter mentioned the name of the faculty member but did not indicate which chapter was his. The book itself was completely in Japanese, so our cataloger had no way of knowing which chapter in the book was the reprint. We needed to know so that we could include that information in our catalog record for the book as well as update our internal database of faculty publications.

We were able to find a bib record for the book in OCLC with an ISBN search, but it didn't contain a content note. After searching in vain in various places for a record that had a detailed content note, I remembered the Google Translate app on my phone. After opening the app and telling it I needed to translate from Japanese to English, I took a picture of the chapter titles.


The app scanned the photo, looking for Japanese text. I was then given the option to select the text to translate or translate the entire page. Since I was looking for chapter titles and authors, I just selected those.


A preview of the translation was displayed at the top of the screen. I could tap through to see the full translation. When I saw the faculty member's name, I knew that I had found his chapter.

While the translation was not 100% perfect (the word "the" was missing from the title and the faculty member's name was misspelled) it was more than enough for us to recognize that we had located the reprint. 

Google Translate allows for the downloading of translation files in all its supported languages. With downloaded translation files, the user can perform translations while offline. It also enables instant translation that shows the translated text in situ, in real time.

In my experience, even though the instant translation is super cool to look at, it's not as accurate, especially when translating languages that don't use the Roman alphabet. As shown above, the title is not as accurate, nor is the author's name. However, instant translation will cycle through various possible translations.

The Google Translate app, while not completely perfect, can at least point catalogers in the right direction when they are tasked with cataloging materials in languages they don't know. It also has many other features that we have not begun to explore, such as text to speech, verbal translation, and the option to hand write text on the screen to translate.

Our cataloger is eager to try the app on a few Russian language books that have been sitting on her desk for a while. The app is certainly faster than tracking down a speaker of the language or trying to enter text in Google's web-based translation service. 

Tuesday, February 27, 2018

RA21 - Resource Access for the 21st Century

A recent series of posts in The Scholarly Kitchen discussed the pros and cons of RA21: Resource Access for the 21st Century. What is RA21, and how does it relate to more familiar means of authentication?

The current norm for authentication used by academic institutions is IP authentication. If a researcher's request originates within the IP range associated with an institution, s/he is presumed to be associated with that institution and entitled to resources provided by that institution. For a non-technical explanation of how IP (and other authentication methods) work, see Understanding federated identity, RA21 and other authentication methods.
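The core of IP authentication can be sketched in a few lines of Python using the standard `ipaddress` module. The campus ranges and the function name below are hypothetical (the addresses come from blocks reserved for documentation), and a real provider's access control would involve far more than this check.

```python
import ipaddress

# Hypothetical institutional ranges, taken from address blocks reserved
# for documentation; a real institution registers its own ranges.
CAMPUS_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_on_campus(address: str, ranges=CAMPUS_RANGES) -> bool:
    """Return True if the address falls inside any registered range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ranges)
```

The check identifies a network location, not a person, which is why off-campus users typically need a proxy server or VPN to appear to come from inside the range.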

RA21 is a joint NISO/STM initiative "aimed at optimizing protocols across key stakeholder groups, with a goal of facilitating a seamless user experience for consumers of scientific communication." (https://ra21.org/index.php/what-is-ra21/). The basic assumption of this initiative is that IP range based authentication no longer works for users of scholarly information. In place of IP authentication and proxy servers, use of a federated authentication model is proposed.

Hinchliffe and Schonfeld express concerns about patron privacy, which may be ameliorated by requirements of the EU General Data Protection Regulation (GDPR) scheduled for enforcement beginning in May 2018. Given some publishers' attempts to cover all aspects of scholarly workflow, would they be inclined to mine and monetize the information about a scholar's research patterns that federated identity could generate? Additionally, access for walk-in users may also be an issue.

RA21 is hardly a done deal, but it certainly bears monitoring.

Hinchliffe, Lisa Janicke. What will you do when they come for your proxy server?
https://scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21/

Schonfeld, Roger C. Identity is everything 
https://scholarlykitchen.sspnet.org/2018/01/22/identity-everything/

Carpenter, Todd A. Myth busting: five commonly held misconceptions about RA21 (and one rumor confirmed)
https://scholarlykitchen.sspnet.org/2018/02/07/myth-busting-five-commonly-held-misconceptions-ra21/


Thursday, February 15, 2018

Artificial intelligence laboratories in libraries?

Libraries have often been the incubators of novel ideas and new technology. In an effort to share AI development with a wider array of students and the public, the University of Rhode Island is opening an AI lab in its university library.


The University of Rhode Island is taking a very different approach with its new AI lab, which may be the first in the U.S. to be located in a university library. For URI, the library location is key, as officials hope that by putting the lab in a shared central place, they can bring awareness of AI to a wider swath of the university's faculty and student body.

The Dean of Libraries, Karim Boughida, specifically mentions the lack of diversity in AI and the resulting problem of biased algorithms as the reason to put an AI lab in the library, a place that values inclusivity. "Without explicit countermeasures, machine learning and AI could magnify existing patterns of inequality in our society," says Boughida.


Unlike a typical AI lab focused on research, the URI AI Lab will offer students and instructors the chance to learn new computing skills, and also encourage them to deepen their understanding of AI and how it might affect their lives, through a series of talks and workshops. The 600-square-foot AI lab will be located on the library’s first floor and will offer beginner- to advanced-level tutorials in areas such as robotics, natural language processing, smart cities, smart homes, the internet of things, and big data.