Monday, May 19, 2014

THIS is what linked data looks like?

On April 28, OCLC issued a press release announcing the availability of 197 million bibliographic work descriptions, formatted as linked data. Collectively referred to as "WorldCat Works," the 197 million sets of linked data represent a leap forward in the migration of library data from traditional library catalogs to the linked data world of the World Wide Web.

Richard Wallis, OCLC's "Technology Evangelist," posted an article the same day with further information about the project and its significance to libraries. Roy Tennant devoted a post on the Hanging Together blog to this development, calling it "the most important thing you haven't heard of." Both Wallis and Tennant point their readers toward an example, Gandhi's "Story of my experiments with truth." Looking at the example, one sees a set of, well, links. The links are separated into a number of categories, many of which will be familiar to catalogers (e.g., contributor, creator, genre). In addition to HTML, the description can be viewed in several RDF serializations: Turtle, RDF/XML, N-Triples, and JSON-LD.

While recognizing the significance of what OCLC has done, I confess to some confusion. The "work" chosen as an example is actually what RDA calls an "expression." Gandhi's "work" in the example is a translation of his autobiography, which was originally written in Gujarati, yet this is nowhere apparent in the sample "work" description. The original Gujarati work has its own work description; as far as I can tell, it includes no link to the English translation. Is it unrealistic to expect linked data to provide links between works and their translations? Tennant's blog post explicitly refers to this capability: "By aggregating various translations of works around a single identifier, we can then present the record that a particular user wishes to see given their language capabilities." Unless I am misunderstanding what linked data is supposed to do (entirely possible!), what I am seeing so far in OCLC's work descriptions does NOT live up to this promise.
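
As a rough illustration of the kind of link in question, the sketch below models two work descriptions as plain subject-predicate-object triples and selects the one that matches a reader's language. Everything in it is an assumption made for illustration: the example.org URIs, the predicate names (translationOf, inLanguage, name), and the helper functions are hypothetical and are not drawn from OCLC's published data or vocabulary.

```python
# Illustrative only: every URI and predicate below is invented for this sketch
# and is NOT taken from OCLC's WorldCat Works data.
EX = "http://example.org/"

ORIGINAL = EX + "work/autobiography-gujarati"   # hypothetical identifier
ENGLISH = EX + "work/autobiography-english"     # hypothetical identifier

TRANSLATION_OF = EX + "vocab/translationOf"     # hypothetical predicate
IN_LANGUAGE = EX + "vocab/inLanguage"           # hypothetical predicate
NAME = EX + "vocab/name"                        # hypothetical predicate

# Each triple is (subject, predicate, object).
TRIPLES = [
    (ORIGINAL, IN_LANGUAGE, "gu"),
    (ENGLISH, IN_LANGUAGE, "en"),
    (ENGLISH, NAME, "Story of my experiments with truth"),
    # The single statement that ties the translation to its original.
    (ENGLISH, TRANSLATION_OF, ORIGINAL),
]


def translations_of(triples, original):
    """Return the subjects linked to `original` as translations."""
    return [s for (s, p, o) in triples if p == TRANSLATION_OF and o == original]


def language_of(triples, work):
    """Return the language code recorded for `work`, or None if absent."""
    for s, p, o in triples:
        if s == work and p == IN_LANGUAGE:
            return o
    return None


def pick_for_language(triples, original, lang):
    """Pick the original or one of its translations that matches `lang`."""
    for candidate in [original] + translations_of(triples, original):
        if language_of(triples, candidate) == lang:
            return candidate
    return None


if __name__ == "__main__":
    print(pick_for_language(TRIPLES, ORIGINAL, "en"))
```

Without the translationOf statement, the two descriptions are unrelated islands and the lookup cannot get from one to the other; that one missing triple is the gap described above.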


Richard Wallis said...

Excellent post, Jean.

"Is it unrealistic to expect linked data to provide links between works and their translations?" - No, it is not unrealistic and, where the data is available, the intention is to provide those links and relationships. As I said in my post, this is a major first step in a journey to provide linked data views of the entities within WorldCat. There is significant work underway to identify these relationships (see this post from Karen Smith-Yoshimura providing insight).

I understand your frustration in not seeing, at initial release, a perfect graph of all the entities and their relationships from all OCLC resources. I am eager to reach that point too. Like most worthwhile journeys, it takes time to arrive.

If you have specific comments and suggestions, please do not hesitate to make them on the site, and continue to emulate my children from past vacation trips, who punctuated the trip with cries of "Are we there yet?"

Roy Tennant said...

Richard, who is more politic than I, said "where the data is available". And that is very often the rub.

The data one would like to make appropriate linkages simply isn't there in a number of instances. Fixing this will likely take a mix of sophisticated algorithms and manual editing. And many records will probably, and unfortunately, never make the grade. Not to put too fine a point on it, but our metadata is less...uh...standard than we might like. And less complete.

These issues have largely flown under the radar except in isolated instances. But when you attempt to do the kind of processing we are doing on such a massive scale, they become much more apparent. Welcome to our world.

Unknown said...
Karen Smith-Yoshimura said...

For those interested in the challenges of identifying translations and linking them to the original work, see my blog post, "Challenges posed by translations" at