Monday, November 25, 2019

Here Come the Bots: Six Tips When Designing Your IR's Metadata for Improved Discoverability


Last week I attended a webinar about "the science of discoverability". Although it was aimed at librarians working with institutional repository (IR) content, it was an excellent reminder that the many best practices I followed as a web developer for our law school's Drupal site apply not only to repositories but also to LibGuides (and any other pages we want Google to find). Here are six tips to use when designing metadata for the bots and increasing your site's discoverability:

  1. Title Fields Are Important! In fact, the title is perhaps the most important field of any object or event metadata in your repository. When I worked as a web developer, this was something we struggled with whenever other users created webpages. The title did not always match or identify the content. Later on they would inevitably call or email to ask why the page wasn't showing in Google's search results when they entered the keywords they assumed would retrieve the exact webpage they had just created. Of course it didn't work. Almost always, the keywords they wanted Google to identify were not in their page title field (or URL). The same holds true for IR content. No matter how many other fields contain the keywords, if the title doesn't, the item is unlikely to be retrieved by Google (unless you have big bucks, of course - then you can use AdWords to pay your way to the top of that results list as a sponsored item... but I doubt any of us have that kind of money for SEO, hah!).

  2. HUMAN-Readable Is Better. This is not your library catalog. Your ILS is a (mostly) closed-off system. It was engineered ONLY by librarians, who have strict cataloging rules passed down over decades of meticulous fine-tuning, with a field for literally every single possible bit of data. An IR is not an ILS, in the same way that Google is not your OPAC. They do not and will never function the same way. Sure, you can use some of the same operators, and you may even form similar strings in each of the search bars. The difference is that Google's algorithm is not a 100% known entity, and most of Google's users are performing natural language searches. Your I.T. staff or metadata librarians cannot get into Google's back end and tell it what you want, what fields to search, what weight to give certain types of results, or how to display your results list. Google's algorithm not only likes but craves HUMAN-readable content, NOT machine-readable content. Craft the content in your fields for any given item, event, or landing page with this in mind. Design the data carefully, and remember that the key here is not to overdo it! 

  3. Don't Use Too Many Keywords. This relates to the last sentence of the last tip - don't overdo it. In addition to not getting overly wordy or technical in your fields, the field to especially watch out for is keywords. In Digital Commons there is a nice keyword field. When I first started adding content to our repository I no doubt went overboard with more keywords than I should have. Although too few could hinder discoverability, if the keywords are on point and you have two to four of them that are appropriate, you will hit a sweet spot with Google's crawl. But beware of using too many. Google and other search engines may penalize or even ignore your content (and in some cases, as the webinar warned, your entire site) for using too many keywords. Excessive metadata makes the search engine assume the content isn't valid. So just be careful here. This doesn't mean you should never use more than four keywords. There may be occasions when fewer just won't cut it. Perhaps that one article or conference you just loaded is particularly interdisciplinary and really needs more terms. Keeping the majority of your content at three keywords or fewer will get search engines to take you more seriously, and those few instances where you decided to use more keywords won't throw up red flags the way twelve keywords for every single item in your repository would. 

  4. Frequency, Consistency & Longevity. I can't count how often I was asked as a web developer when Google would crawl our site. This is a mystery to most everyone, and while you can request a re-crawl through some of Google's Webmaster Tools, there is no guarantee of how quickly that will happen. One thing is for sure: the more frequently and consistently you update a site, the more often it will be re-crawled, no matter what site it is. Long periods of no activity may result in your site being flagged as dead, so regularly adding or refreshing content is the key here. Another related factor is longevity. This is simply the idea that the longer a site exists, the more time it has had to be crawled, to appear in search results, and as a result to increase site traffic. Then the cycle returns to the beginning, since the more site visits you receive from organic Google searches, the more your site should rise in the results list as your site and its content become more closely associated with a variety of searches over time. Obviously a brand new site will take time to get there, but after many repeats of this cycle (with the help of your frequent and consistent care and feeding) this will happen naturally. 

  5. Bots Like Quick Load Times. So since we don't really know when Google or other search engine bots will pay us a visit, how can we make sure that when they do they are finding us at our best? Load times are one big indicator. I know, I know... but there are SO many cool and flashy things we could embed into our content, right? Is that snazzy high-res image of the latest guest lecturer too much for Google? What about our Issuu flipbooks of scanned symposia programs, or the YouTube video of the three-hour panel? Each bit of multimedia needs a different approach here. If your IR system has native streaming, this will help cut down on the additional load time that embedded media adds. If not, you may need to choose what is more important - the load time or the media keeping your traffic on your site. If keeping traffic on your site isn't a major factor, load times will improve if you hyperlink to the media instead of placing it on the page itself. The same could be true for embedded flipbook-style PDFs. For images, as long as you use best practices for the proper resolution on the web, you should not have to choose between a crisp, quality image and fast load times. Use the right format for image and other media files (choose MP3s for online streaming instead of WAVs or AIFFs). If you want or need to offer the highest quality original files to site visitors, hyperlink to that file's location instead of providing it at their point of entry. This will keep load times down and still give visitors the option of access and retrieval. In the end, the faster your content loads, the more quickly it can be indexed. Bots are impatient - they are bots! Make them wait too long and they just keep moving. 

  6. Site Maps Are Critical, Especially for "Dead" Collections. So your content is now in tip-top shape! It has excellent human-readable title fields and abstracts. It has good keywords, but not too many of them. You've even managed to build a beautiful page of content enhanced with multimedia, but you've been careful to follow best practices for these files and your load time is great. Now there is just one problem - this collection is an archive! It just so happens that as a librarian you have created a collection of items that will never grow again because it is historical. How can you possibly be frequent and consistent with this set of data? Will Google eventually forget about you (even if the collection exists over a long period of time) because there is nothing to update? Not necessarily - this is where your site's skeleton, the trusty site map, comes into play. Depending on the system you are using, a site map may be generated for you as you create new content. It never hurts to revisit this though. Particularly for sites that have been around for a long time, the site map (generated for you or created by someone else) may be pulling titles and other structural and organizational information that is no longer accurate or appropriate, or is just not as good as it should be. Revisit your site map every so often as a regular maintenance task. It is essentially an outline of your site and all that it contains, and as such can indicate where a collection or series title is not descriptive enough, is too descriptive, or is just not human-readable. Think back to tips #1 and #2 for human-readable fields (especially titles). Page summaries can help here as well. When you conduct a Google search and a result appears with no description at all for the page, are you going to take your chances with clicking through to that result, or are you more likely to choose the result that tells you what you will find there? 
Make sure your titles, page summaries, and if possible even URL strings make sense and describe what visitors will find there. Adjust your sitemap and related descriptive data as needed, and monitor how your site (hopefully) rises in results over time, as well as how your traffic (hopefully) increases over time. 
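To make tips 1 and 3 a little more concrete, here is a minimal Python sketch of a metadata check you could run before publishing a record. Everything in it is an assumption made for illustration: the record is a plain dictionary with hypothetical "title" and "keywords" entries (not an export format from Digital Commons or any other system), the search terms are single words, and the limit of four keywords simply mirrors the "two to four" sweet spot mentioned above rather than any rule published by Google.

```python
import re

# Assumption: upper end of the "two to four" sweet spot from tip 3.
MAX_KEYWORDS = 4


def words(text):
    """Lowercase a string and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def audit_record(record, search_terms, max_keywords=MAX_KEYWORDS):
    """Return a list of warnings for one repository record.

    `record` is a hypothetical dict with "title" and "keywords" entries.
    `search_terms` are the single words you expect a searcher to type.
    """
    warnings = []

    # Tip 1: the words people search for should appear in the title.
    title_words = words(record.get("title", ""))
    missing = [term for term in search_terms if term.lower() not in title_words]
    if missing:
        warnings.append("title is missing search terms: " + ", ".join(missing))

    # Tip 3: flag records that carry more keywords than the chosen limit.
    keywords = record.get("keywords", [])
    if len(keywords) > max_keywords:
        warnings.append(f"{len(keywords)} keywords (more than {max_keywords})")

    return warnings


# Invented sample record, for illustration only.
record = {
    "title": "Spring Symposium Program",
    "keywords": ["law", "symposium", "privacy", "data", "ethics", "policy"],
}

for warning in audit_record(record, ["privacy", "symposium"]):
    print(warning)
```

A check like this cannot tell you whether a title is good, only whether the words you expect people to search for are in it, so treat the warnings as prompts for a human review rather than as rules.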
Have more tips to share with TechScans readers that were not touched on here? What has worked for improving your website or repository's metadata, and how do you optimize your content for search engines? Share with us in the comments! 

Monday, November 18, 2019

Getting to Know TS Librarians: Joy Humphrey


1. Introduce yourself.
My name is Joy Humphrey, and I'm the Associate Director of the Harnish Law Library at Pepperdine University's Rick J. Caruso School of Law. 

2. Does your job title actually describe what you do? Why/why not?
My job title is broad, so in that sense, it does describe what I do because I'm a bit of a generalist.  At any given moment I can be fielding a question about our noise policy, originally cataloging a board game, or contacting a vendor to pay an invoice. I'm tasked with overseeing the day-to-day operations of the law library, but I am specifically over the Public Services and the Technical Services Departments.

3. What are you reading right now?
I'm on a Ruth Reichl kick right now. I recently finished Save Me the Plums, Reichl's memoir of her time as the editor of Gourmet, then I moved onto her earlier memoir Tender at the Bone, and I'm currently reading the sequel, Comfort Me with Apples. I love food memoirs, especially at this time of year since they get me pumped to do all that Thanksgiving cooking.

4. If you could work in any library (either a type of library or a specific one), what would it be? Why?
I would love to work in an art museum library.  As someone who makes a point of visiting art museums in every city she travels to, actually working in one would be divine. (I expect the reality would not match my fantasy--budgets are probably small, art books are unwieldy and heavy--but at least the collection would be beautiful.)

5. You suddenly have a free day at work, what project would you work on?
I know this sounds prosaic, but I would clean my office. I mean, REALLY clean it. Go through every file, every drawer, every book shelf. I would Marie Kondo the heck out of it. Because clearing out all that is irrelevant is the best way to bring a fresh perspective to what one does every day.

Monday, November 11, 2019

Quick Question: What conferences should I attend?

It feels like every week or two I receive an email promoting a conference to attend. With limited budget and a desire to attend conferences centered around issues related to technical services and resource management, I reached out to the TS and OBS-SIS listservs for recommendations. Keep reading to discover your next favorite conference outside of the AALL Annual Meeting.

Stay local

  • Check out your local chapter of the American Association of Law Libraries for their annual meeting. Chapters tend to have more programs relevant to TS and OBS librarians.
  • Look for regional TS-based workshops or conferences. If you're a Northern California resident, check out the Northern California Technical Processes Group (NCTPG).
  • Your local ALA chapter may also have an annual conference with a technical services or library systems track.

ILS Conferences

E-Resource Management Conferences

  • NASIG (formerly the North American Serials Interest Group)
    • Focuses on serials and electronic resources management.
    • Budget friendly: this conference is less expensive than a lot of other conferences - yay!
    • Many of our colleagues recommend this annual conference, so you can count on meeting some TS/OBS-SIS folks!
  • Electronic Resources and Libraries (ER&L)
    • Held annually in March in Austin, Texas and online.
    • The conference is organized into tracks/subtopics, so it's easy to see if there is an area of interest to you.

Other great recommendations

  • Charleston Conference: Issues in book and serial acquisitions
    • Held every year in Charleston, S.C., the Charleston Conference is devoted to technical services, electronic resources, serials, etc.
    • This conference is recommended by many of our colleagues.
  • Computers in Libraries 
    • Held annually in Arlington, Virginia
    • Focus on emerging and leading-edge technology. Great place to network with other information professionals from a variety of library backgrounds.
  • Acquisitions Institute at Timberline Lodge 
    • Geared towards acquisitions and collection development librarians
    • Single-track conference with limited enrollment.
  • Society of American Archivists (SAA)
    • Joint annual meeting of the Council of State Archivists and the Society of American Archivists 
  • Special Libraries Association (SLA)
    • Focused on special libraries and information professionals. 
    • Explores trends in knowledge and information management.
Is your favorite conference missing? Leave your recommendations in a comment below!

Monday, November 4, 2019

GLA Conference Review: Workshop on Digitization for Small Institutions


A while back I did a post called What About Conferences? aimed at newer members as part of our "Quick Question" series. In that post I specifically talked about memories and experiences from state and regional librarian organization annual meetings. One of those (perhaps the organization I am most fond of!) is the Georgia Library Association. I have been actively participating in this state association affiliated with ALA, ACRL, and SELA since I attended my first GLC (Georgia Libraries Conference) in the fall of 2014. It has always been a welcoming and lively group with a crazy awesome mixture of library types and individuals.

Co-presenting at GLC 2019 with colleagues
Szilvia Somodi and Marie Mize.
What I love perhaps most of all about GLC is that you will find all levels of librarians there (not just "faculty-level" with "Librarian" in their title). The very best sessions I have attended often come from library staff. As a librarian who worked in a few public libraries while studying librarianship, and whose past positions were until recently entirely "paraprofessional" or I.T. titles, I know this is where the on-the-ground, behind-the-scenes knowledge and skills are found and shared candidly: at the local events. GLC isn't pretentious or intimidating the way some conferences and their crowds can feel. It is also not overly techy like many I.T. and web developer conferences I have experienced, where all you hear is jargon that feels distant and mysterious. There is a beautiful happy medium at GLC where you can network, actually learn, and find encouragement to follow your interests and grow as a librarian without judgement. It is here that my love of libraries grew stronger, although it took me a few years to get comfortable enough to sign up for one of the pre-conference workshops.

Table of recommended project management
software from DLF's awesome wiki.
This year I finally did it, and I am so glad that I did! The workshop, held Wednesday, October 9 in Macon, GA, gave my colleagues and me an excellent excuse to spend more time together. It was a long day but definitely worth the trek. Digitization for Small Institutions was presented from 9 am to 12 noon with a short break in the middle of the session. The two presenters opened by talking about the Digital Library of Georgia (DLG) and right away shared links to resources including a toolkit for Project Managers from the Digital Library Federation (DLF): https://wiki.diglib.org/DLF_Project_Managers_Toolkit. For folks new to using a project management tool, this wiki has an excellent table of recommended software with summaries of each, links to them, and pros and cons side by side. Many of the tools you expect to find are here (Jira, Asana, Trello, Slack, Google Suite). Although I was personally disappointed that KanbanFlow was not included (insert sad-face emoticon here), there were a few I had not yet heard of or tested out, which is ALWAYS exciting.

Photo from my messy notes of a favorite, useful visual.
In the first hour I quickly learned more about DLF, DLG and DPLA (Digital Public Library of America). This was an extremely interesting portion of the workshop that served as the backdrop for the rest of the session's more detailed "how to" segments. Although I had heard of and visited each of the aforementioned DL sites before it had been quite a while since I had taken a moment to just learn more about them and familiarize myself with the "why" of each site and their respective purposes. This seemed particularly relevant after I returned from the conference as we prepared for Open Access Week just a few weeks later. I did not realize how many wonderful resources DLF made available for free online. The project manager toolkit wiki is invaluable, and even if you are not working on projects that will eventually feed up into a DL site, the kit contains so many best practices and tips that it could be useful for many types of digitization projects. One such best practice was this 5-step process (as seen in my messy note photo here): 1. Selection & Planning, 2. Metadata Creation, 3. Prep & Scanning, 4. Post-Processing (crops & edits), 5. Ingest & Preservation (into institutional repository). Before we had a short intermission the attendees were divided into break-out groups of 3 to 4. In this form we discussed why we were there, what projects we were undertaking and what our role was at our institution. Another takeaway takes me back to what I love so much about GLC: there were more staff than librarians in attendance, and a surprising number of public library or museum attendees.

Slide dissecting "Title"
For the rest of the workshop we were shown workflow charts (I LOVE a good visual aid for wrapping my head around a process and grasping a project's big picture) and given what might as well have been a micro-course on metadata terms with a focus on descriptive data, and specifics on Qualified Dublin Core. There was even a little Linked Data talk! What was most helpful about this section were the slides that included specific examples of Title fields. You know a session is worthwhile when you can take a nugget of info back and start using it immediately when you return to work. This was that particular nugget for me!
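For readers who have not worked with Dublin Core before, here is a minimal Python sketch of what a descriptive record with a human-readable title can look like. It is not taken from the workshop slides. The sample values are invented, the `record` wrapper element is a hypothetical container, and for brevity the sketch uses the simple Dublin Core element namespace rather than the qualified terms discussed in the session.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is a hypothetical container for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Invented sample values, for illustration only. The title says what the
# item is, where it is from, and when, in words a person would search for.
fields = [
    ("title", "Law School Commencement Procession, May 2019"),
    ("date", "2019-05"),
    ("type", "Image"),
]

xml_text = ET.tostring(dublin_core_record(fields), encoding="unicode")
print(xml_text)
```

Compare the title above with something like a bare camera file name: the descriptive version is the kind of title that both a patron and a search engine can make sense of.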


Hands-on Digitization Station
I was able to share, both in my breakout group and out loud with the presenters and the entire group, the challenges of a project I have been collaborating on in our library: properly and efficiently archiving thousands of photographs. Lucky for me, our project is dealing with media that is already digital, and I already have a space that exists and is ready for hosting the images and metadata (Digital Commons). It was super cool to hear the stories and challenges of others, including what types of media they are digitizing, organizing and archiving to make accessible to their patrons. Not everyone has a repository in place, and not everyone has the staff or tools to achieve their goals right away. This workshop also provided a hands-on station to practice digitization before you left the room. I love that the session enabled everyone to get their hands dirty, even those interested in the topic (lots of MLS students were there too) but not currently working in a place or role that allows them to do just that!

I left the workshop feeling inspired and with an added confidence for the project waiting for me back in the office. I am already using many of the tips I gained from the workshop this very week. I had such a wonderful experience that I will certainly sign up for another pre-conference workshop next year! In particular I have enjoyed taking part in GLA's interest groups like Technical Services and Information Technology, and their division-sponsored activities like the Academic Library Division, the New Members Round Table Division, and the Paraprofessional Division.

What local, state or regional organizations would you recommend to AALL TS-SIS and OBS-SIS members who may be from the same area of the States that you are? Share with us in the comments below and link to the association, group or conference!