Specifications for the OLAC metadata display format and the OLAC-to-OAI_DC crosswalk

Date issued: 2009-07-23
Status of document: Proposed Informational Note. This document is in the midst of open review by the community.
This version: http://www.language-archives.org/NOTE/olac_display-20090723.html
Latest version: http://www.language-archives.org/NOTE/olac_display.html
Previous version: http://www.language-archives.org/NOTE/olac_display-20060515.html
Abstract:

In addition to the olac metadata format, the OLAC Aggregator [OLACA] serves records in two other formats: olac_display and oai_dc. This document specifies how an OLAC record is transformed into each of these two formats. The first, olac_display, is a reader-friendly view of OLAC metadata that may be used by someone building a service that displays OLAC metadata; it translates coded values into their human-readable equivalents. The second, oai_dc, is the standard format used by the Open Archives Initiative (OAI) for metadata interchange. Thus the OLAC Aggregator serves as a crosswalk that transforms the olac format records supplied by OLAC's participating archives into oai_dc format records that can be used by the wider OAI community.

Editors: Gary Simons, SIL International (mailto:gary_simons@sil.org)
Changes since previous version:

This draft describes a total reimplementation of the two formats that was completed in May 2009. The original implementation and specification maintained a one-to-one correspondence between the elements of the original olac record and the elements of the olac_display and oai_dc records. The philosophy of transformation is now very different in that a one-to-many mapping of elements is allowed. The result is oai_dc records that are more in keeping with best practice in the OAI community.

Copyright © 2009 Gary Simons (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Design principles
  3. The OLAC display format
  4. The OLAC-to-OAI_DC crosswalk
References

1. Introduction

In order to improve recall and precision in searching, the OLAC metadata format [OLAC-Metadata] defines an extension mechanism (involving the xsi:type and olac:code attributes) to support resource description using community-defined controlled vocabularies. Service providers may use these attributes to support precise search. However, those same service providers also need to be able to display metadata records to users in a manner that shows all available information in a form they can understand. This means, for instance, that coded attribute values (such as three-letter language codes) need to be translated into friendly display forms. Still other service providers, such as those in the general Open Archives Initiative (OAI) community, will not be interested in the community-specific extensions and will prefer to work with metadata from OLAC participants in the generic oai_dc form without special codes or attributes.

In order to enhance the repurposing of OLAC metadata, the OLAC Aggregator [OLACA] offers such translation services. Neither the OLAC data providers nor potential service providers need worry about the problem of translation. Rather, OLAC data providers need only supply their records in OLAC metadata format to the OLAC Aggregator, which in turn disseminates them to service providers in any of three formats: the olac format, the olac_display format, or the oai_dc format. For instance, the following request to OLACA retrieves a record from the Audio Archive of Linguistic Fieldwork (Berkeley, CA) in olac format as it was supplied by the archive:

http://www.language-archives.org/cgi-bin/olaca3.pl?verb=GetRecord&identifier=oai:blc.berkeley.edu:la.1&metadataPrefix=olac

By changing the requested metadataPrefix to olac_display, the same record is returned in a format that still conforms to the OLAC metadata standard, but which is enriched by the translation of community-specific codes to human-readable display forms:

http://www.language-archives.org/cgi-bin/olaca3.pl?verb=GetRecord&identifier=oai:blc.berkeley.edu:la.1&metadataPrefix=olac_display

Finally, changing the requested metadataPrefix to oai_dc causes the same record to be "dumbed down" into the simple Dublin Core (DC) format that serves as the standard for the OAI community:

http://www.language-archives.org/cgi-bin/olaca3.pl?verb=GetRecord&identifier=oai:blc.berkeley.edu:la.1&metadataPrefix=oai_dc
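The three requests above differ only in the value of metadataPrefix. As a minimal sketch, assuming nothing beyond the base URL and parameters shown in those requests, they can be built with the Python standard library; the helper get_record_url is hypothetical and not part of OLACA.

```python
from urllib.parse import urlencode

# Base URL of the OLAC Aggregator, as used in the example requests.
OLACA_BASE = "http://www.language-archives.org/cgi-bin/olaca3.pl"


def get_record_url(identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for the OLAC Aggregator."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # leave the colons of the OAI identifier unescaped
    )
    return f"{OLACA_BASE}?{query}"


# Request the same record in each of the three formats served by OLACA.
for prefix in ("olac", "olac_display", "oai_dc"):
    print(get_record_url("oai:blc.berkeley.edu:la.1", prefix))
```

Because only metadataPrefix varies, a service provider can switch between the raw, display-ready, and simple DC views of a record without changing anything else in the request.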

Section 2 of this document discusses general design principles that underlie the mapping process for the two formats. Section 3 then gives the specification for the mapping from olac format to olac_display format. Finally, section 4 gives the specification for the transformation to oai_dc format, which in fact is a mapping based on the olac_display format.

2. Design principles

The OLAC metadata format is an application profile based on the full set of DC metadata terms, also known as "qualified DC" [DC-Q]. The standard algorithm for "dumbing down" qualified DC into the 15 basic DC elements, or "simple DC" [DC-Simple], is:

  1. Translate dcterms elements (that is, the refinements) to their generic dc equivalent.

  2. Drop all attributes in the element tag (that is, xsi:type for naming encoding schemes and xml:lang for identifying the language of element content).
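As an illustration of these two rules, here is a minimal Python sketch in which a hypothetical Element dataclass stands in for an XML element. The REFINES table is an assumption that covers only a handful of DCMI refinements; it is not the complete mapping from [DC-Q].

```python
from dataclasses import dataclass, field

# The 15 elements of simple DC [DC-Simple].
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Partial, illustrative table: DCMI refinement -> simple DC element it refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "extent": "format",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}


@dataclass
class Element:
    tag: str                                   # prefixed name, e.g. "dcterms:spatial"
    text: str = ""
    attrs: dict = field(default_factory=dict)


def dumb_down(el: Element) -> Element:
    """Apply the two standard qualified-DC to simple-DC rules to one element."""
    prefix, _, local = el.tag.partition(":")
    # Rule 1: translate a dcterms refinement to its generic dc equivalent.
    if prefix == "dcterms":
        local = REFINES.get(local, local)
    if local not in DC_ELEMENTS:
        raise ValueError(f"no simple DC equivalent known for {el.tag}")
    # Rule 2: drop all attributes (xsi:type, xml:lang, ...).
    return Element(tag=f"dc:{local}", text=el.text)
```

For example, a dcterms:spatial element with xsi:type="dcterms:TGN" comes out as a dc:coverage element with the same content and no attributes.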

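The OLAC metadata format adds another attribute, olac:code, to hold the value for one of the community-specific vocabularies [OLAC-Extensions]. This is essential information that cannot simply be discarded in a dumb-down process. Thus, the crosswalk needs to augment the above rules to specify what to do with each instance of olac:code. There are five controlled vocabularies for which olac:code is used to hold the value:

  1. Discourse type (xsi:type="olac:discourse-type")

  2. Language (xsi:type="olac:language")

  3. Linguistic field (xsi:type="olac:linguistic-field")

  4. Linguistic data type (xsi:type="olac:linguistic-type")

  5. Participant role (xsi:type="olac:role")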

In the first four cases, the value of olac:code is the primary value of the metadata element. Thus it must be moved to element content so that it is not lost in the dumb-down to simple DC. In the fifth case, the value of olac:code is like a refinement of the metadata element. Thus, like other refinements, it is discarded in the dumb-down process and so is not moved to element content.
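The decision about what happens to olac:code can be sketched as a small classification on xsi:type. This minimal Python sketch assumes the five vocabularies are identified by the xsi:type values olac:discourse-type, olac:language, olac:linguistic-field, olac:linguistic-type, and olac:role; the function name code_for_content is hypothetical.

```python
from typing import Optional

# xsi:type values of the OLAC vocabularies that carry their value in olac:code,
# grouped by the part the code plays in the element (assumed names, see above).
PRIMARY_VALUE_TYPES = {
    "olac:discourse-type",
    "olac:language",
    "olac:linguistic-field",
    "olac:linguistic-type",
}
REFINEMENT_TYPES = {"olac:role"}


def code_for_content(attrs: dict) -> Optional[str]:
    """Return the olac:code value that must be moved into element content,
    or None when the dumb-down may drop it along with the other attributes."""
    code = attrs.get("olac:code")
    if code is None:
        return None
    xsi_type = attrs.get("xsi:type")
    if xsi_type in PRIMARY_VALUE_TYPES:
        return code   # the code is the element's primary value: keep it
    if xsi_type in REFINEMENT_TYPES:
        return None   # the code refines the element, like a dcterms refinement
    raise ValueError(f"olac:code used with unknown vocabulary {xsi_type!r}")
```

Raising on an unknown vocabulary, rather than silently dropping the code, is a design choice of the sketch: it makes an unexpected extension visible instead of losing its value.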

Another general design principle is that a metadata element containing a value for olac:code may translate into multiple instances of the element. The olac:code and the element content translate to separate instances of the element. Furthermore, if the value of olac:code is an opaque code, an additional instance of the element is generated to hold a display label for the code value.
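A minimal Python sketch of this one-to-many expansion follows. The Element class, the OPAQUE_LABELS table, and the choice of which instance keeps xsi:type are all hypothetical; in particular, treating only language codes as opaque is an assumption made for the example. Role codes are left as attributes, in line with the rule that a refinement code is not moved to content.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str                                   # e.g. "dc:subject"
    text: str = ""
    attrs: dict = field(default_factory=dict)


# Vocabularies whose code is the element's primary value.
PRIMARY_VALUE_TYPES = {"olac:discourse-type", "olac:language",
                       "olac:linguistic-field", "olac:linguistic-type"}

# Hypothetical label table: here only language codes are treated as opaque.
OPAQUE_LABELS = {"olac:language": {"eng": "English", "fra": "French"}}


def expand(el: Element) -> list:
    """Split an element carrying olac:code into separate element instances."""
    xsi_type = el.attrs.get("xsi:type")
    code = el.attrs.get("olac:code")
    if code is None or xsi_type not in PRIMARY_VALUE_TYPES:
        return [el]  # no code, or a role code that stays an attribute
    # One instance holds the code itself, still marked with its vocabulary.
    out = [Element(el.tag, code, {"xsi:type": xsi_type})]
    label = OPAQUE_LABELS.get(xsi_type, {}).get(code)
    if label is not None:
        # Opaque code: add an instance holding its human-readable label.
        out.append(Element(el.tag, label))
    if el.text:
        # Any free-text content becomes an instance of its own.
        out.append(Element(el.tag, el.text))
    return out
```

With this sketch, a dc:subject element with olac:code="eng" and the content "Dialect variation" becomes three dc:subject instances holding "eng", "English", and "Dialect variation".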

3. The OLAC display format

The purpose of the olac_display format returned by OLACA is to provide a feed that is optimized for metadata display. It is a bridge between the olac format and the oai_dc format. It performs the movement of the olac:code value to the element content and the generation of multiple instances of an element when that is needed. It stops short of the dumb-down process in which refined elements are translated to their generic equivalent and attributes are discarded.

The following principles apply in the transformation to the olac_display format:

The olac to olac_display transformation is done as follows. If the metadata element matches a pattern in the list below, then perform the operation specified below; otherwise, simply copy the element.
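The match-then-copy control flow of this transformation can be sketched in Python as follows. The rule in the example (RULES and code_to_content) is hypothetical, since the actual patterns and operations are those defined by this specification; only the fallback of copying unmatched elements is taken from the text. The match key combining tag and xsi:type is likewise an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str                                   # e.g. "dc:subject"
    text: str = ""
    attrs: dict = field(default_factory=dict)


def match_key(el: Element) -> str:
    """Assumed key that patterns are matched against: tag plus xsi:type."""
    return f"{el.tag}[{el.attrs.get('xsi:type', '')}]"


def to_display(elements, rules):
    """Apply the first matching rule to each element; copy it otherwise."""
    out = []
    for el in elements:
        for pattern, operation in rules:
            if pattern.fullmatch(match_key(el)):
                out.extend(operation(el))      # an operation may yield several elements
                break
        else:
            out.append(el)                     # no pattern matched: copy unchanged
    return out


# Hypothetical rule: put the code of any olac:language element into its content.
def code_to_content(el: Element):
    return [Element(el.tag, el.attrs["olac:code"], dict(el.attrs))]


RULES = [(re.compile(r"dc:\w+\[olac:language\]"), code_to_content)]
```

Letting each operation return a list keeps the one-to-many mapping described in section 2 inside the same loop as the plain copy.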

The olac_display format is the basis for the human-readable displays of metadata on the OLAC site. For instance, an HTML view of the catalog record for the archive item used above as an example in section 1 can be seen at this URL:

http://www.language-archives.org/item/oai:blc.berkeley.edu:la.1

The display is made from the olac_display form of the record by showing a label for the metadata element in the left column and the element content in the right column. An attribute, if present, is expressed in the parenthesized string following the metadata element label. If xsi:type="olac:role", then the string in parentheses is the label for the participant role (i.e. the value of olac:code). Otherwise, the string in parentheses is a transformation of the value of xsi:type, which identifies the encoding scheme for the element content. Click on the "OAI-PMH request for simple DC format" link toward the bottom of the page to view the oai_dc form of the record (as described in the next section).
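The rule for the parenthesized string can be sketched in Python as follows. ROLE_LABELS and scheme_label are hypothetical: the text says only that the xsi:type value is transformed, so stripping its namespace prefix is an assumption, as is deriving the left-column label by capitalizing the element name.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str                                   # e.g. "dc:contributor"
    text: str = ""
    attrs: dict = field(default_factory=dict)


# Hypothetical excerpt of display labels for OLAC role codes.
ROLE_LABELS = {"speaker": "Speaker", "researcher": "Researcher"}


def scheme_label(xsi_type: str) -> str:
    """Assumed xsi:type transformation: drop the namespace prefix."""
    return xsi_type.partition(":")[2] or xsi_type


def display_row(el: Element) -> tuple:
    """Return the (left column, right column) cells for one element."""
    local = el.tag.partition(":")[2] or el.tag
    label = local[:1].upper() + local[1:]
    xsi_type = el.attrs.get("xsi:type")
    if xsi_type == "olac:role":
        # Parenthesize the participant role label taken from olac:code.
        code = el.attrs.get("olac:code", "")
        label += f" ({ROLE_LABELS.get(code, code)})"
    elif xsi_type:
        # Otherwise parenthesize a form of the encoding scheme name.
        label += f" ({scheme_label(xsi_type)})"
    return label, el.text
```

For example, a dc:contributor with xsi:type="olac:role" and olac:code="speaker" would be shown as "Contributor (Speaker)" in the left column.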

4. The OLAC-to-OAI_DC crosswalk

In order to participate in the wider community of OAI service providers, OLAC data providers must also publish their metadata records in the simple Dublin Core format prescribed by the OAI [OAI_DC]. There is no need for OLAC data providers to store the records in both formats, however, since the information in the oai_dc format is a subset of the information in the olac format. An oai_dc record may thus be automatically derived from an OLAC record. A program that transforms a metadata record from one format to another is conventionally called a "crosswalk"; see [Zeng2007] for other examples of crosswalks and pointers to discussions of crosswalking issues.

The OLAC Aggregator also supports the oai_dc format. It thus functions as an OLAC-to-OAI_DC crosswalk, since it harvests only OLAC metadata and performs the transformation to oai_dc format upon request. Transforming a metadata record from OLAC format to olac_display format goes most of the way toward implementing the OLAC-to-OAI_DC crosswalk. In order to complete the mapping and transform an element in an olac_display record to the corresponding element of the oai_dc record, the following special cases are observed:

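Then the following two dumb-down rules apply in general:

  1. Translate dcterms elements (that is, the refinements) to their generic dc equivalent.

  2. Drop all attributes in the element tag.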
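Putting the final step together, here is a minimal sketch of serializing olac_display elements as an oai_dc record with Python's xml.etree.ElementTree. It assumes the OAI-PMH 2.0 oai_dc namespace and the DC 1.1 element namespace, does not implement the special cases listed for this crosswalk, and uses a deliberately partial REFINES table; the function name to_oai_dc is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Partial, illustrative map from dcterms refinements to simple DC elements.
REFINES = {"created": "date", "spatial": "coverage", "abstract": "description"}


def to_oai_dc(display_elements) -> str:
    """Build an oai_dc record from (tag, text, attrs) triples in olac_display form."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, text, _attrs in display_elements:  # attributes are never copied (rule 2)
        prefix, _, local = tag.partition(":")
        if prefix == "dcterms":
            local = REFINES.get(local, local)   # rule 1: refinement -> generic dc
        child = ET.SubElement(root, f"{{{DC_NS}}}{local}")
        child.text = text
    return ET.tostring(root, encoding="unicode")
```

Because the olac_display input has already moved code values into element content, this last step only has to rename elements and discard attributes.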


References

[DC-Q] DCMI Metadata Terms.
<http://dublincore.org/documents/dcmes-qualifiers/>
[DC-Simple] Dublin Core Metadata Element Set, Version 1.1.
<http://dublincore.org/documents/dces/>
[DRIVER] DRIVER Guidelines 2.0: Guidelines for content providers — Exposing textual resources with OAI-PMH, November 2008.
<http://www.driver-support.eu/documents/DRIVER_Guidelines_v2_Final_2008-11-13.pdf>
[ISO639-3] ISO 639-3 Downloads.
<http://www.sil.org/iso639-3/download.asp>
[OAI_DC] XML schema for OAI implementation of Dublin Core metadata.
<http://www.openarchives.org/OAI/1.1/dc.xsd>
[OLAC-Extensions] Recommended metadata extensions.
<http://www.language-archives.org/REC/olac-extensions.html>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLACA] OLACA: The OLAC Aggregator.
<http://www.language-archives.org/cgi-bin/olaca3.pl?verb=Document>
[Zeng2007] Zeng, Marcia Lei. 2007. Metadata Crosswalks.
<http://www.slis.kent.edu/~mzeng/metadata/crosswalks.htm>