OLAC Repositories

Date issued: 2008-07-28
Status of document: Standard. This document describes a standard that is currently followed by OLAC archives and services.
This version: http://www.language-archives.org/OLAC/repositories-20080728.html
Supersedes: http://www.language-archives.org/OLAC/repositories-20030917.html
Latest version: http://www.language-archives.org/OLAC/repositories.html
Previous version: http://www.language-archives.org/OLAC/repositories-20080531.html
Abstract:

This document defines the standards OLAC archives must follow in implementing a metadata repository.

Editors: Gary Simons, SIL International and Graduate Institute of Applied Linguistics (mailto:gary_simons@sil.org)
Steven Bird, University of Melbourne and University of Pennsylvania (mailto:sb@ldc.upenn.edu)
Changes since previous version:

This update to the standard describes the version 1.1 revision of the OLAC repository schemas. In addition to changing the version number from 1.0 to 1.1 throughout, it makes the following substantive changes to the OLAC archive description: a currentAsOf attribute is added; the <curator>, <curatorTitle>, and <curatorEmail> elements are replaced by a single, repeatable <participant> element; and an optional <archivalSubmissionPolicy> element is added. One of these changes is reflected in a new requirement 2 for both static and dynamic repositories: the person associated with the <adminEmail> must be identified in a <participant> element. Finally, the guidelines concerning relevance and granularity have been revised to define the standard for granularity in terms of shared provenance. (This version also incorporates corrections made in response to feedback during the Candidate testing phase.)

Copyright © 2008 Gary Simons (SIL International and Graduate Institute of Applied Linguistics) and Steven Bird (University of Melbourne and University of Pennsylvania). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. OAI identifier description
  3. OLAC archive description
  4. Requirements on static repositories
  5. Requirements on dynamic repositories
  6. Guidelines concerning relevance and granularity
References

1. Introduction

This OLAC standard on metadata repositories is based on the Open Archives Initiative protocol for metadata harvesting [OAI-PMH]. This document assumes familiarity with the OAI protocol. A metadata repository may take the form of a dynamic repository that implements a CGI interface to query a live database in response to protocol requests, or it may take the form of a static repository that has no interface of its own but is serviced through a static repository gateway [OAI-SR].

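An OLAC metadata repository (whether static or dynamic) must include two special description elements in its response to the Identify request: an OAI identifier description, which documents the format of its record identifiers, and an OLAC archive description, which describes the archive itself.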

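These elements are described in the next two sections. The final sections of the document describe the requirements on static repositories (Section 4), the requirements on dynamic repositories (Section 5), and the guidelines concerning relevance and granularity (Section 6).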

2. OAI identifier description

The resource identifiers supplied by an OLAC metadata repository must comply with the OAI specification for the format of OAI identifiers as defined in [OAI-Ids]. The metadata repository must document its compliance with this format by including an <oai-identifier> element within a <description> container in the Identify response.

The schema for validating an OAI identifier description is found at:

http://www.openarchives.org/OAI/2.0/oai-identifier.xsd

The target namespace is: http://www.openarchives.org/OAI/2.0/oai-identifier

The schema specifies fixed values of oai for the scheme element and : (colon) for the delimiter element. In addition to being valid with respect to the schema, OLAC places these further requirements on the content of the OAI identifier description:

For example,

<description>
   <oai-identifier
         xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
             http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
      <scheme>oai</scheme>
      <repositoryIdentifier>ethnologue.com</repositoryIdentifier>
      <delimiter>:</delimiter>
      <sampleIdentifier>oai:ethnologue.com:aaa</sampleIdentifier>
   </oai-identifier>
</description>
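The fixed values above can be checked mechanically before registration. The following Python sketch (the function name `check_oai_identifier` is hypothetical, not part of any OLAC tool) parses a <description> like the one above, confirms the fixed scheme and delimiter values, and checks that the sample identifier has the general shape oai:<repositoryIdentifier>:<local-id> defined in [OAI-Ids]. It covers only these structural points, not OLAC's further content requirements, and it is no substitute for validating against oai-identifier.xsd.

```python
import xml.etree.ElementTree as ET

# Target namespace of the <oai-identifier> description.
OAI_ID_NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"
NS = {"id": OAI_ID_NS}


def check_oai_identifier(xml_text: str) -> list[str]:
    """Return a list of problems found in an <oai-identifier> description."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <oai-identifier> or one wrapped in <description>.
    if root.tag == f"{{{OAI_ID_NS}}}oai-identifier":
        ident = root
    else:
        ident = root.find("id:oai-identifier", NS)
    if ident is None:
        return ["no <oai-identifier> element found"]

    def text(tag: str) -> str:
        # Stripped text of a child element, or '' if it is absent.
        return ident.findtext(f"id:{tag}", default="", namespaces=NS).strip()

    problems = []
    if text("scheme") != "oai":
        problems.append("scheme must be 'oai'")
    if text("delimiter") != ":":
        problems.append("delimiter must be ':'")
    repo = text("repositoryIdentifier")
    if not repo:
        problems.append("repositoryIdentifier is missing")
    # A sample identifier should look like oai:<repositoryIdentifier>:<local-id>.
    prefix = f"oai:{repo}:"
    sample = text("sampleIdentifier")
    if not sample.startswith(prefix) or len(sample) == len(prefix):
        problems.append(f"sampleIdentifier must have the form {prefix}<local-id>")
    return problems
```

Applied to the example above, the function returns an empty list; replacing the sample identifier with one for a different repository yields a single problem.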

3. OLAC archive description

The basic Identify request supplies minimal information about an archive, namely, its name, base URL, and administrator email. An OLAC metadata repository must augment the Identify response by including an <olac-archive> element within a <description> container. This element gives additional information that makes it possible for an OLAC service provider to supply its users with a basic description of a participating archive.

The schema for validating an OLAC archive description is found at:

http://www.language-archives.org/OLAC/1.1/olac-archive.xsd

The target namespace is: http://www.language-archives.org/OLAC/1.1/olac-archive

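The <olac-archive> element has two obligatory attributes, type and currentAsOf. The type attribute must have one of two values: institutional (for an archive sponsored by an institution) or personal (for the archive of an individual).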

The currentAsOf attribute records the date on which this <olac-archive> description was last updated or, if no changes were needed, the date on which it was last verified as holding current information. Its value must be a date in the W3C date format [W3CDTF], that is, a ten-character string of the form YYYY-MM-DD (e.g., 2008-04-19).
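Because currentAsOf must be exactly ten characters, a general-purpose date parser is too permissive on its own. The following Python sketch (the function name `is_valid_current_as_of` is hypothetical) checks both the YYYY-MM-DD shape and that the date actually exists on the calendar.

```python
import re
from datetime import date

# W3CDTF "complete date": exactly four digits, hyphen, two digits, hyphen, two digits.
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_current_as_of(value: str) -> bool:
    """True if value is a ten-character YYYY-MM-DD string naming a real day."""
    if not _DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2008-02-30
    except ValueError:
        return False
    return True
```

When a script regenerates the description, `date.today().isoformat()` produces a value in this form.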

These are the elements that occur within an OLAC archive description, listed in the order in which they must appear:

archiveURL

Optional. The home page of the archive on the Web. It may be omitted only if the archive does not have a web page. This is the home page for human visitors, not the base URL for harvesting.

participant

Obligatory and repeatable. Use one instance of this element for each person who plays a significant role with respect to the repository. This must include the system administrator whose email address is given in the <oai:adminEmail> element of the Identify response. It should also include the curator of the archive, and may include any others who play some role. Identifying a participant in the archive description has two functions: it provides contact information for the OLAC community, and it subscribes that person to the automatically generated report on usage and quality metrics for the archive that is emailed quarterly. Thus anyone at the institution who wishes to receive this report should be listed as a participant. Each <participant> element takes the following three attributes:

name  

The name of the person who is associated in some way with the repository. Use the normal name form (i.e., uninverted).

role  

The job title of the participant, or a label for the role the person plays with respect to the repository.

email  

The email address for the participant.

institution

Obligatory. The name of the sponsoring institution (for an institutional archive) or the institution of affiliation (for a personal archive). If the curator of a personal archive has no affiliation, then a value of Unaffiliated should be given.

institutionURL

Optional. A URL for the home page of the institution.

shortLocation

Obligatory. A brief statement (not to exceed 50 characters) of the location of the institution or the person providing the metadata, following the format "City, Country". Multiple locations may be connected with "and". This information is shown in the location column of the table of participating archives at http://www.language-archives.org/archives.php.

location

Optional. A single paragraph (of arbitrary length) describing where an archive that houses a collection of physical holdings is located (for instance, include building name, room number, street address). Other information relevant to visiting the collection, such as opening hours or restrictions on access, may also be described. If the archive is purely an on-line repository, do not use this element.

synopsis

Obligatory. A single paragraph (of arbitrary length) summarizing the purpose, scope, coverage, and so on of the archive.

access

Obligatory. A single paragraph (of arbitrary length) summarizing terms of access to the materials described in the metadata repository. The statement can describe restrictions on access, licensing requirements, costs, and so on. Individual metadata records should use the Rights element to document such things for particular archive holdings. The purpose of <access> is to broadly characterize the entire archive.

archivalSubmissionPolicy

Optional. A single paragraph (of arbitrary length) describing the institution's policy toward accepting archival submissions. The presence of this element indicates that the repository is an archive that accepts submissions of materials for long-term preservation. The element content should describe the collection policy of the archive (e.g., what kinds of materials are accepted from whom under what terms) so that a person looking for a place to archive a set of language resources may determine whether it would be appropriate to contact the curator about making a submission. A repository that does not accept materials for long-term preservation must not use this element. All institutions that provide an archival submission policy are listed with their policy statement in a page aimed at assisting those looking for a place to archive language resources: http://www.language-archives.org/submission-policies.php.

For example,

<description>
   <olac-archive type="institutional" currentAsOf="2008-04-19"
         xmlns="http://www.language-archives.org/OLAC/1.1/olac-archive"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.language-archives.org/OLAC/1.1/olac-archive
         http://www.language-archives.org/OLAC/1.1/olac-archive.xsd">
      <archiveURL>http://www.ethnologue.com/bibliography.asp</archiveURL>
      <participant name="Vurnell Cobbey" role="Archives director (acting)"
         email="archive_dallas@sil.org"/>
      <participant name="Joan Spanne" role="Database administrator"
         email="joan_spanne@sil.org"/>
      <institution>SIL International</institution>
      <institutionURL>http://www.sil.org</institutionURL>
      <shortLocation>Dallas, USA</shortLocation>
      <location>7500 W. Camp Wisdom Rd., Dallas, TX 75236, U.S.A.</location>
      <synopsis>The SIL International Language and Culture Archives holds 
         works authored or edited by members of SIL International or produced by
         a publishing unit of SIL. It houses over 13,000 books, journal articles, book
         chapters, dissertations, and other academic papers about languages and 
         cultures. It also has about 8,000 items written in the languages studied, 
         such as literacy primers, books on basic education topics (health, math, 
         social studies), story books, and translated works. The vast majority of 
         works are published. The materials date from 1935 to the present.
      </synopsis>
      <access>Links are given to publications that are directly accessible 
         via the Internet. Recent SIL publications may be purchased from the 
         International Academic Bookstore (Academic_Books AT sil.org), either 
         in paper or in electronic form. Out-of-print SIL publications may be
         obtained by special order. All materials may be viewed by visiting the 
         Archives by appointment during normal business hours.
      </access>
      <archivalSubmissionPolicy>The SIL International Language and 
         Culture Archives accepts submissions from active and retired SIL staff 
         in the areas of language and culture documentation and description, 
         and language-based development. Under some circumstances, the 
         Archives will also accept materials from former staff and persons 
         more casually associated with SIL language work, if such materials 
         relate to research done with the assistance of SIL or its staff, and there
         is not a more appropriate institution able to accept and curate the 
         materials long-term. Please address any questions to the Archives 
         by sending email to archive_dallas AT sil.org.
      </archivalSubmissionPolicy>
   </olac-archive>
</description>
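The ordering and obligatory/optional rules for these elements lend themselves to a mechanical check. The following Python sketch uses only the standard library's xml.etree.ElementTree and reports elements out of order, missing obligatory elements, repeated non-repeatable elements, and an over-long <shortLocation>. It assumes that the two permitted type values are institutional and personal, and it treats every element other than <participant> as non-repeatable because only <participant> is described as repeatable; `check_olac_archive` is a hypothetical name, and the sketch does not replace validation against olac-archive.xsd.

```python
import xml.etree.ElementTree as ET

# Element order required for the archive description; True marks an obligatory element.
ELEMENT_ORDER = [
    ("archiveURL", False),
    ("participant", True),
    ("institution", True),
    ("institutionURL", False),
    ("shortLocation", True),
    ("location", False),
    ("synopsis", True),
    ("access", True),
    ("archivalSubmissionPolicy", False),
]
RANK = {name: i for i, (name, _) in enumerate(ELEMENT_ORDER)}
# Assumption: the two permitted values of the type attribute.
ARCHIVE_TYPES = {"institutional", "personal"}


def local_name(tag: str) -> str:
    """Drop the namespace part of a tag: '{...}synopsis' -> 'synopsis'."""
    return tag.rsplit("}", 1)[-1]


def check_olac_archive(elem: ET.Element) -> list[str]:
    """Return a list of problems found in a parsed <olac-archive> element."""
    problems = []
    if elem.get("type") not in ARCHIVE_TYPES:
        problems.append("type must be 'institutional' or 'personal'")
    if not elem.get("currentAsOf"):
        problems.append("currentAsOf attribute is missing")

    names = [local_name(child.tag) for child in elem]
    unknown = [n for n in names if n not in RANK]
    if unknown:
        problems.append("unknown element(s): " + ", ".join(unknown))
    else:
        positions = [RANK[n] for n in names]
        if positions != sorted(positions):
            problems.append("elements are not in the required order")

    for name, required in ELEMENT_ORDER:
        count = names.count(name)
        if required and count == 0:
            problems.append(f"missing obligatory <{name}>")
        # Only <participant> is described as repeatable.
        if name != "participant" and count > 1:
            problems.append(f"<{name}> appears {count} times")

    for child in elem:
        if local_name(child.tag) == "shortLocation" and len((child.text or "").strip()) > 50:
            problems.append("shortLocation exceeds 50 characters")
    return problems
```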

4. Requirements on static repositories

A static repository is an XML document that describes the resources made available by a particular institution or individual. It is a convenient way to create a metadata repository for a relatively small collection (say, up to a couple thousand records). Such a document may be created and maintained manually by means of an XML editor. Alternatively, it might be generated periodically by a script that extracts information from an existing database.

The OAI specification for a static repository is given in [OAI-SR]. The schema for validating a static repository is found at:

http://www.openarchives.org/OAI/2.0/static-repository.xsd

In addition to being valid with respect to this schema, an OLAC static repository must also:

  1. Include an <oai-identifier> description and an <olac-archive> description in its <Identify> element.

  2. Include a <participant> element within the <olac-archive> description with an email address that exactly matches the <adminEmail> within the <Identify> element.

  3. Contain the following element within its <ListMetadataFormats> element:

    <oai:metadataFormat>
       <oai:metadataPrefix>olac</oai:metadataPrefix>
       <oai:schema>http://www.language-archives.org/OLAC/1.1/olac.xsd</oai:schema>
       <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.1/</oai:metadataNamespace>
    </oai:metadataFormat>
  4. Contain a <ListRecords> element with the attribute metadataPrefix="olac" that contains at least one record and in which every embedded record has a metadata description that conforms to the OLAC metadata standard [OLAC-Metadata].
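Requirements 1 through 4 are largely structural and can be checked locally before a repository is submitted for validation. The following Python sketch is one way to do so. The namespace URIs for the static repository container and the OAI elements reflect our reading of [OAI-SR] and should be verified against static-repository.xsd; `check_static_repository` is a hypothetical name. The sketch does not check that each record's metadata conforms to [OLAC-Metadata], which requires full schema validation, so the OLAC registration service remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: our reading of [OAI-SR] and the OLAC 1.1 schemas (verify).
NS = {
    "sr": "http://www.openarchives.org/OAI/2.0/static-repository",
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "id": "http://www.openarchives.org/OAI/2.0/oai-identifier",
    "arch": "http://www.language-archives.org/OLAC/1.1/olac-archive",
}
# Requirement 3: the exact olac metadataFormat declaration.
OLAC_FORMAT = {
    "metadataPrefix": "olac",
    "schema": "http://www.language-archives.org/OLAC/1.1/olac.xsd",
    "metadataNamespace": "http://www.language-archives.org/OLAC/1.1/",
}


def _text(el: ET.Element, path: str) -> str:
    """Stripped text of the first match for path, or '' if there is none."""
    return el.findtext(path, default="", namespaces=NS).strip()


def check_static_repository(xml_text: str) -> list[str]:
    """Check requirements 1-4 structurally (not full schema validity)."""
    root = ET.fromstring(xml_text)
    identify = root.find("sr:Identify", NS)
    if identify is None:
        return ["no <Identify> element"]
    problems = []

    # Requirement 1: both descriptions appear inside <Identify>.
    if identify.find("oai:description/id:oai-identifier", NS) is None:
        problems.append("missing <oai-identifier> description")
    archive = identify.find("oai:description/arch:olac-archive", NS)
    if archive is None:
        problems.append("missing <olac-archive> description")

    # Requirement 2: a participant email exactly matches the (first) adminEmail.
    admin = _text(identify, "oai:adminEmail")
    emails = set()
    if archive is not None:
        emails = {p.get("email") for p in archive.findall("arch:participant", NS)}
    if not admin or admin not in emails:
        problems.append(f"no <participant> whose email matches adminEmail {admin!r}")

    # Requirement 3: the olac metadata format is declared exactly as specified.
    formats = root.findall("sr:ListMetadataFormats/oai:metadataFormat", NS)
    if not any(all(_text(f, "oai:" + k) == v for k, v in OLAC_FORMAT.items())
               for f in formats):
        problems.append("olac <oai:metadataFormat> is missing or incorrect")

    # Requirement 4: a ListRecords with metadataPrefix="olac" holding a record.
    olac_lists = [lr for lr in root.findall("sr:ListRecords", NS)
                  if lr.get("metadataPrefix") == "olac"]
    if not any(lr.find("oai:record", NS) is not None for lr in olac_lists):
        problems.append('no <ListRecords metadataPrefix="olac"> containing a record')
    return problems
```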

A service for validating a repository for conformance to these requirements is found at:

http://www.language-archives.org/register/register.php

An example of a complete OLAC static repository that conforms to these requirements is found at:

http://www.language-archives.org/OLAC/1.1/static-repository.xml

5. Requirements on dynamic repositories

A dynamic repository is harder to implement, since it requires a CGI interface that implements the complete OAI protocol for metadata harvesting [OAI-PMH]. It is necessary, however, when the collection is large enough that the repository must use flow control to keep protocol responses to a reasonable size. The OAI community considers half a megabyte to be a reasonable response size. If a ListRecords response covering all records in a repository would substantially exceed that size, a dynamic repository with flow control may be needed.
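Flow control in OAI-PMH works through resumption tokens: the repository returns a large ListRecords result in batches, ending each incomplete batch with a <resumptionToken> that the harvester sends back to obtain the next one, and the final batch carries an empty token. The following Python sketch shows the harvester side of that exchange, which is the contract a dynamic repository must honor. The `fetch` parameter is a hypothetical stand-in for an HTTP client, the function name `list_records` is likewise hypothetical, and error responses (such as noRecordsMatch) are not handled.

```python
import xml.etree.ElementTree as ET
from collections.abc import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records(base_url: str, fetch: Callable[[str], str],
                 prefix: str = "olac") -> Iterator[ET.Element]:
    """Yield every <oai:record> from ListRecords, following resumption tokens.

    `fetch` takes a request URL and returns the response body as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = (root.findtext("oai:ListRecords/oai:resumptionToken",
                               namespaces=OAI_NS) or "").strip()
        if not token:  # absent or empty token: this was the last batch
            return
        # A resumed request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Injecting `fetch` keeps the sketch independent of any particular HTTP library; in practice it could wrap `urllib.request.urlopen`.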

The implementation of a dynamic OLAC metadata repository has all the features of a minimal OAI repository implementation (as defined in [OAI-GRI]), except that a dynamic OLAC repository need not support the oai_dc metadata format. This is because the OLAC Aggregator [OLACA] provides that service for repositories that comply with this standard; see [OLAC-Display] for the specification of the olac to oai_dc crosswalk that is implemented by the Aggregator. In fact, unless the institution has reasons of its own to function independently as an OAI data provider, OLAC recommends that a dynamic repository not implement the oai_dc metadata format so that the translation of OLAC metadata to the oai_dc format will be done consistently across the community.

In addition to the requirements of a minimal OAI repository implementation, a dynamic OLAC metadata repository must comply with the following additional requirements.

  1. The Identify response must include an <oai-identifier> description and an <olac-archive> description.

  2. The <olac-archive> description must include a <participant> element with an email address that exactly matches the <adminEmail> in the Identify response.

  3. The ListMetadataFormats response (when made with no additional parameters) must contain a specification for the olac metadata prefix that declares the schema and namespace for the version of OLAC metadata that is being used. For example,

    <oai:metadataFormat>
       <oai:metadataPrefix>olac</oai:metadataPrefix>
       <oai:schema>http://www.language-archives.org/OLAC/1.1/olac.xsd</oai:schema>
       <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.1/</oai:metadataNamespace>
    </oai:metadataFormat>
  4. When the metadataPrefix argument to ListIdentifiers is specified as olac, the response must identify at least one record.

  5. When the metadataPrefix argument to GetRecord is specified as olac, the <oai:metadata> element of the response must either be empty (when no OLAC metadata is available for the given identifier) or it must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. That element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the OLAC metadata schema that is being used.

  6. When the metadataPrefix argument to ListRecords is specified as olac, every <oai:metadata> element in the response must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. Each such element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the metadata schema that is being used.
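Requirements 5 and 6 differ only in whether an empty <oai:metadata> element is allowed. The following Python sketch checks both cases on a response body; the function names are hypothetical, and it assumes that every OLAC metadata version uses a namespace beginning with http://www.language-archives.org/OLAC/. Because ElementTree resolves namespace declarations rather than exposing xmlns attributes, it checks the resolved namespace of the olac element rather than the literal attribute, and it does not validate the element against the OLAC metadata schema.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
# Assumption: all OLAC metadata versions share this namespace prefix,
# e.g. http://www.language-archives.org/OLAC/1.1/
OLAC_NS_PREFIX = "{http://www.language-archives.org/OLAC/"


def is_olac_element(el: ET.Element) -> bool:
    """True for an <olac:olac> element in some OLAC metadata namespace."""
    return el.tag.startswith(OLAC_NS_PREFIX) and el.tag.endswith("}olac")


def check_metadata_elements(response_xml: str, allow_empty: bool) -> list[str]:
    """Check each <oai:metadata> of a GetRecord (allow_empty=True) or
    ListRecords (allow_empty=False) response made with metadataPrefix=olac."""
    problems = []
    root = ET.fromstring(response_xml)
    for i, md in enumerate(root.iter(OAI + "metadata"), start=1):
        children = list(md)
        if not children and allow_empty:
            continue  # GetRecord: no OLAC metadata for this identifier
        if len(children) != 1 or not is_olac_element(children[0]):
            problems.append(f"metadata #{i} lacks a single olac:olac element")
    return problems
```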

6. Guidelines concerning relevance and granularity

When a request is made to register a metadata repository with OLAC, it is first tested for conformance to the requirements listed in the sections above. When these are met, the registration request is reviewed by the OLAC Council (see [OLAC-Process]) before final acceptance. The role of the Council in the registration process is to ensure that all registered archives meet the following guidelines concerning relevance and granularity.

Regarding relevance, in order to be eligible for registration as an OLAC archive:

Regarding the granularity of repositories, a repository is meant to catalog all the holdings of an archive, rather than having separate repositories for each of the collections within an archive. Thus,

Regarding the granularity of the records in a repository, the basic guideline is this:

For published resources, the publication unit typically constitutes the appropriate unit for the OLAC metadata record. Unpublished papers presenting findings of research closely parallel typical published works and can be treated at a comparable level in an OLAC metadata record. For primary source materials (e.g., recordings, transcriptions, annotations, notes, data sets), the typical practice of archivists is to gather such materials into collections based on shared provenance, that is, on having a common origin and history. These collections are then the primary units for description in OLAC metadata records.

See Section 5 of the OLAC Metadata Usage Guidelines [OLAC-Usage] for a more in-depth discussion of the principle of provenance as applied to collections and metadata within the OLAC context.


References

[OAI-GRI] Guidelines for Repository Implementers, Document Version 2002/06/09.
<http://www.openarchives.org/OAI/2.0/guidelines-repository.htm>
[OAI-Ids] Specification and XML Schema for the OAI Identifier Format, Document Version 2002/06/21.
<http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm>
[OAI-PMH] The Open Archives Initiative Protocol for Metadata Harvesting, Version 2.0 (2002-06-14).
<http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm>
[OAI-SR] Specification for an OAI Static Repository and an OAI Static Repository Gateway.
<http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm>
[OLAC-Display] Specifications for an OLAC metadata display format and an OLAC-to-OAI_DC crosswalk.
<http://www.language-archives.org/NOTE/olac_display.html>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLAC-Process] OLAC Process.
<http://www.language-archives.org/OLAC/process.html>
[OLAC-Usage] OLAC Metadata Usage Guidelines.
<http://www.language-archives.org/NOTE/usage.html>
[OLACA] OLAC Aggregator.
<http://www.language-archives.org/cgi-bin/olaca.pl>
[W3CDTF] Date and Time Formats, W3C Note.
<http://www.w3.org/TR/NOTE-datetime>