Archive Report Cards: User Guide


Table of contents

  1. Introduction
  2. Star Rating
  3. Archive Diversity
  4. Metadata Quality
  5. Core Elements Per Record
  6. Core Element Usage
  7. Code Usage
  8. Element and Code Usage
  9. References

1. Introduction

This document explains the statistical information contained in the Archive Report Cards, generated by archiveReportCard.php.

2. Star Rating

The archive star rating represents the average item score for the archive. It is calculated as:

    round( (Average item score out of 10)/2 )

to give a star rating out of five.
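
As an illustration only, the calculation above can be sketched in Python. The function name is hypothetical and this is not taken from archiveReportCard.php. It assumes that halves round up, which is how PHP's round() treats positive values; Python's built-in round() rounds halves to the nearest even number, so it is avoided here.

```python
import math


def star_rating(average_item_score: float) -> int:
    """Convert an average item score (0 to 10) into a star rating (0 to 5)."""
    half = average_item_score / 2
    # Assumption: halves round up (2.5 becomes 3), matching PHP's round().
    return math.floor(half + 0.5)


print(star_rating(7.4))  # 3.7 rounds to 4
print(star_rating(5.0))  # 2.5 rounds to 3 under the half-up assumption
```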

3. Archive Diversity

For the subject and type fields, these percentages show:

    Diversity = (Distinct code values / Number of instances of element) * 100

This gives an indication of the diversity of the information held by the archive.
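
A minimal Python sketch of the diversity formula follows. The function name and the sample values are hypothetical, and the sketch assumes each instance of the element contributes exactly one code value.

```python
def diversity(code_values: list) -> float:
    """Percentage of distinct code values among all instances of an element."""
    if not code_values:
        return 0.0  # assumption: an element that never occurs has zero diversity
    return len(set(code_values)) / len(code_values) * 100


# Hypothetical type values taken from eight records
types = ["text", "text", "sound", "text", "image", "sound", "text", "text"]
print(diversity(types))  # 3 distinct values / 8 instances = 37.5
```

An archive whose records all share one value scores close to 0, while an archive in which every instance has a different value scores 100.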

4. Metadata Quality

This graph shows the frequency of record scores within the archive.

The quality of metadata is assessed against the best practice guidelines published at http://www.language-archives.org/REC/olac-extensions.html, as well as the existence of certain XML elements according to their usage statistics. Each item receives a score between 0 and 10, which is used to order search results.

The scoring of metadata is contained in the source file metadataScoring.php.

For each element which has an associated extension code from a controlled vocabulary, one point is scored if a code attribute is used. This is converted into a proportion of elements which use codes against the total elements in a record which have an associated controlled vocabulary.

      Code exists score =
            ( Number of elements containing code attributes ) / ( Number of elements in the record of a type with an associated controlled vocabulary )

This returns a fraction of code usage between 0 and 1.
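
The following Python sketch illustrates the code exists score. It is not the code in metadataScoring.php: the record layout (a list of dictionaries with "has_vocabulary" and "code" keys), the element names, and the code values are all assumptions made for the example.

```python
def code_exists_score(elements: list) -> float:
    """Fraction of vocabulary-backed elements in a record that carry a code."""
    codable = [e for e in elements if e["has_vocabulary"]]
    if not codable:
        return 0.0  # assumption: a record with no such elements scores zero
    coded = sum(1 for e in codable if e.get("code"))
    return coded / len(codable)


# Hypothetical record: three elements have a controlled vocabulary, two use a code
record = [
    {"name": "subject", "has_vocabulary": True, "code": "example-code-1"},
    {"name": "type", "has_vocabulary": True, "code": None},
    {"name": "contributor", "has_vocabulary": True, "code": "example-code-2"},
    {"name": "title", "has_vocabulary": False, "code": None},
]
score = code_exists_score(record)  # 2 coded elements out of 3
```

The title element is ignored because, in this example, it has no associated controlled vocabulary.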

Points are deducted when a record does not contain any instances of elements which are deemed important to any metadata record. The following elements have been deemed necessary in every record based upon element usage:

For each of these elements which is absent, a score of (1/5) is deducted from the record score. This implies equal weighting of the deduction of points for absence of any of the core elements.

      Element absent deductions =
            ( Number of core elements absent ) / ( Number of core elements )

This results in a score between 0 and 1.

These scores are then weighted:

      Score = 10 * ( (1/1) * (code exists score) - (1/5) * (element absent deductions) )

to return an integer score out of 10 for each record. These scores are held in a table relating each item to a score out of 10. At the time of searching, this score is combined with the element usage score to order search results.
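
The weighting can be sketched in Python as below. The function names are hypothetical. The default of five core elements follows from the (1/5) deduction described above. The final rounding and the clamping to the range 0 to 10 are assumptions, since the formula itself does not state how the integer is obtained.

```python
import math


def record_score(coded: int, codable: int, absent_core: int, total_core: int = 5) -> float:
    """Raw weighted record score, following the formula above."""
    code_exists = coded / codable if codable else 0.0  # fraction between 0 and 1
    absent = absent_core / total_core                  # fraction between 0 and 1
    return 10 * ((1 / 1) * code_exists - (1 / 5) * absent)


def integer_score(raw: float) -> int:
    """Assumption: round halves up, then clamp to the range 0 to 10."""
    return max(0, min(10, math.floor(raw + 0.5)))


# Hypothetical record: 3 of 4 codable elements use a code, 1 core element is absent
raw = record_score(coded=3, codable=4, absent_core=1)
print(round(raw, 2))       # 10 * (0.75 - 0.2 * 0.2) = 7.1
print(integer_score(raw))  # 7
```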

See archiveReportCard.php for a summary of record quality scores across OLAC archives.

5. Core Elements Per Record

The percentage of records which have n of the core elements present at least once.
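
One way to derive this distribution is sketched below in Python. The function name is hypothetical, the core element names "A" to "E" are placeholders rather than the real core elements, and "n of the core elements" is read here as exactly n distinct core elements.

```python
from collections import Counter


def core_elements_per_record(records: list, core_elements: set) -> dict:
    """Map n to the percentage of records with exactly n core elements present."""
    total = len(records)
    if total == 0:
        return {}  # assumption: nothing to report for an empty archive
    # Repeated elements count once, since only presence matters.
    counts = Counter(len(core_elements & set(r)) for r in records)
    return {n: counts.get(n, 0) / total * 100 for n in range(len(core_elements) + 1)}


CORE = {"A", "B", "C", "D", "E"}  # placeholder names
records = [
    ["A", "B", "C", "D", "E"],
    ["A", "B", "A"],
    ["A", "B", "C", "D", "E", "x"],
    ["C"],
]
print(core_elements_per_record(records, CORE))
# {0: 0.0, 1: 25.0, 2: 25.0, 3: 0.0, 4: 0.0, 5: 50.0}
```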

6. Core Element Usage

Percentage of records which contain the named elements at least once. Red highlights elements which are not used in all records from this archive.

7. Code Usage

Displays the number of times an element (which has an associated code attribute) was used by the archive, and the percentage of those elements which used a code attribute. Red highlights elements which did not contain code attributes in all instances of that element.
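
As a rough Python sketch of how such a table could be computed: the function name, the input layout (a list of element name and code pairs), and the sample values are assumptions, not the report card's actual code.

```python
def code_usage(instances: list) -> dict:
    """Map each element name to (times used, percentage of uses with a code)."""
    tallies = {}
    for name, code in instances:
        used, coded = tallies.get(name, (0, 0))
        tallies[name] = (used + 1, coded + (1 if code else 0))
    return {name: (used, coded / used * 100) for name, (used, coded) in tallies.items()}


# Hypothetical element instances; None means no code attribute was present
instances = [
    ("subject", "c1"), ("subject", "c2"), ("subject", None), ("subject", "c1"),
    ("type", "t1"), ("type", "t2"),
]
usage = code_usage(instances)
print(usage)  # {'subject': (4, 75.0), 'type': (2, 100.0)}

# Elements below 100% are the ones the report would highlight in red
flagged = [name for name, (_, pct) in usage.items() if pct < 100]
print(flagged)  # ['subject']
```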

8. Element and Code Usage

Number of times an element is used and, where applicable, the number of times that a code attribute is used with that element. Red highlights elements which do not use code attributes in all instances of that element.

9. References

  Recommended metadata extensions. http://www.language-archives.org/REC/olac-extensions.html
  Baden Hughes, 2004. Metadata Quality Evaluation: Experience from the Open Language Archives Community. Proceedings of the 7th International Conference on Asian Digital Libraries (ICADL 2004). Lecture Notes in Computer Science 3334, pp 320-329. Springer-Verlag.
  Baden Hughes and Amol Kamat, 2005. A Metadata Search Engine for Digital Language Archives. D-Lib Magazine 11(2), February 2005. [Online]