OLAC Record
oai:catalogue.elra.info:ELRA-E0045

Metadata
Title:MAURDOR Evaluation Package
Access Rights: Rights available for: nonCommercialUse, evaluationUse, commercialUse
Date Available (W3CDTF):2015-02-26
Date Issued (W3CDTF):2015-02-26
Date Modified (W3CDTF):2015-02-26
Description:The MAURDOR project consists of evaluating systems for the automatic processing of written documents. The collected documents are scanned documents (printed, typewritten or handwritten). To obtain images for evaluating automatic analysis systems, 10,000 original documents were collected and annotated (5,000 in French, 2,500 in English and 2,500 in Arabic). This package contains 8,129 of the 10,000 documents originally collected. Each of the 8,129 documents belongs to one of the following 5 categories:
C1: Printed form (filled in by hand)
C2: Commercial, private or professional document, printed or photocopied
C3: Handwritten private correspondence
C4: Typewritten private or professional correspondence
C5: Others
Once collected, the documents were annotated manually. This human analysis serves as a reference, known as ground truth, for training and evaluating automatic processing systems. The annotations capture the following information:
1. How is the document structured (text zones, images...)?
2. Which writings are present, with their type (handwritten/typewritten) and their language (French, English, Arabic, other)?
3. What is the main information in the document (author, recipient, subject, date...)?
The MAURDOR evaluation campaign provides a common framework for reporting the current performance of systems for the automatic processing of digital documents. This package contains the material provided to the campaign participants:
- Consistent development and test data corresponding to the application concerned;
- Tools for the automatic measurement of system performance;
- A common assessment protocol applicable to each processing stage, along with a complete automatic processing chain for written documents.
The documents are provided in TIFF format and the annotations in XML format.
The aim of this evaluation package is to enable external parties to evaluate their own systems and compare their results with those obtained during the campaign itself.
Identifier:ELRA-E0045
ISLRN: 364-018-517-901-2
Identifier (URI):http://catalog.elra.info/en-us/repository/browse/ELRA-E0045/
Language:English
Arabic
French
Language (ISO639):eng
ara
fra
Medium:downloadable
Publisher:ELRA (European Language Resources Association)
Type (DCMI):MovingImage
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-E0045
DateStamp:  2015-02-26
GetRecord:  OAI-PMH request for simple DC format
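The two GetRecord links above are OAI-PMH requests that differ only in the metadata format they ask for: "olac" for the OLAC format and "oai_dc" for simple Dublin Core. The following is a minimal sketch of how such a request URL is assembled from the OaiIdentifier. The base URL is a hypothetical placeholder, since this record does not state the provider's OAI-PMH endpoint; substitute the real endpoint before use.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the actual OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"

# OaiIdentifier taken from the record above.
IDENTIFIER = "oai:catalogue.elra.info:ELRA-E0045"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,          # ':' is percent-encoded as %3A
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" requests the OLAC format; "oai_dc" requests simple Dublin Core.
olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

Only the metadataPrefix changes between the two requests; the identifier and verb stay the same, which is why a single helper covers both links.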

Search Info

Citation: n.a. 2015. ELRA (European Language Resources Association).
Terms: area_Europe country_FR country_GB dcmi_MovingImage iso639_ara iso639_eng iso639_fra olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-E0045
Up-to-date as of: Fri Jan 17 20:52:49 EST 2020