Sample Metadata Record

oai:www.clarin.si:11356/1044


XML format

<olac:olac>
<dc:title>MULTEXT-East "1984" document corpus 4.0</dc:title>
<dc:creator>Erjavec, Tomaž</dc:creator>
<dc:creator>Bruda, Ştefan</dc:creator>
<dc:creator>Dimitrova, Ludmila</dc:creator>
<dc:creator>Ide, Nancy</dc:creator>
<dc:creator>Kaalep, Heiki-Jaan</dc:creator>
<dc:creator>Krstev, Cvetana</dc:creator>
<dc:creator>Orav, Heili</dc:creator>
<dc:creator>Oravecz, Csaba</dc:creator>
<dc:creator>Paldre, Leho</dc:creator>
<dc:creator>Petkevič, Vladimír</dc:creator>
<dc:creator>Priest-Dorman, Greg</dc:creator>
<dc:creator>Simov, Kiril</dc:creator>
<dc:creator>Sinapova, Lydia</dc:creator>
<dc:creator>Sokolovsky, Paul</dc:creator>
<dc:creator>Sryvkin, Sergey</dc:creator>
<dc:creator>Tufiş, Dan</dc:creator>
<dc:creator>Utka, Andrius</dc:creator>
<dc:creator>Villandi, Viire</dc:creator>
<dc:creator>Vitas, Duško</dc:creator>
<dc:creator>Vuković, Olga</dc:creator>
<dc:date xsi:type="dcterms:W3CDTF">2015-06-15T08:56:08Z</dc:date>
<dcterms:available>2015-06-15T08:56:08Z</dcterms:available>
<dc:description>The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the text contain names and sentences. The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages.</dc:description>
<dc:identifier xsi:type="dcterms:URI">http://hdl.handle.net/11356/1044</dc:identifier>
<dcterms:bibliographicCitation>http://hdl.handle.net/11356/1044</dcterms:bibliographicCitation>
<dc:language xsi:type="olac:language" olac:code="bul"/>
<dc:language xsi:type="olac:language" olac:code="ces"/>
<dc:language xsi:type="olac:language" olac:code="eng"/>
<dc:language xsi:type="olac:language" olac:code="est"/>
<dc:language xsi:type="olac:language" olac:code="hun"/>
<dc:language xsi:type="olac:language" olac:code="lit"/>
<dc:language xsi:type="olac:language" olac:code="ron"/>
<dc:language xsi:type="olac:language" olac:code="rus"/>
<dc:language xsi:type="olac:language" olac:code="slv"/>
<dc:language xsi:type="olac:language" olac:code="srp"/>
<dc:publisher>Jožef Stefan Institute</dc:publisher>
<dc:rights>Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
<dc:subject>parallel corpus</dc:subject>
<dc:subject>multilingual</dc:subject>
<dc:subject>TEI</dc:subject>
<dc:type>corpus</dc:type>
<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
</olac:olac>

Display format

 Title  MULTEXT-East "1984" document corpus 4.0
 Creator  Erjavec, Tomaž
 Creator  Bruda, Ştefan
 Creator  Dimitrova, Ludmila
 Creator  Ide, Nancy
 Creator  Kaalep, Heiki-Jaan
 Creator  Krstev, Cvetana
 Creator  Orav, Heili
 Creator  Oravecz, Csaba
 Creator  Paldre, Leho
 Creator  Petkevič, Vladimír
 Creator  Priest-Dorman, Greg
 Creator  Simov, Kiril
 Creator  Sinapova, Lydia
 Creator  Sokolovsky, Paul
 Creator  Sryvkin, Sergey
 Creator  Tufiş, Dan
 Creator  Utka, Andrius
 Creator  Villandi, Viire
 Creator  Vitas, Duško
 Creator  Vuković, Olga
 Date  (W3CDTF)  2015-06-15T08:56:08Z
 Available  2015-06-15T08:56:08Z
 Description  The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the text contain names and sentences. The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages.
 Identifier (URI)  http://hdl.handle.net/11356/1044
 Bibliographic Citation  http://hdl.handle.net/11356/1044
 Language (ISO639-3)  Bulgarian [bul]
 Language (ISO639-3)  Czech [ces]
 Language (ISO639-3)  English [eng]
 Language (ISO639-3)  Estonian [est]
 Language (ISO639-3)  Hungarian [hun]
 Language (ISO639-3)  Lithuanian [lit]
 Language (ISO639-3)  Romanian [ron]
 Language (ISO639-3)  Russian [rus]
 Language (ISO639-3)  Slovenian [slv]
 Language (ISO639-3)  Serbian [srp]
 Publisher  Jožef Stefan Institute
 Rights  Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
 Rights  https://creativecommons.org/licenses/by-nc-sa/4.0/
 Subject  parallel corpus
 Subject  multilingual
 Subject  TEI
 Type  corpus
 Type (DCMI)  Text
 Type (OLAC)  Linguistic type: Primary text

Metadata quality analysis

OLAC metadata records are scored for metadata quality on a 10-point scale explained in OLAC Metadata Metrics. The score for the above record (along with comments on changes that could improve the score) is as follows:

Component         +     -     Comments
Title             1     0
Date              1     0
Agent             1     0
About             1     0
Depth             1     0
Content Language  1     0
Subject Language  1     0
OLAC Type         1     0
DCMI Type         1     0
Precision         0.67  0.33  For the full score, make use of at least one more encoding scheme in addition to the ones counted explicitly in other components of the score. For instance:
  • use olac:role on dc:creator or dc:contributor
  • use dcterms:URI when the value of an element is a URL
  • use dcterms:IMT on dc:format
Quality score     9.67
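
As an illustration of the first suggestion in the Precision comment, a creator element could carry an OLAC role, roughly as sketched below. The role code shown (author) is an assumption made for this example only. The record does not say which role each creator played, so the archive would need to choose the appropriate code from the OLAC role vocabulary for each person.

```xml
<!-- Hypothetical revision of one creator element from the record above.
     The "author" role code is assumed for illustration; it is not stated in the record. -->
<dc:creator xsi:type="olac:role" olac:code="author">Erjavec, Tomaž</dc:creator>
```

Like the record itself, this fragment relies on the olac, dc, and xsi namespace prefixes being declared on an enclosing element. Going by the comment in the table, one additional encoding scheme of this kind is what the Precision component is missing.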