Sample Metadata Record

oai:www.clarin.si:11356/1044

XML format

<olac:olac>
   <dc:title>MULTEXT-East "1984" document corpus 4.0</dc:title>
   <dc:creator>Erjavec, Tomaž</dc:creator>
   <dc:creator>Bruda, Ştefan</dc:creator>
   <dc:creator>Dimitrova, Ludmila</dc:creator>
   <dc:creator>Ide, Nancy</dc:creator>
   <dc:creator>Kaalep, Heiki-Jaan</dc:creator>
   <dc:creator>Krstev, Cvetana</dc:creator>
   <dc:creator>Orav, Heili</dc:creator>
   <dc:creator>Oravecz, Csaba</dc:creator>
   <dc:creator>Paldre, Leho</dc:creator>
   <dc:creator>Petkevič, Vladimír</dc:creator>
   <dc:creator>Priest-Dorman, Greg</dc:creator>
   <dc:creator>Simov, Kiril</dc:creator>
   <dc:creator>Sinapova, Lydia</dc:creator>
   <dc:creator>Sokolovsky, Paul</dc:creator>
   <dc:creator>Sryvkin, Sergey</dc:creator>
   <dc:creator>Tufiş, Dan</dc:creator>
   <dc:creator>Utka, Andrius</dc:creator>
   <dc:creator>Villandi, Viire</dc:creator>
   <dc:creator>Vitas, Duško</dc:creator>
   <dc:creator>Vuković, Olga</dc:creator>
   <dc:date xsi:type="dcterms:W3CDTF">2015-06-15T08:56:08Z</dc:date>
   <dcterms:available>2015-06-15T08:56:08Z</dcterms:available>
   <dc:description>The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. 

This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the texts contain names and sentences.

The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages.</dc:description>
   <dc:identifier xsi:type="dcterms:URI">http://hdl.handle.net/11356/1044</dc:identifier>
   <dcterms:bibliographicCitation>http://hdl.handle.net/11356/1044</dcterms:bibliographicCitation>
   <dc:language xsi:type="olac:language" olac:code="bul"/>
   <dc:language xsi:type="olac:language" olac:code="ces"/>
   <dc:language xsi:type="olac:language" olac:code="eng"/>
   <dc:language xsi:type="olac:language" olac:code="est"/>
   <dc:language xsi:type="olac:language" olac:code="hun"/>
   <dc:language xsi:type="olac:language" olac:code="lit"/>
   <dc:language xsi:type="olac:language" olac:code="ron"/>
   <dc:language xsi:type="olac:language" olac:code="rus"/>
   <dc:language xsi:type="olac:language" olac:code="slv"/>
   <dc:language xsi:type="olac:language" olac:code="srp"/>
   <dc:publisher>Jožef Stefan Institute</dc:publisher>
   <dc:rights>Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)</dc:rights>
   <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
   <dc:subject>parallel corpus</dc:subject>
   <dc:subject>multilingual</dc:subject>
   <dc:subject>TEI</dc:subject>
   <dc:type>corpus</dc:type>
   <dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
   <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
</olac:olac>

Display format

Title	MULTEXT-East "1984" document corpus 4.0
Creator	Erjavec, Tomaž
Creator	Bruda, Ştefan
Creator	Dimitrova, Ludmila
Creator	Ide, Nancy
Creator	Kaalep, Heiki-Jaan
Creator	Krstev, Cvetana
Creator	Orav, Heili
Creator	Oravecz, Csaba
Creator	Paldre, Leho
Creator	Petkevič, Vladimír
Creator	Priest-Dorman, Greg
Creator	Simov, Kiril
Creator	Sinapova, Lydia
Creator	Sokolovsky, Paul
Creator	Sryvkin, Sergey
Creator	Tufiş, Dan
Creator	Utka, Andrius
Creator	Villandi, Viire
Creator	Vitas, Duško
Creator	Vuković, Olga
Date (W3CDTF)	2015-06-15T08:56:08Z
Available	2015-06-15T08:56:08Z
Description	The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the texts contain names and sentences. The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages.
Identifier (URI)	http://hdl.handle.net/11356/1044
Bibliographic Citation	http://hdl.handle.net/11356/1044
Language (ISO639-3)	Bulgarian [bul]
Language (ISO639-3)	Czech [ces]
Language (ISO639-3)	English [eng]
Language (ISO639-3)	Estonian [est]
Language (ISO639-3)	Hungarian [hun]
Language (ISO639-3)	Lithuanian [lit]
Language (ISO639-3)	Romanian [ron]
Language (ISO639-3)	Russian [rus]
Language (ISO639-3)	Slovenian [slv]
Language (ISO639-3)	Serbian [srp]
Publisher	Jožef Stefan Institute
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Rights	https://creativecommons.org/licenses/by-nc-sa/4.0/
Subject	parallel corpus
Subject	multilingual
Subject	TEI
Type	corpus
Type (DCMI)	Text
Type (OLAC)	Linguistic type: Primary text

Metadata quality analysis

OLAC metadata records are scored for metadata quality on a 10-point scale explained in OLAC Metadata Metrics. The score for the above record (along with comments on changes that could improve the score) is as follows:

Component	+	-	Comments
Title	1	0
Date	1	0
Agent	1	0
About	1	0
Depth	1	0
Content Language	1	0
Subject Language	1	0
OLAC Type	1	0
DCMI Type	1	0
Precision	0.67	0.33	For the full score, make use of at least one more encoding scheme in addition to the ones counted explicitly in other components of the score. For instance: use olac:role on dc:creator or dc:contributor; use dcterms:URI when the value of an element is a URL; use dcterms:IMT on dc:format.
Quality score	9.67
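
The Precision comment above could be acted on roughly as in the fragment below. This is only a sketch: the role code `compiler` and the media type `application/xml` are assumed values chosen for illustration and are not taken from the record. The actual role of each creator and the actual file format would have to be confirmed by the depositor. Namespace declarations are omitted, as in the sample record.

```xml
<!-- Illustrative only: the role code and the media type below are assumptions, not values from the record. -->
<dc:creator xsi:type="olac:role" olac:code="compiler">Erjavec, Tomaž</dc:creator>
<dc:format xsi:type="dcterms:IMT">application/xml</dc:format>
```

Either line on its own adds an encoding scheme the record does not currently use. The record already applies dcterms:URI to dc:identifier.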