OLAC Record

Title: Persian 1984 corpus (Multext-East framework)
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2010-09-27
Date Issued (W3CDTF): 2010-09-27
Date Modified (W3CDTF): 2010-09-27
Description: This corpus contains the Persian (Farsi) translation of part of the novel “1984” (G. Orwell) annotated in the Multext-East framework (Multilingual Text Tools and Corpora for Eastern and Central European Languages). The aim of the Multext-East project was to develop standardized language resources. The package comprises: (i) the specifications for morphosyntactic encoding of the Persian language, based on the EAGLES/MULTEXT model and specific resources of MULTEXT-East, and (ii) the annotated Persian version of Orwell’s 1984 corpus. The corpus contains extensive headers and markup for document structure, sentences, and various sub-sentence annotations in XML format following the TEI guidelines. Annotation includes POS (part-of-speech) tags and lemmas. The corpus contains approximately 100,000 words (6,604 sentences, 13,247 lemmas) and can easily be aligned with other corpora in the MULTEXT-East framework.
ISLRN: 851-240-629-673-1
Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-W0054/
Language (ISO639): fas
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): primary_text


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0054
DateStamp:  2010-09-27
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2010. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_fas olac_primary_text

Up-to-date as of: Fri Apr 19 6:29:01 EDT 2024