OLAC Record
oai:catalogue.elra.info:ELRA-W0120

Metadata
Title: NUM 5M Mongolian written corpus
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2017-07-12
Date Issued (W3CDTF): 2017-07-12
Date Modified (W3CDTF): 2017-08-17
Description: This is a corpus of Mongolian text drawn mostly from online and printed daily newspapers, literature, and laws. Cleaning reduced the collected raw text from 5 million to 4.8 million words. The cleaned corpus comprises:
- 144 texts from laws up to 2009,
- 288 literary texts currently used in Mongolian primary and secondary school textbooks (including stories, novels, and novelettes),
- 1,134 editorials from the printed newspaper "Unen", dating from 1984 to 1989,
- 2,477 online newswire texts dating from 2003 to 2009.
Part of this corpus, about 2,800 sentences (100,000 words), has been manually POS-tagged and stored in XML TEI format.
Identifier: ELRA-W0120
ISLRN: 492-817-146-504-9
Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-W0120/
Language: Mongolian
Language (ISO639): mon
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0120
DateStamp:  2017-07-12
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2017. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_mon olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0120
Up-to-date as of: Fri Apr 19 6:29:31 EDT 2024