OLAC Record

Title:Helsinki Corpus of Swahili
Access Rights: Rights available for: commercialUse
Date Available (W3CDTF):2017-07-12
Date Issued (W3CDTF):2017-07-12
Date Modified (W3CDTF):2017-07-12
Description:This is a text corpus of the Swahili language comprising 25 million words, annotated for part-of-speech, morphology and syntax. The corpus contains prose text from the fiction, news media and government document domains, dating from 1953 to 2016. This package contains: (1) the Helsinki Corpus of Swahili 2.0 Non Annotated Version, which contains the raw material, formatted and corrected; (2) the Helsinki Corpus of Swahili 2.0 Annotated Version, annotated with the Salama Tagger and with metadata added to each file. The source texts were collected from the Web (news media texts from 1988-2016 and open government webpages from 2004-2006) and from books (dating from 1953 to 1991, scanned and proofread). Part of the oldest news material, which predates scanners, was typed manually. The old material, collected before 2003, comprises the sections Books and News. The new material comprises a section Bunge (Hansards of the Tanzanian Parliament from 2004, 2005 and 2006) and a section News (from 2004-2015). A word in the annotated corpus normally carries the following types of information: token, stem, part-of-speech, morphological description, syntactic tag, rest of verb description. The corpus was prepared at the University of Helsinki, Department of Asian and African Studies, under the auspices of Prof. Arvi Hurskainen. It is available from ELRA for commercial use only. For academic use, it is accessible via Kielipankki - the Language Bank of Finland in Korp (https://korp.csc.fi/). A version of the corpus with English glosses, in which each word is provided with one or more lexical equivalents, can be distributed upon request (terms to be discussed on a case-by-case basis).
ISLRN: 941-187-059-145-7
Identifier (URI):https://catalog.elra.info/en-us/repository/browse/ELRA-W0119/
Language:Swahili (macrolanguage)
Language (ISO639):swa
Medium:Not specified
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0119
DateStamp:  2017-07-12
GetRecord:  OAI-PMH request for simple DC format
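
The GetRecord entries in this record refer to OAI-PMH requests. As a minimal sketch of how such a request URL could be assembled from the OaiIdentifier above: the verb, identifier and metadataPrefix parameters are part of the OAI-PMH protocol, but the base URL below is a hypothetical placeholder (the record does not state the actual endpoint), and the function name is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def build_get_record_url(base_url: str, identifier: str,
                         metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL.

    metadata_prefix "oai_dc" selects simple Dublin Core; "olac" is the
    prefix conventionally used for the OLAC format.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# The identifier is taken from the OaiIdentifier field of this record.
dc_url = build_get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0119")
olac_url = build_get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0119",
                                metadata_prefix="olac")
print(dc_url)
print(olac_url)
```

The colons in the identifier are percent-encoded by urlencode, which is the safe choice for a query string; fetching the resulting URL from a real endpoint would return the record as XML.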

Search Info

Citation: n.a. 2017. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_swa olac_primary_text

Up-to-date as of: Fri Apr 19 6:27:31 EDT 2024