OLAC Record oai:www.ldc.upenn.edu:LDC2001T55 |
Metadata | ||
Title: | Arabic Newswire Part 1 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Graff, David, and Kevin Walker. Arabic Newswire Part 1 LDC2001T55. Web Download. Philadelphia: Linguistic Data Consortium, 2001 | |
Contributor: | Graff, David | |
Walker, Kevin | ||
Date (W3CDTF): | 2001 | |
Description: | *Introduction*
This publication contains the Arabic Newswire A Corpus, Linguistic Data Consortium (LDC) catalog number LDC2001T55 and ISBN 1-58563-190-6. The Arabic Newswire Corpus is composed of articles from the Agence France Presse (AFP) Arabic Newswire. The source material was tagged using TIPSTER-style SGML and was transcoded to Unicode (UTF-8). The corpus includes articles from May 13, 1994 to December 20, 2000.
*Data*
The data consists of 2,337 compressed (zipped) Arabic text files, totaling 209 MB compressed (869 MB uncompressed). The files hold approximately 383,872 documents containing 76 million tokens and approximately 666,094 unique words.
A template of the tagging is presented below.
One or More Paragraphs of Arabic Text | |
Extent: | Corpus size: 9728 KB | |
Identifier: | LDC2001T55 | |
https://catalog.ldc.upenn.edu/LDC2001T55 | ||
ISBN: 1-58563-190-6 | ||
ISLRN: 013-368-610-633-9 | ||
DOI: 10.35111/6at4-b624 | ||
Language: | Standard Arabic | |
Language (ISO639): | arb | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2001T55 | |
Rights Holder: | Portions © 1994-2000 Agence France Presse, © 2001 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2001T55 | |
DateStamp: | 2024-02-06 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Graff, David; Walker, Kevin. 2001. Linguistic Data Consortium. | |
Terms: | area_Asia country_SA dcmi_Text iso639_arb olac_primary_text |
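
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a minimal sketch of how such a request URL is formed under the OAI-PMH protocol, the snippet below builds the query string from the record's OaiIdentifier. The base URL is a placeholder assumption (this record does not state the archive's OAI-PMH endpoint), and the helper name `get_record_url` is hypothetical; `olac` and `oai_dc` are the metadata prefixes conventionally used for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint. The record does not give the real
# OAI-PMH base URL, so substitute the archive's actual endpoint.
BASE_URL = "https://example.org/oai"

# Taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2001T55"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One URL per format listed in the record.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid regardless of which characters an identifier contains.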