OLAC Record
oai:www.ldc.upenn.edu:LDC2009T29

Metadata
Title:ACL Anthology Reference Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Kan, Min-Yen, and Steven Bird. ACL Anthology Reference Corpus LDC2009T29. Web Download. Philadelphia: Linguistic Data Consortium, 2009
Contributor:Kan, Min-Yen
Bird, Steven
Date (W3CDTF):2009
Date Issued (W3CDTF):2009-12-17
Description:*Introduction*
The ACL Anthology Reference Corpus is a digital archive of 10,921 research papers in computational linguistics, sponsored by the Association for Computational Linguistics (ACL). Also available from the ACL, this release contains most of the papers that appeared in the web-based ACL Anthology up to February 2007. The ACL Anthology is a dynamic repository that currently hosts over 16,500 articles drawn from a range of conferences and workshops as well as past issues of the Computational Linguistics journal. The ACL Anthology Reference Corpus is designed to be a standard, real-world digital collection testbed for experiments in bibliographic and bibliometric research.
The ACL is the international scientific and professional society for scholars working on problems involving natural language and computation. Membership includes the ACL quarterly journal, Computational Linguistics, reduced registration at most ACL-sponsored conferences, discounts on ACL-sponsored publications and participation in ACL Special Interest Groups. Since 1988, Computational Linguistics has been the primary forum for research on computational linguistics and natural language processing.
*Data*
The material in the ACL Anthology Reference Corpus was scanned at 600dpi grayscale for archival storage, down-sampled to 300dpi black-and-white, assembled into articles and stored in the "PDF Image with Hidden Text" format. Author and title metadata was extracted from the OCRed text and used to build HTML index pages. Older materials, such as conference proceedings from the 1960s and early volumes of Computational Linguistics, were manually digitized from microfiche slides.
The ACL Anthology Reference Corpus includes:
* 10,921 PDF files in the pdf/anthology-PDF tree
* 13,551 files with metadata described in the metadata/anthology-XML tree
* 84,542 pages in the PDF files
Extent:Corpus size: 15728640 KB
Identifier:LDC2009T29
https://catalog.ldc.upenn.edu/LDC2009T29
ISBN: 1-58563-531-6
ISLRN: 150-170-243-077-5
DOI: 10.35111/rfeg-z495
Language:English
Language (ISO639):eng
License:Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (FP): https://catalog.ldc.upenn.edu/license/creative-commons-attribution-noncommercial-sharealike-3-dot-0-fp.pdf
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (NFP, Non-Member): https://catalog.ldc.upenn.edu/license/creative-comons-attribution-noncommercial-sharealike-3-dot-0-unported.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2009T29
Rights Holder:Portions © 1963-2006 Association for Computational Linguistics, © 2009 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2009T29
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
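
The GetRecord entries above refer to OAI-PMH requests whose URLs are not shown in this record. As a rough sketch only, a GetRecord request under the OAI-PMH protocol is a base URL plus the `verb`, `identifier` and `metadataPrefix` query parameters. The base URL below is a placeholder, not the archive's real endpoint, and the helper name `get_record_url` is hypothetical. The prefixes `oai_dc` (simple DC) and `olac` (OLAC format) are assumed to be the ones the archive exposes.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint: substitute the archive's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
OAI_ID = "oai:www.ldc.upenn.edu:LDC2009T29"

dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")   # simple DC format
olac_url = get_record_url(BASE_URL, OAI_ID, "olac")   # OLAC format (assumed prefix)
```

Because `urlencode` percent-encodes reserved characters, the colons in the identifier appear as `%3A` in the resulting URL.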

Search Info

Citation: Kan, Min-Yen; Bird, Steven. 2009. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009T29
Up-to-date as of: Mon Mar 25 7:20:24 EDT 2024