OLAC Record oai:www.ldc.upenn.edu:LDC2003T10 |
Metadata |
Title: | SAID | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Kuiper, Koenraad, et al. SAID LDC2003T10. Web Download. Philadelphia: Linguistic Data Consortium, 2003 | |
Contributor: | Kuiper, Koenraad | |
McCann, Heather | ||
Quinn, Heidi | ||
Aitchison, Therese | ||
van der Veer, Kees | ||
Date (W3CDTF): | 2003 | |
Date Issued (W3CDTF): | 2003-06-26 | |
Description: | *Introduction* SAID (A Syntactically Annotated Idiom Dataset) was produced by the Linguistic Data Consortium (LDC) and contains source files and annotations for 13,467 phrasal lexical items in English. The purpose of this corpus is to provide data for investigating the structural configurations in which English idioms are typically found. The working assumption was that, because idioms are phrasal lexical items (PLIs), they would have idiosyncratic structural properties. Studying those properties is much easier when the data is syntactically annotated.
The authors hope the dataset will be useful to linguists. Researchers in parsing and machine translation may find it useful for priming the analysis of new data and for narrowing the search space for non-compositional phrases in their algorithms. Some teachers of English as a second or foreign language may also find the structural analyses useful for grounding grammar instruction in idioms, which are often memorable and, in any case, worth knowing for a foreign language learner.
*Data* The data was originally drawn from four dictionaries of English idioms: Cowie and Mackin (1975), Cowie, Mackin, and McCaig (1983), Long (1979), and Courteney (1983). Only the citation forms, suitably adapted for this purpose, were used, and the citation files were merged. These dictionaries were chosen because they are among the largest and most comprehensive listings of English idioms, so it was assumed that many of the structural types would be represented.
The phrasal lexical items were analyzed manually, and the bracketing was then checked computationally for symmetry, i.e. that every opening bracket has a matching closing bracket. To facilitate machine manipulation of the annotated data, the manual analysis was converted to PROLOG format. This involved expanding each PLI that had optional constituents so that both the version with and the version without the optional material were available. The files are provided in text format, with each record separated by a carriage return.
*Samples* Please view these samples: SAID 1 (txt), SAID 2 (txt), SAID 3 (txt), SAID 4 (txt)
*Sponsorship* The New Zealand Vice Chancellors' Committee and the University of Canterbury
*Updates* There are no updates available at this time. | |
Extent: | Corpus size: 3481 KB | |
Identifier: | LDC2003T10 | |
https://catalog.ldc.upenn.edu/LDC2003T10 | ||
ISBN: 1-58563-268-6 | ||
ISLRN: 567-782-098-693-0 | ||
DOI: 10.35111/msvm-t728 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2003T10 | |
Rights Holder: | Portions © 2003 Koenraad Kuiper, Heather McCann, Heidi Quinn, Therese Aitchison, Kees van der Veer | |
Type (DCMI): | Text | |
Type (OLAC): | lexicon | |
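The Description notes two processing steps: the bracketing of the manual analyses was checked computationally, and PLIs with optional constituents were expanded into variants with and without the optional material. The Python sketch below illustrates both steps under stated assumptions. It uses a hypothetical labelled-bracket notation in which [LABEL ...] marks a constituent and {...} marks an optional constituent; SAID's actual annotation and PROLOG formats may differ, and the function names is_balanced and expand_optionals are illustrative only.

```python
import re
from itertools import product

# Hypothetical notation, not SAID's documented format:
#   [LABEL ...]  a labelled constituent
#   {...}        an optional constituent (assumed not to be nested)
CLOSERS = {"]": "[", "}": "{"}


def is_balanced(annotation: str) -> bool:
    """Return True if every '[' and '{' is closed by its matching bracket, in order."""
    stack = []
    for ch in annotation:
        if ch in "[{":
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recently opened bracket.
            if not stack or stack[-1] != CLOSERS[ch]:
                return False
            stack.pop()
    return not stack  # a leftover opener means a missing closer


def expand_optionals(annotation: str) -> list[str]:
    """Return every variant with each {...} group either kept or dropped."""
    if not is_balanced(annotation):
        raise ValueError(f"unbalanced brackets: {annotation!r}")
    # The capturing group keeps the optional groups in the split result.
    parts = re.split(r"(\{[^{}]*\})", annotation)
    choices = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            choices.append([part[1:-1], ""])  # with, then without, the option
        else:
            choices.append([part])
    variants = []
    for combo in product(*choices):
        # Collapse the extra whitespace left where an option was dropped.
        text = re.sub(r"\s+", " ", "".join(combo)).strip()
        variants.append(text.replace(" ]", "]"))
    return variants


if __name__ == "__main__":
    entry = "[AP {[Adv as]} [A cool] [PP [P as] [NP [D a] [N cucumber]]]]"
    print(is_balanced(entry))
    for variant in expand_optionals(entry):
        print(variant)
```

With k non-nested optional groups the expansion yields 2^k variants. Nested optional groups would need a recursive expansion, which this sketch does not attempt.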
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2003T10 | |
DateStamp: | 2024-09-12 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info |
Citation: | Kuiper, Koenraad; McCann, Heather; Quinn, Heidi; Aitchison, Therese; van der Veer, Kees. 2003. SAID. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_lexicon |