OLAC Record
oai:www.ldc.upenn.edu:LDC2004T16

Metadata
Title:2001 Communicator Dialogue Act Tagged
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Prasad, Rashmi, and Marilyn Walker. 2001 Communicator Dialogue Act Tagged LDC2004T16. Web Download. Philadelphia: Linguistic Data Consortium, 2004
Contributor:Prasad, Rashmi
Walker, Marilyn
Date (W3CDTF):2004
Date Issued (W3CDTF):2004-06-15
Description:
*Introduction*
2001 Communicator Dialogue Act Tagged was produced by the Linguistic Data Consortium (LDC) and contains approximately 1.15 million words of system and user interactions with entity and dialogue act tagging. This corpus is an addendum to the 2001 Communicator Evaluation (LDC2003S01) corpus produced by LDC in 2003. This addendum contains annotations on the transcriptions of the system and user utterances as taken from the corrected log files of the 2001 Communicator Evaluation corpus. Corrections were done manually for missing or misaligned time-stamps on turn/utterance boundaries. Dialogue Act Annotations are provided for system utterances in the dialogues. The dialogue act tags follow the DATE (Dialogue Act Tagging for Evaluation) scheme. In addition, both system and user utterances are tagged for named entities. For further description of the 2001 Communicator Evaluation corpus, please refer to the main publication from 2003 linked above.

*Data*
The complete Dialogue Act annotated corpus is available as a single XML text file totalling approximately 67 MB. Here is the breakdown for dialogues and dialogue acts:

Dialogues    Dialogue Acts    Tagged Dialogue Acts    Unique Tags
1,683        85,881           82,277                  68

Dialogue Act tagging was done automatically using pattern matching with human-labeled dialogue utterances used by the nine different participating Communicator Systems. Named entity tagging also followed the same methodology. Each dialogue is segmented into system and user turns. Here is a breakdown of the distribution of turns, utterances, and words:

              System       User       Total
Turns         39,419       39,299     78,718
Utterances    39,417       50,249     89,666
Words         1,048,311    103,019    1,151,330

*Samples*
For an example of the data in this corpus, please view this sample (XML).

*Sponsorship*
This research was conducted using funding from the following grant number and funding agency: DARPA contract MDA972-99-3-0003.

*Updates*
None at this time.
Identifier:LDC2004T16
https://catalog.ldc.upenn.edu/LDC2004T16
ISBN: 1-58563-306-2
ISLRN: 137-996-514-791-4
DOI: 10.35111/r53v-7r46
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2004T16
Rights Holder:Portions © 2004 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2004T16
DateStamp:  2022-04-08
GetRecord:  OAI-PMH request for simple DC format
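
The GetRecord entries above refer to OAI-PMH requests whose links are not preserved in this record. As a rough sketch of how such a request is typically formed, the Python below builds a GetRecord URL from the three arguments the OAI-PMH protocol requires (verb, identifier, metadataPrefix). The base URL is a hypothetical placeholder, not the archive's actual endpoint, and the helper name build_getrecord_url is illustrative only. The prefix oai_dc is the simple DC format defined by OAI-PMH; the prefix for the OLAC format is assumed here to be olac.

```python
from urllib.parse import urlencode


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
url = build_getrecord_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2004T16", "oai_dc")
print(url)
```

Passing "olac" instead of "oai_dc" would, under the assumption stated above, request the OLAC-format record rather than simple DC.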

Search Info

Citation: Prasad, Rashmi; Walker, Marilyn. 2004. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2004T16
Up-to-date as of: Mon Mar 25 7:19:44 EDT 2024