OLAC Record
oai:www.ldc.upenn.edu:LDC2004T10

Metadata
Title:ISL Meeting Transcripts Part 1
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Burger, Susanne, Victoria MacLaren, and Alex Waibel. ISL Meeting Transcripts Part 1 LDC2004T10. Web Download. Philadelphia: Linguistic Data Consortium, 2004
Contributor:Burger, Susanne
MacLaren, Victoria
Waibel, Alex
Date (W3CDTF):2004
Date Issued (W3CDTF):2004-05-21
Description:*Introduction* ISL Meeting Transcripts Part 1 was produced by the Linguistic Data Consortium (LDC) and contains transcripts for 18 meetings representing 54 hours of audio in English. The ISL Meeting Corpus Part 1 is a first subset of the ISL Meeting Corpus (112 meetings). It contains 18 meetings collected in 2000 and 2001 at the Interactive Systems Laboratories at Carnegie Mellon University in Pittsburgh, PA. The recorded meetings were either natural meetings, where participants needed to meet in the real world, or artificial meetings, which were designed explicitly for the purposes of data collection but still had real topics and tasks. The duration of the meetings in this corpus ranges from eight to 64 minutes and averages 34 minutes. The audio files are available as ISL Meeting Speech Part 1. *Data* This corpus consists of 19 word-level transcripts of 18 meetings (one transcription file per meeting except m039, which has two parts: m039a and m039b), time-synchronized to digitized audio recordings. There are approximately 116,200 word tokens and 5,850 unique word types in the transcripts. The meetings were recorded with lapel microphones, and the transcriptions were based on the lapel microphone recordings. The focus of the transcriptions was on capturing the flow of audible events, especially the words that were spoken and who spoke them. The transcriptions contain additional annotations for spontaneous speech events and disfluencies. Transcriptions were prepared using the TransEdit transcription application. This application was developed for the transcription of multi-channel recordings and displays a synchronized multi-track view of all channels of a meeting, with listening and segmentation functions for each channel separately. There are a total of 31 unique speakers in the corpus, 17 males and 14 females. Meetings involved anywhere from three to nine participants, with an average of five. 
The corpus contains a significant proportion of non-native English speakers, varying in fluency. *Samples* For an example transcript, please click here. *Sponsorship* The collection and preparation of this corpus were made possible in large part through funding from DARPA, both through the GENOA project and through ROAR. *Updates* Additional information, updates, and bug fixes may be available on the ISL Meeting Room project page.
Identifier:LDC2004T10
https://catalog.ldc.upenn.edu/LDC2004T10
ISBN: 1-58563-295-3
ISLRN: 751-401-034-298-8
DOI: 10.35111/edwz-f430
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2004T10
Rights Holder:Portions © 2000-2003 Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh, © 2004 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2004T10
DateStamp:  2024-03-12
GetRecord:  OAI-PMH request for simple DC format
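
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request URL could be assembled as follows. The base URL below is a hypothetical placeholder, since this record does not state the archive's OAI-PMH endpoint; the helper name build_getrecord_url is likewise an illustration only. The verb, identifier, and metadataPrefix parameters come from the OAI-PMH protocol, with oai_dc being the prefix for the simple DC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAI-PMH base URL is not given in this record.
BASE_URL = "https://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Assemble an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",                # OAI-PMH verb for fetching one record
            "identifier": identifier,           # the OaiIdentifier of the record
            "metadataPrefix": metadata_prefix,  # e.g. "oai_dc" for simple DC
        }
    )
    return f"{base_url}?{query}"


# Identifier taken from the OaiIdentifier field of this record.
url = build_getrecord_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2004T10", "oai_dc")
print(url)
```

Passing a different metadataPrefix would request the same record in another format, provided the repository supports it.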

Search Info

Citation: Burger, Susanne; MacLaren, Victoria; Waibel, Alex. 2004. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2004T10
Up-to-date as of: Fri Dec 6 7:46:54 EST 2024