OLAC Record: CSC Deceptive Speech

OLAC Record
oai:www.ldc.upenn.edu:LDC2013S09

Metadata

Title: CSC Deceptive Speech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Columbia University, SRI International, and University of Colorado Boulder. CSC Deceptive Speech LDC2013S09. Web Download. Philadelphia: Linguistic Data Consortium, 2013

Contributor: Columbia University

International, SRI

University of Colorado Boulder

Date (W3CDTF): 2013

Date Issued (W3CDTF): 2013-11-15

Description: *Introduction* CSC Deceptive Speech was developed by Columbia University, SRI International and University of Colorado Boulder. It consists of 32 hours of audio interviews from 32 native speakers of Standard American English (16 male, 16 female) recruited from the Columbia University student population and the community. The purpose of the study was to distinguish deceptive speech from non-deceptive speech using machine learning techniques on features extracted from the corpus.

The participants were told that they were participating in a communication experiment which sought to identify people who fit the profile of the top entrepreneurs in America. To this end, the participants performed tasks and answered questions in six areas. They were later told that they had received low scores in some of those areas and did not fit the profile. The subjects then participated in an interview where they were told to convince the interviewer that they had actually achieved high scores in all areas and that they did indeed fit the profile. The interviewer's task was to judge how the subjects had actually performed, and he was allowed to ask them any questions other than those that were part of the performed tasks. For each question from the interviewer, subjects were asked to indicate whether the reply was true or contained any false information by pressing one of two pedals hidden from the interviewer under a table.

*Data* Interviews were conducted in a double-walled sound booth and recorded to digital audio tape on two channels using Crown CM311A Differoid headworn close-talking microphones, then downsampled to 16 kHz before processing. The interviews were orthographically transcribed by hand using the NIST EARS transcription guidelines. Labels for local lies were obtained automatically from the pedal-press data and hand-corrected for alignment, and labels for global lies were annotated during transcription based on the known scores of the subjects versus their reported scores. The orthographic transcription was force-aligned using the SRI telephone speech recognizer adapted for full-bandwidth recordings.

There are several segmentations associated with the corpus: the implicit segmentation of the pedal presses; semi-automatically derived sentence-like units (EARS SLASH-UNITS or SUs), which were hand-labeled; intonational phrase units; and the units corresponding to each topic of the interview. Transcript files are in .trs format, and audio files are .wav files distributed in FLAC-compressed form for this release.

*Samples* Please view these audio and transcript samples for the interviewer side of a conversation.

*Updates* On May 22, 2014, an additional documentation file was added to explain the questions participants were asked.
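The description above says that transcripts are distributed in .trs format. As a rough illustration only, the sketch below reads time-stamped segments from a small invented sample. It assumes the files follow the usual Transcriber XML layout (Turn elements containing Sync time markers followed by text); the sample text, the speaker IDs, and the read_segments helper are hypothetical and are not taken from the corpus or its documentation.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed Transcriber-style layout; not corpus data.
SAMPLE_TRS = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="5.0">
      <Turn speaker="spk1" startTime="0" endTime="2.5">
        <Sync time="0"/>
        how did you do on the first task
      </Turn>
      <Turn speaker="spk2" startTime="2.5" endTime="5.0">
        <Sync time="2.5"/>
        i did very well
        <Sync time="4.0"/>
        i think
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def read_segments(trs_text):
    """Return (start_time, speaker, text) tuples, one per Sync marker."""
    root = ET.fromstring(trs_text)
    segments = []
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker")
        for sync in turn.iter("Sync"):
            # In this layout the transcribed words follow the Sync tag,
            # so ElementTree exposes them as the element's tail.
            text = (sync.tail or "").strip()
            if text:
                segments.append((float(sync.get("time")), speaker, text))
    return segments


segments = read_segments(SAMPLE_TRS)
for start, speaker, text in segments:
    print(f"{start:6.2f}  {speaker}  {text}")
```

For files on disk, ET.parse(path).getroot() could replace ET.fromstring. How the deception labels and the other segmentations are encoded is not stated in this record, so the sketch does not attempt to read them.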

Extent: Corpus size: 1597744 KB

Format: Sampling Rate: 16000

Sampling Format: pcm

Identifier: LDC2013S09

https://catalog.ldc.upenn.edu/LDC2013S09

ISBN: 1-58563-660-6

ISLRN: 030-491-638-667-9

DOI: 10.35111/q500-9a28

Language: English

Language (ISO639): eng

License: CSC Deceptive Speech: https://catalog.ldc.upenn.edu/license/csc-deceptive-speech.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2013S09

Rights Holder: Portions © 2013 The Trustees of Columbia University, Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2013S09

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
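The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, the code below builds such request URLs from the standard OAI-PMH query parameters (verb, identifier, metadataPrefix). The base URL is a hypothetical placeholder, since this record does not state the repository endpoint, and the get_record_url helper is likewise an illustration. The prefix oai_dc is the simple Dublin Core prefix defined by OAI-PMH; olac is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

RECORD_ID = "oai:www.ldc.upenn.edu:LDC2013S09"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(RECORD_ID, "olac")   # OLAC format (assumed prefix)
dc_url = get_record_url(RECORD_ID, "oai_dc")   # simple DC format
print(olac_url)
print(dc_url)
```

The identifier is percent-encoded by urlencode, so the colons in the OAI identifier are transmitted safely as part of the query string.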

Search Info
Citation: Columbia University; International, SRI; University of Colorado Boulder. 2013. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2013S09
Up-to-date as of: Wed Oct 29 7:00:26 EDT 2025
