OLAC Record: 2019 OpenSAT Public Safety Communications Simulation

OLAC Record
oai:www.ldc.upenn.edu:LDC2023S06

Metadata

Title: 2019 OpenSAT Public Safety Communications Simulation

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Delgado, Dana, et al. 2019 OpenSAT Public Safety Communications Simulation LDC2023S06. Web Download. Philadelphia: Linguistic Data Consortium, 2023

Contributor: Delgado, Dana

Jones, Karen

Walker, Kevin

Strassel, Stephanie

Caruso, Christopher

Graff, David

Date (W3CDTF): 2023

Date Issued (W3CDTF): 2023-08-15

Description: *Introduction* 2019 OpenSAT Public Safety Communications Simulation was developed by the Linguistic Data Consortium (LDC) and contains approximately 141 hours of speech recordings and transcripts used in the National Institute of Standards and Technology (NIST) Open Speech Analytic Technologies (OpenSAT) 2019 evaluation's automatic speech recognition, speech activity detection, and keyword search tasks. The data is a portion of the Speech Analysis For Emergency Response Technology (SAFE-T) corpus, which was created by LDC under the NIST Public Safety project in support of NIST's OpenSAT evaluation campaign. The NIST OpenSAT evaluation series was designed to bring together researchers developing different types of technologies to address speech analytic challenges present in some of the most difficult acoustic conditions, with the end goal of improving the state of the art through objective, large-scale common evaluations. The SAFE-T corpus contains speakers engaged in a collaborative problem-solving activity representative of public safety communications in terms of speech content, noise types and noise levels.

*Data* US English speakers played the board game Flash Point Fire Rescue. Background noise was played through a participant's headset during the recording session. Recording sessions consisted of two 30-minute games. This corpus contains training, development and evaluation data. Development and evaluation audio files consist of four 3-minute snippets selected from six 5-minute sections drawn from the 30-minute recording. All recordings are single channel. The background noise was mixed into the single-channel recording at a reduced level. Audio data is presented as 48 kHz 16-bit mono FLAC files. Transcripts are in tab-separated (.tsv) format with UTF-8 encoding.

*Samples* Please view these samples:
* audio (FLAC)
* transcript (TSV)

*Updates* None at this time.
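Because the transcripts are described as tab-separated, UTF-8 encoded files, they can likely be read with a standard TSV reader. The following is a minimal sketch only: this record does not give the column layout, so the sample text, the function name `read_tsv_rows` and the file path in the comment are all hypothetical, and rows are returned as plain lists of fields without assuming column names.

```python
import csv
import io


def read_tsv_rows(handle):
    """Return each non-empty line as a list of tab-separated fields."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits transcripts.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Hypothetical sample; the real columns are defined in the corpus documentation.
sample = "field_a\tfield_b\tsome text\n\nfield_c\tfield_d\tmore text\n"
rows = read_tsv_rows(io.StringIO(sample))
print(rows)

# For a file on disk (placeholder path), open it as UTF-8:
# with open("transcript.tsv", encoding="utf-8", newline="") as handle:
#     rows = read_tsv_rows(handle)
```

Consult the corpus documentation listed under Relation (URI) for the actual field definitions before relying on column positions.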

Extent: Corpus size: 19023375 KB

Format: Sampling Rate: 48000

Sampling Format: PCM
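As a rough plausibility check on the figures above, the uncompressed size of about 141 hours of 48 kHz, 16-bit, mono PCM audio can be compared with the stated corpus size. This is a sketch under stated assumptions: "KB" is taken to mean 1024 bytes, the 141-hour figure is approximate, and the corpus size also includes transcripts and documentation, so the resulting ratio is only indicative of FLAC compression.

```python
HOURS = 141                # approximate duration from the Description
SAMPLE_RATE_HZ = 48_000    # from Format: Sampling Rate
BYTES_PER_SAMPLE = 2       # 16-bit samples
CHANNELS = 1               # mono
CORPUS_SIZE_KB = 19_023_375  # from Extent


def uncompressed_pcm_bytes(hours, rate_hz, bytes_per_sample, channels):
    """Size in bytes of raw PCM audio with the given duration and format."""
    seconds = hours * 3600
    return seconds * rate_hz * bytes_per_sample * channels


raw_bytes = uncompressed_pcm_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
corpus_bytes = CORPUS_SIZE_KB * 1024  # assumption: 1 KB = 1024 bytes
ratio = corpus_bytes / raw_bytes

print(f"raw PCM: {raw_bytes / 1e9:.1f} GB")
print(f"corpus:  {corpus_bytes / 1e9:.1f} GB")
print(f"ratio:   {ratio:.2f}")
```

The raw PCM estimate comes to roughly 48.7 GB against a corpus of roughly 19.5 GB, which is consistent with losslessly compressed speech.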

Identifier: LDC2023S06

https://catalog.ldc.upenn.edu/LDC2023S06

ISLRN: 443-338-774-840-7

DOI: 10.35111/7z20-jg48

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2023S06

Rights Holder: Portions © 2023 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2023S06

DateStamp: 2023-12-05

GetRecord: OAI-PMH request for simple DC format
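The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such request URLs are typically formed is shown here. The base URL is an assumption (it is not stated in this record and should be confirmed against the archive's own documentation), and `get_record_url` is a hypothetical helper. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, with `oai_dc` being the standard prefix for simple Dublin Core and `olac` the prefix used for OLAC metadata.

```python
from urllib.parse import urlencode

# Assumption: OAI-PMH endpoint of the OLAC aggregator; verify before use.
BASE_URL = "http://www.language-archives.org/cgi-bin/olaca3.pl"
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2023S06"  # from OaiIdentifier


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord URL; no network request is made."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The sketch only constructs the URLs, so it can be checked offline; fetching and parsing the XML response is left out.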

Search Info
Citation: Delgado, Dana; Jones, Karen; Walker, Kevin; Strassel, Stephanie; Caruso, Christopher; Graff, David. 2023. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2023S06
Up-to-date as of: Tue May 20 0:15:23 EDT 2025
