OLAC Record: SRI Speech-Based Collaborative Learning Corpus

OLAC Record
oai:www.ldc.upenn.edu:LDC2019S01

Metadata

Title: SRI Speech-Based Collaborative Learning Corpus

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Richey, Colleen, et al. SRI Speech-Based Collaborative Learning Corpus LDC2019S01. Web Download. Philadelphia: Linguistic Data Consortium, 2019

Contributor: Richey, Colleen

D'Angelo, Cynthia

Alozie, Nonye

Bratt, Harry

Shriberg, Elizabeth

Date (W3CDTF): 2019

Date Issued (W3CDTF): 2019-01-15

Description: *Introduction* SRI Speech-Based Collaborative Learning Corpus was developed by SRI International and comprises approximately 120 hours of English speech from 134 US middle school students working collaboratively. The data set also contains orthographic transcriptions, manual annotations of collaboration, log files, and supporting documentation. This collection was part of a project investigating the utility of a speech-based learning analytics approach to collaborative learning. The goal was to determine whether detectable patterns exist in student speech that correlate with collaborative learning indicators and to provide a means of assessing collaboration quality. The participants were students in middle schools (grades six, seven and eight) located in California. Students worked in groups of three on sets of short mathematics problems based on the "cloze" task: each student was assigned one blank, and each problem required the students to work together and talk to each other to coordinate their three answers. The problems were presented on iPads with a custom software application.

*Data* The audio data was captured by both head-mounted and table-top microphones and is released as 16 kHz, 16-bit FLAC-compressed PCM WAV. Recording sessions were manually annotated with codes that mark indicators of collaboration (I codes) and codes that assess the overall collaboration quality of the interaction (Q codes). Annotations are presented as UTF-8 CSV files. Also included in this corpus are orthographic transcripts for a subset of the audio recordings and log files for iPad usage; both are released as UTF-8 encoded plain text.

*Samples* Please view this speech sample and transcript sample.

*Updates* None at this time.

Extent: Corpus size: 5127024 KB

Format: Sampling Rate: 16000

Sampling Format: pcm
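
As a rough sizing aid, the stated audio format (16 kHz sampling rate, 16-bit PCM) implies a fixed uncompressed data rate per channel. The sketch below works that arithmetic out. The function name `pcm_bytes` is illustrative, and treating the approximately 120 hours as a single channel is an assumption made only for this example; the record does not state channel counts, and the released files are FLAC-compressed, so actual sizes on disk will differ.

```python
# Figures taken from the record: 16 kHz sampling rate, 16-bit PCM samples.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16


def pcm_bytes(seconds, channels=1):
    """Uncompressed PCM size in bytes for a given duration.

    `channels=1` is an assumption for illustration; the record does not
    say how many channels each recording has.
    """
    bytes_per_sample = BITS_PER_SAMPLE // 8
    return int(seconds * SAMPLE_RATE_HZ * bytes_per_sample * channels)


bytes_per_second = pcm_bytes(1)          # one second of one channel
bytes_per_hour = pcm_bytes(3600)         # one hour of one channel
approx_total = pcm_bytes(120 * 3600)     # ~120 hours, assumed single channel

print(bytes_per_second, bytes_per_hour, approx_total)
```

Under these assumptions, one channel-hour is about 115 MB before compression.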

Identifier: LDC2019S01

https://catalog.ldc.upenn.edu/LDC2019S01

ISBN: 1-58563-870-6

ISLRN: 199-041-455-836-2

DOI: 10.35111/1jsy-0150

Language: English

Language (ISO639): eng

License: SRI Speech-Based Collaborative Learning Corpus Agreement: https://catalog.ldc.upenn.edu/license/sri-speech-based-collaborative-learning-corpus-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2019S01

Rights Holder: Portions © 2019 SRI International, © 2019 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2019S01

DateStamp: 2021-06-09

GetRecord: OAI-PMH request for simple DC format
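
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. A minimal sketch of how such request URLs are typically assembled is shown below. The `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol, and `oai_dc` is its standard prefix for simple Dublin Core. The base URL is a hypothetical placeholder, since the record does not give the repository endpoint, and the `olac` prefix is an assumption about how the OLAC format is named by the repository.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not state the repository endpoint.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2019S01"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record and format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Substituting the repository's real base URL would yield requests that return the record as XML in the chosen format.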

Search Info
Citation: Richey, Colleen; D'Angelo, Cynthia; Alozie, Nonye; Bratt, Harry; Shriberg, Elizabeth. 2019. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2019S01
Up-to-date as of: Wed Oct 29 7:01:51 EDT 2025
