OLAC Record
oai:www.ldc.upenn.edu:LDC2015T22

Metadata
Title:Karlsruhe Children's Text
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Fay, Johanna. Karlsruhe Children's Text LDC2015T22. Web Download. Philadelphia: Linguistic Data Consortium, 2015
Contributor:Fay, Johanna
Date (W3CDTF):2015
Date Issued (W3CDTF):2015-10-15
Description:*Introduction* Karlsruhe Children's Text was developed by the Cooperative State University Baden-Württemberg, University of Education and Karlsruhe Institute of Technology. It consists of over 14,000 freely written German sentences from more than 1,700 school children in grades one through eight. The data collection was conducted in 2011-2013 at elementary and secondary schools in and around Karlsruhe, Germany. Students were asked to write as verbose a text as possible. Those in grades one to four were read two stories and were then asked to write their own stories. Students in grades five through eight were instructed to write on a specific theme, such as "Imagine the world in 20 years. What has changed?" The goal of the collection was to use the data to develop a spelling error classification system.
*Data* Annotators converted the handwritten text into digital form, preserving all errors made by the writers; they also created an orthographically correct version of every sentence. Metadata about the text was gathered, including the circumstances under which it was collected, information about the student writer and background on spelling lessons in the particular class. In a second step, the students' spelling errors were annotated into general groupings: grapheme level, syllable level, morphology and syntax. The files were anonymized in a third step. This release also contains metadata regarding the writers’ language biography, teaching methodology, age, gender and school year. The average age of the participants was 11 years, and the gender distribution was nearly equal. Original handwriting is presented as JPEG format image files and the converted annotated text as UTF-8 plain text. Metadata is contained within each text file.
*Samples* Please view these image and text samples.
*Updates* None at this time.
*Additional Citation* Users should cite this paper in any publication that describes this corpus.
Preparing Children's Writing Database for Automated Processing Rémi Lavalley, Kay Berkling, Sebastian Stüker, Workshop on Language Teaching, Learning and Technology (LTLT) Leipzig, September 4, 2015
Extent:Corpus size: 629904 KB
Identifier:LDC2015T22
https://catalog.ldc.upenn.edu/LDC2015T22
ISBN: 1-58563-734-3
ISLRN: 840-846-753-370-2
DOI: 10.35111/13mf-v667
Language:German
Language (ISO639):deu
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Provenance:Collected by the University of Education Karlsruhe in Karlsruhe, Germany.
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2015T22
Rights Holder:Portions © 2015 Dr. Johanna Fay, © 2015 Trustees of the University of Pennsylvania
Type (DCMI):StillImage
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2015T22
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Fay, Johanna. 2015. Linguistic Data Consortium.
Terms: area_Europe country_DE dcmi_StillImage dcmi_Text iso639_deu olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T22
Up-to-date as of: Thu Oct 24 7:30:50 EDT 2024