OLAC Record: Spanish TimeBank 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2012T12

Metadata

Title: Spanish TimeBank 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Sauri, Roser, and Toni Badia. Spanish TimeBank 1.0 LDC2012T12. Web Download. Philadelphia: Linguistic Data Consortium, 2012

Contributor: Sauri, Roser

Contributor: Badia, Toni

Date (W3CDTF): 2012

Date Issued (W3CDTF): 2012-08-16

Description: *Introduction*

Spanish TimeBank 1.0 was developed by researchers at Barcelona Media and consists of Spanish texts from the AnCora corpus annotated with temporal and event information according to the TimeML specification language. TimeML (Pustejovsky et al., 2005) is a schema for annotating eventualities and time expressions in natural language, as well as the temporal relations among them. It thus facilitates the extraction, representation and exchange of temporal information. Spanish TimeBank 1.0 is annotated at three levels, marking events, time expressions and event metadata.

The TimeML annotation scheme was tailored to the specifics of the Spanish language. Temporal relations in Spanish involve distinctions of verbal mood (e.g., indicative, subjunctive, conditional) and grammatical aspect (e.g., imperfective) that are absent in English.

Spanish TimeBank 1.0 joins the family of TimeBank-annotated corpora, which includes languages such as English, Italian, French, Korean and Chinese. Through their common annotation layer, these corpora provide resources for multilingual temporal extraction and processing, with applications such as multilingual textual entailment, opinion mining and question answering. Spanish TimeBank 1.0 is the Spanish-language complement to Catalan TimeBank 1.0 LDC2012T10. LDC has released other corpora incorporating TimeBank annotation: TimeBank 1.2 LDC2006T08, FactBank 1.0 LDC2009T23 and ModeS TimeBank 1.0 LDC2012T01.

*Data*

Spanish TimeBank 1.0 contains stand-off annotations for 210 documents with over 75,800 tokens including punctuation marks, or 68,000 tokens excluding punctuation. The source documents are news stories and fiction from the AnCora corpus, the largest multilayer annotated corpus of Spanish and Catalan. AnCora contains 400,000 words in Spanish and 275,000 words in Catalan.

The AnCora documents are annotated on many linguistic levels, including structure, syntax, dependencies, semantics and pragmatics. That information is not included in this release, but it can be mapped to the present annotations. AnCora data has been used in several international natural language processing evaluations, such as CoNLL-2006, CoNLL-2007 and SemEval-2007. AnCora is freely available from the Centre de Llenguatge i Computació (CLiC).
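
The following minimal Python sketch makes the TimeML layers described above concrete. It reads TimeML 1.2-style markup with the standard library and pairs each event with the time expression it is temporally linked to (EVENT, MAKEINSTANCE, TIMEX3 and TLINK). The Spanish sentence, ids and attribute values are hypothetical, and the snippet uses inline tags for readability. The stand-off files actually distributed with Spanish TimeBank 1.0 may be laid out differently, so check the element handling against the corpus documentation before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML 1.2-style inline snippet (NOT taken from Spanish TimeBank 1.0).
SNIPPET = """<TimeML>
El presidente <EVENT eid="e1" class="OCCURRENCE">llegó</EVENT> a Madrid el
<TIMEX3 tid="t1" type="DATE" value="2012-05-03">3 de mayo de 2012</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" pos="VERB" polarity="POS"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
</TimeML>"""


def extract_temporal_relations(xml_text):
    """Return (event text, relation type, normalized time value) triples."""
    root = ET.fromstring(xml_text)
    # Event ids -> surface text of the event mention.
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    # Time ids -> normalized value of the time expression.
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
    # MAKEINSTANCE maps event-instance ids (eiid) back to event ids (eid).
    instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}
    relations = []
    for link in root.iter("TLINK"):
        eiid = link.get("eventInstanceID")
        tid = link.get("relatedToTime")
        if eiid is None or tid is None:
            continue  # this sketch handles only event-to-time links
        relations.append((events[instances[eiid]], link.get("relType"), times[tid]))
    return relations


if __name__ == "__main__":
    for triple in extract_temporal_relations(SNIPPET):
        print(triple)
```

The sketch deliberately skips event-to-event TLINKs and the other TimeML link types (SLINK, ALINK). Extending it means resolving `relatedToEventInstance` through the same `instances` map.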

Extent: Corpus size: 4608 KB

Identifier: LDC2012T12

https://catalog.ldc.upenn.edu/LDC2012T12

ISBN: 1-58563-620-7

ISLRN: 422-097-648-917-5

DOI: 10.35111/6sfh-f762

Language: Spanish

Language (ISO639): spa

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2012T12

Rights Holder: Portions © 2012 Roser Saurí, Toni Badia, © 2012 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2012T12

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
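
The GetRecord entries above correspond to OAI-PMH requests. The minimal Python sketch below shows how such a request URL could be assembled for this record. It assumes the conventional metadata prefixes `olac` (OLAC format) and `oai_dc` (simple Dublin Core). The base URL `https://example.org/oai` and the helper `get_record_url` are hypothetical placeholders, not the archive's actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint; substitute the real base URL of the data provider.
BASE_URL = "https://example.org/oai"
RECORD_ID = "oai:www.ldc.upenn.edu:LDC2012T12"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(get_record_url(BASE_URL, RECORD_ID, "olac"))    # OLAC format
    print(get_record_url(BASE_URL, RECORD_ID, "oai_dc"))  # simple Dublin Core
```

`urlencode` percent-encodes the colons in the OAI identifier (`%3A`), which keeps the query string unambiguous.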

Search Info
Citation: Sauri, Roser; Badia, Toni. 2012. Linguistic Data Consortium.
Terms: area_Europe country_ES dcmi_Text iso639_spa olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2012T12
Up-to-date as of: Thu Sep 18 1:00:03 EDT 2025