OLAC Record: 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish

OLAC Record
oai:www.ldc.upenn.edu:LDC2018T06

Metadata

Title: 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: University of the Basque Country, et al. 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish LDC2018T06. Web Download. Philadelphia: Linguistic Data Consortium, 2018

Contributor: University of the Basque Country

Technical University of Catalunya

Charles University

Middle East Technical University

Sabanci University

Date (W3CDTF): 2018

Date Issued (W3CDTF): 2018-01-18

Description: *Introduction*

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are Basque, Catalan, Czech and Turkish.

LDC also released the following 2006 & 2007 CoNLL Shared Task corpora:

* 2007 CoNLL Shared Task - Greek, Hungarian & Italian (LDC2018T07)
* 2007 CoNLL Shared Task - Arabic & English (LDC2018T08)
* 2006 CoNLL Shared Task - Ten Languages (LDC2015T11)
* 2006 CoNLL Shared Task - Arabic & Czech (LDC2015T12)

This corpus is cross-listed and jointly released with ELRA as ELRA-W0121.

The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006 and 2007, the shared tasks were devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. The 2007 shared task added a domain adaptation track for English in addition to the multilingual track. More information about the 2007 shared task is available at the CoNLL Previous Tasks web site.

LDC has released data sets from other CoNLL shared tasks. 2008 CoNLL Shared Task Data (LDC2009T12) contains the English material used in the 2008 shared task, which focused on English, employed a unified dependency-based formalism and merged the tasks of syntactic dependency parsing, identifying semantic arguments and labeling them with semantic roles. 2009 CoNLL Shared Task Data Parts 1 and 2 (LDC2012T03 and LDC2012T04) consist of the English, Catalan, Chinese, Czech, German and Spanish resources used in the 2009 task, which included a comparison of time and space complexity based on participants' input and a learning curve comparison for languages with large datasets. 2015-2016 CoNLL Shared Task (LDC2017T13) contains Chinese and English resources used in the 2015 and 2016 shared tasks on dependency parsing.

*Data*

The source data in the treebanks in this release consists principally of various texts (e.g., textbooks, news, literature) annotated in dependency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that other units in the sentence are connected to the verb as directed links or dependencies. This is a one-to-one correspondence: for every element in the sentence there is one node in the sentence structure that corresponds to that element. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases and, in each sentence, one or more nodes may correspond to one element. The Penn Treebank (LDC99T42) is an example of a constituency or phrase structure approach. All of the data sets in this release are dependency treebanks.

The individual data sets are:

* The 3LB Treebank (Basque)
* CESS-Cat Dependency Treebank (Catalan)
* Prague Dependency Treebank 2.0 (Czech)
* METU-Sabanci Turkish Treebank (Turkish)

*Samples*

Please view these samples:

* Basque
* Catalan
* Czech
* Turkish

*Updates*

None at this time.
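
The one-to-one correspondence described under Data (one node per sentence element, each linked to its governing word) can be illustrated with a minimal Python sketch. The toy English sentence, the `Token` class, the dependency labels and both helper functions are assumptions made for illustration only; they are not taken from the corpus or its documentation.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position of the word in the sentence
    form: str    # surface word
    head: int    # index of the governing word; 0 marks the root
    deprel: str  # label of the dependency link (illustrative labels only)


# Hypothetical toy sentence, not drawn from any treebank in this release.
# The verb "chased" is the root; every other word points to its head.
SENTENCE = [
    Token(1, "The", 2, "det"),
    Token(2, "dog", 3, "subj"),
    Token(3, "chased", 0, "root"),
    Token(4, "the", 5, "det"),
    Token(5, "cat", 3, "obj"),
]


def is_well_formed(tokens):
    """Return True if there is one node per word, every head refers to an
    existing word (or 0), and following heads always reaches the root."""
    ids = [t.idx for t in tokens]
    if ids != list(range(1, len(tokens) + 1)):
        return False
    heads = {t.idx: t.head for t in tokens}
    if any(h != 0 and h not in heads for h in heads.values()):
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen:  # revisiting a node means a cycle
                return False
            seen.add(node)
            node = heads[node]
    return True


def dependents(tokens, head_idx):
    """List the surface forms directly governed by the word at head_idx."""
    return [t.form for t in tokens if t.head == head_idx]
```

In this sketch the structure has exactly as many nodes as the sentence has words, which is the property that distinguishes it from a phrase structure tree, where additional nodes such as noun phrases and verb phrases would appear.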

Extent: Corpus size: 46304 KB

Identifier: LDC2018T06

https://catalog.ldc.upenn.edu/LDC2018T06

ISBN: 1-58563-827-7

ISLRN: 769-620-932-723-2

DOI: 10.35111/v8d9-7f98

Language: Basque

Catalan

Czech

Turkish

Language (ISO639): eus

cat

ces

tur

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2018T06

Rights Holder: Portions © 1991, 1994, 1995 Lidové noviny daily newspapers, © 1992 Mladá fronta Dnes daily newspapers, © 1993-1996 Readers Digest, © 1992-1993 Vesmír scientific magazine, Academia Publishers, © 1996-2001, 2002-2004, 2007 Center for Computational Linguistics & Institute for Formal and Applied Linguistics & Institute of Comparative Linguistics, Charles University in Prague, © 2007 Middle East Technical University, © 2007 Sabanci University, © 2007 Technical University of Catalunya, © 2007 University of the Basque Country, © 2018 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2018T06

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
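
The GetRecord entries above refer to OAI-PMH requests. As a rough sketch, such a request is an HTTP query carrying the `verb`, `identifier` and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a placeholder and not the real endpoint for this archive, the function name is hypothetical, and the assumption is that `oai_dc` selects simple DC and `olac` selects the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the archive's actual base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Request the simple DC rendering of this record.
dc_url = get_record_url("oai:www.ldc.upenn.edu:LDC2018T06", "oai_dc")
```

The colons in the identifier are percent-encoded by `urlencode`, so the resulting URL can be passed to any HTTP client unchanged.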

Search Info
Citation: University of the Basque Country; Technical University of Catalunya; Charles University; Middle East Technical University; Sabanci University. 2018. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CZ country_ES country_TR dcmi_Text iso639_cat iso639_ces iso639_eus iso639_tur olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2018T06
Up-to-date as of: Wed Oct 29 7:01:46 EDT 2025
