OLAC Record: COSTRA 1.0: A Dataset of Complex Sentence Transformations

OLAC Record
oai:lindat.mff.cuni.cz:11234/1-3123

Metadata

Title: COSTRA 1.0: A Dataset of Complex Sentence Transformations

Bibliographic Citation: http://hdl.handle.net/11234/1-3123

Creator: Barančíková, Petra

Creator: Bojar, Ondřej

Date (W3CDTF): 2019-12-05T09:18:04Z

Date Available: 2019-12-05T09:18:04Z

Description: COSTRA 1.0 is a dataset of complex sentence transformations in Czech. It is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The dataset consists of 4,262 unique sentences with an average length of 10 words, illustrating 15 types of modifications such as simplification, generalization, or formal and informal language variation. The hope is that this dataset will make it possible to test semantic properties of sentence embeddings and perhaps even to find some topologically interesting “skeleton” in the sentence embedding space.

Identifier (URI): http://hdl.handle.net/11234/1-3123

Language: Czech

Language (ISO639): ces

Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

Rights: Creative Commons - Attribution 4.0 International (CC BY 4.0)

Rights: http://creativecommons.org/licenses/by/4.0/

Subject: sentences

Subject: sentence embeddings

Subject: paraphrases

Subject: semantic relations

Type: corpus

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11234/1-3123

DateStamp: 2021-06-29

GetRecord: OAI-PMH request for simple DC format
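
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL could be assembled as shown below. The query parameters `verb`, `identifier`, and `metadataPrefix` come from the OAI-PMH protocol, and `oai_dc` is its standard prefix for simple DC. The `olac` prefix and the base URL are assumptions: this record does not state the repository's OAI-PMH endpoint, so `BASE_URL` is a hypothetical placeholder, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the repository's
# OAI-PMH base URL, so substitute the real endpoint before use.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-3123"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple Dublin Core; "oai_dc" is the prefix every OAI-PMH repository supports.
dc_url = get_record_url(IDENTIFIER, "oai_dc")

# OLAC format; the prefix name "olac" is an assumption about this repository.
olac_url = get_record_url(IDENTIFIER, "olac")

print(dc_url)
print(olac_url)
```

The sketch only builds the URLs and does not send a request. The response to a GetRecord request is an XML document, so any client would still need to fetch and parse it.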

Search Info
Citation: Barančíková, Petra; Bojar, Ondřej. 2019. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-3123
Up-to-date as of: Mon Jun 16 1:05:29 EDT 2025
