OLAC Record

Title:WMT17 Quality Estimation Shared Test Data
Bibliographic Citation:http://hdl.handle.net/11372/LRT-2135
Creator:Specia, Lucia
Logacheva, Varvara
Date (W3CDTF):2017-04-13T08:16:50Z
Date Available:2017-04-13T08:16:50Z
Description:Test data for the WMT17 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-1974.
This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include:
- To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets.
- To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available on request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
Identifier (URI):http://hdl.handle.net/11372/LRT-2135
Is Replaced By (URI):http://hdl.handle.net/11372/LRT-2805
Language (ISO639):eng
Publisher:University of Sheffield
Subject:machine translation
quality estimation
machine learning
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-2135
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Specia, Lucia; Logacheva, Varvara. 2017. University of Sheffield.
Terms: area_Europe country_DE country_GB dcmi_Text iso639_deu iso639_eng olac_primary_text

Up-to-date as of: Thu Oct 5 0:40:43 EDT 2023