OLAC Record

Title:MEBA word aligner
Bibliographic Citation:http://hdl.handle.net/11372/LRT-1299
Creator:Tufiş, Dan
Ceauşu, Alexandru
Date (W3CDTF):2014-07-30T21:34:21Z
Date Available:2014-07-30T21:34:21Z
Description:MEBA is a lexical aligner, implemented in C#, based on an iterative algorithm that uses pre-processing steps: sentence alignment ([[http://www.clarin.eu/tools/sal-sentence-aligner|SAL]]), tokenization, POS-tagging and lemmatization (through [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]], sentence chunking. Similar to YAWA aligner, MEBA generates the links step by step, beginning with the most probable (anchor links). The links to be added at any later step are supported or restricted by the links created in the previous iterations. The aligner has different weights and different significance thresholds on each feature and iteration. Each of the iterations can be configured to align different categories of tokens (named entities, dates and numbers, content words, functional words, punctuation) in decreasing order of statistical evidence. MEBA has an individual F-measure of 81.71% and it is currently integrated in the platform [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]]. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]: -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2005). Combined Aligners. In Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, pp. 107-110, Ann Arbor, USA, June 2005. Association for Computational Linguistics. ISBN 978-973-703-208-9.
Identifier (URI):http://hdl.handle.net/11372/LRT-1299
Language (ISO639):eng
Publisher:Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Subject:word aligner
Type (DCMI):Software


Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1299
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Tufiş, Dan; Ceauşu, Alexandru. 2014. Research Institute for Artificial Intelligence, Romanian Academy of Sciences.
Terms: area_Europe country_GB country_RO dcmi_Software iso639_eng iso639_ron

Up-to-date as of: Thu Oct 5 0:40:15 EDT 2023