Improving Coverage of Translation Memories with Language Modelling

Authors	BAISA Vít; BUŠTA Josef; HORÁK Aleš
Year of publication	2014
Type	Article in Proceedings
Conference	Eighth Workshop on Recent Advances in Slavonic Natural Language Processing
MU Faculty or unit	Faculty of Informatics
Citation
web	https://nlp.fi.muni.cz/raslan/2014/11.pdf
Field	Informatics
Keywords	translation memory; CAT; segment; subsegment leveraging; partial translation; Moses; GIZA++; word matrix; METEOR; MemoQ; language model
Description	This paper describes and evaluates recent improvements to methods for enlarging translation memories. Compared with the previous results from 2013, coverage on the same test data improves by almost 35 percentage points. Translation pairs are first split into subsegments with the Moses and (M)GIZA++ tools, which also provide the subsegment translation probabilities. The resulting phrases are then joined using subsegment combination techniques, and the candidates are filtered with large target-language models.
Related projects:	Masaryk University Technology Transfer
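
The description outlines a pipeline: subsegment translations with probabilities, combination of subsegments into longer candidates, and filtering by a target language model. The following is a minimal sketch of that idea, not the authors' implementation. The phrase table, the tiny training corpus, the add-one smoothed bigram model, and the function names (`train_bigram_lm`, `combine`, `rank_candidates`) are all assumptions invented for illustration; in the paper the probabilities come from Moses and (M)GIZA++ and the language models are large.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical phrase table: source subsegment -> [(target, translation probability)].
# In the paper these probabilities come from Moses / (M)GIZA++; here they are made up.
PHRASE_TABLE = {
    "translation memory": [("překladová paměť", 0.8), ("paměť překladu", 0.2)],
    "is large": [("je velká", 0.7), ("je velký", 0.3)],
}

# Toy target-language corpus standing in for a large language model's training data.
CORPUS = ["překladová paměť je velká", "paměť je velká", "slovník je velký"]


def train_bigram_lm(sentences):
    """Return a function giving the add-one smoothed bigram log-probability of a token list."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = sentence.split()
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1

    def logprob(tokens):
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            total += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab)))
        return total

    return logprob


def combine(subsegments, phrase_table):
    """Yield (target string, translation log-probability) for every way of
    concatenating one translation per consecutive source subsegment."""
    options = [phrase_table[s] for s in subsegments]
    for choice in product(*options):
        target = " ".join(text for text, _ in choice)
        yield target, sum(math.log(p) for _, p in choice)


def rank_candidates(subsegments, phrase_table, lm_logprob, keep=2):
    """Score each combined candidate by translation plus language-model
    log-probability and keep only the best `keep` candidates."""
    scored = [
        (tm_lp + lm_logprob(target.split()), target)
        for target, tm_lp in combine(subsegments, phrase_table)
    ]
    scored.sort(reverse=True)
    return [target for _, target in scored[:keep]]


lm = train_bigram_lm(CORPUS)
best = rank_candidates(["translation memory", "is large"], PHRASE_TABLE, lm)
print(best)
```

With this toy data, the candidate that is both likely under the phrase table and fluent under the bigram model ranks first, while candidates containing unseen word sequences are filtered out by the `keep` cutoff. A real system would replace the cutoff and the smoothing scheme with whatever the chosen language-model toolkit provides.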