European Union Language Resources in Sketch Engine


Authors

BAISA Vít, MICHELFEIT Jan, MEDVEĎ Marek, JAKUBÍČEK Miloš

Year of publication 2016
Type Article in Proceedings
Conference Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
MU Faculty or unit

Faculty of Informatics

Web http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf
Field Informatics
Keywords JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance
Description Several parallel corpora built from European Union language resources are presented. They were processed with state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is also introduced: the EUR-Lex corpus, one of the largest parallel corpora currently available. It contains 840 million English tokens, and its largest language pair (English-French) has more than 25 million aligned segments (paragraphs).