Project information
Pattern Recognition-based Statistically Enhanced MT (PRESEMT)

Information

The project does not fall under the Faculty of Arts but under the Faculty of Informatics. The official project page is on the muni.cz website.

Project code

248307

Project period

1/2010 - 12/2012

Funder / Programme framework / Project type

European Union

EU 7th Framework Programme
Cooperation

MU faculty / department

Faculty of Informatics

Partner organisations

Institute for Language and Speech Processing

Responsible person: George Tambouratzis

Gesellschaft zur Förderung angewandter Informatik
Norwegian University of Science and Technology
National Technical University of Athens
Lexical Computing Ltd.

This proposal describes PRESEMT, a flexible and adaptable MT system based on a language-independent method whose principles ensure easy portability to new language pairs. The method attempts to overcome well-known problems of other MT approaches, such as the need to compile bilingual corpora or to create new rules for each language pair. PRESEMT will address the issue of effectively managing multilingual content and is expected to propose a language-independent, machine-learning-based methodology. The key aspects of PRESEMT are syntactic phrase-based modelling; pattern recognition approaches (such as extended clustering or neural networks) or game-theory techniques for developing a language-independent analysis; and evolutionary algorithms for system optimisation. The system is intended to be hybrid, combining linguistic processing with the strengths of corpus-based approaches such as SMT and EBMT.

Publications

Number of publications: 14

2012

Building a 70 billion word corpus of English from ClueWeb

POMIKÁLEK Jan, RYCHLÝ Pavel, JAKUBÍČEK Miloš

Paper in conference proceedings

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), year: 2012
Detecting Spam in Web Corpora

BAISA Vít, SUCHOMEL Vít

Paper in conference proceedings

6th Workshop on Recent Advances in Slavonic Natural Language Processing, year: 2012
Finding Multiwords of More Than Two Words

KILGARRIFF Adam, RYCHLÝ Pavel, KOVÁŘ Vojtěch, BAISA Vít

Paper in conference proceedings

Proceedings of the 15th EURALEX International Congress, year: 2012
Linguistic Logical Analysis of Direct Speech

HORÁK Aleš, JAKUBÍČEK Miloš, KOVÁŘ Vojtěch

Paper in conference proceedings

RASLAN 2012: Recent Advances in Slavonic Natural Language Processing, year: 2012

2011

Analyzing Time-Related Clauses in Transparent Intensional Logic

HORÁK Aleš, JAKUBÍČEK Miloš, KOVÁŘ Vojtěch

Paper in conference proceedings

Proceedings of Recent Advances in Slavonic Natural Language Processing 2011, year: 2011
Corpus-based Disambiguation for Machine Translation

BAISA Vít

Paper in conference proceedings

Recent Advances in Slavonic Natural Language Processing, year: 2011
Effective Parsing Using Competing CFG Rules

JAKUBÍČEK Miloš

Paper in conference proceedings

Proceedings of Text, Speech and Dialogue 2011, year: 2011
chared: Character Encoding Detection with a Known Language

POMIKÁLEK Jan, SUCHOMEL Vít

Paper in conference proceedings

RASLAN 2011, year: 2011
Japanese Word Sketches: Advances and Problems

SRDANOVIĆ Irena, IDA Naomi, SHIGEMORI BUČAR Chikako, KILGARRIFF Adam, KOVÁŘ Vojtěch

Article in a specialist journal

Acta Linguistica Asiatica, year: 2011, volume: 1/2011, issue: 2
Practical Web Crawling for Text Corpora

SUCHOMEL Vít, POMIKÁLEK Jan

Paper in conference proceedings

Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011, year: 2011