DMoG: A Data-Based Morphological Guesser



KOVÁŘ Vojtěch, RYCHLÝ Pavel

Year of publication 2021
Type Article in Proceedings
Conference Recent Advances in Slavonic Natural Language Processing (RASLAN 2021)
MU Faculty or unit Faculty of Informatics

Keywords Lemmatization; Morphological guesser; Morphological analysis; Morphological guessing
Description We present a novel corpus-based approach to the lemmatization of unknown words. The tool learns affix patterns from annotated data and, based on these patterns, predicts other word forms that should be present in the corpus. The lemma candidate is then taken from the pattern whose predicted forms are actually found in the corpus. We present a prototype implementation and an initial evaluation on Czech, which shows promising results.
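The description only outlines the method. The Python sketch below illustrates the general idea under simplifying assumptions that do not come from the paper: a "pattern" is taken to be a lemma ending plus the set of form endings that share one stem, and a candidate lemma is scored by the share of its predicted forms that occur in a corpus word list. The function names learn_patterns and guess_lemma and the toy Czech data are hypothetical; the actual DMoG pattern representation and scoring may differ.

```python
from collections import defaultdict
from os.path import commonprefix


def learn_patterns(pairs):
    """Learn affix patterns from annotated (form, lemma) pairs.

    Hypothetical simplification: the stem is the longest common prefix of a
    lemma and all its forms; a pattern is (lemma ending, set of form endings).
    """
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    patterns = set()
    for lemma, forms in forms_by_lemma.items():
        stem = commonprefix([lemma, *forms])
        lemma_suffix = lemma[len(stem):]
        form_suffixes = frozenset(f[len(stem):] for f in forms)
        patterns.add((lemma_suffix, form_suffixes))
    return patterns


def guess_lemma(word, patterns, corpus_vocab):
    """Return (lemma, score) for the best-supported pattern, or None."""
    best = None
    for lemma_suffix, form_suffixes in patterns:
        for suffix in form_suffixes:
            if not word.endswith(suffix):
                continue
            # Slice by length so an empty suffix keeps the whole word as stem.
            stem = word[:len(word) - len(suffix)]
            if not stem:
                continue
            # Forms this pattern predicts for the hypothesised stem.
            predicted = {stem + s for s in form_suffixes}
            # Score: share of predicted forms actually attested in the corpus.
            score = sum(f in corpus_vocab for f in predicted) / len(predicted)
            # Ties are broken by the number of predicted forms, then the lemma.
            candidate = (score, len(form_suffixes), stem + lemma_suffix)
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return None
    return best[2], best[0]


if __name__ == "__main__":
    # Toy annotated data (hypothetical): two Czech nouns with some forms.
    train = [(f, "hrad") for f in ["hrad", "hradu", "hradem", "hrady", "hradech"]]
    train += [(f, "žena") for f in ["žena", "ženy", "ženě", "ženu", "ženou"]]
    vocab = {"plot", "plotu", "plotem", "ploty", "plotech"}
    patterns = learn_patterns(train)
    print(guess_lemma("plotem", patterns, vocab))  # prints ('plot', 1.0)
```

For the unknown form "plotem", the pattern learned from "hrad" yields the stem "plot", and all five forms it predicts occur in the toy word list, so "plot" wins. The stem "plotem" (empty ending) reaches only 1/5. Scoring by attested predictions rather than by a single suffix match mirrors the description's idea that the lemma should come from the pattern whose predictions are found in the corpus.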
