Discriminating Between Similar Languages Using Large Web Corpora

This publication does not belong to the Faculty of Arts but to the Faculty of Informatics. The official publication page is on the muni.cz website.

Authors

SUCHOMEL Vít

Year of publication 2019
Type Article in Proceedings
Conference Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019
MU Faculty or unit

Faculty of Informatics

Keywords language identification; discriminating similar languages; building web corpora
Description This paper presents a method for discriminating similar languages based on wordlists from large web corpora. The main benefits of the approach are language independence, a measure of confidence of the classification, and an easy-to-maintain implementation. The method is evaluated on the VarDial 2014 workshop data set. The resulting accuracy is comparable to that of other methods that performed successfully at the workshop. A tool implementing the method in Python can be obtained from the web site http://corpus.tools/.