On Evaluation of Natural Language Processing Tasks: Is Gold Standard Evaluation Methodology a Good Solution?

Authors	KOVÁŘ Vojtěch, JAKUBÍČEK Miloš, HORÁK Aleš
Year of publication	2016
Type	Article in Proceedings
Conference	Proceedings of the 8th International Conference on Agents and Artificial Intelligence
MU Faculty or unit	Faculty of Informatics
Field	Informatics
Keywords	Natural Language Processing; Applications; Evaluation
Description	The paper discusses problems in the state-of-the-art evaluation methods used in natural language processing (NLP). Usually, some form of gold standard data is used to evaluate NLP tasks ranging from morphological annotation to semantic analysis. We discuss the problems and validity of this type of evaluation for various tasks and illustrate them with examples. We then propose using application-driven evaluation wherever possible. Although it is more expensive, more complicated, and less precise, it is the only way to find out whether a particular tool is useful at all.
Related projects	Harvesting big text data for under-resourced languages; Hyperintensionální logika pro analýzu přirozeného jazyka (Hyperintensional Logic for Natural Language Analysis)