Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications





KOVÁŘ Vojtěch

Year of publication 2016
Type Article in Proceedings
Conference Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016
MU Faculty or unit Faculty of Informatics

Field Informatics
Keywords NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology
Description In "Low inter-annotator agreement = an ill-defined problem?", we have argued that tasks with low inter-annotator agreement are very common in natural language processing (NLP) and deserve appropriate attention. We have also outlined a preliminary solution for their evaluation. In "On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?", we have argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one actually used in the NLP field. This paper brings a synthesis of the two. We take three practical tasks whose inter-annotator agreement is normally so low that they are considered almost unsuitable for any scientific evaluation. For them, we introduce an application-based evaluation scenario. It illustrates not only that these tasks can be evaluated in a scientific way, but also that this type of evaluation is much more telling than the gold standard approach.
