Text Punctuation: An Inter-annotator Agreement Study


Authors BOHÁČ Marek, ROTT Michal, KOVÁŘ Vojtěch

Year of publication 2017
Type Article in Proceedings
Conference Text, Speech, and Dialogue: 20th International Conference, TSD 2017
MU Faculty or unit Faculty of Informatics

Web https://link.springer.com/chapter/10.1007/978-3-319-64206-2_14
Doi http://dx.doi.org/10.1007/978-3-319-64206-2_14
Field Informatics
Keywords Comma adding; Spoken language; Inter-annotator agreement
Description Spoken language is hard to annotate accurately. One of the most ambiguous tasks is inserting punctuation marks into a transcription of spoken language. The punctuation marks used often depend on how the annotators understand the content of the transcription. This understanding may differ, because spoken language often lacks the clear structure inherent to written language, owing to the spontaneity of the utterance or to skipping between ideas. We therefore suspect that inserting commas into a spoken language transcription is a very ambiguous task with low inter-annotator agreement (IAA). In this paper we analyze the IAA within a group of annotators and propose methods to increase it. We also propose and evaluate a reformulation of classical ground-truth (GT) annotations for cases where multiple annotations are available.
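As an illustration of what measuring IAA on comma insertion can look like, the following is a minimal sketch that treats every token boundary as a binary decision (comma or no comma) and computes Cohen's kappa for each pair of annotators. The record does not state which agreement measure the paper uses, so the choice of Cohen's kappa, the whitespace tokenization, and the function names `comma_slots`, `cohen_kappa`, and `mean_pairwise_kappa` are all assumptions made for this example.

```python
from itertools import combinations


def comma_slots(text):
    """Return one boolean per whitespace token: True if the token ends with a comma.

    Assumption: all annotators punctuate the same token sequence, so the
    resulting lists have equal length across annotators.
    """
    return [token.endswith(",") for token in text.split()]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of slots where both annotators decide alike.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own comma rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotated_texts):
    """Average Cohen's kappa over all annotator pairs (hypothetical summary statistic)."""
    slots = [comma_slots(text) for text in annotated_texts]
    pairs = list(combinations(slots, 2))
    if not pairs:
        raise ValueError("at least two annotations are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

Calling `mean_pairwise_kappa` on several annotators' versions of the same transcript gives a single score for the group. Counting every token boundary as a decision point means the many "no comma" slots dominate observed agreement, which is why a chance-corrected measure is used in this sketch instead of raw agreement.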