Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser

Authors

RADZISZEWSKI Adam, GRÁC Marek

Year of publication 2013
Type Article in Proceedings
Conference Text, Speech, and Dialogue
MU Faculty or unit

Faculty of Arts

Citation
DOI http://dx.doi.org/10.1007/978-3-642-40585-3_72
Field Linguistics
Keywords corpus annotation; shallow parsing; Czech
Description Bushbank is a relatively new concept: a type of annotated corpus in which annotation is driven by automatic tools and the task of human annotators is limited to accepting or rejecting parts of their output. This makes it possible to obtain annotated corpora of considerable size at relatively low cost. In this paper we ask whether the Czech Bushbank is reliable enough to be used for an NLP task in place of a traditional corpus with high annotation rigour. We evaluate three different parsers using its shallow syntactic annotation, including a CRF chunker originally developed for Polish. The results are very promising, showing that many practical applications could benefit from low-cost annotation.
