Data Mining from Free-Text Health Records: State of the Art, New Polish Corpus

Warning

This publication does not fall under the Faculty of Arts, but under the Faculty of Informatics. The official publication page is on the muni.cz website.

Authors	ANETTA Krištof
Year of publication	2020
Type	Article in proceedings
Conference	Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020
MU Faculty or unit	Faculty of Informatics
Citation
Web	PDF in the proceedings; workshop homepage
Keywords	EHR; electronic health records; named entity recognition; text data mining; NLP; natural language processing; Slavic languages; Polish
Description	This paper deals with data mining from free-form text electronic health records, both from a global perspective and with specific application to Slavic languages. It introduces the reader to the promises and challenges of this enterprise and provides a short overview of the global state of the art and of the general absence of this kind of research in Central European Slavic languages. It describes pl_ehr_cardio, a new corpus of Polish health records with 18 years’ worth of medical text. This paper marks the beginning of a pioneering research project in medical text data mining in Central European Slavic languages.
Related projects:	LINDAT/CLARIAH-CZ - Digital Research Infrastructure for Language Technologies, Arts and Humanities; Applied research: software architectures of critical infrastructures, security of computer systems, natural language processing and language engineering, big data visualization and augmented reality.