Towards Personal Data Anonymization for Social Messaging

Varování

Publikace nespadá pod Filozofickou fakultu, ale pod Fakultu informatiky. Oficiální stránka publikace je na webu muni.cz.

Autoři

SOTOLÁŘ Ondřej PLHÁK Jaromír ŠMAHEL David

Rok publikování 2021
Druh Článek ve sborníku
Konference Text, Speech, and Dialogue
Fakulta / Pracoviště MU

Fakulta informatiky

Citace
www https://link.springer.com/chapter/10.1007/978-3-030-83527-9_24
Doi http://dx.doi.org/10.1007/978-3-030-83527-9_24
Klíčová slova Text anonymization; Personal data; Sanitization; De-identification; Privacy protection
Popis We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, like named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations reveal that our solution has recall comparable to human annotators. On the other hand, precision is much lower because of the low efficiency of the named entity recognition in the domain of social messaging conversations. The resulting anonymized text is of high utility because of the replacement methods that produce natural text.
Související projekty:

Používáte starou verzi internetového prohlížeče. Doporučujeme aktualizovat Váš prohlížeč na nejnovější verzi.