Pre-processing Large Resources for Family Names Research

Authors

RAMBOUSEK Adam

Year of publication 2016
Type Article in Proceedings
Conference RASLAN 2016 Recent Advances in Slavonic Natural Language Processing
MU Faculty or unit

Faculty of Informatics

Field Informatics
Keywords DEB platform; lexicography; big data; family names; data conversion
Description This paper describes the methodology and tools used to pre-process historical archive documents in various formats and convert them to a unified format. The resources were used to investigate the origins and geographical distribution of surnames in the United Kingdom, as part of the Family Names in Britain and Ireland research project. The data extracted from the documents, and the connections among them, proved to be a valuable research resource that helped speed up the lexicographic work.
