Automatic Adaptation of Author's Stylometric Features to Document Types

This publication does not belong to the Faculty of Arts but to the Faculty of Informatics. The official publication page is on the muni.cz website.

Title in Czech Automatická adaptace stylometrických rysů autora podle typu dokumentů
Authors

RYGL Jan

Year of publication 2014
Type Article in proceedings
Conference Text, Speech, and Dialogue - 17th International Conference
MU Faculty or unit

Faculty of Informatics

Citation
www http://www.tsdconference.org/tsd2014/download/preprints/575.pdf
Doi http://dx.doi.org/10.1007/978-3-319-10816-2_7
Field Informatics
Keywords authorship verification; feature selection; machine learning; stylome; stylometric features
Description Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatic selection of authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficient training of a large number of machine learning models is introduced (163,700 models were trained).
Related projects: