Investigating the Impact of Semantic Similarity on Stylometric Attribution Using Controlled Artificial Texts and Delta Distances.

Authors

ČECH Radek, MUTLOVÁ Petra, MIKROS George

Year of publication 2025
Type Appeared in Conference without Proceedings
MU Faculty or unit

Faculty of Arts

Citation
Description The contribution investigated how semantic similarity affects stylometric methods, focusing on Delta distances for authorship attribution. It was inspired by Czech authors' analysis of medieval Latin texts, in which stylometric methods for authorship attribution failed. In particular, these methods proved extremely sensitive to their initial settings: small changes in the parameters led to fundamentally different results. The assumption that this sensitivity was caused by the high similarity of the texts was tested on controlled artificial text corpora with varying levels of semantic overlap, on which clustering robustness was assessed. The findings confirmed that increasing semantic similarity diminishes the reliability of Delta distances in distinguishing authors. By employing controlled experimental conditions, the contribution underscored the importance of mitigating semantic overlap in stylometric analyses and emphasized the need for robust feature selection in authorship attribution, particularly when the texts are semantically very similar.
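
For readers unfamiliar with Delta distances, the following is a minimal sketch of the classic Burrows's Delta, assuming that variant is representative of the measures discussed. The record does not specify which Delta variant or which settings were used. The function names and the `n_mfw` parameter (number of most frequent words) are hypothetical and are not taken from the contribution.

```python
from collections import Counter
import statistics


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def burrows_delta(texts, n_mfw=100):
    """Pairwise Burrows's Delta for a dict mapping text name -> token list.

    Illustrative sketch only: n_mfw is an assumed parameter name for the
    number of most frequent words used as features.
    """
    # Feature set: the n_mfw most frequent words of the whole corpus.
    corpus_counts = Counter()
    for tokens in texts.values():
        corpus_counts.update(tokens)
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]

    names = sorted(texts)
    freqs = {name: relative_freqs(texts[name], vocab) for name in names}

    # z-score each word's relative frequency across all texts.
    z = {name: [] for name in names}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in names]
        mu = statistics.mean(column)
        sigma = statistics.pstdev(column)
        for name in names:
            score = (freqs[name][i] - mu) / sigma if sigma > 0 else 0.0
            z[name].append(score)

    # Delta is the mean absolute difference of z-scores between two texts.
    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = sum(abs(x - y) for x, y in zip(z[a], z[b]))
            deltas[(a, b)] = diff / len(vocab)
    return deltas
```

Because the feature set depends on `n_mfw`, rerunning such a computation with different values is one simple way to observe the kind of parameter sensitivity the description refers to.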
Related projects:
