How formulaic are inquisition records? Some formal corpus-based measurements

Authors

ZBÍRAL David, KOTZÉ Gideon, SHAW Robert Laurence John

Year of publication 2024
Type Appeared in Conference without Proceedings
MU Faculty or unit

Faculty of Arts

Description In this paper, we begin to fill the gap through the analysis of a corpus of Latin-language medieval inquisition material comprising 15 different inquisition registers and amounting to ca. 1.4M tokens. We look at the repetitiveness and formulaicity of language in this corpus from two perspectives: (1) lexical diversity, and (2) the degree of text similarity detected through text reuse detection algorithms. From each perspective, we compare the 15 registers with one another, using lexical diversity measures, n-gram frequency distributions, and text reuse patterns extracted with the Passim text reuse detection tool. We conclude that text repetition varies widely among the registers, and, based on our results concerning specific registers, we challenge the widespread notion that lower repetitiveness corresponds to higher historical reliability and vice versa.
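To make the first perspective concrete, the sketch below computes three common repetitiveness indicators for a tokenised text: the plain type-token ratio, a moving-average type-token ratio, and the share of n-gram occurrences belonging to an n-gram that appears more than once. It is illustrative only and is not the authors' pipeline: the description does not say which lexical diversity measures were used, so the function names, the whitespace tokeniser, the window size, and the two toy Latin strings are all assumptions made for this example. Text reuse detection with Passim is not shown.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Assumption: a deliberately naive tokeniser; a real corpus would need
    proper Latin tokenisation and probably lemmatisation.
    """
    stripped = (tok.strip(".,;:!?()[]\"'") for tok in text.lower().split())
    return [tok for tok in stripped if tok]


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (1.0 = no repetition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens, window=50):
    """Mean type-token ratio over all sliding windows of a fixed size.

    Less sensitive to text length than the plain ratio. Falls back to the
    plain ratio when the text is shorter than one window.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def repeated_ngram_share(tokens, n=3):
    """Share of n-gram occurrences whose n-gram occurs more than once."""
    ngrams = list(zip(*(tokens[i:] for i in range(n))))
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Invented toy strings, not taken from any inquisition register.
formulaic = (
    "item dixit quod vidit hereticos "
    "item dixit quod audivit hereticos "
    "item dixit quod adoravit hereticos"
)
varied = (
    "anno domini testis iuratus narravit de itinere suo "
    "per montes et villas cum sociis ignotis"
)

formulaic_tokens = tokenize(formulaic)
varied_tokens = tokenize(varied)
```

On these toy inputs the formulaic string has a lower type-token ratio and a higher repeated-trigram share than the varied one. Comparing registers of very different lengths would call for a length-robust measure such as the moving-average variant rather than the plain ratio.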