How formulaic are inquisition records? Some formal corpus-based measurements

Authors	ZBÍRAL David, KOTZÉ Gideon, SHAW Robert Laurence John
Year of publication	2024
Type	Appeared in Conference without Proceedings
MU Faculty or unit	Faculty of Arts
Description	In this paper, we begin to fill the gap through the analysis of a corpus of Latin-language medieval inquisition material comprising 15 different inquisition registers and amounting to ca. 1.4M tokens. We examine the repetitiveness and formulaicity of language in this corpus from two perspectives: (1) lexical diversity, and (2) the degree of text similarity detected through text reuse detection algorithms. From each perspective, we compare the 15 registers with one another, using lexical diversity measures, n-gram frequency distributions, and text reuse patterns extracted with the Passim text reuse detection tool. We conclude that text repetition varies widely among the registers, and, based on our results for specific registers, we challenge the widespread notion that lower repetitiveness corresponds to higher historical reliability and vice versa.
Related projects	Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe