Network information extraction from medieval trial records combining LLM-based coreference resolution with string matching in pre-existing lists of persons

Authors

ZBÍRAL David, KOTZÉ Gideon, BRYS Zoltán, SHAW Robert Laurence John, HAMPEJS Tomáš, KARJUS Andres

Year of publication 2025
Type Appeared in Conference without Proceedings
MU Faculty or unit

Faculty of Arts

Citation
Description This study presents a method for extracting person-to-person network data from medieval trial records by combining Large Language Model (LLM)-based coreference resolution with string matching against pre-existing person lists. Focusing on a corpus of depositions from 14th-century Bologna, we evaluated the performance of a multi-stage pipeline under four conditions differing in the availability and specificity of external person data. Using GPT-4o for clause classification and entity extraction, followed by string normalization and ID matching, we assessed the pipeline’s precision, recall, and ability to replicate a ground-truth incrimination network. While basic LLM extraction without tailored data yielded low performance, enriching the pipeline with document-specific name lists and trial role metadata (Conditions C3 and C4) significantly improved network reconstruction, achieving F1 scores up to 0.77 and high correlation with ground-truth centrality rankings. These results demonstrate that combining LLMs with structured pre-existing data can produce network datasets suitable for historical analysis, while also highlighting the limitations of LLM-based extraction in the absence of contextual person identifiers.
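The string normalization, ID matching, and edge-level evaluation steps mentioned in the description could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `difflib`-based similarity measure, the 0.85 threshold, and the sample person list are all assumptions introduced here.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Strip diacritics, lowercase, and collapse whitespace (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def match_person(mention, person_list, threshold=0.85):
    """Return the ID of the best-matching person, or None if below the threshold.

    person_list maps hypothetical person IDs to reference name strings.
    """
    target = normalize(mention)
    best_id, best_score = None, 0.0
    for person_id, reference_name in person_list.items():
        score = SequenceMatcher(None, target, normalize(reference_name)).ratio()
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None


def edge_scores(predicted, truth):
    """Precision, recall, and F1 over sets of (source_id, target_id) edges."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical document-specific person list (invented names, not from the corpus).
persons = {"P1": "Bompietro di Giovanni", "P2": "Giuliano", "P3": "Rosafiore"}

# A spelling variant extracted by the LLM is mapped to a person ID.
matched_id = match_person("Bompetro di Giovanni", persons)

# Extracted incrimination edges compared against a hypothetical ground truth.
predicted_edges = {("P1", "P2"), ("P1", "P3")}
truth_edges = {("P1", "P2"), ("P2", "P3")}
precision, recall, f1 = edge_scores(predicted_edges, truth_edges)
```

Restricting `person_list` to the names known for a single document, as opposed to a corpus-wide list, is one plausible way to reflect the difference between the less and more specific conditions; the exact design of the four conditions is not given in this record.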
Related projects:
