Network information extraction from medieval trial records combining LLM-based coreference resolution with string matching in pre-existing lists of persons

Authors

ZBÍRAL David KOTZÉ Gideon BRYS Zoltán SHAW Robert Laurence John HAMPEJS Tomáš KARJUS Andres

Year of publication 2025
Type Other conference presentations
MU Faculty / Unit

Faculty of Arts

Description This study presents a method for extracting person-to-person network data from medieval trial records by combining Large Language Model (LLM)-based coreference resolution with string matching against pre-existing person lists. Focusing on a corpus of depositions from 14th-century Bologna, we evaluated the performance of a multi-stage pipeline under four conditions differing in the availability and specificity of external person data. Using GPT-4o for clause classification and entity extraction, followed by string normalization and ID matching, we assessed the pipeline’s precision, recall, and ability to replicate a ground-truth incrimination network. While basic LLM extraction without tailored data yielded low performance, enriching the pipeline with document-specific name lists and trial role metadata (Conditions C3 and C4) significantly improved network reconstruction, achieving F1 scores up to 0.77 and high correlation with ground-truth centrality rankings. These results demonstrate that combining LLMs with structured pre-existing data can produce network datasets suitable for historical analysis, while also highlighting the limitations of LLM-based extraction in the absence of contextual person identifiers.
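
The string normalization, ID matching, and edge-level scoring steps mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`normalize`, `match_ids`, `edge_scores`), the exact-match lookup, the normalization rules, and the example names and IDs are all assumptions made for illustration, and the LLM extraction stage is omitted entirely.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace.

    Assumed normalization rules; the real pipeline may differ.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(cleaned.lower().split())


def match_ids(mentions, person_list):
    """Map each extracted mention to a person ID by exact match on the
    normalized string; unmatched mentions map to None."""
    index = {normalize(name): pid for pid, name in person_list.items()}
    return {mention: index.get(normalize(mention)) for mention in mentions}


def edge_scores(predicted, truth):
    """Precision, recall, and F1 over sets of directed (source, target) edges."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical person list and mentions (invented for this sketch).
persons = {"P1": "Petrus Fabri", "P2": "Maria Textoris"}
mentions = ["Pétrus,  FABRI", "Maria Textoris", "Unknown Person"]
matched = match_ids(mentions, persons)

# Hypothetical predicted and ground-truth incrimination edges.
predicted_edges = {("P1", "P2"), ("P1", "P3")}
truth_edges = {("P1", "P2"), ("P2", "P3")}
precision, recall, f1 = edge_scores(predicted_edges, truth_edges)
```

Exact matching on normalized strings is the simplest possible choice here; a document-specific name list, as in the richer conditions described above, would narrow the `person_list` passed to `match_ids` and so reduce ambiguous or missed matches.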