Data collection in historical network research: An extreme proposal

Zbíral, David; Mertel, Adam; Shaw, Robert Laurence John; Hampejs, Tomáš


Year of publication	2021
Type	Appeared in Conference without Proceedings
MU Faculty or unit	Faculty of Arts
Description	The extent of data collection in historical network research (HNR) is often delimited by the specific hypotheses that drive the research in question. Such a parsimonious approach is entirely logical and in many cases sufficient; moreover, there is no such thing as “total” data collection, because the data is to a degree in the eye of the beholder. At the same time, however, historical research has a tried and tested tradition of more “data-driven” research, in which the close reading of sources often steers the direction of study more than the testing of hypotheses does. In this paper, we present our experience of developing a thorough data model and user interface for the collection of structured data from medieval inquisitorial registers; we undertook this as part of a project that seeks to provide a networked perspective on religious dissent and its repression in the period (Dissident Networks Project / DISSINET, https://dissinet.cz). From this experience, we derive several proposals that should interest historians who, on the continuous scale between hypothesis-driven and source-driven data collection, lean somewhat more towards the latter. Our point of departure is that a data model for source-driven data collection should allow as much relational complexity as natural language does. Our approach is not entirely new, conceptually or technically: it is built on statements that take the “semantic triple” as their point of departure and that are stored in a graph database. However, we dig deeply into the language of our sources to propose a way of recording its minutiae: our statements allow for modifiers (e.g., adjectives, adverbs), temporal and spatial relations, and modality (negation, question, possibility, etc.), and they give a specific meaning to each actant position (subject, objects) of each verb.
We thus preserve the semantic structure and detail of the source while also producing highly structured data suited to various projections and to various kinds of network analysis and visualization (as well as other computational methods). In effect, this approach to data collection amounts to modelling the source itself. The talk does not focus on technical solutions (e.g., a review of data collection environments) or on standards. Rather, we explore conceptual issues and a practical workflow that we believe can inspire not only HNR but also social network analysis (SNA) more generally. In SNA terms, our approach enables genuinely “mixed methods” research, standing at the intersection between the richness of qualitative detail and the power of quantitative analysis of structured relational data.
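The abstract stays at the conceptual level, so the following is only a minimal Python sketch, under illustrative assumptions, of what a statement extending the semantic triple might look like and how such statements could be projected into a network. The classes `Statement` and `Actant`, the function `project_edges`, the actant position labels (`subject`, `object1`, `object2`), the modality values and the example sentences are all hypothetical. This is not the DISSINET data model, and it uses plain in-memory Python objects rather than a graph database.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Actant:
    """An entity filling one actant position of a statement (hypothetical schema)."""
    entity: str    # identifier of a person, group, object, place, ...
    position: str  # assumed labels: "subject", "object1", "object2"
    modifiers: list[str] = field(default_factory=list)  # e.g. adjectives


@dataclass
class Statement:
    """A triple-like statement extended with modality, modifiers, time and place."""
    verb: str
    actants: list[Actant]
    # Assumed modality values: "affirmative", "negative", "question", "possibility".
    modality: str = "affirmative"
    modifiers: list[str] = field(default_factory=list)  # e.g. adverbs
    time: Optional[str] = None
    place: Optional[str] = None

    def entity_at(self, position: str) -> Optional[str]:
        """Return the entity in the given actant position, or None if it is empty."""
        for actant in self.actants:
            if actant.position == position:
                return actant.entity
        return None


def project_edges(statements, verbs, roles=("subject", "object1"),
                  modalities=("affirmative",)):
    """Project statements onto a directed, weighted edge list.

    A statement contributes an edge only if its verb is in `verbs` and its
    modality is in `modalities`; the entities in the two actant positions
    named by `roles` become the edge's source and target.
    """
    source_role, target_role = roles
    edges = Counter()
    for s in statements:
        if s.verb not in verbs or s.modality not in modalities:
            continue
        source, target = s.entity_at(source_role), s.entity_at(target_role)
        if source is not None and target is not None:
            edges[(source, target)] += 1
    return edges


# Invented example sentences (not taken from any source), encoded as statements.
statements = [
    # "A visited B in X."
    Statement("visit", [Actant("person_A", "subject"), Actant("person_B", "object1")],
              place="place_X"),
    # "A did not visit C."
    Statement("visit", [Actant("person_A", "subject"), Actant("person_C", "object1")],
              modality="negative"),
    # "B secretly visited A."
    Statement("visit", [Actant("person_B", "subject"), Actant("person_A", "object1")],
              modifiers=["secretly"]),
    # "A gave bread to B." For "give", object2 is assumed to hold the recipient.
    Statement("give", [Actant("person_A", "subject"), Actant("bread", "object1"),
                       Actant("person_B", "object2")]),
]

# Edges A->B and B->A; the negated visit to C is excluded by default.
visits = project_edges(statements, verbs={"visit"})
# Giver -> recipient, read from the verb-specific "object2" position.
gifts = project_edges(statements, verbs={"give"}, roles=("subject", "object2"))
# Re-projecting the same data on negated statements yields only A->C.
denied_visits = project_edges(statements, verbs={"visit"}, modalities=("negative",))
print(visits, gifts, denied_visits, sep="\n")
```

Because modality and verb-specific actant positions are kept in the statements rather than resolved at collection time, each projection has to state these choices explicitly. Different network views can then be derived from the same recorded source without re-encoding it.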
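Related projects:	Nonconformist Religious Cultures in Medieval Europe from the Perspective of Social Network Analysis and Geographic Information Systems (Czech: Nekonformní náboženské kultury ve středověké Evropě z pohledu analýzy sociálních sítí a geografických informačních systémů)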