Maximising the Power of Semantic Textual Data : CASTEMO Data Collection and the InkVisitor Application

Zbíral, David; Shaw, Robert Laurence John; Hampejs, Tomáš; Mertel, Adam

Authors	ZBÍRAL David SHAW Robert Laurence John HAMPEJS Tomáš MERTEL Adam
Year of publication	2023
Type	Appeared in Conference without Proceedings
Description	The authors present Computer-Assisted Semantic Text Modelling (CASTEMO), a novel but now well-developed approach to transforming textual resources into rich structured data: CASTEMO knowledge graphs, stored in JSON-based document databases. They also introduce the open-source InkVisitor research environment, which supports the CASTEMO data collection workflow. Both the workflow and the environment were developed within the ERC-funded Dissident Networks Project (DISSINET) but are now available for use by other researchers and projects. The CASTEMO data collection approach aims to preserve the rich qualitative texture of texts while producing structured data suitable for computational analysis. It preserves the contextual embeddedness of knowledge and the natural features of human knowledge, such as conflicting evidence and information given in a non-indicative modality, e.g. questions and conditional sentences. It thus answers a significant challenge in the digital study of texts, where a choice must often be made between extracting content and analysing discursive features, and between distant and close reading. With CASTEMO, these levels can be readily interwoven into “scalable reading”. This presentation introduces the essential data modelling principles of CASTEMO, as well as its use cases and advantages for certain types of study. It also introduces the InkVisitor research environment.
Related projects:	Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe