Invited Talk by Dr Albina Sarymsakova
Title:
Organizer: Lehrstuhl für Grundlagen der Sprachverarbeitung
Wednesday, 18 February 2026, 10:00 - 12:00; GU13/02.05
Abstract:
Automatic emotion detection has shown promising results in contemporary textual domains, particularly in English and in areas such as social media, marketing, and healthcare. However, the analysis of emotions in historical texts and in multilingual corpora remains largely unexplored. Within the project History and Emotions: Characterization and Evaluation of Computational Linguistics Techniques and Language Models for Emotion Mining from Historical Sources, conducted at the Padre Sarmiento Institute of Galician Studies (Spanish National Research Council, CSIC), we present an overview of the current state of the art in automatic emotion detection across textual domains, with particular emphasis on historical datasets. We also summarise the main advances achieved during the first year of the project.
We introduce a manually annotated corpus comprising 753 fragments extracted from 40 personal letters in Spanish and 264 fragments from 21 letters in English, written and/or received by women in the sixteenth and seventeenth centuries. Emotions were annotated using a multilabel scheme that included Ekman's basic emotions (anger, disgust, fear, joy, sadness, and surprise) and historically and culturally relevant categories such as nostalgia and hope. Ten expert annotators participated in the annotation of the Spanish corpus, and three in the English corpus. Inter-annotator agreement was assessed using Cohen's κ, leading to methodological refinements and improved annotation guidelines. The English correspondence corpus is currently being developed to achieve comparability in size with the Spanish corpus.
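As a rough illustration of the agreement measure mentioned above, the following minimal sketch computes Cohen's kappa separately for each emotion label between two annotators. It is not the project's actual procedure: the abstract does not say how agreement was aggregated across labels or across more than two annotators, so the per-label, pairwise setup, the function names, and the example annotations are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set

# Label set taken from the abstract; "nostalgia" and "hope" are the two
# additional categories it names explicitly (the full scheme may have more).
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise",
            "nostalgia", "hope"]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators' decisions on the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: share of items with identical decisions.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal category frequencies.
    categories = set(a) | set(b)
    expected = sum((list(a).count(c) / n) * (list(b).count(c) / n)
                   for c in categories)
    if expected == 1.0:
        # Both annotators used one identical category throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def per_label_kappa(ann_a: List[Set[str]], ann_b: List[Set[str]]) -> Dict[str, float]:
    """Treat each emotion as a binary present/absent decision per fragment."""
    return {
        emotion: cohen_kappa([int(emotion in s) for s in ann_a],
                             [int(emotion in s) for s in ann_b])
        for emotion in EMOTIONS
    }


# Hypothetical multilabel annotations for four fragments (invented data).
annotator_1 = [{"joy", "hope"}, {"sadness"}, {"fear", "sadness"}, {"joy"}]
annotator_2 = [{"joy"}, {"sadness", "nostalgia"}, {"fear"}, {"joy"}]
scores = per_label_kappa(annotator_1, annotator_2)
```

With more than two annotators, as in the corpus described here, one common option is to average such pairwise scores; whether the project did so is not stated.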
We evaluated both traditional language models (BERT-based architectures) and large language models (LLaMA, Gemma, DeepSeek) for automatic emotion detection in the Spanish historical domain. The results reveal persistent challenges related to semantic shift. While DeepSeek achieves higher overall accuracy, Gemma shows a tendency to assign dominant emotions based on broader syntactic and semantic relations. Qualitative analyses using the LIME algorithm further illustrate the interpretability limits of current models.
These outcomes highlight the need for domain-sensitive resources and methodologies in historical emotion detection and contribute a validated corpus, annotation strategy, and empirical insights for future research in Digital Humanities and computational emotion analysis.