LLMs Meet Medieval Texts

This research examines how large language models (LLMs) perform on historical languages, specifically medieval Old Occitan texts.

Models achieved up to 87% accuracy in part-of-speech (POS) tagging despite orthographic variation.
Performance varied significantly between medical and hagiographical texts.
Fine-tuning with domain-specific data substantially improved results.
Cross-domain generalization remains challenging for historical languages.
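
To make the per-domain comparison concrete, here is a minimal sketch of how token-level POS tagging accuracy could be computed separately for each text type. It is not the study's evaluation code: the function name `accuracy_by_domain`, the tag labels, and the toy sequences are all assumptions made up for illustration, and the numbers it produces say nothing about the reported results.

```python
from collections import defaultdict


def accuracy_by_domain(examples):
    """Token-level tagging accuracy per domain.

    `examples` is an iterable of (domain, gold_tags, predicted_tags)
    tuples; this input shape is an assumption for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, gold, pred in examples:
        # Gold and predicted sequences must be aligned token by token.
        if len(gold) != len(pred):
            raise ValueError("gold and predicted tag sequences must align")
        total[domain] += len(gold)
        correct[domain] += sum(g == p for g, p in zip(gold, pred))
    # Skip domains that contributed no tokens to avoid dividing by zero.
    return {d: correct[d] / total[d] for d in total if total[d]}


# Invented toy data (hypothetical tags, not taken from the study).
toy_examples = [
    ("medical", ["NOUN", "VERB", "DET", "NOUN"], ["NOUN", "VERB", "DET", "ADJ"]),
    ("hagiographical", ["PROPN", "VERB"], ["PROPN", "VERB"]),
]
scores = accuracy_by_domain(toy_examples)
```

Reporting one score per domain, rather than a single pooled figure, is what lets a gap between medical and hagiographical texts show up at all.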

This work demonstrates both the potential and the limitations of modern NLP tools for the linguistic analysis of historical texts, and it is relevant to computational linguists working with non-standardized languages.

Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan