Archive Retrieval & LLM Readiness (Fribourg Patois) 

Swiss German and Fribourg patois are primarily oral languages, closely linked to regional identity and local history, and are only partially represented in written form. As a result, much of this linguistic heritage remains hidden within audiovisual archives and unstructured documents. Preserving and making these dialects accessible through digital methods is essential both for cultural transmission and for ensuring that emerging language technologies can reflect Switzerland’s linguistic diversity. 

To move from the preservation of regional dialects to their active reuse in digital systems, access to large and well-structured archival collections is essential. In this context, the Cantonal and University Library of Fribourg (BCU) holds a rich collection of cantonal archives, including audio recordings, videos, newspapers, reports, and photographs. In addition, access to a Fribourg patois dictionary containing approximately 40’000 entries provides a valuable linguistic resource.  

Project Objectives

This project explores how such a lexicon can be used to (1) identify and retrieve patois content within the archives and (2) assess what would be required to use the collected documents for training or adapting a language model. 

Project Approach 

Dictionary-driven retrieval 

Use the 40’000-entry patois dictionary to identify and retrieve archive items (audio, video, text) that contain Fribourg patois. 
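One possible way to sketch this retrieval step for textual material (OCR output, catalogue descriptions, or transcripts of audio and video) is to score each item by the share of its tokens found in the dictionary. Everything below is an assumption made for illustration: the function names, the diacritic-insensitive matching, the optional exclusion list for words shared with standard French, and the 0.2 threshold are not taken from the project, and the sample words in the usage note are placeholders rather than real patois entries.

```python
import re
import unicodedata


def normalize(token: str) -> str:
    """Lowercase and strip diacritics so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_lexicon(entries, exclude=()):
    """Normalize dictionary headwords; drop words listed in `exclude`
    (hypothetical stoplist, e.g. forms shared with standard French)."""
    excluded = {normalize(w) for w in exclude}
    return {normalize(e) for e in entries} - excluded


def tokenize(text: str) -> list[str]:
    """Split text into normalized alphabetic tokens."""
    return [normalize(t) for t in re.findall(r"[^\W\d_]+", text)]


def patois_score(text: str, lexicon: set[str]) -> float:
    """Fraction of tokens that appear in the lexicon (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lexicon) / len(tokens)


def rank_items(items: dict[str, str], lexicon: set[str], threshold: float = 0.2):
    """Return (item_id, score) pairs at or above the threshold, best first."""
    scored = [(item_id, patois_score(text, lexicon)) for item_id, text in items.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With a placeholder lexicon built from `["Alpha", "béta"]`, calling `rank_items({"a": "alpha beta", "b": "gamma delta"}, lexicon)` keeps only item `"a"`. The score can also serve as a rough confidence indicator, although a real threshold would have to be calibrated on manually checked items.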

Corpus construction 

Build a first curated patois corpus from the archives, with metadata (source, date, type, dialect variant if known, etc.). 
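As an illustration of what a corpus entry with such metadata might look like, the sketch below defines one record per archive item and serializes records as JSON Lines. The field names, the allowed media types, and the JSON Lines format are assumptions chosen for the example, not a schema prescribed by the project or by the BCU.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical set of media types, mirroring "audio, video, text" above.
ALLOWED_MEDIA_TYPES = {"audio", "video", "text"}


@dataclass
class CorpusRecord:
    item_id: str                           # identifier within the archive (assumed)
    source: str                            # collection or holding institution
    media_type: str                        # one of ALLOWED_MEDIA_TYPES
    date: Optional[str] = None             # ISO 8601 string if known
    dialect_variant: Optional[str] = None  # left empty when unknown
    text: Optional[str] = None             # transcript or OCR text, if available
    confidence: Optional[float] = None     # retrieval score in [0, 1], if any


def validate(record: CorpusRecord) -> list[str]:
    """Return a list of human-readable problems (empty if the record is fine)."""
    problems = []
    if not record.item_id:
        problems.append("missing item_id")
    if record.media_type not in ALLOWED_MEDIA_TYPES:
        problems.append(f"unknown media_type: {record.media_type}")
    if record.confidence is not None and not 0.0 <= record.confidence <= 1.0:
        problems.append("confidence outside [0, 1]")
    return problems


def to_jsonl(records: list[CorpusRecord]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)
```

Keeping unknown values as explicit `None` fields, rather than omitting them, makes gaps in the metadata easy to count later on.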

LLM training readiness assessment 

Determine what is needed to use the retrieved materials to train or fine-tune an LLM: data volume and quality, cleaning and normalization needs, transcription requirements for audio and video, language variety and orthography issues, and an evaluation plan with benchmarks.
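A first, rough view of data volume and orthographic variety could come from simple corpus statistics. The sketch below is a minimal example under stated assumptions: the chosen indicators (token count, vocabulary size, type-token ratio, share of the vocabulary covered by the dictionary) and the function name are illustrative, and they cover only part of the criteria listed above.

```python
import re
from collections import Counter


def corpus_stats(docs: list[str], lexicon: set[str], top_n: int = 10) -> dict:
    """Compute basic volume and coverage indicators for a list of texts.

    `lexicon` is assumed to hold lowercase dictionary headwords.
    """
    counts = Counter()
    for doc in docs:
        counts.update(t.lower() for t in re.findall(r"[^\W\d_]+", doc))

    total_tokens = sum(counts.values())
    vocabulary = len(counts)
    covered = sum(1 for word in counts if word in lexicon)

    # Frequent words missing from the dictionary hint at spelling variants,
    # other languages, or OCR and transcription errors.
    unknown = [w for w, _ in counts.most_common() if w not in lexicon][:top_n]

    return {
        "documents": len(docs),
        "tokens": total_tokens,
        "types": vocabulary,
        "type_token_ratio": vocabulary / total_tokens if total_tokens else 0.0,
        "lexicon_coverage": covered / vocabulary if vocabulary else 0.0,
        "frequent_out_of_lexicon": unknown,
    }
```

A low `lexicon_coverage` together with many near-duplicate spellings in `frequent_out_of_lexicon` would suggest that orthographic normalization is needed before any training or fine-tuning is attempted.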

Expected Outcomes

Identification and structuring of patois-related archival content 

A ranked and documented inventory of audio, video, and textual archival materials containing Fribourg patois, enriched with metadata and confidence indicators. 

Creation of a first curated Fribourg patois corpus 

A cleaned and structured corpus combining selected archival documents, transcripts, and linguistic annotations, suitable for analysis and future reuse. 

Assessment of feasibility for language model use 

A technical evaluation outlining the requirements, limitations, and next steps for using the collected materials to train, fine-tune, or support a language model. 
