Semantic Audio Segmentation for Archive Transcription ‒ CVLAB ‐ EPFL

The Cantonal and University Library of Fribourg (BCU) will soon move into a new building, with the official opening planned for September 2026. To help people from Fribourg connect with their cultural heritage in this new space, the BCU partnered with EPFL+ECAL Lab, CVLab and the Swiss Data Science Center (SDSC) to create a digital installation that fosters the discovery of archival collections.

The BCU archives are diverse and extensive, ranging from photographs, posters, and postcards to video reports, radio recordings, films by local filmmakers, and newspapers. To enable exploration of this large collection, the SDSC set up a processing pipeline that creates a semantic and metadata-based database, allowing for in-depth search and discovery.

Due to time constraints, the current system generates audio embeddings from fixed-length audio chunks of 30 seconds. This chunk length is arbitrary, and the cuts often fall in the middle of a sentence or an idea. We believe that audio should instead be segmented according to its semantic content rather than at fixed time intervals.
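To make the limitation concrete, the following is a minimal sketch of fixed-length chunking and of a check for sentences that a chunk boundary cuts in two. It is an illustration only, not the SDSC pipeline: the function names `fixed_chunks` and `cut_sentences` are hypothetical, and it assumes sentence-level timestamps (start, end, text) are already available, for example from a speech recognizer.

```python
def fixed_chunks(duration_s, chunk_s=30.0):
    """Return (start, end) pairs, in seconds, covering the recording in fixed steps."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def cut_sentences(sentences, chunks):
    """Return the sentences that an internal chunk boundary falls strictly inside.

    `sentences` is a list of (start_s, end_s, text) tuples (assumed input format).
    """
    # The end of the last chunk is the end of the recording, not a cut.
    boundaries = [end for _, end in chunks[:-1]]
    return [s for s in sentences if any(s[0] < b < s[1] for b in boundaries)]


# Example with made-up timestamps: the first sentence straddles the 30 s cut.
chunks = fixed_chunks(70.0)
sentences = [(25.0, 33.0, "first sentence"), (40.0, 45.0, "second sentence")]
broken = cut_sentences(sentences, chunks)
```

The share of sentences returned by `cut_sentences` could serve as one simple indicator when comparing segmentation strategies.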

Project Objectives

The goal of this project is to segment audio based on meaning rather than time.
Instead of being cut into fixed-length chunks, audio will be split wherever the topic or meaning of the speech changes.

Project Approach

Implement a semantic segmentation pipeline:

Transcribe audio recordings using an automatic speech recognition system
Compute text embeddings from the transcription
Detect semantic shifts in the embeddings to identify segment boundaries
Align the detected boundaries back to the audio
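
The last two steps above can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed method: it assumes the speech recognizer returns sentences with start and end times, that one embedding vector per sentence has already been computed by some text embedding model, and that a semantic shift is a drop in cosine similarity between consecutive sentences below a threshold. The names `cosine` and `segment` and the threshold value of 0.5 are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def segment(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences into segments with audio timestamps.

    sentences:  list of (start_s, end_s, text), in temporal order (assumed format)
    embeddings: one vector per sentence, in the same order
    threshold:  similarity below which a boundary is placed (assumed value)
    Returns a list of (segment_start_s, segment_end_s, segment_text).
    """
    if not sentences:
        return []
    segments = []
    seg_start = sentences[0][0]
    texts = [sentences[0][2]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            # Semantic shift: close the segment at the end of the previous
            # sentence, so the boundary maps directly back to an audio time.
            segments.append((seg_start, sentences[i - 1][1], " ".join(texts)))
            seg_start = sentences[i][0]
            texts = []
        texts.append(sentences[i][2])
    segments.append((seg_start, sentences[-1][1], " ".join(texts)))
    return segments


# Toy example with hand-made 2-D "embeddings"; real ones would come from a model.
sentences = [(0.0, 4.0, "A"), (4.5, 9.0, "B"), (9.5, 15.0, "C")]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
result = segment(sentences, embeddings)
```

Because boundaries are only ever placed between sentences, each segment inherits its start and end times from the transcript, which keeps the alignment step trivial. Comparing windows of several sentences rather than single neighbours would likely be more robust to noise, at the cost of extra parameters.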

Expected Outcome

A prototype for semantic audio segmentation

Example segmented audio files and aligned transcripts

A short evaluation comparing semantic segmentation with 30-second segmentation

An indicative user test with people from the National Library to evaluate the potential use of such technology

Required Skills

Basic Python programming

Interest in audio, language, or AI

Contact

Delphine Ribes – [email protected]