Bring voice user interfaces to our offices ‒ LCAV ‐ EPFL

Contact: Dümbgen, Frederike ; Hoffet, Adrien

Synopsis: Implement a new way of interacting with your computer via voice control instead of the mouse and keyboard.

Level: BS, MS

Description: Google Home and Amazon Alexa are quickly revolutionizing how we interact with smart devices. Both use “wake words” (“OK Google” and “Alexa” respectively) to detect the user’s intention to interact. While wake word detection is typically done on the device to ensure minimal latency, the commands that follow are usually processed remotely.

The goal of this project is to program a microcontroller to process acoustic data locally and in real time. The microcontroller should run a speech recognition model to extract specific commands from the user’s speech. The chip should then emulate a USB device, such as a mouse or a keyboard, and send the derived commands to trigger actions on the host computer. An important aspect of the project will be to understand the limits of what can be processed on the microcontroller in terms of memory and computation time.
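As an illustration of the last step, the following is a minimal sketch of how a recognized command could be turned into a standard 8-byte USB HID boot keyboard report. The command set (`voice_cmd_t`) and the function names are hypothetical and chosen only for this example; the key usage IDs are taken from the USB HID Usage Tables. Actually sending the report depends on the USB stack of the chosen microcontroller and is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command vocabulary; the real one is up to the project. */
typedef enum {
    CMD_NONE = 0,
    CMD_UP,
    CMD_DOWN,
    CMD_LEFT,
    CMD_RIGHT,
    CMD_ENTER,
    CMD_COUNT
} voice_cmd_t;

/* Layout of the 8-byte HID boot keyboard input report. */
typedef struct {
    uint8_t modifiers; /* bit field for Ctrl, Shift, Alt, GUI */
    uint8_t reserved;  /* always 0 */
    uint8_t keys[6];   /* up to six simultaneously pressed key usages */
} hid_kbd_report_t;

/* Key usage IDs from the HID Usage Tables (Keyboard/Keypad page). */
static const uint8_t cmd_to_usage[CMD_COUNT] = {
    [CMD_NONE]  = 0x00,
    [CMD_UP]    = 0x52, /* Up Arrow */
    [CMD_DOWN]  = 0x51, /* Down Arrow */
    [CMD_LEFT]  = 0x50, /* Left Arrow */
    [CMD_RIGHT] = 0x4F, /* Right Arrow */
    [CMD_ENTER] = 0x28, /* Return (Enter) */
};

/* Fill the report with a single key press for the given command.
 * Unknown or empty commands produce an all-zero (no key) report. */
void build_key_press(voice_cmd_t cmd, hid_kbd_report_t *report)
{
    memset(report, 0, sizeof *report);
    if (cmd > CMD_NONE && cmd < CMD_COUNT) {
        report->keys[0] = cmd_to_usage[cmd];
    }
}

/* An all-zero report tells the host that every key has been released. */
void build_key_release(hid_kbd_report_t *report)
{
    memset(report, 0, sizeof *report);
}
```

In use, the firmware would send the report from `build_key_press` followed by the one from `build_key_release`, so that the host sees a single key stroke rather than a held key.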

The student can choose to work either on implementing machine learning models such as CNNs on the microcontroller, or on emulating the USB peripheral. Ideally, two students will work on the project, one on each component, so that we have a fully working system at the end of the semester.
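To get a first feel for the memory and computation limits mentioned above, one can estimate the cost of a single convolutional layer before writing any firmware. The sketch below assumes stride 1, "same" padding and 8-bit activations; the struct and function names are hypothetical, and the layer dimensions in the usage note are illustrative only, not a proposed architecture.

```c
#include <stdint.h>

/* Hypothetical description of one 2D convolutional layer. */
typedef struct {
    uint32_t in_h, in_w, in_ch; /* input height, width, channels */
    uint32_t out_ch;            /* number of filters */
    uint32_t k_h, k_w;          /* kernel height and width */
} conv_layer_t;

/* Number of parameters: kernel weights plus one bias per filter. */
uint32_t conv_param_count(const conv_layer_t *l)
{
    return l->k_h * l->k_w * l->in_ch * l->out_ch + l->out_ch;
}

/* Multiply-accumulate operations, assuming stride 1 and "same"
 * padding so that the output has the same height and width as the input. */
uint64_t conv_mac_count(const conv_layer_t *l)
{
    return (uint64_t)l->in_h * l->in_w * l->out_ch
         * l->k_h * l->k_w * l->in_ch;
}

/* Size of the output activation buffer, assuming one byte per value. */
uint32_t conv_output_bytes(const conv_layer_t *l)
{
    return l->in_h * l->in_w * l->out_ch;
}
```

For example, a 3x3 layer with 8 filters applied to a 40x49 single-channel input has 80 parameters, needs 141120 multiply-accumulates per inference, and produces a 15680-byte output buffer under these assumptions. Comparing such numbers with the RAM, flash and clock rate of the target chip shows quickly which models are worth trying.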

Deliverables: A report and a working system with clear documentation.

References: For useful links, see the list of URLs below.

Prerequisites: First part: Knowledge of, or strong interest in, machine learning, in particular neural networks. Basics of embedded systems programming.
Second part: Basics of C programming and embedded systems; knowledge of USB devices is preferred.

Type of Work: 50% algorithm design/analysis, 50% programming