
Objective
This practical project will teach you to code a GPT-like Large Language Model (LLM) from scratch for the application field of your choice, following these steps:
- understanding the transformer architecture,
- implementing a multi-head attention module (a minimal sketch follows this list),
- implementing a GPT-like LLM,
- implementing the pretraining process (a training-loop sketch closes this section),
- exploring fine-tuning approaches,
- exploring instruction fine-tuning.
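As a point of reference for the attention step, here is a minimal sketch of a causal multi-head self-attention module in PyTorch. The class name and hyperparameters (d_model, n_heads, context_len, dropout) are illustrative choices made here, not requirements of the project.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head self-attention as used in GPT-style decoder blocks (illustrative sketch)."""

    def __init__(self, d_model, n_heads, context_len, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # one linear layer produces queries, keys, and values for all heads at once
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        # causal mask: token i may only attend to tokens 0..i
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, tokens, head_dim)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).contiguous().view(b, t, d)
        return self.out_proj(out)

# quick shape check on random data
attn = MultiHeadAttention(d_model=64, n_heads=4, context_len=128)
x = torch.randn(2, 16, 64)   # (batch, tokens, d_model)
print(attn(x).shape)         # torch.Size([2, 16, 64])
```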
This project will give you a solid understanding of LLMs and teach you the skills to develop your own. While your demo is unlikely to outperform current state-of-the-art LLMs, this work will allow you to master their underlying techniques.
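For the pretraining step, the core of the process is a next-token-prediction loop trained with cross-entropy. The sketch below is only an outline under assumed interfaces: a model that maps token ids of shape (batch, tokens) to logits of shape (batch, tokens, vocab_size), and a data loader that yields (inputs, targets) pairs with targets shifted by one token. The function name and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def pretrain(model, data_loader, epochs=1, lr=3e-4, device="cpu"):
    # Assumed interfaces: `model(inputs)` returns logits of shape
    # (batch, tokens, vocab_size); `data_loader` yields (inputs, targets)
    # where targets are the inputs shifted left by one position.
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)                    # (batch, tokens, vocab_size)
            loss = F.cross_entropy(                   # average next-token loss
                logits.flatten(0, 1), targets.flatten()
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: last-batch loss {loss.item():.3f}")
```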
Assessment
This project can accept two students, who will study the fundamentals collaboratively but fine-tune the model independently, each for their own application. Each student will need to provide:
- a short developer’s journal documenting the challenges you faced and the decisions you made (e.g., where did you get stuck? what did you try? …); your thought process and what you learned will be rewarded,
- a project roadmap summarizing your high-level implementation, e.g., a short tutorial,
- a demo of your LLM on the application of your choice.
Requirements
- successful completion of at least one course in machine learning and one in deep learning,
- solid Python coding skills and comfort with PyTorch,
- ambition, independence, and a willingness to take initiative.