Code your own LLM

Image generated by DALL-E

Objective

This practical project will teach you to code a GPT-like Large Language Model (LLM) from scratch for the application field of your choice, following these steps:

  • understanding the transformer architecture,
  • implementing a multi-head attention module (a sketch follows at the end of this section),
  • implementing a GPT-like LLM,
  • implementing the pretraining process,
  • exploring fine-tuning approaches,
  • exploring instruction fine-tuning.

This project will give you a solid understanding of LLMs and teach you the skills to develop your own. While your demo is unlikely to outperform current state-of-the-art LLMs, this work will allow you to master their underlying techniques.
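
To make the attention step concrete, here is a minimal sketch of a multi-head causal self-attention module in PyTorch, in the spirit of what you will build during the project. It is illustrative only, not the project's reference solution: the class name, hyperparameters (embed_dim, num_heads, context_len), and dropout rate are assumptions.

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Minimal multi-head causal self-attention (illustrative sketch)."""

    def __init__(self, embed_dim: int, num_heads: int, context_len: int, dropout: float = 0.1):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # A single linear layer produces queries, keys, and values; another projects the output.
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.dropout = nn.Dropout(dropout)
        # Causal mask: token i may only attend to tokens 0..i.
        mask = torch.triu(torch.ones(context_len, context_len, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, embed_dim = x.shape
        # Project to queries, keys, values, then split the embedding into heads.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:seq_len, :seq_len], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = weights @ v  # (batch, num_heads, seq_len, head_dim)
        # Merge the heads back together and project to the model dimension.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, embed_dim)
        return self.proj(out)


if __name__ == "__main__":
    attn = MultiHeadAttention(embed_dim=64, num_heads=4, context_len=128)
    x = torch.randn(2, 16, 64)  # (batch, sequence length, embedding dim)
    print(attn(x).shape)  # torch.Size([2, 16, 64])

Running the file prints torch.Size([2, 16, 64]), confirming that the module preserves the input shape, as a transformer block requires.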

Assessment

This project can accept two students, who will study the fundamentals collaboratively but fine-tune the model independently for their own applications. Each student will need to provide:

  • a short developer’s journal documenting the challenges you faced and the decisions you made (e.g., where were you stuck? what did you try?); your thought process and what you learned will be rewarded,
  • a project roadmap summarizing your implementation at a high level, e.g., as a short tutorial,
  • a demo of your LLM on the application of your choice.

Requirements

  • validation of at least one course in machine learning and one course in deep learning,
  • solid Python coding skills and comfort with PyTorch,
  • ambition, independence, and a willingness to take initiative.

Contact