EE-628 Training Large Language Models

Instructor

Prof. Volkan Cevher, Prof. Caglar Gulcehre

Description

Training Large Language Models (LLMs) has become central to advances in artificial intelligence, with datasets, pre-training, and post-training methodologies playing complementary roles in model performance and scalability. This PhD-level course explores the key stages of training these models, emphasizing the impact of data on downstream task performance. Students will bridge the theory and practice of building LLMs through a comprehensive study of dataset construction, optimization techniques, scaling laws, pre-training strategies, synthetic data generation, and post-training refinements (e.g., fine-tuning and alignment).
The course will combine theoretical instruction with hands-on experimentation. Students will gain insights into:

  • The principles and methodologies for creating high-quality, diverse, and effective datasets.
  • Optimization strategies for large-scale model training, including computational efficiency.
  • Empirical scaling laws and their implications for model size and dataset size (see the sketch after this list).
  • Leveraging synthetic data and its role in improving generalization and robustness.
  • Post-training techniques such as Reinforcement Learning from Human Feedback (RLHF) and alignment with desired outcomes.
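
On the scaling-law point, one widely used parametric form is that of Hoffmann et al. (2022, "Chinchilla"); the sketch below assumes this form, which the course is not bound to. Here L is the pre-training loss, N the parameter count, D the number of training tokens, and E, A, B, \alpha, \beta are constants fitted to experimental runs:

  \[
    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
    \qquad
    N_{\mathrm{opt}} \propto C^{\beta/(\alpha+\beta)},
    \quad
    D_{\mathrm{opt}} \propto C^{\alpha/(\alpha+\beta)}
  \]

The compute-optimal allocations follow from minimizing L under a fixed compute budget C \approx 6ND; when \alpha \approx \beta, model size and dataset size should grow roughly in proportion.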

This project-based course culminates in collaborative research projects that advance our understanding of LLM training. Enrollment is limited and by selection only.

The central course project will measure the impact of data selection methods on the pre-training and fine-tuning stages at scale, evaluated on reasoning tasks.
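
To make the project concrete, below is a minimal sketch of one common family of data selection methods: perplexity filtering with a small reference model. The reference model ("gpt2"), the keep fraction, and the function names are illustrative assumptions, not the course's prescribed pipeline.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text):
    # Score a document by the perplexity the reference model assigns to it;
    # for causal LMs, passing labels=input_ids returns the mean next-token
    # cross-entropy, whose exponential is the perplexity.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

def select_lowest_perplexity(corpus, keep_fraction=0.5, model_name="gpt2"):
    # Keep the keep_fraction of documents the reference model finds most
    # predictable, a crude proxy for clean, natural text.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    scored = sorted(corpus, key=lambda doc: perplexity(model, tokenizer, doc))
    return scored[: max(1, int(keep_fraction * len(scored)))]

if __name__ == "__main__":
    docs = ["The derivative of x^2 is 2x.", "zzq zzq zzq repeated filler"]
    print(select_lowest_perplexity(docs))

Comparing downstream reasoning accuracy between models pre-trained on the filtered and unfiltered corpora is one way to realize the measurement the project calls for.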

Learning outcomes

By the end of the course, the student must be able to:

  • Develop a deep understanding of the key components in training LLMs.
  • Construct and evaluate their own LLM pipelines, with a focus on dataset design.
  • Analyze and document the relationship between data, optimization, and scaling laws.
  • Conduct original research that advances the field.

Prerequisites

Strong foundations in machine learning, deep learning, and optimization; experience with large-scale models is recommended.