ToyLLM: Learning LLM from Scratch

A hands-on educational project for understanding and implementing Large Language Models (LLMs) from scratch. This project provides implementations of GPT-2 and related techniques, making it an excellent resource for learning about transformer architectures and modern language models.

Features

GPT-2 Implementation

A clean, educational implementation of GPT-2 with type hints, supporting both training and inference.
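Under the hood, GPT-2 stacks pre-norm transformer decoder blocks: masked self-attention plus an MLP, each wrapped in a residual connection. Below is a minimal sketch of one such block in PyTorch, for orientation only; the class and parameter names are illustrative and may not match the project's actual code.

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        """One pre-norm GPT-2 block: causal self-attention followed by an MLP (illustrative sketch)."""

        def __init__(self, d_model: int = 768, n_heads: int = 12) -> None:
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Causal mask so each position only attends to earlier positions.
            n = x.size(1)
            mask = torch.triu(torch.full((n, n), float("-inf"), device=x.device), diagonal=1)
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=mask)
            x = x + attn_out               # residual connection around attention
            x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
            return x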

Speculative Sampling

An implementation of speculative sampling for faster inference, featuring configurable draft models and performance benchmarking.
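The idea behind speculative sampling: a cheap draft model proposes a short run of tokens, the larger target model scores them all in one forward pass, and each proposal is accepted with probability min(1, p_target / p_draft), with a resample from the residual distribution on the first rejection. Here is a rough sketch of that accept/reject rule; the function and tensor names are illustrative assumptions, not this project's API.

    import torch

    def speculative_step(draft_tokens, draft_probs, target_probs):
        """Accept or reject draft tokens using the standard speculative sampling rule (sketch).

        draft_tokens: (gamma,)            tokens proposed by the draft model
        draft_probs:  (gamma, vocab)      draft model probabilities at each draft position
        target_probs: (gamma + 1, vocab)  target model probabilities at each position
        """
        accepted = []
        for i, token in enumerate(draft_tokens):
            p = target_probs[i, token]
            q = draft_probs[i, token]
            # Accept with probability min(1, p / q); otherwise resample from the residual.
            if torch.rand(()) < torch.clamp(p / q, max=1.0):
                accepted.append(int(token))
            else:
                residual = torch.clamp(target_probs[i] - draft_probs[i], min=0.0)
                accepted.append(int(torch.multinomial(residual / residual.sum(), 1)))
                return accepted  # stop at the first rejection
        # All drafts accepted: sample one bonus token from the target model.
        accepted.append(int(torch.multinomial(target_probs[-1], 1)))
        return accepted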

KV Cache Optimization

A memory-efficient GPT-2 implementation with KV cache optimization for handling longer sequences.
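Conceptually, a KV cache stores the attention keys and values of already-processed tokens during autoregressive decoding, so each new token needs attention over the cache plus one new column instead of recomputing the full prefix every step. The sketch below illustrates the idea only; tensor shapes and the cache layout used in this project may differ.

    import torch

    def attend_with_kv_cache(q, k_new, v_new, cache):
        """Single-step attention over a growing KV cache (illustrative sketch).

        q:      (batch, heads, 1, head_dim)  query for the newest token
        k_new:  (batch, heads, 1, head_dim)  key for the newest token
        v_new:  (batch, heads, 1, head_dim)  value for the newest token
        cache:  dict holding keys/values from earlier steps, or empty on the first step
        """
        if "k" in cache:
            # Reuse keys/values from earlier decoding steps instead of recomputing them.
            k = torch.cat([cache["k"], k_new], dim=2)
            v = torch.cat([cache["v"], v_new], dim=2)
        else:
            k, v = k_new, v_new
        cache["k"], cache["v"] = k, v

        # Standard scaled dot-product attention over the cached sequence.
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v, cache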

Quick Start

Prerequisites

  • Python 3.11 or 3.12
  • Git and Git LFS (for model files)
  • uv (recommended package manager)

Installation

  1. Clone the repository:

    git clone https://github.com/ai-glimpse/toyllm.git
    cd toyllm
    

  2. Set up the environment:

    # Create and activate virtual environment
    uv venv -p 3.12
    source .venv/bin/activate
    
    # Install toyllm
    uv pip install toyllm
    

  3. Download model files:

    # Install Git LFS if not already installed
    git lfs install
    
    # Download model files
    git clone https://huggingface.co/MathewShen/toyllm-gpt2 models
    

Alternatively, you can manually download the model files from Hugging Face and place them in the toyllm/models directory.

Usage Examples

Basic GPT-2 Inference

python toyllm/cli/run_gpt2.py --help  # View available options
python toyllm/cli/run_gpt2.py         # Run with default settings

KV Cache Optimized GPT-2

python toyllm/cli/run_gpt2_kv.py --help  # View available options
python toyllm/cli/run_gpt2_kv.py         # Run with default settings

Speculative Sampling

python toyllm/cli/run_speculative_sampling.py --help  # View available options
python toyllm/cli/run_speculative_sampling.py         # Run with default settings

Benchmarking

python toyllm/cli/benchmark/bench_gpt2kv.py --help  # View available options
python toyllm/cli/benchmark/bench_gpt2kv.py         # Run benchmarks

Project Structure

toyllm/
├── cli/                    # Command-line interface scripts
├── gpt2/                   # GPT-2 specific implementations
├── gpt2_kv/                # KV-cache optimized GPT-2
├── sps/                    # Speculative sampling implementations
├── util/                   # Utility functions
└── models/                 # Model weights and configurations

Acknowledgements

This project is inspired by and builds upon the following excellent resources:

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.