MT History — Reproducible Machine Translation from 1949

An interactive course tracing the full history of machine translation, from Shannon (1948) and Weaver (1949) through LLM-based MT (2023+). Every model is implemented from scratch in Python. Every chapter runs in a free Colab session.

Live site: https://eduardosanchezg.github.io/mthistory

Quick start

# Download the corpus (optional — chapters fall back to a built-in sample)
python data/download_multi30k.py

# Parts 0–2: no dependencies beyond Jupyter itself
jupyter --version   # check that Jupyter is installed; any Jupyter frontend will do
jupyter lab notebooks/part0/00_shannon.ipynb

# Parts 3+: install per-part deps with uv
uv sync --group part3
uv run jupyter lab notebooks/part3/31_ibm_model1.ipynb
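The fallback mentioned in the first comment can be pictured with the minimal sketch below. It assumes, hypothetically, that the download script writes line-aligned files named data/multi30k/train.de and data/multi30k/train.en. The helper name load_parallel_corpus and the three sample pairs are illustrative only; they are not the repository's actual loader or its built-in sample.

```python
from pathlib import Path

# Illustrative stand-in for the built-in sample (NOT the repository's actual sample).
BUILTIN_SAMPLE = [
    ("Ein Hund läuft im Schnee.", "A dog runs in the snow."),
    ("Zwei Männer spielen Fußball.", "Two men are playing soccer."),
    ("Eine Frau liest ein Buch.", "A woman is reading a book."),
]


def load_parallel_corpus(data_dir="data/multi30k", src="de", tgt="en", split="train"):
    """Return (source, target) sentence pairs, falling back to BUILTIN_SAMPLE.

    Hypothetical layout: data_dir/{split}.{lang}, one sentence per line,
    with line i of the source file aligned to line i of the target file.
    """
    src_path = Path(data_dir) / f"{split}.{src}"
    tgt_path = Path(data_dir) / f"{split}.{tgt}"
    if not (src_path.exists() and tgt_path.exists()):
        # Corpus not downloaded: use the small sample so the chapter still runs.
        return list(BUILTIN_SAMPLE)
    src_lines = src_path.read_text(encoding="utf-8").splitlines()
    tgt_lines = tgt_path.read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(f"misaligned corpus: {len(src_lines)} vs {len(tgt_lines)} lines")
    return list(zip(src_lines, tgt_lines))
```

Failing loudly on misaligned files is a design choice: silently zipping files of different lengths would drop sentences and corrupt every alignment-based chapter downstream.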

Open in Colab

Every notebook has an “Open in Colab” badge and a !pip install header cell with pinned versions, so you can run any chapter in one click.
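The notebooks use a !pip install shell escape for their header cell. For reference, a plain-Python equivalent might look like the sketch below, which installs pinned versions only when running inside Colab. The pins shown are hypothetical examples and are not the versions the notebooks actually pin. The Colab check relies on the common idiom of looking for google.colab in sys.modules.

```python
import subprocess
import sys

# Hypothetical pins for illustration; the notebooks' actual pinned versions may differ.
PINS = {"numpy": "1.26.4", "scipy": "1.11.4"}


def in_colab():
    """Return True when running inside a Google Colab runtime."""
    return "google.colab" in sys.modules


def pip_command(pins):
    """Build a pip command that installs exactly the pinned versions."""
    requirements = [f"{pkg}=={version}" for pkg, version in pins.items()]
    # Use the current interpreter's pip so packages land in the active kernel.
    return [sys.executable, "-m", "pip", "install", "-q", *requirements]


if in_colab():
    subprocess.run(pip_command(PINS), check=True)
```

Pinning exact versions (==) rather than minimum versions (>=) is what keeps a chapter reproducible when Colab's preinstalled packages change underneath it.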

Reproducibility

Parts   Dependencies                  Runs on
0–2     stdlib only                   Any Python 3.8+
3       numpy, scipy                  Any laptop
4       torch (CPU)                   Any laptop
5       torch                         Codespaces / Colab GPU
6       transformers, sentencepiece   Colab / Codespaces
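To see which parts the current environment can already run, a minimal sketch like the following checks the Python version and whether each part's dependencies from the table above are importable. The PART_DEPS mapping is transcribed by hand from that table, and the helper names are hypothetical. It checks importability only; it does not detect whether a GPU is available for part 5.

```python
import importlib.util
import sys

# Dependencies per part, transcribed from the Reproducibility table.
# These are import names; for these packages they match the pip names.
PART_DEPS = {
    "0-2": [],
    "3": ["numpy", "scipy"],
    "4": ["torch"],
    "5": ["torch"],
    "6": ["transformers", "sentencepiece"],
}


def missing_deps(part, deps_table=PART_DEPS):
    """List the top-level modules for `part` that cannot be found."""
    return [name for name in deps_table[part] if importlib.util.find_spec(name) is None]


def report(deps_table=PART_DEPS):
    """Print a one-line readiness status per part."""
    version_ok = sys.version_info >= (3, 8)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'ok' if version_ok else 'needs 3.8+'}")
    for part, _ in deps_table.items():
        missing = missing_deps(part, deps_table)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"Part {part}: {status}")


if __name__ == "__main__":
    report()
```

find_spec locates a module without importing it, so the check stays fast even when heavy packages such as torch are installed.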

See .devcontainer/ for a one-click GitHub Codespaces setup.

Structure

notebooks/
  part0/   Before MT Was MT (1948–1949)
  part1/   Direct Transfer / RBMT (1954–1970)
  part2/   Example-Based MT (1984–1995)
  part3/   Statistical MT (1990–2014)
  part4/   Neural MT (2013–2017)
  part5/   Transformer Era (2017–)
  part6/   Multilingual & Low-Resource MT (2019–)
  part7/   Evaluation

CI

GitHub Actions runs every notebook on every push via the nbmake pytest plugin. Neural chapters set TRAIN_FROM_SCRATCH = False and load pre-trained checkpoints from checkpoints/, so CI can execute them on CPU-only runners without retraining the models.
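The TRAIN_FROM_SCRATCH switch can be pictured as the minimal sketch below. It uses pickle and a hypothetical get_model(name, train_fn) helper so that it runs with the standard library alone. A PyTorch chapter would more likely save and load weights with torch.save and torch.load. The environment-variable override is also an assumption; the notebooks may simply hard-code the flag.

```python
import os
import pickle
from pathlib import Path

# Assumption: an environment variable may override the flag for local runs.
TRAIN_FROM_SCRATCH = os.environ.get("TRAIN_FROM_SCRATCH", "0") == "1"
CHECKPOINT_DIR = Path("checkpoints")


def get_model(name, train_fn, checkpoint_dir=CHECKPOINT_DIR, train_from_scratch=TRAIN_FROM_SCRATCH):
    """Train and save a model, or load a previously saved checkpoint.

    `name`, `train_fn`, and the pickle format are hypothetical stand-ins.
    """
    path = Path(checkpoint_dir) / f"{name}.pkl"
    if train_from_scratch:
        model = train_fn()  # slow path: only practical with a GPU
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as f:
            pickle.dump(model, f)
        return model
    if not path.exists():
        # Fail loudly instead of silently training on a CPU-only CI runner.
        raise FileNotFoundError(f"{path} not found; set TRAIN_FROM_SCRATCH = True to create it")
    with path.open("rb") as f:
        return pickle.load(f)  # fast path: the one CI exercises
```

Raising on a missing checkpoint, rather than falling back to training, keeps a forgotten checkpoint from turning a quick CI run into an hours-long CPU training job.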