An interactive course tracing the full history of machine translation, from Shannon (1948) and Weaver (1949) through LLM-based MT (2023+). Every model is implemented from scratch in Python. Every chapter runs in a free Colab session.
Live site: https://eduardosanchezg.github.io/mthistory
# Download the corpus (optional — chapters fall back to a built-in sample)
python data/download_multi30k.py
# Parts 0–2: no dependencies beyond Jupyter
python -c "import jupyterlab" # checks that JupyterLab is installed; any Jupyter front end can open the notebooks
jupyter lab notebooks/part0/00_shannon.ipynb
# Parts 3+: install per-part deps with uv
uv sync --group part3
uv run jupyter lab notebooks/part3/31_ibm_model1.ipynb
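The corpus fallback mentioned in the first step might look roughly like the sketch below. It is an illustration only: the function name `load_parallel_corpus`, the `BUILTIN_SAMPLE` pairs, and the file paths are all assumptions, not the course's actual code or data layout.

```python
from pathlib import Path

# Hypothetical built-in sample: a few German-English pairs for illustration only.
BUILTIN_SAMPLE = [
    ("ein Hund läuft", "a dog runs"),
    ("eine Frau liest", "a woman reads"),
]


def load_parallel_corpus(src_path, tgt_path):
    """Return (source, target) sentence pairs, or the sample if a file is missing."""
    src_path, tgt_path = Path(src_path), Path(tgt_path)
    if not (src_path.is_file() and tgt_path.is_file()):
        # Corpus not downloaded: fall back to the small in-memory sample.
        return list(BUILTIN_SAMPLE)
    src = src_path.read_text(encoding="utf-8").splitlines()
    tgt = tgt_path.read_text(encoding="utf-8").splitlines()
    if len(src) != len(tgt):
        raise ValueError("source and target files have different line counts")
    return list(zip(src, tgt))


# Assumed paths; adjust to wherever the download script writes its files.
pairs = load_parallel_corpus("data/multi30k/train.de", "data/multi30k/train.en")
print(f"{len(pairs)} sentence pairs loaded")
```

Returning a copy of the sample keeps callers from mutating the shared list by accident.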
Every notebook has an “Open in Colab” badge and a `!pip install` header cell
with pinned versions, so you can run any chapter in one click.
| Parts | Dependencies | Runs on |
|---|---|---|
| 0–2 | stdlib only | Any Python 3.8+ |
| 3 | numpy, scipy | Any laptop |
| 4 | torch (CPU) | Any laptop |
| 5 | torch | Codespaces / Colab GPU |
| 6 | transformers, sentencepiece | Colab / Codespaces |
See `.devcontainer/` for a one-click GitHub Codespaces setup.
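To see which parts a local environment can already run, the dependency table can be checked with a short script. This is a minimal sketch: the `PART_DEPS` mapping simply restates the table, and the helper name `missing_deps` is an assumption rather than anything shipped with the course.

```python
from importlib.util import find_spec

# Third-party packages per part, restated from the dependency table
# (parts 0-2 need only the standard library).
PART_DEPS = {
    "0-2": [],
    "3": ["numpy", "scipy"],
    "4": ["torch"],
    "5": ["torch"],
    "6": ["transformers", "sentencepiece"],
}


def missing_deps(part):
    """Return the packages for `part` that cannot be imported here."""
    return [name for name in PART_DEPS[part] if find_spec(name) is None]


for part in PART_DEPS:
    missing = missing_deps(part)
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"Parts {part}: {status}")
```

`find_spec` only looks a package up and does not import it, so the check stays fast even when `torch` is installed. It says nothing about hardware: part 5 may still need a GPU runtime.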
notebooks/
    part0/   Before MT Was MT (1948–1949)
    part1/   Direct Transfer / RBMT (1954–1970)
    part2/   Example-Based MT (1984–1995)
    part3/   Statistical MT (1990–2014)
    part4/   Neural MT (2013–2017)
    part5/   Transformer Era (2017–)
    part6/   Multilingual & Low-Resource MT (2019–)
    part7/   Evaluation
GitHub Actions runs every notebook on every push via nbmake.
Neural chapters set `TRAIN_FROM_SCRATCH = False` and load pre-trained
checkpoints from `checkpoints/` so CI passes on CPU.
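The load-or-train switch can be pictured with the sketch below. It is not the notebooks' actual code: a JSON file stands in for a real PyTorch checkpoint so the example runs with the standard library alone, `train_model` and `get_model` are hypothetical names, and training when no checkpoint is found is an assumed behaviour.

```python
import json
import tempfile
from pathlib import Path

TRAIN_FROM_SCRATCH = False  # flag name taken from the text; set to True to retrain


def train_model():
    """Hypothetical stand-in for a slow training loop."""
    return {"weights": [0.1, 0.2, 0.3]}


def get_model(checkpoint_path, train_from_scratch=TRAIN_FROM_SCRATCH):
    """Load a saved model unless retraining is requested or no checkpoint exists."""
    path = Path(checkpoint_path)
    if not train_from_scratch and path.is_file():
        return json.loads(path.read_text(encoding="utf-8"))
    # Assumed behaviour: train and save when there is nothing to load.
    model = train_model()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(model), encoding="utf-8")
    return model


# Demo in a temporary directory so nothing in the repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoints" / "toy_model.json"
    first = get_model(ckpt)   # no checkpoint yet, so this trains and saves
    second = get_model(ckpt)  # checkpoint exists, so this only loads
    print(first == second)
```

Keeping the flag as a single module-level constant means a reader can flip one line to retrain, while CI keeps the fast path by default.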