Machine Translation, 1949 → Today

An executable history · every model from scratch

Seventy-five years of machine translation, rebuilt from scratch.

From Claude Shannon's 1948 entropy equations to LLM-based translation — every system in this course is implemented in Python, runnable in your browser, and tested on the same 20 sentences so you can watch quality evolve across eras.

Start with Shannon (1948) → · 🕰️ Open the Time Machine

28 executable chapters

75+ years of history

20 shared test sentences

0 dependencies, Parts 0–2

The arc, at a glance

1948 · Shannon · Information theory: language as a statistical source
1949 · Weaver memo · "Translation as decryption", the founding metaphor
1954 · Georgetown–IBM · First public demo: 6 rules, 250 words, RU→EN
1960 · Bar-Hillel · "The box was in the pen", the impossibility argument
1968 · SYSTRAN · Rule-based transfer goes industrial
1984 · EBMT · Nagao: translate by analogy to stored examples
1990 · IBM models · The statistical revolution: P(e|f) ∝ P(f|e)·P(e)
2003 · Phrase-based · Moses-era SMT: phrase tables + beam decoding
2014 · Seq2seq · Sutskever et al.: neural nets enter MT for real
2015 · Attention · Bahdanau: the fixed-vector bottleneck falls
2017 · Transformer · "Attention is all you need"
2021 · M2M / NLLB · One model, 200 languages
2023 · LLM MT · GPT-4, ALMA, Tower: translation as emergent skill

How this course works

🔧

Everything from scratch

No black boxes. IBM Model 1's EM loop, Bahdanau attention, the Transformer, BLEU — all implemented in plain Python you can read in one sitting.

⚡

Runs anywhere

Parts 0–2 need only the Python standard library. Every neural model trains in under 30 minutes on a free Colab T4 — or loads a pre-trained checkpoint instantly.

🕰️

One test set, every era

Each chapter ends by translating the same 20 held-out sentences. Seventy-five years of progress becomes something you can see, line by line.

The course

Eight parts, in historical order. Each chapter is an executable notebook — open it here, in Colab, or in a Codespace.

1948 – 1960

Part 0 · Before MT Was MT

The intellectual prehistory: information theory, the Weaver memorandum, and the first impossibility argument.

Shannon: entropy of language
Weaver: the cryptanalysis window
Bar-Hillel: "the box was in the pen"
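For a taste of the first chapter, here is a minimal sketch of Shannon entropy computed over character frequencies. It is an illustration under stated assumptions, not the course's own code: the function name `unigram_entropy` is hypothetical, and treating characters as independent ignores the context between them.

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Entropy in bits per character of the character frequencies in `text`.

    Assumption: characters are treated as independent draws (a unigram model).
    """
    counts = Counter(text)
    total = sum(counts.values())
    # H = -sum_x p(x) * log2 p(x), with p(x) estimated by relative frequency
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two equally frequent symbols carry exactly one bit each.
print(unigram_entropy("abab"))
```

Running it on a longer English passage gives a per-character figure well below the log2 of the alphabet size, which is the redundancy that statistical approaches to language exploit.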

1954 – 1970

Part 1 · Rule-Based MT

Dictionaries plus hand-written rules: the Georgetown demo, the Soviet mirror program, SYSTRAN, and interlingua.

The Georgetown–IBM 1954 demo
The Soviet mirror program
Transfer-based MT / SYSTRAN
Interlingua: a toy UNL round-trip

1984 – 1995

Part 2 · Example-Based MT

Nagao's insight: don't translate from rules — translate by analogy to sentences you've seen before.

EBMT: retrieval & recombination
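The retrieval half of "translate by analogy" can be sketched in a few lines. This is a hedged illustration, not Nagao's method or the course's implementation: the example base `EXAMPLES`, the function `nearest_example`, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions, and the recombination step is left out.

```python
from difflib import SequenceMatcher

# Hypothetical toy example base of (source, target) pairs.
EXAMPLES = [
    ("he buys a book", "il achète un livre"),
    ("she reads a newspaper", "elle lit un journal"),
    ("he buys a car", "il achète une voiture"),
]

def nearest_example(source, examples=EXAMPLES):
    """Return the stored pair whose source side is most similar to `source`.

    Similarity is SequenceMatcher's ratio over word tokens (an assumed,
    illustrative measure). Ties go to the earliest stored example.
    """
    src = source.split()

    def score(pair):
        return SequenceMatcher(None, src, pair[0].split()).ratio()

    best = max(examples, key=score)
    return best, score(best)

pair, similarity = nearest_example("he buys a newspaper")
print(pair, similarity)
```

A full EBMT system would go on to replace the mismatched fragment ("book") with the translation of "newspaper" found in another stored example.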

1990 – 2014

Part 3 · Statistical MT

The IBM revolution: translation as probabilistic inference. EM, alignments, phrase tables, n-gram LMs.

The noisy channel
IBM Model 1: EM from scratch
Models 2–5 & HMM alignment
Phrase-based SMT
n-gram LMs & Kneser–Ney
Hiero: hierarchical phrases
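As a rough preview of what "EM from scratch" means here, the sketch below estimates IBM Model 1 lexical translation probabilities t(f|e) on a three-sentence toy corpus. It is a simplified illustration, not the course's code: the names `train_ibm1` and `PAIRS` are hypothetical, the corpus is invented for the example, and the NULL source word of the full model is omitted for brevity.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (foreign_tokens, english_tokens).
    Simplification (assumption): no NULL word on the English side.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        # E-step: distribute each foreign word over the English words.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalise the expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy corpus (German -> English).
PAIRS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

t = train_ibm1(PAIRS, iterations=20)
print(t[("das", "the")], t[("haus", "the")])
```

No alignments are given, yet the co-occurrence pattern alone pushes probability mass toward "das" as the translation of "the".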

2013 – 2017

Part 4 · Neural MT

Sequence-to-sequence learning, the fixed-vector bottleneck, and the attention mechanism that broke it.

The neural shift
RNN encoder–decoder
Bahdanau attention
ConvS2S

2017 –

Part 5 · The Transformer Era

Self-attention end to end: the architecture that ate NLP, plus the subword revolution and the scaling playbook.

The Transformer from scratch
BPE & SentencePiece
WMT systems & back-translation

2019 –

Part 6 · Multilingual & LLM MT

One model, many languages: zero-shot transfer, M2M-100, NLLB-200, and translation as an emergent LLM skill.

The language-token trick
Low-resource MT
M2M-100 & NLLB
LLM-based MT

1950 – today

Part 7 · Evaluation

How do we even know a translation is good? BLEU, chrF, TER and COMET implemented from scratch, plus human protocols.

BLEU → COMET
Human evaluation & MQM
Benchmarks & the frontier
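To show what "from scratch" can look like for a metric, here is a minimal sentence-level BLEU sketch: clipped n-gram precisions up to 4-grams, a geometric mean, and a brevity penalty. It is an illustration under simplifying assumptions rather than the course's implementation: a single reference, whitespace tokenisation, no smoothing, and hypothetical function names `ngrams` and `bleu`.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (assumed setup)."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Counter intersection keeps the minimum count: clipped matches.
        overlap = sum((hyp_counts & ref_counts).values())
        total = sum(hyp_counts.values())
        if overlap == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))
print(bleu("the cat sat on", "the cat sat on the mat"))
```

BLEU is normally reported at corpus level, often with smoothing or several references, so scores from this sentence-level version are not comparable to published numbers.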