The Chi-squared Test

A single number that says how close a piece of text is to English. Lower is more English-like. It's the engine behind every letter-frequency attack on this site.

Caesar shift (try it)

Drag the slider to Caesar-encrypt the text live. If the text is English, chi² spikes for nearly every non-zero shift, which is why the crackers work.

Observed vs expected distribution

The grey bars are the expected English frequencies; the green bars are what we actually see in your text. The bigger the mismatch, the bigger the chi² score.

Per-letter contributions

Sorted by how much each letter adds to the total. The biggest offenders — letters that are wildly over- or under-represented — rise to the top.

How it works

Pearson's chi-squared statistic compares observed counts against what you'd expect under some hypothesis: here, "this text is English". Given N letters in total, for each letter of the alphabet we take its observed count O and compute its expected count E = p · N, where p is that letter's standard English frequency (the letter E ≈ 12.7 %, T ≈ 9.1 %, … Z ≈ 0.07 %). The score is then the sum, over all 26 letters, of the squared differences between O and E, each divided by E:

χ² = Σ (O − E)² / E
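The formula can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the frequency table holds commonly quoted reference values (an assumption; other tables differ slightly), and the names `ENGLISH_FREQ_PERCENT`, `chi_squared_contributions` and `chi_squared` are made up for this example. Only the letters a-z are counted; everything else is ignored.

```python
from collections import Counter

# Commonly quoted English letter frequencies, in percent (assumed reference table).
ENGLISH_FREQ_PERCENT = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def chi_squared_contributions(text):
    """Return {letter: (O - E)^2 / E} for each of the 26 letters a-z."""
    letters = [c for c in text.lower() if c in ENGLISH_FREQ_PERCENT]
    n = len(letters)  # N: total number of letters counted
    if n == 0:
        raise ValueError("text contains no letters a-z")
    observed = Counter(letters)  # O for each letter (0 if absent)
    contributions = {}
    for letter, percent in ENGLISH_FREQ_PERCENT.items():
        expected = percent / 100 * n  # E = p * N
        contributions[letter] = (observed[letter] - expected) ** 2 / expected
    return contributions


def chi_squared(text):
    """Sum the per-letter contributions into a single score."""
    return sum(chi_squared_contributions(text).values())


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times"
    contribs = chi_squared_contributions(sample)
    # Biggest offenders first, as in a per-letter breakdown.
    for letter, value in sorted(contribs.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(f"{letter}: {value:.2f}")
    print(f"total: {chi_squared(sample):.2f}")
```

Keeping the per-letter terms separate makes it easy to see which letters drive the total before summing them.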

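Letters far from their expected counts dominate the sum thanks to the squared term, and because each term is divided by E, a surplus of a rare letter such as Q or Z costs far more than the same surplus of E or T. The result stays small for anything that looks like English and climbs rapidly as the distribution drifts. That's exactly why it cracks Caesar and Vigenère so cleanly: among the 26 candidate shifts for Caesar (or 26 per key position, L×26 in total, for a Vigenère key of length L), the correct key is almost always the one with the lowest score.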
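As a rough sketch of the Caesar case, the search can be written as "decrypt with each of the 26 keys, score each candidate, keep the lowest". The code below is an illustration under assumptions, not this site's cracker: the frequency table holds commonly quoted reference values, and `caesar_shift` and `crack_caesar` are hypothetical helper names.

```python
from collections import Counter

# Commonly quoted English letter frequencies, in percent (assumed reference table).
ENGLISH_FREQ_PERCENT = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def chi_squared(text):
    """Chi-squared score of the letters a-z in `text` against English."""
    letters = [c for c in text.lower() if c in ENGLISH_FREQ_PERCENT]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters a-z")
    observed = Counter(letters)
    return sum(
        (observed[letter] - percent / 100 * n) ** 2 / (percent / 100 * n)
        for letter, percent in ENGLISH_FREQ_PERCENT.items()
    )


def caesar_shift(text, shift):
    """Shift a-z and A-Z by `shift` places; leave other characters alone."""
    out = []
    for c in text:
        if "a" <= c <= "z":
            out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= c <= "Z":
            out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)


def crack_caesar(ciphertext):
    """Try all 26 keys and return (key, plaintext) with the lowest score."""
    candidates = []
    for key in range(26):
        candidate = caesar_shift(ciphertext, -key)  # undo a shift of `key`
        candidates.append((chi_squared(candidate), key, candidate))
    _, best_key, best_plaintext = min(candidates)
    return best_key, best_plaintext


if __name__ == "__main__":
    secret = caesar_shift("It was the best of times, it was the worst of times", 3)
    print(crack_caesar(secret))
```

For Vigenère with a known key length L, the same 26-way search would be applied to each of the L interleaved columns of the ciphertext independently, which is where the L×26 figure comes from. Very short texts give the statistic little to work with, so the lowest score is less reliable there.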

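Rule of thumb: a few hundred letters of English score well under 30, while random letters or a wrong-key decryption spike into the hundreds or higher. Because a mismatch grows with text length, compare candidate decryptions of the same text against each other rather than against a fixed cutoff.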