The Chi-squared Test
A single number that measures how closely a text's letter frequencies match English. Lower is more English-like. It's the engine behind every letter-frequency attack on this site.
Slide to Caesar-encrypt the text live. Watch chi² spike for every non-zero shift — that's why the crackers work.
Chi² score
Observed vs expected distribution
The grey bars are the expected English frequencies; the green bars are what we actually see in your text. The bigger the mismatch, the bigger the chi² score.
Per-letter contributions
Sorted by how much each letter adds to the total. The biggest offenders — letters that are wildly over- or under-represented — rise to the top.
How it works
Pearson's chi-squared statistic compares observed counts
against what you'd expect under some hypothesis — here, "this text
is English". Given N letters in total, for each letter
of the alphabet we compute the expected count
E = p · N, where p is that letter's
standard English frequency (the letter E ≈ 12.7 %, T ≈ 9.1 %, … Z ≈ 0.07 %).
The score is then the sum, over all 26 letters, of the squared
differences between the observed count O and the expected count E,
each normalised by E:
χ² = Σ (O − E)² / E
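The formula can be sketched in a few lines of Python. This is a minimal illustration, not the site's own implementation: the names `ENGLISH_FREQ`, `chi_squared_contributions` and `chi_squared` are made up for this example, and the frequency table is a commonly cited set of English letter percentages that is assumed (not confirmed) to match the one used here.

```python
from collections import Counter

# Assumed reference table: commonly cited English letter frequencies, in percent.
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def chi_squared_contributions(text):
    """Return each letter's (O - E)^2 / E term for the given text."""
    # Only the 26 letters a-z count; case, digits and punctuation are ignored.
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    observed = Counter(letters)  # missing letters count as 0
    contributions = {}
    for letter, percent in ENGLISH_FREQ.items():
        expected = percent / 100 * n  # E = p * N
        contributions[letter] = (observed[letter] - expected) ** 2 / expected
    return contributions


def chi_squared(text):
    """Sum the per-letter terms into the single chi-squared score."""
    return sum(chi_squared_contributions(text).values())
```

Sorting the dictionary returned by `chi_squared_contributions` by value, largest first, gives the same kind of ranking as the per-letter contributions view.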
Letters far from their expected counts dominate the sum thanks to the
squared term. The result stays small for anything that looks like
English and climbs rapidly as the distribution drifts. That's exactly
why it cracks Caesar and Vigenère so cleanly: among the 26 candidate
shifts (or, for a Vigenère key of length L, the 26 candidates for each
of the L key positions, L×26 in all), the correct key is almost
always the one with the lowest score.
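As a rough sketch of the Caesar case, the snippet below tries all 26 shifts and keeps the one whose decryption scores lowest. It is an illustration under stated assumptions rather than the site's cracker: `caesar_shift` and `crack_caesar` are hypothetical helper names, and `ENGLISH_FREQ` is a commonly cited table of English letter percentages assumed to be close to the one used here.

```python
from collections import Counter

# Assumed reference table: commonly cited English letter frequencies, in percent.
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def chi_squared(text):
    """Chi-squared score of the text's letter counts against ENGLISH_FREQ."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    observed = Counter(letters)
    return sum(
        (observed[letter] - percent / 100 * n) ** 2 / (percent / 100 * n)
        for letter, percent in ENGLISH_FREQ.items()
    )


def caesar_shift(text, shift):
    """Shift every ASCII letter by `shift` positions, preserving case."""
    out = []
    for c in text:
        if "a" <= c <= "z":
            out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= c <= "Z":
            out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(c)  # spaces, digits and punctuation pass through
    return "".join(out)


def crack_caesar(ciphertext):
    """Return (shift, plaintext) for the candidate with the lowest score."""
    scores = {
        shift: chi_squared(caesar_shift(ciphertext, -shift))
        for shift in range(26)
    }
    best = min(scores, key=scores.get)
    return best, caesar_shift(ciphertext, -best)


if __name__ == "__main__":
    secret = caesar_shift(
        "It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness", 7)
    print(crack_caesar(secret))
```

For Vigenère with a known or guessed key length L, the same search could be run independently on each of the L columns of the ciphertext (every L-th letter), which is where the L×26 candidates come from.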
Rule of thumb: a few hundred letters of English usually score well under 30, while random letters or a wrong-key decryption of the same length spike into the hundreds.