Thesis Defense

POT 1645 and Zoom
Speaker(s) / Presenter(s):
John Winstead

Title: "A Computational Investigation of English Spelling"

Zoom link: 


This thesis examines the predictability and regularity of English orthography using computational methods, primarily n-gram models that predict missing letters in words. It evaluates how dataset size, word length, letter position, and vowel presence affect predictive accuracy, drawing on diverse datasets and working with distinct word types to minimize frequency biases.
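
To make the missing-letter task concrete, here is a minimal sketch of one way it could be set up. It is not the thesis's implementation: the function names `train_gap_model` and `predict_missing`, the one-letter context on each side of the gap, the `_` gap marker, and the toy word list are all assumptions made for illustration. The model counts which letter appears between each pair of neighboring letters in a set of distinct word types, then fills a gap with the most frequent letter for that context.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers so first and last letters also have two neighbors.
PAD_START, PAD_END = "^", "$"


def train_gap_model(word_types):
    """Count, for each (left, right) neighbor pair, which letters occur between them.

    Training on distinct word types (a set) means a very frequent word
    contributes no more than a rare one.
    """
    context_counts = defaultdict(Counter)
    letter_counts = Counter()  # overall letter frequencies, used as a fallback
    for word in set(word_types):
        padded = PAD_START + word + PAD_END
        for i in range(1, len(padded) - 1):
            context = (padded[i - 1], padded[i + 1])
            context_counts[context][padded[i]] += 1
            letter_counts[padded[i]] += 1
    return context_counts, letter_counts


def predict_missing(masked, model, gap="_"):
    """Return the most likely letter for the single gap in `masked`."""
    context_counts, letter_counts = model
    padded = PAD_START + masked + PAD_END
    i = padded.index(gap)
    context = (padded[i - 1], padded[i + 1])
    # Fall back to overall letter frequency when the context was never seen.
    candidates = context_counts.get(context) or letter_counts
    return candidates.most_common(1)[0][0]


# Toy word list, assumed for illustration only.
model = train_gap_model(["that", "chat", "what", "shut", "this", "then"])
print(predict_missing("th_t", model))
```

Accuracy by letter position or word length could then be estimated by masking each position of held-out words in turn and comparing the prediction with the true letter.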

The findings show that orthographic predictability is consistent across genres and corpus sizes, and that n-gram models perform well even with limited training data. Word length, letter position, and vowel presence all significantly influence accuracy; the second and penultimate letters are predicted most accurately. The study demonstrates that English spelling follows systematic patterns that computational models can capture.