I am a computational linguist in the Linguistics Department at the University of Canterbury in Christchurch, New Zealand. I work across both linguistic theory and natural language processing. My research models how grammatical structure emerges within individuals and how variants spread across both dialects and registers.

Before joining the University of Canterbury, I held positions in computer science at the Illinois Institute of Technology and received a PhD in linguistics from Purdue University under Victor Raskin. I have published over 30 papers in computational linguistics, and my first book, Natural Language Processing for Corpus Linguistics, is now available from Cambridge University Press.

On a practical level, my work provides solutions to difficult problems: Language Identification, Dialect Identification, Construction Grammar, Language Mapping, and Corpus Similarity.

Right now I am working on global-scale computational dialectology as the combination of grammar induction and geospatial text classification. The goal is to model regional syntactic variation so accurately that dialect models can predict an individual’s region-of-origin. This work depends on large geo-referenced corpora that reflect the demographics of underlying populations. For example, here’s a recent paper showing that the COVID-19 pandemic allows us to remove non-local populations from digital corpora.

If you’re interested in learning more about computational linguistics, check out my recent book or my two courses on edX, “Text Analytics 1: Introducing Natural Language Processing” and “Text Analytics 2: Visualizing Natural Language Processing.” Taken together, these free courses provide a basic introduction to natural language processing. You can also use my introductory Python package on its own: Text Analytics.