Home

I am a computational linguist who uses data science to model both the emergence of grammatical structure and variation in that structure, drawing on large multilingual corpora. My recent work has also focused on the impact of linguistic variation on NLP models and on low-resource contexts. I have published over 30 papers, and my first book was recently published by Cambridge University Press. My interdisciplinary teaching experience includes a MOOC that has now taught over 12,000 students about NLP.

My research models two related phenomena:

(A) the emergence of grammatical structure within individuals, with a focus on the degree to which structure can be learned from usage alone

(B) variation in grammatical structures across populations and across registers, with a focus on how grammars change as complex adaptive systems

The basic question is how language learning and language change interact at scale when we observe both an entire grammar and a global community of speaker-hearers. Computational models applied to large corpora provide a method for solving this difficult problem.

If you’re interested in learning more about computational linguistics, check out my recent book or my two courses on edX: “Text Analytics 1: Introducing Natural Language Processing” and “Text Analytics 2: Visualizing Natural Language Processing.” Taken together, these free courses provide a basic introduction to natural language processing. You can also use my introductory Python package, Text Analytics, on its own.