Research

I am a computational linguist working on models of language learning and language change using large multi-lingual corpora. My research models both (A) the emergence of grammatical structure within individuals and (B) the diffusion of those structures across global populations. My basic research question is how language learning and language change interact at scale across an entire grammar. The challenge is to model how speakers converge on similar grammars while also capturing how grammars diverge into distinct dialects and registers.

Computational Models of Language Learning

My research investigates usage-based grammar through computational models applied to large multi-lingual corpora. The basic idea behind these computational experiments is to understand the relationship between exposure to language and the emergence of grammatical structure. For example, a recent paper has shown that increased exposure, simulated using increased training data, produces grammars that are more similar across different contexts of production (Dunn & Tayyar Madabushi, 2021). This is important because it means that the idiosyncratic properties of individual corpora become less important as the amount of exposure increases. Another recent paper has shown that grammars converge more quickly given exposure to a single individual rather than to an aggregation of individuals (Dunn & Nini, 2021). This is important because it tests the implicit assumption behind computational models that rely on aggregated corpora to represent the behavior of individuals.
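The convergence claim above can be made concrete with a toy sketch. This is purely illustrative and not the method of the cited papers: it treats a learned grammar as a set of construction labels (the labels below are invented) and measures how much two grammars learned from different registers overlap, using Jaccard similarity. The expectation is that grammars learned with more exposure overlap more.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of constructions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical construction inventories learned from two registers
# at low vs. high exposure (all labels invented for illustration).
low_web   = {"NP-V-NP", "P-NP"}
low_news  = {"NP-V", "NP-V-NP-NP"}
high_web  = {"NP-V-NP", "P-NP", "NP-V", "NP-V-NP-NP", "NP-AUX-V"}
high_news = {"NP-V-NP", "P-NP", "NP-V", "NP-V-NP-NP", "DET-N"}

print(jaccard(low_web, low_news))    # low exposure: registers diverge
print(jaccard(high_web, high_news))  # high exposure: registers converge
```

Under this toy setup, the high-exposure grammars share most of their constructions while the low-exposure grammars share none, mirroring the pattern the experiments describe.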

Computational Models of Language Change

My research also models variation and change in grammars across both geographic dialects and contexts of production. The basic idea behind these computational experiments is to understand how different sets of exposure lead to change across an entire grammar. For example, a recent paper has shown that learned grammars are able to capture spatial variation in language to such a degree that it is possible to predict the dialect membership of held-out samples (Dunn, 2019). This is important because it means that a model of language learning is able to represent the feature space within which variation and change take place.
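The idea of predicting dialect membership from a learned feature space can be sketched as follows. This is a minimal toy classifier, not the model from the cited paper: each sample is represented as a vector of construction frequencies (the feature names and values are invented), each dialect is summarized by the centroid of its training samples, and a held-out sample is assigned to the dialect whose centroid it is most similar to under cosine similarity.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroids(samples):
    """Average the feature vectors of the training samples per dialect."""
    sums, counts = defaultdict(lambda: defaultdict(float)), defaultdict(int)
    for label, feats in samples:
        counts[label] += 1
        for k, v in feats.items():
            sums[label][k] += v
    return {label: {k: v / counts[label] for k, v in feats.items()}
            for label, feats in sums.items()}

def predict(cents, feats):
    """Assign a held-out sample to the nearest dialect centroid."""
    return max(cents, key=lambda label: cosine(cents[label], feats))

# Toy construction-frequency vectors (entirely invented numbers).
train = [
    ("NZ", {"double-modal": 0.1, "NP-V-NP": 0.8}),
    ("NZ", {"double-modal": 0.2, "NP-V-NP": 0.7}),
    ("US", {"double-modal": 0.7, "NP-V-NP": 0.2}),
    ("US", {"double-modal": 0.8, "NP-V-NP": 0.1}),
]
cents = centroids(train)
print(predict(cents, {"double-modal": 0.75, "NP-V-NP": 0.15}))
```

Real experiments of this kind would use thousands of grammatical features and many regional sub-corpora, but the pipeline — represent samples in a learned feature space, then classify held-out samples by region — has this general shape.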

Digital Language Mapping

My larger goal is to model how grammatical structure emerges and how language change is connected to this process of emergence. In order to carry out these experiments, we need large multi-lingual geo-referenced corpora. I have solved this practical problem by developing state-of-the-art language identification models and the 400-billion-word Corpus of Global Language Use, covering over 400 languages from 100 countries (Dunn, 2020). This data is visualized through my earthLings.io project, which last year had 3,400 users. My recent work has shown how geo-referenced data can be used to train models without the population bias that over-represents speakers from Western countries (Dunn & Adams, 2020). Other recent work has shown that we can observe changes in linguistic diversity caused by pandemic-related travel restrictions using these large corpora (Dunn et al., 2020). This kind of practical work is important because it enables us to undertake computational experiments on a population-level scale.
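A common approach to large-scale language identification, and the basis of many production systems, is character n-gram profiling. The sketch below is a toy version of that general technique, not the actual model behind the corpus described above: it builds a trigram profile per language from a tiny training sample (the sample sentences are illustrative) and scores new text by weighted trigram overlap.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts, with padding so word edges are captured."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train_lid(corpus):
    """Build one character-trigram profile per language."""
    return {lang: char_ngrams(" ".join(texts)) for lang, texts in corpus.items()}

def identify(profiles, text):
    """Label a text with the language whose profile it overlaps most."""
    grams = char_ngrams(text)
    def score(profile):
        total = sum(profile.values())
        return sum(count * profile[g] / total
                   for g, count in grams.items() if g in profile)
    return max(profiles, key=lambda lang: score(profiles[lang]))

# Tiny illustrative training corpus (English and Maori).
corpus = {
    "eng": ["the cat sat on the mat", "where is the dog"],
    "mri": ["kei hea te kuri", "tena koe e hoa"],
}
profiles = train_lid(corpus)
print(identify(profiles, "the dog sat"))
```

Scaling this idea to hundreds of languages and web-scale text requires much larger profiles, careful domain handling, and discriminative training, but the core signal — characteristic character sequences per language — is the same.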

Computational Models of Metaphor

From the perspective of cognitive linguistics, the emergence of grammatical structures is closely related to the meaning and function of those structures. In particular, metaphor is seen as a central process driving the fluidity of meaning in natural languages. From a practical perspective, metaphor and other types of non-literal meaning pose a significant challenge to computational linguistics, a problem that I have worked on extensively in the past. For example, my work has shown how we might model degrees of metaphoricity rather than creating a binary classification between literal and metaphoric utterances (Dunn, 2014). In other work on metaphor I showed that the linguistic utterances commonly identified as metaphoric actually belong to three distinct sub-classes (Dunn, 2015). This work is not my most recent, but it does illustrate the range of my computational approach to language. I intend for my future work on language learning to include the emergence of both syntactic and semantic structure, eventually including a semantic specification for learned grammars.
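The contrast between binary and graded metaphoricity can be illustrated with a deliberately simple sketch. This is not the model from the cited work: it uses a handful of invented concreteness ratings (real systems would draw on psycholinguistic concreteness norms) and scores a verb-object pairing as more metaphoric the more abstract the object is relative to a concrete verb, yielding a continuous score rather than a literal/metaphoric label.

```python
# Invented concreteness ratings on [0, 1]; real work would use
# psycholinguistic concreteness norms rather than these toy values.
CONCRETENESS = {"grasp": 0.9, "rope": 0.9, "idea": 0.1,
                "attack": 0.8, "castle": 0.9, "argument": 0.2}

def metaphoricity(verb, obj):
    """Graded score: how far the object falls below the concreteness
    the verb suggests. 0.0 is fully literal; higher is more metaphoric."""
    gap = CONCRETENESS.get(verb, 0.5) - CONCRETENESS.get(obj, 0.5)
    return max(0.0, gap)

print(metaphoricity("grasp", "rope"))       # literal use
print(metaphoricity("attack", "argument"))  # intermediate
print(metaphoricity("grasp", "idea"))       # strongly metaphoric
```

The point of the sketch is only the output type: a scalar that orders "grasp the rope" below "attack the argument" below "grasp the idea", rather than forcing each utterance into one of two classes.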