LLMs Learn Constructions That Humans Do Not Know

Dunn, J. and Eida, M. (2025). “LLMs Learn Constructions That Humans Do Not Know.” In Proceedings of the Second International Workshop on Construction Grammars and NLP. Abstract. This paper investigates false positive constructions: grammatical structures which an LLM hallucinates as distinct constructions but which human introspection does not support. Both a behavioural probing task using …

Diffusion Across the Grammar: Complexity in Areal Interactions Between Dialects of English

Dunn, J. (2025). “Diffusion Across the Grammar: Complexity in Areal Interactions Between Dialects of English.” In Enrique-Arias, Andrés, Carlota de Benito Moreno and Florencio del Barrio de la Rosa (eds.). The spatial diffusion of linguistic changes: new methods and theoretical perspectives. Berlin: De Gruyter. Studies in Language Change 26. Abstract. This paper experiments with the …

Language Contact and Population Contact as Sources of Dialect Similarity

Dunn, J. and Wong, S. (2025). “Language Contact and Population Contact as Sources of Dialect Similarity.” Languages, 10(8), 188. https://doi.org/10.3390/languages10080188 Abstract. This paper creates a global similarity network between city-level dialects of English in order to determine whether external factors like the amount of population contact or language contact influence dialect similarity. While previous computational …

Pre-Trained Language Models Represent Some Geographic Populations Better Than Others

Dunn, J.; Adams, B.; and Tayyar Madabushi, H. (2024). “Pre-Trained Language Models Represent Some Geographic Populations Better Than Others.” In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC/COLING 2024). 12966–12976. Abstract. This paper measures the skew in how well two families of LLMs represent diverse geographic populations. A spatial …

Geographically-Informed Language Identification

Dunn, J. and Edwards-Brown, L. (2024). “Geographically-Informed Language Identification.” In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC/COLING 2024). 7672–7682. Abstract. This paper develops an approach to language identification in which the set of languages considered by the model depends on the geographic origin of the text in question. …

Validating and Exploring Large Geographic Corpora

Dunn, J. (2024). “Validating and Exploring Large Geographic Corpora.” In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC/COLING 2024). 17348–17358. Abstract. This paper investigates the impact of corpus creation decisions on large multi-lingual geographic web corpora. Beginning with a 427 billion word corpus derived from the Common Crawl, three …

Syntactic Variation Across the Grammar: Modelling a complex adaptive system

Dunn, J. (2023). “Syntactic variation across the grammar: modelling a complex adaptive system.” Frontiers in Complex Systems. DOI: 10.3389/fcpxs.2023.1273741 Abstract. While language is a complex adaptive system, most work on syntactic variation observes a few individual constructions in isolation from the rest of the grammar. This means that the grammar, a network which connects thousands …

Variation and Instability in Dialect-Based Embedding Spaces

Dunn, J. (2023). “Variation and Instability in Dialect-Based Embedding Spaces.” In Proceedings of the Workshop on NLP for Similar Languages, Varieties and Dialects. Association for Computational Linguistics. Abstract. This paper measures variation in embedding spaces which have been trained on different regional varieties of English while controlling for instability in the embeddings. While previous work …

Exploring the Constructicon

Dunn, J. (2023). “Exploring the Constructicon: Linguistic Analysis of a Computational CxG.” In Proceedings of the Workshop on CxGs and NLP @ the Georgetown University Round Table on Linguistics / SyntaxFest. Association for Computational Linguistics. Abstract. Recent work has formulated the task for computational construction grammar as producing a constructicon given a corpus of usage. …

Exposure and Emergence in Usage-Based Grammar

Dunn, J. (2022). “Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages.” Cognitive Linguistics. 33(4): 659–699. Abstract. This paper uses computational experiments to explore the role of exposure in the emergence of construction grammars. While usage-based grammars are hypothesized to depend on a learner’s exposure to actual language use, the mechanisms of such …