corpus-based variation – Jonathan Dunn

Pre-Trained Language Models Represent Some Geographic Populations Better Than Others

March 13, 2024

Dunn, J.; Adams, B.; and Tayyar Madabushi, H. (2024). “Pre-Trained Language Models Represent Some Geographic Populations Better Than Others.” In Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC/COLING 2024). Abstract. This paper measures the skew in how well two families of LLMs represent diverse geographic populations. A spatial probing …

Syntactic variation across the grammar: Modelling a complex adaptive system

September 29, 2023

Dunn, J. (2023). “Syntactic variation across the grammar: modelling a complex adaptive system.” Frontiers in Complex Systems. DOI: 10.3389/fcpxs.2023.1273741 Abstract. While language is a complex adaptive system, most work on syntactic variation observes a few individual constructions in isolation from the rest of the grammar. This means that the grammar, a network which connects thousands …

Variation and Instability in Dialect-Based Embedding Spaces

March 27, 2023

Dunn, J. (2023). “Variation and Instability in Dialect-Based Embedding Spaces.” In Proceedings of the Workshop on NLP for Similar Languages, Varieties and Dialects. Association for Computational Linguistics. Abstract. This paper measures variation in embedding spaces which have been trained on different regional varieties of English while controlling for instability in the embeddings. While previous work …

Exploring the Constructicon

January 30, 2023

Dunn, J. (2023). “Exploring the Constructicon: Linguistic Analysis of a Computational CxG.” In Proceedings of the Workshop on CxGs and NLP @ the Georgetown University Round Table on Linguistics / SyntaxFest. Association for Computational Linguistics. Abstract. Recent work has formulated the task for computational construction grammar as producing a constructicon given a corpus of usage. …

Register Variation Remains Stable Across 60 Languages

September 21, 2022

Li, H.; Dunn, J.; and Nini, A. (In Press). “Register Variation Remains Stable Across 60 Languages.” Corpus Linguistics and Linguistic Theory. Abstract. This paper measures the stability of cross-linguistic register variation. A register is a variety of a language that is associated with extra-linguistic context. The relationship between a register and its context is functional: …

Stability of Syntactic Dialect Classification Over Space and Time

September 12, 2022

Dunn, J. and Wong, S. (2022). “Stability of Syntactic Dialect Classification Over Space and Time.” In Proceedings of the International Conference on Computational Linguistics (COLING 2022). 26-36. Abstract. This paper analyses the degree to which dialect classifiers based on syntactic representations remain stable over space and time. While previous work has shown that the combination of …

Predicting Embedding Reliability in Low-Resource Settings

April 21, 2022

Dunn, J.; Li, H.; and Sastre, D. (2022). “Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures.” In Proceedings of the 13th International Conference on Language Resources and Evaluation. European Language Resources Association. 6461-6470. Abstract. This paper simulates a low-resource setting across 17 languages in order to evaluate embedding similarity, stability, and reliability under …

Representations of Language Varieties Are Reliable

March 2, 2021

Dunn, J. (2021). “Representations of Language Varieties Are Reliable Given Corpus Similarity Measures.” In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties, and Dialects. Association for Computational Linguistics. 28-38. Abstract. This paper measures similarity both within and between 84 language varieties across nine languages. These corpora are drawn from digital sources (the …
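
Corpus similarity measures of this general kind compare two corpora through the frequencies of shared features. The sketch below is only a rough illustration and is not necessarily the measure used in the paper. It assumes character trigram features and applies Spearman's rank correlation to the most frequent features across both corpora. The function names and the top_k cutoff are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def average_ranks(values):
    """Rank values (1 = most frequent); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def corpus_similarity(corpus_a, corpus_b, n=3, top_k=100):
    """Hypothetical frequency-based similarity: Spearman's rho over the
    top_k most frequent n-grams in the two corpora combined."""
    counts_a = char_ngrams(corpus_a, n)
    counts_b = char_ngrams(corpus_b, n)
    features = [f for f, _ in (counts_a + counts_b).most_common(top_k)]
    if len(features) < 2:
        return 0.0
    freq_a = [counts_a[f] for f in features]  # missing features count as 0
    freq_b = [counts_b[f] for f in features]
    # Spearman's rho is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(freq_a), average_ranks(freq_b))
```

Ranks do not change when counts are scaled, so this comparison needs no normalization for corpus size. Very unequal sizes still affect which features are observed at all.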

Global Syntactic Variation in Seven Languages

July 22, 2019

Dunn, J. (2019). “Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology.” In Frontiers in Artificial Intelligence: Language and Computation. Abstract. The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited …

Modeling Global Syntactic Variation in English

April 11, 2019

Dunn, J. (2019). “Modeling Global Syntactic Variation in English Using Dialect Classification.” In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (NAACL 19). Association for Computational Linguistics. 42-53. Abstract. This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper …