Cognitive Linguistics Meets Computational Linguistics

Dunn, J. (2022). “Cognitive Linguistics Meets Computational Linguistics: Construction Grammar, Dialectology, and Linguistic Diversity.” In Tay, D. & Xie Pan, M. (eds.), Data Analytics in Cognitive Linguistics: Methods and Insights. Berlin: De Gruyter. 273-308. Abstract. Computational linguistics and cognitive linguistics come together when we use data-driven methods to conduct linguistic experiments on corpora. This chapter …

Language Identification for Austronesian Languages

Dunn, J. & Nijhof, W. (2022). “Language Identification for Austronesian Languages.” In Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association. 6530-6539. Abstract. This paper provides language identification models for low- and under-resourced languages in the Pacific region, with a focus on previously unavailable Austronesian languages. Accurate …

Measuring Linguistic Diversity During COVID-19

Dunn, J., Coupe, T., & Adams, B. (2020). “Measuring Linguistic Diversity During COVID-19.” In Proceedings of the 4th Workshop on NLP and Computational Social Science. Association for Computational Linguistics. 1-10. Abstract. Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of …

Mapping Languages: The Corpus of Global Language Use

Dunn, J. (2020). “Mapping Languages: The Corpus of Global Language Use.” Language Resources and Evaluation. 54: 999-1018. Abstract. This paper describes a web-based corpus of global language use with a focus on how this corpus can be used for data-driven language mapping. First, the corpus provides a representation of where national varieties of major languages …

Geographically-Balanced Gigaword Corpora for 50 Language Varieties

Dunn, J. & Adams, B. (2020). “Geographically-Balanced Gigaword Corpora for 50 Language Varieties.” In Proceedings of the Language Resources and Evaluation Conference. European Language Resources Association. 2528-2536. Abstract. While text corpora have been steadily increasing in overall size, even very large corpora are not designed to represent global population demographics. For example, recent work has …

Mapping Languages and Demographics

Dunn, J. & Adams, B. (2019). “Mapping Languages and Demographics with Georeferenced Corpora.” In Proceedings of Geocomputation 2019. Abstract. This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets. The goal is to determine (i) which dataset best represents population demographics; (ii) in what parts …