Corpus similarity measures remain robust across diverse languages

Li, H. & Dunn, J. (2022). “Corpus similarity measures remain robust across diverse languages.” Lingua. Abstract. Corpus similarity remains a robust measurable property across 39 diverse languages. Measures are highly accurate on a register prediction task using comparable corpora. Accuracy is not dependent on specific registers, as shown by out-of-domain experiments. Corpus similarity measures also work for …

Cognitive Linguistics Meets Computational Linguistics

Dunn, J. (2022). “Cognitive Linguistics Meets Computational Linguistics: Construction Grammar, Dialectology, and Linguistic Diversity.” In Tay, D. & Xie Pan, M. (eds.), Data Analytics in Cognitive Linguistics: Methods and Insights. 273-308. Berlin: De Gruyter. Abstract. Computational linguistics and cognitive linguistics come together when we use data-driven methods to conduct linguistic experiments on corpora. This chapter …

Language Identification for Austronesian Languages

Dunn, J. & Nijhof, W. (2022). “Language Identification for Austronesian Languages.” In Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association. 6530-6539. Abstract. This paper provides language identification models for low- and under-resourced languages in the Pacific region with a focus on previously unavailable Austronesian languages. Accurate …

Predicting Embedding Reliability in Low-Resource Settings

Dunn, J.; Li, H.; & Sastre, D. (2022). “Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures.” In Proceedings of the 13th International Conference on Language Resources and Evaluation. European Language Resources Association. 6461-6470. Abstract. This paper simulates a low-resource setting across 17 languages in order to evaluate embedding similarity, stability, and reliability under …

Automatic Identification of Metaphoric Utterances

Dunn, J. (2013). Automatic Identification of Metaphoric Utterances. PhD Dissertation. Purdue University. Abstract. This dissertation analyzes the problem of metaphor identification in linguistic and computational semantics, considering both manual and automatic approaches. It describes a manual approach to metaphor identification, the Metaphoricity Measurement Procedure (MMP), and compares this approach with other manual approaches. The dissertation …

Towards a Computational Model of Metaphor

Dunn, J. (2010). Towards a Computational Model of Metaphor. MA Thesis. Purdue University. Abstract. This thesis works towards a micro-theory of metaphor within the ontological semantics framework. It does so using a parameter-based system modeled roughly after Attardo and Raskin’s (1991) general theory of verbal humor. At the same time, it tries to convert Lakoff …

Construction Grammars Converge Given Increased Exposure

Dunn, J. & Tayyar Madabushi, H. (2021). “Learned Construction Grammars Converge Across Registers Given Increased Exposure.” Proceedings of the Conference on Computational Natural Language Learning (CoNLL 2021). Association for Computational Linguistics. Abstract. This paper measures the impact of increased exposure on whether learned construction grammars converge onto shared representations when trained on data from different registers. …

Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction

Dunn, J. & Nini, A. (2021). “Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction.” Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (NAACL 2021). Association for Computational Linguistics. 149-159. Abstract. This paper asks whether a distinction between production-based and perception-based grammar induction influences either (i) the growth curve of grammars …

Representations of Language Varieties Are Reliable

Dunn, J. (2021). “Representations of Language Varieties Are Reliable Given Corpus Similarity Measures.” In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties, and Dialects. Association for Computational Linguistics. 28-38. Abstract. This paper measures similarity both within and between 84 language varieties across nine languages. These corpora are drawn from digital sources (the …

Measuring Linguistic Diversity During COVID-19

Dunn, J.; Coupe, T.; & Adams, B. (2020). “Measuring Linguistic Diversity During COVID-19.” Proceedings of the 4th Workshop on NLP and Computational Social Science. Association for Computational Linguistics. 1-10. Abstract. Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of …