Corpus similarity measures remain robust across diverse languages
Li, H. & Dunn, J. (2022). “Corpus similarity measures remain robust across diverse languages.” Lingua. Abstract. This paper experiments with frequency-based corpus similarity measures across 39 languages using a register prediction task. The goal is to quantify (i) the distance between different corpora from the same language and (ii) the homogeneity of individual corpora. Both …
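
To make the idea of a frequency-based corpus similarity measure concrete, here is a minimal sketch in Python. The excerpt above does not say which measure or features the paper uses, so everything here is an assumption for illustration: whitespace-tokenized words as features, the `top_n` most frequent words across both corpora as the feature set, and Spearman rank correlation between the two frequency vectors as the similarity score. The function names (`word_frequencies`, `average_ranks`, `pearson`, `corpus_similarity`) are hypothetical.

```python
from collections import Counter


def word_frequencies(text):
    """Count lowercase, whitespace-separated tokens (an assumed feature type)."""
    return Counter(text.lower().split())


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def corpus_similarity(text_a, text_b, top_n=1000):
    """Spearman correlation of word frequencies over the top_n shared features.

    Assumed setup: the feature set is the top_n most frequent words in the
    two corpora combined. Higher values mean more similar corpora.
    """
    freq_a = word_frequencies(text_a)
    freq_b = word_frequencies(text_b)
    features = [word for word, _ in (freq_a + freq_b).most_common(top_n)]
    if not features:
        return 0.0
    vec_a = [freq_a[word] for word in features]
    vec_b = [freq_b[word] for word in features]
    # Spearman's rho is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(vec_a), average_ranks(vec_b))
```

Under the same assumptions, goal (i) corresponds to calling `corpus_similarity` on two different corpora from one language, and goal (ii) could be approximated by splitting a single corpus into parts and comparing those parts with each other.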