Corpus similarity measures remain robust across diverse languages
Li, H. & Dunn, J. (2022). “Corpus similarity measures remain robust across diverse languages.” Lingua. Corpus similarity remains a robust measurable property across 39 diverse languages. Measures are highly accurate on a register prediction task using comparable corpora. Accuracy is not dependent on specific registers, shown by out-of-domain experiments. Corpus similarity measures also work for …