Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

Abstract: The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar …

Mapping Languages and Demographics with Georeferenced Corpora

Co-authored with Ben Adams. Abstract: This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sources, against ground-truth population and language-census datasets. The goal is to determine (i) which dataset best represents population demographics; (ii) in what parts of the world the datasets are most representative of actual populations; and …