Modeling Global Syntactic Variation in English

Dunn, J. (2019). “Modeling Global Syntactic Variation in English Using Dialect Classification.” In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (NAACL 19). Association for Computational Linguistics. 42-53.

Abstract. This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

[Presentation Slides]

[Data, CC: https://labbcat.canterbury.ac.nz/download/?jonathandunn/CGLU_v3]

[Data, Grammars: https://labbcat.canterbury.ac.nz/download/?jonathandunn/CxG_Data_FixedSize]

[Code, CC: https://github.com/jonathandunn/common_crawl_corpus]

[Code, LID: https://github.com/jonathandunn/idNet]

