Modeling Global Syntactic Variation in English Using Dialect Classification

[Read Full-Text]

[Presentation Slides]

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means of studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction, rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora to measure how robust syntactic variation is across registers.
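To make the idea of dialect classification over syntactic features concrete, here is a minimal sketch. It is not the paper's model: it swaps in a simple nearest-centroid classifier with cosine similarity, and it assumes each text sample has already been reduced to counts of induced grammar features. The feature names (`cx_1`, `cx_2`), the labels (`variety_A`, `variety_B`), and the counts are all made up for illustration.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Turn raw feature counts into relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def centroid(vectors):
    """Average a list of sparse feature vectors."""
    sums = defaultdict(float)
    for vec in vectors:
        for key, value in vec.items():
            sums[key] += value
    return {key: value / len(vectors) for key, value in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Build one centroid per variety from (label, counts) pairs."""
    by_label = defaultdict(list)
    for label, counts in samples:
        by_label[label].append(normalize(counts))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, counts):
    """Return the variety whose centroid is most similar to the sample."""
    vec = normalize(counts)
    return max(model, key=lambda label: cosine(vec, model[label]))


# Hypothetical toy data: two placeholder varieties, two placeholder features.
toy_samples = [
    ("variety_A", {"cx_1": 8, "cx_2": 2}),
    ("variety_A", {"cx_1": 7, "cx_2": 3}),
    ("variety_B", {"cx_1": 2, "cx_2": 8}),
    ("variety_B", {"cx_1": 3, "cx_2": 7}),
]
toy_model = train(toy_samples)
```

Calling `predict(toy_model, {"cx_1": 9, "cx_2": 1})` assigns the sample to the variety whose average feature profile it most resembles. The same setup could be run separately on web and social media samples to compare how well the features separate varieties in each register.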

[Data, CC: https://labbcat.canterbury.ac.nz/download/?jonathandunn/CGLU_v3]

[Data, Grammars: https://labbcat.canterbury.ac.nz/download/?jonathandunn/CxG_Data_FixedSize]

[Code, CC: https://github.com/jonathandunn/common_crawl_corpus]

[Code, LID: https://github.com/jonathandunn/idNet]

