Gan, Q.; Dunn, J.; Nini, A.; & Adams, B. (2026). “A Multi-Dialectal, Longitudinal Corpus of Human-AI Hybrid Language Production.” In Proceedings of the International Conference on Language Resources and Evaluation. ELRA.
Abstract. This paper describes a multi-dialectal, longitudinal corpus of human-AI hybrid language production. The corpus includes purely human-produced samples, purely LLM-generated samples, and hybrid samples produced under various writing conditions. It comprises 693 speakers of five national dialects of English, with natural and hybrid samples paired for the same individuals in a longitudinal design spanning four weeks. This design allows investigation of both short- and longer-term effects of LLM assistance on language use across geographic groupings. To demonstrate the utility of the corpus, we analyze linguistic features along three dimensions: lexical diversity, syntactic complexity, and stylistic variation. The results indicate that LLM assistance increases lexical diversity but reduces syntactic and stylistic complexity, suggesting that its effects differ across these dimensions. The corpus provides a resource for studies of human-AI interaction, dialectal variation, and the effects of AI assistance on language production.