Dr. Jonathan Dunn

Computational Linguist @ University of Canterbury, Christchurch, New Zealand

Geo-Corpora

CGLU v4.2: The Corpus of Global Language Use (423 billion words)

http://www.earthlings.io/download_cglu.html

GeoWAC v1: Geographically-balanced Gigaword Corpora (45 billion words)

http://www.earthlings.io/download_geowac.html

CGLU v3: The Corpus of Global Language Use (16 billion words)

https://publicdata.canterbury.ac.nz/Research/NZILBB/jonathandunn/CGLU_v3/

Jonathan Dunn

Contact

Locke 206
University of Canterbury
Christchurch, New Zealand
jonathan.dunn@canterbury.ac.nz
github.com/jonathandunn
