CGLU v4: The Corpus of Global Language Use (401 billion words)
http://www.earthlings.io/corpus_download.html
GeoWAC v1: Geographically-balanced Gigaword Corpora (45 billion words)
http://www.earthlings.io/corpus_download.html
CGLU v3: The Corpus of Global Language Use (16 billion words)
https://labbcat.canterbury.ac.nz/download/?jonathandunn/CGLU_v3