CGLU v4.2: The Corpus of Global Language Use (423 billion words)
http://www.earthlings.io/download_cglu.html
GeoWAC v1: Geographically-balanced Gigaword Corpora (45 billion words)
http://www.earthlings.io/download_geowac.html
CGLU v3: The Corpus of Global Language Use (16 billion words)
https://publicdata.canterbury.ac.nz/Research/NZILBB/jonathandunn/CGLU_v3/