- Web traffic click study of 50 billion+ HTTP requests – http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset
- SourceForge Research data of 140,000+ software projects – http://www3.nd.edu/~oss/Data/data.html
- Enron e-mail database – http://www.cs.cmu.edu/~enron/
- Wikipedia downloadable dataset – http://en.wikipedia.org/wiki/Wikipedia:Database_download
- Structured Wikipedia data – http://dbpedia.org
- Google N-gram records – http://googleresearch.blogspot.hu/2006/08/all-our-n-gram-are-belong-to-you.html
- Sessions, clicks, and queries from Yandex – http://imat-relpred.yandex.ru/en/datasets
