In 2010, Karl Köckemann adapted a corpus from Universität Leipzig, comprising 3 million sentences from newspaper articles, to the new German orthography and removed some newspaper-specific abbreviations (for example, “(dpa)”). The following frequency files can be extracted from this corpus:
The optimiser ships with frequency files for German. Here is the word list extracted from the corpus that was used to construct those frequency files:
The optimiser ships with frequency files for English. Here is the word list extracted from the corpus that was used to construct those frequency files:
A C program for the X Window System that measures n-gram typing times. The instructions (in German) are in a comment at the top of the file: