Auxiliary material for the Optimiser

Corpora

German

A corpus from Universität Leipzig, post-processed by Karl Köckemann

In 2010, Karl Köckemann took a corpus from Universität Leipzig comprising 3 million sentences from newspaper articles, adapted it to the new German orthography, and removed some newspaper-specific abbreviations (for example, “(dpa)”). The frequency files extracted from this corpus are available here:

Download frequency files
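For readers who want a feel for what a frequency file captures, here is a minimal sketch of counting character n-grams in a list of sentences. It is not the tool used to produce the files above. The function name `count_ngrams`, the sample sentences, and the choice not to count n-grams across sentence boundaries are all assumptions made for illustration, and the actual file format is not described here.

```python
from collections import Counter


def count_ngrams(sentences, n):
    """Count character n-grams of length n in each sentence.

    Hypothetical helper: n-grams are counted within a sentence only,
    never across the boundary between two sentences.
    """
    counts = Counter()
    for sentence in sentences:
        # A sentence shorter than n yields an empty range and adds nothing.
        for i in range(len(sentence) - n + 1):
            counts[sentence[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample input, not taken from the corpus.
    sample = ["Das ist ein Satz.", "Das ist gut."]
    for gram, freq in count_ngrams(sample, 2).most_common(5):
        print(freq, repr(gram))
```

Calling the same function with n = 1, 2, and 3 would give letter, bigram, and trigram counts respectively.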

Word lists for my German standard corpus

The optimiser ships with frequency files for German. Below is the word list extracted from the corpus that was used to build those frequency files:

Download German word list
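A word list of this kind could be derived from a corpus roughly as follows. This is a sketch under stated assumptions, not the procedure actually used: the name `word_counts`, the rule that a word is a run of Unicode letters, and the case-sensitive counting are illustrative choices only.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of Unicode letters
# (no digits, no underscore), so umlauts and ß are kept.
WORD = re.compile(r"[^\W\d_]+")


def word_counts(sentences):
    """Return a Counter mapping each word to its number of occurrences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(WORD.findall(sentence))
    return counts


if __name__ == "__main__":
    # Made-up sample input, not taken from the corpus.
    sample = ["Der Hund bellt.", "Der Hund schläft."]
    for word, freq in word_counts(sample).most_common():
        print(freq, word)
```

Whether upper and lower case should be merged depends on the intended use. For German, where capitalisation of nouns affects how often the shift key is pressed, keeping case distinct seems the safer default.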

English

Word lists for my English standard corpus

The optimiser ships with frequency files for English. Below is the word list extracted from the corpus that was used to build those frequency files:

Download English word list

Measuring n-gram times

A C program for the X Window System that measures n-gram typing times. The instructions (in German) are given in a comment at the top of the source file:

Download n-gram timer (source code)
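To illustrate the general idea of an n-gram typing time, the sketch below computes the mean time per n-gram from a list of time-stamped key presses. It does not reproduce the C program and makes no claim about how that program measures or reports times. The function name `mean_ngram_times`, the event format `(time_ms, char)`, and the definition of an n-gram's time as the interval from its first to its last key press are all assumptions.

```python
from collections import defaultdict


def mean_ngram_times(events, n):
    """Average typing time per n-gram.

    events: list of (time_ms, char) key presses in chronological order
            (hypothetical format).
    Returns a dict mapping each n-gram to the mean number of milliseconds
    between its first and last key press.
    """
    durations = defaultdict(list)
    for i in range(len(events) - n + 1):
        window = events[i:i + n]
        gram = "".join(char for _, char in window)
        durations[gram].append(window[-1][0] - window[0][0])
    return {gram: sum(d) / len(d) for gram, d in durations.items()}


if __name__ == "__main__":
    # Made-up key presses: "theth" typed with varying delays.
    presses = [(0, "t"), (100, "h"), (250, "e"), (400, "t"), (520, "h")]
    for gram, ms in mean_ngram_times(presses, 2).items():
        print(gram, ms)
```

A real measurement would also need to discard pauses and corrected typos, which this sketch ignores.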