On the discussion group for “Aus der Neo-Welt”, it was recently claimed that the layout does not live up to its promise of being designed for English as well, because the apostrophe, which is important for English, had not been accounted for. It was also suggested that a layout designed specifically for English be considered, as such a layout could appeal to a wide audience. The point about the apostrophe, at least, is indisputable. In what follows, we therefore create layouts with an apostrophe, including one layout specifically for English.
I assume that you have downloaded the optimiser and are roughly familiar with it. The optimiser and its documentation can be found on the overview page. For the present article, I have used version 1.247. Here you can find a collection of files that help to reproduce this article more easily:
To create an optimised keyboard layout, a corpus is required. The frequency files that ship with the optimiser were largely created from 8-bit text files. Therefore, they do not contain “’” (U+2019 RIGHT SINGLE QUOTATION MARK), which the Unicode standard prefers for the apostrophe, but “'” (U+0027 APOSTROPHE). This character is sometimes also used as an opening and closing quotation mark, though not consistently. We therefore have to ask whether the “'” in the corpus are apostrophes at all.
In a trigram consisting of a letter, a “'”, and one more letter, the “'” is almost certainly an apostrophe. Therefore, we can count
awk "/[a-zA-Z]'[a-zA-Z]/ { n = n+\$1 } END { print n }" englisch.txt.3
and we obtain 3359. From englisch.txt.1, it can be seen that “'” appears 4180 times in the English corpus. Therefore, at least 80 % of the “'” (3359 of 4180) are real apostrophes. To me, this seems good enough. In the German corpus, the fraction estimated in the same manner is only about one quarter. That is not good; however, the estimate is just a lower bound, and as the number of “'” is small, it does not matter anyway.
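The count above can also be sketched in Python. This is only an illustration: it assumes that each line of the frequency file has the form "count, whitespace, n-gram" (which is what the awk call relies on), and the sample lines are invented, not taken from the corpus.

```python
import re

# Same pattern as in the awk call: letter, "'" (U+0027), letter.
PATTERN = re.compile(r"[a-zA-Z]'[a-zA-Z]")


def count_apostrophe_trigrams(lines):
    """Sum the leading count of every line whose n-gram matches PATTERN."""
    total = 0
    for line in lines:
        fields = line.split(None, 1)  # assumed layout: "<count> <n-gram>"
        if len(fields) == 2 and PATTERN.search(fields[1]):
            total += int(fields[0])
    return total


# Invented sample lines, not taken from englisch.txt.3.
sample = ["10 n't", "5 I'm", "7 the", "3 ' a"]
sample_total = count_apostrophe_trigrams(sample)  # counts "n't" and "I'm" only

# Lower bound on the share of real apostrophes, from the figures in the text.
lower_bound = 3359 / 4180
print(sample_total, f"{lower_bound:.1%}")
```

With the figures from the text, the lower bound comes out at roughly 80 %.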
Apart from the apostrophe, I want to account for the hyphen, which is also important for English. Here, we face a similar problem: What is a hyphen, what is an en-dash, what is an em-dash, and what is a minus? We will ignore these stylistic subtleties.
We also ignore another matter of style: in non-fiction texts, contractions (for example “don’t”) are often frowned upon. Without contractions, the number of apostrophes would be reduced significantly.
Compared to the standard configuration, we drop the dedicated keys for the umlauts and the “ß” and, in exchange, introduce keys for the apostrophe and the hyphen. Furthermore, we introduce a key for the symbol «¨», which we use to enter the umlauts and the «ß»:
Ersatz 'ߨs'
Ersatz 'ĨA'
and so on. With that, the number of keys totals 34. We compile the optimiser for this number of keys:
g++ -std=c++11 -Wall -Ofast -DNDEBUG -DOHNE2SHIFT -DTASTENZAHL=34 \
    -DENGLISH -DMIT_THREADS -pthread opt.cc -o opt34
As compared to the standard configuration, we drop the QWERTZ-Ä key.
With this, we are ready to go. For German and English, we get
./opt34 -2 deutsch.txt -2 englisch.txt -K anglophil.cfg -t 4

                241.356 total effort   188.319 positional effort     left right
                  1.089 same finger rp   6.999 shift same finger top  6.4 11.6
  ku¨.- vgcljf   70.697 hand alternat.  24.261 shift hand alter. mid 36.4 31.7
  hieao dtrns     1.781 inward/outward  25.745 inward or outward bot  5.2  8.8
  xy',q bpwmz    10.128 adjacent        21.972 shift adjacent    sum 47.9 52.1
 8.3 11.3 14.3 14.0 --.- --.- 17.4 10.7 14.1 10.0 Sh  2.8  1.2
that is, nearly “Aus der Neo-Welt”, just without the umlauts. Optimising exclusively for English, we get
./opt34 -2 englisch.txt -K anglophil.cfg -t 4

                226.210 total effort   185.472 positional effort     left right
                  0.877 same finger rp   1.786 shift same finger top  5.8 13.5
  jyu.' zmldbp   68.404 hand alternat.  35.327 shift hand alter. mid 39.8 29.0
  sieao hnrtc     1.129 inward/outward  28.064 inward or outward bot  3.6  8.3
  x¨-,q fvwkg     9.418 adjacent        10.627 shift adjacent    sum 49.2 50.8
 8.5  8.6 14.5 17.6 --.- --.- 17.0 11.5 13.0  9.3 Sh  1.8  1.0
I have called this layout «Anglomane». And, for the sake of completeness, optimising for German exclusively:
./opt34 -2 deutsch.txt -K anglophil.cfg -t 4

                262.256 total effort   192.547 positional effort     left right
                  1.541 same finger rp  17.062 shift same finger top  9.2 13.5
  k¨o,- pcmljf   71.329 hand alternat.  22.110 shift hand alter. mid 33.7 31.4
  heaiu dtnrs     2.383 inward/outward  24.475 inward or outward bot  5.3  6.9
  x'q.y gbwvz    13.543 adjacent        13.645 shift adjacent    sum 48.2 51.8
 7.5 11.9 14.9 13.9 --.- --.- 20.6 10.9 10.7  9.6 Sh  1.9  0.9
We compare these layouts for the two languages:
./opt34 -2 deutsch.txt -K anglophil.cfg -r anglophil.txt

Anglomane       287.831 total effort   198.532 positional effort     left right
                  1.869 same finger rp   5.644 shift same finger top  5.2 14.1
  jyu.' zmldbp   67.207 hand alternat.  21.231 shift hand alter. mid 36.6 28.2
  sieao hnrtc     0.960 inward/outward  28.631 inward or outward bot  6.4  9.5
  x¨-,q fvwkg    13.276 adjacent        16.491 shift adjacent    sum 48.2 51.8
 9.7  8.9 19.4 10.2 --.- --.- 19.1 11.6 11.4  9.7 Sh  3.5  1.8

AdNW∖Umlaute    246.209 total effort   189.959 positional effort     left right
                  1.075 same finger rp   2.212 shift same finger top  7.9 11.4
  ku¨.- vgcljf   70.094 hand alternat.  25.285 shift hand alter. mid 34.7 31.9
  hieao dtrns     1.526 inward/outward  26.538 inward or outward bot  5.1  9.1
  xy',q bpwmz    12.000 adjacent        22.994 shift adjacent    sum 47.6 52.4
 9.1 11.5 16.6 10.5 --.- --.- 16.2 10.8 15.2 10.3 Sh  3.7  1.5
and
./opt34 -2 englisch.txt -K anglophil.cfg -r anglophil.txt

AdNW∖Umlaute    236.503 total effort   186.680 positional effort     left right
                  1.103 same finger rp  17.062 shift same finger top  4.8 11.9
  ku¨.- vgcljf   71.329 hand alternat.  22.110 shift hand alter. mid 38.1 31.4
  hieao dtrns     2.134 inward/outward  24.913 inward or outward bot  5.3  8.5
  xy',q bpwmz     8.165 adjacent        19.821 shift adjacent    sum 48.2 51.8
 7.5 11.1 11.9 17.6 --.- --.- 18.5 10.6 13.0  9.6 Sh  1.9  0.9

                262.256 total effort   192.547 positional effort     left right
                  1.541 same finger rp  17.062 shift same finger top  9.2 13.5
  k¨o,- pcmljf   71.329 hand alternat.  22.110 shift hand alter. mid 33.7 31.4
  heaiu dtnrs     2.383 inward/outward  24.475 inward or outward bot  5.3  6.9
  x'q.y gbwvz    13.543 adjacent        13.645 shift adjacent    sum 48.2 51.8
 7.5 11.9 14.9 13.9 --.- --.- 20.6 10.9 10.7  9.6 Sh  1.9  0.9
(Results that have been shown previously are omitted.) As can be seen, the compromise layout “Aus der Neo-Welt ohne Umlaute” scores worse than the layouts optimised for the respective language. However, the difference is not overly large, especially when compared to the score of the layout optimised for the other language. Apparently, English and German fit well together in the same layout.
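The total efforts quoted above allow a quick plausibility check. The following Python sketch uses only numbers from the optimiser output in this article; that the two-corpus total is the plain mean of the per-language totals is my observation from these figures, not a statement about how the optimiser weights corpora internally, and the variable names are made up for illustration.

```python
# Total efforts copied from the optimiser output shown in this article.
compromise_german = 246.209   # AdNW∖Umlaute evaluated on deutsch.txt
compromise_english = 236.503  # AdNW∖Umlaute evaluated on englisch.txt
compromise_both = 241.356     # optimisation run with both corpora
anglomane_english = 226.210   # Anglomane, optimised on englisch.txt only

# The two-corpus total matches the mean of the per-language totals.
mean_effort = (compromise_german + compromise_english) / 2

# Relative extra effort of the compromise layout on English text.
english_overhead = compromise_english / anglomane_english - 1

print(f"{mean_effort:.3f}")       # 241.356
print(f"{english_overhead:.1%}")
```

For English, the compromise layout thus costs between 4 % and 5 % more total effort than Anglomane.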
My conclusion: The results above demonstrate that the missing apostrophe in “Aus der Neo-Welt” is just a cosmetic problem, as it can still be added in an optimal manner without substantial changes to the overall layout. Furthermore, the results demonstrate that targeting German and English simultaneously causes only minor trade-offs for each language considered by itself and should therefore be acceptable even for users who write much more in one of the languages than in the other.