
Still more English

Introduction

On the discussion group for “Aus der Neo-Welt”, it was recently claimed that the layout does not live up to its promise of being designed for English as well, because the apostrophe, which is important in English, had not been accounted for. It was also suggested that a layout designed specifically for English be considered, as such a layout could appeal to a wide audience. The point about the apostrophe, at least, is indisputable. In what follows, we will therefore create layouts with an apostrophe, and also one layout designed specifically for English.

Prerequisites

I assume that you have downloaded the optimiser and are roughly familiar with it. The optimiser and its documentation can be found on the overview page. For the present article, I have used version 1.247. The following collection of files makes it easier to reproduce this article:

Download supporting material

Corpus issues

To create an optimised keyboard layout, a corpus is required. The frequency files that ship with the optimiser were largely created from 8-bit text files. Therefore, they do not contain “’” (U+2019 RIGHT SINGLE QUOTATION MARK), the apostrophe character preferred by the Unicode standard, but “'” (U+0027 APOSTROPHE). The latter is sometimes also used as an opening and closing quotation mark, though not consistently. We therefore have to ask whether the “'” in the corpus are apostrophes at all.

In a trigram consisting of a letter, a “'”, and one more letter, the “'” is almost certainly an apostrophe. Therefore, we can count

awk "/[a-zA-Z]'[a-zA-Z]/ { n = n+\$1 } END { print n }" englisch.txt.3

and we obtain 3359. From englisch.txt.1, it can be seen that “'” appears 4180 times in the English corpus. Therefore, at least 3359 of 4180, or about 80 %, of the “'” are real apostrophes, which is more than three quarters. To me, this seems good enough. In the German corpus, the fraction estimated in the same manner is only about one quarter. This is not good; however, the estimate is just a lower bound, and as the number of “'” there is small, it does not matter anyway.
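For readers who prefer not to use awk, the same count can be sketched in Python. This is a minimal sketch, not part of the optimiser: the function name `count_letter_apostrophe_letter` is hypothetical, and the line format "count, whitespace, trigram" is an assumption inferred from the awk call above, which sums field 1 of every matching line.

```python
import re

# Same pattern as in the awk call: letter, U+0027, letter.
LETTER_APOSTROPHE_LETTER = re.compile(r"[a-zA-Z]'[a-zA-Z]")


def count_letter_apostrophe_letter(lines):
    """Sum the counts of all trigrams of the form letter, ', letter.

    Assumed line format (inferred from the awk call): "<count> <trigram>".
    """
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split(None, 1)
        if len(fields) != 2:
            continue  # skip empty or malformed lines
        count, trigram = fields
        try:
            value = int(count)
        except ValueError:
            continue  # first field is not an integer count
        if LETTER_APOSTROPHE_LETTER.search(trigram):
            total += value
    return total


# Made-up sample lines, only to illustrate the assumed format.
sample = ["120 n't", "30 e's", "7 ('a", "50 the"]
print(count_letter_apostrophe_letter(sample))

# Lower bound for the share of real apostrophes, using the two
# figures quoted in the text (3359 trigram hits, 4180 occurrences).
print(f"{3359 / 4180:.1%}")
```

To apply the function to the real frequency file, one would pass an open file object for englisch.txt.3 instead of `sample`; which encoding to use when opening the file depends on how the corpus files were produced.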

Apart from the apostrophe, I want to account for the hyphen, which is also important for English. Here, we face a similar problem: What is a hyphen, what is an en-dash, what is an em-dash, and what is a minus? We will ignore these stylistic subtleties.

We also ignore another matter of style: In non-fiction, contractions (for example “don’t”) are often frowned upon. Without contractions, the number of apostrophes would be reduced significantly.

Optimisation

Compared with the standard configuration, we drop the dedicated keys for the umlauts and the “ß”, and in exchange introduce keys for the apostrophe and the hyphen. Furthermore, we introduce a key for the symbol «¨», which we use to enter the umlauts and the «ß»:

Ersatz 'ߨs'
Ersatz 'ĨA'

and so on. With that, the number of keys totals 34. We compile the optimiser for this number of keys:

g++ -std=c++11 -Wall -Ofast -DNDEBUG -DOHNE2SHIFT -DTASTENZAHL=34 \
    -DENGLISH -DMIT_THREADS -pthread opt.cc -o opt34

Compared with the standard configuration, we drop the QWERTZ-Ä key.

Results

With this, we are ready to go. For German and English, we get

./opt34 -2 deutsch.txt -2 englisch.txt -K anglophil.cfg -t 4

                 241.356 total effort   188.319 positional effort    left right
                   1.089 same finger rp   6.999 shift same finger top  6.4 11.6
  ku¨.- vgcljf    70.697 hand alternat.  24.261 shift hand alter. mid 36.4 31.7
  hieao dtrns      1.781 inward/outward  25.745 inward or outward bot  5.2  8.8
  xy',q bpwmz     10.128 adjacent        21.972 shift adjacent    sum 47.9 52.1
                  8.3 11.3 14.3 14.0 --.- --.- 17.4 10.7 14.1 10.0 Sh  2.8  1.2

that is, nearly “Aus der Neo-Welt”, just without the umlauts. Optimising for English exclusively, we get

./opt34 -2 englisch.txt -K anglophil.cfg -t 4

                 226.210 total effort   185.472 positional effort    left right
                   0.877 same finger rp   1.786 shift same finger top  5.8 13.5
  jyu.' zmldbp    68.404 hand alternat.  35.327 shift hand alter. mid 39.8 29.0
  sieao hnrtc      1.129 inward/outward  28.064 inward or outward bot  3.6  8.3
  x¨-,q fvwkg      9.418 adjacent        10.627 shift adjacent    sum 49.2 50.8
                  8.5  8.6 14.5 17.6 --.- --.- 17.0 11.5 13.0  9.3 Sh  1.8  1.0

I have called this layout «Anglomane». And, for the sake of completeness, optimising for German exclusively:

./opt34 -2 deutsch.txt -K anglophil.cfg -t 4

                 262.256 total effort   192.547 positional effort    left right
                   1.541 same finger rp  17.062 shift same finger top  9.2 13.5
  k¨o,- pcmljf    71.329 hand alternat.  22.110 shift hand alter. mid 33.7 31.4
  heaiu dtnrs      2.383 inward/outward  24.475 inward or outward bot  5.3  6.9
  x'q.y gbwvz     13.543 adjacent        13.645 shift adjacent    sum 48.2 51.8
                  7.5 11.9 14.9 13.9 --.- --.- 20.6 10.9 10.7  9.6 Sh  1.9  0.9

We compare these layouts for the two languages:

./opt34 -2 deutsch.txt -K anglophil.cfg -r anglophil.txt

Anglomane        287.831 total effort   198.532 positional effort    left right
                   1.869 same finger rp   5.644 shift same finger top  5.2 14.1
  jyu.' zmldbp    67.207 hand alternat.  21.231 shift hand alter. mid 36.6 28.2
  sieao hnrtc      0.960 inward/outward  28.631 inward or outward bot  6.4  9.5
  x¨-,q fvwkg     13.276 adjacent        16.491 shift adjacent    sum 48.2 51.8
                  9.7  8.9 19.4 10.2 --.- --.- 19.1 11.6 11.4  9.7 Sh  3.5  1.8

AdNW∖Umlaute     246.209 total effort   189.959 positional effort    left right
                   1.075 same finger rp   2.212 shift same finger top  7.9 11.4
  ku¨.- vgcljf    70.094 hand alternat.  25.285 shift hand alter. mid 34.7 31.9
  hieao dtrns      1.526 inward/outward  26.538 inward or outward bot  5.1  9.1
  xy',q bpwmz     12.000 adjacent        22.994 shift adjacent    sum 47.6 52.4
                  9.1 11.5 16.6 10.5 --.- --.- 16.2 10.8 15.2 10.3 Sh  3.7  1.5

and

./opt34 -2 englisch.txt -K anglophil.cfg -r anglophil.txt

AdNW∖Umlaute     236.503 total effort   186.680 positional effort    left right
                   1.103 same finger rp  17.062 shift same finger top  4.8 11.9
  ku¨.- vgcljf    71.329 hand alternat.  22.110 shift hand alter. mid 38.1 31.4
  hieao dtrns      2.134 inward/outward  24.913 inward or outward bot  5.3  8.5
  xy',q bpwmz      8.165 adjacent        19.821 shift adjacent    sum 48.2 51.8
                  7.5 11.1 11.9 17.6 --.- --.- 18.5 10.6 13.0  9.6 Sh  1.9  0.9

                 262.256 total effort   192.547 positional effort    left right
                   1.541 same finger rp  17.062 shift same finger top  9.2 13.5
  k¨o,- pcmljf    71.329 hand alternat.  22.110 shift hand alter. mid 33.7 31.4
  heaiu dtnrs      2.383 inward/outward  24.475 inward or outward bot  5.3  6.9
  x'q.y gbwvz     13.543 adjacent        13.645 shift adjacent    sum 48.2 51.8
                  7.5 11.9 14.9 13.9 --.- --.- 20.6 10.9 10.7  9.6 Sh  1.9  0.9

(Results that have already been shown above are omitted.) As can be seen, the compromise layout “Aus der Neo-Welt ohne Umlaute” scores worse than the layouts optimised for the respective language. However, the difference is not overly large, especially if one compares it to the score of the layout optimised for the other language. Apparently, English and German fit well together in the same layout.
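To put numbers on these differences, the total efforts printed above can be compared directly. The following is a small arithmetic sketch using only labelled values from the result listings. It assumes that a lower total effort is better, and the helper `relative_penalty` is a hypothetical name introduced here. The first check is merely an observation about the printed figures: the total effort of the bilingual run coincides with the plain mean of the two per-language totals of the same layout. It is not a statement about the optimiser's internals.

```python
def relative_penalty(effort, reference):
    """Relative increase of `effort` over `reference` (lower effort is better)."""
    return (effort - reference) / reference


# Total efforts copied from the result listings above.
adnw_german = 246.209        # AdNW∖Umlaute, evaluated on the German corpus
adnw_english = 236.503       # AdNW∖Umlaute, evaluated on the English corpus
anglomane_german = 287.831   # Anglomane, evaluated on the German corpus
anglomane_english = 226.210  # Anglomane, evaluated on the English corpus
bilingual_total = 241.356    # AdNW∖Umlaute, optimised on both corpora

# Observation: the bilingual total matches the mean of the two totals.
mean_total = (adnw_german + adnw_english) / 2
print(f"mean of per-language totals: {mean_total:.3f}")
print(f"reported bilingual total:    {bilingual_total:.3f}")

# Cost of the compromise layout on English, relative to Anglomane.
compromise_on_english = relative_penalty(adnw_english, anglomane_english)
print(f"AdNW∖Umlaute vs. Anglomane on English: {compromise_on_english:+.1%}")

# Cost of the English-only layout on German, relative to the compromise.
anglomane_on_german = relative_penalty(anglomane_german, adnw_german)
print(f"Anglomane vs. AdNW∖Umlaute on German:  {anglomane_on_german:+.1%}")
```

By these figures, the compromise layout costs roughly 4.6 % extra effort on English, whereas the English-only layout costs roughly 16.9 % extra effort on German.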

My conclusion: The results above demonstrate that the missing apostrophe in “Aus der Neo-Welt” is just a cosmetic problem, as it can still be added in an optimal manner without substantial changes to the overall layout. Furthermore, the results demonstrate that targeting German and English simultaneously causes only minor trade-offs for each language considered by itself and should therefore be acceptable even for users who write much more in one of the languages than in the other.

Version 23 March 2017