The German-language Wikipedia mentions the Bulgarian keyboard layout by Teodor Galabov from the year 1907 as an early “alternative” layout. An ergonomic keyboard layout, 25 years before Dvorak? The Wikipedia entry refers to the article The Bulgarian Alphabet and Keyboard in the Context of EU Communications. According to Wikipedia, this article was published by the working group MEEK of the Comité Européen de Normalisation. In the meantime, it has disappeared from its original location, and the working group MEEK has been disbanded. The article itself names no authors. Section 5 at least refers to an investigation (ANABELA) of the layout from the year 2007 and mentions the names of some participants, but references to related publications are missing.
A vanished article by unknown authors, which does not cite its sources clearly, published by a disbanded working group: it is sad how little is known about Galabov’s work in the West. At least the layout itself is well known, because it is the Bulgarian standard. Thanks to this, we can judge for ourselves whether Galabov’s work was a pioneering act, or whether our only source exaggerates its merits. We proceed as follows: We compare Galabov’s layout to a current competitor (a phonetic transcription of QWERTY to Cyrillic, called “CIA” in what follows) and to a self-optimised layout, using a Bulgarian corpus. CIA is roughly what the Bulgarians would have if they had simply ripped off Sholes, as the rest of the world did. In parallel, Dvorak is compared to QWERTY and to a self-optimised layout, using an English corpus. Then we investigate how much Galabov and Dvorak improve on the Sholes layouts, and how far they fall short of what is “feasible”. We judge feasibility by the self-optimised layouts. As the optimisation takes different criteria into account, these layouts are certainly not the absolute limit with respect to each individual criterion, but they demonstrate what is possible in a well-balanced layout. Finally, we take a look at DHIATENSOR, an English layout from 1893 or even earlier, to investigate whether this layout was already “ergonomic” by Dvorak’s standards.
I assume that you have downloaded the optimiser and are roughly familiar with it. The optimiser and its documentation can be found on the overview page. For the present article, I have used version 1.227. Here you can find a collection of files that help to reproduce this article more easily:
My web pages do not instruct your browser about which fonts it should use. If the fonts in your default settings do not support Cyrillic, this article might be displayed incorrectly. This should not be a big problem, as there are plenty of free fonts that cover Cyrillic and work well for on-screen usage, for example, the DejaVu family.
A good starting point for corpora is the collection of the Universität Leipzig, which contains large corpora for many languages. I choose bul_news_2007_30K-sentences.text.tar.gz, 30000 sentences taken from newspaper articles. The file is already UTF-8 encoded, just as required by the optimiser. Each sentence appears on a line of its own and is preceded by a number, separated from the sentence by a tab character. We must get rid of the number, which is easy with Unix text tools:
cut -f2- bul_news_2007_30K-sentences.txt > bulgarisch.txt
Now we can create the frequency tables:
The corpus is large and, therefore, we do not worry about statistical errors.
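To give a rough idea of what this preparation involves, here is a minimal Python sketch that strips the leading sentence number (as cut -f2- does above) and counts single characters and digrams. It is only an illustration: the function names are made up for this example, and it does not reproduce the optimiser's own frequency tools or file formats.

```python
from collections import Counter


def strip_sentence_number(line: str) -> str:
    """Drop everything up to and including the first tab, like `cut -f2-`.

    Lines without a tab are returned unchanged, which is also what cut does.
    """
    _, sep, rest = line.partition("\t")
    return rest if sep else line


def count_frequencies(lines):
    """Count single characters and adjacent character pairs, sentence by sentence."""
    letters, digrams = Counter(), Counter()
    for line in lines:
        text = strip_sentence_number(line.rstrip("\n"))
        letters.update(text)
        digrams.update(a + b for a, b in zip(text, text[1:]))
    return letters, digrams


# Two made-up sample lines in the "number<TAB>sentence" format of the corpus.
sample = ["1\tТова е тест.", "2\tОще един тест."]
letters, digrams = count_frequencies(sample)
print(letters.most_common(3))
print(digrams.most_common(3))
```

Digrams are counted within a sentence only, so the last character of one sentence and the first of the next do not form a pair in this sketch.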
As I do not know Bulgarian and know very few Cyrillic letters, the hardest part for me is to pick the characters for the Bulgarian layout and to enter them correctly into the configuration file. As my source of information, I use Keyboard layouts for Bulgarian language writing devices. This article describes Galabov and CIA, and it provides the Unicode codepoints explicitly.
On the less-than key, both layouts have an ѝ, which does not appear in the corpus and, therefore, can be left out. We want all letters, as well as period and comma, in the layout, and to this end include every key to which such a symbol is mapped in either layout. Compared to the standard configuration file, we need four additional keys:
Taste TLDE 0 0 -0.25 0 -5 - 10
Taste AE12 14 0 11.75 0 5 - 10
Taste AD12 14 1 12.25 1 5 - 8
Taste AB12 14 3 13.00 3 5 - 10
TLDE is in the upper left corner, AE12 is to the left of backspace, AD12 is the key to the upper left of return (ISO), and AB12 is to the right of the right shift key. The key names TLDE, AE12, AD12, and AB12 are inspired by xkeyboard-config, but the names do not matter for our purposes.
Unfortunately, this is not sufficient to describe both layouts in a uniform manner. The symbols э and ы are missing from CIA and, furthermore, the pairing of symbols is not identical. Where the pairings conflict, I have decided in favour of Galabov. Therefore, I use
Zeichen '()' # TLDE
Zeichen '.€' # AE12
Zeichen ',ы' # AD01
Zeichen ';§' # AD12
Zeichen :'": # AB12
The remaining symbols pose no problem. The comments above denote the position of the symbols in Galabov. For э, the round parentheses, the semicolon, and the quotation marks it is not clear where they should be mapped to in CIA. More on this topic below.
In the standard configuration, 35 keys are used. With the four additional keys, but without the space key, we now need 38 keys and, therefore, must compile the optimiser accordingly:
g++ -std=c++11 -Ofast -DTASTENZAHL=38 -DNDEBUG -DENGLISH -DMIT_THREADS -pthread opt.cc -o opt38
CIA uses a phonetic relation between Latin and Cyrillic letters. We can express such relations using Ersatz. For example,
means that the Latin “p” is typed as the Cyrillic “п”. Therefore, we can evaluate how well a Cyrillic layout is suited for entering an English text, assuming this phonetic relation is used.
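As a rough illustration of such a phonetic relation, the following Python sketch pairs the letter rows of QWERTY with the letter rows of CIA as they are printed in the optimiser output in this article. Whether bulgarisch.cfg uses exactly these pairs for every letter is an assumption; only the pairs for “p”, “t”, and “h” are confirmed by the text. The names PHONETIC and to_cyrillic are made up for this example.

```python
# Letter rows of QWERTY and CIA, key by key, restricted to the 26 Latin letters.
# Assumption: the phonetic relation is simply "same key position".
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
CIA_ROWS = ["чшертъуиоп", "асдфгхйкл", "зжцвбнм"]

PHONETIC = {
    latin: cyrillic
    for latin_row, cyrillic_row in zip(QWERTY_ROWS, CIA_ROWS)
    for latin, cyrillic in zip(latin_row, cyrillic_row)
}


def to_cyrillic(text: str) -> str:
    """Replace each Latin letter by its Cyrillic partner; keep everything else."""
    return "".join(PHONETIC.get(ch, ch) for ch in text.lower())


print(to_cyrillic("the"))  # the Latin digram "th" becomes the Cyrillic "тх"
```

Typing an English text on a Cyrillic layout then amounts to typing the transliterated text, which is what makes the evaluation with an English corpus possible.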
If you want to create graphics, you must select a font that contains the required symbols. I have chosen Fira:
Zeichenfont FiraMono-Medium
Beschreibungsfont FiraSans-Book
Strictly speaking, only Zeichenfont matters here, as only the digits are taken from Beschreibungsfont. You should take care that the PostScript interpreter can find the fonts. For Ghostscript, one can set the environment variable GS_FONTPATH accordingly:
In general, using an extended set of symbols is not that easy. The problem is that one must tell PostScript which glyph to use for a particular symbol. Glyphs have names, and standards exist for these names; unfortunately, there are several of them. Two of them are described in the Adobe glyph lists. According to these, the glyph for Д can have the name Decyrillic or afii10021. As a third possibility, the glyph name can be formed from the prefix uni, followed by the code point as a hexadecimal number. In the case of Д, this gives uni0414. This third possibility is the most programmer-friendly one and is assumed by the optimiser as the default. Luckily, Fira uses this convention and, therefore, we need not mess around with the others. Otherwise, we would have had to append a glyph name to each Zeichen line, for example, like this:
Zeichen 'дД' Decyrillic
You can find out the glyph names in a font using otfinfo from the LCDF Typetools.
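The uni convention is easy to reproduce. The following Python sketch derives the name from the code point; the function name is made up for this example, and the sketch covers only code points that fit into four hexadecimal digits, which is sufficient for Cyrillic.

```python
def uni_glyph_name(symbol: str) -> str:
    """Return the glyph name in the "uni" + four uppercase hex digits form."""
    code_point = ord(symbol)
    if code_point > 0xFFFF:
        # Outside the range this sketch handles; not needed for Cyrillic.
        raise ValueError("code point does not fit into four hex digits")
    return f"uni{code_point:04X}"


print(uni_glyph_name("Д"))  # uni0414, as stated in the text
```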
At first glance, DHIATENSOR seems trivial. However, the photographs of the Blickensderfer 5 show that only one Shift key is present. We account for this with a trick: each key gets only one level, with a lowercase letter, and uppercase letters are treated as dead key combinations. As the symbol for the dead key, we use @, as this symbol does not occur in the English corpus. The optimiser requires two shift keys to be present in the layout. As all keys have only one level, these shift keys are simply never used.
Zeichen 'a'
Zeichen '@'
Ersatz 'A@a'
and so on. For the optimisation run to determine a reference layout, the dead key has been fixed. We also test a configuration variant with normal shift keys.
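Writing such lines by hand for the whole alphabet is tedious, so one might generate them. The following Python sketch produces lines of the form shown above for the letters a to z; the function name is made up, and a real configuration file needs more than this (for example, the period and the key definitions), which the sketch does not attempt to cover.

```python
import string

DEAD_KEY = "@"  # the symbol chosen above, as it does not occur in the English corpus


def dead_shift_lines(dead_key: str = DEAD_KEY) -> list:
    """Build one-level Zeichen lines plus Ersatz lines for uppercase letters."""
    lines = [f"Zeichen '{c}'" for c in string.ascii_lowercase]
    lines.append(f"Zeichen '{dead_key}'")
    # An uppercase letter is typed as the dead key followed by the lowercase letter.
    lines += [f"Ersatz '{c.upper()}{dead_key}{c}'" for c in string.ascii_lowercase]
    return lines


print("\n".join(dead_shift_lines()))
```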
We run the optimiser with the appropriate corpus and the new configuration file:
./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -k -t 4
After a while, we grab the last printed layout string
and copy it to our layout collection vsgalabow.txt
We enter the level-1 symbols in vsgalabow.txt according to their serial order:
The order is line-wise, from the upper left to the lower right. The reason for this is that in the configuration file, the keys have been specified in this order.
CIA is more difficult, as we cannot represent this layout exactly. For reasons of fairness, we want to place all symbols whose position is not clear as well as possible. Initially, we enter all symbols into the file CIA.txt, according to their order. For each symbol whose position is clear, we use the symbol of the second level (that is, typically, an uppercase letter). For the four symbols that are not clear, we use the symbol from level 1. The relative order of these four symbols does not matter for now. For example:
In the next step, we create variations, which differ in at most four locations from this layout, where all symbols specified with level 2 remain at their place:
./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -k -r CIA.txt -V 4
In other words, we create and evaluate all possibilities to place the four symbols in question. We take the best variation over to vsgalabow.txt.
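The number of candidates is small: four symbols on four open positions give 4! = 24 arrangements. The following Python sketch enumerates them. It illustrates only the counting, not how the optimiser generates or evaluates variations; the slot names are hypothetical placeholders for the four open key positions.

```python
from itertools import permutations

# Level-1 symbols whose position in CIA is not clear (see above).
loose_symbols = ["э", "(", ";", "'"]
# Hypothetical labels for the four key positions left open in CIA.txt.
free_slots = ["slot_1", "slot_2", "slot_3", "slot_4"]


def placements(slots, symbols):
    """Yield every assignment of the symbols to the slots, one symbol per slot."""
    for order in permutations(symbols):
        yield dict(zip(slots, order))


all_placements = list(placements(free_slots, loose_symbols))
print(len(all_placements))  # 24
```

Each of these candidates is then evaluated like any other layout, and the one with the lowest total effort is kept.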
We rely on the defaults of the optimiser and the English corpus that is included with it. We leave the umlauts in the configuration. As the corpus does not contain any, they do no harm.
We run the optimiser on our layout collection:
./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -r vsgalabow.txt -g vsgalabow.ps
CIA             592.901 total effort    348.957 positional effort     left right
ю э               8.979 same finger rp    8.947 shift same finger top 22.2 24.8
чшерт ъуиопящ    50.164 hand alternat.   48.746 shift hand alter. mid 21.3  7.5
асдфг хйкл'(      0.820 inward/outward   40.477 inward or outward bot 10.7 13.3
зжцвб нм,.; ь    19.379 adjacent         10.954 shift adjacent    sum 54.3 45.7
16.6  5.7 12.3 19.7 --.- --.- 12.7 12.7 12.8  7.6                  Sh  1.7  1.8

Galabow         378.591 total effort    277.552 positional effort     left right
( .               3.255 same finger rp    5.970 shift same finger top 19.6 14.5
,уеиш щксдзц;    78.539 hand alternat.   26.643 shift hand alter. mid 22.2 23.0
ьяаож гтнвмч      0.775 inward/outward   17.827 inward or outward bot  5.2 14.0
юйъэф хпрлб '    11.459 adjacent          6.804 shift adjacent    sum 47.0 53.0
 4.0  3.4 21.4 18.3 --.- --.- 15.8 16.4 10.4 10.4                  Sh  2.8  0.7

Optimal         247.325 total effort    188.220 positional effort     left right
ь э               1.598 same finger rp    7.216 shift same finger top  5.8 12.0
йъя.ц фкдлжгш    76.096 hand alternat.   33.134 shift hand alter. mid 38.9 29.1
иаеоз внтрсч      1.326 inward/outward   21.928 inward or outward bot  5.2  9.1
('у,ю пбмщх ;     7.069 adjacent         16.878 shift adjacent    sum 49.9 50.1
11.5 13.5 11.3 13.6 --.- --.- 18.6 13.0  8.6 10.0                  Sh  2.3  1.3
Of course, one can always question the absolute effort values. It is more interesting to focus on the criteria typical for Dvorak. Compared to CIA, Galabov reduces same finger repetitions to about a third (factor 2.8), and they could be halved once more. Adjacent finger usage is almost halved (factor 1.7) and could be reduced by a further 40% or so. The ratio of inward to outward motions is smaller for Galabov than for CIA; however, because of the larger number of hand alternations, there are fewer outward motions in absolute terms. Presumably, this aspect did not bother Galabov very much. The “optimal” layout also increases the ratio of inward to outward motions considerably. Galabov achieves a very high number of hand alternations, even more than the optimal layout.
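The statement about outward motions can be checked with a little arithmetic. The following Python sketch assumes that “inward/outward” is the ratio of inward to outward motions and that “inward or outward” is their combined share, in the units of the table; this reading of the two columns is my assumption, and the function name is made up for this example.

```python
def split_motions(ratio: float, combined: float):
    """Split a combined share into (inward, outward), given ratio = inward / outward."""
    outward = combined / (1.0 + ratio)
    return combined - outward, outward


# (inward/outward, inward or outward), taken from the Bulgarian evaluation above.
bulgarian = {
    "CIA": (0.820, 40.477),
    "Galabov": (0.775, 17.827),
    "Optimal": (1.326, 21.928),
}

for name, (ratio, combined) in bulgarian.items():
    inward, outward = split_motions(ratio, combined)
    print(f"{name:8s} inward {inward:5.1f}  outward {outward:5.1f}")
```

Under this reading, Galabov has roughly 10 outward motions where CIA has roughly 22, although its ratio of inward to outward motions is slightly lower.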
To create the picture for Galabov used in this article, we first turn off the little keyboards in the PostScript file vsgalabow.ps, as they create a lot of detail which will not be properly resolved:
/mitmini false def % Zeige Minitastatur unter Buchstaben
When looking at the pictures in PostScript or PDF format, one can leave the little keyboards enabled, as it is possible to scale the images and, thereby, make the details discernible.
We first take a look at the layouts with a normal physical layout:
./opt35 -2 englisch.txt -r vsdvorak.txt -g vsdvorak.ps
QWERTY          521.623 total effort    338.719 positional effort     left right
                  6.803 same finger rp    5.978 shift same finger top 28.0 20.2
qwert yuiopü     52.746 hand alternat.   41.714 shift hand alter. mid 22.1  9.5
asdfg hjklöä      1.074 inward/outward   37.871 inward or outward bot  6.8 13.3
zxcvb nm,.ß      21.616 adjacent         11.945 shift adjacent    sum 56.9 43.1
 9.1  8.4 18.5 20.9 --.- --.- 18.4  8.9 12.1  3.6                  Sh  1.1  1.7

Dvorak          280.208 total effort    202.011 positional effort     left right
                  2.680 same finger rp   12.393 shift same finger top  6.0 16.8
ä,.py fgcrlö     70.460 hand alternat.   34.503 shift hand alter. mid 36.1 30.5
aoeui dhtnsß      1.601 inward/outward   24.280 inward or outward bot  3.0  7.6
üqjkx bmwvz      11.129 adjacent         19.799 shift adjacent    sum 45.1 54.9
 9.7  8.3 13.0 14.1 --.- --.- 16.5 13.3 13.7 11.4                  Sh  1.8  0.9

Optimal         220.795 total effort    182.572 positional effort     left right
                  0.762 same finger rp    2.044 shift same finger top  5.7 13.6
jyu.q zmldbp     68.797 hand alternat.   35.700 shift hand alter. mid 40.0 31.1
sieao hnrtcg      1.113 inward/outward   27.862 inward or outward bot  3.0  6.7
äöü,ß fxwkv       9.288 adjacent         10.195 shift adjacent    sum 48.7 51.3
 8.4  8.7 14.2 17.4 --.- --.- 16.4 11.5 13.1 10.3                  Sh  1.8  1.0
Compared to QWERTY, Dvorak reduces same finger repetitions by a factor of 2.5. The amount of same finger repetition is smaller than for Galabov; however, the values were obtained for different languages and are not directly comparable. Apparently, Bulgarian is more difficult to type than English. Regarding same finger repetition, Galabov beats CIA more clearly than Dvorak beats QWERTY. Same finger repetitions could be reduced by a further factor of 3.5 compared to Dvorak; that is, Dvorak does not come as close to the limit of feasibility as Galabov does. Dvorak has about half as much adjacent finger usage as QWERTY (factor 1.9), and the possible further reduction is not very large (less than 20%). Dvorak is clearly inward-dominant and, in this respect, is more Dvorak-y than Galabov.
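The factors quoted for both languages follow directly from the tables. The following Python sketch recomputes them from the “same finger rp” and “adjacent” values of the two evaluations; the dictionary layout and the function name are made up for this example.

```python
# (Sholes-type layout, alternative layout, self-optimised layout)
same_finger = {
    "Bulgarian": (8.979, 3.255, 1.598),  # CIA, Galabov, Optimal
    "English": (6.803, 2.680, 0.762),    # QWERTY, Dvorak, Optimal
}
adjacent = {
    "Bulgarian": (19.379, 11.459, 7.069),
    "English": (21.616, 11.129, 9.288),
}


def factors(sholes: float, alternative: float, optimal: float):
    """Return (improvement over Sholes, remaining factor down to the optimum)."""
    return sholes / alternative, alternative / optimal


for name, table in (("same finger", same_finger), ("adjacent", adjacent)):
    for language, values in table.items():
        gained, remaining = factors(*values)
        print(f"{name:11s} {language:9s} gained {gained:4.2f}  remaining {remaining:4.2f}")
```

A remaining factor of 1.62 for Bulgarian adjacent finger usage corresponds to the “about 40%” further reduction, and 1.20 for English to the “less than 20%”.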
For comparison, DHIATENSOR, using a dead shift key:
./opt30 -2 englisch.txt -K dhiatensor.cfg -r vsdhiatensor.txt -g dhiatensor.ps
DHIATENSOR      400.832 total effort    195.146 positional effort     left right
                  7.272 same finger rp      nan shift same finger top  2.9  2.7
zxkg bvqj        51.806 hand alternat.      nan shift hand alter. mid  9.6 11.1
.pwfu lcmy        1.026 inward/outward   38.396 inward or outward bot 35.2 38.4
@dhiat ensor     14.728 adjacent            nan shift adjacent    sum 47.8 52.2
14.8  9.0 10.8 13.2 --.- --.- 17.2 10.9  9.2 14.9                  Sh  0.0  0.0

Optimal         250.594 total effort    198.583 positional effort     left right
                  1.227 same finger rp      nan shift same finger top  6.6  2.2
qmfg .zjk        68.173 hand alternat.      nan shift hand alter. mid 12.0 13.4
xvlcd ouyp        3.119 inward/outward   28.073 inward or outward bot 32.7 33.2
@wrnst aeihb      9.714 adjacent            nan shift adjacent    sum 51.3 48.7
11.8 13.3 11.7 14.5 --.- --.- 16.3 14.5  8.9  9.0                  Sh  0.0  0.0
and using two shift keys:
./opt29 -K dhiatensor2.cfg -2 englisch.txt -r vsdhiatensor2.txt -g dhiatensor2.ps
DHIATENSOR      405.967 total effort    200.796 positional effort     left right
                  7.061 same finger rp   12.689 shift same finger top  2.9  2.7
zxkg bvqj        52.071 hand alternat.   50.175 shift hand alter. mid  9.6 11.1
.pwfu lcmy        0.947 inward/outward   38.249 inward or outward bot 33.7 39.9
dhiat ensor      14.762 adjacent          5.707 shift adjacent    sum 46.2 53.8
13.2  9.0 10.8 13.2 --.- --.- 17.2 10.9  9.2 16.5                  Sh  1.3  1.6

Optimal         248.166 total effort    201.153 positional effort     left right
                  0.712 same finger rp    1.819 shift same finger top  5.0  1.6
qkwm .zjx        64.886 hand alternat.   39.012 shift hand alter. mid 16.1 13.5
bpdlh ouyf        1.104 inward/outward   31.783 inward or outward bot 27.7 36.1
gctrn aeisv       9.866 adjacent          9.411 shift adjacent    sum 48.8 51.2
 9.7 13.3 11.7 14.2 --.- --.- 16.3 14.5  8.9 11.5                  Sh  1.1  1.7
DHIATENSOR does very badly regarding same finger repetitions. There is no way this layout can be considered a precursor of Dvorak. It is a typical beginner’s layout, which is concerned with letter frequencies but pays little attention to digrams.
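Both halves of this verdict can be read off the numbers of the dead shift variant. The following Python sketch adds up the row loads and relates the same finger repetitions to the reference values from the tables; the variable names are made up, and the comparison with QWERTY is only indicative, as that value was obtained with a different key configuration.

```python
# Row loads (left, right) of DHIATENSOR with a dead shift key, from the table above.
row_load = {"top": (2.9, 2.7), "mid": (9.6, 11.1), "bot": (35.2, 38.4)}
same_finger_dhiatensor = 7.272
same_finger_reference = {"QWERTY": 6.803, "Optimal (dead shift)": 1.227}

# Share of keystrokes on the row that carries the most frequent letters.
bottom_share = sum(row_load["bot"])
print(f"bottom row load: {bottom_share:.1f}")

for name, value in same_finger_reference.items():
    print(f"same finger repetitions relative to {name}: {same_finger_dhiatensor / value:.2f}")
```

Almost three quarters of the load sits on one row, which shows the attention paid to letter frequencies, while the same finger repetitions are slightly worse than QWERTY's and nearly six times those of the reference layout.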
Thanks to the phonetic relations in bulgarisch.cfg, we can evaluate the Bulgarian layouts also for an English corpus:
./opt38 -2 englisch.txt -K bulgarisch.cfg -r vsgalabow.txt -g en_vsgalabow.ps
CIA             525.345 total effort    338.999 positional effort     left right
ю э               6.817 same finger rp    5.932 shift same finger top 27.8 20.0
чшерт ъуиопящ    52.899 hand alternat.   42.296 shift hand alter. mid 21.9 10.1
асдфг хйкл'(      1.063 inward/outward   37.732 inward or outward bot  7.1 13.2
зжцвб нм,.; ь    21.422 adjacent         11.423 shift adjacent    sum 56.7 43.3
 9.4  8.3 18.3 20.7 --.- --.- 18.2  8.8 12.0  4.3                  Sh  1.4  1.7

Galabow         557.450 total effort    316.195 positional effort     left right
( .              10.710 same finger rp    3.346 shift same finger top 23.9 14.0
,уеиш щксдзц;    67.331 hand alternat.   37.888 shift hand alter. mid 14.9 20.6
ьяаож гтнвмч      0.967 inward/outward   19.407 inward or outward bot  5.9 19.4
юйъэф хпрлб '    12.939 adjacent          7.244 shift adjacent    sum 44.8 55.2
 3.3  2.7 21.0 17.8 --.- --.- 17.8 19.0  8.5  9.8                  Sh  2.1  1.0

Optimal         323.691 total effort    208.502 positional effort     left right
ь э               3.147 same finger rp    8.430 shift same finger top  6.0 14.1
йъя.ц фкдлжгш    69.810 hand alternat.   37.308 shift hand alter. mid 33.3 28.6
иаеоз внтрсч      0.561 inward/outward   24.491 inward or outward bot  6.1 11.9
('у,ю пбмщх ;    10.927 adjacent         12.170 shift adjacent    sum 45.4 54.6
 8.9 10.0 14.1 12.4 --.- --.- 13.8 14.7  9.7 16.3                  Sh  1.8  1.3
Here, Galabov does not score well. There are many same finger repetitions, mainly because of “th”, which gets mapped to the Cyrillic “тх” and is typed with one finger.
For Bulgarian, Galabov is clearly superior to CIA. Someone who writes a lot of English and a little Bulgarian, and who insists on relying on the phonetic relation between the layouts used, is better off with CIA. However, it is possible to create layouts that are better for both languages than both Galabov and CIA. Whoever recommends that Bulgaria adopt CIA does not do so in the country’s interest.
To a large degree, Galabov’s layout meets Dvorak’s goals, and it is at least as well made as Dvorak. Our only source did not exaggerate. The work of Galabov (and his coworkers) is pioneering.
Why does this layout not get more recognition? Galabov was a stenographer, a practitioner. Along with his layout, he published instructions for typewriting. Presumably, he did not publish scientific works in which he described the criteria used to design his layout and justified them experimentally. It is this kind of investigation that is the contribution of Dvorak (and his coworkers). For this reason, it is fine to mention Dvorak in the context of keyboard layouts; after the true pioneer, Teodor Galabov.