
Teodor Galabov’s Bulgarian keyboard layout

Introduction

The German-language Wikipedia mentions the Bulgarian keyboard layout by Teodor Galabov from the year 1907 as an early “alternative” layout. An ergonomic keyboard layout, 25 years before Dvorak? The Wikipedia entry refers to the article The Bulgarian Alphabet and Keyboard in the Context of EU Communications. According to Wikipedia, this article was published by the working group MEEK of the Comité Européen de Normalisation. In the meantime, it has disappeared from its original location, and the working group MEEK has been disbanded. The article itself names no authors. Section 5 at least refers to an investigation (ANABELA) of the layout from the year 2007 and mentions some names of participants, but references to related publications are missing.

A vanished article by unknown authors, which does not cite its sources clearly, published by a disbanded working group: it is sad how little is known about Galabov’s work in the West. At least the layout itself is well known, because it is the Bulgarian standard. Thanks to this, we are able to judge for ourselves whether Galabov’s work was a pioneering act, or whether our only source exaggerates its merits. We proceed as follows: We compare Galabov’s layout to a current competitor (a phonetic transcription of QWERTY to Cyrillic, called “CIA” in what follows) and to a self-optimised layout, using a Bulgarian corpus. CIA is roughly what the Bulgarians would have if they had simply ripped off Sholes, as the rest of the world did. In parallel, Dvorak is compared to QWERTY and to a self-optimised layout, using an English corpus. Then we investigate how much Galabov and Dvorak improve on the Sholes layouts, and how far they fall short of what is “feasible”. We judge feasibility by the self-optimised layouts. As the optimisation takes different criteria into account, these layouts are certainly not the absolute limit with respect to each individual criterion, but they demonstrate what is possible in a well-balanced layout. Finally, we take a look at DHIATENSOR, an English layout of 1893 or even earlier, to investigate whether this layout was already “ergonomic” by Dvorak’s standards.

Prerequisites

I assume that you have downloaded the optimiser and are roughly familiar with it. The optimiser and its documentation can be found on the overview page. For the present article, I have used version 1.227. Here you can find a collection of files that help to reproduce this article more easily:

Download supporting material

My web pages do not instruct your browser about which fonts it should use. If the fonts in your default settings do not support Cyrillic, this article might be displayed incorrectly. This should not be a big problem, as there are plenty of free fonts that cover Cyrillic and work well for on-screen usage, for example, the DejaVu family.

The Bulgarian corpus

A good starting point for corpora is the collection of the Universität Leipzig, which contains large corpora for many languages. I chose bul_news_2007_30K-sentences.text.tar.gz, 30000 sentences taken from newspaper articles. The file is already UTF-8 encoded, just as required by the optimiser. Each sentence appears on a line of its own, preceded by a number and a tab character. We must get rid of the number, which is easy with Unix text tools:

cut -f2- bul_news_2007_30K-sentences.txt > bulgarisch.txt

Now we can create the frequency tables:

./opt bulgarisch.txt

The corpus is large and, therefore, we do not worry about statistical errors.
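As a cross-check that is independent of the optimiser, the preprocessing step and a plain character count can be sketched in Python. This is only an illustration and not how the optimiser builds its frequency tables; the function names and the two sample lines are invented for this example.

```python
from collections import Counter

def strip_sentence_numbers(lines):
    """Drop the first tab-separated field of each line, like `cut -f2-`."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        # cut passes lines without a delimiter through unchanged; do the same
        number, sep, rest = line.partition("\t")
        cleaned.append(rest if sep else line)
    return cleaned

def count_characters(sentences):
    """Count how often each character occurs in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts

# Invented sample in the corpus format: number, tab, sentence
sample = ["1\tДобър ден.", "2\tДобро утро."]
sentences = strip_sentence_numbers(sample)
counts = count_characters(sentences)
print(counts["о"], counts["ѝ"])  # a Counter returns 0 for absent characters
```

Run on the full corpus file, such a count shows which symbols a layout really has to cover and which ones never occur.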

The Bulgarian set of characters and the physical keyboard layout

As I do not know Bulgarian, and as I know very few Cyrillic letters, the hardest part for me is to pick the characters for the Bulgarian layout and to enter them correctly into the configuration file. As my source of information, I use Keyboard layouts for Bulgarian language writing devices. This article describes Galabov and CIA, and it provides the Unicode code points explicitly.

Physical layout

On the less-than key, both layouts have an ѝ, which does not appear in the corpus and, therefore, can be left out. We want to have all letters, as well as period and comma, in the layout, and to this end include every key to which such a symbol is mapped in one or the other layout. Compared to the standard configuration file, we need four additional keys:

Taste  TLDE   0  0   -0.25  0   -5   -  10
Taste  AE12  14  0   11.75  0    5   -  10
Taste  AD12  14  1   12.25  1    5   -   8
Taste  AB12  14  3   13.00  3    5   -  10

TLDE is in the upper left corner, AE12 is to the left of backspace, AD12 is the key to the upper left of return (ISO), and AB12 is to the right of the right shift key. The key names TLDE, AE12, AD12, and AB12 are inspired by xkeyboard-config, but they do not matter for our purposes.

Set of symbols

Unfortunately, this is not sufficient to describe both layouts in a uniform manner. The symbols э and ы are missing from CIA and, furthermore, the pairing of symbols is not identical. Where the pairings conflict, I have decided in favour of Galabov. Therefore, I use:

Zeichen '()' # TLDE
Zeichen '.€' # AE12
Zeichen ',ы' # AD01
Zeichen ';§' # AD12
Zeichen :'": #"AB12

The remaining symbols pose no problem. The comments above denote the position of the symbols in Galabov. For э, the round parentheses, the semicolon, and the quotation marks it is not clear where they should be mapped to in CIA. More on this topic below.

In the standard configuration, 35 keys are used. With the four additional keys, but without the space key, we now need 38 keys and, therefore, must compile the optimiser accordingly:

g++ -std=c++11 -Ofast -DTASTENZAHL=38 -DNDEBUG -DENGLISH -DMIT_THREADS -pthread opt.cc -o opt38

Phonetic relations

CIA uses a phonetic relation between Latin and Cyrillic letters. We can express such relations using Ersatz. For example,

Ersatz 'pп'

means that the Latin “p” is typed as the Cyrillic “п”. Therefore, we can evaluate how well a Cyrillic layout is suited for entering an English text, assuming this phonetic relation is used.
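The effect of such relations on a text can be pictured with a small Python sketch. It only illustrates the idea of the substitution and is not the optimiser's code; the mapping is a hypothetical three-letter excerpt built from pairs mentioned in this article, not the contents of bulgarisch.cfg.

```python
# Hypothetical excerpt of a Latin-to-Cyrillic phonetic mapping; the real
# relations are the Ersatz lines in the configuration file.
PHONETIC = {"p": "п", "t": "т", "h": "х"}

def to_cyrillic_keys(text, mapping=PHONETIC):
    """Replace each Latin letter by the Cyrillic symbol that is typed for it.

    Characters without an entry in the mapping are passed through unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in text)

print(to_cyrillic_keys("th"))  # тх
```

Once a text has been translated in this way, it can be scored on the Cyrillic layout like any other sequence of symbols.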

Font and glyph names for the graphics

If you want to create graphics, you must select a font that contains the required symbols. I have chosen Fira:

Zeichenfont       FiraMono-Medium
Beschreibungsfont FiraSans-Book

Strictly speaking, only Zeichenfont matters here, as only the digits are taken from Beschreibungsfont. You should take care that the PostScript interpreter can find the fonts. For Ghostscript, one can set the environment variable GS_FONTPATH accordingly:

export GS_FONTPATH=/home/myusername/myfonts

In general, using an extended set of symbols is not that easy. The problem is that one must tell PostScript which glyph to use for a particular symbol. Glyphs have names, and there are standards for glyph names; unfortunately, there is more than one. Two of them are described in the Adobe glyph lists. According to these, the glyph for Д can have the name Decyrillic or afii10021. As a third possibility, the glyph name can be formed from the prefix uni, followed by the code point as a hexadecimal number. In the case of Д, this gives uni0414. This third possibility is the most programmer-friendly one and is assumed by the optimiser as the default. Luckily, Fira uses this convention and, therefore, we need not bother with the others. Otherwise, we would have had to append a glyph name to each Zeichen line, for example, like this:

Zeichen 'дД' Decyrillic

You can find out the glyph names in a font using otfinfo from the LCDF Typetools.
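The third naming scheme is easy to reproduce. The following Python sketch builds the uni name for a symbol; the function name uni_glyph_name is made up for this example, and the sketch deliberately handles only code points that fit into four hexadecimal digits.

```python
def uni_glyph_name(symbol):
    """Build a glyph name of the form 'uni' + four-digit hex code point."""
    code_point = ord(symbol)
    if code_point > 0xFFFF:
        # This sketch does not cover symbols outside the four-digit range
        raise ValueError("only four-digit code points are handled here")
    return "uni{:04X}".format(code_point)

print(uni_glyph_name("Д"))  # uni0414
```

Comparing such names with the list printed by otfinfo shows quickly whether a font follows this convention.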

DHIATENSOR

At first glance, DHIATENSOR seems trivial. However, on the photographs of the Blickensderfer 5, one can see that only one Shift key is present. We account for this fact by a trick: each key gets only one level with a lowercase letter, and uppercase letters are treated as dead key combinations. As the symbol for the dead key, we use @, as this symbol does not occur in the English corpus. The optimiser requires two shift keys to be present in the layout; as all keys have only one level, these shift keys are simply never used. The configuration therefore contains

Zeichen 'a'
Zeichen '@'
Ersatz 'A@a'

and so on. For the optimisation run that determines a reference layout, the position of the dead key has been fixed. We also test a configuration variant with normal shift keys.
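What the dead key trick does to a text can be pictured with a few lines of Python. This is a sketch of my reading of lines like Ersatz 'A@a', not the optimiser's implementation; the function name expand_uppercase is invented for this example.

```python
DEAD_KEY = "@"  # chosen in the text because it is absent from the corpus

def expand_uppercase(text, dead_key=DEAD_KEY):
    """Rewrite each uppercase ASCII letter as dead key + lowercase letter."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(dead_key + ch.lower())
        else:
            out.append(ch)
    return "".join(out)

print(expand_uppercase("The Cat"))  # @the @cat
```

Every uppercase letter thus costs two keystrokes, and the position of @ influences the digram statistics like that of any other key.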

Creating the layouts for the comparison

Optimised layout

We run the optimiser with the appropriate corpus and the new configuration file:

./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -k -t 4

After a while, we grab the last printed layout string

245.3710 ьэйъя.,фкдлжгхиаеозвнтрсч('юу;пбмщцш

and copy it to our layout collection vsgalabow.txt

ьэйъя.,фкдлжгхиаеозвнтрсч('юу;пбмщцш  Optimal

Galabov

We enter the level-1 symbols in vsgalabow.txt according to their serial order:

(.,уеишщксдзц;ьяаожгтнвмчюйъэфхпрлб'  Galabow

The order is line-wise, from the upper left to the lower right. The reason for this is that in the configuration file, the keys have been specified in this order.

CIA

CIA is more difficult, as we cannot represent this layout exactly. For reasons of fairness, we want to place all symbols for which the position is not clear as well as possible. Initially, we enter all symbols into the file CIA.txt, according to their order. For this, we use the symbol of the second level (that is, typically, an uppercase letter) for each symbol for which the position is clear. For the four symbols that are not clear, we use the symbol from level 1. The relative order of these four symbols does not matter for now. For example:

Ю(ЧШЕРТЪУИОПЯЩАСДФГХЙКЛ;'ЗЖЦВБНМы€эѝ  CIA

In the next step, we create variations, which differ in at most four locations from this layout, where all symbols specified with level 2 remain at their place:

./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -k -r CIA.txt -V 4

In other words, we create and evaluate all possibilities to place the four symbols in question. We take the best variation over to vsgalabow.txt.

юэчшертъуиопящасдфгхйкл'(зжцвбнм,.;ь  CIA
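The idea of trying every placement of the unclear symbols can be pictured as follows. This is a sketch only and not the optimiser's implementation of -V; the toy layout, the set of free symbols, and the function name placements are invented for illustration.

```python
from itertools import permutations

def placements(layout, free_symbols):
    """Yield every layout in which the free symbols are permuted among the
    positions they occupy, while all other symbols stay where they are."""
    slots = [i for i, ch in enumerate(layout) if ch in free_symbols]
    for perm in permutations(layout[i] for i in slots):
        candidate = list(layout)
        for slot, symbol in zip(slots, perm):
            candidate[slot] = symbol
        yield "".join(candidate)

# Toy layout: uppercase letters are fixed, the four lowercase ones are free
candidates = list(placements("AwBxCyDz", set("wxyz")))
print(len(candidates))  # 4! = 24 placements
```

Each candidate would then be evaluated on the corpus, and the one with the lowest total effort kept.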

English layouts

We rely on the defaults of the optimiser and the English corpus that is included with it. We leave the umlauts in the configuration. As the corpus does not contain any, they do no harm.

Results

Bulgarian layouts for Bulgarian

We run the optimiser on our layout collection:

./opt38 -2 bulgarisch.txt -K bulgarisch.cfg -r vsgalabow.txt -g vsgalabow.ps

CIA              592.901 total effort   348.957 positional effort    left right
ю             э    8.979 same finger rp   8.947 shift same finger top 22.2 24.8
  чшерт ъуиопящ   50.164 hand alternat.  48.746 shift hand alter. mid 21.3  7.5
  асдфг хйкл'(     0.820 inward/outward  40.477 inward or outward bot 10.7 13.3
  зжцвб нм,.; ь   19.379 adjacent        10.954 shift adjacent    sum 54.3 45.7
                 16.6  5.7 12.3 19.7 --.- --.- 12.7 12.7 12.8  7.6 Sh  1.7  1.8


Galabow          378.591 total effort   277.552 positional effort    left right
(             .    3.255 same finger rp   5.970 shift same finger top 19.6 14.5
  ,уеиш щксдзц;   78.539 hand alternat.  26.643 shift hand alter. mid 22.2 23.0
  ьяаож гтнвмч     0.775 inward/outward  17.827 inward or outward bot  5.2 14.0
  юйъэф хпрлб '   11.459 adjacent         6.804 shift adjacent    sum 47.0 53.0
                  4.0  3.4 21.4 18.3 --.- --.- 15.8 16.4 10.4 10.4 Sh  2.8  0.7


Optimal          247.325 total effort   188.220 positional effort    left right
ь             э    1.598 same finger rp   7.216 shift same finger top  5.8 12.0
  йъя.ц фкдлжгш   76.096 hand alternat.  33.134 shift hand alter. mid 38.9 29.1
  иаеоз внтрсч     1.326 inward/outward  21.928 inward or outward bot  5.2  9.1
  ('у,ю пбмщх ;    7.069 adjacent        16.878 shift adjacent    sum 49.9 50.1
                 11.5 13.5 11.3 13.6 --.- --.- 18.6 13.0  8.6 10.0 Sh  2.3  1.3

Of course, one can always question the absolute effort values. It is more interesting to focus on the criteria typical for Dvorak. Comparing Galabov to CIA, same finger repetitions are reduced to about a third (factor 2.8), and could be halved once more. Adjacent finger usage is almost halved (factor 1.7) and could be reduced further by about 40%. The ratio of inward to outward motions is smaller for Galabov than for CIA; however, due to the larger number of hand alternations, there are fewer outward motions. Presumably, this aspect did not bother Galabov very much. The “optimal” layout also increases the ratio of inward to outward motions considerably. Galabov achieves very many hand alternations, even more than the optimal layout.
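The factors quoted in this paragraph follow directly from the evaluation output above. A short Python sketch of the arithmetic, with the values copied from that output and helper names chosen for this example:

```python
# Values copied from the evaluation output above
same_finger = {"CIA": 8.979, "Galabov": 3.255, "Optimal": 1.598}
adjacent = {"CIA": 19.379, "Galabov": 11.459, "Optimal": 7.069}

def factor(before, after):
    """By which factor does the value shrink from before to after?"""
    return before / after

def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (1.0 - after / before)

print(round(factor(same_finger["CIA"], same_finger["Galabov"]), 1))        # 2.8
print(round(factor(same_finger["Galabov"], same_finger["Optimal"]), 1))    # 2.0
print(round(factor(adjacent["CIA"], adjacent["Galabov"]), 1))              # 1.7
print(round(reduction_percent(adjacent["Galabov"], adjacent["Optimal"])))  # 38
```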

To create the picture for Galabov used in this article, we first turn off the little keyboards in the PostScript file vsgalabow.ps, as they create a lot of detail which will not be properly resolved:

/mitmini false def       % Zeige Minitastatur unter Buchstaben

Figure: Graphical evaluation of Galabov

When looking at the pictures in PostScript or PDF format, one can leave the little keyboards enabled, as it is possible to scale the images and, thereby, make the details discernible.

English layouts for English

We first take a look at the layouts with a normal physical layout:

./opt35 -2 englisch.txt -r vsdvorak.txt -g vsdvorak.ps

QWERTY           521.623 total effort   338.719 positional effort    left right
                   6.803 same finger rp   5.978 shift same finger top 28.0 20.2
  qwert yuiopü    52.746 hand alternat.  41.714 shift hand alter. mid 22.1  9.5
  asdfg hjklöä     1.074 inward/outward  37.871 inward or outward bot  6.8 13.3
  zxcvb nm,.ß     21.616 adjacent        11.945 shift adjacent    sum 56.9 43.1
                  9.1  8.4 18.5 20.9 --.- --.- 18.4  8.9 12.1  3.6 Sh  1.1  1.7


Dvorak           280.208 total effort   202.011 positional effort    left right
                   2.680 same finger rp  12.393 shift same finger top  6.0 16.8
  ä,.py fgcrlö    70.460 hand alternat.  34.503 shift hand alter. mid 36.1 30.5
  aoeui dhtnsß     1.601 inward/outward  24.280 inward or outward bot  3.0  7.6
  üqjkx bmwvz     11.129 adjacent        19.799 shift adjacent    sum 45.1 54.9
                  9.7  8.3 13.0 14.1 --.- --.- 16.5 13.3 13.7 11.4 Sh  1.8  0.9


Optimal          220.795 total effort   182.572 positional effort    left right
                   0.762 same finger rp   2.044 shift same finger top  5.7 13.6
  jyu.q zmldbp    68.797 hand alternat.  35.700 shift hand alter. mid 40.0 31.1
  sieao hnrtcg     1.113 inward/outward  27.862 inward or outward bot  3.0  6.7
  äöü,ß fxwkv      9.288 adjacent        10.195 shift adjacent    sum 48.7 51.3
                  8.4  8.7 14.2 17.4 --.- --.- 16.4 11.5 13.1 10.3 Sh  1.8  1.0

Compared to QWERTY, Dvorak reduces the same finger repetitions by a factor of 2.5. The amount of same finger repetition is smaller than for Galabov; however, the values have been obtained for different languages and are not directly comparable. Apparently, Bulgarian is more difficult to type than English. Regarding same finger repetition, Galabov beats CIA more clearly than Dvorak beats QWERTY. The same finger repetitions could be reduced further by a factor of 3.5 compared to Dvorak; that is, Dvorak does not approach the limit of feasibility as closely as Galabov does. Dvorak has about half as much adjacent finger use as QWERTY (factor 1.9), and the possible further reduction is not very large (less than 20%). Dvorak is clearly inward-dominant and, in this respect, is more Dvorak-y than Galabov.
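Because the absolute values of the two languages cannot be compared, it helps to look at ratios within each language. The following Python sketch puts the same finger repetition values from the tables of this article side by side; the dictionary keys and the function name gains are chosen for this example only.

```python
# Same finger repetition values copied from the tables in this article
bulgarian = {"sholes": 8.979, "alternative": 3.255, "optimal": 1.598}  # CIA, Galabov
english = {"sholes": 6.803, "alternative": 2.680, "optimal": 0.762}    # QWERTY, Dvorak

def gains(values):
    """Return (improvement over the Sholes-type layout, remaining headroom)."""
    improvement = values["sholes"] / values["alternative"]
    headroom = values["alternative"] / values["optimal"]
    return improvement, headroom

for name, values in (("Galabov", bulgarian), ("Dvorak", english)):
    improvement, headroom = gains(values)
    print(f"{name}: improvement {improvement:.1f}, headroom {headroom:.1f}")
# Galabov: improvement 2.8, headroom 2.0
# Dvorak: improvement 2.5, headroom 3.5
```

A larger improvement combined with a smaller headroom is what is meant by Galabov coming closer to the limit of feasibility than Dvorak.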

For comparison, DHIATENSOR, using a dead shift key:

./opt30 -2 englisch.txt -K dhiatensor.cfg -r vsdhiatensor.txt -g dhiatensor.ps

DHIATENSOR       400.832 total effort   195.146 positional effort    left right
                   7.272 same finger rp     nan shift same finger top  2.9  2.7
   zxkg bvqj      51.806 hand alternat.     nan shift hand alter. mid  9.6 11.1
  .pwfu lcmy       1.026 inward/outward  38.396 inward or outward bot 35.2 38.4
 @dhiat ensor     14.728 adjacent           nan shift adjacent    sum 47.8 52.2
                 14.8  9.0 10.8 13.2 --.- --.- 17.2 10.9  9.2 14.9 Sh  0.0  0.0


Optimal          250.594 total effort   198.583 positional effort    left right
                   1.227 same finger rp     nan shift same finger top  6.6  2.2
   qmfg .zjk      68.173 hand alternat.     nan shift hand alter. mid 12.0 13.4
  xvlcd ouyp       3.119 inward/outward  28.073 inward or outward bot 32.7 33.2
 @wrnst aeihb      9.714 adjacent           nan shift adjacent    sum 51.3 48.7
                 11.8 13.3 11.7 14.5 --.- --.- 16.3 14.5  8.9  9.0 Sh  0.0  0.0

and using two shift keys:

./opt29 -K dhiatensor2.cfg -2 englisch.txt -r vsdhiatensor2.txt -g dhiatensor2.ps

DHIATENSOR       405.967 total effort   200.796 positional effort    left right
                   7.061 same finger rp  12.689 shift same finger top  2.9  2.7
   zxkg bvqj      52.071 hand alternat.  50.175 shift hand alter. mid  9.6 11.1
  .pwfu lcmy       0.947 inward/outward  38.249 inward or outward bot 33.7 39.9
  dhiat ensor     14.762 adjacent         5.707 shift adjacent    sum 46.2 53.8
                 13.2  9.0 10.8 13.2 --.- --.- 17.2 10.9  9.2 16.5 Sh  1.3  1.6


Optimal          248.166 total effort   201.153 positional effort    left right
                   0.712 same finger rp   1.819 shift same finger top  5.0  1.6
   qkwm .zjx      64.886 hand alternat.  39.012 shift hand alter. mid 16.1 13.5
  bpdlh ouyf       1.104 inward/outward  31.783 inward or outward bot 27.7 36.1
  gctrn aeisv      9.866 adjacent         9.411 shift adjacent    sum 48.8 51.2
                  9.7 13.3 11.7 14.2 --.- --.- 16.3 14.5  8.9 11.5 Sh  1.1  1.7

DHIATENSOR does very badly regarding same finger repetitions. There is no way this layout can be considered a precursor of Dvorak. It is a typical beginner’s layout, which is concerned with letter frequencies but pays little attention to digrams.

Bulgarian layouts for English

Thanks to the phonetic relations in bulgarisch.cfg, we can evaluate the Bulgarian layouts also for an English corpus:

./opt38 -2 englisch.txt -K bulgarisch.cfg -r vsgalabow.txt -g en_vsgalabow.ps

CIA              525.345 total effort   338.999 positional effort    left right
ю             э    6.817 same finger rp   5.932 shift same finger top 27.8 20.0
  чшерт ъуиопящ   52.899 hand alternat.  42.296 shift hand alter. mid 21.9 10.1
  асдфг хйкл'(     1.063 inward/outward  37.732 inward or outward bot  7.1 13.2
  зжцвб нм,.; ь   21.422 adjacent        11.423 shift adjacent    sum 56.7 43.3
                  9.4  8.3 18.3 20.7 --.- --.- 18.2  8.8 12.0  4.3 Sh  1.4  1.7


Galabow          557.450 total effort   316.195 positional effort    left right
(             .   10.710 same finger rp   3.346 shift same finger top 23.9 14.0
  ,уеиш щксдзц;   67.331 hand alternat.  37.888 shift hand alter. mid 14.9 20.6
  ьяаож гтнвмч     0.967 inward/outward  19.407 inward or outward bot  5.9 19.4
  юйъэф хпрлб '   12.939 adjacent         7.244 shift adjacent    sum 44.8 55.2
                  3.3  2.7 21.0 17.8 --.- --.- 17.8 19.0  8.5  9.8 Sh  2.1  1.0


Optimal          323.691 total effort   208.502 positional effort    left right
ь             э    3.147 same finger rp   8.430 shift same finger top  6.0 14.1
  йъя.ц фкдлжгш   69.810 hand alternat.  37.308 shift hand alter. mid 33.3 28.6
  иаеоз внтрсч     0.561 inward/outward  24.491 inward or outward bot  6.1 11.9
  ('у,ю пбмщх ;   10.927 adjacent        12.170 shift adjacent    sum 45.4 54.6
                  8.9 10.0 14.1 12.4 --.- --.- 13.8 14.7  9.7 16.3 Sh  1.8  1.3

Here, Galabov does not score well. There are many same finger repetitions, mainly because of “th”, which gets mapped to the Cyrillic “тх” and is typed with one finger.

Conclusion

For Bulgarian, Galabov is clearly superior to CIA. Someone who writes a lot of English and a little Bulgarian, and who insists on relying on the phonetic relation between the layouts used, is better off with CIA. However, it is possible to create layouts that are better for both languages than both Galabov and CIA. Whoever recommends that Bulgaria adopt CIA does not do so in the country’s interest.

To a large degree, Galabov’s layout meets Dvorak’s goals, and it is at least as well made as Dvorak. Our only source did not exaggerate. The work of Galabov (and his coworkers) is pioneering.

Why does this layout not get more recognition? Galabov was a stenographer, a practitioner. Along with his layout, he published instructions for typewriting. Presumably, he did not publish scientific works in which he described the criteria used to design his layout and justified them experimentally. It is this kind of investigation that is the contribution of Dvorak (and his coworkers). For this reason, it is fine to mention Dvorak in the context of keyboard layouts; after the true pioneer, Teodor Galabov.

Version 02. Feb 2019