
Backspace and cluster deletion

Marc Durdin edited this page Sep 27, 2023 · 2 revisions

Occasionally, a text editor is released or updated with issues that affect Keyman and other language input methods, specifically in how it responds when a backspace key is pressed or a backspace event is issued.

Summary:

The problem revolves around the handling of deletion of clusters vs. individual code points. Instead of deleting only the code point preceding the insertion point, some editors delete the entire cluster when the backspace key is pressed.

In a left-to-right (LTR) script, a key sequence may insert code points that combine, such as "क" and " ्", producing the cluster (grapheme) "क्". The cursor is positioned to the right of the cluster क्. When the backspace key is pressed, the expected behaviour is to delete the code point " ्" while leaving "क" intact. However, some editors recognise the cluster and delete both code points on a single backspace key press.

A more extreme example can be seen in some languages, such as Khmer, where a single backspace can end up deleting a whole syllable which took up to 7 keystrokes to type.
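The two deletion strategies described in this summary can be sketched in Python. This is only an illustration: the function names are hypothetical, and the cluster rule is a deliberate simplification (one base character plus any trailing combining marks), whereas real editors apply fuller grapheme cluster rules.

```python
import unicodedata


def delete_code_point(text: str) -> str:
    """Backspace as the input method expects: remove one code point."""
    return text[:-1]


def delete_cluster(text: str) -> str:
    """Backspace as a cluster-deleting editor behaves (simplified).

    Assumption: a cluster is one base character plus the combining
    marks that follow it. Real editors use more complete rules.
    """
    end = len(text)
    # Step back over trailing combining marks (categories Mn, Mc, Me).
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    # Then remove the base character the marks were attached to.
    return text[: max(end - 1, 0)]


ka_virama = "\u0915\u094D"  # क followed by the virama sign

after_code_point = delete_code_point(ka_virama)
after_cluster = delete_cluster(ka_virama)

# Show what is left in each case as a list of code points.
print([f"U+{ord(c):04X}" for c in after_code_point])
print([f"U+{ord(c):04X}" for c in after_cluster])
```

With code point deletion the base letter survives; with the simplified cluster deletion the buffer is left empty.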

Why is this a problem?

There are two reasons:

  1. It is unfriendly to the end user, because the backspacing does not match the user's input expectations, nor conform to the pattern of the input method.
  2. For applications that are 'non-compliant' with modern input protocols/APIs, Keyman runs in a legacy compatibility mode, in which it emits backspace key events to delete characters one codepoint at a time. Applications which delete multiple codepoints from a single backspace event will break this legacy compatibility mode.

Legacy compatibility mode details

Keyman works as a rules-based input method: it maintains the context of what has been entered. If a certain codepoint sequence matches a rule, Keyman deletes the correct number of already-output codepoints, using backspaces, and replaces them with a new codepoint or sequence of codepoints. When an application deletes a whole cluster rather than a single codepoint for each backspace, too many characters are removed before the new characters are inserted. The end result for the user is jumbled text. See the Malayalam example below.
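A minimal simulation of this failure, assuming a simplified model of the application's text buffer, is sketched below. None of these names are Keyman APIs: `apply_rule` and the two backspace handlers are hypothetical, and the cluster rule (one base character plus trailing combining marks) is a simplification. The code points are those used in the Malayalam example on this page.

```python
import unicodedata


def app_deletes_code_point(buffer: str) -> str:
    """An application that removes one code point per backspace."""
    return buffer[:-1]


def app_deletes_cluster(buffer: str) -> str:
    """An application that removes a whole cluster per backspace.

    Assumption: a cluster is a base character plus trailing marks.
    """
    end = len(buffer)
    while end > 0 and unicodedata.category(buffer[end - 1]).startswith("M"):
        end -= 1
    return buffer[: max(end - 1, 0)]


def apply_rule(buffer, backspaces, output, on_backspace):
    """Hypothetical model of a rule firing in legacy compatibility mode:
    send `backspaces` backspace events, then insert `output`."""
    for _ in range(backspaces):
        buffer = on_backspace(buffer)
    return buffer + output


def hex_points(text: str) -> str:
    return " ".join(f"{ord(c):04X}" for c in text)


# Context after typing "spe": 0D38 0D4D 0D2A 0D46
context = "\u0D38\u0D4D\u0D2A\u0D46"

# The second "e" replaces the vowel sign: one backspace, then U+0D40.
good = apply_rule(context, 1, "\u0D40", app_deletes_code_point)
bad = apply_rule(context, 1, "\u0D40", app_deletes_cluster)

print(hex_points(good))
print(hex_points(bad))
```

In this model the input method sends the same single backspace in both cases; only the application's interpretation of that backspace differs, which is why the input method cannot easily compensate.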

Correct Behaviour

In Windows, the base-level behaviour with backspace is to delete a single codepoint. The decision of how much to delete with backspace should be the responsibility of the input method. This behaviour is the norm in Windows edit and richedit controls, Microsoft applications, including Office and Edge, and the majority of third-party applications.

Example: Malayalam

സ്പീ

Correct

| Malayalam | Key | Unicode | Notes |
| --- | --- | --- | --- |
| സ് | s | 0D38 0D4D | |
| സ്പ് | sp | 0D38 0D4D 0D2A 0D4D | |
| സ്പെ | spe | 0D38 0D4D 0D2A 0D46 | |
| സ്പീ | spee | 0D38 0D4D 0D2A 0D40 | 1 x bksp has deleted 1 codepoint |

Error

| Malayalam | Key | Unicode | Notes |
| --- | --- | --- | --- |
| സ് | s | 0D38 0D4D | |
| സ്പ് | sp | 0D38 0D4D 0D2A 0D4D | |
| സ്പെ | spe | 0D38 0D4D 0D2A 0D46 | |
| സ്ീ | spee | 0D38 0D4D 0D40 | 1 x bksp has deleted 2 codepoints (ERR) |

Devanagari example

a. क + ् + क → क्क

b. क + ् + त → क्त

In row a, typing क + ् → क्

क् + क → क्क

क्क + backspace → क्

Row b's output could then be obtained by entering त:

क् + त → क्त

When the whole cluster is deleted by backspace, you lose the whole cluster:

क्क + backspace → (nothing remains)

Are there exceptions in which removing a cluster is acceptable?

  • Emojis