C string preprocessor for Emoji and Asian texts

The preprocessor is an input and output text string converter inside IconEdit. The input processor finds the C strings and makes the font. The output processor modifies the C strings to use the font on a left to right display. The converted C strings can be used directly by the compiler and the display. The processors support both left to right alphabets and right to left alphabets.

Input converter for hexadecimal characters, Asian alphabets and classic 8-bit texts

The input reader converts C text strings to internal 16-bit Unicode.

  • Convert UTF-8 hexadecimal text strings to 16-bit Unicode.
  • Convert UTF-16 hexadecimal numbers in strings to 16-bit Unicode.
  • Convert UTF-32 hexadecimal numbers for emoji in strings to high plane Unicode surrogate characters.
  • Combine surrogate characters to find high plane characters such as emoji.
  • Move high plane characters to the private area in 16-bit Unicode.
  • Find combinations of characters, ligatures, and diacritics to make combined characters.
  • Find and add Arabic presentation characters.
  • Convert classic 8-bit encoded text strings to 16-bit Unicode.
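The surrogate steps in the list above follow the standard UTF-16 arithmetic. The sketch below shows that arithmetic only; the function names are made up for illustration and are not IconEdit's own code.

```c
#include <stdint.h>

/* A high (leading) surrogate lies in D800..DBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* A low (trailing) surrogate lies in DC00..DFFF. */
static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Split a high plane code point (0x10000..0x10FFFF) into a surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000u;
    *hi = (uint16_t)(0xD800u + (v >> 10));    /* upper 10 bits */
    *lo = (uint16_t)(0xDC00u + (v & 0x3FFu)); /* lower 10 bits */
}

/* Combine a valid surrogate pair back into the high plane code point. */
static uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + ((uint32_t)(hi - 0xD800u) << 10) + (uint32_t)(lo - 0xDC00u);
}
```

With these helpers, the emoji \U0001F603 splits into the pair D83D DE03, and combining that pair gives 1F603 again.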

After the input conversion, IconEdit creates all characters.

This example reads a C-like pseudocode file containing only these two lines:

wchar32 szSmile[]={L"Smiley স্মাইলি \U0001F603 !"};
wchar32 szCable[]={L"Cable Car ಕೇಬಲ್ ಕಾರು \U0001F6A1"};

The input converter ignores anything outside the double quotes:

High Plane Emoji in 16-bit Font

Above is the resulting font optimized for the text strings. The input converter moves high plane and combined characters to the private area E700 to F8FF. IconEdit always orders the characters in the font alphabetically according to Unicode. The new Unicode character value is shown above each character.
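The private area E700 to F8FF has room for 4608 moved characters. The sketch below hands out codes from that range one at a time. It is an assumption-laden illustration: the type and function names are hypothetical, and IconEdit's actual assignment, which also keeps the font sorted by Unicode value, may work differently.

```c
#include <stdint.h>

#define PRIVATE_FIRST 0xE700u /* first private code, from the text above */
#define PRIVATE_LAST  0xF8FFu /* last private code, from the text above */

/* Hypothetical allocator state: the next unused private code. */
typedef struct {
    uint32_t next;
} PrivateAllocator;

static void private_init(PrivateAllocator *a)
{
    a->next = PRIVATE_FIRST;
}

/* Return the next free private code, or 0 when the range is exhausted. */
static uint16_t private_alloc(PrivateAllocator *a)
{
    if (a->next > PRIVATE_LAST) {
        return 0;
    }
    return (uint16_t)a->next++;
}
```

The first two calls to private_alloc return E700 and E701. The return value 0 signals that a text needs more moved characters than the private area can hold.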

High Plane Emoji in 16-bit Text

This is how the text will look with the font. Only the text inside the string is in the font; the rest is there for orientation.

Output converter for hexadecimal characters

The output converter writes the internal 16-bit Unicode as C text strings to a file. Texts with right to left characters are prepared for left to right displays. The output file is linked to the font and the two should be used together by the compiler.

Convert the input text Smiley স্মাইলি \U0001F603 ! to one of the following output formats:

  • Smiley ই  ! Pure Unicode with all characters written as 16-bit Unicode. The private characters are not defined in Unicode so may be shown as block characters in a normal editor.
  • Smiley \xE700ই\xE701 \xE706 ! Pure Unicode with private characters as 16-bit hexadecimal. This makes the file easier to read for humans but makes no difference to the compiler.
  • Smiley \xE700\x0987\xE701 \xE706 ! UTF-16 hexadecimal for old editors that cannot read Unicode. This is still Unicode to the compiler.
  • Smiley \xEE\x9C\x80\xE0\xA6\x87\xEE\x9C\x81 \xEE\x9C\x86 ! UTF-8 hexadecimal for old 8-bit compilers that cannot understand Unicode strings. To the compiler, this is an 8-bit classic text. Use the UTF-8 option in the RAMTEX driver library to display the text as Unicode. This way it is possible to use 16-bit Unicode texts and fonts with an 8-bit compiler.
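The step from the UTF-16 hexadecimal format to the UTF-8 hexadecimal format is the standard UTF-8 encoding of each 16-bit value. A minimal sketch of that step follows. The function names are hypothetical, and it assumes the input holds no surrogates, since high plane characters have already been moved to the private area.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one 16-bit code point as UTF-8; returns the byte count (1 to 3). */
static size_t utf8_encode16(uint16_t cp, uint8_t out[3])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
}

/* Write cp as "\xNN" escapes into buf (13 bytes is always enough).
   Returns the number of characters written, or 0 if buf is too small.
   Note: this escapes every byte, including plain ASCII. */
static size_t utf8_hex_escape(uint16_t cp, char *buf, size_t bufsize)
{
    uint8_t bytes[3];
    size_t n = utf8_encode16(cp, bytes);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + pos, bufsize - pos, "\\x%02X", bytes[i]);
        if (w < 0 || (size_t)w >= bufsize - pos) {
            return 0;
        }
        pos += (size_t)w;
    }
    return pos;
}
```

Called with E700, utf8_hex_escape produces \xEE\x9C\x80, and with 0987 it produces \xE0\xA6\x87, matching the UTF-8 hexadecimal line above.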

Memory consumption for different output formats

UTF-16 hexadecimal and pure Unicode always use 2.0 bytes of ROM space per character.

UTF-8 hexadecimal takes up different amounts of ROM space per character depending on language and alphabet:

  • 1.0 byte per character: American English.
  • 1.1 - 1.3 bytes per character: Other languages written with the Latin alphabet.
  • 2.0 - 2.2 bytes per character: Other European and Middle Eastern languages except Arabic.
  • 2.6 - 2.9 bytes per character: Arabic and South Asian languages.
  • 3.0 bytes per character: Chinese, Japanese, and Korean.
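These averages follow from the UTF-8 rule that a 16-bit character takes 1 byte below 0080, 2 bytes below 0800, and 3 bytes otherwise. The sketch below computes the average for a given text so the figure can be checked against a real string table; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 bytes needed for one 16-bit code point. */
static size_t utf8_len16(uint16_t cp)
{
    if (cp < 0x80) {
        return 1;
    }
    if (cp < 0x800) {
        return 2;
    }
    return 3;
}

/* Average UTF-8 bytes per character for a text of n 16-bit code points.
   Returns 0.0 for an empty text. */
static double utf8_bytes_per_char(const uint16_t *text, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += utf8_len16(text[i]);
    }
    return n ? (double)total / (double)n : 0.0;
}
```

Spaces, digits, and punctuation are single-byte characters, which is why mixed texts land between the whole-number values.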

Trace characters through the process with the mouse help and blue marks

Blue marks can be set by the mouse and follow the character through all windows. Use mouse help in all windows to see how the character is created:

Font with blue highlight for the selected character.

Text with blue frame around the selected character.

Output text with private characters as 16-bit hexadecimal.

UTF-8 hexadecimal output text; the selected character takes up 3 bytes.

Both mouse help and blue marks can be turned off and on at any time.
