C string preprocessor for Emoji and Asian texts with Diacritics

The C string preprocessor is an input and output text string converter inside IconEdit.

The input processor finds the C strings in text catalogs and makes fonts.

The output processor modifies the C strings to use the fonts on a normal left to right display with minimum embedded processor overhead.

The font and the converted C strings can be used directly by the compiler and the display drivers from RAMTEX.

The C string preprocessor has diacritic support for the following languages and alphabets:

Arabic, Bangla, Bengali, Bodo, Buginese, Burmese, Cambodian, Dari, Devanagari, Farsi, Gujarati, Gurmukhi, Hebrew, Hindi, Kannada, Khmer, Konkani, Lao, Marathi, Malayalam, Myanmar, N'Ko, Oriya, Persian, Punjabi, Sinhala, Syriac, Tamil, Telugu, Thaana, Thai, Urdu.

Input C string converter for hexadecimal characters, Asian alphabets and classic 8-bit texts

The input preprocessor converts C text strings to internal 16-bit Unicode in IconEdit.

  • Convert UTF-8 hexadecimal text strings to 16-bit Unicode.
  • Convert UTF-16 hexadecimal numbers in strings to 16-bit Unicode.
  • Convert UTF-32 hexadecimal numbers for high plane emoji in strings to Unicode surrogate characters.
  • Combine surrogate characters to find high plane characters such as emoji.
  • Move high plane characters to the private area in 16-bit Unicode.
  • Find combinations of characters, ligatures, and diacritics to make combined characters.
  • Find and add Arabic presentation characters.
  • Convert classic 8-bit encoded text strings to 16-bit Unicode.
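The surrogate steps in the list above follow standard UTF-16 arithmetic and can be sketched in C. This is only an illustration of the encoding rule, not IconEdit's own code; the function names to_surrogates and from_surrogates are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Split a high plane code point (U+10000..U+10FFFF) into a UTF-16
   surrogate pair. Hypothetical helper, standard UTF-16 arithmetic. */
void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(high != 0 && low != 0);
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);
    cp -= 0x10000u;                               /* 20 bits remain */
    *high = (uint16_t)(0xD800u + (cp >> 10));     /* upper 10 bits */
    *low  = (uint16_t)(0xDC00u + (cp & 0x3FFu));  /* lower 10 bits */
}

/* Combine a surrogate pair back into the high plane code point. */
uint32_t from_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800u && high <= 0xDBFFu);
    assert(low  >= 0xDC00u && low  <= 0xDFFFu);
    return 0x10000u
         + ((uint32_t)(high - 0xD800u) << 10)
         + (uint32_t)(low - 0xDC00u);
}
```

For the smiley U+1F603 this gives the pair D83D DE03, and combining the pair returns U+1F603 again.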

After the input conversion, IconEdit creates all necessary characters for the text strings as one font.

In this example, IconEdit reads and converts a C-like pseudocode file that contains only these two lines:

wchar32 szSmile[]={L"Smiley স্মাইলি \U0001F603 !"};
wchar32 szCable[]={L"Cable Car ಕೇಬಲ್ ಕಾರು \U0001F6A1"};

The input converter ignores anything outside the double quotes.
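As a rough illustration of what "ignoring anything outside the double quotes" means, the sketch below copies the contents of the first quoted string on a line. It is an assumption about the general technique, not IconEdit's parser: extract_quoted is a hypothetical name, and the sketch handles only one string per line and does not treat character literals or comments specially.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the contents of the first double-quoted string on a line to out.
   Returns the number of bytes copied, or -1 if there is no complete
   string or it does not fit. A backslash keeps the character after it,
   so an escaped quote (\") does not end the string. */
int extract_quoted(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    const char *p = line;
    size_t n = 0;

    while (*p != '\0' && *p != '"')     /* skip text before the string */
        p++;
    if (*p == '\0')
        return -1;                      /* no opening quote */
    p++;

    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0') {
            if (n + 2 >= out_size)
                return -1;
            out[n++] = *p++;            /* keep the backslash itself */
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *p++;
    }
    if (*p != '"')
        return -1;                      /* no closing quote */
    out[n] = '\0';
    return (int)n;
}
```

Applied to the first example line, the type name, the array name, the L prefix, and the braces are all skipped, and only the text between the quotes is returned.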

The resulting font optimized for the text strings:

High Plane Emoji in 16-bit Font

The input converter moves high plane and combined characters to the private area E700 to F8FF in Unicode.

IconEdit always orders the characters in the font by ascending Unicode code point.

The new Unicode character value (code point) is shown above each character.
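The idea of moving high plane characters into the private area can be sketched as a small lookup table. Everything here is hypothetical (the remap_table type, the remap_high_plane function, and the first-come allocation order); only the range E700 to F8FF comes from the text. IconEdit assigns its own values, so the numbers produced by this sketch will not match the font shown above.

```c
#include <assert.h>
#include <stdint.h>

#define PRIVATE_FIRST 0xE700u  /* start of the private range named in the text */
#define PRIVATE_LAST  0xF8FFu  /* end of the private range named in the text */
#define MAX_PRIVATE   (PRIVATE_LAST - PRIVATE_FIRST + 1u)

/* Hypothetical remap table: entry i holds the original code point that
   is now represented by the 16-bit code point PRIVATE_FIRST + i. */
typedef struct {
    uint32_t original[MAX_PRIVATE];
    uint32_t count;
} remap_table;

/* Return the private code point for a high plane character, allocating
   a new one the first time the character is seen (an assumed policy).
   Returns 0 when the private range is used up. */
uint16_t remap_high_plane(remap_table *t, uint32_t cp)
{
    assert(t != 0);
    assert(cp > 0xFFFFu && cp <= 0x10FFFFu);
    for (uint32_t i = 0; i < t->count; i++) {
        if (t->original[i] == cp)
            return (uint16_t)(PRIVATE_FIRST + i);   /* already mapped */
    }
    if (t->count >= MAX_PRIVATE)
        return 0;                                   /* range exhausted */
    t->original[t->count] = cp;
    return (uint16_t)(PRIVATE_FIRST + t->count++);
}
```

The same table can be read in the other direction by tools that need to show where a private character came from.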

The text is shown automatically with the font:

High Plane Emoji in 16-bit Text

Only the text inside the strings is in the font; the rest is shown for orientation.

Output C string converter for diacritics and presentation characters

Combinations of basic characters and diacritics in the input strings are substituted with the combined characters in the private area.

Basic Arabic characters are substituted with presentation characters according to their position in the word.

Text strings with right to left characters are mirrored for left to right displays.
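Mirroring can be pictured as reversing the order of the 16-bit characters, so that a display drawing from left to right ends up with the first logical character at the right. The sketch below is a simplification under that assumption: mirror_text is a hypothetical name, it reverses the whole string, and it does not attempt to handle mixed left-to-right and right-to-left text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse a string of 16-bit characters in place. */
void mirror_text(uint16_t *text, size_t len)
{
    assert(text != NULL || len == 0);
    if (len < 2)
        return;                       /* nothing to swap */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        uint16_t tmp = text[i];
        text[i] = text[j];
        text[j] = tmp;
    }
}
```

A plain reversal like this is only safe once each base character and its diacritics have been replaced by one combined character, because otherwise the reversal would separate a diacritic from its base.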

The output file with the converted text strings is linked to the font and the two should be used together by the compiler and the display.

Output C string converter for hexadecimal characters

IconEdit can convert the input text Smiley স্মাইলি \U0001F603 ! to one of the following output formats:

  • Pure Unicode with private characters as 16-bit hexadecimal: Smiley \xE700ই\xE701 \xE706 ! This makes the text string easier for humans to read but makes no difference to the compiler.
  • UTF-16 hexadecimal: Smiley \xE700\x0987\xE701 \xE706 ! For old editors that cannot read Unicode. This is still Unicode to the compiler.
  • UTF-8 hexadecimal: Smiley \xEE\x9C\x80\xE0\xA6\x87\xEE\x9C\x81 \xEE\x9C\x86 ! For old 8-bit compilers that cannot handle Unicode strings. To the compiler, this is classic 8-bit text. Use the UTF-8 option in the RAMTEX driver library to display the text as Unicode. This makes it possible to use 16-bit Unicode texts and fonts with an 8-bit compiler.
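The relation between the UTF-16 and UTF-8 hexadecimal forms above is the standard UTF-8 encoding rule, sketched here for 16-bit characters. The function names utf8_encode_bmp and format_utf8_hex are hypothetical and are not part of IconEdit or the RAMTEX library. The sketch assumes surrogates have already been replaced by private characters, so it never needs a 4-byte sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one 16-bit code point as UTF-8. Returns the byte count (1..3). */
int utf8_encode_bmp(uint16_t cp, uint8_t out[3])
{
    assert(out != NULL);
    if (cp < 0x80u) {                       /* 7-bit ASCII: one byte */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800u) {                      /* up to 11 bits: two bytes */
        out[0] = (uint8_t)(0xC0u | (cp >> 6));
        out[1] = (uint8_t)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    out[0] = (uint8_t)(0xE0u | (cp >> 12)); /* up to 16 bits: three bytes */
    out[1] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
    out[2] = (uint8_t)(0x80u | (cp & 0x3Fu));
    return 3;
}

/* Write one code point as \xHH escapes, one per UTF-8 byte.
   Returns the number of characters written, or -1 if dst is too small. */
int format_utf8_hex(uint16_t cp, char *dst, size_t dst_size)
{
    uint8_t bytes[3];
    int n = utf8_encode_bmp(cp, bytes);
    int used = 0;

    assert(dst != NULL && dst_size > 0);
    for (int i = 0; i < n; i++) {
        size_t room = dst_size - (size_t)used;
        int w = snprintf(dst + used, room, "\\x%02X", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += w;
    }
    return used;
}
```

With this rule the private character E700 becomes the three bytes EE 9C 80, and the Bengali letter 0987 becomes E0 A6 87, matching the UTF-8 line in the list above.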

Memory consumption for different string formats

UTF-16 hexadecimal and pure Unicode always use 2.0 bytes of ROM space per character.

UTF-8 hexadecimal takes up different amounts of ROM space per character depending on language and alphabet:

  • 1.0 byte per character: American English.
  • 1.1 - 1.3 bytes per character: Other languages written with the Latin alphabet.
  • 2.0 - 2.2 bytes per character: Other European and Middle Eastern languages except Arabic.
  • 2.6 - 2.9 bytes per character: Arabic and South Asian languages.
  • 3.0 bytes per character: Chinese, Japanese, and Korean.
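The averages above follow from how many bytes UTF-8 needs for each character. A minimal sketch for estimating the ROM size of one converted string is shown below; the function names are hypothetical, string terminators are not counted, and the input is assumed to be the 16-bit text after conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one 16-bit code point as UTF-8. */
size_t utf8_size_bmp(uint16_t cp)
{
    if (cp < 0x80u)
        return 1;   /* ASCII */
    if (cp < 0x800u)
        return 2;   /* e.g. accented Latin, Greek, Cyrillic, Hebrew */
    return 3;       /* rest of the 16-bit range, including private characters */
}

/* Total UTF-8 bytes for a string of len 16-bit characters. */
size_t utf8_rom_bytes(const uint16_t *text, size_t len)
{
    size_t total = 0;
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        total += utf8_size_bmp(text[i]);
    return total;
}

/* Total bytes for the same string stored as 16-bit characters. */
size_t utf16_rom_bytes(size_t len)
{
    return 2 * len;
}
```

Dividing utf8_rom_bytes by the string length gives the bytes-per-character figure for that text, which can then be compared with the fixed 2.0 of the 16-bit formats.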

The connection between C string formats and text file formats

Windows can save plain text in 4 different file formats:

  • ANSI (8 bit): One byte per character. Classic 8-bit encoding of 256 characters for a few languages. Text is only portable to a limited number of countries.
  • Unicode little endian (16 bit): Two bytes per character with the least significant byte first. Unicode encoding of 65536 characters for all living languages. Text is portable anywhere.
  • Unicode big endian (16 bit): Two bytes per character with the most significant byte first. Unicode encoding of 65536 characters for all living languages. Text is portable anywhere.
  • Unicode UTF-8 (8-24 bit): Between one and three bytes per character. Unicode encoding of 65536 characters for all living languages. Text is portable anywhere.
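A program that reads such files usually tells the Unicode formats apart by the byte order mark at the start of the file. The sketch below assumes the file begins with one, which is common for Unicode files saved by Windows editors but is not guaranteed. A file without a mark is reported as 8-bit here, although it could also be UTF-8 saved without a mark. The names text_encoding and detect_bom are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENC_8BIT_OR_UNKNOWN,  /* no byte order mark found */
    ENC_UTF16_LE,         /* FF FE: least significant byte first */
    ENC_UTF16_BE,         /* FE FF: most significant byte first */
    ENC_UTF8              /* EF BB BF */
} text_encoding;

/* Guess the file format from the first bytes of the file. */
text_encoding detect_bom(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_8BIT_OR_UNKNOWN;
}
```

The three marks are the same character U+FEFF written in each encoding, which is why the little endian and big endian marks are byte-swapped copies of each other.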

IconEdit can save C-source text strings in 4 different string formats:

  • Unicode (16 bit): String texts and comments stay as 16-bit Unicode characters and are saved as Unicode text files. Both strings and comments are portable anywhere.
  • Unicode hexadecimal (16 bit): String texts are converted to 16-bit hexadecimal characters, and comments are converted to a classic 8-bit ANSI or ISO-8859 encoding of your choice. The string texts are still Unicode to the compiler but are encapsulated in 7-bit ASCII, so they can be saved as ANSI text files. Strings are portable anywhere.
  • Unicode UTF-8 hexadecimal (8-24 bit): String texts are converted to one or more UTF-8 8-bit hexadecimal characters, and comments are converted to a classic 8-bit ANSI or ISO-8859 encoding of your choice. The string texts still carry Unicode, encapsulated in 7-bit ASCII, so they can be saved as ANSI text files. Strings are portable anywhere.
  • Classic (8 bit): Both string texts and comments are converted to a classic 8-bit ANSI or ISO-8859 encoding of your choice and saved as ANSI text files. Strings and comment texts are only portable to a limited number of countries.

Trace characters through the process with the mouse help and blue marks

Blue marks can be set by the mouse and follow the character through all windows.

Use mouse help in all windows to see how the character is created and used.

Font with blue highlight for the selected character:

High Plane Emoji in 16-bit Font

Text with blue frame around selected character. Mouse help has an additional text length indicator so you can see if the text will fit the target display:

High Plane Emoji in 16-bit Text

Output text with private characters as 16-bit hexadecimal:

High Plane Emoji in 16-bit Text

UTF-8 hexadecimal output text, where the selected character takes up 3 bytes:

High Plane Emoji in 16-bit Text

Both mouse help and blue marks can be turned off and on at any time.

Other editing and conversion functions

  • Save fonts and symbols as C-source code. Convert vector fonts to raster fonts. Convert fonts and images to C-source format.
  • International fonts for multiple languages. Create text-optimized fonts for many languages. Select the necessary languages directly.
  • Creating ROM-optimized fonts and symbols. Achieve significant ROM savings for alphabets with a very large number of characters.
  • Middle Eastern and South Asian fonts. Special support for right-to-left alphabets on simple left-to-right display systems.
  • C string preprocessor for emoji and Asian texts. The preprocessor is an input and output text string converter inside IconEdit. The preprocessor finds the C strings in text catalogs and makes fonts.
  • Color optimization of characters and symbols. Find the right balance between color resolution and memory size.
  • Graphic drawing and image conversion. See and edit icons, characters, and fonts with exactly the same pixel and color resolution as used by the real display module in the target system.
  • Start working before you make your own fonts. Save start-up time by using existing C-source code fonts.
  • Start working before you buy a license. All symbol, font, and text-string data can be saved in one or more project files.
