binFontsTool - Excel based multi-libraries fonts editor and converter


DenSyo

I added support for the ILI9341_t3 v.1 font format to my editor. It also supports the Adafruit, UTFT, OLED_I2C and Squix formats.

binFontsTool

While there is no support for unicode tables, if someone helps with an explanation of how it works, it will speed up adding that support to the editor.
I will also gladly add encoding correspondence tables for any regional language to the editor, if you help compile them for your language. The tables are on the Encoding sheet, and there is a short explanation on the Manual sheet.
 
What aspect of Unicode do you need an explanation for? At the simplest level (this was true for the earlier versions of Unicode) it is a 16-bit character list plus a table of per-character properties (is it punctuation? is it a number? etc). Later versions referred to the original 16-bit block of 64k characters as the Basic Multilingual Plane (BMP) and added another 16 planes for less frequently used characters, making a total range from 0 to 0x10FFFF.

There are two encodings in widespread use. UTF-8 is the most common, and is a variable-length encoding with the handy property that the first 128 characters (0 to 127) have the same meaning as US-ASCII and are encoded in one byte. UTF-16 is a "mostly 16-bit" encoding; all characters in the Basic Multilingual Plane take two bytes. There is a special 2k region of the BMP called surrogates. High surrogates are 0xD800 to 0xDBFF and low surrogates are 0xDC00 to 0xDFFF. Surrogates are used in pairs (analogous to MSB and LSB) to point into the upper planes. So in UTF-16, characters outside the BMP are encoded in 32 bits.
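To make the surrogate mechanism concrete, here is a minimal sketch of how a high/low pair combines into one code point. The function name is my own, and returning U+FFFD for an invalid pair is just one possible choice.

```c
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into a single code point.
   Returns 0xFFFD (the replacement character) if either half is out of range.
   The function name is hypothetical, for illustration only. */
uint32_t utf16_pair_to_codepoint(uint16_t high, uint16_t low)
{
    if (high < 0xD800 || high > 0xDBFF) return 0xFFFD;
    if (low  < 0xDC00 || low  > 0xDFFF) return 0xFFFD;
    /* Each surrogate carries 10 bits; the result is offset past the BMP. */
    return 0x10000u + (((uint32_t)(high - 0xD800) << 10) |
                       (uint32_t)(low - 0xDC00));
}
```

For example, the pair 0xD83D, 0xDE00 gives 0x1F600, and the largest pair 0xDBFF, 0xDFFF gives 0x10FFFF.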

Fonts conventionally contain a special glyph, notdef, for undefined or unsupported characters. You find the ones that are supported in a given font, and display notdef for everything else.

On a microcontroller, even a Teensy 3.6, you are never going to use a font covering the entirety of Unicode, nor will you have access to a text shaping engine that handles complex writing systems such as Arabic. However, there is an appropriate middle ground between full Unicode support and the basic, C-like "a character is 8 bits, there are 256 characters, and what they look like depends entirely on the font". The main implementation detail is dealing with the fact that your font will be sparse: most characters won't have a corresponding glyph in the font.

It is probably easier (for people using the fonts in microcontroller projects) if an existing Unicode encoding is used, and easier for you if it is fixed length, which suggests using UTF-16 and ignoring surrogates for now. That way, people can still edit text that they want to display, and it will look correct in their editor. Maybe you could use the high byte of the UTF-16 encoded value as an index into an array of supported blocks, assuming that language support is broken down into blocks of 256 characters represented by the low byte. If the value is zero then that block is not supported by the font, otherwise it tells you which table of 256 codepoints to use.
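A rough sketch of that high-byte lookup, to show the idea rather than any existing font format. The table layout and the function name are assumptions of mine: the table holds a 1-based block number per high byte, with 0 meaning "not in this font".

```c
#include <stdint.h>

/* block_table[high byte] holds a 1-based block number, or 0 if the font
   has no glyphs for that block of 256 code points (hypothetical layout).
   Returns the glyph's position in a flat glyph table, or -1 if the block
   is unsupported, in which case the caller would draw notdef. */
long block_lookup(const uint8_t block_table[256], uint16_t ch)
{
    uint8_t block = block_table[ch >> 8];
    if (block == 0) return -1;
    return (long)(block - 1) * 256 + (ch & 0xFF);
}
```

With block 1 assigned to high byte 0x00 and block 2 to high byte 0x04 (Cyrillic), the character 0x0410 lands at position 256 + 0x10, and anything in an unlisted block returns -1.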
 
While there is no support for unicode tables, if someone helps with an explanation of how it works,

The sad reality is this part was never implemented.

The general idea was to allow any number of extra characters to be added onto the data section. A third index section (really, a sorted list of 16 bit unicode values & their data index) was planned, but never implemented. When encoding the font data, the idea was to allow a user to choose which characters they want.

If you're going to add this to your font tool, maybe I ought to get on this... actually put it in the code?
 
However, ILI9341 does currently implement 2 index ranges, even though pretty much all the published font files have only one.

The second index range is meant to allow any continuous subset of the ISO8859-1 characters to be encoded. For example, you might start with 192 and end at 255 to get all the letters with accents. Or maybe start at 161 if you also want to include the special symbols, like the upside down exclamation point. Or maybe use just 224-255 if you only need lowercase letters with accents. The idea is to make publishing a font easy with two continuous ranges, for ASCII and ISO8859-1.

This 2nd index works the same way as the main one. You fill in the index2_first & index2_last fields with the first and last values. Then just add the encoded bitmaps onto the end of the big data array.

In other words, if you're encoding a font and you want to include chars 33 to 127 and 192 to 255, when you finish 127, just start making the bit data for 192 and add it right onto the end of the data array, and add their index positions within the big array onto the index list array. ILI9341 will "know" that char 192's data comes immediately after 127's data because you put 127 into the index1_last field and 192 into the index2_first field.

Here's the code in drawFontChar which implements the logic.

Code:
        if (c >= font->index1_first && c <= font->index1_last) {
                // first range: entry number counts from index1_first
                bitoffset = c - font->index1_first;
                bitoffset *= font->bits_index;
        } else if (c >= font->index2_first && c <= font->index2_last) {
                // second range: entries follow directly after the first range
                bitoffset = c - font->index2_first + font->index1_last - font->index1_first + 1;
                bitoffset *= font->bits_index;
        } else if (font->unicode) {
                return; // TODO: implement sparse unicode
        } else {
                return; // char is not in this font, draw nothing
        }
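As a worked example of the same arithmetic, here is a small standalone sketch that returns only the entry number (before it is multiplied by bits_index). The struct is a cut-down stand-in I made up for illustration, not the real font structure.

```c
#include <stdint.h>

/* Hypothetical cut-down struct holding only the four range fields. */
typedef struct {
    uint8_t index1_first, index1_last;
    uint8_t index2_first, index2_last;
} font_ranges_t;

/* Returns the position of char c in the index list, or -1 if absent. */
int glyph_index(const font_ranges_t *f, unsigned c)
{
    if (c >= f->index1_first && c <= f->index1_last)
        return (int)(c - f->index1_first);
    if (c >= f->index2_first && c <= f->index2_last)
        return (int)(c - f->index2_first)
             + (f->index1_last - f->index1_first + 1);
    return -1;
}
```

With ranges 33 to 127 and 192 to 255, char 127 is entry 94, char 192 is entry 95 (127 - 33 + 1), and char 255 is entry 158, for 159 entries in total.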

Here you can also see the unimplemented "sparse unicode", which was meant to allow encoding chars higher than 255. The idea was a 3rd array of unicode values & bit index numbers. The intention was to someday add code to do a binary search of that list. If the unicode number could be found, then the "bitoffset" for that char would be computed and used to get the bits from the index list, the same as if it had been found in the first 2 ranges. So from an encoding point of view, you could choose any sparse collection of greater-than-255 chars and just tack them onto the big bit-encoded data array.
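Since this part was never written, the following is only a sketch of what such a binary search might look like. The entry layout (a 16-bit unicode value paired with an index number) and all names are assumptions, not anything that exists in ILI9341_t3.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical entry of the planned 3rd array: one sparse char. */
typedef struct {
    uint16_t codepoint; /* unicode value; table must be sorted by this */
    uint16_t glyph;     /* position of the char in the index list */
} sparse_entry_t;

/* Binary search over the sorted table.
   Returns the glyph position, or -1 if the char is not in the font. */
int sparse_find(const sparse_entry_t *table, size_t count, uint16_t cp)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].codepoint == cp) return table[mid].glyph;
        if (table[mid].codepoint < cp) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}
```

The result would then be multiplied by bits_index in the same way as for the first 2 ranges.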

Of course, the other necessary part that was never done is UTF8 parsing of the strings. But that's pretty easy and the code already exists in the USB keyboard. If you're really going to add support for unicode chars, please ping me when you're using the 2nd index range. At the very least I can update ILI9341_t3 to parse UTF8 so people can conveniently print those 161-255 chars.

If that works out and there's interest in chars beyond 255, I can also work on the missing sparse unicode stuff. But first let's focus on just getting up to 255 supported well with UTF8 strings.
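For reference, a minimal sketch of the kind of UTF8 parsing needed to reach chars up to 255. It handles only 1-byte and 2-byte sequences, which is more than enough for 161-255. This is not the USB keyboard code, just an illustration with a made-up function name.

```c
#include <stdint.h>

/* Decode one char from a UTF-8 string and advance *s past it.
   Only 1- and 2-byte sequences (U+0000 to U+07FF) are handled; anything
   else yields 0xFFFD and skips one byte. The caller should stop when it
   sees the terminating 0. */
uint16_t utf8_next(const char **s)
{
    const unsigned char *p = (const unsigned char *)*s;
    uint16_t cp;
    if (p[0] < 0x80) {
        cp = p[0];                      /* plain ASCII */
        *s += 1;
    } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
        cp = (uint16_t)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
        *s += 2;
        if (cp < 0x80) cp = 0xFFFD;     /* reject overlong encodings */
    } else {
        cp = 0xFFFD;                    /* unsupported or malformed */
        *s += 1;
    }
    return cp;
}
```

For example, the bytes 0xC3 0x80 decode to 192 and 0xC3 0xBF decode to 255, which map straight onto the 2nd index range.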

Can you try to make a font file using the 2nd index and fill it with at least the accent chars 192-255? That would really help me to add and test the UTF8 support.
 
Then that question is settled: the editor already supports a second block of characters using this logic. The only difference from the font format is that the editor stores the total number of characters and the number of characters in the second block, counted from the end of the array. When reading and saving a font file, the character codes of the second block begin at the first code set for the second block. Even the cell coloring changes for the second block :) Here is an example of a font with a second block of characters for the Win-1251 encoding: the letters of the Russian alphabet without the letters Ё and ё. The codes for these letters, 168 and 184, are outside the range 192-255; in Russian they can be dropped and replaced with the letter Е.

font_keyrus_1251

For all: To create this font, I opened the binary DOS font keyrus.fnt that is attached to the editor, specifying width 8, height 16, first code 0 and count 256. I changed the encoding from DOS-866 to Win-1251 using column 353. For other languages, you will need to create character mapping tables on the Encoding sheet. Select and delete the character rows from 0 to 31 and from 127 to 191. Then save the document so that Excel recalculates the last row of the document. You can see the last row by pressing Ctrl + End. The program counts the total size of the array up to this row. In the next version I will fix this so that these empty rows do not affect the size of the array. Another way to get rid of these rows is to select and delete them. Then, in the settings, change the code of the first character to 32, the total count of characters to 159, the first code of the second block to 192, and the count of characters in the second block to 64. Check "refresh on exit" and click OK. Visually check that the block codes are set correctly and save the C file.
You can do the same by importing any BDF file.

PS: I see no point in storing a separate table of regional symbols in unicode; it is easier and takes less memory to use a function in the code that performs the conversion. A link to an example of such a function for the Russian language is on the page with the editor.
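To illustrate what such a function can look like, here is a minimal sketch for Russian. It is not the function linked from the editor page; the name and the '?' fallback are my own assumptions. It relies on the Russian letters А to я being contiguous both in unicode (0x0410 to 0x044F) and in Win-1251 (192 to 255).

```c
#include <stdint.h>

/* Map a unicode value to Win-1251 for ASCII and Russian letters.
   Returns '?' for anything else (hypothetical fallback). */
uint8_t unicode_to_cp1251(uint16_t cp)
{
    if (cp < 0x80) return (uint8_t)cp;              /* ASCII unchanged */
    if (cp >= 0x0410 && cp <= 0x044F)
        return (uint8_t)(cp - 0x0410 + 0xC0);       /* А..я -> 192..255 */
    if (cp == 0x0401) return 0xA8;                  /* Ё -> 168 */
    if (cp == 0x0451) return 0xB8;                  /* ё -> 184 */
    return '?';
}
```

A font that leaves out codes 168 and 184, like the example above, could instead return the codes for Е and е in those two cases.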
 