tgx: a 2D/3D graphics library for Teensy.

vindar

Hello,

I have been working on a new graphics library for a few weeks: tgx - Tiny/Teensy GraphicX library, available here: https://github.com/vindar/tgx.

With the advent of powerful MCUs like the Teensy 4, which have lots of RAM, it becomes possible to use memory framebuffers... and that is what this library does: it just creates images in RAM. Then you can choose whatever method you prefer to send the image to a display. As such, the library does not care about the hardware and is cross-platform: it should work with any 32-bit MCU/CPU. For instance, it works well on my Ryzen 7 (which is perfectly useless :p). However, the library has been optimized with the T4/T4.1 in mind, so this is where it really shines. The library supports "tile rendering", which makes it possible to reduce the framebuffer size, so it should theoretically also be usable on a T3.5/T3.6, but I have not tried that yet...

I know there are already some nice libraries out there, for example Adafruit's GFX Canvas and KurtE's ILI9341_t3n for 2D drawing, and mjs513's pseudo-OpenGL library for 3D, but they are all lacking some features I wanted. There is also JarkkoL's 3D graphics library, which looks amazing, but I believe it is not publicly available. Anyway, I always wanted to write my own triangle rasterizer, so this was a good excuse to reinvent the wheel one more time...

The library contains methods for both 2D and 3D graphics. Here is the sales pitch:

2D graphics

  • Support for multiple color types: RGB16, RGB24, RGB32, RGBf. Every 2D/3D drawing operation is available for each color type.
  • Template Image class that encapsulates a memory framebuffer and enables the creation of sub-images (i.e. views) that share the same buffer. This provides an elegant and efficient way to clip all drawing operations to a particular region (see the sketch after this list).
  • API (mostly) compatible with Adafruit's GFX library, but with more primitives, and it should be faster in most cases.
  • Support for Adafruit's fonts as well as PJRC's ILI9341_t3 v1 and v2.3 (anti-aliased) fonts.
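For example, here is a minimal sketch of the sub-image/view idea (I assume here the Image constructor variant that takes an explicit stride; check Image.h for the exact signatures):

Code:
#include <tgx.h>
using namespace tgx;

uint16_t fb[320 * 240];              // full 320x240 framebuffer in RAM
Image<RGB565> screen(fb, 320, 240);  // image covering the whole buffer

// a 100x80 view at position (20,30) that shares the same buffer:
// pointer offset = x + stride*y, stride = width of the parent buffer
Image<RGB565> view(fb + 20 + 320 * 30, 100, 80, 320);

void demo()
    {
    screen.fillScreen(RGB565_Black);  // clears the whole buffer
    view.fillScreen(RGB565_Red);      // only touches the 100x80 region: drawing is clipped to the view
    }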

3D graphics

  • Heavily optimized "pixel perfect" triangle rasterizer with 8 bits of sub-pixel precision.
  • Depth buffer testing.
  • Flat and Gouraud shading.
  • Phong lighting model with separate ambient/diffuse/specular color components (currently only one directional light source); see the sketch after this list.
  • Per object material properties.
  • Perspective-correct texture mapping.
  • Perspective and Orthographic projection supported.
  • Optional backface culling.
  • Tile rasterization: it is possible to render only part of the viewport at a time to save RAM by using a smaller image and a smaller zbuffer.
  • Template classes for all the needed maths: Vec2, Vec3, Vec4 (vectors), Mat4 (4x4 matrix) and Box2 (2D box).
  • Optimized mesh data format: meshes and textures can be read directly from flash memory to save RAM.
  • Python scripts for easy conversion of texture images and 3D meshes (in Wavefront's .obj format) into C files that can be directly imported into an Arduino project.
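Regarding the lighting bullet above: the vertex colors are computed with the classic Phong model (ambient + diffuse + specular terms, one directional light) and, with Gouraud shading, interpolated across the triangle. Here is a standalone textbook sketch of that formula for reference; it is illustrative only and does not use the library's internal names:

Code:
#include <math.h>

struct V3 { float x, y, z; };   // minimal vector type, just for this sketch

static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 mul(V3 a, float s)  { return { a.x * s, a.y * s, a.z * s }; }
static V3 add(V3 a, V3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static V3 sub(V3 a, V3 b)     { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// N = unit surface normal, L = unit direction toward the light, V = unit direction toward the viewer.
// ambient/diffuse/specular are the (light x material) color products, shininess is the specular exponent.
V3 phong(V3 N, V3 L, V3 V, V3 ambient, V3 diffuse, V3 specular, float shininess)
    {
    const float d = fmaxf(0.0f, dot(N, L));                                       // diffuse factor
    const V3 R = sub(mul(N, 2.0f * d), L);                                        // reflection of L about N
    const float s = (d > 0.0f) ? powf(fmaxf(0.0f, dot(R, V)), shininess) : 0.0f;  // specular factor
    return add(add(ambient, mul(diffuse, d)), mul(specular, s));                  // resulting vertex color
    }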

For the time being, there is no standalone documentation for the library, but all the header files are heavily commented and every public method has a docstring explaining its purpose and usage. Also, the examples show how to use most features. It is still very much a work in progress, so I expect there are many bugs to be found, but I think it is already usable. Please let me know if you find bugs, thanks!


Here are a few videos that show what the library can do. I am sorry for the poor video quality and I blame my crappy cellphone for the washed-out colors. Trust me, it looks much better in reality :)
Everything below is running on a T4.1, and I am using my ILI9341_T4 library https://github.com/vindar/ILI9341_T4 to push the memory framebuffer to the ILI9341 display. SPI is running at 30MHz, which means the theoretical maximum framerate when redrawing the whole screen each frame is about 25FPS (320x240 pixels x 16 bits ≈ 1.23Mbit per frame, so a 30Mbit/s SPI link tops out just under 25 full frames per second). Here, differential updates make an impressive improvement, as you can see from the FPS counter in the top-right corner of the screen, which shows the real number of frames rendered and pushed to the screen each second.

(1) Basic test models. Comparison between flat and Gouraud shading (Gouraud here means that the colors, not the normals, are interpolated to get the pixel colors; however, the colors at the triangle vertices are computed using Phong's shading/lighting model).


(2) Testing texture mapping. Textures are just Image objects that can be stored in RAM or flash.

 
(3) Happy Buddha. A 20K-vertex model running with the SPI clock down to a puny 20MHz. The video does not really show it, but it runs at a fixed 30FPS with vsync enabled (so it is completely tear-free).


(4) Complex models. Here are a few models displayed one after the other. All the models' meshes/textures are stored in flash (90% of the 8MB of flash is used!). In this video, the textures are loaded into external RAM (soldered on the back) because reading EXTMEM is a bit faster than flash when access is not linear (cache misses, I guess). However, it is possible to draw all the models directly from flash, with the framerate dropping by just a few FPS.


I downloaded all these models from websites offering 'free 3D models'. AFAIK, they can be freely used for non-commercial purposes, so they are available in the examples folder of the library, but I will of course remove them if someone claims copyright.

By the way, the /tools folder of the library contains two Python 3 scripts:

  • obj_to_h : converts an .obj 3D model into a Mesh3D object in a .h file so it can be used directly inside a project.
  • texture_2_h : converts an image (jpg, png...) into an Image object in a .h file so it can be imported as an image or texture.

The scripts were not written very carefully, and the .obj parser does not follow the full file format spec, but they serve their purpose in most cases.
 
(5) 'Screaming waves'. Finally, a last example where the geometry of the object changes at runtime. The sheet is made of about 4K triangles whose heights are recomputed each frame to create traveling waves...
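In case it helps: the trick is simply to keep the mesh's vertex array in RAM (not flash) and rewrite the vertex heights every frame before drawing. Here is a rough sketch of that update step; the grid size, array layout and wave formula below are made up for illustration (the real code is in the examples folder), and fVec3 is the Vec3<float> typedef:

Code:
#include <math.h>
#include <tgx.h>

const int GRID_X = 64;   // illustrative grid resolution
const int GRID_Y = 64;

// vertex array kept in RAM so it can be modified at runtime
tgx::fVec3 vertices[GRID_X * GRID_Y];

void updateWave(float t)   // t = time in seconds
    {
    for (int j = 0; j < GRID_Y; j++)
        {
        for (int i = 0; i < GRID_X; i++)
            {
            // traveling wave: the height (z) of each vertex depends on its grid position and on time
            vertices[i + GRID_X * j].z = 0.2f * sinf(0.4f * i - 3.0f * t) * cosf(0.4f * j - 2.0f * t);
            }
        }
    }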

 
@vindar
Very impressive results. Really appreciate the work you put into the library and making it available for public consumption. I really need to find the time to get back into 3D graphics :)

Again nice work.
 
@mjs513 Thanks! In fact, I used your pseudo-OpenGL library as the starting point for the implementation of the Mat4 matrix class. That was very useful :)
 
@vindar. Thank you but I have to give credit where credit is due - I got it from here: https://songho.ca/opengl/gl_matrix.html. I like repurposing stuff as well :)

EDIT: BTW, on the ILI9341 you might be able to push the SPI speed up some if you are using short wires - people have had it running at 60MHz with success :)
 
Hello,

I am bumping this old thread to mention that I have made some significant improvements to the library: all 2D drawing methods now support alpha-blending. I have also added several drawing primitives and made a few speed optimizations.

I also got the opportunity to test the library on Teensy 3.6 and Teensy 3.5. It is obviously not as speedy as on a Teensy 4 but it still works quite well.

Here is an example running on a T3.6 with an ST7735 screen (using the Teensy-optimized ST7735_t3 library). The textures and meshes are stored in flash. The same code runs about 30% slower on a T3.5.

 
Hi,

I made some improvements to the library, which now supports blitting rotated and rescaled sprites with bilinear filtering. If anyone is interested in creating a gauge, a counter or a clock, these methods can be pretty handy (especially when used with sprites that have an alpha/transparency channel).

There is now also a Python script included with the library that converts images to .cpp/.h files with a given color format (RGB565, RGB24 or RGB32).

Here is an example running on a T4 which is made of 3 images, rescaled and rotated. The clock-face image is stored in RGB565 format, whereas the hand images are stored in RGB32 format, so they have an alpha channel which is used when blitting.

The display is an ILI9341 screen with SPI clocked at 40MHz.
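For reference, drawing one hand could look roughly like the snippet below. This is only a sketch: the blit method name and its parameter order are assumptions on my part, so check Image.h for the exact API, and 'im' / 'im_hand' stand for the clock-face and hand images.

Code:
#include <tgx.h>
using namespace tgx;

extern Image<RGB565> im;       // RGB565 clock-face image (defined elsewhere)
extern Image<RGB32> im_hand;   // 16x120 RGB32 hand sprite with an alpha channel (defined elsewhere)

void drawSecondHand(float seconds)
    {
    const float angle = 360.0f * seconds / 60.0f;   // hand rotation, in degrees
    const fVec2 anchor_src{ 8.0f, 110.0f };         // pivot point inside the hand sprite
    const fVec2 anchor_dst{ 120.0f, 120.0f };       // where that pivot lands on the clock face
    // assumed signature: blitScaledRotated(sprite, anchor_src, anchor_dst, scale, angle_degrees, opacity)
    im.blitScaledRotated(im_hand, anchor_src, anchor_dst, 1.0f, angle, 1.0f);
    }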

 
Hi everyone,

I am bumping this thread to let you know that I have made improvements to my 2D/3D library. In particular, concerning the 3D part of the lib:

  • Added new texturing modes (wrap / clamp to edges).
  • Shaders are now enabled/disabled separately at compile time (via templating) for reduced memory consumption and improved speed.
  • Correct triangle clipping.
  • New primitives for creating sky-boxes, cubes and spheres on the fly, with adaptive meshing (e.g. depending on the object size).
  • New primitives for drawing dots and wire-frame objects...

The library is now quite stable and usable although I am sure there are still many bugs to be found.

Here is a demo animation that takes advantage of several new features. I am sorry for the poor video quality; it really looks better when you see it directly. You can try it yourself: it is located in the examples folder of the lib :).

This is running on a Teensy 4.0, what a beast !

 
I would love to check out this library!

On first try, I get this error when compiling :

Arduino/libraries/tgx-main/src/Mesh3D.inl:301:42: error: 'extmem_free' was not declared in this scope
extmem_free((void *)p);

and

Arduino/libraries/tgx-main/src/Mesh3D.inl:310:41: error: 'extmem_free' was not declared in this scope
extmem_free(_vals[k]);

and

Arduino/libraries/tgx-main/src/Mesh3D.inl:333:45: error: 'extmem_malloc' was not declared in this scope
adr = extmem_malloc(size);

Any suggestions?
Best!
 
Hello,

I have seen a report about this error before: https://github.com/vindar/tgx/issues/1

I think the problem may be that you are using an older version of Teensyduino. Can you try installing the latest Teensyduino to see if this solves it?


 
This looks really useful. I would try it with your ILI9341_T4 library, but my PCB is already set up for the _t3 version and I would have to take my module apart and cut traces. Will it still work with ILI9341_t3? The main thing I am interested in is alpha blending.
 
Hi,

Yes, the tgx library is independent of the screen (and driver) used. All the library does is create the graphics inside a memory framebuffer, and then you choose whichever method you prefer to display the result.

In your case, if you are using a Teensy 4 and an ILI9341 screen with the screen CS hardwired to pin 10 of the Teensy, then indeed you cannot use my ILI9341_T4 library to drive the display, but the default ILI9341_t3 library will work just fine.

However, I suggest you instead use KurtE's ILI9341_t3n library, which is a drop-in replacement for ILI9341_t3 but can also use DMA to do asynchronous transfers. While the result will still not be as smooth as with ILI9341_T4, it will provide a substantial speed boost since you can draw the next frame while the current one is being uploaded to the screen.

Setting up tgx with these libraries is pretty straightforward, but I can write a code snippet if you need one.
 
That's good to hear, thanks! A simple example set up with _t3/_t3n, with a few 2D primitives and text using alpha blending, would be a big help getting people started.

Another question: can you mix tgx with ILI9341_t3(n) commands? I'm wondering because someone might want to add it to existing code.
 

Yes, you can freely mix tgx with any other library since it does not access any peripheral hardware. The only thing tgx does is write inside a tgx::Image, which is basically just a buffer in RAM...

Here is a very basic example combining tgx with KurtE's ILI9341_t3n library. The animation we want to display is first drawn in memory using tgx methods and then pushed to the screen using the ILI9341_t3n lib. Here we use double buffering to prevent flickering.

Code:
#include <SPI.h>
#include <ILI9341_t3n.h>
#include <tgx.h>
#include <font_tgx_Arial_Bold.h>

// we assume that the display is connected on SPI0
#define TFT_CS  10
#define TFT_DC  9
#define TFT_RST 6
#define SPI_SPEED 30000000

ILI9341_t3n tft = ILI9341_t3n(TFT_CS, TFT_DC, TFT_RST);  

// let's use double buffering (each framebuffer occupies 150Kb in DMAMEM)
DMAMEM uint16_t fb1[320 * 240];
DMAMEM uint16_t fb2[320 * 240];

tgx::Image<tgx::RGB565> im(fb1, 320, 240);  // the tgx image pointing to the framebuffer we are currently drawing on

// display the content of im and swap the frame buffers (update is done async via DMA).
void displayAndSwapFB()
    {
    while (tft.asyncUpdateActive()) { yield(); } // wait for previous screen update to complete. 
    if ((uint16_t *)im.data() == fb1)
        { // we were drawing on fb1
        tft.setFrameBuffer(fb1);
        im.set(fb2, 320, 240);
        }
    else
        { // we were drawing on fb2
        tft.setFrameBuffer(fb2);
        im.set(fb1, 320, 240);
        }
    tft.updateScreenAsync(false); // start new update; 
    }

void setup() 
    {
    Serial.begin(9600);
    tft.begin(SPI_SPEED);   // use the SPI clock defined above (30MHz)
    tft.setRotation(1);
    tft.useFrameBuffer(true);
    }


elapsedMillis em; 

void loop()
    {
    const float t = (em % 10000) / 10000.0f;

    // erase the image
    im.fillScreen(tgx::RGB565_White);


    // draw a fixed ellipse with solid colors (red outline and green interior)
    im.fillEllipse({ 230, 120 }, { 70,120 }, tgx::RGB565_Green, tgx::RGB565_Red);

    // draw a moving text
    im.drawText("Hello World!", { t * 100, t * 290 }, tgx::RGB565_Orange, font_tgx_Arial_Bold_32, true,0.7f);

    // draw a moving gradient triangle
    tgx::fVec2 center = { 150, 120 };
    const float c = cosf(t * M_PI * 2);
    const float s = sinf(t * M_PI * 2);
    const tgx::fVec2 P1 = center + tgx::fVec2{180 * c, 180 * s};
    const tgx::fVec2 P2 = center + tgx::fVec2{ 80 * c, -80 * s };
    const tgx::fVec2 P3 = center + tgx::fVec2{ -110 * c, 110 * s };
    im.drawGradientTriangle(P1, P2, P3, tgx::RGB565_Green, tgx::RGB565_Blue, tgx::RGB565_Red, 0.4f);
    
    // display the image on the screen
    displayAndSwapFB(); 
    }

With SPI set at 30MHz, this gives about 24FPS, limited by the time it takes to upload full frames to the screen...

This is not a bad frame rate, but let me just (shamelessly) mention that, if you can swap the CS and DC pins and use my ILI9341_T4 library, then the same animation runs smoothly at 120FPS, without any screen tearing and with the SPI bus clocked at only 15MHz, using the code below:

Code:
#include <ILI9341_T4.h>
#include <tgx.h>
#include <font_tgx_Arial_Bold.h>

#define SPI_SPEED 15000000

#define PIN_CS      9
#define PIN_DC      10
#define PIN_RESET   6

#define PIN_SCK     13
#define PIN_MISO    12
#define PIN_MOSI    11


// the screen driver object
ILI9341_T4::ILI9341Driver tft(PIN_CS, PIN_DC, PIN_SCK, PIN_MOSI, PIN_MISO, PIN_RESET);

ILI9341_T4::DiffBuffStatic<6000> diff1;
ILI9341_T4::DiffBuffStatic<6000> diff2;

uint16_t internal_fb[320 * 240];
uint16_t fb[320*240];

tgx::Image<tgx::RGB565> im(fb, 320, 240);


void setup() 
    {
    Serial.begin(9600);
    tft.output(&Serial);
    while (!tft.begin(SPI_SPEED))
        {
        Serial.println("Initialization error...");
        delay(1000);
        }
    tft.setRotation(3);                 // landscape mode 320x240
    tft.setFramebuffers(internal_fb);   // set 1 internal framebuffer -> activate double buffering.
    tft.setDiffBuffers(&diff1, &diff2); // set the 2 diff buffers => activate differential updates. 
    tft.setDiffGap(4);                  // use a small gap for the diff buffers
    tft.setRefreshRate(120);            // around 120hz for the display refresh rate. 
    tft.setVSyncSpacing(1);             // set framerate = refreshrate (and enable vsync at the same time). 
    }


elapsedMillis em; 

void loop()
    {
    const float t = (em % 10000) / 10000.0f;

    // erase the image
    im.fillScreen(tgx::RGB565_White);

    // draw a fixed ellipse with solid colors (red outline and green interior)
    im.fillEllipse({ 230, 120 }, { 70,120 }, tgx::RGB565_Green, tgx::RGB565_Red);

    // draw a moving text
    im.drawText("Hello World!", { t * 100, t * 290 }, tgx::RGB565_Orange, font_tgx_Arial_Bold_32, true,0.7f);

    // draw a moving gradient triangle
    tgx::fVec2 center = { 150, 120 };
    const float c = cosf(t * M_PI * 2);
    const float s = sinf(t * M_PI * 2);
    const tgx::fVec2 P1 = center + tgx::fVec2{180 * c, 180 * s};
    const tgx::fVec2 P2 = center + tgx::fVec2{ 80 * c, -80 * s };
    const tgx::fVec2 P3 = center + tgx::fVec2{ -110 * c, 110 * s };
    im.drawGradientTriangle(P1, P2, P3, tgx::RGB565_Green, tgx::RGB565_Blue, tgx::RGB565_Red, 0.4f);
    
    // display the image on the screen
    tft.update(fb);
    }
 
That was a big help, thanks! I got it working with _t3n, single buffer for now with 50MHz SPI. It's getting the same FPS as it did before without any tgx primitives (36FPS, which is quite acceptable, it's just a 2D interface), and a few alpha-blended objects don't slow it down much as long as they aren't too large. I may use double buffering later, but I'm not yet sure if I will need that memory for the rest of the project.
 
@vindar - is there an example of configuring your 3D renderer to use just a single non-DMA partial framebuffer? I'm interested in trying this on my 800x480 16-bit parallel screen which, given its size and the nature of using GPIO to drive the 16-bit interface, uses a partial non-DMA buffer to build up the display.
 
Yes, the 3D renderer is part of the tgx library, which is completely independent of the screen/graphics driver used. All the renderer does is draw graphics into regular RAM, and then it is up to you to decide what you want to do with it.

There are two mechanisms that may help you achieve what you want:

1. The 3D renderer draws into a tgx::Image<color_t>, but this is nothing more than a plain C array of pixel colors with a given stride. This means that if you have a memory buffer of size LX*LY that represents your screen, then you can create a tgx::Image that maps exactly to the portion [i, i + u]x[j, j + v] of the buffer by setting the image pointer to &buffer[i + LX*j], the dimensions of the image to (u,v), and its stride to LX (see the sketch after this list). In this way, you can perform the 3D rendering on only part of your screen, independently of its full size.

2. Furthermore, the 3D renderer also allows you to draw only part of the 3D viewport at a time. In this way, you can draw a large viewport using multiple calls (and, as an added benefit, also use a smaller z-buffer). For example, you may want to draw it in 4 calls, each call drawing only 1/4th of the viewport: render one part into the image, upload that image to its position on the screen, and then go on drawing the other parts... To do this, you should use the setOffset() and setImage() methods of the Renderer3D class.

Finally, you can combine both methods 1 and 2 to draw only part of the screen, but with multiple calls to the renderer... It is quite flexible, really!
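To make mechanism 1 concrete for a 480x800 screen buffer, the sketch below maps a 400x300 region of it as a tgx::Image (I assume here the Image constructor variant taking an explicit stride, and that setImage() takes a pointer to the image; check the headers for the exact signatures):

Code:
#include <tgx.h>
using namespace tgx;

const int SLX = 480;
const int SLY = 800;

EXTMEM uint16_t screen_fb[SLX * SLY];   // full-screen buffer (too large for internal RAM)

// map the 400x300 portion whose upper-left corner is at (40, 100) on the screen:
// pointer = &screen_fb[x + SLX * y], dimensions = (400, 300), stride = SLX.
Image<RGB565> region(screen_fb + 40 + SLX * 100, 400, 300, SLX);

// the renderer can then draw straight into that part of the screen buffer,
// e.g. renderer.setImage(&region); the rest of the buffer is left untouched.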

If you tell me exactly how you upload pixels to your screen (by tile, line by line, ...), I can write a short snippet of code. I have never tried the library on a Teensy with a screen larger than 320x240, so I am curious to see how it performs :)
 

Great, sounds awesomely flexible! I tried with a full 480x800 buffer with all storage in EXTMEM, and "buddha" worked, managing 6.3 FPS :) "mars" also worked (slowly, as expected) using the full-screen EXTMEM buffer. Sample code for breaking it into chunks would be awesome, if you could provide it. This is how I send the full-screen EXTMEM buffer:
Code:
    // update the screen
    uint16_t *pBuf = fb;
    uint16_t *pBufEnd = fb + (SLX * SLY) - 1;  
    tft.setAddrWindow(0, 0, SLX - 1, SLY - 1);
    tft.pushPixels16bit(pBuf, pBufEnd);

pushPixels16bit basically loops through the buffer and pushes the 16 bit data pixel by pixel to the screen via GPIO6, and then returns. The buffer size is expected to be N full lines of WIDTH pixels, for whatever value of N will fit in "regular" memory :)
 
Here is an (untested) snippet of code for drawing a 400x700 viewport centered on a 480x800 screen (in portrait mode) in two passes. First we draw and upload the upper half of the viewport, then we do the same with the lower half.

The buddha demo requires the following changes:

1. Defining the image buffer and zbuffer
Code:
// screen size
const int SLX = 480;
const int SLY = 800;

// viewport size
const int VLX = 400;
const int VLY = 700;

// buffer large enough for half a viewport. Size VLX x (VLY/2) uses 274kb of RAM (in DTCM, fastest memory). 
uint16_t fb[VLX * VLY/2];  

// z-buffer with the same size (274kb in DMAMEM, slower than DTCM but faster than EXTMEM)
DMAMEM uint16_t zbuf[VLX * VLY/2];

// image using buffer 'fb' that the renderer will draw into. 
Image<RGB565> im(fb, VLX, VLY/2);

2. Drawing the mesh and uploading to the screen (note that we must now clear the image and the z-buffer before each draw call, not just at the beginning of the loop).
Code:
// draw the first half
im.fillScreen(RGB565_Blue); // erase the image
renderer.clearZbuffer(); // and the zbuffer
renderer.setOffset(0, 0); // set the offset inside the viewport for the upper half
renderer.drawMesh(buddha_cached, false);   // draw that part on the image
tft.setAddrWindow((SLX - VLX)/2, (SLY - VLY)/2, VLX, VLY/2); // I assume here setAddrWindow() takes parameters (x, y, w, h), right? But then why is there a -1 in your example?
tft.pushPixels16bit(fb, fb + (VLX * VLY/2)); // the buffer holds only half the viewport; should that be (VLX * VLY/2) - 1 instead, depending on your end-pointer convention?


// draw the second half
im.fillScreen(RGB565_Blue); // we must erase the image again
renderer.clearZbuffer(); // and do not forget to erase the zbuffer also
renderer.setOffset(0, VLY/2); // set the offset inside the viewport for the lower half
renderer.drawMesh(buddha_cached, false);   // draw that part on the image
tft.setAddrWindow((SLX - VLX)/2, (SLY - VLY)/2 + VLY/2, VLX, VLY/2);
tft.pushPixels16bit(fb, fb + (VLX * VLY/2));

Drawing the viewport with multiple calls as above makes it possible to use less memory, so we can put the buffers in a faster memory region. However, this approach also has its drawbacks: for each call, the renderer must iterate over all the triangles of the mesh to see which ones should be drawn and which ones are clipped/discarded... Therefore, the speed-up obtained from the faster memory can be lost to the increase in the number of triangles to iterate over. I think that splitting the viewport instead of using EXTMEM will be efficient when the screen is large but the mesh has relatively few triangles (maybe 5K, like 'bunny' in the texture example), but it may not prove very good for a detailed mesh like 'buddha' (20K).
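For scale, with 16-bit color and a 16-bit z-buffer: a full 400x700 framebuffer is 400 x 700 x 2 ≈ 547KB and the z-buffer is another 547KB, so each one is already bigger than the 512KB DTCM region and the two together exceed the Teensy 4.1's 1MB of internal RAM, which is why the full-viewport approach needs EXTMEM. Halving the viewport brings each buffer down to about 274KB, small enough to put one in DTCM and the other in DMAMEM as in the snippet above.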

A solution for large meshes would be to split them into many smaller sub-meshes. Then the renderer could discard the sub-meshes whose bounding boxes do not intersect the image being drawn, without having to loop over their triangles. But splitting a mesh into sub-meshes like this is a tedious process...
 