tgx: a 2D/3D graphics library for Teensy.

Drawing the viewport with multiple calls as above uses less memory, so the buffers can be placed in a faster memory region. However, this approach also has a drawback: for each call, the renderer must iterate over all the triangles of the mesh to decide which ones should be drawn and which ones are clipped/discarded... Therefore, the speed-up obtained by using faster memory can be lost to the increase in the number of triangles to iterate over. I think that splitting the viewport instead of using EXTMEM will be efficient when the screen is large but the mesh has relatively few triangles (maybe 5K, like 'bunny' in the texture example), but it may not prove very good for a detailed mesh like 'buddha' (20K).

A solution for large meshes would be to split them into many smaller sub-meshes. The renderer could then discard the sub-meshes whose bounding boxes do not intersect the image being drawn, without having to loop over their triangles. But splitting a mesh into sub-meshes like this is a tedious process...

Thanks for this! I went with this loop in the end, which works for even multiples of the screen height:
Code:
static const int SLX = 480;
static const int SLY = 800;
static const int chunks = 4;
static const int chunkSize = SLY / chunks;

// main screen framebuffer 
uint16_t fb[SLX * chunkSize];

// zbuffer in 16bits precision
DMAMEM uint16_t zbuf[SLX * chunkSize];           

// image that encapsulates fb.
Image<RGB565> im(fb, SLX, chunkSize);

...

for (uint16_t y=0; y<SLY; y+=chunkSize) {
      // draw chunk
      im.fillScreen(RGB565_Blue); // we must erase it
      renderer.clearZbuffer(); // and do not forget to erase the zbuffer also
      renderer.setOffset(0, y); // set the offset of the current chunk inside the viewport
      renderer.drawMesh(buddha_cached, false);   // draw that part on the image
      
      tft.setAddrWindow(0, y, SLX -1, y + chunkSize - 1);
      // update the screen
      tft.pushPixels16bit(fb,  fb + (SLX * chunkSize) - 1);
}

Works like a charm! On an SSD1963 800*480 screen over 16 bit parallel GPIO6, frame rate is a steady 13.57fps; that includes the rendering calculations and the (blocking) screen transfer time. Thanks, this is fun!

(My setAddressWindow takes x1, y1, x2, y2 rather than x1, y1, w, h, hence the -1 in the calls, and for iterating an N byte block of memory, that's contained in <start> to <start> + N-1 , hence the -1 in the pushPixels bufferEnd)
 
Good to know it is working :)

I think it should be faster using only 3 chunks instead of 4, no?

By the way, I just pushed an update to the library with a breaking change: the viewport dimensions are no longer template parameters of the Renderer3D class but regular parameters that can be set at runtime. This provides more flexibility, but it means that previous sketches must now replace
Code:
Renderer3D<RGB565, SLX, SLY, LOADED_SHADERS, uint16_t> renderer;
with
Code:
Renderer3D<RGB565, LOADED_SHADERS, uint16_t> renderer;
and add a line during setup:
Code:
renderer.setViewportSize(SLX,SLY);
 
That was a big help, thanks! I got it working with t3n, single buffer for now with 50MHz SPI. It's getting the same FPS as before without any TGX primitives (36FPS, which is quite acceptable, it's just a 2D interface), and a few alpha-blended objects don't slow it down much as long as they aren't too large. I may use double buffering later, but I'm not yet sure if I will need that memory for the rest of the project.

Hi,

I just want to let you know that I updated the ILI9341_T4 library so that it now accepts any pin for CS and DC (but using a hardware CS pin for DC will provide the best performance). So you can now try it with your PCB if you wish. If you are using it to display a UI, then you might see big improvements in the framerate and smoothness of the transitions compared to the ILI9341_t3(n) libraries.
 
Thanks! I will try it.

I was wondering: is there a way to set a "clipping window" for 2D drawing, like the viewport does for the renderer? It can be quite useful.
 
Yes, it is very easy to clip any drawing operation to a given rectangular region of an image: create a sub-image referencing this region and draw on it instead of the main image. Let me illustrate this. Say you have a 320x240 image:
Code:
tgx::Image<tgx::RGB565> im(buffer, 320, 240);
And you want to draw a red circle centered at (200,120) with radius 100, but clipped to the region [150,320]x[110,200]. Then you create a sub-image for this region (thus sharing the same memory buffer) with
Code:
auto subim = im.getCrop({150,320,110,200});
And then draw the circle on subim, not forgetting to subtract the offset of the upper left corner of the sub-image to convert coordinates between im and subim:
Code:
subim.drawCircle({200 - 150, 120 - 110}, 100, tgx::RGB565_Red);
Note that creating images is very cheap, the object itself is only 16 bytes so you can create sub images whenever needed and then discard them without any performance concerns.

As you can see, this method is quite powerful because any method that draws on an image can also be clipped to any region. This is the same idea as a "view" in NumPy (the Python lib) and I think it is a very elegant solution :). The only thing needed to implement it is to allow an image to have a stride different from its width.
 
That is really good. I made a new 128x64 image for an oscilloscope overlay and draw directly onto it, with "persistence" done by drawing "transparent black" over it every frame, then blit it onto my main image buffer (also with transparency), and it only slowed down by about 1.2 FPS (t3_n, single buffered). Thank you for the useful and cool library!
 
I don't know if you have done something like this already and I could not find it, or if it is too slow, or if this is asking a bit much! But is there something like the layer blending modes in graphics programs like Photoshop when you do a blit? It would be useful for things like fake lighting.


https://www.youtube.com/watch?v=i1D9ijh3_-I
 
Hi,

The blitting operations that are currently implemented use only basic alpha blending (and you must call the methods with an opacity parameter as the last argument to activate blending, otherwise colors are simply overwritten). This applies to 'blit()', 'blitMasked()' and 'blitScaledRotated()'. I do not know much about alpha blending, so I do not really know the other blending modes or how they would be implemented in the library while remaining compatible with all the color types (RGB16, RGB32, RGBf...). Most color types do not even have an alpha channel...

However, if you code a blending operator, say (col1, col2, opacity) -> blend(col1, col2, opacity), which returns the blend of two colors, then it is straightforward to adapt the blitting methods above to use this operation instead of the usual alpha blending.

Maybe I could create a generic 'blit()' method which takes a user-defined blending operator as a parameter. This might be a reasonable solution as it would allow users to perform fancy custom blending operations without adding too much code to the library.
 
That is a nice solution! "milkdrop" here we come :D

Done :D

I just added a new 'blend()' method to the library that performs the blitting of a sprite onto an image using a custom blending operator. If it works fine, I will later also add 'blendMasked()' and 'blendScaledRotated()' methods (similar to the existing 'blitMasked()' and 'blitScaledRotated()' methods). The documentation for the method is, as usual, located above its declaration (Image.h, line 876). Here is an almost complete example performing multiplicative blending of two images:

Code:
#include <tgx.h>
using namespace tgx;

uint16_t dst_buf[320 * 240];  // buffer
Image<RGB565> dst(dst_buf, 320, 240); // image

uint16_t src_buf[200*200]; // buffer
Image<RGB565> src(src_buf, 200, 200); // sprite


/* blending operator: perform multiplicative blending of two RGBf colors */
RGBf mult_op(RGBf col_src, RGBf col_dst)  {
    return RGBf(col_src.R*col_dst.R, col_src.G*col_dst.G, col_src.B*col_dst.B);
    }

void setup() {
    // source image: draw a horizontal gradient and a filled circle
    src.fillScreenHGradient(RGB565_Purple, RGB565_Orange);
    src.fillCircle({ 100, 100 }, 80, RGB565_Salmon, RGB565_Black);

    // destination image: draw a vertical gradient
    dst.fillScreenVGradient(RGB565_Green, RGB565_White);

    // perform the blending of src over dst using the 'mult_op' operator
    dst.blend(src, { 60 , 20 }, mult_op);

   // *** add code to display dst on screen ***
    }
   
void loop() {
    }

As you can see, the blending operator does not need to have the same color type as the sprite and destination images (which can also have different color types!). Color conversion is performed automatically when needed. However, one should favor a blending operator with the same color types as the images as this will give the best performance. Here, I used the RGBf color type just because I was lazy and it was simpler to write the blending operator with floating point valued channels...

Note also that the blending operator does not need to be a function, it can be a functor object so it can have an internal state. For example, here is a (stupid) custom blending operator that depends on some parameter p:

Code:
struct MyBlend
    {
    /* Set the p parameter in the constructor */
    MyBlend(float p) : _p(p) {}

    RGBf operator()(RGBf col_src, RGBf col_dst) const
      {
      return RGBf(col_src.R*_p  + col_dst.R*(1-_p), col_src.G*(1-_p) + col_dst.G*_p , max(col_src.B,col_dst.B));
      }  

    float _p;
    };

Then, as in the example before, we can use this operator by simply calling 'blend()' with a temporary instance like so:
Code:
// perform the blending of src over dst using the 'MyBlend' operator with param 0.7
dst.blend(src, { 60 , 20 }, MyBlend(0.7f));
 
I did a little more testing and everything seems to work nicely. However, since my previous post, I have changed the name of the new method from 'blend()' to 'blit()' because, in retrospect, it is just an extension of the already existing blit() method... [so, in the code of the post above, all references to 'blend' should be changed to 'blit'].

I also extended the following 4 methods to support user-defined blending operators:

1. blitScaledRotated()
2. copyFrom()
3. drawTexturedTriangle()
4. drawTexturedQuad()

Just as for blit(), the blending operator can be a function/functor/lambda. Note that, in the same spirit, the 'iterate()' method also takes an operator as a parameter and applies it to each pixel of the image. All the doc/details for these methods can be found in the 'Image.h' header file.

For example, in order to draw a 'sprite' image centered on 'im', rotated by 65 degrees, but blitting only the sprite's red channel, we can use the 'blitScaledRotated()' method with an anonymous lambda function:
Code:
im.blitScaledRotated(sprite, sprite.dim()/2, im.dim()/2, 1.0f, 65.0f, [](RGB565 src, RGB565 dst) { return RGB565(src.R, dst.G, dst.B); });

I hope you will find these methods useful ! :)
 
I just noticed a function called "_drawCircleHelper" that appears to draw quarter circles for the rounded rectangles. Is there a way to use it from a sketch? It would be quite handy for what I'm doing (drawing virtual "patch cables").
 
Hi,

The _drawCircleHelper() method is a legacy from the Adafruit GFX library and it is indeed used to draw quarter circles. It is not currently a public method, but making it accessible would not be a problem. However, you can already achieve the same result (and more!) by combining clipping with the circle drawing methods. For example, below is a function that draws the bottom right quarter of a circle:
Code:
template<typename color_t> 
void drawBottomRightCorner(tgx::Image<color_t> & im, int x, int y, int r, color_t color)
	{
	tgx::Image<color_t> sub_im = im.getCrop({ x, x + r + 1, y, y + r + 1}); // create a sub image for clipping
	sub_im.drawCircle({ 0,0 }, r, color); // draw the circle on the sub image 
	}

I did not benchmark it, but I expect this code to be almost as fast as _drawCircleHelper(). Also, with this approach you can choose whether the endpoints are drawn (they are not drawn by _drawCircleHelper(), and the same result is obtained in the method above by replacing r+1 with r).

BTW, I will certainly add methods for drawing Bezier curves and splines "soon" (for some good definition of soon..).
 
Is it possible to render objects directly from SD card or only internal/soldered flash to save ram?
Wish to know while I wait for my Teensy to arrive in the mail. :)

Also, what kind of performance boost do you see if you OC to 1Ghz?
 
Hi,

Rendering directly from the SD card is not supported and not really feasible because mesh data is not read linearly: accessing random positions in the SD card is really too slow. However, on Teensy 4.1, you can load the mesh as needed in external ram and then display it from there.

A mesh can be stored in any of the other memory locations on the Teensy. From experience, from fastest to slowest access time:
1. mesh in DTCM (lower 512KB of RAM): fastest access time (but this memory is also shared with your code).
2. mesh in DMAMEM (upper 512KB of RAM): almost as fast as DTCM in practice.
3. mesh in EXTMEM (8MB, soldered on the back of the T4.1): much slower than DMAMEM but still reasonably efficient for storing meshes.
4. mesh in PROGMEM (internal 8MB flash): a bit slower still, especially for non-contiguous access (such as texture mapping at certain orientations), but works OK.
5. mesh in EXTFLASH (16MB flash soldered on the back of the T4.1): should be similar to PROGMEM, but not tested yet.

It is also possible to mix memory locations for a given mesh, for example putting the vertex array in fast memory (DTCM or DMAMEM) and the large texture images in EXTMEM. This approach is used in some examples of the library and gives good results. There are specific methods in the library to move all or part of a mesh to another memory location: check the methods 'copyMeshEXTMEM()' and 'cacheMesh()' in 'Mesh3D.h'.
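As an illustration of the placement itself, the standard Teensyduino attributes decide which region a given array lives in. A fragment (array names and sizes are arbitrary placeholders, not tgx code):

```cpp
// Teensy 4.x placement attributes (Teensyduino core); sizes are placeholders.
uint16_t vertices[3 * 1024];                 // no attribute: DTCM (fast RAM1)
DMAMEM uint16_t zbuffer[480 * 200];          // upper 512KB of RAM (RAM2)
EXTMEM uint16_t big_texture[1024 * 1024];    // PSRAM soldered on the back of T4.1
PROGMEM const uint16_t mesh_data[] = { 0 };  // flash
```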

I did not try OC to 1GHz. It should provide a boost, but maybe not as much as expected, because the bottleneck in 3D rendering is usually not the math/computation needed to process vertices but mostly the time required to read/write memory...
 
Thank you kind sir. If you do get any benchmarks on EXTFLASH, please do share.

It looks like we can do approx 0.5 million polys/sec at 30+ FPS on a Teensy.
Would like to see if tgx is largely draw-call, geometry, or fill-rate bound on this chip?

I am getting a Teensy 4.1 with 8MB PSRAM and 256MB EXTFLASH to build a low poly PS1 style game.
Wondering if that will do me better compared to one with 16MB of PSRAM, relying on the SD card to dynamically load/unload textures without IO spikes.
 
That is very cool, I'll wait for that instead!

Hi,

Done :) I finally added methods for drawing Bezier curves...

The declarations (with the associated docstrings) can be found around line 1350 in Image.h. More precisely, the new methods are:

  • drawQuadBezier(). Draw a quadratic rational Bezier curve between 2 points with 1 control point.
  • drawCubicBezier(). Draw a cubic Bezier curve between 2 points with 2 control points.
  • drawQuadSpline(). Draw a quadratic b-spline passing through a given set of points.
  • drawCubicSpline(). Draw a cubic b-spline passing through a given set of points.
  • drawClosedSpline(). Draw a closed (quadratic) b-spline passing through a given set of points.

Next I will add a 'flood fill' method in order to easily color the interior of shapes created with these drawing primitives...
 
OK, I have tried the beziers, they are very nice, thanks!

Do you think there could be versions of them that draw an anti-aliased/thick line using "drawSpot" instead of pixels (or however you do drawWideLine), or would it be too slow?
 
Very nice!

Drawing thick curves is challenging and I honestly do not know how to achieve it in an efficient manner... However, it should not be difficult to create anti-aliased versions of the spline/bezier methods (but only single pixel wide). I will probably do that next.

BTW: I just added 'fill()' methods that allow seed-filling a region of the image (the declarations can be found around line 1050 in Image.h).
 
Do you think there could be versions of them that draw an anti-aliased/thick line using "drawSpot" instead of pixels (or however you do drawWideLine), or would it be too slow?

The problems I see with simply replacing drawPixel() with drawSpot() are:

1. Each pixel will be written about r times (for a line of thickness r), so it will be quite inefficient.
2. Doing so will mess up alpha-blending and opacity.

However, it is indeed a workable temporary solution if transparency is not needed. I will test it.
 
I have a quick question about blitting part of an image on to another.

My main image is "im" and I have another called "ssc" which holds a "list" of images drawn while data is loaded from the SD card. (I did confirm the images were drawn on ssc by blitting the entire thing onto a blank im.)

Code:
 im.blit(ssc.getCrop(tgx::iBox2(0,U8_patch[8][sSN_oride]*33,220,33)), {8,192});

If I take out the getCrop() it works but only shows the top image; once I add in the getCrop(), which is supposed to move the 220x33 box down, it compiles but there is no blit.

I expect I misunderstood what it says on line 632 of Image.h:

"
Code:
 * Remark: If only part of the sprite should be blit onto this image, one simply has to create a
         * temporary sub-image like so: 'blit(sprite.getCrop(subbox),upperleftpos)'. No  copy is
         * performed when creating (shallow) sub-image so this does not incur any slowdown.
"
 
[I deleted my previous message by mistake as I was trying to edit it so I am posting it again...]

Hi,

I think the problem is that a box/region is defined with the constructor iBox2(minx, maxx, miny, maxy) and not iBox2(minx, miny, lx, ly), as you can check in Box2.h around line 120. So I believe you should write instead:
Code:
im.blit(ssc.getCrop(tgx::iBox2(0,219, U8_patch[8][sSN_oride]*33, U8_patch[8][sSN_oride]*33 + 32)), {8,192});
Notice the 219 and the +32: the sub-box [minx,maxx]x[miny,maxy] is closed (i.e. it contains its boundary). It may feel counterintuitive to define a box this way instead of specifying the upper left corner and size, but it turns out to be very convenient/natural in many cases...

Also, I am in the process of rewriting all the 2D graphics primitives (and am nearly done)! In the new version of the library, each drawing primitive has at least two methods: one that does fast drawing and one that is slower but performs high-quality rendering (with anti-aliasing, sub-pixel precision and adjustable thickness). For example, for drawing a triangle, we now have:
  1. drawTriangle(), which draws a simple triangle (same as before).
  2. drawSmoothThickTriangle(), which draws an anti-aliased triangle with vertex positions given by floating point values (i.e. with sub-pixel precision, useful for smooth animations) and with a given line thickness!
And all methods support alpha-blending/opacity multipliers.

The drawback of this new version is that it introduces a few unavoidable breaking changes in the API, so previous code may have to be altered a bit to compile, but I think it is really worth it...

This new version of the library is located in the 'improved-drawing-primitive' branch on GitHub but is not in a really usable state as is. However, I hope to merge it into the main branch within a week or so.


Thank you, that worked!
I tried a few things but none of them were x1, x2, y1, y2.

Looking forward to try the new version.
 