3D Rendering on Teensy


Had some fun porting my tiled software rasterizer to Teensy & ILI9341 :)

Cheers, Jarkko
Looks really good. It's interesting that you are using 4k triangles. I didn't think there was enough space for that, so it would be interesting to see the code as well. I did manage to get Michael Rule's Arduino 3D code ported over, which you can see in this video: https://vimeo.com/150386845. It works even better on a Teensy 4.
This 3D model takes only 37 KB of memory and is stored in MCU flash, so it's not an issue at all. I have a tool which processes and compresses 3D objects with custom vertex formats so that they can be embedded in your program and rendered directly from program memory. The model is broken into meshlets and the vertices are processed in batches (max 64/batch) to reduce vertex transform cost. There's also a vertex cache to reduce retransforms across tiles, but I had it disabled for this demo.

I have thought of releasing the 3D graphics lib at some point, but it needs some more work. For example, I'd like to implement DMA transfer of tiles to the ILI9341, add texture sampling and visibility cone culling, do some further optimizations, some cleanup & restructuring, etc.
Indeed impressive. Looking forward to seeing how you implemented it.

Wait till you get yourself a T4.
Yep, DMA support makes a big difference, as does frame buffering, which most of the current display libraries support. This inspired me to keep working on merging two libraries: a highly modified OpenGL (v1) for the T4 and the Arduino3D rendering libs that I posted previously. While it's not as good as yours, I did manage to get the beginnings of shading incorporated, but I still need to work on back-face culling, etc. Here's a video I just posted showing a MeshLab teapot imported into the lib (only 256 faces for this test):


To start, I would like to say that I am impressed by the work you have shared.
Do you think it would be possible to adapt it for the Gameduino 3X, and more generally for the FT81x and BT81x chips?

They work in a rather particular way, but they have the advantage of handling large resolutions and having very good 2D acceleration.
That said, they do no (or almost no) 3D rendering.
@mjs513 Very cool! Nice to have other 3D graphics enthusiasts working on Arduino as well :) I got the lib working on T4 via synchronous SPI transfer, but I have to port the DMA code to T4 as well. Should be trivial with code references from KurtE's graphics lib.

@Armadafg Thanks! It's definitely possible as long as the device supports sending pixel data to it. I have designed this lib with portability in mind and abstracted the actual display device in the code, so you only need to implement HW-specific tile submission to the display. You can also write the pixel shading to output the native pixel format directly, or to output some intermediate format and convert to the native format upon tile submission.
Precisely. These graphics chips have one big drawback, which is a very limited number of instructions relative to the power of the chips (roughly 1000 to 2000 instructions).
This is because the CPU writes the instructions to be executed into a dedicated region of the graphics chip's memory, and that region is very small.
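To illustrate the "intermediate format, convert on tile submission" option, here is a minimal sketch. It is not the lib's actual interface: `DisplayDevice`, `submit_tile`, `Rgb888` and `convert_and_submit` are hypothetical names, and it assumes the native format is 16-bit RGB565, as commonly used with the ILI9341.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb888 { uint8_t r, g, b; }; // hypothetical intermediate pixel format

// Packs 8-bit channels into RGB565 (5 bits red, 6 bits green, 5 bits blue).
inline uint16_t to_rgb565(Rgb888 c)
{
  return uint16_t(((c.r & 0xF8) << 8) | ((c.g & 0xFC) << 3) | (c.b >> 3));
}

// Hypothetical device interface: the only HW-specific part is submit_tile().
struct DisplayDevice
{
  virtual ~DisplayDevice() = default;
  // Receives one finished tile as w*h row-major RGB565 pixels.
  virtual void submit_tile(int x, int y, int w, int h, const uint16_t* pixels) = 0;
};

// Converts a tile from the intermediate format into 'scratch' and submits it.
inline void convert_and_submit(DisplayDevice& dev, int x, int y, int w, int h,
                               const Rgb888* src, uint16_t* scratch)
{
  assert(src && scratch && w > 0 && h > 0);
  for (std::size_t i = 0; i < std::size_t(w) * h; ++i)
    scratch[i] = to_rgb565(src[i]);
  dev.submit_tile(x, y, w, h, scratch);
}
```

A port to another display would then only need its own `submit_tile()` implementation (SPI write, DMA kick-off, image encode, and so on).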

However, it can stream images stored on the CPU side, and RAM is the most suitable place for that.
The image can be in BMP, DXT1, JPG or PNG format. Do you think it would be possible to generate an image in one of these formats from your 3D graphics engine?
The rasterizer renders the image in tiles (64x64px tiles in the video), so it doesn't require much memory to operate. For the ILI9341, after rendering each tile I submit the pixel data for that tile to the display and then reuse the same memory to render the next tile, so instead of requiring a 320x240px frame buffer it only requires a 64x64px buffer. For asynchronous DMA transfer on the ILI9341, it copies the tile data to another buffer, which requires some extra memory, but how much asynchronous buffering you want is configurable. This is just how I implemented it for the ILI9341. If you wanted to, you could for example encode each tile to DXT and submit the compressed image to the display (if that is what your display supports), or copy the tile data to a larger image and encode it in one go.
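As a rough sketch of that tile loop and the memory saving, assuming 16-bit (RGB565) pixels: the `render` and `submit` callbacks and all names below are hypothetical, not the lib's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kScreenW = 320, kScreenH = 240; // ILI9341 resolution
constexpr int kTileSize = 64;                 // tile edge length used in the video

// Bytes needed for a w x h buffer of 16-bit pixels.
constexpr std::size_t buffer_bytes(int w, int h)
{
  return std::size_t(w) * std::size_t(h) * sizeof(uint16_t);
}

struct TileRect { int x, y, w, h; };

// Renders and submits the screen one tile at a time, reusing a single tile
// buffer. Tiles on the bottom/right edge are clipped to the screen.
// Returns the number of tiles processed.
template <class Render, class Submit>
int render_frame_tiled(uint16_t* tile_buf, Render render, Submit submit)
{
  assert(tile_buf != nullptr);
  int count = 0;
  for (int ty = 0; ty < kScreenH; ty += kTileSize)
    for (int tx = 0; tx < kScreenW; tx += kTileSize)
    {
      const TileRect r{tx, ty, std::min(kTileSize, kScreenW - tx),
                               std::min(kTileSize, kScreenH - ty)};
      render(r, tile_buf); // rasterize everything that overlaps this tile
      submit(r, tile_buf); // send the finished pixels to the display
      ++count;
    }
  return count;
}
```

With these numbers, a full frame buffer would be `buffer_bytes(320, 240)` = 153600 bytes, while one tile is `buffer_bytes(64, 64)` = 8192 bytes. A second tile buffer for asynchronous DMA would double the latter.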
Some more fun optimizing geometry processing in the graphics lib :D This video visualizes "cluster cone culling", which omits processing of back-facing and occluded geometry clusters and helps in rendering more complex models. "Stanford dragon" model in the video has 23490 triangles and 11745 vertices split into 248 clusters, running on T4.
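For readers unfamiliar with the back-facing half of this technique, here is a minimal sketch of a conservative cluster cone test. It assumes each cluster stores a bounding sphere plus a normal cone (unit axis and the sine of the largest angle between the axis and any triangle normal). The struct layout and names are hypothetical and not taken from the lib.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct ClusterCone
{
  Vec3  center;          // bounding sphere center (same space as the camera position)
  float radius;          // bounding sphere radius
  Vec3  axis;            // unit-length axis of the normal cone
  float sin_half_angle;  // sine of the max angle between axis and any triangle normal, in [0, 1]
};

// Returns true if every triangle in the cluster must be back-facing as seen
// from cam_pos, so the whole cluster can be skipped.
// A triangle with normal n at point p is back-facing when dot(n, p - cam) >= 0.
// The test widens the cone by the bounding sphere to stay conservative.
inline bool is_cluster_backfacing(const ClusterCone& c, Vec3 cam_pos)
{
  assert(c.sin_half_angle >= 0.0f && c.sin_half_angle <= 1.0f);
  const Vec3 to_cluster = sub(c.center, cam_pos);
  const float dist = std::sqrt(dot(to_cluster, to_cluster));
  return dot(c.axis, to_cluster) - c.radius >= c.sin_half_angle * (dist + c.radius);
}
```

The test costs one dot product and one square root per cluster, which is why it pays off when a cluster holds on the order of a hundred triangles.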

Thanks! Added texture support, so here's a happy cube :D Textures are stored in program memory to save RAM. It's doing perspective-correct interpolation with point sampling, and supports different pixel formats. Sorry about the washed-out colors, it's tricky to record decent video of an LCD screen.
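A minimal sketch of what perspective-correct interpolation with point sampling means, assuming screen-space barycentric weights and a 16-bit texture with clamp addressing. The function names and the clamp behavior are assumptions for illustration, not the lib's code.

```cpp
#include <cassert>
#include <cstdint>

struct TexVertex { float u, v, w; }; // texture coords and clip-space w (must be > 0)

// Interpolates (u, v) at a pixel with screen-space barycentric weights
// b0 + b1 + b2 = 1. Attributes divided by w are linear in screen space,
// so u/w, v/w and 1/w are interpolated and then divided out.
inline void perspective_uv(const TexVertex& a, const TexVertex& b, const TexVertex& c,
                           float b0, float b1, float b2, float& u, float& v)
{
  assert(a.w > 0.0f && b.w > 0.0f && c.w > 0.0f);
  const float k0 = b0 / a.w, k1 = b1 / b.w, k2 = b2 / c.w;
  const float inv_w = k0 + k1 + k2; // interpolated 1/w
  u = (k0 * a.u + k1 * b.u + k2 * c.u) / inv_w;
  v = (k0 * a.v + k1 * b.v + k2 * c.v) / inv_w;
}

// Point sampling: pick the single texel containing (u, v), no filtering.
// Coordinates outside [0, 1) are clamped to the texture edge.
inline uint16_t sample_point(const uint16_t* texels, int tex_w, int tex_h, float u, float v)
{
  int x = int(u * float(tex_w));
  int y = int(v * float(tex_h));
  if (x < 0) x = 0; else if (x >= tex_w) x = tex_w - 1;
  if (y < 0) y = 0; else if (y >= tex_h) y = tex_h - 1;
  return texels[y * tex_w + x];
}
```

For example, halfway along a screen-space edge whose endpoints have u=0 at w=1 and u=1 at w=3, the perspective-correct u is 0.25, not the 0.5 that plain linear interpolation would give.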

I also fixed rasterization fill rules and implemented Hi-Z cluster occlusion culling which reduces geometry processing further.
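The general idea behind Hi-Z cluster occlusion culling can be sketched as follows. This assumes a coarse buffer that stores, per screen tile, the farthest depth already drawn (smaller depth = nearer), and a cluster described by the tile range it covers and its nearest depth. The data layout and names are hypothetical, and the actual implementation may work at a different granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HiZBuffer
{
  int tiles_x, tiles_y;
  // Farthest depth written so far in each tile. A tile that is not yet fully
  // covered must hold the far-plane value (1.0 here).
  std::vector<float> max_depth;

  HiZBuffer(int tx, int ty)
    : tiles_x(tx), tiles_y(ty), max_depth(std::size_t(tx) * std::size_t(ty), 1.0f) {}
};

// Returns true if a cluster covering tiles [tx0..tx1] x [ty0..ty1] (inclusive)
// whose nearest depth is cluster_min_depth is hidden in every covered tile.
inline bool is_cluster_occluded(const HiZBuffer& hiz, int tx0, int ty0, int tx1, int ty1,
                                float cluster_min_depth)
{
  assert(tx0 >= 0 && ty0 >= 0 && tx1 < hiz.tiles_x && ty1 < hiz.tiles_y);
  for (int ty = ty0; ty <= ty1; ++ty)
    for (int tx = tx0; tx <= tx1; ++tx)
      if (cluster_min_depth <= hiz.max_depth[std::size_t(ty) * hiz.tiles_x + tx])
        return false; // could be visible in this tile
  return true;
}
```

The test is conservative: it can only skip a cluster when everything already drawn in all covered tiles is nearer than the nearest point of the cluster.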
Very cool indeed. Love the references, keep them coming. I haven't had much time to play with 3D stuff for a while; other things have me tied up for now.

Are you using the ILI9341_t3n library?
I implemented my own ILI9341 lib using Kurt's lib as a reference. It's further optimized and the DMA transfer supports partial tiled updates, so only updated regions are transferred over SPI. It also doesn't require holding the entire frame buffer in memory, so it could run at higher resolutions without memory issues :)

Cool - did you ever think about doing a PR back to @KurtE's library? I think Kurt was looking at implementing something similar.
The design is quite a bit different and I think Kurt would have to reimplement all the draw functions. Effectively his lib is immediate mode rendering while I defer rendering by recording draw commands to a command buffer and dispatch them to tiles when they are being processed. Here's what the rendering code of a 3D model looks like:
  // render mesh
  for(uint16_t seg_idx=0; seg_idx<s_mesh.num_segments(); ++seg_idx)
  {
    test_shader sh;
    sh.m_o2c=o2c; // object->camera matrix
    sh.m_o2p=o2p; // object->projection matrix
    // dispatch_shader() is called here to queue this segment (call not shown)
  }
  s_gfx_dev.commit(); // kick off tile rendering
dispatch_shader() queues commands and commit() performs the actual rendering of all the tiles at the end of the frame.
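The deferred scheme described above (record first, bin to tiles, render at commit) can be sketched roughly as below. This is not the lib's implementation: `TiledDevice`, `DrawCmd`, `dispatch` and the callback signature are hypothetical, and `std::vector` is used for brevity where an MCU build would more likely use fixed-size buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A recorded draw command with inclusive screen-space bounds.
struct DrawCmd { int min_x, min_y, max_x, max_y; uint16_t id; };

class TiledDevice
{
public:
  TiledDevice(int width, int height, int tile_size)
    : m_w(width), m_h(height), m_tile(tile_size),
      m_tiles_x((width + tile_size - 1) / tile_size),
      m_tiles_y((height + tile_size - 1) / tile_size),
      m_bins(std::size_t(m_tiles_x) * std::size_t(m_tiles_y))
  {
    assert(width > 0 && height > 0 && tile_size > 0);
  }

  // Records the command and adds its index to every tile it overlaps.
  // Nothing is rasterized here.
  void dispatch(DrawCmd cmd)
  {
    cmd.min_x = std::max(cmd.min_x, 0);       cmd.min_y = std::max(cmd.min_y, 0);
    cmd.max_x = std::min(cmd.max_x, m_w - 1); cmd.max_y = std::min(cmd.max_y, m_h - 1);
    if (cmd.min_x > cmd.max_x || cmd.min_y > cmd.max_y)
      return; // fully off-screen
    const uint16_t idx = uint16_t(m_cmds.size());
    m_cmds.push_back(cmd);
    for (int ty = cmd.min_y / m_tile; ty <= cmd.max_y / m_tile; ++ty)
      for (int tx = cmd.min_x / m_tile; tx <= cmd.max_x / m_tile; ++tx)
        m_bins[std::size_t(ty) * m_tiles_x + tx].push_back(idx);
  }

  // Visits every tile once, handing the callback the indices of the commands
  // binned to that tile, then resets the recorded state for the next frame.
  template <class RenderTile>
  void commit(RenderTile render_tile)
  {
    for (int ty = 0; ty < m_tiles_y; ++ty)
      for (int tx = 0; tx < m_tiles_x; ++tx)
        render_tile(tx, ty, m_bins[std::size_t(ty) * m_tiles_x + tx], m_cmds);
    for (std::vector<uint16_t>& bin : m_bins)
      bin.clear();
    m_cmds.clear();
  }

private:
  int m_w, m_h, m_tile, m_tiles_x, m_tiles_y;
  std::vector<DrawCmd> m_cmds;               // command buffer for the current frame
  std::vector<std::vector<uint16_t>> m_bins; // per-tile lists of command indices
};
```

Because each tile sees the complete list of commands that touch it before any pixel is drawn, a tile can be fully rendered and sent to the display before the next one starts, which is what removes the need for a full frame buffer.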