All interesting stuff. However, a lot of this may also depend on your use cases. For example, if you are always drawing full-screen images or filling the screen with a solid color, then using DMA could be slower if you cannot do anything else in between operations. That is, it takes time to fill your buffer, either with a solid color or with a piece of your image... And, for example, with something like writeRect:
...
They chose to split up the SPI transaction after every other line. Why? Mainly because otherwise the SPI code could cause the Audio code to sputter while it fetches the next chunk of output, even if that output is coming from SDIO...
So if you do this with DMA and don't break it up, you can run into the same issue with Audio...
Always trade-offs.
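The writeRect code itself is elided above, so what follows is only a minimal sketch of the chunking idea, not the library's implementation. MockBus, writeRectChunked, and the between_chunks hook are hypothetical names; on real hardware begin()/end() would roughly correspond to SPI.beginTransaction()/SPI.endTransaction(). It pushes a rectangle of pixels a few rows per transaction and releases the bus between chunks so other work, such as audio, gets a turn:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a display driver's SPI layer. It only counts
// transactions and records the pixels sent, so the sketch runs anywhere.
struct MockBus {
    int transactions = 0;
    std::vector<uint16_t> sent;
    void begin() { ++transactions; }              // ~ SPI.beginTransaction(...)
    void write16(uint16_t v) { sent.push_back(v); }
    void end() {}                                 // ~ SPI.endTransaction()
};

// Push a w x h block of RGB565 pixels, ending the transaction every
// lines_per_chunk rows and calling between_chunks() while the bus is free.
void writeRectChunked(MockBus& bus, const uint16_t* pixels, int w, int h,
                      int lines_per_chunk,
                      const std::function<void()>& between_chunks) {
    assert(lines_per_chunk > 0);
    for (int y0 = 0; y0 < h; y0 += lines_per_chunk) {
        int y1 = std::min(h, y0 + lines_per_chunk);
        bus.begin();
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < w; ++x)
                bus.write16(pixels[y * w + x]);
        bus.end();
        between_chunks();  // e.g. let an audio update run here (hypothetical hook)
    }
}
```

With lines_per_chunk = 2 this mirrors the "every other line" split. Smaller chunks give other code more frequent access to the bus at the cost of more per-transaction overhead, which is exactly the trade-off in question.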
Sure. But usually you have to make that choice during the design cycle, by which I mean you typically can't just pick and choose. You're either backbuffering or not, so generally you don't have the luxury of deciding for each individual draw scenario.
In my "real world-ish" tests, dogfooding version 1.x (direct writes) vs. 2.x (backbuffering) of my own libs, I virtually always get better raw frame rates with the latter, because you usually aren't drawing simple rectangles or straight horizontal or vertical lines, which are the best-case scenario for direct writes. In my 1.x lib I went to great lengths to batch operations into rectangular buffers so I could reduce the transaction overhead where possible. Even with my best efforts, backbuffering gave me better frame rates pretty much across the board for the apps I ported from 1.x to 2.x.
That's why I brought up real-world scenarios initially: it comes down to what you're going to end up doing. Do you plan on rendering text, for example? That will kill direct-write performance, since a glyph drawn directly typically turns into many small writes rather than one rectangular block. And overall, that's why LVGL is typically faster than, say, TFT_eSPI at the same sorts of "real world" applications.
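To make the text point concrete, here is a rough transaction count under stated assumptions: with direct writes and a transparent background, each horizontal run of lit pixels in a glyph becomes its own set-window-plus-pixels transaction, while with a back buffer the glyph is drawn in RAM and its bounding rectangle goes out in one transfer. The glyph bitmap and function names are hypothetical, and real drivers may batch differently:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Direct writes (assumed model): every horizontal run of '#' pixels needs its
// own address window + pixel push, because the background is left untouched.
int directWriteTransactions(const std::vector<std::string>& glyph) {
    int runs = 0;
    for (const std::string& row : glyph) {
        assert(row.size() == glyph[0].size());  // rows must be equal width
        bool in_run = false;
        for (char c : row) {
            if (c == '#') {
                if (!in_run) ++runs;  // a new run starts here
                in_run = true;
            } else {
                in_run = false;
            }
        }
    }
    return runs;
}

// Back-buffering (assumed model): the glyph is rasterized into RAM and only
// its bounding rectangle is sent, in a single transaction.
int backbufferTransactions(const std::vector<std::string>& glyph) {
    return glyph.empty() ? 0 : 1;
}
```

For a 5x7 'A' ("..#..", ".#.#.", "#...#", "#...#", "#####", "#...#", "#...#") that comes to 12 transactions versus 1, and the gap is paid again for every character in a string.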
Where direct writes really shine is when you don't have the memory for backbuffering. That's just not the case on the Teensy.
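Whether backbuffering is an option mostly comes down to simple arithmetic: a full-screen buffer needs width × height × bits-per-pixel / 8 bytes. A minimal sketch, assuming 16-bit RGB565 color; the panel sizes and the canBackbuffer helper are only illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one full-screen back buffer at the given color depth,
// rounded up to a whole byte.
constexpr std::size_t framebufferBytes(std::size_t w, std::size_t h,
                                       std::size_t bits_per_pixel) {
    return (w * h * bits_per_pixel + 7) / 8;
}

// Hypothetical helper: does a full-screen buffer fit in the RAM you have free?
constexpr bool canBackbuffer(std::size_t w, std::size_t h,
                             std::size_t bits_per_pixel, std::size_t free_bytes) {
    return framebufferBytes(w, h, bits_per_pixel) <= free_bytes;
}

static_assert(framebufferBytes(320, 240, 16) == 153600, "320x240 RGB565");
static_assert(framebufferBytes(480, 320, 16) == 307200, "480x320 RGB565");
```

That is 153,600 bytes at 320x240 and 307,200 bytes at 480x320, which a Teensy 4.x with 1 MB of RAM can afford but many smaller MCUs cannot.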
Given that you generally have to choose between backbuffering and direct writes, the above matters a lot.
Also, regarding audio: my UIX user interface lib uses dirty rectangles to update the display. It can do partial updates, which is crucial here (I've noticed a lot of Teensy libs only do full-screen DMA refresh, which, frankly, I don't find practical or performant for most scenarios). It basically uses coroutines so that it never blocks for very long, and if you pass false to update, it won't even update all of the dirty rectangles, just one. So you can interleave it with audio just fine, as I have and do, DMA or no. (Also, you usually have more than one DMA channel, on an ESP32 or a Teensy 4 at least.)
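Here is a minimal sketch of the incremental dirty-rectangle idea, assuming that "update(false)" means "flush one rectangle and return". DirtyQueue, Rect, and the flush callback are hypothetical names, not UIX's actual API, and a real implementation would typically also merge or clip overlapping rectangles, which this skips:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Rect { int x, y, w, h; };

// Hypothetical dirty-rectangle scheduler (not UIX's real interface).
class DirtyQueue {
public:
    // Mark a region as needing a redraw.
    void invalidate(const Rect& r) { dirty_.push_back(r); }
    bool empty() const { return dirty_.empty(); }
    std::size_t pending() const { return dirty_.size(); }

    // full == true: flush every pending rectangle.
    // full == false: flush at most one and return, so the caller can
    // service other work (e.g. audio) before the next call.
    template <typename Flush>
    void update(Flush flush, bool full = true) {
        while (!dirty_.empty()) {
            Rect r = dirty_.front();
            dirty_.pop_front();
            flush(r);  // render + send just this region
            if (!full) break;
        }
    }

private:
    std::deque<Rect> dirty_;
};
```

In a main loop you would call update(flush, false) once per pass and let the audio code run in between, so no single call blocks for longer than one rectangle's transfer.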