Non DMA ST7735_t3 Framebuffer vs GFXcanvas16

Status
Not open for further replies.

vjmuzik

Well-known member
I'm working on a project that uses four 160x80 ST7735 displays from buydisplay.com, and to support this many displays without being slowed down by SPI speeds I have to buffer them. Right now I'm using the stock Adafruit ST7735 library for control and GFXcanvas16 objects for the buffers, then I just draw each canvas to its display as an RGB bitmap. I've settled on updating the displays at 24 frames per second, as this leaves me with roughly 1.3-1.4 million cycles for the rest of the sketch and for reading peripheral devices. Since I'm running so many displays at once, is there any advantage to using the modified ST7735_t3 with its framebuffer if I can't use the DMA support?
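
As a rough check on the bandwidth this implies, here is a minimal sketch of the arithmetic. It assumes RGB565 (2 bytes per pixel, which is what GFXcanvas16 stores) and counts pixel payload only, ignoring command bytes and any gaps between transfers. The function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes in one full RGB565 frame (2 bytes per pixel).
constexpr uint32_t frameBytes(uint32_t width, uint32_t height) {
    return width * height * 2u;
}

// Minimum SPI payload rate, in bits per second, needed to push `displays`
// full frames `fps` times a second. Overhead is deliberately ignored.
constexpr uint64_t minSpiBitsPerSecond(uint32_t width, uint32_t height,
                                       uint32_t displays, uint32_t fps) {
    return static_cast<uint64_t>(frameBytes(width, height)) * 8u * displays * fps;
}

// One 160x80 frame is 25,600 bytes.
static_assert(frameBytes(160, 80) == 25600, "160x80 RGB565 frame size");
```

For four 160x80 displays at 24 frames per second this works out to 19,660,800 bits per second of pixel data, so the SPI clock has to sit comfortably above roughly 20 MHz before any overhead is counted.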
 
Sorry, I am not really sure what your question is or more importantly what your setup is.

And for example, why do you say you cannot use DMA support? With the ST7735_t3/ST7789 we have the uncanny eyes working. I have one version that runs two 240x240 displays, each on a different SPI bus, and I can start an update and continue to do other things while the display is updating.

I have not used their canvas stuff, but I think it works in a similar way. That is, it redirects all GFX-like primitives to write to memory, and then you have a function that outputs that bitmap to the actual screen (more or less our updateScreen() method with frame buffers).

But again, I don't know what your setup is. T4.x? T3.x? Can you maybe put two on SPI and two on SPI1? Or potentially have one on SPI2, depending on your device and whether you can get access to the bottom pins or the like...

Regardless, I have not looked at how the current Adafruit_ST7735 does something like our writeRect (or updateScreen, or more simply fillScreen)...

Does it still go through the SPI library doing SPI.transfer and SPI.transfer16? If so, on almost all of our boards we should be faster, as the SPI.transfer functions do not make use of the FIFO queues and they wait for the return value to be ready before returning, which leaves gaps of time between each output.

But again, that would be easy to do a simple timing test on.

But again, maybe you can set up your system to use multiple SPI ports, for example displays 1 and 3 on SPI and displays 2 and 4 on SPI1.
Then you can do things like:

Code:
// <do updates to screen 1>
// <check to see if screen 3 (same SPI bus) has finished any updates>
tft1.updateScreenAsync();  // start update

// <do updates to screen 2>
// <check to see if screen 4 (same SPI bus) has finished any updates>
tft2.updateScreenAsync();  // start update

// <do updates to screen 3>
// <check to see if screen 1 (same SPI bus) has finished any updates>
tft3.updateScreenAsync();  // start update

// <do updates to screen 4>
// <check to see if screen 2 (same SPI bus) has finished any updates>
tft4.updateScreenAsync();  // start update

But again don't know if this is doable or not...
 
Sorry for the lack of information. I’m using a Teensy 4.1, but I was under the impression that DMA would need a dedicated SPI bus per display; now I see that’s not necessarily the case if you do it in one-shot mode. The canvas does work the way you describe: any call you would make to a display object can be made to a canvas object instead, and it draws everything to memory without making any calls to SPI. Once your canvas is completely drawn you call drawRGBBitmap on your display and use the canvas as the bitmap buffer, so this is the only call to SPI that would slow anything down.
 
Another thing to consider is that if I can get a considerable boost in performance I’ll be adding more displays to the one Teensy; as it is, there will probably be a total of 12 displays split across 3 Teensy 4.1s. Right now I just have everything set up on a couple of breadboards running SPI as fast as it will go. The other peripherals I’m going to be using are encoders and buttons corresponding to each display, as well as probably a CAN controller. I already have the encoders set up with the 4 displays, so that’s already taken into account in the 1.3-1.4 million cycles left. Adding one more display with how I have it set up now dropped the performance to less than 1000 cycles left last time I tested it.
 
There could be several different strategies one might try, depending on your setup, data, and usage patterns.

One is the strategy I mentioned above, which is to update the displays round robin using DMA. Only one display update can be active at a time per SPI bus, but that can be managed by the code.
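
The "one update per bus" bookkeeping can be sketched in plain C++ as below. This is only an illustration of the idea, not code from either library: `BusArbiter` and its methods are hypothetical names, and the display-to-bus mapping (displays 0 and 2 on one bus, 1 and 3 on the other) is assumed from the wiring suggested earlier in the thread. On real hardware the "is it finished" question would be answered by the display object itself rather than by a table like this.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kBuses = 2;  // assumed: SPI and SPI1
constexpr int kIdle = -1;          // marker for "no update running on this bus"

// Hypothetical helper: tracks which display, if any, owns each SPI bus.
class BusArbiter {
public:
    BusArbiter() { active_.fill(kIdle); }

    // Assumed wiring: even-numbered displays on bus 0, odd-numbered on bus 1.
    static std::size_t busOf(std::size_t display) { return display % kBuses; }

    // Returns true and claims the bus if it is free; false if another
    // display's update is still running on the same bus.
    bool tryStart(std::size_t display) {
        int& slot = active_[busOf(display)];
        if (slot != kIdle) return false;
        slot = static_cast<int>(display);
        return true;
    }

    // Releases the bus, but only if this display is the one holding it.
    void finish(std::size_t display) {
        int& slot = active_[busOf(display)];
        if (slot == static_cast<int>(display)) slot = kIdle;
    }

private:
    std::array<int, kBuses> active_;
};
```

In a round-robin loop, a display whose `tryStart` fails simply keeps its frame buffer and tries again on the next pass, so drawing into memory never has to wait on the bus.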

Again, I don't know what each of your displays shows. Suppose each one is something like a gauge, where you move a needle depending on a value. Our updateScreen call is probably the same as their drawRGBBitmap function. However, there is another capability in our updateScreen code which they may not have: the ability to set a clip rectangle. So if your needle moves up a bit, you can calculate a bounding rectangle of the old and new positions (the union of the two rectangles), set that as the clip rectangle, and do an updateScreen. This will only output the pixels in that rectangle of the screen, and if you only touch 1/4 of the pixels, your update is that much faster. So far we have not added the ability to use the clip rectangle as part of the Async update, but it would not be terribly difficult; it would simply require us to set up the dmaSetting objects associated with the display again.
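
The bounding-rectangle step can be sketched as follows. `Rect` and `unionRect` are illustrative names, not part of either library; the rectangles are taken as x, y, width, height, which is assumed to match what the library's clip rectangle call expects.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative rectangle type: top-left corner plus size, in pixels.
struct Rect {
    int16_t x, y, w, h;
};

// Smallest rectangle that covers both a and b, e.g. the old and new
// bounding boxes of a gauge needle.
Rect unionRect(const Rect& a, const Rect& b) {
    const int x0 = std::min<int>(a.x, b.x);
    const int y0 = std::min<int>(a.y, b.y);
    const int x1 = std::max<int>(a.x + a.w, b.x + b.w);  // exclusive right edge
    const int y1 = std::max<int>(a.y + a.h, b.y + b.h);  // exclusive bottom edge
    return Rect{static_cast<int16_t>(x0), static_cast<int16_t>(y0),
                static_cast<int16_t>(x1 - x0), static_cast<int16_t>(y1 - y0)};
}
```

For example, an old box of {10, 20, 5, 5} and a new box of {12, 18, 10, 4} give a union of {10, 18, 12, 7}: 84 pixels to send instead of the full 12,800 on a 160x80 panel.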

Likewise, if you are just updating some text field(s), it may be faster to update only those instead of the whole display. Our library supports both the ILI fonts and the GFX fonts. And unlike the Adafruit library, our GFX font output code supports doing this in opaque mode, so you don't have to fillRect (or the like) over the old text first; the text output updates each pixel only once. Note: depending on your layout, this may require you to set up clip rectangles (which work with or without the frame buffer), as opaque text output uses the specified background color for the full height of the text, up to where the next row of text would begin, which you may or may not want. If not, you can set the clip rectangle to the bounds of your logical field and simply output the text... As for handling new text that is shorter than the previous text, there are a few ways to do this, such as outputting a few extra blanks at the end, or getting the text cursor X position after the new text is output and doing a fillRect with the background color to the end of the field or to where the previous text ended...
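
For the "new text is shorter than the old text" case, the width of the strip that still needs clearing can be worked out like this. It is a minimal sketch with a hypothetical helper name; it assumes you remember the X position where the previous text ended and read the cursor X position after printing the new text.

```cpp
#include <algorithm>
#include <cstdint>

// Width, in pixels, of leftover old text to the right of the new text.
//   newEndX   - cursor X after printing the new text
//   prevEndX  - cursor X after printing the previous text
//   fieldEndX - right edge of the logical field (exclusive)
// Returns 0 when the new text already covers everything the old text did.
int16_t staleTextWidth(int16_t newEndX, int16_t prevEndX, int16_t fieldEndX) {
    const int staleEnd = std::min<int>(prevEndX, fieldEndX);
    return static_cast<int16_t>(std::max<int>(0, staleEnd - newEndX));
}
```

The result would be the width passed to a fillRect in the background color, starting at the new cursor X and spanning the text height.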

Sounds like fun, with 12 displays 8)
 
I thought about only updating the part of the display that changes, but I want to program for the worst-case scenario so I know that updating the entirety of all the displays doesn’t cause the rest of the code to slow down. The reason is that the controls and displays need to look and feel smooth, and any kind of inconsistency would be noticeable since it will be affecting things in real time.

Realistically, without DMA and the built-in framebuffer, is the modified Teensy library faster at drawing an RGB bitmap? If so, I can just switch to using it and still draw the GFXcanvas16 objects that way.
 
Again, I have not used their library in a while and have never used their canvas stuff. But for example, with a really simple fillScreen test like:
Code:
#define USE_ADAFRUIT
#ifdef USE_ADAFRUIT
#include <Adafruit_GFX.h>    // Core graphics library
#include <Adafruit_ST7735.h> // Hardware-specific library for ST7735
#include <Adafruit_ST7789.h> // Hardware-specific library for ST7789
#else
#include <Adafruit_GFX.h>    // Core graphics library
#include <ST7789_t3.h> // Hardware-specific library
#include <ST7735_t3.h> // Hardware-specific library
#endif
#include <SPI.h>

// T4.0
#define TFT_SCLK 13  
#define TFT_DATA 11  
#define TFT_CS   10  
#define TFT_DC    9  
#define TFT_RST   28  

#ifdef USE_ADAFRUIT
Adafruit_ST7789 tft = Adafruit_ST7789(TFT_CS, TFT_DC, TFT_RST);
#else
ST7789_t3 tft = ST7789_t3(TFT_CS, TFT_DC, TFT_DATA, TFT_SCLK, TFT_RST);
#endif

void setup() {
  while (!Serial && millis() < 5000) ;
  Serial.begin(115200);
  tft.init(240, 320);
}

void loop() {
  elapsedMillis em = 0;
  tft.fillScreen(ST77XX_RED);
  tft.fillScreen(ST77XX_GREEN);
  tft.fillScreen(ST77XX_BLUE);
  tft.fillScreen(ST77XX_BLACK);
  tft.fillScreen(ST77XX_WHITE);
  Serial.println((uint32_t)em, DEC);
  delay(1000);
}
Note this is the ST7789 240x320, as that is what I had easily set up...
Running with our library, the elapsed milliseconds print as about 256; with the Adafruit library, about 317...

And I don't know how much different it would be using their draw function versus our writeRect...
But again probably not hard to setup a test to see.

Edit: as for worst case timing... again all of that may be controlled, like you only start the next update after some period has elapsed from the previous update...

But again I don't know your setup.
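
The idea of only starting the next update once a fixed period has passed can be sketched as below. It is an illustration, not code from the thread: the 24 fps period comes from the original post, the names are made up, and the timestamps are assumed to be 32-bit microsecond counts that wrap around, like the values micros() returns on Arduino-style boards.

```cpp
#include <cstdint>

// Frame period for 24 frames per second, in microseconds (integer division).
constexpr uint32_t kFramePeriodMicros = 1000000u / 24u;  // 41666

// True once at least periodMicros have passed since lastStartMicros.
// Unsigned subtraction keeps this correct across counter wraparound.
bool frameDue(uint32_t nowMicros, uint32_t lastStartMicros,
              uint32_t periodMicros = kFramePeriodMicros) {
    return static_cast<uint32_t>(nowMicros - lastStartMicros) >= periodMicros;
}
```

Gating every display update this way keeps the per-frame cost constant whether or not anything changed, which is one way to make the worst case the only case.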
 
So after looking through the Adafruit library I found that, since it didn’t have any Teensy-specific defines in a couple of places, it was transferring each 16-bit pixel to SPI as 2 separate bytes. After fixing that I got a nice speed improvement, but then my SPI speed became unstable, and after making it stable it ended up being a little slower. Following that route, I decided to try sending 2 pixels at a time by adding a transfer32 function to SPI.h (not sure why that’s not already there); this got me up to 2.05 million spare cycles. I tried to get a transfer64 function working, but I couldn’t get a frame size over 32 working, so after that failure I tried copying the buffer transfer code and modifying it into transfer16 and transfer32 buffer versions. The transfer16 buffer got me up to 2.69 million cycles and the transfer32 buffer got me to 2.71 million cycles with 4 displays, definitely an improvement that I’m happy with so far. Adding 2 more displays to this I get 1.5 million cycles, which should be plenty left for the other things I need to do. Now that I have it a little more optimized I can cut down from 3 Teensy 4.1s to 2 and save a little bit of PCB real estate, since it’s already going to be a tight fit with the area I’m working with.
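
The "2 pixels at a time" packing can be sketched like this. It only shows how two RGB565 pixels might be combined into one 32-bit word; the transfer32 function itself was the poster's own addition and is not reproduced here. The helper names are hypothetical, and the sketch assumes the SPI peripheral shifts each 32-bit frame out most significant bit first, so the first pixel has to sit in the upper half.

```cpp
#include <cstddef>
#include <cstdint>

// Combine two RGB565 pixels into one 32-bit word, first pixel in the
// upper 16 bits (assumes MSB-first transmission).
constexpr uint32_t packPixels(uint16_t first, uint16_t second) {
    return (static_cast<uint32_t>(first) << 16) | second;
}

// Pack consecutive pixel pairs into `words` and return how many words were
// written. An odd trailing pixel is left for the caller to send separately.
std::size_t packPixelPairs(const uint16_t* pixels, std::size_t count,
                           uint32_t* words) {
    const std::size_t pairs = count / 2;
    for (std::size_t i = 0; i < pairs; ++i) {
        words[i] = packPixels(pixels[2 * i], pixels[2 * i + 1]);
    }
    return pairs;
}
```

A 160x80 frame has 12,800 pixels, so it packs evenly into 6,400 words with nothing left over.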
 