Playing around with yet another version of ILI9341_t3 library (ILI9341_t3n)

@kjn - Looks good. It also looks like I need a different version of SdFat... I think mine is still pointing at his beta project...

One difference I see explains some of the speed difference: I was reading a raw BMP file, with 3 bytes per pixel plus the overhead of the header, whereas yours is the previously processed one...

As for the maximum speed at which you could actually draw the screen, let me see if I can do the math again.

Let's say in continuous mode you are only outputting the pixels: that is 320*240*16 bits, but I will say 17 instead of 16 to allow for a one SPI clock gap between pixels.
And if your SPI is running at, let's say, a maximum of 30 MHz, then 30000000/(320*240*17) = 22.977, so about 23 frames per second.

Which is in line with what you observe.
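
The estimate above is easy to replay for other SPI clocks. Here is a minimal sketch in plain C++ (no Teensy headers needed). The function names are made up for illustration, and the 17 clocks per pixel is the same assumption as above (16 data bits plus a one-clock gap).

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on full-screen refreshes per second over SPI.
// clocksPerPixel is the 16 data bits plus any gap between pixels
// (17 in the estimate above).
double maxFramesPerSecond(uint32_t spiClockHz, uint32_t width,
                          uint32_t height, uint32_t clocksPerPixel) {
  assert(width > 0 && height > 0 && clocksPerPixel > 0);
  const double clocksPerFrame =
      static_cast<double>(width) * height * clocksPerPixel;
  return static_cast<double>(spiClockHz) / clocksPerFrame;
}

// Time needed for one full-screen refresh, in milliseconds.
double frameTimeMs(uint32_t spiClockHz, uint32_t width,
                   uint32_t height, uint32_t clocksPerPixel) {
  assert(spiClockHz > 0);
  return 1000.0 /
         maxFramesPerSecond(spiClockHz, width, height, clocksPerPixel);
}
```

For example, maxFramesPerSecond(30000000, 320, 240, 17) comes out at about 22.98, and the same call with a 60 MHz clock at about 45.96, or roughly 21.8 ms per frame.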

Now how to achieve that, with minimal tearing and the like?

One thing I have considered, but not yet done in ILI9341_t3n, is to have the DMA operation give some more feedback while it is running.

Suppose that, in order to output the ILI9341 frame buffer of 320x240x2 = 153600 bytes, you have to split it up into something like 3 or 4 DMASettings, as each setting can only output 32767 words...
Note: The actual mechanics are different for T3.6 versus T3.5 (especially SPI1/2) and T4...
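
To make the chunking concrete, here is a rough sketch that splits a transfer into pieces no larger than a given limit. It is plain C++ with a hypothetical helper name, not the library's actual DMA setup code, and it assumes 16-bit words (so the 153600-byte frame is 76800 words).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a transfer of totalWords into chunks of at most maxWordsPerChunk.
// Every chunk except possibly the last one is filled to the limit.
std::vector<uint32_t> splitTransfer(uint32_t totalWords,
                                    uint32_t maxWordsPerChunk) {
  assert(maxWordsPerChunk > 0);
  std::vector<uint32_t> chunks;
  while (totalWords > 0) {
    const uint32_t n =
        totalWords < maxWordsPerChunk ? totalWords : maxWordsPerChunk;
    chunks.push_back(n);
    totalWords -= n;
  }
  return chunks;
}
```

With the 32767-word limit, splitTransfer(320 * 240, 32767) gives three chunks of 32767, 32767 and 11266 words. A real implementation might prefer equal-sized chunks, or need more of them depending on the transfer width.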

Currently, in theory, I interrupt once per frame and update the frame counter (or stop), depending on the mode.

What I thought about doing is to interrupt more often, and maybe expose some form of partial frame count or percentage of frame or ??? So suppose we have a 50% marker we can detect.

Then code could be written that says: OK, the top half of the screen has been output, so read in the first half of the next file; when it says the frame is done, read in the bottom half of the next file... Again, this assumes that reading in is faster than the display update...
 
So the file I am using still is a full bitmap with header, but colour depth is 16bit (so not technically an "official" bitmap).
I used some conversion utilities (and a few programming tricks) from the ILI9341_due library, which was forked from the ILI9341_t3 library many years ago: http://marekburiak.github.io/ILI9341_due/
Check out the arcs function - pretty cool.

I've actually been looking at adding another 8 bits per pixel for an alpha channel, but that is the next project.


So the implementation is already running at max "stock" speed. Is there scope to overclock the F_BUS/SPI and see if we can get any higher? How do you do this on a 3.6?
I just had a look: Frank said he had overclocked to a 60 MHz bus speed, which by the above math (60000000/(320*240*17) = 45.96) would allow ~46 FPS and a ~22 ms refresh. That is still longer than an SD card read and would leave 6-8 ms spare for "work".

Breaking up the frame buffer is a good idea - I'm not sure about the execution in code.

One option is the drawBMP routine: since it draws line by line, it could call the refreshAsync(0) routine right after it fills the first line of the FB. The refresh flushes in the same direction as the draw routine fills the buffer, so the fill routine would always be a few rows ahead of the refresh. It really only works when trying to draw frames as quickly as possible, but it would get you towards running the two processes simultaneously.
 
So I have been playing with t3n on T4 using the frame buffer with good results, but I wondered if it would make sense to have two frame buffers: you could be composing to one while the other one was being sent via DMA in the background. I'm trying to figure out if it would be worth it or really get you anything. Does it make any sense to do this?
 
I think it would be really interesting to explore this idea but I don't know enough about the low level mechanics to know if it's feasible. I think though if you had two framebuffers both dedicated to one screen then you might be able to do a comparison of the two and only send pixel data for pixels that have changed since the last update? Can anyone who knows this better comment on if this is remotely feasible for the way we're interfacing with these screens?
 
@1of9 and @sandalhat - Double buffering - Hard to know if it would be beneficial to you or not. Most of the time for me, not so much. Although maybe, depending on the mechanics... That is, when does the DMA output code choose to use the other frame? When it finishes output of the current one? Does it then have to copy all of the current contents of one frame to the other... Then I'm not sure...

If, however, you are doing something like TV-style buffers, where you want a full frame loaded before it displays and want to read in the next one while it is displaying... Then maybe... Note that only the T4 has enough memory to fully double buffer the whole TFT screen.
 
Just an FYI - I keep meaning to remove the usage of the SPIN library...

So I now have a new branch, "No Spin", which builds, and I have started to do a little testing.

So far I have run a couple of programs on T4 (the gauge one, and my frame buffer and clip test), and they are running, at least on SPI. I still need to test SPI1 and SPI2.

I also ran the FB and clipping test on T3.6 with non-standard SPI pins and it's running...

Anyway if anyone wishes to play along, that would be great!
 
That is when does the DMA output code choose to use the other Frame?

This is handled by graphics APIs by having client code call flip() once it's done drawing a frame. Then you can decide to start the DMA transfer or stall, depending on the state of the previous DMA transfer.
 
@1of9 and @sandalhat - Double buffering - Hard to know if it would be beneficial to you or not. Most of the time for me, not so much. Although maybe, depending on the mechanics...

For my use case, it does. It takes about 12 ms for me to compose a frame, then another 28 ms to send it. So I can send a new frame every 40 ms, or right at 25 fps. If I could compose to a second frame while the DMA is sending the first one, then I could send a frame every 28 ms for about 35 fps, quite a difference.
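
The arithmetic behind those numbers, as a small sketch (plain C++, hypothetical function names): without overlap the frame period is the compose time plus the send time, and with full overlap the slower of the two stages sets the pace.

```cpp
#include <algorithm>
#include <cassert>

// Frame period when composing and sending happen one after the other.
double serialPeriodMs(double composeMs, double sendMs) {
  assert(composeMs >= 0.0 && sendMs >= 0.0);
  return composeMs + sendMs;
}

// Frame period when the next frame is composed while the previous one
// is being sent: the slower stage limits the rate.
double overlappedPeriodMs(double composeMs, double sendMs) {
  assert(composeMs >= 0.0 && sendMs >= 0.0);
  return std::max(composeMs, sendMs);
}

// Convert a frame period in milliseconds to frames per second.
double framesPerSecond(double periodMs) {
  assert(periodMs > 0.0);
  return 1000.0 / periodMs;
}
```

With 12 ms to compose and 28 ms to send, the serial case gives 25 fps and the overlapped case about 35.7 fps.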

That is when does the DMA output code choose to use the other Frame? When it finishes output of the current one?

That works for me in my use case.

Does it then have to copy all of the current contents of one frame to another...

Nope, I would not need/want it to do that for me, I compose the full frame for what I am doing.
 
@mjs513 - I need to get back to testing No Spin - it worked on a couple of tests, but I have not yet tested on different SPI buses and the like...

@1of9 and @JarkkoL ... If you are updating full frames at a time, let's say from top to bottom, and your time to generate a new frame is < the time to display it, there are probably several ways to do this.

Again, I am assuming you are using the DMA async updates...

a) Sub-frame counts - Currently on T4 I don't DMA output directly from the frame buffer. Instead I have some other internal buffers that I copy into, one portion of the frame buffer at a time, and the DMA outputs from those. This takes care of the issues of DMA physical memory versus cache consistency. So internally I keep a sub-frame counter, which could be exported. When a certain percentage of the display is done, you can fill in the memory above it with new contents, so your update code would simply be making sure it keeps ahead of (or, depending on how you look at it, behind) the DMA update... I think this is how FrankB does it with his game stuff.
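
As an illustration of how an exported sub-frame counter might be used, the sketch below converts "N of M sub-frames copied out" into the number of rows at the top of the frame buffer that can be redrawn. The function and parameter names are hypothetical (per the description above, the counter is internal today), and the sketch assumes the sub-frames are equal-sized bands copied from top to bottom.

```cpp
#include <cassert>
#include <cstdint>

// Number of rows, counted from the top of the screen, that have already
// been copied out of the frame buffer for the current update.
uint16_t rowsAlreadyCopied(uint16_t subFramesDone, uint16_t subFrameCount,
                           uint16_t screenRows) {
  assert(subFrameCount > 0 && subFramesDone <= subFrameCount);
  const uint32_t rows =
      static_cast<uint32_t>(subFramesDone) * screenRows / subFrameCount;
  return static_cast<uint16_t>(rows);
}

// True if drawing into this row cannot disturb the update in progress,
// because the row has already been handed to the DMA side.
bool rowIsSafeToRedraw(uint16_t row, uint16_t subFramesDone,
                       uint16_t subFrameCount, uint16_t screenRows) {
  return row < rowsAlreadyCopied(subFramesDone, subFrameCount, screenRows);
}
```

For a 240-row screen sent as 4 sub-frames, 2 completed sub-frames would mean rows 0 to 119 can be rewritten while the rest of the frame is still going out.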

b) Dual buffers - As I mentioned, the DMA code for T4 currently copies from the frame buffer to an internal buffer... It would be pretty easy to change the code slightly to allow two buffers here.
That is, the DMA code could have a new variable, _pfbtft_dma or some such name, that gets set to _pfbtft at the start of the updateScreenAsync function, and which the ISR would use....

So in theory with this you could have two buffers allocated and at the start one of them selected...
Then once you finish a page: you might do something like:
Code:
tft.waitUpdateAsyncComplete();  // wait until the previous update if any has completed.
tft.updateScreenAsync();          // Start up the next async update
tft.setFrameBuffer(OtherFrameBuffer);  // alternate which one you use
...
do what you need to do to create a new page.
All of the graphic functions will work on the new page...
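
A small helper for keeping track of which of the two buffers is which might look like the sketch below. It is plain C++ and independent of the library; the class and method names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Tracks two frame buffers: "front" is the one being sent to the
// display, "back" is the one currently being drawn into.
class DoubleBuffer {
 public:
  DoubleBuffer(uint16_t* a, uint16_t* b) : front_(a), back_(b) {
    assert(a != nullptr && b != nullptr && a != b);
  }
  uint16_t* front() const { return front_; }
  uint16_t* back() const { return back_; }
  // Exchange the roles of the two buffers.
  void swap() { std::swap(front_, back_); }

 private:
  uint16_t* front_;
  uint16_t* back_;
};
```

In a sketch following the pattern above, you would wait for the previous update, start the async update of the buffer you just drew, call swap(), and pass back() to tft.setFrameBuffer() before drawing the next page.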
 
For more compact memory usage, you could double buffer the sub-frames. Just let the client define in how many slices they want to update the screen, and then call "flip" that many times. E.g. if I init the device for 10 slices for a 320x240 screen, the API returns 32x240 buffers where I can draw. Basically something like:
Code:
  tft.init(320, 240, 16, 10); // allocates two 16bit 32x240 buffers
  for(int i=0; i<10; ++i)
  {
    void *bbuf=tft.getBackBuffer(); // return currently active back-buffer slice
    // draw to the 32x240 slice of the screen
    tft.flip();  // stalls until DMA transfer of previous buffer is done, flip the back-buffer & start new transfer
  }
So depending on the rendering routines and memory constraints, clients could optimize the code accordingly.
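
To put numbers on the memory saving, here is a minimal sketch (plain C++, hypothetical function name) that computes the size of one slice. It assumes whole bytes per pixel and that the pixel count divides evenly by the number of slices.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one slice when the screen is split into equal slices.
uint32_t sliceBytes(uint32_t width, uint32_t height,
                    uint32_t bitsPerPixel, uint32_t slices) {
  assert(slices > 0 && bitsPerPixel % 8 == 0);
  assert((width * height) % slices == 0);
  return width * height * (bitsPerPixel / 8) / slices;
}
```

For a 320x240 16-bit screen in 10 slices, each slice is 15360 bytes, so the two slice buffers take 30720 bytes in total, compared with 153600 bytes for a single full frame buffer and 307200 bytes for two.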
 
b) was what I had in mind. My draws are not top to bottom, and it would require a total rewrite to do them in chunks like that. On the T4 there is plenty of memory available for a second full buffer, and the pseudocode you posted was exactly the kind of thing I would want.
 
This has probably been posted before, but if you are directly doing DMA, etc. on the Teensy 4.0, be sure to understand which memory to use:
 
@1of9 - I put some simple changes in my "No Spin" branch that does what I mentioned to hopefully allow the b) method to work.


I did not try b), but did verify that some of the current test programs still work...

@MichaelMeissner - Yep, DMA memory is bad to do DMA in... Which is why my class has two smaller buffers in lower memory, so that I can copy down to them and use them for DMA...

@JarkkoL - Sorry, but that type of change would be far more extensive. The frame buffer code assumes the memory is contiguous, so it can do quick and dirty walking of memory...
Instead, this would require it to keep lists of memory segments and memory pointers. Let's assume the memory segments are N rows of data each: then when you go down to the next row, you have to check whether you are still within the same memory segment... Again, not hard. I have/had an earlier version I was playing around with on an ESP32, where I had to do 2 memory allocations, as their memory was fragmented and I could not allocate one block that was large enough...
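
A rough sketch of what such a segmented buffer could look like is below. It is plain C++ with made-up names, not code from the library or from the ESP32 experiment: each segment holds N rows, and every pixel access first works out which segment the row lives in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame buffer stored as several separately allocated segments of
// rowsPerSegment rows each (the last segment may be shorter).
class SegmentedFrameBuffer {
 public:
  SegmentedFrameBuffer(uint16_t width, uint16_t height,
                       uint16_t rowsPerSegment)
      : width_(width), height_(height), rowsPerSegment_(rowsPerSegment) {
    assert(width > 0 && height > 0 && rowsPerSegment > 0);
    for (uint32_t y = 0; y < height; y += rowsPerSegment) {
      uint32_t rows = height - y;
      if (rows > rowsPerSegment) rows = rowsPerSegment;
      const size_t pixels = static_cast<size_t>(rows) * width;
      segments_.push_back(std::vector<uint16_t>(pixels, 0));
    }
  }

  // Access one pixel: pick the segment from the row, then index inside it.
  uint16_t& pixel(uint16_t x, uint16_t y) {
    assert(x < width_ && y < height_);
    std::vector<uint16_t>& seg = segments_[y / rowsPerSegment_];
    return seg[static_cast<size_t>(y % rowsPerSegment_) * width_ + x];
  }

  size_t segmentCount() const { return segments_.size(); }

 private:
  uint16_t width_;
  uint16_t height_;
  uint16_t rowsPerSegment_;
  std::vector<std::vector<uint16_t>> segments_;
};
```

The extra divide and modulo on every access is the cost mentioned above: code that currently walks a contiguous buffer with a single pointer would need this kind of check each time it crosses a row boundary.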
 
@KurtE

Just tested the No Spin version on SPI1 and all seems to work. Used GraphicsTest, DemoSauce, ScrollTest, and your latest Gauge, and it all passed with flying colors.

Tried SPI2 on the loglow board, but it doesn't seem to work - I guess it's going to be busted, since pin 37 is one of those pins shorted to 38 and 39. I'll have to see about a workaround - may have to wait until I get the other board and see if it works there.
 
Thanks @mjs513 - I went ahead and tested as well, and SPI2 had an issue. I pushed up a fix.

Tested now on
T4 (SPI, SPI1, SPI2)
T3.6 ditto...
 
Merged NO_SPIN version into master
But put up a backup SPIN_VERSION branch in case someone needs/wants it...
 
@1of9 - I put some simple changes in my "No Spin" branch that does what I mentioned to hopefully allow the b) method to work.

That works just fine, thank you! Not seeing any issues with this No Spin version. Happy New year!
 
@KurtE OK, it's perhaps too difficult a change for an existing API with various drawing functions that assume random access to the frame buffer. I'm working on a graphics project which can ensure more sequential FB updates, where it's feasible to update N scanlines at a time to save memory. I'll need to check the DMA transfer code in your project to try it out.
 
Quick question,
I just started playing with this library today.
I’m using the Adafruit 320x240 display, ILI9341 obviously. I have it up and running with a Teensy 3.2 @ 120 MHz.

What caught my eye was the vertical scrolling example. It is essentially exactly what I need for a project I’ve been working on for a few weeks.

However, after my initial elation at the fact that the example sketch was pretty much tailor-made for my purpose, I realized, after taking out the sketch’s delays, that it was far too slow for my needs. I’ll run some tests later, but it seems to take about 2 seconds to scroll through 20 lines. I need at least four times that speed.

So the quick question is simply: why is that particular sketch so slow? I thought the Teensy 3.2 could get a much higher frame rate than that. Is there something I’m missing?

It’s wired as per typical, hardware SPI. And the graphics test returns the same numbers I’ve seen earlier in this thread.
 
@joey120373

Ran a couple of timing tests on the first scroll area using a T3.6 (don't have a T3.2 handy) with the delay(100) removed:
Code:
No-FrameBuffer:  3.316 seconds
With FrameBuffer: 0.918 seconds.
This is using an SPI clock of 30 MHz. As @Chris O. pointed out, the SPI clock is going to be the biggest driver. The obvious answer is the data transfer.

The way the scroll text area works is that it first reads the pixels and then redraws the scroll area, so it is time-consuming.
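
When a frame buffer is in use, that read-then-redraw step can be done as a memory copy instead of SPI traffic. A minimal sketch of scrolling a rectangular region up inside a 16-bit frame buffer is below. It is plain C++ with hypothetical names and is not how the library's own scroll code is written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scroll the w x h region at (x, y) up by `lines` rows inside a frame
// buffer that is fbWidth pixels wide, then fill the rows that open up
// at the bottom of the region with the background color.
void scrollRegionUp(uint16_t* fb, int fbWidth, int x, int y, int w, int h,
                    int lines, uint16_t background) {
  assert(fb != nullptr && fbWidth > 0 && w > 0 && h > 0 && lines >= 0);
  assert(x >= 0 && y >= 0 && x + w <= fbWidth);
  if (lines > h) lines = h;
  // Move each surviving row up by `lines` rows.
  for (int row = 0; row < h - lines; ++row) {
    std::memcpy(&fb[(y + row) * fbWidth + x],
                &fb[(y + row + lines) * fbWidth + x],
                static_cast<size_t>(w) * sizeof(uint16_t));
  }
  // Clear the rows exposed at the bottom of the region.
  for (int row = h - lines; row < h; ++row) {
    for (int col = 0; col < w; ++col) {
      fb[(y + row) * fbWidth + x + col] = background;
    }
  }
}
```

Only the pixels inside the region are touched, so fixed content next to the scroll area stays as it was. The frame buffer still has to be sent to the display afterwards, which is where the SPI clock comes back in.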

By way of comparison, on a T4 running at 600 MHz and the default 30 MHz SPI clock:
Code:
With FrameBuffer: 0.842 seconds.
With a 60 MHz SPI clock:
Code:
With FrameBuffer: 0.432 seconds.

NOTE: If you can increase the SPI clock for the T3.2, you should see a large improvement in screen update times.

Framebuffer example:
Code:
/***************************************************
  This is our GFX example for the Adafruit ILI9341 Breakout and Shield
  ----> http://www.adafruit.com/products/1651

  Check out the links above for our tutorials and wiring diagrams
  These displays use SPI to communicate, 4 or 5 pins are required to
  interface (RST is optional)
  Adafruit invests time and resources providing this open source code,
  please support Adafruit and open-source hardware by purchasing
  products from Adafruit!

  Written by Limor Fried/Ladyada for Adafruit Industries.
  MIT license, all text above must be included in any redistribution
 ****************************************************/


#include <SPI.h>
#include <ILI9341_t3n.h>
#include <ili9341_t3n_font_ComicSansMS.h>

// For the Adafruit shield, these are the default.
#define ILI9341_RST 8
#define ILI9341_DC 9
#define ILI9341_CS 10

// Use hardware SPI (on Uno, #13, #12, #11) and the above for CS/DC
ILI9341_t3n tft = ILI9341_t3n(ILI9341_CS, ILI9341_DC, ILI9341_RST);

// If using the breakout, change pins as desired
//Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS, TFT_DC, TFT_MOSI, TFT_CLK, TFT_RST, TFT_MISO);

void setup() {

  Serial.begin(9600);
 
  tft.begin();
  tft.setRotation(3);
  tft.useFrameBuffer(true);

  tft.fillScreen(ILI9341_BLACK);
  while (!Serial) ; 
  tft.setTextColor(ILI9341_WHITE);  tft.setTextSize(1);
  tft.enableScroll();
  tft.setScrollTextArea(0,0,120,240);
  tft.setScrollBackgroundColor(ILI9341_GREEN);

  tft.setCursor(180, 100);

  tft.setFont(ComicSansMS_12);
  tft.print("Fixed text");

  tft.setCursor(0, 0);
  tft.useFrameBuffer(true);

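  tft.setTextColor(ILI9341_BLACK); 
  // Time 20 lines of text in the first scroll area; each updateScreen()
  // pushes the frame buffer to the display.
  uint32_t timer = millis();
  for(int i=0;i<20;i++){
    tft.print("  this is line ");
    tft.println(i);
    tft.updateScreen();
  }
  Serial.println(millis()-timer);  // elapsed time in milliseconds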

  tft.fillScreen(ILI9341_BLACK);
  tft.setScrollTextArea(40,50,120,120);
  tft.setScrollBackgroundColor(ILI9341_GREEN);
  tft.setFont(ComicSansMS_10);

  tft.setTextSize(1);
  tft.setCursor(40, 50);
  
  timer = millis();
  for(int i=0;i<20;i++){
    tft.print("  this is line ");
    tft.println(i);
    tft.updateScreen();
  }
  Serial.println(millis()-timer);


}



void loop(void) {


}
 
Of course, we should mention that the T3.2 does not have enough memory to support a frame buffer, so that approach would not be available. I have from time to time considered seeing how hard it would be to add a palette-based frame buffer like we did for ILI9488, but the T3.2 does not even have enough memory for one byte per pixel. It would probably need to be a nibble per pixel, so a maximum of 16 colors, and the manipulation of the image in memory is a little more complex.
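
For scale: a 320x240 display at 4 bits per pixel needs 38400 bytes, against 76800 at one byte per pixel and 153600 at 16 bits, while the T3.2 has 64 KB of RAM in total. The sketch below shows the kind of nibble packing this would involve. It is plain C++ with made-up names and is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame buffer holding one 4-bit palette index per pixel, two per byte.
// Even-numbered pixels use the low nibble, odd-numbered the high nibble.
class NibbleFrameBuffer {
 public:
  NibbleFrameBuffer(uint16_t width, uint16_t height)
      : width_(width), height_(height),
        data_((static_cast<size_t>(width) * height + 1) / 2, 0) {}

  void set(uint16_t x, uint16_t y, uint8_t paletteIndex) {
    assert(x < width_ && y < height_ && paletteIndex < 16);
    const size_t i = static_cast<size_t>(y) * width_ + x;
    uint8_t& b = data_[i / 2];
    if (i & 1) {
      b = static_cast<uint8_t>((b & 0x0F) | (paletteIndex << 4));
    } else {
      b = static_cast<uint8_t>((b & 0xF0) | paletteIndex);
    }
  }

  uint8_t get(uint16_t x, uint16_t y) const {
    assert(x < width_ && y < height_);
    const size_t i = static_cast<size_t>(y) * width_ + x;
    const uint8_t b = data_[i / 2];
    return (i & 1) ? static_cast<uint8_t>(b >> 4)
                   : static_cast<uint8_t>(b & 0x0F);
  }

  size_t sizeBytes() const { return data_.size(); }

 private:
  uint16_t width_;
  uint16_t height_;
  std::vector<uint8_t> data_;
};
```

Every write becomes a read-modify-write of a shared byte, which is the "little more complex" part, and each index would still have to be expanded to a 16-bit color through a 16-entry palette when the data is sent to the display.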

In the past, when I have done something like a scrollable text region and did not have hardware support for it (i.e. enough memory for a frame buffer), I would do it at a higher level. That is, I would keep some form of list of text lines in whatever format was convenient. When a new text line came in that needed me to scroll, I would redraw that region, starting with the first line that was now visible, and draw the text lines...
I might keep additional information for each text line output, to remember how long it is in pixels. That way, when I draw a line over it, I know whether I need to blank the data to the right of it: if the previous text output on that line went beyond where the new text line ends, you need to clear out the old data.
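
The bookkeeping for that approach might look like the sketch below. It is plain C++ with hypothetical names: a list that keeps only the lines currently visible, and a helper that works out how many pixels of old text stick out past the new text on a row.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Keeps the text lines that are currently visible in the scroll region.
class ScrollLog {
 public:
  explicit ScrollLog(size_t visibleLines) : visibleLines_(visibleLines) {
    assert(visibleLines > 0);
  }

  // Add a line at the bottom. Returns true if the oldest line was
  // dropped, meaning the whole region has to be redrawn.
  bool addLine(const std::string& text) {
    lines_.push_back(text);
    if (lines_.size() > visibleLines_) {
      lines_.pop_front();
      return true;
    }
    return false;
  }

  const std::deque<std::string>& lines() const { return lines_; }

 private:
  size_t visibleLines_;
  std::deque<std::string> lines_;
};

// Width in pixels that must be blanked to the right of the new text
// when it replaces older text on the same screen row.
int pixelsToBlank(int oldWidthPx, int newWidthPx) {
  assert(oldWidthPx >= 0 && newWidthPx >= 0);
  return oldWidthPx > newWidthPx ? oldWidthPx - newWidthPx : 0;
}
```

When addLine() returns true, you would redraw each row from lines() in order and, for each one, fill a background rectangle pixelsToBlank() wide just after the new text. The pixel widths themselves would come from whatever text measurement your font code provides.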

With the library you can use a few more things to help out in this. That is you can potentially make use of the offset values and clip rectangle.

Good luck.
 
@KurtE
Of course you are right, but I keep forgetting about the T3.2 having no frame buffer. I'm so used to working with the T4 for graphics at this point.
 