Playing around with yet another version of ILI9341_t3 library (ILI9341_t3n)

I did some mods - wanting to show I could selectively update one half or the other.

No real complaints - nothing debugged or confirmed or cleaned up enough to share. I did set rotate==1 as my display hangs that way.

One fun effect was top and bottom shapes out of sync - when drawn on same frame#.

Another odd effect (from an EARLY change I don't recall ATM) was that the split line between frames was a flickering/flashing line out of sync with either frame.

> A bit odd the lower subframe gets to Frame X and draws and then upper subframe would be drawn at X+1 with matching shapes and colors?
> I did something that now has top drawing at X and the bottom at X+1 Frame# and they are generally in Sync except as noted below.

Then a quick hack for speed adjustment - runs FAST<>SLOW but some draws show odd sync - that is possibly down to how I hacked the 'frame&0xf' test.

I think it is likely a cusp in the frame&0x# logic - right now in SLOW mode top and bottom fail to match every 240 frames and the draws are every 16 frames - which suggests byte wrapping is the cusp - possibly the 'color_index = (frameCount >> 4) & 0x7;' or other 'rounding error' caused by the

But if both are drawn on the same frame number, that is when the shapes are out of sync - and/or when the central line never matches the two halves?

I probably spent more time noodling this in writing this post than I did yesterday … the fast display {generally smooth and perfect} rate is too fast to watch and totally mesmerizing.
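
A quick way to sanity-check the byte-wrapping theory for that one expression, as a sketch only. It assumes frameCount is a plain incrementing counter; `colorIndexFrom8Bit` is a hypothetical stand-in for a counter that got truncated to 8 bits somewhere. Under those assumptions the index steps every 16 frames, repeats every 128, and an 8-bit wrap at 256 lands on the same step, so this expression alone would not explain the mismatch.

```cpp
#include <cassert>
#include <cstdint>

// Expression quoted in the post: one step every 16 frames, 8 colors.
uint8_t colorIndex(uint32_t frameCount) {
    return static_cast<uint8_t>((frameCount >> 4) & 0x7);
}

// Hypothetical: the same expression fed from a counter truncated to 8 bits.
uint8_t colorIndexFrom8Bit(uint32_t frameCount) {
    return colorIndex(static_cast<uint8_t>(frameCount));
}

// Checks both claims from the lead-in over several full cycles.
void checkColorIndexCycle() {
    for (uint32_t f = 0; f < 1024; ++f) {
        assert(colorIndex(f) == colorIndex(f + 128));    // repeats every 128 frames
        assert(colorIndexFrom8Bit(f) == colorIndex(f));  // 8-bit wrap changes nothing
    }
}
```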
 
Quick update: I updated the DMA code for the T3.6 in the same branch to allow the callback as well.

As part of this I reworked part of the DMA code, which I think makes it a bit cleaner.

Before in NON-continuous mode it had the DMASettings chain 0->1->2 and kills off 3. It starts off outputting the 2nd pixel in 0 as it needed us to do a PUSH to properly set the proper 32 bits of the PUSHR register...
With Continuous mode, I had 0->1->2->3->0... where 3 only output 1 pixel...
Now 3 is set up to be the same as 0 except it starts at the first pixel and covers the full size, so the loop is: 0->1->2->3->1...
Just feels cleaner. So I don't have to muck with chain. I always setup for interrupt and completion on 2. Each time I start I just make sure to copy 0->TX...
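
To make the new ordering concrete, here is a toy model of the chain in plain C++. It is not the library code: `kNextSetting` and `walkChain` are made-up names, and the only thing modeled is which settings block follows which. Each frame is three segments; the first frame starts from block 0 and every later frame starts from block 3, with block 2 always closing the frame.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Toy model: entry i holds the index of the settings block that follows
// block i, mirroring the reworked chain 0->1->2->3->1.
const std::array<int, 4> kNextSetting = {{1, 2, 3, 1}};

// Returns the order in which settings blocks are consumed for `steps` segments.
std::vector<int> walkChain(int steps) {
    assert(steps >= 0);
    std::vector<int> order;
    int current = 0;  // a transfer always starts from block 0
    for (int i = 0; i < steps; ++i) {
        order.push_back(current);
        current = kNextSetting[current];
    }
    return order;
}
```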

So tried with my DMA clip test sketch plus with the continuous test sketch...

I pushed the changes up to same branch as above.
 
Thanks,

Next up T3.5... Which is always interesting.

So far with T3.6/and T4 I have only tested the changes on SPI... Will do a quick check on SPI1 on T3.6 probably... Only real difference is FIFO queue size.

But with T3.5 SPI1 and SPI2 do not have two DMA sources...

Again maybe I will revisit this and see if it can be used for TX? But first SPI object.
 
I tried a couple of things with T3.5 to see if I could get the scatter/gather stuff to work... Failed (like before), so I added the callback in, which was pretty easy as I am having to field interrupts anyway.

Appears to work on SPI.. Probably should try on SPI1...
 
@KurtE and @defragster
I combined the example asynch updates with the Buddhabrot example. Works really smoothly. This is before the latest changes for the t3.6 and on SPI. Attached is the test sketch.
 

Attachment: ILI9341_BUDDHABROT_Frames.zip (4.6 KB)
Good Morning...

I did some more cleanup this morning in the branch I am working on.

I removed the older ways of doing the DMA transfers for T4... I like the new way better, as I mainly only use the Update Once type function, which has less overhead... And the continuous update stuff is working pretty well with the optional callbacks. Also the T3.6 stuff is cleaner.

Also like the ILI9488_t3 and HX8357_t3n stuff I converted to newer library format with src directory.

This morning I tried both T4 and 3.6 with SPI1 and they are working.

I also verified and fixed it compiling on T3.2.

So maybe in a couple hours I will merge these changes in...

The only other issue I see, which may not be an issue, is that the Font test program I was trying out with the RubikRegular12 font is not working... But now I remember their font was not correct... So I will discard those changes.

Let me know if you think I should delay the update.
 
What kind of performance gains are you seeing with this new method as opposed to the current available release?
Can you share some figures on the display test sketches to show comparison?
 
Sorry, I am not sure...

T3.x - I would expect no change. T3.5 is the same code as before except it can optionally call out. Note T3.5 has issues... That is, I could not get scatter/gather to work on it, so it has to do the transfer in chunks anyway for the SPI object, and SPI1 and SPI2 only have a single DMA source setting each... So I am doing tricks. I should at some point see if I can avoid those tricks, as the SOURCE should be able to do RX or TX... But the code was modeled after what I needed to do in the SPI library.

T4 - Not sure exactly what to measure. That is, the time it takes to output to the screen should be pretty close, as the old code round-robined between two linked DMASettings, so SPI should stay busy. The performance difference is that the new code does not require extra buffers. The old code would get an interrupt each time one of these smaller buffers finished transferring, and then it would copy the next chunk of frame buffer memory into it, while the DMA continued to output the data in the other buffer. So what you would see is a reduction in DTCM memory used, plus a lot fewer interrupts (new code is 1 or 2 per frame; in the old code each of the two buffers was 960 words long, so you would process something like 80 interrupts per frame).
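
The 80 figure is just pixels per frame divided by chunk size. A small sketch of that arithmetic, assuming a 320x240 frame and that a "word" here is one 16-bit pixel (the helper names are illustrative, not from the library):

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kWidth = 320;
const uint32_t kHeight = 240;
const uint32_t kOldChunkWords = 960;  // size of each of the two old intermediate buffers

// One interrupt per chunk, rounded up to cover a partial last chunk.
uint32_t interruptsPerFrame(uint32_t pixels, uint32_t chunkWords) {
    assert(chunkWords > 0);
    return (pixels + chunkWords - 1) / chunkWords;
}

// Memory the old scheme spent on its two chunk buffers (assuming 16-bit words).
uint32_t oldExtraBufferBytes() {
    return 2 * kOldChunkWords * static_cast<uint32_t>(sizeof(uint16_t));
}
```

So roughly 80 interrupts per frame before versus 1 or 2 now, and under the 16-bit assumption about 3840 bytes of buffer no longer needed.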

But again, if your code is not doing anything else then it might not matter, but ...
 
The only other issue I see, which may not be an issue, is that the Font test program I was trying out with the RubikRegular12 font is not working... But now I remember their font was not correct... So I will discard those changes.

Let me know if you think I should delay the update.
I would say if it works with all the other fonts I would go ahead and release it. The Rubik font format was incorrect. If you do want to try the RubikRegular font, I had updated it in post 280.
 
PR -> Merged....
Created a release version just before this, so one can get back to it if needed.
 
Does anyone have advice on the best way to write to the frame buffer? I'm working on a program that sends frame data over USB. Whenever I try to push more than about 50 FPS to my T4 I get an error where the last scanline of the display sometimes shows what should have been on the first line. It essentially shifts the image over a line. It's somewhat sporadic: sometimes it shifts more than one line, and most of the time it's fine. I'm just wondering if I'm writing to the frame buffer correctly.

Code:
#include "SPI.h"
#include "ILI9341_t3n.h"
#define SPI0_DISP1

ILI9341_t3n tft = ILI9341_t3n(9, 10, 8);

const int width = 320;
const int height = 240;

unsigned volatile char sBuf[width * height * 3] = { 0 };

void setup() {
  tft.useFrameBuffer(true);
  tft.begin();
  tft.setRotation(3);
  Serial.begin(1);
  tft.fillScreen(0);
}

uint16_t *sm[256];

void loop(void) {}

void serialEvent()
{
  switch (Serial.read())
  {
    case 0x05: //request for definition
      Serial.println(width);
      Serial.println(height);
      break;


    case 0x12: //frame data
      Serial.readBytes(sBuf, width * height * 2);
      uint8_t *sData = sBuf;
      
      tft.waitUpdateAsyncComplete(); //wait until draw finished
      
      uint16_t *buffer = tft.getFrameBuffer();
      
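      for (int i = 0; i < width * height; i++) {
        // Read high byte, then low byte, in separate statements so the order
        // is well defined (two sData++ in one expression is not).
        uint16_t hi = *sData++;
        uint16_t lo = *sData++;
        *buffer++ = (uint16_t)(hi << 8 | lo);
      }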

      Serial.write(0x06); //acknowledge
      tft.updateScreenAsync(false);
      break;
  }
}

There aren't any errors if I use updateScreen, but that gets rid of the benefit of being able to read serial data while the screen is still refreshing. Any thoughts?
 
Timing KurtE's dual subFrame example looked to me about 13 ms between half updates - 26 ms for a full screen. If you could split the screen then each alternate half can be updated about 76 FPS - but the whole would be 38 FPS max at current SPI speeds.

That dual subFrame example is :: ...\libraries\ILI9341_t3n\examples\ili9341_t3n_UpdateAsyncCont_Test\ili9341_t3n_UpdateAsyncCont_Test.ino
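
The arithmetic behind those figures, as a tiny sketch; `framesPerSecond` is an illustrative helper, not something from the library, and the 13 ms / 26 ms inputs are the timings quoted above.

```cpp
#include <cassert>

// Converts a measured update period in milliseconds to updates per second.
double framesPerSecond(double periodMs) {
    assert(periodMs > 0.0);
    return 1000.0 / periodMs;
}
```

With 13 ms per half this comes out just under 77 updates per second for one half, and with 26 ms per full screen a bit over 38 FPS.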
 
Alright, I got it working. First of all, I'm running SPI at a cool 122 MHz, which seems to be running fine, but that's as high as I can push it. I've essentially solved my problem by running the display in continuous async mode, reading the serial data into a separate buffer, and then doing a fast memcpy to move the data into the frame buffer. The memcpy only takes about 160 microseconds, so it essentially solves my problem, but it'd work better if the library had an option for true double buffering. That way you could just swap the pointer the DMA references between two buffers once one has been written to, but it certainly would undo the memory improvements from yesterday... anyways here's the code

Code:
#include "SPI.h"
#include "ILI9341_t3n.h"
#define SPI0_DISP1

ILI9341_t3n tft = ILI9341_t3n(9, 10, 8);

const int width = 320;
const int height = 240;

unsigned volatile char sBuf[width * height * 3] = {0};
uint16_t cBuf[width * height] = {0};

uint8_t *sP;
uint16_t *cP;

void setup() {
  tft.useFrameBuffer(true);
  tft.begin(112000000);
  tft.setRotation(3);
  Serial.begin(1);
  tft.updateScreenAsync(true);
}

void loop(void) {}

void serialEvent()
{
  switch (Serial.read())
  {
    case 0x05: //request for matrix definition
      Serial.println(width);
      Serial.println(height);
      break;

    case 0x12: //frame data
      Serial.readBytes(sBuf, width * height * 2);

      sP = sBuf + (width * height * 2);
      cP = cBuf + (width * height);

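      for (int i = 0; i < width * height; i++) {
        // Walk backwards: low byte first, then high byte, in separate
        // statements so the order is well defined.
        uint16_t lo = *--sP;
        uint16_t hi = *--sP;
        *--cP = (uint16_t)(lo | hi << 8);
      }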
        
      memcpy(tft.getFrameBuffer(), cBuf, (width * height * 2));
      
      Serial.write(0x06); //acknowledge
      break;
  }
}
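
For what the "true double buffering" idea above might look like on the application side, here is a minimal plain C++ sketch. `PingPongFrames` is a hypothetical helper and nothing in it touches the library or the DMA; it only shows the bookkeeping of filling one buffer while the other is the one being displayed, then swapping pointers instead of copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ILI9341_t3n: two full frames, one being
// displayed (front) while the other is being filled (back).
class PingPongFrames {
public:
    explicit PingPongFrames(std::size_t pixels)
        : a_(pixels, 0), b_(pixels, 0), front_(&a_), back_(&b_) {
        assert(pixels > 0);
    }
    PingPongFrames(const PingPongFrames&) = delete;             // pointers refer to own members
    PingPongFrames& operator=(const PingPongFrames&) = delete;

    uint16_t* back() { return back_->data(); }                  // write the next frame here
    const uint16_t* front() const { return front_->data(); }    // what a transfer would read

    // Call only at a frame boundary, once the back buffer is fully written.
    void swap() { std::swap(front_, back_); }

private:
    std::vector<uint16_t> a_, b_;
    std::vector<uint16_t>* front_;
    std::vector<uint16_t>* back_;
};
```

The swap is O(1), so it replaces the memcpy entirely, at the cost of a second full frame of RAM.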
 
As mentioned, I may have broken some pseudo double buffering stuff. That is, with the old way you might have gotten away with starting up the async update, then telling it to change frame buffer, doing your updates, waiting for the previous frame to complete, and then using that frame buffer for its updates... Then repeat. But what it actually did was to do your memcpy, one chunk at a time, every time.

There are a few different ways to do this. If what you want is the ability to write to two different buffers, one could hack up the code to add in a second buffer.

For example, one could hack up the DMASetting chain... The new T4 code sets it up like:
Code:
	_dmasettings[0].sourceBuffer(_pfbtft, (COUNT_WORDS_WRITE)*2);
	_dmasettings[0].destination(_pimxrt_spi->TDR);
	_dmasettings[0].TCD->ATTR_DST = 1;
	_dmasettings[0].replaceSettingsOnCompletion(_dmasettings[1]);

	_dmasettings[1].sourceBuffer(&_pfbtft[COUNT_WORDS_WRITE], COUNT_WORDS_WRITE*2);
	_dmasettings[1].destination(_pimxrt_spi->TDR);
	_dmasettings[1].TCD->ATTR_DST = 1;
	_dmasettings[1].replaceSettingsOnCompletion(_dmasettings[2]);
	if (_frame_callback_on_HalfDone) _dmasettings[1].interruptAtHalf();
	else  _dmasettings[1].TCD->CSR &= ~DMA_TCD_CSR_INTHALF;

	_dmasettings[2].sourceBuffer(&_pfbtft[COUNT_WORDS_WRITE*2], COUNT_WORDS_WRITE*2);
	_dmasettings[2].destination(_pimxrt_spi->TDR);
	_dmasettings[2].TCD->ATTR_DST = 1;
	_dmasettings[2].replaceSettingsOnCompletion(_dmasettings[0]);
	_dmasettings[2].interruptAtCompletion();
You could have 3 additional _dmasettings. One could do it as part of the library and/or externally, where you make it look logically like:
Code:
	_dmasettings[0].sourceBuffer(_pfbtft, (COUNT_WORDS_WRITE)*2);
	_dmasettings[0].destination(_pimxrt_spi->TDR);
	_dmasettings[0].TCD->ATTR_DST = 1;
	_dmasettings[0].replaceSettingsOnCompletion(_dmasettings[1]);

	_dmasettings[1].sourceBuffer(&_pfbtft[COUNT_WORDS_WRITE], COUNT_WORDS_WRITE*2);
	_dmasettings[1].destination(_pimxrt_spi->TDR);
	_dmasettings[1].TCD->ATTR_DST = 1;
	_dmasettings[1].replaceSettingsOnCompletion(_dmasettings[2]);
	if (_frame_callback_on_HalfDone) _dmasettings[1].interruptAtHalf();
	else  _dmasettings[1].TCD->CSR &= ~DMA_TCD_CSR_INTHALF;

	_dmasettings[2].sourceBuffer(&_pfbtft[COUNT_WORDS_WRITE*2], COUNT_WORDS_WRITE*2);
	_dmasettings[2].destination(_pimxrt_spi->TDR);
	_dmasettings[2].TCD->ATTR_DST = 1;
	_dmasettings[2].replaceSettingsOnCompletion(_dmasettings[3]);	// changed: was _dmasettings[0]
	_dmasettings[2].interruptAtCompletion();

	_dmasettings[3].sourceBuffer(second_buffer, (COUNT_WORDS_WRITE)*2);
	_dmasettings[3].destination(_pimxrt_spi->TDR);
	_dmasettings[3].TCD->ATTR_DST = 1;
	_dmasettings[3].replaceSettingsOnCompletion(_dmasettings[4]);

	_dmasettings[4].sourceBuffer(&second_buffer[COUNT_WORDS_WRITE], COUNT_WORDS_WRITE*2);
	_dmasettings[4].destination(_pimxrt_spi->TDR);
	_dmasettings[4].TCD->ATTR_DST = 1;
	_dmasettings[4].replaceSettingsOnCompletion(_dmasettings[5]);
	if (_frame_callback_on_HalfDone) _dmasettings[4].interruptAtHalf();
	else  _dmasettings[4].TCD->CSR &= ~DMA_TCD_CSR_INTHALF;

	_dmasettings[5].sourceBuffer(&second_buffer[COUNT_WORDS_WRITE*2], COUNT_WORDS_WRITE*2);
	_dmasettings[5].destination(_pimxrt_spi->TDR);
	_dmasettings[5].TCD->ATTR_DST = 1;
	_dmasettings[5].replaceSettingsOnCompletion(_dmasettings[0]);
	_dmasettings[5].interruptAtCompletion();
So the complete pass through the DMA chain will update the screen twice, once with the original buffer, then with the second buffer...
But other code would probably need to change as well. Maybe not if you don't do any graphic primitives. That is, tft.setFrameBuffer(....) will tell the system that next time through it needs to reinitialize the dmasettings...

But could see a quick and dirty version, that maybe adds a couple of new members, like:
tft.setFrameBuffer2(....), and either tft.switchFrameBuffer() or tft.setActiveFrameBuffer or ??? to tell all of the graphic primitives which buffer to update. Note: probably none of these would actually do anything to synchronize the two buffers. Although one could add another parameter that optionally has it do a memcpy from the previous active buffer to the new active buffer...
 
@KurtE and @defragster
I combined the example asynch updates with the Buddhabrot example. Works really smoothly. This is before the latest changes for the t3.6 and on SPI. Attached is the test sketch.

Forgot to mention, that this is running great!
 
Cool ! Thought it would make a neat little test case for the changes. Still haven't updated to the latest though - on to do list.
 
Can anyone shed light on how to use the frameCount() function?
I'm using this specifically on the optimized HX8357 library.

I'm setting the frame buffer to true in the setup, and then setting and clearing the clip rectangle in the loop (code is based off of the gauge test Kurt put together for me).
At the end of the loop, I'm saving the value of frameCount into a uint32_t variable and printing it out to the serial monitor. But it just prints out 0 after every loop cycle.

I'm setting and clearing the clip rectangle multiple times in each loop (10-12 times for different areas of the display).

Is there a code snippet that demonstrates how to use this function to measure FPS?
 
May depend on how you are using the library. If it is not running in continuous mode, each updateScreenAsync will reset the counter to 0. If you are running it in continuous mode, the DMA operation should process an ISR at the end of each page update and increment the count.
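
As a rough sketch of turning that counter into an FPS number: sample it twice and divide the difference by the elapsed time. `fpsFromCounts` is a hypothetical helper, and this only makes sense in continuous mode, where the count keeps incrementing between the two samples instead of being reset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: frames per second from two samples of a frame counter
// taken elapsedMs apart. Unsigned subtraction keeps working if the counter
// wraps around between samples.
float fpsFromCounts(uint32_t prevCount, uint32_t nowCount, uint32_t elapsedMs) {
    assert(elapsedMs > 0);
    uint32_t frames = nowCount - prevCount;
    return frames * 1000.0f / static_cast<float>(elapsedMs);
}
```

In a sketch the two counts would come from the frameCount() call mentioned above and the elapsed time from millis(), sampled about once a second.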
 
May depend on how you are using the library. If it is not running in continuous mode, each updateScreenAsync will reset the counter to 0. If you are running it in continuous mode, the DMA operation should process an ISR at the end of each page update and increment the count.

I'm not using updateScreenAsync, just updateScreen after setting each clip rectangle - perhaps if I call updateScreen once in a loop it might yield better results?
Here is the entire sketch (minus the CANBUS Rx/Tx content, as that is some of the secret sauce of the project :D)
 
Wondering out loud, from time to time I get PMs or Issues raised on github or ...

About wanting to do double buffering of the screen. With the T4 and especially with T4.1 with external memory, we have enough memory to allocate at least two full buffers for a display.

There are many ways for the code to maybe handle double buffering.

One currently in place (hopefully still working) is for using DMA to do updates, but not in a continuous update mode. That is, we can do something like this pseudo-code:

Set to FB1, draw stuff, updateScreenAsync, set to FB2, draw stuff, wait for the async update to complete, updateScreenAsync... (wait for it to complete) repeat...


We now also have a continuous update operation, where you can set an optional call back function that is called when a frame completes and optionally on the half frame. So currently you can setup to know when the first half has completed and fill in that part of the image, while the 2nd half completes, and when the 2nd half completes and starts up again to draw the first half, you can update the 2nd half.
I have an example simple test sketch that does this.


What is also possible, but not implemented is to extend this 2 half update code, for continuous updates.

That is, it should not be hard to set up the code to have two full buffers, let's say the first buffer used for even pages and the second buffer used for odd pages.

I could then setup to have more DMASettings objects, that are chained to each other. Such that when the DMA completes the output of the first buffer, it simply continues on to the 2nd buffer. We have an ISR that is called when this happens that can do the callback like mentioned and/or can also automatically change which Frame buffer to use for outputting graphic primitives...

Does this make sense? Is it worth it?
 
I would be happy to see something that utilizes the extra RAM on the 4.1, assuming it does significantly speed up writing to the display. The question is how much it will improve performance, and also, where is the possible bottleneck? (Perhaps the limitations of the SPI bus vs QSPI or something alike.)

After migrating my gauge display from the ILI9341 to the HX8357 I've noticed that the smoothness of needle drawing has been impacted by going to a higher resolution display. If allocating a portion of the RAM to a single or dual frame buffer will help us smooth and speed up writing of frames, then the question of whether it should be done is a no-brainer. The real question, as stated above, is how much of an improvement we will see visually and performance-wise.
 
In itself it would have probably 0% difference in possible speed.

That is I am mainly talking about changing how continuous DMA could work, where the number of frames per second is limited by how fast you can write a full page of pixels to the screen over SPI.

So larger display with double the pixels running at the same SPI speed will take twice as long to update the display.

But the changes mentioned could make it easier for those who are doing full image updates and it is not as easy to update the frame buffer a half frame at a time, which one can do now, without worrying about partial flicker, ...

As for Gauge drawing with flicker and the like. I added code to the ili9341_t3n to help speed things up. In particular the code with each graphic primitive in frame buffer mode, would keep a bounding rectangle of what parts of the image changed since the last updateScreen was called and when you call updateScreen it would only update that portion of the display, which in cases of a needle changing a little bit was only a small fraction of the full pixels of the display.
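
For anyone wanting to port that idea, the bookkeeping is roughly the following. This is a sketch of the technique only, with made-up names (`DirtyRect`, `add`, `pixelCount`), not the code that is in ili9341_t3n: every primitive grows a bounding rectangle, and the update sends just that rectangle and then clears it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of bounding-rectangle tracking; names are illustrative.
struct DirtyRect {
    int16_t x0 = INT16_MAX, y0 = INT16_MAX;  // inclusive top-left
    int16_t x1 = INT16_MIN, y1 = INT16_MIN;  // inclusive bottom-right

    bool empty() const { return x1 < x0 || y1 < y0; }

    // Grow the rectangle to cover a primitive that touched (x, y, w, h).
    void add(int16_t x, int16_t y, int16_t w, int16_t h) {
        assert(w > 0 && h > 0);
        x0 = std::min<int16_t>(x0, x);
        y0 = std::min<int16_t>(y0, y);
        x1 = std::max<int16_t>(x1, static_cast<int16_t>(x + w - 1));
        y1 = std::max<int16_t>(y1, static_cast<int16_t>(y + h - 1));
    }

    // Number of pixels an update of just this rectangle would have to send.
    uint32_t pixelCount() const {
        if (empty()) return 0;
        return static_cast<uint32_t>(x1 - x0 + 1) * static_cast<uint32_t>(y1 - y0 + 1);
    }

    void clear() { *this = DirtyRect(); }  // call after the screen update
};
```

For example, two small needle-sized draws that merge into a 25x50 region cost 1250 pixels per update instead of the 76800 of a full 320x240 frame.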

I do not believe that we ever updated the HX8357 or ILI9488 or ... to have this same code. So again sort of double issue with going to the larger display. Maybe at some point we should try to add that code in.
 
In itself it would have probably 0% difference in possible speed.
....
But the changes mentioned could make it easier for those who are doing full image updates and it is not as easy to update the frame buffer a half frame at a time, which one can do now, without worrying about partial flicker, ...

If it helps apply draw functions in an easier way than we have now, where we have to use setClipRect on each portion of the screen we want to update, then it will definitely make the implementation process a whole lot quicker and less error-prone.

As for Gauge drawing with flicker and the like. I added code to the ili9341_t3n to help speed things up. In particular the code with each graphic primitive in frame buffer mode, would keep a bounding rectangle of what parts of the image changed since the last updateScreen was called and when you call updateScreen it would only update that portion of the display, which in cases of a needle changing a little bit was only a small fraction of the full pixels of the display.

I do not believe that we ever updated the HX8357 or ILI9488 or ... to have this same code. So again sort of double issue with going to the larger display. Maybe at some point we should try to add that code in.

If you are referring to setClipRect that you had implemented on the last test sketch you put together for the ILI, I moved the whole thing over to the HX display/library and it works well at the same size (240x240 gauge), but the needle seems to chop a bit if it moves too fast on a 320x320 gauge.
If you're referring to an additional sub function within setClipRect implemented specifically in the ILI9341 library, then I'd be happy to try and port that over to the HX if it isn't too complicated.
 