Highly optimized ILI9341 (320x240 TFT color display) library

KurtE, not using transparent text. Each frame I was updating 7 different numeric values at the bottom of the display in yellow, and I'm clearing them between draws by filling a rectangle with black. It seemed to me that would actually be faster than re-displaying the same text in black to clear the old values. I assume clearing the entire area would be faster than re-rendering the text values in black.
 
During previous testing last year I played around with both. For my particular project it ended up faster to blank only the text instead of the rectangles. My main reason for not doing large blanks was that if you blank text and don't change it, it causes a flashing effect.
 
You might want to take a look at the implementation of the various drawing routines. The actual drawing is all done at the individual pixel level. Roughly speaking, more pixels means more time. There's a little bit of saving at the protocol level when drawing multiple pixels vs drawing individual pixels, but computation on the Teensy isn't the bottleneck.

Probably the fastest way to update a text field is to draw the diff between the old glyphs and the new glyphs. Earlier in this thread someone posted code to do just that. I'm sorry I don't remember who.
 
That makes a lot of sense, if that is available. I'll try checking values and not redrawing if they haven't changed, and if they have, redraw in black then display new value. Part of what I've been displaying is the time it takes for the various parts of the display, in microseconds, and a padding value that spins wheels to keep it a constant overall time. That also provides dwell time so the display has more contrast. I can easily time the text redraw vs clearing a rectangle and see which is faster.

If you update it as fast as possible, you get the previously mentioned flicker. For the waveform display I redraw the old vectors in black, then redraw the grid, then draw the new waveform and numeric values, then spin for enough microseconds to pad it to 40,000. Thinking about it, if you update as fast as possible, and clearing takes about the same time as the display, you end up with 50% duty cycle, which explains the dim display.
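The "check the value and skip the redraw if it hasn't changed" idea can be sketched roughly as follows. This is only an illustration: `TrackedField`, `needsRedraw` and `markDrawn` are made-up names, not part of ILI9341_t3, and the actual erase and print calls are left to the caller.

```cpp
#include <cstdint>

// Hypothetical helper: remembers the value currently shown in one text field.
struct TrackedField {
    int32_t shown = 0;      // value that is on the screen right now
    bool    valid = false;  // false until the field has been drawn once

    // True when the field has never been drawn or the value has changed.
    bool needsRedraw(int32_t value) const {
        return !valid || value != shown;
    }

    // Call after the new value has actually been drawn.
    void markDrawn(int32_t value) {
        shown = value;
        valid = true;
    }
};
```

In a frame loop the caller would test `needsRedraw(v)`, erase the old text (using `shown`) and print the new one only when it returns true, then call `markDrawn(v)`. Unchanged fields are never touched, which avoids the flashing described above.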
 
... Probably the fastest way to update a text field is to draw the diff between the old glyphs and the new glyphs. Earlier in this thread someone posted code to do just that. I'm sorry I don't remember who.

That is what I was getting at in my post a couple back. I once posted a sample that tracks the old value and, only when it has changed, rewrites the old text in the background color and then writes the new text/value.

Here is one version; I think it assumes the new vs. old comparison was done before calling.
 
That's not the function I was referring to. Since it calls print twice, it typically draws more than strictly necessary. To reduce flicker to an absolute minimum and draw as fast as possible, for each update, each pixel that needs to change should be written just once.

For a moment, let's assume a glyph is drawn by copying a simple 2D black and white array with white as the background. To change a 3 to an 8, our updateGlyph() function scans all the pixels in the bounding box of the 3. Any pixel that's white in both or black in both the 3 and the 8 images is skipped (no need for change). Any pixel that's different between the two is either drawn in the foreground or background color as needed.

The actual data structure for drawing characters is more complicated, but the principle is the same.
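A minimal sketch of the diff idea, under the same simplifying assumption of a 1-bit-per-pixel glyph. The glyph layout (one byte per row, bit 7 leftmost) and the names `PixelChange` and `diffGlyphs` are assumptions for illustration; the library's real font format is different.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pixel that has to change, and whether it becomes foreground.
struct PixelChange {
    int  x;
    int  y;
    bool foreground;
};

// Hypothetical glyph format: one byte per row, bit 7 is the leftmost pixel,
// so glyphs are at most 8 pixels wide. Rows beyond the shorter glyph are ignored.
std::vector<PixelChange> diffGlyphs(const std::vector<uint8_t>& oldGlyph,
                                    const std::vector<uint8_t>& newGlyph,
                                    int width) {
    std::vector<PixelChange> changes;
    for (std::size_t row = 0; row < oldGlyph.size() && row < newGlyph.size(); ++row) {
        // XOR leaves a 1 only where the old and new glyph disagree.
        const uint8_t changed = static_cast<uint8_t>(oldGlyph[row] ^ newGlyph[row]);
        for (int col = 0; col < width && col < 8; ++col) {
            const uint8_t mask = static_cast<uint8_t>(0x80u >> col);
            if (changed & mask) {
                changes.push_back({col, static_cast<int>(row),
                                   (newGlyph[row] & mask) != 0});
            }
        }
    }
    return changes;
}
```

Each entry in the returned list is a pixel to write exactly once, in the foreground or background color. Pixels that are the same in both glyphs never appear, so they are never sent to the display.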

Maybe this post was the one I was thinking of.
 
KurtE, not using transparent text. Each frame I was updating 7 different numeric values at the bottom of the display in yellow, and I'm clearing them between draws by filling a rectangle with black. It seemed to me that would actually be faster than re-displaying the same text in black to clear the old values. I assume clearing the entire area would be faster than re-rendering the text values in black.
Actually this is probably the slowest way.

If you look at the code, it says drawing non-transparent text may be 5 times faster, and I don't think he was counting the fill rect to begin with.

I will try to explain why I believe non-transparent is so much faster. Again, everything comes down to how much data goes out the SPI pins.

Current approach: you do a fill rect, which outputs about 10 bytes of addressing, plus a command byte, plus 2 bytes for every pixel.

Then you draw the characters. In transparent mode, the code finds each section it needs to write to the screen (bits that are on in the font, counting how many are consecutive in the right direction) and then, depending on whether it is scaling, draws a line or a rectangle. For each of these segments it has to output all of the preamble (again maybe 11 bytes) plus the two bytes for each pixel touched. These rectangle/line preambles really add up.

Compare that with drawing text with both FG and BG colors set (not transparent). In this mode it does more or less the same thing as one fill rect. The only difference is that instead of a constant value going out, it outputs either the FG or BG color depending on whether the corresponding bit is on or off in the font.
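To put rough numbers on this, here is a small byte-count model. It is only a sketch: the 11-byte preamble is the estimate given above, and the function names and the example glyph figures are made up for illustration, not measured from the library.

```cpp
#include <cstdint>

constexpr uint32_t kPreambleBytes = 11;  // ~10 address bytes + 1 command byte (estimate)
constexpr uint32_t kBytesPerPixel = 2;   // 16-bit color per pixel

// One address window followed by a run of pixels.
constexpr uint32_t windowBytes(uint32_t pixels) {
    return kPreambleBytes + kBytesPerPixel * pixels;
}

// fillRect over the character cell, then transparent text:
// one window for the clear, plus one preamble per run of "on" pixels.
constexpr uint32_t clearThenTransparentBytes(uint32_t cellPixels,
                                             uint32_t onPixels,
                                             uint32_t runs) {
    return windowBytes(cellPixels) + runs * kPreambleBytes + kBytesPerPixel * onPixels;
}

// Opaque text: a single window over the cell, FG or BG sent for every pixel.
constexpr uint32_t opaqueBytes(uint32_t cellPixels) {
    return windowBytes(cellPixels);
}
```

For a hypothetical 6x8 cell (48 pixels) whose glyph has 20 lit pixels in 10 runs, the model gives 257 bytes for clear plus transparent text against 107 bytes for opaque text. The gap comes almost entirely from the per-run preambles.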

The main downside of skipping the clear before drawing text shows up when the string lengths differ. You need to add code to blank out the now-empty spaces on the left or right, depending on whether the text is left or right justified. If it is just one or two characters, it is probably simplest to output ' ' characters. If it is potentially several characters, you can instead calculate the pixels to be cleared and do one fill rect to clear those.
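A minimal sketch of the "output ' ' characters" option: pad every value to a fixed field width before printing it in opaque mode, so leftover characters from a longer previous value are overwritten. `padToWidth` is a hypothetical helper, and `std::string` is used only to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Pads text with spaces to at least `width` characters.
// rightJustify = true puts the padding on the left.
std::string padToWidth(const std::string& text, std::size_t width, bool rightJustify) {
    if (text.size() >= width) {
        return text;  // already wide enough, nothing to blank
    }
    const std::string pad(width - text.size(), ' ');
    return rightJustify ? pad + text : text + pad;
}
```

With a fixed-width font and FG/BG colors both set, printing `padToWidth(value, 5, true)` at the same position each frame should replace the whole field in one pass.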

Hope that makes sense.

Kurt
 
I've never really delved into reading datasheets to know what bits to flip for registers. The datasheet suggests a framerate of 119 FPS can be set.

I think I did it, but without building a second arduino project to independently verify the refresh rate of the screen, is it really as simple as changing "3, ILI9341_FRMCTR1, 0x00, 0x18," to something like
"3, ILI9341_FRMCTR3,0x10, 0x10" on line 391 of ILI9341_t3.cpp? Also is there a cleaner way for me to be doing this?

I'm using it for an oscilloscope view on a synth running on a 3.6. I'm drawing the scope line in white, then each frame drawing over just the last scope line's pixels in black, drawing each frame averages 5ms with the synth code running heavily. At 119 FPS each frame is 8.4ms so I should be good. It looks fantastic in person. My synth is a polyphonic procedural graintable type, and my scope triggers off the loudest voice's wavetable buffer index, so it's rock solid even as the graintable morphs while multi-note chords are played.

Also, does the SPI max out at 30 MHz, the default setting, or can I get something higher to work with a Teensy 3.6? I've tried increasing the SPICLOCK define but my benchmark numbers are unchanged. It has me wanting the parallel interface model.
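On the frame-rate question, the arithmetic can be sanity-checked with a small sketch. It assumes, from my reading of the ILI9341 datasheet, that the normal-mode frame rate is fosc / (clocks per line x division ratio x (lines + VBP + VFP)) with fosc about 615 kHz, 320 lines and 2-line porches, and that the two FRMCTR1 parameters are DIVA followed by RTNA. Treat the constants and the name `frameRateHz` as assumptions and verify them against the datasheet.

```cpp
#include <cstdint>

// Assumed figures, to be checked against the ILI9341 datasheet.
constexpr double kOscHz = 615000.0;  // internal oscillator frequency
constexpr int    kLines = 320;       // driven lines
constexpr int    kVbp   = 2;         // vertical back porch
constexpr int    kVfp   = 2;         // vertical front porch

// diva: division ratio field, 0..3 meaning divide by 1, 2, 4, 8.
// rtna: clocks per line, 0x10..0x1F.
double frameRateHz(uint8_t diva, uint8_t rtna) {
    const double divider = static_cast<double>(1u << (diva & 0x03));
    return kOscHz / (divider * rtna * (kLines + kVbp + kVfp));
}
```

Under these assumptions the library's existing `0x00, 0x18` works out to roughly 79 Hz, and `0x00, 0x10` to roughly 119 Hz, which is a frame period of about 8.4 ms.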
 
I recently tried to get graphictest going on a Teensy 3.6, starting with the CPU clock at 180 MHz, and found that it did not work. I've been working my way through this long thread, but one thing that seems clear from the code is that the start of SPIFIFO.h has this section:
Code:
#ifdef KINETISK

#if F_BUS == 120000000
#define HAS_SPIFIFO
#define SPI_CLOCK_24MHz   (SPI_CTAR_PBR(3) | SPI_CTAR_BR(0) | SPI_CTAR_DBR) //(120 / 5) * ((1+1)/2)
#define SPI_CLOCK_16MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(2))                //(120 / 2) * ((1+0)/4) = 15 MHz
#define SPI_CLOCK_12MHz   (SPI_CTAR_PBR(3) | SPI_CTAR_BR(0))                //(120 / 5) * ((1+0)/2)
#define SPI_CLOCK_8MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(4) | SPI_CTAR_DBR) //(120 / 5) * ((1+1)/6)
#define SPI_CLOCK_6MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(2))                //(120 / 5) * ((1+0)/4)
#define SPI_CLOCK_4MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(4)) 		    //(120 / 5) * ((1+0)/6)

#elif F_BUS == 108000000
#define HAS_SPIFIFO
#define SPI_CLOCK_24MHz   (SPI_CTAR_PBR(3) | SPI_CTAR_BR(0) | SPI_CTAR_DBR) //(108 / 5) * ((1+1)/2) = 21.6 MHz
#define SPI_CLOCK_16MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(2))                //(108 / 2) * ((1+0)/4) = 13.5 MHz
#define SPI_CLOCK_12MHz   (SPI_CTAR_PBR(1) | SPI_CTAR_BR(4) | SPI_CTAR_DBR) //(108 / 3) * ((1+1)/6)
#define SPI_CLOCK_8MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(4) | SPI_CTAR_DBR) //(108 / 5) * ((1+1)/6) = 7.2 MHz
#define SPI_CLOCK_6MHz    (SPI_CTAR_PBR(1) | SPI_CTAR_BR(4))                //(108 / 3) * ((1+0)/6) 
#define SPI_CLOCK_4MHz    (SPI_CTAR_PBR(5) | SPI_CTAR_BR(2)) 		    //(108 / 7) * ((1+0)/4) = 3.86 MHz

#elif F_BUS == 96000000
#define HAS_SPIFIFO
#define SPI_CLOCK_24MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(0))                //(96 / 2) * ((1+0)/2)
#define SPI_CLOCK_16MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(4) | SPI_CTAR_DBR) //(96 / 2) * ((1+1)/6)
#define SPI_CLOCK_12MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(6) | SPI_CTAR_DBR) //(96 / 2) * ((1+1)/8)
#define SPI_CLOCK_8MHz    (SPI_CTAR_PBR(1) | SPI_CTAR_BR(2))                //(96 / 3) * ((1+0)/4)
#define SPI_CLOCK_6MHz    (SPI_CTAR_PBR(0) | SPI_CTAR_BR(6))                //(96 / 2) * ((1+0)/8)
#define SPI_CLOCK_4MHz    (SPI_CTAR_PBR(1) | SPI_CTAR_BR(6)) 		    //(96 / 3) * ((1+0)/8)

It seems this needs an additional #elif for 180 to work?
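The comments in that excerpt all follow the same formula, (F_BUS / prescaler) * ((1 + DBR) / scaler). A small sketch makes it easy to check a candidate setting for another bus frequency. `spiClockHz` is a hypothetical helper that takes the divisor values written in the comments, not the raw PBR/BR register field codes, since the field-to-divisor mapping is not shown in the excerpt.

```cpp
#include <cstdint>

// Resulting SPI clock in Hz for a given bus clock and divisor pair.
// prescaler and scaler are the actual divisors (e.g. 5 and 2),
// doubleRate corresponds to the DBR bit.
constexpr uint32_t spiClockHz(uint32_t fBusHz, uint32_t prescaler,
                              uint32_t scaler, bool doubleRate) {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(fBusHz) * (doubleRate ? 2u : 1u) /
        (static_cast<uint64_t>(prescaler) * scaler));
}
```

For example, `spiClockHz(120000000, 5, 2, true)` gives 24000000, matching the SPI_CLOCK_24MHz comment, and `spiClockHz(120000000, 2, 4, false)` gives 15000000, matching the "= 15 MHz" note. The first argument is the bus clock, not the CPU clock.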
 
Another troubling thing I see (and this may be taken care of inside SPI, I'm not sure) is that in ILI9341_t3.cpp, most calls to SPI.beginTransaction look like this:

SPI.beginTransaction(SPISettings(SPICLOCK, MSBFIRST, SPI_MODE0));

but a few are:

SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));

Now if you assume SPICLOCK is 3000000, that implies one thing, but at points in this thread people said they'd gotten better results by reducing SPICLOCK 3-4 fold; in that case the second form implies a faster bus.
 
SPIFIFO.h: That's F_BUS not F_CPU.
F_BUS is the frequency of the BUS

Reducing the clock only helps if your wiring is bad.
I have a version where I use 72 MHz (SPI!). Works great for me.
 
The places where you are seeing 2000000 are probably functions like readPixel. This was done because the display may have issues reading data back when the clock is running too fast, so for reads the code chooses a slower clock rate.
 
I suspected the 2000000 came from that sort of thinking. It seems like it would be better practice to use a symbol, of course.

So far, for me, the display on the Teensy 3.6 uploads and runs graphictest with CPU speed 120 but not 180 or 144. Unfortunately, as a separate (Mac/Arduino?) issue, I'm getting an error about the name for the serial window, so my ability to see anything at 180 is close to zero (OK, I can light LEDs). It looks in Kinetis.h like the bus speed is actually the same at CPU 180 as at 120 MHz. The code runs at 120 MHz when uploaded but not from flash when I unplug and reapply power. I added a delay(2500) at the start of setup(), no joy. The line

ILI9341_t3 tft = ILI9341_t3(TFT_CS, TFT_DC, TFT_RST);

of course runs before setup(), but I don't think that sends anything to the ILI9341. The longest wire in my breadboard hookup is about 4 inches. Admittedly the longer T3.6 board means wires will be a little longer than with a T3.2. Obviously things are lightning fast with this library and the incredible horsepower of the T3.6. I'm just puzzled at the issues I see. Is this just my wiring?
 
Not sure, I am running Graphic Test on mine right now at 180 MHz. Actually, at this moment I am running my own hacked-up version of the library (different thread).

I do know that some of these displays don't work well if you don't use the reset pin. So I would probably make sure yours is hooked up.

As for the Mac and the serial terminal: what version of Teensyduino are you running? In particular, have you upgraded to Beta2? If not, you should. Why? There is an issue with the processor above 120 MHz where certain things, like getting the serial number and writing to EEPROM, fail, and as such you get a bogus serial number, which on the Mac can give you random connection...

Anyway Beta2 has fixes for this!
 
Yes, and as said above, the F_CPU does not matter much. More important is the speed of the SPI.
All my displays work with SPI faster than 30 MHz.

Use a good, short connection.
 
I am on Arduino 1.6.12 and the Beta2 Teensyduino, and I do have the reset line hooked to pin 7.

It is good to hear you can talk to the display at 180.

With the CPU at 96 MHz I did manage to get it to reboot from flash.
 
While playing around with my SPIN version of the library, Defragster pointed out that there is an issue with the graphictest program: it would end up calling things like fillRect(-1, 39, 240, 240, color1).

It is bad enough that in this library we pass x=-1 to the display over SPI, and I am not exactly sure what the display would do with it. But in my version, where I am playing around with a logical frame buffer, this could cause me to access bogus memory.

So I fixed the test and added additional checks to some of the graphic primitives. I then backported these to here.

Pull Request 33
 
Where is this bug in graphicstest, Kurt?
 
Hi Frank, there were I believe 3 places; Defragster pointed out two of them. Details are in: https://forum.pjrc.com/threads/3875...-(ILI9341_t3n)?p=121508&viewfull=1#post121508

Example 1: if you look at this function:
Code:
unsigned long testFilledRects(uint16_t color1, uint16_t color2) {
  unsigned long start, t = 0;
  int           n, i, i2,
                cx = tft.width()  / 2 - 1,
                cy = tft.height() / 2 - 1;

  tft.fillScreen(ILI9341_BLACK);
  n = min(tft.width(), tft.height());
  for(i=n; i>0; i-=6) {
    i2    = i / 2;
    start = micros();
    tft.fillRect(cx-i2, cy-i2, i, i, color1);
    t    += micros() - start;
    // Outlines are not included in timing results
    tft.drawRect(cx-i2, cy-i2, i, i, color2);
  }

  return t;
}
Quick sanity test: assume width=240, height=320.
Then cx=119, cy=159, n=240, i=240, i2=120, so the first call is fillRect(-1, 39, 240, 240, color1). The problem is the -1; note that drawRect will be called with the same values.

What I don't understand is why testFilledRects has cx and cy with the -1, but testRects does not. An alternative fix would be to not decrement cx and cy...
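For illustration, this is the kind of bounds check a drawing primitive could apply before sending anything to the display or touching a frame buffer. It is a sketch only, not the code from the pull request: `Rect` and `clipToScreen` are hypothetical names.

```cpp
// Hypothetical rectangle in screen coordinates.
struct Rect {
    int x;
    int y;
    int w;
    int h;
};

// Clips r to the area [0, screenW) x [0, screenH).
// Returns false when nothing visible is left to draw.
bool clipToScreen(Rect& r, int screenW, int screenH) {
    if (r.w <= 0 || r.h <= 0 || r.x >= screenW || r.y >= screenH) {
        return false;
    }
    if (r.x < 0) { r.w += r.x; r.x = 0; }  // trim the part left of the screen
    if (r.y < 0) { r.h += r.y; r.y = 0; }  // trim the part above the screen
    if (r.w <= 0 || r.h <= 0) {
        return false;
    }
    if (r.x + r.w > screenW) { r.w = screenW - r.x; }
    if (r.y + r.h > screenH) { r.h = screenH - r.y; }
    return true;
}
```

Applied to the call above, the rectangle (-1, 39, 240, 240) on a 240x320 screen becomes (0, 39, 239, 240), so no negative coordinate is ever used as an address.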
 
Frank: Sounds good, looks like we have more work to do as Paul merged in a lot of new functionality into the main ILI9341_t3 library. So I will look to merge the same stuff into my version...

Paul: I am getting a compiler warning with the new code, in the new fillRectHGradient function, in this section:
Code:
		uint16_t color;
		for(x=w; x>1; x--) {
			color = RGB14tocolor565(r,g,b);
			writedata16_cont(color);
			r+=dr;g+=dg; b+=db;
		}
		writedata16_last(color);
It says on the writedata16_last(color) line that color may be uninitialized. My guess is that it will have a value, but it will duplicate the previous pixel... I think the line color = RGB14tocolor565(r,g,b) needs to be duplicated before it... I can do a PR if desired.
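The suggested fix can be tried off-target with a small stand-alone sketch. Everything here is a stand-in: `toColor565` is an illustrative packing, not the library's RGB14tocolor565, and the SPI writes are replaced by appending to a vector so the output can be inspected.

```cpp
#include <cstdint>
#include <vector>

// Illustrative 8-bit-per-channel to RGB565 packing (not the library function).
static uint16_t toColor565(int r, int g, int b) {
    return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | ((b & 0xFF) >> 3));
}

// Produces one gradient row of w pixels, recomputing the color for the last pixel.
std::vector<uint16_t> gradientRow(int w, int r, int g, int b, int dr, int dg, int db) {
    std::vector<uint16_t> out;
    for (int x = w; x > 1; x--) {
        out.push_back(toColor565(r, g, b));  // stands in for writedata16_cont(color)
        r += dr; g += dg; b += db;
    }
    if (w > 0) {
        // Recompute here so the last pixel is defined even when the loop
        // never ran (w == 1) and does not repeat the previous pixel.
        out.push_back(toColor565(r, g, b));  // stands in for writedata16_last(color)
    }
    return out;
}
```

With the recomputation in place, a row of width 1 gets a defined color and the last pixel of a wider row continues the gradient instead of duplicating its neighbor.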
 
Changes merged in.

Just issued new PR to add all of the new method names to keywords.txt
 