Highly optimized ILI9341 (320x240 TFT color display) library


Thanks to KurtE for adding the setscrollmargins function, this works completely as planned in our TeensyBat project. I am currently looking for an example that uses this library with a framebuffer located in PSRAM.

Looking through the examples I haven't seen one (maybe I missed it), but I saw the library now has a function:
void	setFrameBuffer(uint16_t *frame_buffer);
Would a standard PSRAM on the T4.1 be fast enough for this purpose? If nobody has tested this I might dive into that and see what comes up.


KurtE will likely reply within hours, but it is reasonable to expect this to work.

AFAIK, larger displays that need a buffer which fits only in PSRAM have been tested and are designed to work.

I seem to recall testing or a sample for that, though it was too long ago to expect to find it in context; it may have been on the MMod or another board.

It at least seems to work from code without any problems. Since I scroll in 1-pixel lines in my code, I will have to find a good way to do that in memory. The recent addition of setscrollmargins has not been tested much yet (as KurtE mentions on GitHub), and scrolling in memory is another "affair". But I am sure there is a good way to achieve this.

My app includes a filled rect which must move left and right one pixel at a time. The simplistic way to move it is to call fillRect with background color and call it again with new coordinates and foreground color. That is slow and causes flicker. A better way is to redraw only the pixels which must change. Call drawFastVLine for the leading edge in foreground color and call it again for the trailing edge in background color. It's efficient and prevents flicker. I call it differential animation.
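The idea can be checked on a desktop with a small in-memory canvas. This is only a sketch: `Canvas` and `shiftRectRight` are hypothetical names, and the two drawing methods merely imitate the library calls of the same name while counting the pixels they write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the display: a small canvas that counts pixel writes.
struct Canvas {
    int w, h;
    std::vector<uint16_t> px;
    long writes = 0;  // number of pixels written so far

    Canvas(int w_, int h_, uint16_t bg)
        : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, bg) {}

    // Vertical line of len pixels starting at (x, y), clipped to the canvas.
    void drawFastVLine(int x, int y, int len, uint16_t c) {
        for (int j = y; j < y + len; j++) {
            if (x >= 0 && x < w && j >= 0 && j < h) {
                px[static_cast<std::size_t>(j) * w + x] = c;
                writes++;
            }
        }
    }

    // Filled rectangle built from vertical lines.
    void fillRect(int x, int y, int rw, int rh, uint16_t c) {
        for (int i = x; i < x + rw; i++) drawFastVLine(i, y, rh, c);
    }
};

// Move a filled rectangle one pixel to the right by touching only two columns.
void shiftRectRight(Canvas& c, int x, int y, int rw, int rh,
                    uint16_t fg, uint16_t bg) {
    c.drawFastVLine(x + rw, y, rh, fg);  // new leading edge
    c.drawFastVLine(x, y, rh, bg);       // old trailing edge
}
```

For a 20x10 rectangle, `shiftRectRight` writes 20 pixels, while erasing and redrawing with two `fillRect` calls writes 400, and both leave the canvas in the same state.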

I also have a filled circle to move, but that's more complex. You might think to call drawCircleHelper for a semicircle at the trailing edge in background color and call it again for a shifted semicircle in foreground color. But that doesn't quite work because the trailing semicircle removes too many edge pixels in the upper and lower octants. One fix is to draw a shifted full circle in foreground color to restore the edges. That looks good but it really draws about twice as many pixels as needed. An optimal solution is to draw leading and trailing semicircles, but replace the horizontal line segments with only endpoints.

Here's the code, adapted from drawCircleHelper. I also pulled the first segments out of the loop and combined them to cover the left and right edges completely. I hope someone will find it useful.
// Shift a filled circle left or right by 1 pixel. Draws only leading and trailing edge pixels.
// Adapted from ILI9341_t3::drawCircleHelper
// lcol is drawn along the left edge, rcol one pixel to the right of the right edge.
void shiftCircle( int16_t x0, int16_t y0, int16_t r, uint16_t lcol, uint16_t rcol) {
  int16_t f = 1 - r;
  int16_t ddFx = 1;
  int16_t ddFy = -2 * r;
  int16_t x = 0;
  int16_t y = r;
  int xold;

  xold = x;
  while (f<0) {
    x++; // step x as in drawCircleHelper
    ddFx += 2;
    f += ddFx;
  } // draw first line segments
  tft.drawFastVLine(x0+y+1, y0-x, x-xold+x-xold+1, rcol);
  tft.drawPixel(x0+x+1, y0+y, rcol);
  tft.drawPixel(x0+x+1, y0-y, rcol);
  tft.drawPixel(x0-x, y0+y, lcol);
  tft.drawPixel(x0-x, y0-y, lcol);
  tft.drawFastVLine(x0-y, y0-x, x-xold+x-xold+1, lcol);
  xold = x;
  while (x<y) {
    if (f >= 0) {
      y--; // step y as in drawCircleHelper
      ddFy += 2;
      f += ddFy;
    }
    x++;
    ddFx += 2;
    f += ddFx;
    if (f >= 0 || x == y) { // time to draw the next line segments
      tft.drawPixel(x0+x+1, y0+y, rcol);
      tft.drawFastVLine(x0+y+1, y0+xold+1, x-xold, rcol);
      tft.drawPixel(x0+x+1, y0-y, rcol);
      tft.drawFastVLine(x0+y+1, y0-x, x-xold, rcol);
      tft.drawFastVLine(x0-y, y0+xold+1, x-xold, lcol);
      tft.drawPixel(x0-x, y0+y, lcol);
      tft.drawFastVLine(x0-y, y0-x, x-xold, lcol);
      tft.drawPixel(x0-x, y0-y, lcol);
      xold = x;
    }
  }
}

Is this the right thread to talk about KurtE's ILI9341_t3n?

I'll make it very short ;o)
There is a "driver" optimized for the Teensy 4.x that does nothing more than get a framebuffer onto the display as fast as it can, the ILI9341_T4.
There is an example program which is smoothly impressive at 30MHz SPI, "99 Luftballons".

Here the test: https://youtu.be/0VGeh5ThRIw
*It even goes much faster!

I have now ported the program for the ILI9341_t3n and also used 30MHz SPI clock, here the result: https://youtu.be/0e9oGl-4Ht0

What am I doing wrong?
I would like to reach this speed with the ILI9341_t3n.

Here the used test program for ILI9341_t3n:

#include "SPI.h"
#include <ILI9341_t3n.h>

// set the pins: here for SPI0 on Teensy 4.0
// ***  Recall that DC must be on a valid cs pin !!! ***
#define PIN_SCK         13  // (needed) SCK pin for SPI0 on Teensy 4.0
#define PIN_MISO        12  // (needed) MISO pin for SPI0 on Teensy 4.0
#define PIN_MOSI        11  // (needed) MOSI pin for SPI0 on Teensy 4.0
#define PIN_DC          10  // (needed) CS pin for SPI0 on Teensy 4.0
#define PIN_RESET        6  // (needed) any pin can be used 
#define PIN_CS           9  // (needed) any pin can be used
#define PIN_BACKLIGHT    5  // only required if LED pin from screen is connected to Teensy 
#define PIN_TOUCH_IRQ  255  // 255 if touch not connected
#define PIN_TOUCH_CS   255  // 255 if touch not connected

// drawing size in portrait mode
#define LX  240
#define LY  320

/** fill a framebuffer with a given color*/
void clear(uint16_t* fb, uint16_t color = 0) {
    for (int i = 0; i < LX * LY; i++) fb[i] = color;
}

/** draw a disk centered at (x,y) with radius r and color col on the framebuffer fb */
void drawDisk(uint16_t* fb, double x, double y, double r, uint16_t col) {
    int xmin = (int)(x - r);
    int xmax = (int)(x + r);
    int ymin = (int)(y - r);
    int ymax = (int)(y + r);
    if (xmin < 0) xmin = 0;
    if (xmax >= LX) xmax = LX - 1;
    if (ymin < 0) ymin = 0;
    if (ymax >= LY) ymax = LY - 1;
    const double r2 = r * r;
    for (int j = ymin; j <= ymax; j++) {
        double dy2 = (y - j) * (y - j);
        for (int i = xmin; i <= xmax; i++) {
            const double dx2 = (x - i) * (x - i);
            if (dx2 + dy2 <= r2) fb[i + (j * LX)] = col;
        }
    }
}

/** return a uniform in [0,1) */
double unif() {
    return random(2147483647) / 2147483647.0;
}

/** a bouncing ball */
struct Ball {
    double x, y, dirx, diry, r; // position, direction, radius. 
    uint16_t color;

    Ball() {
        r = unif() * 25; // random radius
        x = r; // start at the corner
        y = r; //
        dirx = unif() * 5; // direction and speed are random...
        diry = unif() * 5; // ...but not isotropic !
        color = random(65536); // random color
    }

    void move() {
        // move
        x += dirx;
        y += diry;
        // and bounce against border
        if (x - r < 0) { x = r;  dirx = -dirx; }
        if (y - r < 0) { y = r;  diry = -diry; }
        if (x > LX - r) { x = LX - r;  dirx = -dirx; }
        if (y > LY - r) { y = LY - r;  diry = -diry; }
    }

    void draw(uint16_t* fb) {
        drawDisk(fb, x, y, r, color);
    }
};

// 99 luftballons
Ball balls[99];

// Instantiate display object.

// Framebuffer
DMAMEM uint16_t fb[LX * LY];

void setup() {

    // make sure backlight is on
    if (PIN_BACKLIGHT != 255) {
        pinMode(PIN_BACKLIGHT, OUTPUT);
        digitalWrite(PIN_BACKLIGHT, HIGH);


void loop() {

    // move and then draw all the balls onto the framebuffer
    for (auto& b : balls) {

Tried your sketch on a T4 and a Teensy MicroMod and it worked with begin(30000000). I did use my Arducam config: CS = 10, DC = 9 and RST = 8. Still have to try on a T4.1.
Nice, then you can now also try the original demo from here: https://github.com/vindar/ILI9341_T4/tree/main/examples/99luftballons
The question is, how do you get the ILI9341_t3n just as fast?
Please compare the two videos I made of both versions.

Well, I just finished testing the T4.1 up to 50 MHz without an issue. At 60 MHz the screen stays white with your example sketch.

As for the second part of your question, how to get the ILI9341_t3n just as fast, that is a whole different question.

You need to look at what that library is doing and why the author says it's optimized for the T4.x only and does not support the T3.x etc. You might be able to get faster frame rates by updating only the sections of the screen that change, or by trying double or triple buffering as is done in that library, but that will only be good for the T4.1.
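A rough back-of-the-envelope bound shows why partial updates matter. The sketch below assumes 16 bits per pixel and an SPI clock that is fully used with no command overhead; the function names are hypothetical and not part of either library.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one full RGB565 frame (2 bytes per pixel).
constexpr uint32_t frameBytes(uint32_t width, uint32_t height) {
    return width * height * 2u;
}

// Lower bound, in microseconds, for pushing `bytes` over SPI clocked at `spiHz`,
// ignoring command overhead and gaps between transfers.
constexpr double minTransferMicros(uint32_t bytes, uint32_t spiHz) {
    return bytes * 8.0 * 1e6 / spiHz;
}

// Upper bound on full-frame updates per second under the same assumptions.
constexpr double maxFullFramesPerSecond(uint32_t width, uint32_t height,
                                        uint32_t spiHz) {
    return 1e6 / minTransferMicros(frameBytes(width, height), spiHz);
}
```

For a 240x320 frame at 30 MHz this gives 153600 bytes, about 41 ms per frame and therefore roughly 24 full frames per second at best. Anything that looks faster at the same clock has to be sending fewer pixels per frame, which is consistent with the advice above to update only the sections that change.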
Hello, everyone,

I found a way to accelerate ILI9341_t3n with a double buffer, for my purposes. Here are the results of the graphic test in three stages:
----- None buffered -----
Benchmark                Time (microseconds)
Screen fill              205325
Text                     10503
Lines                    70012
Horiz/Vert Lines         17663
Rectangles (outline)     11341
Rectangles (filled)      421561
Circles (filled)         69425
Circles (outline)        58405
Triangles (outline)      16459
Triangles (filled)       148537
Rounded rects (outline)  25366
Rounded rects (filled)   467870

---- Single buffered ----
Benchmark                Time (microseconds)
Screen fill              215476
Text                     43358
Lines                    3196949
Horiz/Vert Lines         1411518
Rectangles (outline)     680040
Rectangles (filled)      2294
Circles (filled)         428258
Circles (outline)        409702
Triangles (outline)      413635
Triangles (filled)       1236
Rounded rects (outline)  665510
Rounded rects (filled)   661642

---- Double buffered ----
Benchmark                Time (microseconds)
Screen fill              176706
Text                     24565
Lines                    293851
Horiz/Vert Lines         324239
Rectangles (outline)     231521
Rectangles (filled)      2099
Circles (filled)         368022
Circles (outline)        332165
Triangles (outline)      120915
Triangles (filled)       1238
Rounded rects (outline)  197219
Rounded rects (filled)   516751

With the following code I was able to achieve significantly higher frame rates in the GameBoy emulator:
// for double buffer
// (TFT_BUFFERSIZE and DRAW_SCALEFACTOR are not defined in this excerpt)
#define SCREEN_WIDTH 240
#define SCREEN_HEIGHT 320
#define DRAW_ON_X 0
#define DRAW_ON_Y 0
#define CHECK_LENGTH 14 // A value between 10 and 14 gives the best results
uint16_t frame[TFT_BUFFERSIZE];          // frame being drawn
uint16_t dblFrameBuffer[TFT_BUFFERSIZE]; // copy of what was last sent to the display

// updateScreen for double buffer
void updateScreen()
{
  tft.startWrite(); // Small changes to ILI9341_t3n were necessary so that the data could be written continuously

  uint32_t bufferIndex = 0; // Start at index 0

  while (bufferIndex < TFT_BUFFERSIZE)
  {
    uint32_t testIndex = bufferIndex;
    uint32_t unchangedLength = 0;

    // Get changed length: scan until CHECK_LENGTH unchanged pixels in a row are seen
    while (unchangedLength < CHECK_LENGTH && testIndex < TFT_BUFFERSIZE)
    {
      if (frame[testIndex] != dblFrameBuffer[testIndex])
      {
        dblFrameBuffer[testIndex] = frame[testIndex]; // copy new pixel
        unchangedLength = 0;                          // reset unchanged length to 0
      }
      else
      {
        unchangedLength++;
      }
      testIndex++;
    }
    uint32_t changedLength = testIndex - bufferIndex - unchangedLength;

    // If something was changed within the CHECK_LENGTH:
    if (changedLength > 0)
    {
      uint16_t currentX = bufferIndex % SCREEN_WIDTH;
      uint32_t changedEndBufferIndex = bufferIndex + changedLength - 1;
      uint16_t changedEndX = changedEndBufferIndex % SCREEN_WIDTH;
      uint16_t changedLines = (currentX + changedLength) / SCREEN_WIDTH + ((currentX + changedLength) % SCREEN_WIDTH != 0 ? 1 : 0);
      uint16_t pxW;
      uint16_t pxH;
      uint16_t ix0;
      uint16_t iy0;
      uint16_t ix1;
      uint16_t iy1;
      uint16_t oneLineLength;

      if (changedLines == 1) // If the changes are within a single line
      {
        // Calculate the necessary data for writing the new line
        oneLineLength = changedLength;
        pxW = changedLength * DRAW_SCALEFACTOR;
        pxH = DRAW_SCALEFACTOR; // one source line high (assumed; missing from the excerpt)
        ix0 = currentX * DRAW_SCALEFACTOR;
        iy0 = bufferIndex / SCREEN_WIDTH * DRAW_SCALEFACTOR;
        ix1 = ix0 + pxW - 1;
        iy1 = iy0 + pxH - 1;
      }
      else // If the changes affect more than one line
      {
        uint16_t remainToFullLineEnd = SCREEN_WIDTH - 1 - changedEndX;

        // Copy the remaining pixels of the line in case there is something new, as we will write that too.
        // Starts at 1 so that the last pixel of the line is included.
        for (uint16_t r = 1; r <= remainToFullLineEnd; r++)
        {
          dblFrameBuffer[changedEndBufferIndex + r] = frame[changedEndBufferIndex + r];
        }
        // Calculate the necessary data for writing the new window
        oneLineLength = SCREEN_WIDTH;
        changedLength = changedLength + remainToFullLineEnd + currentX;
        bufferIndex -= currentX; // set bufferIndex to x0
        pxW = SCREEN_WIDTH * DRAW_SCALEFACTOR; // full line width (assumed; missing from the excerpt)
        pxH = changedLines * DRAW_SCALEFACTOR;
        ix0 = 0;
        iy0 = bufferIndex / SCREEN_WIDTH * DRAW_SCALEFACTOR;
        ix1 = pxW - 1;
        iy1 = iy0 + pxH - 1;
      }

      tft.setAddrWindow(DRAW_ON_X + ix0, DRAW_ON_Y + iy0, DRAW_ON_X + ix1, DRAW_ON_Y + iy1, true);

      // Write to SPI
      for (uint16_t iCL = 0; iCL < changedLines; iCL++) // Number of changed lines
      {
        for (uint8_t yScale = 0; yScale < DRAW_SCALEFACTOR; yScale++) // Repeat the line in the Y-direction according to the scale
        {
          for (uint32_t iLL = 0; iLL < oneLineLength; iLL++) // Number of pixels per line
          {
            for (uint8_t xScale = 0; xScale < DRAW_SCALEFACTOR; xScale++) // Repeat writing a pixel according to scale
            {
              tft.pushColor(dblFrameBuffer[bufferIndex + iCL * oneLineLength + iLL], true);
            }
          }
        }
      }
      bufferIndex += changedLength; // Continue with bufferIndex after the changes found
    }
    else
    {
      bufferIndex += CHECK_LENGTH; // No changes within CHECK_LENGTH
    }
  }

  tft.endWrite(); // assumed counterpart of startWrite(); missing from the excerpt
}
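
Stripped of the scaling and address-window arithmetic, the core of the approach is a scan that finds changed spans and merges spans that lie close together, so that fewer address windows have to be set up. The following is a minimal desktop sketch of that scan only, not the routine above: `Run`, `changedRuns` and `gap` are hypothetical names (`gap` plays the role of CHECK_LENGTH), and it works on flat pixel indices without expanding multi-line changes to full rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous span of pixels that has to be re-sent to the display.
struct Run {
    std::size_t start;
    std::size_t length;
};

// Compare the new frame against the shadow copy of what the display shows.
// Returns the spans that differ; changes separated by fewer than `gap`
// unchanged pixels are merged into one span. The shadow is updated in place.
std::vector<Run> changedRuns(const uint16_t* frame, uint16_t* shadow,
                             std::size_t n, std::size_t gap) {
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < n) {
        if (frame[i] == shadow[i]) {  // skip unchanged pixels
            i++;
            continue;
        }
        std::size_t start = i;
        std::size_t lastChanged = i;
        // Keep scanning until `gap` unchanged pixels follow the last change.
        while (i < n && i - lastChanged <= gap) {
            if (frame[i] != shadow[i]) {
                shadow[i] = frame[i];
                lastChanged = i;
            }
            i++;
        }
        runs.push_back({start, lastChanged - start + 1});
    }
    return runs;
}
```

Each returned span would then map to one address window followed by one burst of pixel data. A larger `gap` means fewer windows but more unchanged pixels sent, which is presumably the trade-off behind tuning CHECK_LENGTH.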

@KurtE, @mjs513
I would be happy if there is interest in it and it may be included in your libs! :)
...updated the google-fronts repo. It now contains more than 3300 converted *.ttf


Github reaches a limit here... tried to just upload a *.zip... did not work.. so back to individual directories.. but it shows max 1000 :)
So, only way to use it, is to download the whole repo.

I also extended the font sizes: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 24, 26, 28, 32, 40, 48, 60, 72, 96

Converting took more than 3 hrs
You're welcome.

Anyone can do it themselves now. Either with Linux - as described by Paul - or with Windows now. I have translated the necessary programs to windows, and written a batch that does everything.


Call of the batch:
tftconvert.cmd filemask [-r] [sizes]

- filemask is an (optional) path plus file name(s)
- -r recurses into subdirectories (optional)
- sizes is optional

tftconvert e:\fonts\*.ttf -r "8,9,10"

Converts all *.ttf in e:\fonts and its subdirectories with sizes 8, 9, 10

tftconvert e:\fonts\pretty.ttf

Converts pretty.ttf only.

Without the sizes parameter, a default is used.
All three files (both .exe and the batch) must be in the same directory (of your choice)

It was intended for another library, but hey, it's useful here, too - I hope.

Have fun,
I am using a 240x320 ILI9341 TFT display with the optimized ILI9341 library. Is there any way to turn off the backlight? I noticed there is a Display off command defined in the library as 0x28. I don't have a separate lite pin on my display to control the backlight, and I can't power the backlight from a Teensy digital pin either.
My guess is probably not. With most displays you have a backlight pin that you feed power to. You can also hook up a circuit to control the brightness using a PWM pin... Some, like Adafruit's I think, have this circuit built in.

But without the pin, don't know any way

Edit: out of curiosity which display are you using? Have a link to it? It might have additional information
I don't have a link for the datasheet, but it's the same one listed on the PJRC website.
It does have a LED pin to power the backlight, but I believe it needs 3.6 to 5 V for normal operation and would draw around 20 mA, and a Teensy digital pin can't do that.
You are right that you could not connect it directly to an IO pin.

The Adafruit display has circuitry on it that allows its backlight pin to be connected to an IO pin... It takes the negative side of the backlight circuit through a transistor.

Earlier I did a board based off of stuff I saw on FrankB's board, using two transistors.
Where at the time (a few years ago) Q1/2 were:

But these days I don't have a clue what parts are available, and others may have easier or better circuits.

In my experience, some "red ILI9341 screens" have a transistor already attached to the LED pin, in which case you can directly PWM the backlight from a Teensy pin... You can probably check on the PCB whether that is the case for your screen.
thanks for the replies!
yes my screen does have that transistor Q1. My screen is the same one that is in the github link that you posted. Is there any command that I can send to the ILI9341 to control this transistor? I don't see a way to directly connect a teensy pin to this transistor
Thanks again
If you have the transistor on the board, then you can directly connect the LED pin from your screen to any digital pin of the Teensy without any additional circuitry. Then, you can PWM the pin to dim the screen as you wish...

With the transistor in place, the LED pin will only draw a few mA of current. However, just to be on the safe side, you can check that this is the case before connecting it to the Teensy: wire the VCC and LED pins to +5V and GND to ground, and measure the current flowing through VCC and through LED with a multimeter. You should get something like 50mA for VCC but only maybe 5mA for LED...
This is a pretty old thread, so apologies for revisiting it, but it's still linked at the ILI9341_t3 library on GitHub via the README

I've been digging through the driver code, because seeing it in the wild helps me to understand the manual when I finally encounter it.

Well, I've been looking for the manual and trying to cross reference with the code to find the *right* manual.

This appears to be relevant

And yet, many of the register definitions referred to like SPI_SR_TCF aren't coming up in my search of the PDF.

So my first question is, which manuals are you folks talking about above?

Specifically I'm primarily interested in the Teensy 4.x, but eventually I will be interested in the earlier ones too.

I also am looking for a general overview - a bit more high level, less in the weeds of how the SPI registers work as well as the strange business going on with the DC line in the t3 code.

By strange business, I mean for example how the code for some reason fiddles with a DC connected register every time you start an SPI transaction, whether you're writing a command or data.

Here's why I care. https://honeythecodewitch.com/gfx/wiki/index.md

Basically, htcw_gfx already works with the teensy. I want to make all the SPI code for every driver it supports faster for the teensy.

I can do that by changing two files which will update every single driver, and future drivers.

My current challenge aside from understanding the function of the individual registers (which should be easy once I can find the right manual!) is understanding the relationship between DC and the SPI transactions in your code.
They seem very much tied to one another.

In the code I linked to above, the SPI code is not tied to the DC line driving code, which is handled over the top of the SPI code. I see that creating issues if I try to make it work like Paul's T3 code does. The SPI code cannot manipulate the DC line directly the way my library is structured, but the code that operates the DC line *can* manipulate SPI.

So the bottom line is, I was hoping someone could point me to the manual you folks are referring to in the above thread - the one with the registers used in Paul's T3 code,

and any insight on how the Teensy's CPU controls the DC line - I thought it would just be a separate pin you tweak, but it appears like maybe there's intrinsic support for it in the SPI hardware if you tie it to a hardware CS pin? I can't really tell other than from what I've seen in the code, which I don't fully understand without that manual! :)

I also would be interested in the Teensy's DMA SPI capabilities. I was fiddling a bit with the DMA library, and it looks like you can only send in 32kB chunks max. So what I was planning on doing was something like the ESP-IDF does - create some sort of list of 32kB chunks from a larger memory buffer, and kick off a series of DMA transfers, one for each chunk. I'm hoping that's realistic, otherwise I guess 32kB will have to be enough.

Lots of research and effort is clearly required for me to update the code to be optimized for the Teensy, but to my mind this is worth it, because htcw_gfx is my go to these days, and also once I do it in one place, the ILI934x, ST77xx, SSD1306 (SPI), SSD1351 and others will all be optimized in the same manner, so at least that should cover my expenditure.

TCF bit is documented on page 1136

You might look at the ILI9341_t3n library (on GitHub under KurtE), which has DMA and the like. It also works on different SPI buses.

That code was then replicated in several of the other libraries that ship with Teensyduino, such as:
ILI9488_t3, ST7735_T3 (also does 7789) and a few others that may or may not be part of Teensyduino.

Also note that the low-level support is different for each of the different Teensy boards.
That is, the register sets are different, as are some of the nuances of DMA.

For each of the different boards you need to look at the Reference Manual for the board.
The T3.x are somewhat similar to each other, The T4.x and Micromod are all the same processor so they work the same, other than they have different pins and the like.

As for 32K for DMA, I don't restrict the code to 32K but instead I chain the DMA transfers.
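
The bookkeeping for such a chain can be sketched independently of the hardware. This is only an illustration of splitting a large buffer into consecutive pieces: `Chunk` and `splitForDma` are hypothetical names, the per-transfer limit is a parameter rather than a statement about any particular Teensy, and actually linking the pieces into chained DMA transfers is board-specific and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a larger buffer: where it starts and how many units it holds.
struct Chunk {
    std::size_t offset;
    std::size_t count;
};

// Split `total` transfer units into consecutive pieces of at most
// `maxPerTransfer` units each. Only the last piece may be shorter.
std::vector<Chunk> splitForDma(std::size_t total, std::size_t maxPerTransfer) {
    std::vector<Chunk> chunks;
    if (maxPerTransfer == 0) return chunks;  // nothing sensible to do
    for (std::size_t off = 0; off < total; off += maxPerTransfer) {
        std::size_t left = total - off;
        chunks.push_back({off, left < maxPerTransfer ? left : maxPerTransfer});
    }
    return chunks;
}
```

As an example with an illustrative limit of 32767 units, a 240x320 frame of 76800 pixels splits into three pieces of 32767, 32767 and 11266 pixels, each of which would become one link in the chain.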
Thank you, I'll definitely take a look at that. I did figure out that the procs are different, but right now I'm going to support the Teensy 4.1, which I own and can test. Supply problems prevent me from getting previous versions at the moment.