Interested in forking ILI9341_t3 to improve (reduce) some of the SPI transactions

ardnew

Member
For example, any software library based on ILI9341_t3 must use the self-contained (wrt SPI) functions available, like drawPixel(), drawFastVLine(), fillRect(), etc. These functions call startTransmission() and endTransmission() at every invocation (fillRect() calls them many times, ostensibly to be a good neighbor on the SPI bus). This seems rather expensive, particularly with drawPixel().

It would be nice if there were equivalent functions for these primitives that didn't always include the SPI transaction. This way the user application can initiate the transaction manually, call all of the primitives they need, and then end transmission when they're done.

I'm curious if there is another reason (besides bus contention) that these were designed this way and if there would be any reason a pull request with these changes might get rejected. The current Adafruit_ILI9341 code seems to do what I'm proposing as well, but that's certainly not a justification for it. :)
 
This is all open source, so you can obviously fork it, or other variations of it, such as my ili9341_t3n.

But personally, I don't think you will gain much. You have not mentioned which Teensy you wish to do this for, so I cannot give exact info, but for the T3.x boards the constructors for SPISettings (the thing you pass into SPI.beginTransaction()) will often reduce at compile time to simply two values, and beginTransaction, depending on which board it is, will simply set two registers with those values. Not much overhead.
For T4 there is slightly more, as there is the ability to switch which system clock is used by SPI, so it saves some values as well as the baud rate. And beginTransaction simply compares against a saved speed setting and only does anything if you ask for a different speed.

Another reason SPI.beginTransaction is used is that something may need to change. For example, write operations to the ILI9341 can maybe handle SPI speeds up to 30 MHz, whereas read operations from the display can maybe only handle 2 MHz (I would have to look up the actual speeds).
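The "compare against a saved speed" idea can be sketched roughly like this. This is a host-side model only: FakeSpiPort, cachedClockHz and reconfigureCount are made-up names for illustration, not the actual SPI library code, and the real register writes are replaced by a counter.

```cpp
#include <cstdint>

// Hypothetical stand-in for an SPI port that caches its last clock setting.
struct FakeSpiPort {
    uint32_t cachedClockHz = 0;     // last clock that was actually programmed
    unsigned reconfigureCount = 0;  // how often the (pretend) clock registers were rewritten

    // Only touch the clock setup when the requested speed differs from the cached one.
    void beginTransaction(uint32_t clockHz) {
        if (clockHz != cachedClockHz) {
            cachedClockHz = clockHz;  // a real port would recompute dividers here
            ++reconfigureCount;
        }
    }

    void endTransaction() {}  // nothing to undo in this simplified model
};
```

With a model like this, a run of writes at one speed costs a single reconfiguration, and only a switch to a different speed (for example for a read) costs another one.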


Now the real consumer of time is simply how long it takes to output the information over SPI to the display, and how you can reduce that. Example:
Suppose you wish to set two pixels that are next to each other to the same color.

You could do this with two calls to drawPixel or with one call to fillRect. What would be the difference in speed?
Note: another thing that impacts speed is when the state of DC changes and, to a lesser extent, when CS changes. I will show those changes in <>, and I will use shorthand for byte values.

First with call to fillRect: <DC,CS> CASET <DC> X1L X1H X2L X2H <DC> PASET <DC> Y1L Y1H Y2L Y2H <DC> MW <DC> COLORL COLORH COLORL COLORH <CS>
So this output of 2 pixels sends something like 15 bytes to the device, with about 7 transitions of the CS and DC signals.

Now with two calls to drawPixel: each one is exactly the same as the fillRect sequence above, except with just one pixel. So the output for each is 13 bytes, and in total you output 26 bytes.
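The byte counts above can be checked with a little arithmetic. This is just a model of the sequence shown in this post (one command byte plus four parameter bytes for each of CASET and PASET, one memory-write command byte, and two bytes per pixel); the function names are made up for illustration.

```cpp
#include <cstddef>

// CASET + 4 bytes, PASET + 4 bytes, then the memory-write command byte.
constexpr std::size_t kWindowSetupBytes = (1 + 4) + (1 + 4) + 1;  // 11 bytes
constexpr std::size_t kBytesPerPixel = 2;                          // 16-bit color

// Bytes sent for one windowed write that covers `pixels` pixels.
constexpr std::size_t bytesForWindowWrite(std::size_t pixels) {
    return kWindowSetupBytes + kBytesPerPixel * pixels;
}

// Bytes sent when every pixel is written with its own window setup.
constexpr std::size_t bytesForSeparatePixels(std::size_t pixels) {
    return pixels * bytesForWindowWrite(1);
}
```

For two pixels, bytesForWindowWrite(2) gives the 15 bytes of the single fillRect, and bytesForSeparatePixels(2) gives the 26 bytes of two drawPixel calls.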

The difference in speed between ili9341_t3 and the Adafruit library is in the transitions of the DC pin and, to a lesser extent, the CS pin, especially on T3.x chips and to some extent on T4.

That is, the T3.x boards have a FIFO queue on the SPI bus, and this FIFO queue allows us to encode the desired state of up to 4 CS pins as part of each push onto the queue. So we can encode the state changes onto the queue, as well as information about the type of transfer (8 bits versus 16 bits). Why does this matter? The state of CS or DC cannot be changed until the previous outputs at the previous state have been totally shifted out.

So most software, like the Adafruit library, will set the state in code. For example, at the start of fillRect it will assert CS and DC, output the CASET byte, and then spin waiting for the status bit that says the transfer has completed. It will then unassert DC and output the X1L byte. Now, depending on the library, if it simply calls something like SPI.transfer(X1L), it will again wait for that transfer to complete before outputting X1H. Or it might try SPI.transfer16(X1), which, again depending on the SPI library, could translate into two calls or one.

But with the ili9341_t3 library, with the FIFO and with CS and DC encoded into the queue instructions, the hardware knows exactly when a transfer completes and that the next one has a CS state change. It updates the signals, with the configured amount of delay, and continues to merrily output the next bytes. If the next ones are 16 bits, it will output those bits with a shorter gap between the low and high bytes.
So a lot faster.
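To make the "encoded into the queue" idea a bit more concrete, here is a rough sketch of packing one queue entry. The bit positions and the names (makeEntry, kContinueBit, kFrame16Bit, kSelectShift) are assumptions chosen for illustration and should not be read as the actual register layout; check the chip reference manual for the real one.

```cpp
#include <cstdint>

// Hypothetical field positions, for illustration only.
constexpr uint32_t kContinueBit = 1u << 31;  // keep the select lines asserted after this frame
constexpr uint32_t kFrame16Bit  = 1u << 28;  // send this entry as one 16-bit frame
constexpr uint32_t kSelectShift = 16;        // where the select-line mask sits in the word

// Pack data plus the desired state of up to 4 select lines (CS, DC, ...) into one word.
constexpr uint32_t makeEntry(uint16_t data, uint8_t selectMask,
                             bool is16Bit, bool keepSelected) {
    return (keepSelected ? kContinueBit : 0u)
         | (is16Bit ? kFrame16Bit : 0u)
         | (static_cast<uint32_t>(selectMask & 0x0F) << kSelectShift)
         | data;
}
```

The point is that the select-line state travels with each entry, so the CPU can keep pushing entries without waiting for the previous byte to finish shifting out before it changes DC or CS.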

Why do I mention all of this? Because performance issues are often not what they appear to be at first glance. As I mentioned, I doubt you will see any measurable difference, but you can always try.

A simple way might be to create a sub-class of the main class, as a way to get to the protected: members. ...
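A rough sketch of the sub-class idea follows. FakeDisplay is a stand-in written for this example so that it compiles off-target; its member names (beginSPITransaction, endSPITransaction, writePixelRaw) and the BatchedDisplay methods are assumptions for illustration, so check the actual library header for what is really protected.

```cpp
#include <cstdint>

// Stand-in for the display class: public primitives wrap every call in a transaction.
class FakeDisplay {
public:
    void drawPixel(int16_t x, int16_t y, uint16_t color) {
        beginSPITransaction();
        writePixelRaw(x, y, color);
        endSPITransaction();
    }
    unsigned transactions = 0;  // transactions started so far
    unsigned pixels = 0;        // pixels written so far

protected:
    void beginSPITransaction() { ++transactions; }
    void endSPITransaction() {}
    void writePixelRaw(int16_t, int16_t, uint16_t) { ++pixels; }
};

// The sub-class exposes the protected pieces so the caller controls the transaction.
class BatchedDisplay : public FakeDisplay {
public:
    void beginBatch() { beginSPITransaction(); }
    void endBatch() { endSPITransaction(); }
    void drawPixelInBatch(int16_t x, int16_t y, uint16_t color) {
        writePixelRaw(x, y, color);  // no transaction begin/end per pixel
    }
};
```

Usage would be beginBatch(), any number of drawPixelInBatch() calls, then endBatch(). Whether that is measurably faster on real hardware is exactly the thing to test.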

Good Luck
 