SPI - do we have to use _cont and _last?

Projectitis · Jun 14, 2022

Hi all,

I've been writing a bunch of cross-platform SPI code, and it's a bit of a pain to have to implement _cont and _last for KINETISK based hardware. It makes the code more complex, and (depending on the situation) often adds an additional compare within a loop.

It also means that in my abstraction layer for non-KINETISK hardware I still have to have a _cont and _last method (e.g. writeData8_cont and writeData8_last) even though they both map to the same underlying method.
My code ends up looking like this:

Code:

        /**
         * @brief Write an SPI command
         */
        ALWAYS_INLINE void writeCommand(uint8_t c) {
            if (_dc != UNUSED_PIN) digitalWrite(_dc, LOW);
            _spi->write(c);
            if (_dc != UNUSED_PIN) digitalWrite(_dc, HIGH);
        }

        ALWAYS_INLINE void writeCommand_last(uint8_t c) {
            writeCommand(c); // Just map to writeCommand
        }

        /**
         * @brief Write SPI data
         */
        ALWAYS_INLINE void writeData8(uint8_t d) {
            _spi->write(d);
        }

        ALWAYS_INLINE void writeData8_last(uint8_t d) {
            writeData8(d); // Just map to writeData8
        }

        ALWAYS_INLINE void writeData16(uint16_t d) {
            _spi->write16(d);
        }

        ALWAYS_INLINE void writeData16_last(uint16_t d) {
            writeData16(d); // Just map to writeData16
        }

My first question is: can I just use the _cont methods the entire time, and then at the end call _last with a NOOP operation (most displays have a NOOP command)?

Code:

while (looping_data) writeData8(d);
writeCommand_last(NOOP);

Second question is, is there a way to use SPIClass only, without KINETISK-specific code, and simplify the implementation to make it more cross-platform compatible?

For example, do I need the KINETISK code as below, or can I just use _spi->write(c); or _spi->transfer(d); on Teensy?

Code:

#if defined(KINETISK)
        ALWAYS_INLINE void writeCommand(uint8_t c) {
            KINETISK_SPI0.PUSHR = c | (_pcs_command << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_CONT;
            waitFifoNotFull();
        }
        ALWAYS_INLINE void writeCommand_last(uint8_t c) {
            uint32_t mcr = SPI0_MCR;
            KINETISK_SPI0.PUSHR = c | (_pcs_command << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_EOQ;
            waitTransmitComplete(mcr);
        }
        // ... etc

Cheers!
Peter

KurtE · Jun 14, 2022

Simple answer is no, you don't have to use it... And Yes you can use SPI library only...

That is you can simply use SPI library and do transfers.

I have no idea which code this is, but it looks more or less like the code in the ILI9341_t3 library.

You could instead simply use Adafruit_ILI9341 library which does it using plain SPI calls.

However, there is a trade-off. Without the hardware-specific stuff, instead of doing things like the above you use digitalWrite to set the state of the DC and CS pins, and then use one or more forms of SPI.transfer. Unless you stick to the absolute basic transfer functions, this will change depending on which board you are using. What do you lose? Speed/SPI throughput.

Especially with T3.x, you lose a lot of the benefits built into the chip. For example, if you do something like drawPixel(x, y, color), when you unwind it, it does something like:

Code:

writecommand_cont(ILI9341_CASET); // Column addr set
writedata16_cont(x0);   // XSTART
writedata16_cont(x1);   // XEND
writecommand_cont(ILI9341_PASET); // Row addr set
writedata16_cont(y0);   // YSTART
writedata16_cont(y1);   // YEND
writecommand_cont(ILI9341_RAMWR);
writedata16_last(color);

With the above, each time you transition between command and data (or data and command), you need to make sure that everything belonging to the command (or to the data) has been transferred before you change the DC state.

What the T3.x code can do, assuming at least the DC pin is on a hardware CS pin, is encode the state of this pin into the 32-bit value written to PUSHR. As long as the transfer register and FIFO queue are not empty, the SPI bus will continue to output. The hardware itself manages when to change the state of the DC pin. You can actually encode up to 4 CS pins into PUSHR.
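To make the encoding concrete, here is a minimal sketch of how one 32-bit PUSHR word could be put together. The bit positions are my reading of the SPI_PUSHR_CONT, SPI_PUSHR_EOQ, SPI_PUSHR_CTAS and SPI_PUSHR_PCS macros in Teensy's kinetis.h, so treat them as assumptions and check them against the header. makePushr and the k-prefixed constants are hypothetical names used only for illustration, and the code is plain C++ that touches no hardware.

```cpp
#include <cstdint>

// Assumed bit layout of the Kinetis SPI PUSHR register (verify against kinetis.h).
constexpr uint32_t kPushrCont = 0x80000000u; // keep the CS pins asserted after this frame
constexpr uint32_t kPushrEoq  = 0x08000000u; // end of queue: marks the last frame
constexpr uint32_t kPcsShift  = 16;          // CS pin select bits start at bit 16
constexpr uint32_t kCtasShift = 28;          // transfer attribute select starts at bit 28

// Hypothetical helper: combine data, CS/DC pin mask and flags into one word.
// last == false corresponds to a _cont write, last == true to a _last write.
constexpr uint32_t makePushr(uint16_t data, uint8_t pcsMask, uint8_t ctas, bool last) {
    return static_cast<uint32_t>(data)
         | (static_cast<uint32_t>(pcsMask & 0x1Fu) << kPcsShift)
         | (static_cast<uint32_t>(ctas & 0x7u) << kCtasShift)
         | (last ? kPushrEoq : kPushrCont);
}
```

On a T3.x the resulting word would be written to KINETISK_SPI0.PUSHR, in the same way as the writeCommand code in the first post.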

The T4.x is different, as the FIFO push register does not combine both data and state information into the same register write. There are two different registers: one to push data and another to push state information like the CS status. This has pros and cons. The pro is that you can transfer 32 bits of data. The con is that you have to write to the two different registers when needed. The CS mechanism is also different. As I mentioned, on the T3.x boards you can set the state of up to 4 CS pins in the upper word, so we can for example encode both the CS and DC pins. On the T4.x boards, the CS information specifies which CS pin is to be used (only one at a time, though you can change which one).

Note: The _cont / _last distinction has a second meaning within these transfers: whether the hardware should keep the CS pins asserted after the transfer has completed.

Now if you do it without using the KINETISK stuff, this would turn into:

Code:

digitalWrite(DC, LOW);
SPI.transfer(ILI9341_CASET); // Column addr set
digitalWrite(DC, HIGH);

SPI.transfer16(x0);   // XSTART
SPI.transfer16(x1);   // XEND

digitalWrite(DC, LOW);
SPI.transfer(ILI9341_PASET); // Row addr set
digitalWrite(DC, HIGH);

SPI.transfer16(y0);   // YSTART
SPI.transfer16(y1);   // YEND

digitalWrite(DC, LOW);
SPI.transfer(ILI9341_RAMWR);
digitalWrite(DC, HIGH);
SPI.transfer16(color);

During each of these transitions, where we call digitalWrite, the code has to wait until the previous transfer has totally completed, which leaves gaps in the usage of the SPI bus. Often the gaps can be bigger than the time spent actually transferring data. Note: All of these normal calls to transfer or transfer16 wait until the data has been sent out the MOSI pin and the data has been received back on the MISO pin before they return.
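One way that may shrink some of those gaps while staying with plain SPIClass calls is to send a command's parameters as a single buffer instead of separate transfer16 calls. A minimal sketch, assuming high-byte-first order as in the transfer16 calls above; Window4, packWindow and the two byte helpers are hypothetical names, and whether the buffer form is actually faster depends on the board's SPI library.

```cpp
#include <cstdint>

// Hypothetical helpers: split a 16-bit value into the bytes sent on the wire.
constexpr uint8_t highByte16(uint16_t v) { return static_cast<uint8_t>(v >> 8); }
constexpr uint8_t lowByte16(uint16_t v)  { return static_cast<uint8_t>(v & 0xFFu); }

// Four parameter bytes for a column or row address command.
struct Window4 {
    uint8_t b[4];
};

// Pack start and end coordinates, high byte first, in the same order that
// two back-to-back transfer16(start); transfer16(end); calls would send them.
constexpr Window4 packWindow(uint16_t start, uint16_t end) {
    return Window4{{highByte16(start), lowByte16(start),
                    highByte16(end),   lowByte16(end)}};
}

// Possible use on Arduino-style hardware (not compiled here):
//   Window4 w = packWindow(x0, x1);
//   digitalWrite(DC, LOW);
//   SPI.transfer(ILI9341_CASET);
//   digitalWrite(DC, HIGH);
//   SPI.transfer(w.b, 4);   // note: w.b is overwritten with the received bytes
```

The DC pin still has to change around each command, so this only removes the gaps between the parameter bytes, not the ones at the command/data transitions.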

Hope some of this makes sense.

Projectitis · Jun 14, 2022

Awesome, thanks KurtE. I'll take some time to digest this, as it's obvious there is more I need to learn about SPI.
For example, why is there a relationship between the DC and the CS pin? "...assuming at least DC pin is on a hardware CS pin..."

You are correct that my base code has been taken from the ILI9341 library(s) of yourself, Paul and Frank.
I require a very minimal implementation of a TFT driver. So far I have implemented ILI9341 and ST7789 on both Teensy and ESP32. All my library needs to do is:
1) Initialise the display - once
2) Set a target area and write sequential pixels to it - many times

That is all, I do not need anything else

So at least from that point of view there is less work for me to do!
However I want to support as many displays and as many micros as possible.

Projectitis · Jun 14, 2022

Ok, I believe I understand.
T3.x has several pins designated as hardware CS pins, and the state of each of these can be encoded into PUSHR.
And it's possible to use a hardware CS pin as the DC pin.

Just thinking ahead about this - if you had enough hardware CS pins, could a similar technique be used to achieve fast parallel transfers?

PaulStoffregen · Jun 15, 2022

Projectitis said:
if you had enough hardware CS pins, could a similar technique be used to achieve fast parallel transfers?

No, not fast.

Each CS pin remains at the same logic level throughout a SPI transfer of 8 or 16 or more bits. So even if you hypothetically had many of these hardware controlled CS pins (the most any Teensy can control is 5, and Teensy 4.x only can control 1 at a time per SPI port), the best case scenario for a scheme to use them as parallel data output would achieve a bit rate (per signal) of 1/8th of the SPI clock speed.

It might be possible, but it definitely wouldn't be fast.
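The 1/8th figure can be checked with a line of arithmetic. This is a rough sketch with a hypothetical helper name, and the 30 MHz clock in the comment is just an assumed example value, not a number from this thread.

```cpp
#include <cstdint>

// Each CS pin holds one logic level for a whole SPI frame, so a pin used as
// a data line can carry at most one bit per frame. Best case, ignoring any
// delay between frames:
//   per-pin bit rate = SPI clock / bits per frame
constexpr uint32_t perPinBitRateHz(uint32_t spiClockHz, uint32_t bitsPerFrame) {
    return bitsPerFrame == 0 ? 0 : spiClockHz / bitsPerFrame;
}

// Example: an assumed 30 MHz SPI clock with 8-bit frames gives
// perPinBitRateHz(30000000, 8) = 3750000 bits per second on each pin.
```

With 16-bit frames the per-pin rate halves again, which is why shorter frames are the best case.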

Projectitis · Jun 15, 2022

Thanks Paul!
I'm guessing there is hardware support for parallel data transfers a different way?
I'll start investigating that once I have SPI working properly

KurtE · Jun 15, 2022

Sorry not sure what all of your needs are...

I thought from start of this thread, you simply wanted support for a display or two that was easy to support, and you did not have major performance requirements for the display.
And then 2nd post that you needed support on ESP32 as well...

So one option you have is to use the Adafruit display drivers, like Adafruit_ILI9341 and Adafruit_ST7789, which have support for Teensy, ESP32, ESP... (I think).

As for Parallel pins usage for things, again another topic.

Projectitis · Jun 15, 2022

Hey KurtE - nothing to be sorry about. Thanks for your answers - I am one step closer to my goal now

One of my objectives is to not have any external dependencies - but I have been using Adafruit's drivers as reference for my own, as well as many others (such as Bodmer).
So far my only dependency is the SPI library, but I am happy with that as it's part of the standard Arduino libraries.
I just wanted to get some more info about the Kinetisk way of doing things, and you've been very helpful!

And yes, parallel pins is another topic that I will no-doubt be back to discuss in a few months.
