Request for comments about implementation of missing SPI functionalities

MickMad · Feb 14, 2017

Hi guys,

I recently had to fiddle with the SPI module on the Teensy 3.6 to interface it as well as possible with precision ADCs and DACs (600 kHz-ish sample rate), and I was wondering whether it would be useful to add the following features to the standard Teensy SPI library for anyone to use, or whether these "advanced" features should just be left out of the public API so that any advanced user can add them on a per-use basis.

Hardware Chip Selects, using CSs to multiplex devices
The first thing is the use of the hardware chip select lines; in particular, there is a REALLY cool feature for SPI0 module (at least on the K66) that goes like this:
SPI0 has 6 PCS lines, PCS[5:0], that can be used as 6 CS lines to (guess what) select up to 6 devices on the same SPI0 bus, BUT! The PCS5 signal can act as a strobe for an external demultiplexer: you can connect the select lines of a 5-to-32 demultiplexer to PCS[4:0] and use the PCS5 signal to strobe the update of the demux. This allows control of up to 32 devices on the same SPI0 module, rather than only 6. I think this feature could come in really handy.
I am also studying a way to implement this on the SPI1 and SPI2 modules (using external glue logic), but that's out of the scope of this thread.

PCS to SCK Delay, After SCK Delay, Delay after Transfer
These three features right here are a time-changer (pun intended because delay). The ADCs and DACs I am using need a specific CS timing sequence: you usually need to keep CS high until the conversion is done (input is sampled or output is settled), then pull CS low, then wait some time before starting to clock data in or out, then stop clocking, then wait some more time before releasing CS and repeating the process. When I first tried to use these devices, I had to use digitalWriteFast for the CS lines and nops to wait with nanosecond precision, then call SPI.transfer16(), so I was basically wasting all CPU time just on timing and transferring data. BUT! With the proper settings for these three delay times (they depend on the use of hardware CS lines, btw) I was able to get data in and out using DMA with no issues at all. These could come in really handy for similar applications that require nanosecond-precision timing.

Please note that all these settings reside in the CTAR register, so maybe the appropriate place for these functions would be the SPISettings class.
Also note that the delay times are calculated in the following way:

delay = (1 / F_BUS) x prescaler x scaler

The prescaler values are 1, 3, 5 and 7, and the scaler values go from 2 to 65536 in powers of two.
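To make the arithmetic concrete, here is a minimal sketch of how one might search for the prescaler/scaler pair that gives the shortest delay not below a target, taking the delay as prescaler x scaler bus clock periods. The names findDelay and DelayFields are hypothetical and not part of any Teensy library, and the mapping of the returned codes to register fields (code 0..3 for prescaler 1, 3, 5, 7; code 0..15 for scaler 2..65536) is my reading of the reference manual, so check it before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: field codes plus the delay they produce.
struct DelayFields {
    uint8_t prescalerCode;  // 0..3 selects prescaler 1, 3, 5, 7
    uint8_t scalerCode;     // 0..15 selects scaler 2, 4, ..., 65536
    double  delayNs;        // achieved delay in nanoseconds
};

// Return the shortest achievable delay that is >= targetNs.
// If the target exceeds the longest possible delay, the longest one is returned.
DelayFields findDelay(double fBusHz, double targetNs) {
    assert(fBusHz > 0.0);
    static const uint32_t prescalers[4] = {1, 3, 5, 7};

    // Start from the longest possible setting: prescaler 7, scaler 65536.
    DelayFields best{3, 15, 7.0 * 65536.0 * 1e9 / fBusHz};

    for (uint8_t p = 0; p < 4; ++p) {
        for (uint8_t s = 0; s < 16; ++s) {
            uint32_t scaler = 2u << s;  // 2^(s+1)
            double ns = prescalers[p] * scaler * 1e9 / fBusHz;
            if (ns >= targetNs && ns < best.delayNs) {
                best = {p, s, ns};
            }
        }
    }
    return best;
}
```

For example, with a 60 MHz bus clock (assumed here for illustration) and a 100 ns target, findDelay(60e6, 100.0) selects prescaler 3 and scaler 2, i.e. 6 bus clocks.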

What do you guys think?

KurtE · Feb 14, 2017

To me, maybe. But I am not sure how to add it here, as the way SPI.transfer is typically used is often pretty different from what you do when you wish to optimize usage of the SPI hardware.

That is when you do something like:
SPI.transfer16(100);
SPI.transfer16(200);

The code will logically put the 100 on the SPI queue and wait until it completely is sent such that it can get the response back and then it will put the 200 on and repeat. So you more or less are not making use of the SPI queues and the like. To make use of the queues, you need to understand more about the size differences and knowing when full and the like. When in that mode you are using the pushr and popr registers and looking at status...

Also when you are wanting to have the SPI handle the CS pins, you need to set the corresponding bit(s) in the pushr register and to properly do it you need to understand difference between pushing a byte/word and having it say to continue with the CS and last, which says to release the CS bit(s) after the operation...
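As a rough illustration of the "continue" versus "last" distinction, the sketch below builds the 32-bit word that would be written to the pushr register. The helper makePushr is hypothetical, and the bit positions (CONT at bit 31, CTAS at bits 30:28, PCS at bits 21:16, data in the low 16 bits) are my reading of the K66 reference manual; treat them as assumptions and compare them with the SPI_PUSHR_* macros in kinetis.h.

```cpp
#include <cassert>
#include <cstdint>

// Assumed PUSHR layout (verify against the reference manual / kinetis.h).
constexpr uint32_t PUSHR_CONT = 1u << 31;  // keep PCS asserted after this frame

inline uint32_t pushrCtas(uint32_t n)    { return (n & 0x7u) << 28; }     // which CTAR to use
inline uint32_t pushrPcs(uint32_t mask)  { return (mask & 0x3Fu) << 16; } // which PCS lines to assert

// Hypothetical helper: build a command word for one 16-bit frame.
// 'last' = true releases the PCS line(s) after the frame,
// 'last' = false sets CONT so they stay asserted for the next frame.
uint32_t makePushr(uint16_t data, uint8_t pcsMask, uint8_t ctar, bool last) {
    assert(ctar < 2);  // assuming CTAR0 and CTAR1 are available in master mode
    uint32_t word = pushrCtas(ctar) | pushrPcs(pcsMask) | data;
    if (!last) {
        word |= PUSHR_CONT;
    }
    return word;
}
```

With this, the two-word example above could be queued as makePushr(100, 0x01, 0, false) followed by makePushr(200, 0x01, 0, true), so that PCS0 stays asserted across both words and is released after the second.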

Slight side comment: for my own usage I created my own SPI class, so that I could modify code (for example my ili9341_t3n) to use different SPI buses while only needing to tell the constructor which bus to use... As part of this I made all of the functions virtual and added some stuff. For example, the current SPI class has pinIsChipSelect to first validate a chip select pin, and it also has setCS to use it. We had setMOSI, setMISO and setSCK to set those pins, but no verify functions, so my class has the verify functions.

But in addition it has functions to help with maintaining the queues.... like waitFifoEmpty....

As for your additions you would probably need to maybe add some of the stuff to SPISettings, such that you could use it with:
SPI.beginTransaction().... As maybe you have multiple devices with different requirements.
You probably then maybe need some new transfer statements, where you can pass in the CS channel mask and if this is the last transfer. You may also want to somehow separate out different types of transfers (Read, Write, Transfer). And maybe you want to start up some of the operations somehow ASYNC, that is to somehow make usage of the queues...

Not sure if this fully answers your questions. It probably just raised more questions...

MickMad · Feb 14, 2017

I might not have made myself clear: I got all the delay settings working in a custom library. So it's cool if more questions arise, because I'm not looking for specific answers anyway.

I'd like to find out first whether these features are needed in the public API at all, and if they should indeed be added, then I'd like to discuss possible nomenclature and so on.

Insight:
To drive my ADC/DAC board, I am basically using a stripped-down version of the SPI library: it ignores transactions altogether (my devices are constantly streaming) and uses two DMA channels to constantly read from the ADC and write to the DAC.

It works like this:
When I initialize the SPI module, I set one CTAR register with my required timings, polarity, SPI mode, etc., set up two DMA channels, and set the command word in the PUSHR register to point to my newly set CTAR register and to use the PCS line I need. Then I enable the DMA RX and TX FIFO requests, and from then on the DMA takes full control of the SPI module.

I only need to do this once because, as I said, I need constant streaming of data. To stop the data stream, I just have to disable the DMA FIFO requests.

I think that the command word in the PUSHR register could be written easily at the end of the beginTransaction() function.

KurtE · Feb 14, 2017

I don't know how well this applies to your case, but have you taken a look at the DmaSpi library? (https://github.com/crteensy/DmaSpi)

MickMad · Feb 15, 2017

KurtE said:
I don't know how well this applies to your case, but have you taken a look at the DmaSpi library? (https://github.com/crteensy/DmaSpi)

Yes, and I had it working too, by making a custom CS class to handle the CS timings, but it was slower than using the normal SPI library, because I had to use 1-word DMA transfers and the overhead imposed by calling the CS object was higher. The DmaSpi library works perfectly when you need a no-delay chip select (on any pin you want) and a continuous transfer (that is, CS held asserted until all the words are transferred); as far as I know, no SPI library implements the delays available in the module.

PaulStoffregen · Feb 17, 2017

MickMad said:
I was wondering if it could be useful to add the following features on the standard Teensy SPI library for anyone to use or if these "advanced" features should just be left out of the public API so that any advanced user can add them on a per-use basis.

The SPI library aims for Arduino compatibility. Extra features and higher performance are always nice, but compatibility is the overriding design goal.

I'm not sure how these sorts of extensions could be done without creating unexpected compatibility issues when one library uses them, and then an unsuspecting Arduino user pairs that library together with other SPI-based libraries. Projects using more than one SPI device are very common, which is why I put so much time into getting Arduino to switch to the transactions API. Any changes to the SPI lib need to be very carefully considered for all cases where people using Arduino can depend upon SPI to *not* burden them with having to worry about subtle compatibility issues on the software side.
