T4, SPI, DMA multiple transactions, MISO and MOSI are tristate and one pin

sicco

My Teensy 4.1 project has several SPI sensor devices tied to a single Teensy SPI port. Each sensor has its output (SDO) and input (SDI) on the same pin, so there is one bidirectional SDIO line for reading data from and writing to the sensors.

Every 10 ms, a timer interrupt triggers a successive readout of 4 of those sensors. A readout means CS low, send one byte, receive 6 bytes back, and CS high again (it's a 3-axis accelerometer). This is repeated 4 times in a row for the 4 sensors, which each have their own CS pin but share SCK and SDIO. The SCK frequency cannot be very high because of long wires.

My problem is that this sensor interrogation happens entirely in a timer interrupt service routine, so I am blocking other tasks while the CPU waits for each of the 2*4 = 8 SPI transactions to finish. 10% of the time, the CPU is waiting for SPI transactions to complete.

What I want instead is just one trigger that says go and execute this series of SPI transactions as listed below, say:
1. claim SPI port, CS1 low
2. SPI write 1 command byte 'read xyz words'
3. SPI read 6 bytes
4. CS1 high, CS2 low
5. SPI write 1 command byte 'read xyz words'
6. SPI read 6 bytes
7. CS2 high, CS3 low
8. SPI write 1 command byte 'read xyz words'
9. SPI read 6 bytes
10. CS3 high, CS4 low
11. SPI write 1 command byte 'read xyz words'
12. SPI read 6 bytes
13. CS4 high, release claim on SPI port
And all of that without ever polling (in a blocking way) for peripheral (or DMA) status bits that indicate that the LPSPI port is busy clocking bits in or out.

What I want to happen under the hood is that each of these 8 SPI transactions runs in sequence, automatically, without me firing them one by one.
That is, I prepare a table with pointers to 8 buffers in RAM, indices for the CS pin numbers, the number of bytes per transaction, and a read/write direction flag. Then, after one trigger call, the SPI transactions (all 13 steps) just happen in a non-blocking way.
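
The table described above could be laid out roughly as follows. This is only a sketch of the idea in plain C++: the struct, its field names, the `buildTable` helper and the buffer layout are all made up for illustration and are not part of any Teensy library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor for one SPI transfer (names are illustrative only).
struct SpiTransaction {
    uint8_t* buffer;        // RAM buffer to send from or receive into
    uint16_t length;        // number of bytes to clock
    uint8_t  csPin;         // chip-select pin of the addressed sensor
    bool     isRead;        // false = write (drive SDIO), true = read (release SDIO)
    bool     csLowAtStart;  // pull CS low before this transfer
    bool     csHighAtEnd;   // release CS after this transfer
};

constexpr std::size_t kSensors = 4;  // four accelerometers, as in the post
constexpr std::size_t kReadLen = 6;  // 6 data bytes per readout

struct SensorBuffers {
    uint8_t command;         // the 'read xyz words' command byte
    uint8_t data[kReadLen];  // receives the x, y, z words
};

// Build the 8-entry table: per sensor a 1-byte write followed by a 6-byte
// read, with CS held low across both transfers.
inline std::array<SpiTransaction, 2 * kSensors>
buildTable(std::array<SensorBuffers, kSensors>& bufs,
           const std::array<uint8_t, kSensors>& csPins) {
    std::array<SpiTransaction, 2 * kSensors> table{};
    for (std::size_t i = 0; i < kSensors; ++i) {
        assert(2 * i + 1 < table.size());
        table[2 * i]     = {&bufs[i].command, 1, csPins[i], false, true, false};
        table[2 * i + 1] = {bufs[i].data, kReadLen, csPins[i], true, false, true};
    }
    return table;
}
```

A driver would then walk this table once per 10 ms trigger; the part that still has to be written is whatever advances from one entry to the next without polling.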

Do I need to rewrite a new low level SPI driver myself, or can I reuse something precooked from a library?
I looked in the current T4 SPI library sources. It seems to be doing DMA by default (does it indeed?) and I think it's interrupt/event based, but I cannot find anything yet on modes with bidirectional SDI/SDO on one wire, and there's no clue on how to make it execute a series of transactions to multiple devices with multiple CS pins.

Anyone any ideas, hints?
 
The only SPI code I have spent time with is: ...\hardware\teensy\avr\libraries\XPT2046_Touchscreen\XPT2046_Touchscreen.cpp

It is blocking, but it does do a stepwise SPI transfer in: void XPT2046_Touchscreen::update()

It might show the command sequence needed to complete the task.
 

Before QSPI stood for Quad SPI, it stood for Queued SPI. Most SPI modules, including the LPSPI in the T4.1, have 16-element command/data queues that you can set up, then start, poll for "done" (or get an interrupt), extract the data, restart, etc. You can also have the queue repeat automatically. Command fields include the CS selection, with optional delays between commands that use different CS. The CS pins have to be the ones associated with the SPI module, as opposed to the way Arduino lets you assign any pin as CS. I don't think you can change the clock rate between transactions, but I think you can change mode, CS polarity, and maybe other things. Since you have control of 4 CS pins, you can de-mux on your own to generate up to 16 independent CS. If you look at the library source SPI.cpp, you will see some use of the command/data registers, but not in a way that does what you're asking. You'll have to get into the manual and do it yourself. I've done it in the past on 683xx and Coldfire, which are not as complex, but write back if you try it and I'll help if I can.
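
The de-mux idea can be sketched as a plain binary encoding: a device index 0..15 is put on the 4 select lines and an external 4-to-16 decoder produces the individual CS signals. The function name and the bit order below are assumptions for illustration. Note also that unless the decoder has a separate enable input, one of the 16 codes would probably have to be reserved for "nothing selected".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Levels to drive on the 4 select lines for a given device index.
// Assumption: bit 0 of the index goes to select line 0, bit 3 to line 3.
inline std::array<bool, 4> selectLineLevels(uint8_t deviceIndex) {
    assert(deviceIndex < 16);  // only 16 codes fit on 4 lines
    std::array<bool, 4> levels{};
    for (int bit = 0; bit < 4; ++bit) {
        levels[bit] = ((deviceIndex >> bit) & 1u) != 0;
    }
    return levels;
}
```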
 

Thank you Joe. Now I'm also curious about how the default SD card drivers operate. Say I write a block of data to the Teensy 4.1 SD card, appending something like a 512-byte block write to an exFAT SdFat file: is that also in 'blocking' mode? Or is that task pushed to the background, so that other tasks (that do not yet need access to the same card) can already proceed? I think this also relates to the thread 'what's the fastest way to write to SD card' that's active today.
 

Yes, I just posted to that thread again with more info. I'm really only learning about SD now, so if I'm wrong about this, I hope others will jump in with more definitive answers. The comments in the SdFat TeensySdioLogger example explain that SD cards have a built-in 512-byte FIFO, so the SdFat driver does all transfers in 512-byte chunks. A 512-byte call to file.write() takes about 5 us, and I don't know whether that covers the entire transfer, or just the setup time, with the call returning while the transfer occurs in the background. The important thing to understand about SD is that after the write, the SD will sometimes do wear-leveling or something else, and the card can be unavailable for another write for ~40 ms (this could differ depending on the type of card and interface). Based on what I've learned so far, to minimize blocking time on SD writes, you should use file.isBusy() as shown in TeensySdioLogger. If you do other stuff while waiting for file.isBusy() to be false, file.write(buf,512) will always return in about 5 us. That example also shows how to use SdFat's RingBuf class to buffer data for logging during the periods when file.isBusy() is true.
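
To illustrate the pattern (buffer while the card is busy, write one 512-byte chunk only when it is not), here is a host-side model in plain C++. It does not use SdFat at all: `MockFile` and `Logger` are hypothetical stand-ins, only the names `isBusy()` and `write(buf, 512)` are taken from the post, and the "busy for three polls" behaviour is invented to make the model testable. For the real calls, see the TeensySdioLogger example itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSector = 512;  // SdFat transfers in 512-byte chunks

// Mock of an SD file object (hypothetical): after each write, the card
// reports busy for the next three isBusy() polls.
struct MockFile {
    int busyPolls = 0;
    std::size_t written = 0;
    bool isBusy() {
        if (busyPolls > 0) { --busyPolls; return true; }
        return false;
    }
    std::size_t write(const uint8_t* /*buf*/, std::size_t n) {
        written += n;
        busyPolls = 3;
        return n;
    }
};

// Collects log data and flushes one sector at a time when the card is idle.
struct Logger {
    std::vector<uint8_t> pending;  // simple stand-in for a ring buffer

    void log(const uint8_t* p, std::size_t n) {
        pending.insert(pending.end(), p, p + n);
    }

    // Call from the main loop. Returns true if a sector was written,
    // false if there is too little data or the card is still busy.
    bool service(MockFile& f) {
        if (pending.size() < kSector || f.isBusy()) return false;
        std::size_t n = f.write(pending.data(), kSector);
        assert(n == kSector);
        (void)n;
        pending.erase(pending.begin(), pending.begin() + kSector);
        return true;
    }
};
```

The point of the design is that `service()` never waits: when the card is busy it returns immediately and the data simply stays in the buffer until a later call.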
 

I got it working, starting from the Teensy library SPI.cpp version. First I stripped off everything that's not Teensy 4.1 related, then removed all 'is transfer ready' polling code where the code was waiting for an LPSPI status bit. I also stripped out everything related to interrupts and LPSPI, but kept the DMA ISR. As I still wanted to keep other 'conventional' SPI library use (like for the SD card, SdFat, etc.), I ended up making a new class that I called T4_DMA_SPI. I made every SPI transaction DMA based, even when it's just a 1-byte transfer. I stripped off the SPI CS parts because I have far more than 3 SPI devices attached to the same SPI port. Also, I cannot afford CS going high in the middle of a bidirectional transaction: over the shared MISO/MOSI pin the exchange first writes and then reads, which is really two transactions, and CS must not go high in between. So this was not so easy.
My new T4_DMA_SPI class has a FIFO table with up to 128 SPI transactions that can be scheduled in advance and will be executed automatically after one 'go' command, without polling. A transaction table entry has byte pointers for the source and destination buffers, the number of bytes, the CS pin number, and some extra mode control bits that specify if and when CS goes low and high, and how to toggle between output and input mode for the bidirectional SDIO line (pin MOSI = pin MISO).
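
For readers who want the gist without opening the attachments, the scheduling idea can be modelled on a host in plain C++ as below. This is not the attached T4_DMA_SPI code: the class, method and flag names are hypothetical, and the hardware access is reduced to a counter. It only shows how a completion callback can chain the queued entries so that nothing polls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mode bits for one table entry.
constexpr uint8_t kCsLowAtStart  = 0x01;  // pull CS low before the transfer
constexpr uint8_t kCsHighAtEnd   = 0x02;  // release CS after the transfer
constexpr uint8_t kReadDirection = 0x04;  // switch SDIO to input

struct Transfer {
    const uint8_t* src;  // source buffer (writes)
    uint8_t* dst;        // destination buffer (reads)
    uint16_t nBytes;
    int8_t   csPin;
    uint8_t  mode;       // combination of the bits above
};

class TransferQueue {
public:
    static constexpr std::size_t kCapacity = 128;

    bool schedule(const Transfer& t) {
        if (running_ || count_ == kCapacity) return false;
        items_[count_++] = t;
        return true;
    }

    // The single 'go': starts the first entry and returns immediately.
    bool go() {
        if (running_ || count_ == 0) return false;
        next_ = 0;
        running_ = true;
        startNext();
        return true;
    }

    // On hardware this would be called from the DMA-complete ISR.
    void onTransferComplete() {
        assert(running_);
        if (next_ < count_) {
            startNext();
        } else {
            running_ = false;  // queue drained, table can be refilled
            count_ = 0;
        }
    }

    bool busy() const { return running_; }
    std::size_t started() const { return started_; }

private:
    // On hardware: set CS and SDIO direction per items_[next_].mode, arm DMA.
    void startNext() { ++started_; ++next_; }

    Transfer items_[kCapacity] = {};
    std::size_t count_ = 0, next_ = 0, started_ = 0;
    bool running_ = false;  // would need to be volatile when set from an ISR
};
```

The main loop only needs to check `busy()` (or be notified) before reading the destination buffers; all CPU time between `go()` and the last completion is free for other work.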

So now I can do the SPI IO that was eating up about 20% of the total CPU time in the background, and I can use that 20% for real computational tasks, or save battery, because polling for SPI ready 20% of the time is not smart, I think. Plus, since the trigger to start the SPI IO was timer ISR based, that timer ISR at its priority will no longer block my other, lower-priority ISRs for many milliseconds.

Attachments: T4_DMA_SPI.cpp, T4_DMA_SPI.h
 
Wow, good work. So, this new class can coexist with the standard SPI library?

Yes, it co-exists with the default SPI drivers / classes.
A better version is attached, now with an example sketch that reads out, every 10 ms, seven ST MEMS 3D magnetometers, one ST 3D accelerometer and an MPS magnetic encoder IC, all on the same SPI bus, all using just one SDIO wire for MISO and MOSI (aka SDO and SDI).
Attachment: Example_T4_DMA_SPI-230312a.zip
 
It wasn't perfect yet. It probably still isn't, but as it is better than last week's version, here's an update of my T4_DMA_SPI code, with example code for various ST chips and an MPS SPI chip being read out every 10 ms in a non-blocking fashion.

It now uses pull-ups on MISO-function pins that might otherwise end up floating.

This version also lets you toggle the MOSI and MISO lines. You do so by giving the CS pin number a negative value. I needed that functionality after I found out the hard way that some of the ST MEMS family chips, such as the IIS2DH, do not have the option to disable I2C mode. If other users of the shared SDIO line on a shared SPI bus pull down the SDIO line while SCK is high, these ST MEMS ICs interpret that as an I2C start condition; after that the behaviour is unpredictable and eventually destructive for any register settings in that ST MEMS chip. The MPS MAQ473 magnetic encoder therefore cannot share the SDIO pin with the IIS2DH, because as soon as CS goes low it starts driving its MISO to either zero or one, depending on what it is reading and outputting. The only workaround then is to find another pin for SDI(O). That other pin can now be the SDO line.
 

Attachments

  • Example_T4_DMA_SPI-230317a.zip
    23.1 KB · Views: 27