Teensy 4 and SPI async: how to "overlap" DMA access?

Hi there, first of all thanks for the amazing work on Teensy: it is a great platform :)

I am communicating with a peripheral device at an SPI clock speed of around 25 MHz. I am clocking out 192 bits every 10 microseconds. With blocking (synchronous) SPI the microprocessor would be busy with SPI communication for around 80% of the time, which leaves too little time for the rest of the application. So, I use asynchronous SPI, which works great.
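As a rough check of the 80% figure, the frame time and bus load can be worked out from the numbers above. This is only a sketch: the helper names are made up for illustration, and it assumes the clock is exactly 25 MHz with no gaps between bytes or frames, which the real SPI peripheral may not achieve.

```cpp
#include <cstdint>

// Wire time of one frame in nanoseconds (integer arithmetic: bits / clock).
constexpr std::uint64_t frame_time_ns(std::uint64_t bits, std::uint64_t clock_hz) {
    return bits * 1'000'000'000ULL / clock_hz;
}

// Share of each period spent clocking data, in tenths of a percent.
constexpr std::uint64_t bus_load_permille(std::uint64_t frame_ns, std::uint64_t period_ns) {
    return frame_ns * 1000ULL / period_ns;
}

// 192 bits at 25 MHz, one frame every 10 us (100 kHz).
constexpr std::uint64_t kFrameNs = frame_time_ns(192, 25'000'000);
constexpr std::uint64_t kLoadPermille = bus_load_permille(kFrameNs, 10'000);

static_assert(kFrameNs == 7680, "7.68 us on the wire per frame");
static_assert(kLoadPermille == 768, "76.8 % of each 10 us period");
```

So the wire time alone is about 7.7 us per frame, which leaves roughly 2.3 us of slack per period for everything else around the transfer.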

However, between the end of the SPI transfer and the beginning of the EventResponder callback, there is about 1.5 us of "dead time". Where does this come from? Can I reduce it in some way? Even when using double buffers, the SPI transfer function seems to return false around half of the time (probably on `_dma_state == DMAState::active`), which makes it impossible to "overlap" SPI transfers.

I assume the DMA controller used in asynchronous SPI can only handle a single transfer at a time. Is there a way around this?

Relevant parts of the code, which is very much pseudo code:
C++:
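#include <SPI.h>
#include <EventResponder.h>
#include <array>
#include <cstddef>
#include <utility>

EventResponder m_responder;

// Two DMA-capable buffers: one is being filled while the other is consumed.
DMAMEM std::array<std::array<std::byte, ...>, 2> dma_double_buffer;
auto* m_empty_buffer = &dma_double_buffer[0];   // next transfer writes here
auto* m_filled_buffer = &dma_double_buffer[1];  // last completed transfer

Buffer* buffer; // in external RAM

void startup_code() {
    // Runs as soon as the asynchronous transfer completes.
    m_responder.attachImmediate([](EventResponder& er) {
        buffer->push(m_filled_buffer->data());

        // end timing
    });
}

void function_running_at_100khz() {
    // start timing

    // Receive-only transfer; returns false if a previous transfer is still active.
    if (!SPI.transfer(nullptr, m_empty_buffer->data(), m_frame_size_bytes, m_responder)) {
        // transfer failed
    }

    std::swap(m_empty_buffer, m_filled_buffer);
}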

From start to end timing, this takes 10.4 microseconds, which is longer than the 10 microsecond period, hence the "overlap" when calling the function at 100 kHz.
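One possible reading of the "false around half of the time" observation: if a transfer counts as busy for longer than the 10 microsecond call period, every second call arrives while the previous transfer is still active. The host-side model below (plain C++, no Teensy code, all names hypothetical) assumes the busy window equals the measured 10.4 microseconds, which may overstate how long the library actually refuses a new request.

```cpp
#include <cstdint>

struct CallStats {
    std::uint32_t accepted;
    std::uint32_t rejected;
};

// A call is accepted only if the previous transfer's busy window has ended.
// Rejected calls do not start a transfer, so they do not extend the window.
constexpr CallStats simulate_calls(std::uint32_t calls,
                                   std::uint32_t period_ns,
                                   std::uint32_t busy_ns) {
    CallStats stats{0, 0};
    std::uint64_t busy_until = 0;
    for (std::uint32_t i = 0; i < calls; ++i) {
        const std::uint64_t now = static_cast<std::uint64_t>(i) * period_ns;
        if (now >= busy_until) {
            busy_until = now + busy_ns;
            ++stats.accepted;
        } else {
            ++stats.rejected;
        }
    }
    return stats;
}

// 10.4 us busy window against a 10 us period: every second call is rejected.
static_assert(simulate_calls(1000, 10'000, 10'400).rejected == 500, "half rejected");
// A busy window shorter than the period would let every call through.
static_assert(simulate_calls(1000, 10'000, 9'000).rejected == 0, "none rejected");
```

Under this model, getting the busy window below 10 us (or removing the need to restart the transfer at all) is what makes the rejections go away.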
 
You may want to look at the replaceSettingsOnCompletion() method provided by the DMA library. This allows you to set up a new set of values for the TCD (transfer control descriptor) which is automatically loaded when the channel's Major Loop completes. If you also interrupt at that point (interruptAtCompletion()*), and have two descriptors which load each other on completion, then you can have continuous transfers, and process whichever buffer isn't in use while the other one is active servicing SPI requests.

*no, I have no idea why one is "At" and the other is "On"...
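To illustrate the idea of two descriptors that load each other, here is a small host-side model. It is not the DMAChannel API: `Descriptor`, `Channel` and `on_major_loop_complete` are hypothetical stand-ins. The `next` index plays the role of what replaceSettingsOnCompletion() sets up, and the function call plays the role of the completion interrupt.

```cpp
// Hypothetical stand-in for one transfer control descriptor.
struct Descriptor {
    int buffer;  // index of the buffer this descriptor fills
    int next;    // descriptor loaded automatically when this one completes
};

// Hypothetical stand-in for a DMA channel with two chained descriptors.
struct Channel {
    Descriptor desc[2] = {{0, 1}, {1, 0}};  // each one reloads the other
    int active = 0;                          // descriptor currently in use

    // Models major-loop completion: the channel switches to the next
    // descriptor and reports which buffer was just filled.
    constexpr int on_major_loop_complete() {
        const int filled = desc[active].buffer;
        active = desc[active].next;
        return filled;
    }
};

// Checks that the buffer handed to the application is never the one the
// channel is currently filling, and that the buffers strictly alternate.
constexpr bool alternates_without_conflict(int transfers) {
    Channel ch{};
    for (int i = 0; i < transfers; ++i) {
        const int filled = ch.on_major_loop_complete();
        if (filled == ch.desc[ch.active].buffer) return false;
        if (filled != i % 2) return false;
    }
    return true;
}

static_assert(alternates_without_conflict(8), "ping-pong must alternate");
```

The point of the model is that nothing has to restart the transfer from software: the application only reacts to the completion event and works on the buffer that is not active.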
 
Thanks for your input. Do I understand correctly that the implementation should be like this:
  • Reimplement SPI.transfer(tx, rx, count, er&) where:
  • _dmaRX and _dmaTX both alternate between two DMASettings using replaceSettingsOnCompletion()?
I found this code as inspiration. Would that be the right way to go?
 
That sounds about right.

You may get some further inspiration from this (link to Teensy 4.x-specific code - explore from there). It's actually for updating a SPI display where the whole frame buffer can't be sent in one DMA transaction, so in this case not time-critical, but I'm fairly sure the SPI FIFO keeps the flow going during the TCD re-load. There's an option for continuous screen updates, too.

You may find you'll need to look in some detail at the DMA section of the Reference Manual to understand what the DMA library is doing, and check if you need to do some low-level register tweaking to get the exact performance you need.
 
Alternatively: use one large array instead of two separate ones, with a single DMA TCD configured to interrupt at both the halfway point and completion (assuming the transfers are always of a fixed size, i.e. m_frame_size_bytes is const).
Also, if you're only receiving data, the SPI hardware can be configured to not transmit (by using TXMSK in the TCR), so that TDR doesn't need to be fed dummy data. Keep in mind that this will tri-state the SDO pin, though. If the slave device accepts commands, you may want to add a pulldown or pullup so the line doesn't float.
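The single-array idea can be sketched as a mapping from interrupt to the region that is safe to read. This is a host-side illustration only: `DmaEvent`, `Region` and `ready_region` are hypothetical names, and it assumes the large array holds exactly two frames of a fixed size.

```cpp
#include <cstddef>

// Which of the two interrupts fired (hypothetical names).
enum class DmaEvent { Half, Complete };

// A byte range inside the single large buffer.
struct Region {
    std::size_t offset;
    std::size_t length;
};

// Returns the region DMA has just finished writing. The other half is the
// one being filled next, so it must not be read yet.
constexpr Region ready_region(DmaEvent event, std::size_t frame_size_bytes) {
    return event == DmaEvent::Half
               ? Region{0, frame_size_bytes}
               : Region{frame_size_bytes, frame_size_bytes};
}

// 192-bit frames are 24 bytes, so the large buffer would be 48 bytes.
static_assert(ready_region(DmaEvent::Half, 24).offset == 0, "first half ready");
static_assert(ready_region(DmaEvent::Complete, 24).offset == 24, "second half ready");
```

With this layout the descriptor simply wraps around to the start of the array, and the interrupt handler only needs to know which of the two events it is servicing.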
 