T4, SPI, DMA multiple transactions, MISO and MOSI are tristate and one pin

Rezo,

I see you've changed the buffer size to 27. It was 512 in the example. I think it must be a power of 2, and likely it must be 512 (2^9) for the
T4_DMA_SPI_SLAVE class to work as intended.

This is due to the way the DMA works, with auto incrementing address pointers and masking the last N bits (done inside the IMXRT1062 DMA hardware). If it's not that then it's due to the way the circular buffers are defined in the T4_DMA_SPI_SLAVE code. I see some & 0x1ff in there, so you bet that's assuming the buffer is 512 bytes long...

Details on that are in the i.MX RT1060 Processor Reference Manual, Rev. 3, 07/2021, section eDMA.

By using a buffer of only 27 bytes, you likely had the DMA target address going above those 27 bytes, so overwriting whatever program variables reside there. With a crash as result...
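
To illustrate why a mask-based circular buffer wants a power-of-2 length, here is a minimal host-side sketch. It is not taken from the T4_DMA_SPI_SLAVE code: the helper names `is_power_of_two`, `wrap_with_mask` and `wrap_with_modulo` are made up for this example, and the only detail borrowed from the library is the `& 0x1ff` mask mentioned above.

```cpp
#include <cstddef>

// True when n is a power of two (1, 2, 4, ..., 512, ...).
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Wrap an index by masking its low bits, the way address masking works.
// This only equals (index % size) when size is a power of two.
constexpr std::size_t wrap_with_mask(std::size_t index, std::size_t size) {
    return index & (size - 1);
}

// Wrap an index with a modulo. Correct for any non-zero size, but it is
// not what a fixed bit mask such as 0x1ff does.
constexpr std::size_t wrap_with_modulo(std::size_t index, std::size_t size) {
    return index % size;
}
```

With a 512 byte buffer, `wrap_with_mask(513, 512)` gives 1, the same as the modulo. With 27 bytes the mask is `26` (binary 11010), so `wrap_with_mask(5, 27)` gives 0 instead of 5. And a hard-coded `& 0x1ff` leaves index 27 at 27, which is already one past the end of a 27 byte array.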
 
This makes so much sense now that you mentioned it!

I set the buffer size to 32 and now it works as expected!!
Code:
n=0, [>27] <4d <03 <03 <03 <03 <51 <52 <53 <54 <55 <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 est. f_SCK = 9.39 MHz,
n=1, [>27] <52 <53 <54 <55 <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c est. f_SCK = 9.39 MHz,
n=2, [>27] <53 <54 <55 <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d est. f_SCK = 9.82 MHz,
n=3, [>27] <54 <55 <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e est. f_SCK = 9.82 MHz,
n=4, [>27] <55 <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f est. f_SCK = 9.82 MHz,
n=5, [>27] <56 <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f <70 est. f_SCK = 9.82 MHz,
n=6, [>27] <57 <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f <70 <71 est. f_SCK = 9.82 MHz,
n=7, [>27] <58 <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f <70 <71 <72 est. f_SCK = 9.82 MHz,
n=8, [>27] <59 <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f <70 <71 <72 <73 est. f_SCK = 9.82 MHz,
n=9, [>27] <5a <5b <5c <5d <5e <5f <60 <61 <62 <63 <64 <65 <66 <67 <68 <69 <6a <6b <6c <6d <6e <6f <70 <71 <72 <73 <74 est. f_SCK = 9.82 MHz,

Thanks for the support!!

EDIT: I can receive fine on the slave, but if I set the values of the TX buffer in the setup function, the RX buffer data goes mad and the data printed out is erratic.
Also, on the master side, all I get is 0xFF values.

EDIT 2: so just to be clear - the buffer needs to be at least 512 bytes long?
 
@sicco still not having luck with sending back data from the slave.

I’ve set the master to send 32 bytes at a time, and the slave to receive 32 bytes into the buffer

But as soon as the TX buffer on the slave has values set, the whole thing goes mad.
 
@sicco I tried both 32 byte buffer and 512 - did not work well on either.
Here is my Slave side sketch

C++:
#include "T4_DMA_SPI_SLAVE.h"

volatile int n = 0;

#define BYTE_BUF_SIZE 512

#define MOSI_SNIFFER SPI_SLAVE


uint8_t TXbuf[BYTE_BUF_SIZE];
uint8_t RXbuf[BYTE_BUF_SIZE];

int Nbytes = BYTE_BUF_SIZE;

void setup()
{
  Serial.begin(115200);
  while (!Serial);

//  Serial.print(CrashReport);
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  Serial.println("T4_DMA_SPI_SLAVE Example - 4 wire SPI bus MODE0 Slave ");
 
  for (int i=0; i<BYTE_BUF_SIZE;i++){ //Set TXbuf with some static values
    TXbuf[i] = i;
  }

  MOSI_SNIFFER.begin(SPI_MODE0, MSBFIRST, MODE_4WIRE);
  MOSI_SNIFFER.print_pin_use();
  //MOSI_SNIFFER.debug();

  prepare_for_next_transaction();

  Serial.printf("begin done\n");
}

void prepare_for_next_transaction()
{

  MOSI_SNIFFER.prepare_for_slave_transfer(TXbuf, RXbuf, Nbytes);

}

void loop()
{

  if (MOSI_SNIFFER.CS_went_high)
  {
    Serial.printf ("\nn=%d, ", n++);
    Serial.printf ("[>%d ", MOSI_SNIFFER.bytes_already_received);
    for (uint32_t i=0; i<MOSI_SNIFFER.bytes_already_received; i++)
    {
        Serial.printf ("<%02x ", RXbuf[i]);
    }

    Serial.printf ("est. f_SCK = %1.2f MHz, ", MOSI_SNIFFER.f_SCK_estimate());

      
    prepare_for_next_transaction();
  }

}

And here is my Master side sketch
C++:
#include <SPI.h>

// SPI configuration
#define SPI_CLOCK 1000000 // 1 MHz
#define SPI_CS_PIN 10     // Chip Select Pin

EventResponder spiEventResponder; // EventResponder for async transfer
#define BUFFER_SIZE 32
uint8_t txBuffer[BUFFER_SIZE] = {0};       // Data to send
uint8_t rxBuffer[BUFFER_SIZE] = {0};       // Buffer for received data

volatile bool transferInProgress = false; // Tracks transfer status

void setup() {
    Serial.begin(115200);

    // Initialize SPI
    SPI.begin();
    SPI.beginTransaction(SPISettings(SPI_CLOCK, MSBFIRST, SPI_MODE0));

    pinMode(SPI_CS_PIN, OUTPUT);
    digitalWrite(SPI_CS_PIN, HIGH); // Deselect device

    // Configure EventResponder callback
    spiEventResponder.attachImmediate(spiCompleteCallback);

    // Fill txBuffer with initial data
    for (int i = 0; i < BUFFER_SIZE; i++) {
        txBuffer[i] = i;
    }

    // Start the first SPI transfer
    startSPITransfer();
}

void loop() {
    // Main loop can handle other tasks
    delay(100); // Simulate other processing
}

// Start a new SPI transfer
void startSPITransfer() {
    if (!transferInProgress) {
        transferInProgress = true;

        // Select the device
        digitalWrite(SPI_CS_PIN, LOW);

        // Start async SPI transfer
        bool success = SPI.transfer(txBuffer, rxBuffer, sizeof(txBuffer), spiEventResponder);
        if (!success) {
            Serial.println("Failed to start SPI transfer!");
            transferInProgress = false;
            digitalWrite(SPI_CS_PIN, HIGH); // Deselect device on failure
        }
    }
}

// SPI transfer complete callback
void spiCompleteCallback(EventResponder &event) {
    // Deselect the device
    digitalWrite(SPI_CS_PIN, HIGH);

    // Print received data for debugging
    Serial.println("SPI transfer completed!");
    for (size_t i = 0; i < BUFFER_SIZE; i++) {
        Serial.print("Received byte ");
        Serial.print(i);
        Serial.print(": 0x");
        Serial.println(rxBuffer[i], HEX);
    }

    // Modify txBuffer if necessary
    for (int i = 0; i < BUFFER_SIZE; i++) {
        txBuffer[i]++; // Example: Increment data for the next transfer
    }

    // Reset the transferInProgress flag
    transferInProgress = false;

    // Start the next transfer
    startSPITransfer();
}
 
Rezo, you now have a payload of 512 bytes. I think your application needs 27 byte payloads? So why not leave Nbytes at 27?
I’ve not had the opportunity today to replicate your dual Teensy4.x setup and test.
Can you share a photo that shows which pins the 4 SPI wires plus the ground wire are on? This is just to confirm that the master T4 MOSI and MISO link to the slave “MISO” and “MOSI”.
 
I assumed nbytes had to be the same as the buffer size. I will change that and try again.

On both the T4 and T4.1 I am using pins 13 for CLK, pins 11/12 for MISO/MOSI, pin 10 for CS.
I am grounding both together via the ground pin next to pin 0.
 
I tested with nbytes == 27 and get the same odd behavior if I preset any values into the TX buffer
If I don't set anything in the TX buffer I receive the data fine.
 
Asking again for a photo of the wiring, because you may have wired the master SPI pins to exactly the same pins on the slave. That will not work, because it ties two outputs together and two inputs together, instead of connecting outputs to inputs.
 
Sure!
Sorry for the messy wiring but here it is
90fa2084-5e7c-4247-8811-e8a94c133aa9.jpeg

MOSI to MISO
MISO to MOSI

If I swap them around I get nothing on the slave output
 
I managed to reproduce. Sometimes. There's something unexpected happening. It works only if I put just a bit of capacitive or resistive load on the SPI CS pin on the slave T4...

What I think is happening is that due to poor ground plane practices, there's crosstalk from SCK onto the CS line. That makes the SPI slave hardware decide prematurely that CS went high. I tried workarounds playing with hysteresis, SCK pin PAD drive strength and speed settings, but none of that worked.

Bit puzzled about how sensitive this is. Looks like the on board SPI peripheral hardware overrules pin PAD settings...
 
Only a true decent ground plane, or a very short GND wire in between the two Teensies makes it work reliably here on my bench.
Very short means < 5 cm.
Without that, I get spikes on the SPI CS line. Where that line hits the slave SPI CS input pin. Spikes that are the worst when SCK and MOSI toggle at the same time.
In this SPI slave implementation, the SPI CS line is not a GPIO pin controlled/sensed in software but part of the LPSPI peripheral hardware. That hardware senses a rising edge, probably bypassing the IO pin PAD options for hysteresis, bandwidth etc. That may cause more of the trouble we now see.
Similar effects will impact the SPI SCK signal, leading to false extra clock pulses seen by the slave.
So ground plane, or ground wire no longer than a few inches.
 
And were you able to set the TX buffer on the slave and receive it back on the master?
 
Yes, but there are a few catches. Code that I had working for T4 master and T4 slave attached.
(working as long as GND interconnect is short enough)

Other issues were in the way you let the master immediately trigger the next transaction. With less than a microsecond between CS going high and low again, there was not enough time for the master to print out the payloads. So I made the master SPI transactions timer triggered, with a 1 ms timer.

On the slave side, similar issues. Fixed that by adding an auto_restart option so that it's ready for the next 27 bytes transaction immediately, without needing a retrigger from the loop().

Be aware that the 27 byte buffers are not atomic. I trust your application is ok with that (i.e. not passing things like int, double, float etc. in the payload packets).
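
To show what "not atomic" can mean for a multi-byte value, here is a small host-side simulation. It is only an illustration under assumptions: `torn_read` is a made-up helper, and it simply pretends that a writer (for example the DMA) has replaced the first `split` bytes of a 4 byte value while the reader is copying it out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simulate reading a 4-byte value from a shared buffer while it is being
// overwritten: the first `split` bytes already hold the new value, the
// remaining bytes still hold the old one.
std::uint32_t torn_read(std::uint32_t old_value, std::uint32_t new_value,
                        std::size_t split) {
    std::uint8_t old_bytes[4];
    std::uint8_t new_bytes[4];
    std::uint8_t seen[4];
    std::memcpy(old_bytes, &old_value, 4);
    std::memcpy(new_bytes, &new_value, 4);
    for (std::size_t i = 0; i < 4; ++i) {
        seen[i] = (i < split) ? new_bytes[i] : old_bytes[i];
    }
    std::uint32_t result;
    std::memcpy(&result, seen, 4);
    return result;
}
```

For a counter going from 0x0000FFFF to 0x00010000, `torn_read(0x0000FFFF, 0x00010000, 2)` returns a value that is neither of the two. On a little-endian core such as the Teensy 4's Cortex-M7 that value is 0. Single bytes cannot tear this way, which is why byte-only payloads are safe.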

Maybe a better implementation for exchanging 27 byte buffers in the background would be to simply use serial UARTs, say at 1.5 MBaud, in DMA mode: fewer wires, and less susceptible to GND wire issues than SPI. Just stream the payloads both ways, with a coding protocol that makes detection of the first payload byte indisputable.
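
One common way to get an unambiguous start of packet on a byte stream is COBS (Consistent Overhead Byte Stuffing): the payload is re-coded so that it never contains 0x00, and 0x00 is then used only as the frame delimiter. The sketch below is a generic host-side version for illustration. It is an assumption on my part that something like this would fit here, it is not what the attached code does, and `cobs_encode` / `cobs_decode` are names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode `in` so that the result contains no 0x00 byte, then append a
// single 0x00 as the frame delimiter.
std::vector<std::uint8_t> cobs_encode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.push_back(0);             // placeholder for the first code byte
    std::size_t code_pos = 0;     // where the current code byte lives
    std::uint8_t code = 1;        // distance to the next zero (or block end)
    for (std::uint8_t byte : in) {
        if (byte != 0) {
            out.push_back(byte);
            ++code;
        }
        if (byte == 0 || code == 0xFF) {
            out[code_pos] = code; // close the current block
            code_pos = out.size();
            out.push_back(0);     // placeholder for the next code byte
            code = 1;
        }
    }
    out[code_pos] = code;
    out.push_back(0);             // frame delimiter
    return out;
}

// Decode one frame, passed WITHOUT its trailing 0x00 delimiter.
// Returns false if the frame is malformed.
bool cobs_decode(const std::vector<std::uint8_t>& in,
                 std::vector<std::uint8_t>& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t code = in[i];
        if (code == 0 || i + code > in.size()) return false;
        for (std::size_t j = 1; j < code; ++j) {
            if (in[i + j] == 0) return false;  // zeros never appear inside
            out.push_back(in[i + j]);
        }
        i += code;
        // A block shorter than 0xFF stands for data followed by a zero,
        // except for the last block of the frame.
        if (code != 0xFF && i < in.size()) out.push_back(0);
    }
    return true;
}
```

For a 27 byte payload the encoded frame is 29 bytes: one code byte of overhead plus the delimiter. A receiver that has lost sync just waits for the next 0x00 and starts decoding from there.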



1737712477861.png
 

Attachments

  • T4_DMA_SPI_SLAVE_Example_TX.zip
    9 KB
  • T4_SPI_Master_DMA_SLAVE_Tester.zip
    1.8 KB
Tested it out - works with an extra ground in between the two Teensies.
But it was sensitive to being powered from the same source.

Now, I will test with Master being a custom Teensy Dev Board v5 and the Slave a T4.1

I’ll also go 32 byte payload to round things up

Thanks!
 
Done more testing, but unfortunately I can't get it to work too well on my bench using SPI instance on both devices.
The Slave (T4.1) loses sync with the master (DB5) after a while.

I moved the slave to use SPI1, and it's MUCH more stable on the T4.1 - will let it run for a while and see if it loses sync at any point.
 
The slave likely gets 'out of sync' easily now because the SPI CS input gets more or less ignored. The code I shared just does 27 bytes for a transaction, but it no longer looks at the CS rising edge. If an SPI clock pulse occasionally fails to make it, or if ringing produces false double edges, then the next 27 bytes may end up shifted up or down in the buffer...
As mentioned last week, just using asynchronous serial over the LPUART, with a packet exchange protocol running in the background, DMA based for the receivers, could make your life easier.
 
I’ve searched the forums for async UART but came up with nothing pretty much..
I need to send these 27 byte frames at a 1 kHz rate

If there is something you know of, I don’t mind having a look/go at it
 
I added the CS_went_high check alongside the existing flag_transaction_completed.
It's running stable on SPI1 with 12 cm jumper wires

C++:
if (MOSI_SNIFFER.flag_transaction_completed && MOSI_SNIFFER.CS_went_high)
  {
//    Serial.printf ("n=%d, ", n++);
    /*
    Serial.printf ("\nRECEIVED ");
    for (int i=0; i<Nbytes; i++)
    {
        Serial.printf ("%02x ", RXbuf[i]);
    }
    Serial.printf ("\nSENT     ");
    for (int i=0; i<Nbytes; i++)
    {
        Serial.printf ("%02x ", TXbuf[i]);
    }
    Serial.printf ("\n");
    */
//    delay(1000);
 
    MOSI_SNIFFER.flag_transaction_completed = 0;
   // Change TX buffer here
    
    //prepare_for_next_transaction();
  }

Regardless, my previous comment is still relevant - If an async UART example exists, I'd love to try it out
 
@Rezo: Here's a chopped-up example of using Serial comms between two Teensy processors (e.g. I use Serial4). These are taken from my TeensyMIDIPolySynth, which is a 3-voice, 12-poly, multi-waveform, multi-filter, multi-envelope, multi-modulation, multi-effect, multi-capability synthesizer.

The PRIMARY sketch runs on a Teensy 4.1 which is managing the 800x480 TFT touchscreen display (buttons & sliders controlling all settings - these have been cut out for clarity), as well as the standard Serial1 MIDI interface, the USBhost interface (USBhostMIDI, etc.), the USBmidi from the standard USB interface, and the shared EA_Serial interface (EA came from the fact that I originally did everything on a single T4.x, but decided to split out the audio to a second T4.x & called it the "External Audio" processor). The SECONDARY sketch runs on a Teensy 4.0 which is managing the Teensy Audio Adapter (most of that functionality has merely been commented out).

Things to note:
- as is, these sketches will compile, but they don't really exercise the shared serial comm channel
- the shared interface runs at 500000bps & makes use of additional memory for serial buffers
- the list of EA_INDEX_XXX commands in each T4 has to match in order to process incoming msgs correctly at each end
- EA command processing is done with fault-tolerant CASE statements
- the chance of encountering the EA command header (in my case, "EA:") in your regular data needs to be very low, else it may false-trigger on a header when it is actually data
- my apologies for the large amount of code that is commented out in the SECONDARY sketch . . . I thought it might be better to be able to actually see what the resulting processing is when correctly parsing an incoming EA command
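
As a rough illustration of the header detection and the false-trigger risk noted in the list above, here is a minimal host-side scanner for the 3 byte header "EA:". It is a sketch written for this post, not code from the TeensyMIDIPolySynth sketches, and `HeaderScanner` and `count_headers` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Scans a byte stream for the 3-byte header "EA:".
class HeaderScanner {
public:
    // Feed one received byte. Returns true when this byte completes
    // the header.
    bool feed(std::uint8_t byte) {
        static constexpr std::uint8_t kHeader[3] = {'E', 'A', ':'};
        if (byte == kHeader[matched_]) {
            if (++matched_ == 3) {
                matched_ = 0;
                return true;
            }
            return false;
        }
        // Mismatch: this byte may itself be the start of a new header.
        matched_ = (byte == kHeader[0]) ? 1 : 0;
        return false;
    }

private:
    std::size_t matched_ = 0;  // number of header bytes matched so far
};

// Count how many times the header is seen in a buffer.
std::size_t count_headers(const std::uint8_t* data, std::size_t len) {
    HeaderScanner scanner;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (scanner.feed(data[i])) ++hits;
    }
    return hits;
}
```

Feeding it `x E E A : 1` gives one hit. Feeding it a real header followed by data that happens to contain the bytes `E A :` gives two hits, and the second one is exactly the false trigger described above.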

Hope this helps & please feel free to ask any questions !!

[ SKETCHES IN THEIR OWN MESSAGE TO FOLLOW - COMBINED WAS TOO LONG !! ]

Mark J Culross
KD5RXT
 
Here's another example for Serial (LPUART), 1 wire, DMA receive and transmit, background data buffers exchanging.

Uses just 1 signaling wire in between two Teensy4's. So GND and this one LPUART TX pin is all that's needed. Can use either Serial1, Serial2, Serial5 or Serial7.
Packet size set as N 32 bit words, with N in this example hard coded to 8 (PZD_DWORDS_BUFFER_SIZE). 1.5 MBaud, ~11 UART bit times / payload byte plus some overhead.
Once every millisecond, a timer interrupt triggers the master Teensy4 to send a packet to a slave Teensy4. The slave replies with its data packet. All interrupt driven, DMA secured, non-blocking, atomic, and independent from what's happening in loop().
It's one project file for both the master and the slave. In this example an i/o pin is used to define role (master or slave).
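
As a back-of-the-envelope check that this fits a 1 kHz exchange rate, the numbers above can be worked out as below. This is only an estimate: `packet_time_us` is a helper made up for this example, and it leaves out the "some overhead" mentioned above as well as any turnaround time between the master packet and the slave reply.

```cpp
// Time in microseconds to send `payload_bytes` at `baud`, assuming
// `bits_per_byte` UART bit times per payload byte.
constexpr double packet_time_us(unsigned payload_bytes, double bits_per_byte,
                                double baud) {
    return payload_bytes * bits_per_byte / baud * 1e6;
}

// 8 words of 32 bits = 32 payload bytes, ~11 bit times each, 1.5 MBaud.
constexpr double kOnePacketUs = packet_time_us(8 * 4, 11.0, 1.5e6);

// Master packet followed by the slave reply on the same wire.
constexpr double kExchangeUs = 2.0 * kOnePacketUs;
```

That gives roughly 235 µs per packet and roughly 470 µs for the full exchange, so a bit under half of each 1 ms period is spent on the wire.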
 

Attachments

  • T4_MemoryBuffersExchange_DMA_Serial_Master-250130a.zip
    15.2 KB