Multiple SPI at the same time

thegranes

Hi all,

I have 3 separate slave devices that I'm intending to use with the 3 SPI ports on the Teensy 4.0. Speed is very important here: I'm looking to send data to each slave device at the same time. I've looked into the TeensyThreads.h library and have tested it a bit, but haven't got it to work. Is it possible to have three separate SPI ports transfer data at once? Any help/advice would be greatly appreciated!
 
The three SPI modules operate independently of each other and separate from the main program once a transaction has been started. It's possible to do something like this:

digitalWrite(CS_pin, HIGH);
digitalWrite(CS1_pin, HIGH);
digitalWrite(CS2_pin, HIGH);
data_in = SPI.transfer(data_out);
data1_in = SPI1.transfer(data1_out);
data2_in = SPI2.transfer(data2_out);
digitalWrite(CS_pin, LOW);
digitalWrite(CS1_pin, LOW);
digitalWrite(CS2_pin, LOW);

The transfers will overlap and be simultaneous within a couple of instruction cycles. No need for threads. Either use interrupts or a timer to send data at the desired intervals.

EDIT: My apologies, this is still a sequential send. If anyone knows how to bypass the transfer() "waiting until finished", please chime in. I do this sort of thing all the time with other MCUs. It probably requires writing directly to the SPI registers.
 
The standard SPI functions block for the time taken to send the data since they are also reading the reply.

The actual hardware has a buffer, so if the only time-critical situations are fairly short writes, you could write the data into that buffer directly, then return and let the hardware handle the transfer. Looking in AppData\Local\Arduino15\packages\teensy\hardware\avr\1.59.0\libraries\SPI\SPI.cpp, the function void SPIClass::transfer() (around line 1686 for Teensy 4) appears to load data into the buffer and then has a section commented as
// now lets wait for all of the read bytes to be returned...

If you were to create a transfer_WriteOnly function that was identical to the normal transfer function but without the read half, I suspect it would return a lot faster. You would also have to be careful to release the chip select lines at the correct time. For example, if the transfers are all the same size, start the first two without waiting for the read, do the third one waiting for the read, and then release all three selects once it has finished.

That file also contains some DMA-based asynchronous transfer functions. I don't know if there is any documentation for them, but I suspect that if you can get them to work they would do what you need. They would also let you perform multiple reads in parallel.
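To put rough numbers on the write-only idea, here is a back-of-the-envelope model in plain C++ (host-side, not Teensy code). It assumes 8 clock periods per byte and ignores CPU time, inter-byte gaps and chip-select setup; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rough model only: one SPI byte takes 8 clock periods on the wire.
// CPU time to load the FIFO and chip-select overhead are ignored.
uint64_t byteTimeNs(uint32_t spiClockHz) {
  assert(spiClockHz > 0);
  return 8ULL * 1000000000ULL / spiClockHz;
}

// Blocking transfer() on each bus in turn: the buses run one after another.
uint64_t sequentialNs(size_t bytesPerBus, unsigned buses, uint32_t spiClockHz) {
  return static_cast<uint64_t>(buses) * bytesPerBus * byteTimeNs(spiClockHz);
}

// Write-only start on the first buses, blocking wait on the last one:
// the buses shift out in parallel, so the total is roughly one bus's time,
// provided each message fits in the hardware TX FIFO.
uint64_t overlappedNs(size_t bytesPerBus, uint32_t spiClockHz) {
  return static_cast<uint64_t>(bytesPerBus) * byteTimeNs(spiClockHz);
}
```

For example, 4 bytes to each of three devices at 1 MHz takes about 96 µs when the buses run one after another, versus about 32 µs when they overlap. If a message is longer than the FIFO, the write-only call still has to wait for the excess bytes, so the overlap shrinks.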
 
A couple of different ways:
For example, several of our display drivers have code built into them to do asynchronous updates, like my ili9341_t3n library as well as others for the ST7735/ST7789 and the ILI9488_t3n library. These libraries use the DMAChannel class; the DMAChannel.h/.cpp files are in the Teensy core directory.

In addition, you can try the asynchronous support we added to the SPI library, in particular this call:
C++:
// Asynch support (DMA), a member of SPIClass in SPI.h
#ifdef SPI_HAS_TRANSFER_ASYNC
    bool transfer(const void *txBuffer, void *rxBuffer, size_t count,  EventResponderRef  event_responder);
#endif

Sorry, there is not a lot of good documentation on EventResponder and the like. While developing some of this, I wrote a few test programs that show different ways to use it:
C++:
#include <SPI.h>
#include <EventResponder.h>
#define CS_PIN 10
volatile bool event_happened = false;

EventResponder event;
static const uint8_t buffer[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

void asyncEventResponder(EventResponderRef event_responder)
{
  digitalWriteFast(CS_PIN, HIGH);
  event_happened = true;
  //Serial.println("Event happened");
}
void setup() {
  pinMode(CS_PIN, OUTPUT);
  digitalWriteFast(CS_PIN, HIGH);
  while (!Serial && millis() < 4000) ;  // wait for Serial port
  Serial.begin(115200);
  SPI.begin();
  Serial.println("SPI Test program");
  Serial1.begin(2000000);
  Serial2.begin(2000000);
  Serial3.begin(2000000);
  extern const uint8_t _serialEvent_default;
  extern const uint8_t _serialEvent1_default;
  extern const uint8_t _serialEvent2_default;
  extern const uint8_t _serialEvent3_default;
  Serial.printf("Default serialEvent? %d %d %d %d\n", _serialEvent_default,
                _serialEvent1_default, _serialEvent2_default, _serialEvent3_default);
#if defined(__IMXRT1062__)
  Serial4.begin(2000000);
  Serial5.begin(2000000);
  Serial6.begin(2000000);
  Serial7.begin(2000000);
  //Serial8.begin(2000000);
  extern const uint8_t _serialEvent4_default;
  extern const uint8_t _serialEvent5_default;
  extern const uint8_t _serialEvent6_default;
  extern const uint8_t _serialEvent7_default;
  //  extern const uint8_t _serialEvent8_default;
  Serial.printf("    %d %d %d %d\n", _serialEvent4_default,
                _serialEvent5_default, _serialEvent6_default, _serialEvent7_default);
#endif
}

void TimeYieldCalls(const char *sz) {
  yield();
  Serial.print(sz); Serial.flush();
  elapsedMicros em = 0;
  for (uint32_t i = 0; i < 1000; i++) yield();
  uint32_t elapsed = em;
  Serial.print(": ");
  Serial.println(elapsed, DEC);
  Serial.flush();
}

void loop() {
  while (Serial.read() != -1) ; // Make sure queue is empty.
  Serial.println("Press any key to run test");
  while (!Serial.available()) ; // will loop until it receives something
  while (Serial.read() != -1) ; // loop until queue is empty

  Serial.printf("start test yield_active_check_flags %x\n", yield_active_check_flags);
  Serial.printf("  systick ISR: %x\n", (uint32_t) _VectorsRam[15]);
  TimeYieldCalls("Start");

  // First try with immediate call.
  event.attachImmediate(&asyncEventResponder);
  Serial.printf("Test Immediate: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  event_happened = false;  // reset the flag for this test
  digitalWriteFast(CS_PIN, LOW);
  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) ;
  SPI.endTransaction();
  TimeYieldCalls("After Immediate");

  // Use yield .
  event.detach();
  event.attach(&asyncEventResponder);
  Serial.printf("Test yield: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  event_happened = false;  // without this reset the wait below returns immediately
  digitalWriteFast(CS_PIN, LOW);
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) yield();  // yield-mode responders only run from yield()
  TimeYieldCalls("After yield");

  // Use Interrupt .
  event.detach();
  event.attachInterrupt(&asyncEventResponder);
  Serial.printf("Test Interrupt: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  event_happened = false;  // reset the flag for this test
  digitalWriteFast(CS_PIN, LOW);
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) ;
  TimeYieldCalls("After Interrupt");
  Serial2.write(buffer, sizeof(buffer));
  delay(5000);

}
void XserialEvent1() {
  int ch;
  while ((ch = Serial1.read()) != -1) Serial.write(ch);

}
void XserialEvent() {
  Serial.write(Serial.read());
}
void serialEventUSB1() {
  while (SerialUSB1.available())
    Serial.write(SerialUSB1.read());
}

void serialEventUSB2() {
  while (SerialUSB2.available())
    Serial.write(SerialUSB2.read());
}

Here is another one that runs two SPI transfers (on SPI and SPI1) at the same time:
C++:
#include <SPI.h>

EventResponder event;
EventResponder event1;

uint8_t buffer0[100];
uint8_t buffer1[100];
volatile bool spi_active = false;
volatile bool spi1_active = false;

void Event_SPI0_Responder(EventResponderRef event_responder) {
  digitalWriteFast(2, HIGH);
  SPI.endTransaction();
  Serial.println("SPI0 ended");
  spi_active = false;
}
void Event_SPI1_Responder(EventResponderRef event_responder) {
  digitalWriteFast(3, HIGH);
  SPI1.endTransaction();
  Serial.println("SPI1 ended");
  spi1_active = false;
}

void setup() {
  while (!Serial && (millis() < 2000)) ;
  Serial.println("Test SPI DMA on SPI and SPI1");

  pinMode(2, OUTPUT);
  digitalWriteFast(2, HIGH);
  SPI.begin();
  event.attachImmediate(&Event_SPI0_Responder);
  for (int i = 0; i < sizeof(buffer0); i++) buffer0[i] = i;

  pinMode(3, OUTPUT);
  digitalWriteFast(3, HIGH);
  SPI1.begin();
  event1.attachImmediate(&Event_SPI1_Responder);
  for (int i = 0; i < sizeof(buffer1); i++) buffer1[i] = sizeof(buffer1) - i;
}

void loop() {
  Serial.println("Start Two SPI transfers");
  spi_active = true;
  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(2, LOW);
  SPI.transfer(buffer0, nullptr, sizeof(buffer0), event);

  spi1_active = true;
  SPI1.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(3, LOW);
  SPI1.transfer(buffer1, nullptr, sizeof(buffer1), event1);

  while (spi_active || spi1_active) ; // wait until both are done
  Serial.println("Both done");
  delay(1000);  // wait a second

}
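With three buses, one flag per bus works, but a single counter of outstanding transfers scales more cleanly. The class below is a hypothetical helper, not part of the Teensy SPI library, written as portable C++: loop() would call started() just before each SPIx.transfer(..., eventN), and each Event_SPIn_Responder would call finished().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts asynchronous transfers that are still running,
// generalising the spi_active/spi1_active flags to any number of buses.
class TransferTracker {
 public:
  // Call once per bus right before starting its asynchronous transfer.
  void started() { pending_.fetch_add(1); }

  // Call from each bus's completion callback (e.g. an EventResponder handler).
  void finished() {
    int before = pending_.fetch_sub(1);
    assert(before > 0);  // finished() without a matching started()
    (void)before;        // silence the unused warning when NDEBUG is set
  }

  // True once every started transfer has reported completion.
  bool allDone() const { return pending_.load() == 0; }

 private:
  std::atomic<int> pending_{0};
};
```

The wait in loop() then becomes `while (!tracker.allDone()) ;`, regardless of how many buses are in flight.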

Note: I probably should update these test sketches to align the buffers to 32-byte boundaries, especially if the memory you want to use is in DMAMEM (either directly or via malloc).
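One way to follow that note, sketched in portable C++ so it compiles on a PC: pad the buffer to a whole number of 32-byte lines and align its start, either statically with alignas or on the heap with std::aligned_alloc. The helper names are hypothetical; on a Teensy the static array would presumably also carry the DMAMEM attribute, which is left out here so the code builds on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kCacheLine = 32;  // Cortex-M7 data cache line size

// Round a byte count up to a whole number of cache lines.
constexpr std::size_t roundUpToLine(std::size_t n) {
  return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}

// Starts on a line boundary and is padded to whole lines,
// so no other variable can share its cache lines.
alignas(kCacheLine) static uint8_t txBuffer[roundUpToLine(100)];

// Heap version: std::aligned_alloc requires the size to be a multiple of
// the alignment, which roundUpToLine guarantees. Release with std::free().
uint8_t* allocDmaBuffer(std::size_t n) {
  return static_cast<uint8_t*>(std::aligned_alloc(kCacheLine, roundUpToLine(n)));
}

bool isLineAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

A 100-byte request becomes 128 bytes (four full lines), which costs a little memory but keeps cache maintenance on the buffer from touching anything else.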

Also, when you do DMA into or out of memory areas that use the hardware data cache, the data stored in physical memory may not match what the CPU sees through the cache, so if you write your own DMA code you may need to flush and/or invalidate the cache to keep them in sync.
The SPI transfer code already does this:
Code:
// lets clear cache before we update sizes...
    if ((uint32_t)buf >= 0x20200000u)  arm_dcache_flush((uint8_t *)buf, count);
    if ((uint32_t)retbuf >= 0x20200000u)  arm_dcache_delete(retbuf, count);
There is a potential issue with the retbuf line, and we probably should change it to: arm_dcache_flush_delete(...)
Why? These dcache functions work on blocks of 32 bytes, and the delete function simply throws away all current cached values in each 32-byte block it touches. So if other variables in those blocks were recently updated, those changes could easily be lost, for example if the memory came from malloc()...
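The arithmetic behind that warning can be sketched in host-side C++: given a buffer address and length, compute which 32-byte lines a line-granular cache operation would act on, and whether the buffer only partly covers the first or last line. The function names are hypothetical, and the address in the example is just a value in the range the SPI code treats as cached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uintptr_t kLine = 32;  // dcache maintenance works on 32-byte lines

// First line boundary and one-past-last line boundary covering [addr, addr + count).
struct LineSpan {
  uintptr_t first;
  uintptr_t end;
};

LineSpan linesTouched(uintptr_t addr, size_t count) {
  assert(count > 0);
  uintptr_t first = addr & ~(kLine - 1);
  uintptr_t end = (addr + count + kLine - 1) & ~(kLine - 1);
  return {first, end};
}

// True when invalidating (deleting) those lines could also discard bytes that
// lie outside the buffer, i.e. the buffer only partially covers a line.
bool deleteMayClobberNeighbours(uintptr_t addr, size_t count) {
  LineSpan s = linesTouched(addr, count);
  return s.first != addr || s.end != addr + count;
}
```

For instance, a 40-byte buffer at 0x20200010 spans the lines at 0x20200000 and 0x20200020, so 24 bytes belonging to neighbouring variables sit in the same two lines; a flush-then-invalidate writes those neighbours back first instead of discarding them.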

Hope that helps


My look back at current SPI.cpp, there is a potential issue:
 
I used your advice and simply removed the parts after the read bytes comment.

C++:
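// Added to SPI.cpp next to SPIClass::transfer(); it also needs a matching
// declaration inside class SPIClass in SPI.h.
// NOTE: returns once the last byte has been queued in the TX FIFO, not when it
// has been shifted out, so keep CS asserted until the bus has gone idle.
void SPIClass::transfer_writeOnly(const void * buf, void * retbuf, size_t count)
{

    if (count == 0) return;
    uint8_t *p_write = (uint8_t*)buf;
    uint8_t *p_read = (uint8_t*)retbuf;
    size_t count_read = count;

    // Pass 1 keep it simple and don't try packing 8 bits into 16 yet..
    // Lets clear the reader queue
    port().CR = LPSPI_CR_RRF | LPSPI_CR_MEN;    // clear the queue and make sure still enabled.

    while (count > 0) {
        // Push out the next byte;
        port().TDR = p_write? *p_write++ : _transferWriteFill;
        count--; // how many bytes left to output.
        // Make sure queue is not full before pushing next byte out
        do {
            if ((port().RSR & LPSPI_RSR_RXEMPTY) == 0)  {
                uint8_t b = port().RDR;  // Read any pending RX bytes in
                if (p_read) *p_read++ = b;
                count_read--;
            }
        } while ((port().SR & LPSPI_SR_TDF) == 0) ;

    }
    // Unlike transfer(), there is no final wait for the remaining RX bytes:
    // count_read is left non-zero, retbuf (if given) misses the last few bytes,
    // and they stay in the RX FIFO until the next LPSPI_CR_RRF write clears it.
}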

The before and after clock cycles are shown below. Thank you for your help! Now, I want to see if I can go even faster...
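Because transfer_writeOnly() returns as soon as the last byte is queued rather than when it has left the pin, chip select (and endTransaction()) must wait until the FIFO has drained. The toy model below is plain host-side C++ and illustrates only that behaviour; ToyTxFifo and writeOnly are hypothetical names, tick() stands in for one byte time on the wire, and the FIFO depth of 16 used in the example is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of an SPI transmit FIFO (hypothetical, not the LPSPI registers).
class ToyTxFifo {
 public:
  explicit ToyTxFifo(size_t depth) : depth_(depth) { assert(depth > 0); }
  bool full() const { return q_.size() >= depth_; }
  void push(uint8_t b) { assert(!full()); q_.push_back(b); }
  void tick() { if (!q_.empty()) q_.pop_front(); }  // one byte shifted out
  size_t pending() const { return q_.size(); }

 private:
  size_t depth_;
  std::deque<uint8_t> q_;
};

// Mirrors the shape of transfer_writeOnly(): push every byte, stalling only
// while the FIFO is full, then return without waiting for it to drain.
void writeOnly(ToyTxFifo& fifo, const uint8_t* data, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    while (fifo.full()) fifo.tick();  // wait for room, like the TDF poll
    fifo.push(data[i]);
  }
}
```

With a 16-deep FIFO, a 10-byte write returns with all 10 bytes still queued, and a 40-byte write returns with 16 still queued after stalling for the other 24. Either way, raising CS right after the call would cut the message short.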
 

Attachments

  • old3spi.png
  • better3spi.png
Thank you for all this information. I'm kind of a noob and tried to implement your second example in my own code. It's shown below.

C++:
#include <SPI.h>
#include <EventResponder.h>
uint32_t fclk_w = 1000000;
unsigned int numbytes = 0;
uint8_t RXdata[512];

const int chipSelectPin = 10;
// clk is 13
// data is 11

// clk1 is 27
// data1 is 26

// clk2 is 37
// data2 is 35

//#if defined(__IMXRT1062__)
//extern "C" uint32_t set_arm_clock(uint32_t frequency);

#ifdef SPI_HAS_TRANSFER_ASYNC
    bool transfer(const void *txBuffer, void *rxBuffer, size_t count,  EventResponderRef  event_responder);
#endif

EventResponder event;
EventResponder event1;
EventResponder event2;

int buffer0[100];
int buffer1[100];
int buffer2[100];
int rxdNum[1];
int rxd1Num[1];
int rxd2Num[1];
volatile bool spi_active = false;
volatile bool spi1_active = false;
volatile bool spi2_active = false;

void Event_SPI0_Responder(EventResponderRef event_responder) {
  SPI.endTransaction();
  Serial.println("SPI0 ended");
  spi_active = false;
}

void Event_SPI1_Responder(EventResponderRef event_responder) {
  SPI1.endTransaction();
  Serial.println("SPI1 ended");
  spi1_active = false;
}

void Event_SPI2_Responder(EventResponderRef event_responder) {
  SPI2.endTransaction();
  Serial.println("SPI2 ended");
  spi2_active = false;
}


void setup() {
  // put your setup code here, to run once:
 
  //set_arm_clock(800000000);
  Serial.begin(115200);  //configure serial port to 115200 baud
  while (!Serial) {
    ;  // wait for serial port to connect. Needed for native USB port only
  }
  //Serial.println(F_CPU_ACTUAL);
  pinMode(chipSelectPin, OUTPUT);
  // initialize SPI:
  SPI1.setMOSI(26);
  SPI1.setSCK(27);
  SPI2.setMOSI(35);
  SPI2.setSCK(37);
  SPI.begin();
  event.attachImmediate(&Event_SPI0_Responder);
  for (uint8_t i = 0; i < sizeof(buffer0); i++) buffer0[i] = i;
  SPI1.begin();
  event1.attachImmediate(&Event_SPI1_Responder);
  for (uint8_t i = 0; i < sizeof(buffer1); i++) buffer1[i] = i;
  SPI2.begin();
  event2.attachImmediate(&Event_SPI2_Responder);
  for (uint8_t i = 0; i < sizeof(buffer2); i++) buffer2[i] = i;
  digitalWrite(chipSelectPin, LOW);
}
/*
#else
void setup() {
  // put your setup code here, to run once:
  // set_arm_clock(24000000);
  Serial.begin(115200);  //configure serial port to 115200 baud
  while (!Serial) {
    ;  // wait for serial port to connect. Needed for native USB port only
  }
  Serial.println(F_CPU_ACTUAL);

  pinMode(chipSelectPin, OUTPUT);
  // initialize SPI:
  SPI.begin();
  digitalWrite(chipSelectPin, LOW);
}
#endif
*/

void loop() {
  // put your main code here, to run repeatedly:
  if (Serial.available() > 0) {
    Serial.println("spi test started");
    String rxd = Serial.readStringUntil(',');
    String rxd1 = Serial.readStringUntil(',');
    String rxd2 = Serial.readString();
    rxdNum[0] = rxd.toInt();
    rxd1Num[0] = rxd1.toInt();
    rxd2Num[0] = rxd2.toInt();

    digitalWriteFast(chipSelectPin, HIGH);
    spi_active = true;
    SPI.beginTransaction(SPISettings(fclk_w, MSBFIRST, 0));
    SPI.transfer(rxdNum,nullptr,sizeof(buffer0),event);
    spi1_active = true;
    SPI1.beginTransaction(SPISettings(fclk_w,MSBFIRST, 0));
    SPI1.transfer(rxd1Num,nullptr,sizeof(buffer1),event1);
    spi2_active = true;
    SPI2.beginTransaction(SPISettings(fclk_w,MSBFIRST, 0));
    SPI2.transfer(rxd2Num,nullptr,sizeof(buffer2),event2);
    //while(spi_active || spi1_active || spi2_active);
    digitalWriteFast(chipSelectPin, LOW);
    Serial.println(rxd);
    Serial.println(rxd1);
    Serial.print(rxd2);
  }
}

However, I'm not able to generate any SPI signals (I can't see anything on my scope). I'm not 100% sure what I'm doing wrong. Any help would be appreciated!
 
Sorry, I did not have time today to fully diagnose the issue. But I do see some issues with the code, although
I would think you would still see something.
Code:
int buffer0[100];
int buffer1[100];
int buffer2[100];
int rxdNum[1];
int rxd1Num[1];
int rxd2Num[1];
...
    rxdNum[0] = rxd.toInt();
    rxd1Num[0] = rxd1.toInt();
    rxd2Num[0] = rxd2.toInt();

    digitalWriteFast(chipSelectPin, HIGH);
    spi_active = true;
    SPI.beginTransaction(SPISettings(fclk_w, MSBFIRST, 0));
    SPI.transfer(rxdNum,nullptr,sizeof(buffer0),event);
    spi1_active = true;

You pass in rxdNum as the data to send and nullptr for the reply, which is fine; it says you don't care about what is returned. But then you ask to transfer
sizeof(buffer0) bytes, which is 400 (100 ints of 4 bytes each). You are not using buffer0 at all here, and rxdNum holds only a single int (4 bytes).

I believe you are using the default pins, so you don't need setMOSI() and the like.

I would add some more debug output that prints what you are trying to send, to make sure your code is actually reaching the SPI.transfer() calls.
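One more thing worth checking in the posted setup(), based on reading the code rather than on anything confirmed in the thread: the fill loops use `for (uint8_t i = 0; i < sizeof(buffer0); i++)` with `int` arrays, so the limit is 400 bytes while a uint8_t counter wraps back to 0 after 255. That loop would appear never to terminate (and it writes past the end of the 100-element array), so setup() might never return and loop() would never start any transfer. Separately, loop() drives chipSelectPin LOW right after starting the transfers because the wait is commented out. The host-side C++ below demonstrates the sizeof and wraparound points; iterationsWithByteCounter is a hypothetical helper with a safety cap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

int buffer0[100];  // same declaration as in the sketch above

// sizeof gives bytes, not elements: 100 ints are 400 bytes where int is
// 32 bits (as on the Teensy 4's ARM core). std::size gives the element count.
constexpr size_t kBytes = sizeof(buffer0);
constexpr size_t kElements = std::size(buffer0);

// A uint8_t counter wraps from 255 back to 0, so it can never reach a limit
// above 255. The cap stops the loop so the effect can be observed.
size_t iterationsWithByteCounter(size_t limit, size_t cap) {
  size_t iterations = 0;
  for (uint8_t i = 0; i < limit && iterations < cap; i++) ++iterations;
  return iterations;
}
```

A safer fill loop would be `for (size_t i = 0; i < std::size(buffer0); i++) buffer0[i] = i;`, and the byte count passed to transfer() should match the buffer actually being sent.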
 
Wow, cool, that essentially gives SPI almost DMA speed without messing with all the DMA stuff.
Thanks for the tip :) I didn't know we could just attach an interrupt to the SPI.transfer() function.
Though I just finished the code to use T4 DMASPI with your modified version yesterday😂😂

Meanwhile, I found that with the DMASPI library you need to pass an
"ActiveLowChipSelect cs(pin, spisetting);" object to
"DmaSpi::Transfer trx(src, DMASIZE, des, 0, &cs); " to get the full clock speed from the SPI settings. I'm not really sure why...
It is probably related to the "PCS must be negated to change the clock divider" note somewhere in the datasheet.
 
Here's an example of how to run SPI masters on the T4 with DMA, in a non-blocking way. I also made an SPI DMA slave library, but it looks like it's the master that you want here, three times in parallel, using the SPI0, SPI1 and SPI2 hardware that is wired out on the Teensy 4.1.
Using the DMA_SPI mode in this library takes quite a bit of up-front planning of how a series of SPI byte exchanges should be scheduled. For that, you fill in a table that says how many bytes, where to get the output bytes from, where to store the input bytes, which SPI mode to use, how and when to drive each CS signal low and high, and so on. This version also supports 3-wire SPI (MOSI and MISO over one SDIO pin), swapped MOSI/MISO pins, and alternative pins.
Only the T4_SPI_DMA*.* files are needed, but I thought it would be easier to follow if I shared a full project with multiple SPI chips attached to one SPI bus.
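To make the idea of a pre-planned schedule concrete, here is a hypothetical table entry in plain C++ holding the fields listed above. This is NOT the actual structure used by the T4_SPI_DMA files (their real layout is in the attached project); it only illustrates the kind of per-step information such a table carries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout of one scheduled step, for illustration only.
struct SpiStep {
  size_t bytes;         // how many bytes to exchange
  const uint8_t* tx;    // where to get output bytes (nullptr = send filler)
  uint8_t* rx;          // where to store input bytes (nullptr = discard)
  uint8_t spiMode;      // SPI mode 0..3
  int csPin;            // chip select driven low for this step
  bool releaseCsAfter;  // drive CS high again when the step ends
};

// Total bytes a schedule will clock out: a basic sanity check when planning.
size_t totalBytes(const std::vector<SpiStep>& schedule) {
  size_t total = 0;
  for (const SpiStep& s : schedule) {
    assert(s.spiMode <= 3);
    total += s.bytes;
  }
  return total;
}
```

For example, a two-step plan might send a 2-byte command with CS held low, then read 4 reply bytes and release CS at the end, for 6 bytes in total.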
 

Attachments

  • Example_T4_DMA_SPI-240423a.zip
Couple of different ways:
For example: several of our display drivers have code built into them to allow them to do asynchronous updates.
Like my ili9341_t3n library as well as others for ST7735 and ST7789, and ILI9488_t3n library. In these libraries we have code that uses the DMAChannel stuff. The DMAChannel.h/.cpp files are in the Teensy core directories.

In addition:
You can try using the stuff we added to the SPI library, in particular calls:
C++:
// Asynch support (DMA )
#ifdef SPI_HAS_TRANSFER_ASYNC
    bool transfer(const void *txBuffer, void *rxBuffer, size_t count,  EventResponderRef  event_responder);

Sorry, there is not a lot of good documentation on the eventResponder and the like. When developing some of this, I had a few test programs, that show some of the different ways to use it:
C++:
#include <SPI.h>
#include <EventResponder.h>
#define CS_PIN 10
volatile bool event_happened = false;

EventResponder event;
static const uint8_t buffer[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

void asyncEventResponder(EventResponderRef event_responder)
{
  digitalWriteFast(CS_PIN, HIGH);
  event_happened = true;
  //Serial.println("Event happened");
}
void setup() {
  pinMode(CS_PIN, OUTPUT);
  digitalWriteFast(CS_PIN, HIGH);
  while (!Serial && millis() < 4000) ;  // wait for Serial port
  Serial.begin(115200);
  SPI.begin();
  Serial.println("SPI Test program");
  Serial1.begin(2000000);
  Serial2.begin(2000000);
  Serial3.begin(2000000);
  extern const uint8_t _serialEvent_default;
  extern const uint8_t _serialEvent1_default;
  extern const uint8_t _serialEvent2_default;
  extern const uint8_t _serialEvent3_default;
  Serial.printf("Default serialEvent? %d %d %d %d\n", _serialEvent_default,
                _serialEvent1_default, _serialEvent2_default, _serialEvent3_default);
#if defined(__IMXRT1062__)
  Serial4.begin(2000000);
  Serial5.begin(2000000);
  Serial6.begin(2000000);
  Serial7.begin(2000000);
  //Serial8.begin(2000000);
  extern const uint8_t _serialEvent4_default;
  extern const uint8_t _serialEvent5_default;
  extern const uint8_t _serialEvent6_default;
  extern const uint8_t _serialEvent7_default;
  //  extern const uint8_t _serialEvent8_default;
  Serial.printf("    %d %d %d %d\n", _serialEvent4_default,
                _serialEvent5_default, _serialEvent6_default, _serialEvent7_default);
#endif
}

void TimeYieldCalls(const char *sz) {
  yield();
  Serial.print(sz); Serial.flush();
  elapsedMicros em = 0;
  for (uint32_t i = 0; i < 1000; i++) yield();
  uint32_t elapsed = em;
  Serial.print(": ");
  Serial.println(elapsed, DEC);
  Serial.flush();
}

void loop() {
  while (Serial.read() != -1) ; // Make sure queue is empty.
  Serial.println("Press any key to run test");
  while (!Serial.available()) ; // will loop until it receives something
  while (Serial.read() != -1) ; // loop until queue is empty

  Serial.printf("start test yield_active_check_flags %x\n", yield_active_check_flags);
  Serial.printf("  systick ISR: %x\n", (uint32_t) _VectorsRam[15]);
  TimeYieldCalls("Start");

  // First try with immediate call.
  event.attachImmediate(&asyncEventResponder);
  Serial.printf("Test Immediate: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  digitalWriteFast(CS_PIN, LOW);
  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) ;
  SPI.endTransaction();
  TimeYieldCalls("After Immediate");

  // Use yield .
  event.detach();
  event.attach(&asyncEventResponder);
  Serial.printf("Test yield: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  digitalWriteFast(CS_PIN, LOW);
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) ;
  TimeYieldCalls("After yield");

  // Use Interrupt .
  event.detach();
  event.attachInterrupt(&asyncEventResponder);
  Serial.printf("Test Interrupt: %x %x\n", yield_active_check_flags, (uint32_t) _VectorsRam[15]);
  event.clearEvent();
  digitalWriteFast(CS_PIN, LOW);
  SPI.transfer(buffer, NULL, sizeof(buffer), event);
  while (!event_happened) ;
  TimeYieldCalls("After Interrupt");
  Serial2.write(buffer, sizeof(buffer));
  delay(5000);

}
void XserialEvent1() {
  int ch;
  while ((ch = Serial1.read()) != -1) Serial.write(ch);

}
void XserialEvent() {
  Serial.write(Serial.read());
}
void serialEventUSB1() {
  while (SerialUSB1.available())
    Serial.write(SerialUSB1.read());
}

void serialEventUSB2() {
  while (SerialUSB2.available())
    Serial.write(SerialUSB2.read());
}

Here is another sketch that runs two transfers at the same time, one on SPI and one on SPI1, using DMA:
C++:
#include <SPI.h>

EventResponder event;
EventResponder event1;

uint8_t buffer0[100];
uint8_t buffer1[100];
volatile bool spi_active = false;
volatile bool spi1_active = false;

void Event_SPI0_Responder(EventResponderRef event_responder) {
  digitalWriteFast(2, HIGH);
  SPI.endTransaction();
  Serial.println("SPI0 ended");
  spi_active = false;
}
void Event_SPI1_Responder(EventResponderRef event_responder) {
  digitalWriteFast(3, HIGH);
  SPI1.endTransaction();
  Serial.println("SPI1 ended");
  spi1_active = false;
}

void setup() {
  while (!Serial && (millis() < 2000)) ;
  Serial.println("Test SPI DMA on SPI and SPI1");

  pinMode(2, OUTPUT);         // chip select for the device on SPI
  digitalWriteFast(2, HIGH);  // deselected until a transfer starts
  SPI.begin();
  event.attachImmediate(&Event_SPI0_Responder);  // called directly when the transfer completes
  for (size_t i = 0; i < sizeof(buffer0); i++) buffer0[i] = i;

  pinMode(3, OUTPUT);         // chip select for the device on SPI1
  digitalWriteFast(3, HIGH);
  SPI1.begin();
  event1.attachImmediate(&Event_SPI1_Responder);
  for (size_t i = 0; i < sizeof(buffer1); i++) buffer1[i] = sizeof(buffer1) - i;
}

void loop() {
  Serial.println("Start Two SPI transfers");
  spi_active = true;
  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(2, LOW);
  SPI.transfer(buffer0, nullptr, sizeof(buffer0), event);

  spi1_active = true;
  SPI1.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(3, LOW);
  SPI1.transfer(buffer1, nullptr, sizeof(buffer1), event1);

  while (spi_active || spi1_active) ; // wait until both are done
  Serial.println("Both done");
  delay(1000);  // wait a second

}

Note: I should probably update these test sketches to align their buffers to 32-byte boundaries, especially if the memory you want to use is in DMAMEM (either declared there directly or obtained with malloc).
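A minimal portable sketch of what "aligned to 32-byte boundaries" can look like, assuming the 32-byte cache blocks described in this thread. The names `kCacheLine`, `roundUpToCacheLine` and `isLineAligned` are hypothetical helpers, not part of the Teensy core. Besides aligning the start, the size is padded to a whole number of lines so no other variable can share the buffer's last line. If the buffers live in DMAMEM, the same alignment and padding apply.

```cpp
#include <cstddef>
#include <cstdint>

// 32-byte data cache blocks, as described in this thread (assumption for this sketch).
constexpr std::size_t kCacheLine = 32;

// Round a byte count up to a whole number of cache lines, so the last
// line of the buffer is not shared with some other variable.
constexpr std::size_t roundUpToCacheLine(std::size_t n) {
  return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}

// True if an address is the start of a cache line.
constexpr bool isLineAligned(std::uintptr_t addr) {
  return addr % kCacheLine == 0;
}

// The test sketch's 100-byte buffers, aligned on a line boundary and padded
// to 128 bytes so each buffer owns its cache lines outright.
alignas(kCacheLine) std::uint8_t buffer0[roundUpToCacheLine(100)];
alignas(kCacheLine) std::uint8_t buffer1[roundUpToCacheLine(100)];
```

The transfers can still send only the first 100 bytes; the padding just keeps cache maintenance on these buffers from touching neighboring variables.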

Also, when you do DMA into or out of memory that goes through the hardware data cache, what is stored in physical memory may not match the values the CPU sees through the cache. So if you write your own DMA code, you may need to flush and/or invalidate the cache to keep them in sync.
The SPI transfer code has some logic in it to do this:
Code:
// lets clear cache before we update sizes...
    if ((uint32_t)buf >= 0x20200000u)  arm_dcache_flush((uint8_t *)buf, count);
    if ((uint32_t)retbuf >= 0x20200000u)  arm_dcache_delete(retbuf, count);
There is a potential issue with the retbuf one and we probably should change it to: arm_dcache_flush_delete(...)
Why? These dcache functions work on 32-byte blocks, and the delete function simply throws away all current cached values in each 32-byte block of the range. So if other variables share one of those blocks and were recently updated, those changes can easily be lost, for example if the memory came from malloc()...
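To make the 32-byte rounding concrete, here is a small portable calculation (not Teensy code) of which cache lines an operation over `[addr, addr + size)` ends up touching, assuming the start is rounded down and the end rounded up to 32-byte boundaries as described above. `linesTouched` and `collateralBytes` are hypothetical names, and the addresses are made-up examples in the 0x2020xxxx range checked by the SPI code.

```cpp
#include <cstdint>

constexpr std::uint32_t kLineSize = 32;  // 32-byte cache blocks (assumption from the text)

// Half-open address range [first, end) of whole cache lines covered by
// a cache operation over [addr, addr + size).
struct LineRange {
  std::uint32_t first;
  std::uint32_t end;
};

constexpr LineRange linesTouched(std::uint32_t addr, std::uint32_t size) {
  return LineRange{addr & ~(kLineSize - 1),
                   (addr + size + kLineSize - 1) & ~(kLineSize - 1)};
}

// Bytes inside the touched lines that do NOT belong to the buffer.
// A plain delete (invalidate) discards any cached updates to these bytes;
// a flush-then-delete writes them back first.
constexpr std::uint32_t collateralBytes(std::uint32_t addr, std::uint32_t size) {
  return (linesTouched(addr, size).end - linesTouched(addr, size).first) - size;
}
```

For example, a 100-byte receive buffer starting at 0x20200010 spans the lines 0x20200000 to 0x20200080, so 28 bytes belonging to other variables (16 before it and 12 after it) share its cache lines. A 128-byte buffer starting at 0x20200000 has none.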

Hope that helps


Novice here. Could you explain why you can not initialize and use SPI and SPI1 interchangeably, or together, without writing some sort of transfer function? And how you would know if you need to write some transfer function (what are the limitations of the ones in the SPI library)?
In most cases, you can use SPI or SPI1 interchangeably. One limiting factor is that some libraries are hard-coded to use the SPI object. With those libraries, if you want to use SPI1 instead (different IO pins), you may have to edit the library and change SPI to SPI1 (or SPI2) for your application. Many libraries now provide some way to pass in which SPI object to use.
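A minimal sketch of the "pass in which SPI object to use" pattern. `FakeSpi` is a stand-in for the Teensy `SPIClass` so the example compiles and runs on a PC, and `ExampleDevice` with its `writeRegister()` method is a hypothetical driver, not any real library's API. Chip select and `beginTransaction()`/`endTransaction()` are left out to keep it short.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for SPIClass so this compiles off-target (hypothetical).
// Only the single-byte transfer() is modeled; it records what was sent.
struct FakeSpi {
  std::vector<std::uint8_t> sent;
  std::uint8_t transfer(std::uint8_t b) {
    sent.push_back(b);
    return 0;  // a real bus would return the byte clocked in from the device
  }
};

// Hypothetical device driver. Instead of calling the global SPI object,
// it talks to whichever bus was passed to its constructor.
template <typename Bus>
class ExampleDevice {
 public:
  explicit ExampleDevice(Bus &bus) : bus_(bus) {}

  // Send a register address followed by a value.
  void writeRegister(std::uint8_t reg, std::uint8_t value) {
    bus_.transfer(reg);
    bus_.transfer(value);
  }

 private:
  Bus &bus_;  // the bus this device is wired to
};

// Two buses standing in for SPI and SPI1, one device on each.
FakeSpi spi0, spi1;
ExampleDevice<FakeSpi> devOnSpi0(spi0);
ExampleDevice<FakeSpi> devOnSpi1(spi1);
```

On a Teensy the same driver would be created as `ExampleDevice<SPIClass> dev(SPI1);`. The template is only there so the sketch can be tested off-target; a real library would more likely just take an `SPIClass &` in its constructor.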

Whether to use the SPI library code or custom code typically comes down to performance. If the SPI library code works well enough for your usage, use it! For example, here is a simple sketch that outputs 32 RED pixels, as if to a display:
C++:
#include <SPI.h>
#define ILI9488_RED 0xF800         /* 255,   0,   0 */
void setup() {
  SPI.begin();
  SPI.beginTransaction(SPISettings(16000000, MSBFIRST, SPI_MODE0));
}
void loop() {
  // Method 1: one transfer16() call per pixel (32 pixels).
  for (int i = 0; i < 32; i++) SPI.transfer16(ILI9488_RED);

  // Method 2: build 8 pixels (16 bytes, high byte first) in a buffer...
  uint8_t buf[16];
  for (int i = 0; i < 16; i += 2) {
    buf[i] = ILI9488_RED >> 8;
    buf[i + 1] = ILI9488_RED & 0xff;
  }
  delay(50);
  // ...and send the buffer 4 times (4 x 8 = 32 pixels).
  for (int i = 0; i < 4; i++) SPI.transfer(buf, nullptr, sizeof(buf));
  delay(1000);
}

It does this two ways. The first outputs each pixel with its own transfer16() call. The second fills a temporary array with 8 pixels and sends it with one transfer() call, four times.
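The byte packing the second method does by hand can be wrapped in a reusable helper. This is a sketch under a few assumptions: `makePixelRun` is a hypothetical name, it needs C++17 (for `constexpr` writes into `std::array`), and it stores each 16-bit color high byte first, matching the MSBFIRST order used in the sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build a buffer holding NPixels copies of a 16-bit color, high byte first,
// ready to be sent with one bulk transfer instead of one call per pixel.
template <std::size_t NPixels>
constexpr std::array<std::uint8_t, NPixels * 2> makePixelRun(std::uint16_t color) {
  std::array<std::uint8_t, NPixels * 2> buf{};
  for (std::size_t i = 0; i < buf.size(); i += 2) {
    buf[i] = static_cast<std::uint8_t>(color >> 8);        // high byte
    buf[i + 1] = static_cast<std::uint8_t>(color & 0xFF);  // low byte
  }
  return buf;
}
```

On the Teensy, `auto run = makePixelRun<8>(ILI9488_RED);` followed by `SPI.transfer(run.data(), nullptr, run.size());` would replace the hand-written fill loop. A larger `NPixels` means fewer calls and therefore fewer gaps between groups, at the cost of more RAM.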

Here is a logic analyzer capture of the one-pixel-at-a-time version. Notice the gap between each of the words that was output...

[Image: logic analyzer capture, one transfer16() call per pixel]

Close-up:

[Image: close-up of the same capture, showing the gaps between words]



When I tell it to output multiple words per call, you save time between words:
[Image: logic analyzer capture, 16-byte buffer sent with transfer()]


Closer up:
[Image: close-up of the buffered capture]

You still see a gap between each group of words, but even this sped up the output by about 13%, based on the two timings shown.

With custom SPI code, the remaining gaps could be removed as well... and this adds up. For example, instead of outputting 32 pixels, suppose I want to output a full screen of, say, 480x320, or 153600 pixels. This adds up...

Hope that helps