DMA slave mode teensy 3.6

ozeng

Member
Hi all,

I'm working on a project that requires a basic command/response structure from a slave teensy. What I want to happen is a fixed-sized command packet is dma'd into a buffer (dma channel 1), and when the packet is received an interrupt is fired and channel 1 stops. In the interrupt we process the packet and put a response in a buffer, and then enable dma channel 2 to clock out the response. When channel 2 is done, it fires an interrupt and stops. Channel 1 is enabled by chip select falling low.

I've been using an interrupt-based slave mode library (https://github.com/btmcmahan/Teensy-3.0-SPI-Master---Slave/blob/master/t3spi.cpp), but depending on interrupts seems to be problematic at 6.25 MHz, which is what I'm running at (it would also have to compete with other interrupts, which is bad). I'm hoping DMA will be able to access SPI1_PUSHR fast enough.

I have a setup that does part of this, where I connect a teensy master's spi0 ports to teensy slave's spi1 ports.

Master:
MOSI pin 11
MISO pin 12
CS pin 10
CLK pin 13

Slave:
MISO pin 0 // These pins are the opposite of the teensy pinout documentation
MOSI pin 1
CS pin 31
SCK pin 32

I have 3 dma channels:
dma_pop moves data out of SPI1_POPR into a 16-bit variable and always runs, because SPI1_PUSHR won't clock out bits unless POPR is cleared. So this channel should run even when dma_rx is off.
dma_rx (channel 1 in the description above) takes the 16-bit word from dma_pop and moves it into a receive buffer. It is only enabled while we're receiving a command.
dma_tx (channel 2 in the description above) takes a 16-bit word from the response buffer and moves it to SPI1_PUSHR. It is only enabled while we're clocking out the response. In the example below, dma_tx echoes from the same buffer that dma_rx writes into, and the SPI clock is 625 kHz instead of 6.25 MHz.

I think this code is breaking because disableOnCompletion() may be firing after every DMA transfer instead of after the whole buffer has been transferred. dma_rx seems to work fine: the receive buffer looks about right. dma_tx mostly doesn't work, or maybe the writes to SPI1_PUSHR aren't taking effect. What's going on here?
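A note on that hypothesis, as far as I understand DMAChannel.h: the length passed to sourceBuffer() and destinationBuffer() is in bytes, and the channel's major loop count is that length divided by the element size. disableOnCompletion() sets the TCD's DREQ bit, which drops the request enable only when the major loop count reaches zero, not after each transfer. If that reading is right, a length of 24 means 12 16-bit transfers for dma_rx and 6 32-bit transfers for dma_tx2, not 24 words. The arithmetic is sketched below as plain host-side C++; majorLoopCount is a made-up helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of DMAChannel): number of DMA requests a
// channel services before its major loop completes, assuming the buffer
// length is given in bytes and one element moves per request.
inline std::size_t majorLoopCount(std::size_t lengthBytes, std::size_t elementBytes) {
    assert(elementBytes != 0);
    return lengthBytes / elementBytes;
}

// dma_rx: destinationBuffer((uint16_t*) spi_rx_dest, 24)
const std::size_t rxTransfers = majorLoopCount(24, sizeof(std::uint16_t));

// dma_tx2: sourceBuffer((uint32_t*) spi_rx_dest, 24)
const std::size_t txTransfers = majorLoopCount(24, sizeof(std::uint32_t));

// Length in bytes needed to cover the 24 16-bit words the master sends.
const std::size_t rxBytesFor24Words = 24 * sizeof(std::uint16_t);
```

That would at least fit the log, where the 0xabcd written by received_packet_isr shows up partway through the 24-word burst rather than at the end.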

master code:
Code:
#include <Arduino.h>
#include <SPI.h>  // include the SPI library:
void setup() {
  // put your setup code here, to run once:
  SPI.begin();
  SPI.beginTransaction(SPISettings(625000, MSBFIRST, SPI_MODE0));
  pinMode(10, OUTPUT);
}

void loop() {
  digitalWriteFast(10, LOW);
  //delayMicroseconds(0);
  // put your main code here, to run repeatedly:
  for (int i = 1; i < 25; i++) {
    uint16_t blah = SPI.transfer16(i);
    Serial.printf("Sent: %x Received: %x\n", i, blah);
  }
  Serial.printf("---------------------------------------\n");
  digitalWriteFast(10, HIGH);
  delay(2000);
}

slave code:
Code:
#include <DMAChannel.h>
#define MASTER        1
#define SLAVE       0

#define T3_SPI_CLOCK_DIV2   0b0000  //24.0  MHz
#define T3_SPI_CLOCK_DIV4   0b0001  //12.0  MHz
#define T3_SPI_CLOCK_DIV6   0b0010  //08.0  MHz
#define T3_SPI_CLOCK_DIV8   0b0011  //05.3  MHz
#define T3_SPI_CLOCK_DIV16    0b0100  //03.0  MHz
#define T3_SPI_CLOCK_DIV32    0b0101  //01.5  MHz
#define T3_SPI_CLOCK_DIV64    0b0110  //750 KHz
#define T3_SPI_CLOCK_DIV128 0b0111  //375 Khz

#define T3_SPI_MODE0      0x00
#define T3_SPI_MODE1      0x01
#define T3_SPI_MODE2      0x02
#define T3_SPI_MODE3      0x03

#define MSB_FIRST     0
#define LSB_FIRST     1

#define T3_CTAR_0       0
#define T3_CTAR_1       1
#define T3_CTAR_SLAVE     2

#define SCK         0x0D
#define MOSI        0x0B
#define MISO        0x0C
#define ALT_SCK       0x0E
#define ALT_MOSI      0x07
#define ALT_MISO      0x08

#define SCK1                32
#define MOSI1               0
#define MISO1               1

#define T3_CS0          0x01
#define T3_CS1          0x02
#define T3_CS2          0x04
#define T3_CS3          0x08
#define T3_CS4          0x10
#define ALT_CS0       0x81
#define ALT_CS1       0x82
#define ALT_CS2       0x84
#define ALT_CS3       0x88
#define T3_SPI1_CS0             31

#define CS_ActiveLOW    1
#define CS_ActiveHIGH   0
#define CS0_ActiveLOW   0x00010000
#define CS1_ActiveLOW   0x00020000
#define CS2_ActiveLOW   0x00040000
#define CS3_ActiveLOW   0x00080000
#define CS4_ActiveLOW   0x00100000

#define SPI_SR_TXCTR    0x0000f000 //Mask isolating the TXCTR

/* *************** DMA SETUP ********************* */
DMAChannel dma_rx;
DMAChannel dma_pop;
DMAChannel dma_tx2;
void received_packet_isr(void)
{
    dma_rx.clearInterrupt();
    SPI1_PUSHR = 0xabcd;
    dma_tx2.enable();
    Serial.printf("Received packet\n");
}
void sent_packet_isr(void)
{
    SPI1_PUSHR = 0xdcba;
    dma_tx2.clearInterrupt();
    Serial.printf("Sent packet\n");
}

const uint16_t buffer_size = 600;
uint16_t spi_rx_dest[buffer_size];
uint32_t spi_tx_out[buffer_size];
uint16_t spi_word_received = 0xbeef;
void setup_dma_receive(void) {
    for (int i = 0; i < 500; i++) {
        spi_tx_out[i] = 0x100 + i;
    }
    dma_rx.source((uint16_t&) spi_word_received);
    dma_rx.destinationBuffer((uint16_t*) spi_rx_dest, 24);
    dma_rx.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);
    dma_rx.disableOnCompletion();
    dma_rx.interruptAtCompletion();
    dma_rx.attachInterrupt(received_packet_isr);

    spi_rx_dest[0] = 0xbeef;

    dma_pop.sourceBuffer((uint16_t*) &KINETISK_SPI1.POPR, 2);
    dma_pop.destination((uint16_t&) spi_word_received);
    dma_pop.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);

    dma_tx2.sourceBuffer((uint32_t *) spi_rx_dest, 24);
    dma_tx2.destination(KINETISK_SPI1.PUSHR); // SPI1_PUSHR_SLAVE
    dma_tx2.disableOnCompletion();
    dma_tx2.interruptAtCompletion();
    dma_tx2.attachInterrupt(sent_packet_isr);
    dma_tx2.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);

    SPI1_RSER = SPI_RSER_RFDF_RE | SPI_RSER_RFDF_DIRS; // DMA on receive FIFO drain flag
    SPI1_SR = 0xFF0F0000;
    dma_rx.enable();
    dma_pop.enable();
    dma_tx2.enable();
}

/* *********** T3SPI library for slave mode https://github.com/btmcmahan/Teensy-3.0-SPI-Master---Slave/blob/master/t3spi.cpp -- this code should work fine **************** */
void start() {
  SPI1_MCR &= ~SPI_MCR_HALT & ~SPI_MCR_MDIS;
}

void stop() {
  SPI1_MCR |= SPI_MCR_HALT | SPI_MCR_MDIS;
}

void end() {
  SPI1_SR &= ~SPI_SR_TXRXS;
  stop();
}

void setMCR(bool mode){
  stop();
  if (mode==1){
    SPI1_MCR=0x80000000;}
  else{
    SPI1_MCR=0x00000000;}
  start();
}

void enablePins_SLAVE(uint8_t sck, uint8_t mosi, uint8_t miso, uint8_t cs) {
    if (sck == SCK){
    CORE_PIN13_CONFIG = PORT_PCR_MUX(2);}
  if (sck == ALT_SCK){
    CORE_PIN14_CONFIG = PORT_PCR_MUX(2);}
    if (sck == SCK1){
    CORE_PIN32_CONFIG = PORT_PCR_MUX(2);}
  if (mosi == MOSI){
    CORE_PIN11_CONFIG = PORT_PCR_DSE | PORT_PCR_MUX(2);}
  if (mosi == ALT_MOSI){
    CORE_PIN7_CONFIG  = PORT_PCR_DSE | PORT_PCR_MUX(2);}
    if (mosi == MOSI1){
    CORE_PIN0_CONFIG  = PORT_PCR_DSE | PORT_PCR_MUX(2);}
  if (miso == MISO){
    CORE_PIN12_CONFIG = PORT_PCR_MUX(2);}
  if (miso == ALT_MISO){
    CORE_PIN8_CONFIG  = PORT_PCR_MUX(2);}
    if (miso == MISO1){
    CORE_PIN1_CONFIG  = PORT_PCR_MUX(2);}
  if (cs == T3_CS0){
    CORE_PIN10_CONFIG = PORT_PCR_MUX(2);}
  if (cs == ALT_CS0){
    CORE_PIN2_CONFIG  = PORT_PCR_MUX(2);}
    if (cs == T3_SPI1_CS0){
    CORE_PIN31_CONFIG  = PORT_PCR_MUX(2);}
}

void begin_SLAVE(uint8_t sck, uint8_t mosi, uint8_t miso, uint8_t cs) {
  SIM_SCGC6 |= SIM_SCGC6_SPI1;  // enable clock to SPI.

  setMCR(SLAVE);
  setCTAR_SLAVE(8, T3_SPI_MODE0);
  SPI1_RSER = 0x00020000;
  enablePins_SLAVE(sck, mosi, miso, cs);
}

void setMode(uint8_t CTARn, uint8_t dataMode) {
  stop();
  if (CTARn==0){
    SPI1_CTAR0 = (SPI1_CTAR0 & ~(SPI_CTAR_CPOL | SPI_CTAR_CPHA)) | dataMode << 25;}
  if (CTARn==1){
    SPI1_CTAR1 = (SPI1_CTAR1 & ~(SPI_CTAR_CPOL | SPI_CTAR_CPHA)) | dataMode << 25;}
  if (CTARn==2){
    SPI1_CTAR0_SLAVE = (SPI1_CTAR0_SLAVE & ~(SPI_CTAR_CPOL | SPI_CTAR_CPHA)) | dataMode << 25;}
  start();
}

void setFrameSize(uint8_t CTARn, uint8_t size) {
  stop();
  if (CTARn==0){
    SPI1_CTAR0 |= SPI_CTAR_FMSZ(size);}
  if (CTARn==1){
    SPI1_CTAR1 |= SPI_CTAR_FMSZ(size);}
  if (CTARn==2){
    SPI1_CTAR0_SLAVE |= SPI_CTAR_FMSZ(size);}
  start();
}

void setCTAR_SLAVE(uint8_t size, uint8_t dataMode){
  SPI1_CTAR0_SLAVE=0;
  setFrameSize(T3_CTAR_SLAVE, (size - 1));
  setMode(T3_CTAR_SLAVE, dataMode);
}

/* *************** SETUP ***************** */
void setup() {
    Serial.printf("Setting up slave\n");
    begin_SLAVE(SCK1, MOSI1, MISO1, T3_SPI1_CS0);
    Serial.printf("Setting up CTAR\n");
    setCTAR_SLAVE(16, T3_SPI_MODE0);
    Serial.printf("Setting up receive\n");
    setup_dma_receive();
}

void loop() {
  delay(1000);
  Serial.printf("\n\nBuf %x %x %x %x %x %x\n\n", spi_rx_dest[0], spi_rx_dest[1], spi_rx_dest[4], spi_tx_out[0], spi_tx_out[1], spi_tx_out[2]);
  (void) SPI1_POPR;(void) SPI1_POPR;
    SPI1_SR |= SPI_SR_RFDF;
  dma_rx.enable();
}

The response I have is this (spi master serial output):

Sent: 1 Received: 0
Sent: 2 Received: 0
Sent: 3 Received: beef
Sent: 4 Received: beef
Sent: 5 Received: beef
Sent: 6 Received: beef
Sent: 7 Received: beef
Sent: 8 Received: beef
Sent: 9 Received: beef
Sent: a Received: beef
Sent: b Received: beef
Sent: c Received: beef
Sent: d Received: beef
Sent: e Received: abcd
Sent: f Received: 1
Sent: 10 Received: 1
Sent: 11 Received: 1
Sent: 12 Received: 1
Sent: 13 Received: 1
Sent: 14 Received: 1
Sent: 15 Received: 1
Sent: 16 Received: 1
Sent: 17 Received: 1
Sent: 18 Received: 1
---------------------------------------
Sent: 1 Received: 0
Sent: 2 Received: 0
Sent: 3 Received: beef
Sent: 4 Received: beef
Sent: 5 Received: beef
Sent: 6 Received: beef
Sent: 7 Received: beef
Sent: 8 Received: beef
Sent: 9 Received: beef
Sent: a Received: beef
Sent: b Received: beef
Sent: c Received: beef
Sent: d Received: beef
Sent: e Received: abcd
Sent: f Received: 1
Sent: 10 Received: 2
Sent: 11 Received: 3
Sent: 12 Received: 4
Sent: 13 Received: 5
...................................
Sent: n Received: n - e
Sent: n + 1 Received: dcba
---------------------------------------
 
You can't use the same DMA trigger twice:
dma_pop.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);
...
dma_tx2.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);


You are writing random crap into the PUSHR control word:
dma_tx2.sourceBuffer((uint32_t *) spi_rx_dest, 24);
dma_tx2.destination(KINETISK_SPI1.PUSHR); // SPI1_PUSHR_SLAVE


Getting DMA with SPI working is tricky. Take a look at these threads:
https://forum.pjrc.com/threads/43585-Teensy-3-5-SPI-DMA
https://forum.pjrc.com/threads/43048-How-best-to-manage-multiple-SPI-busses/page3
 
Update: I was able to get dma spi working for the most part with concurrent send and receive. Thanks for the help.

Setup function:
Code:
    dma_rx.source((uint16_t&) KINETISK_SPI1.POPR);
    dma_rx.destinationBuffer((uint16_t*) incoming, PACKET_SIZE * 2);
    dma_rx.disableOnCompletion();
    dma_rx.interruptAtCompletion();
    dma_rx.triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);
    dma_rx.attachInterrupt(received_packet_isr);

    dma_tx.sourceBuffer((uint32_t *) garbage, PACKET_SIZE * 2);
    dma_tx.destination(KINETISK_SPI1.PUSHR); // SPI1_PUSHR_SLAVE
    dma_tx.disableOnCompletion();
    dma_tx.triggerAtTransfersOf(dma_rx);

I think I need to do 32-bit writes to SPI1_PUSHR even if only 16 of those bits get used. In slave mode the 16 most significant bits should get thrown out, since the control word is only used in master mode. I'm not sure about this, though.
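For anyone following along, here is a minimal host-side sketch of what "32-bit writes with only 16 bits used" means for the transmit buffer. widenForPushr and fusedWord are names I made up for illustration. The second function shows what a 32-bit read sees when a uint16_t buffer is simply cast to uint32_t* on a little-endian core: two neighbouring samples fused into one word, which is presumably what was landing in the upper half of PUSHR in the first version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copy 16-bit payload words into 32-bit slots whose
// upper 16 bits (the control half of PUSHR in master mode) are zero.
inline void widenForPushr(const std::uint16_t* payload, std::uint32_t* out, std::size_t count) {
    assert(payload != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<std::uint32_t>(payload[i]); // upper half stays 0
    }
}

// What one 32-bit read returns when two adjacent uint16_t samples are read
// as a single word on a little-endian machine.
inline std::uint32_t fusedWord(std::uint16_t first, std::uint16_t second) {
    return static_cast<std::uint32_t>(first) |
           (static_cast<std::uint32_t>(second) << 16);
}
```

The cost is the one mentioned later in the thread: the transmit buffer takes twice the memory for the same payload.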

Also, the DMA buffer size seems to be capped at 512 transfers, or 2048 bytes in my case, which limits how much a single DMA setup can move. That's 2^9 transfers and 2^11 bytes, which don't seem like likely candidates for DMA counter sizes.
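A possible explanation for the cap, offered as my reading rather than a confirmed diagnosis: in the eDMA TCD the CITER/BITER iteration count is 15 bits wide when minor-loop channel linking (ELINK) is off, but only 9 bits wide when it is on, because the upper bits then hold the linked channel number. As far as I can tell, triggerAtTransfersOf() turns on that linking for the triggering channel, which would cap it at 511 iterations. The sketch below is host-side arithmetic only; passesNeeded is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Field widths from the eDMA TCD layout: CITER/BITER hold a 15-bit count
// when minor-loop linking (ELINK) is off and a 9-bit count when it is on.
constexpr std::uint32_t kMaxIterNoLink = (1u << 15) - 1;
constexpr std::uint32_t kMaxIterLinked = (1u << 9) - 1;

// Hypothetical helper: how many separately armed DMA passes are needed to
// move `transfers` elements when each pass is capped at `maxIter`.
inline std::uint32_t passesNeeded(std::uint32_t transfers, std::uint32_t maxIter) {
    assert(maxIter > 0);
    return (transfers + maxIter - 1) / maxIter;
}
```

Under that assumption a 600-word packet needs two passes with linking enabled, but fits in one pass if the TX channel is triggered by its own hardware request instead of being linked to the RX channel.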

In the interrupt I process the packet, generate a response and re-enable dma. This time, MISO is active and MOSI is ignored.
Code:
        dma_rx.destinationBuffer((uint16_t*) garbage, sizeof(uint16_t) * sizeOfBuffer);
        dma_tx.sourceBuffer((uint32_t *) outgoing, sizeof(uint32_t) * sizeOfBuffer);
        dma_tx.triggerAtTransfersOf(dma_rx);
        dma_tx.enable();
        dma_rx.enable();

I have an interrupt that fires on chip select low, which resets the process

Code:
    dma_rx.disable();
    dma_tx.disable();
    (void) SPI1_POPR; (void) SPI1_POPR; SPI1_SR |= SPI_SR_RFDF;
    transmitting = false;
    dma_rx.destinationBuffer((uint16_t*) incoming, PACKET_SIZE * 2);
    dma_tx.sourceBuffer((uint32_t *) garbage, PACKET_SIZE * 2);
    dma_tx.triggerAtTransfersOf(dma_rx);
    dma_tx.enable();
    dma_rx.enable();

I'm facing a problem, though, when the master fails partway through an SPI transmission. On subsequent packets the bits seem to be offset, i.e. instead of receiving 101110 I'd receive 010111. I can't reproduce the error so I don't have details, but my hypothesis is that it has something to do with a half-full POPR register. Is there a quick fix that lets me reset the push and pop registers every time chip select goes low?
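To make the hypothesis concrete, here is a toy model in host-side C++. It is not a model of the real SPI block; ToyReceiver and its methods are invented for illustration. It only shows that one stray bit left over from an aborted transfer turns 101110 into 010111, and that dropping the partial frame on chip select restores alignment. On the real part the equivalent would presumably involve halting the module and clearing the FIFOs (the SPI_MCR_CLR_TXF and SPI_MCR_CLR_RXF bits), but whether that also clears a partially shifted frame is something I have not verified.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy frame receiver: collects bits MSB-first and emits a frame every
// `frameBits` clocks. Purely illustrative, not the DSPI hardware.
class ToyReceiver {
public:
    explicit ToyReceiver(unsigned frameBits) : frameBits_(frameBits) {
        assert(frameBits > 0 && frameBits <= 16);
    }

    // Shift in one bit; store a frame once frameBits_ bits have arrived.
    void clockIn(bool bit) {
        shift_ = static_cast<std::uint16_t>((shift_ << 1) | (bit ? 1u : 0u));
        if (++count_ == frameBits_) {
            frames_.push_back(static_cast<std::uint16_t>(shift_ & mask()));
            count_ = 0;
            shift_ = 0;
        }
    }

    // Feed the first `bits` bits (MSB-first) of a frameBits_-wide word.
    void clockInWord(std::uint16_t word, unsigned bits) {
        for (unsigned i = 0; i < bits; ++i) {
            clockIn(((word >> (frameBits_ - 1 - i)) & 1u) != 0);
        }
    }

    // What a chip-select handler would do: drop any partial frame.
    void resetOnChipSelect() {
        count_ = 0;
        shift_ = 0;
    }

    const std::vector<std::uint16_t>& frames() const { return frames_; }

private:
    std::uint16_t mask() const {
        return static_cast<std::uint16_t>((1u << frameBits_) - 1u);
    }

    unsigned frameBits_;
    unsigned count_ = 0;
    std::uint16_t shift_ = 0;
    std::vector<std::uint16_t> frames_;
};
```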
 
No, I ended up just doing 16-bit reads using DMA and having memory where only the lower 16 bits are used. It's suboptimal but I can't get the DMA to transmit more than 1024 bytes at a time so it doesn't end up wasting that much memory.
 
Do you maybe have an example sketch for slave mode 8-bit read/write?
I'm trying to communicate with a Raspberry Pi. I got a library from one of the members of this forum, but the communication is not reliable. Sometimes I see something like a shift in bits, and I can't track down the problem :/
 
I'm already using this library. It works, but it's losing sync. Sometimes it works for half an hour, sometimes more, sometimes less, and it always fails if I restart the Teensy or the RPi. It looks like the data from slave to master (Teensy -> RPi) gets shifted by one position in the array, and I can't figure out why. I was digging through kinetis.h to find a flag for when CS is low, but I can't find one. And I'm too much of a noob to understand all the bitwise operations used with DMA, or anything in kinetis.h for that matter... There is no beginner-level documentation on the web :(
 
Are you using entirely t3spi? That library uses interrupts, not DMA, to do slave mode communications which may contribute to your bit loss.
 
How do you mean entirely? I use the 8-bit rxTx example. I was looking at the source, and to me it seems that the interrupt is driven by the clock pulse. When the interrupt fires, the ISR reads the POPR register and then writes to the PUSHR register...
 
In that case it is interrupt driven. I used this library before, and it had some weird interactions with slave mode SPI when the clock speed was high. I think the interrupt would not fire in time for the next SPI clock pulse, which would cause that problem. I've had fewer problems using DMA, or using interrupts at low clock speeds (a few MHz).
 
Lower clock speeds improved things, but even at 1 MHz I had problems... Can you give me an example of an 8-bit DMA slave? I need it to receive and send back data. The data is 8 bits wide and 20 bytes long; if that's a problem I can shorten it to 10 bytes :) The Raspberry Pi sends 32-bit longs, so on the slave side I use a union to make a long from 4 bytes.
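On the union: a union ties the result to the byte order of whichever CPU compiles it, so one alternative is to assemble the value with shifts and state the order explicitly. A minimal sketch follows; fromBytesLE and toBytesLE are made-up names, and little-endian is only an assumption about what the Pi side sends.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 32-bit value from four received bytes,
// first byte = least significant. The byte order is an assumption; use
// whatever order the master actually sends.
inline std::uint32_t fromBytesLE(const std::uint8_t* b) {
    assert(b != nullptr);
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}

// Hypothetical helper: split a 32-bit value into four bytes for the reply,
// using the same byte order.
inline void toBytesLE(std::uint32_t value, std::uint8_t* out) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(value & 0xFFu);
    out[1] = static_cast<std::uint8_t>((value >> 8) & 0xFFu);
    out[2] = static_cast<std::uint8_t>((value >> 16) & 0xFFu);
    out[3] = static_cast<std::uint8_t>((value >> 24) & 0xFFu);
}
```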
 
As @tni mentioned earlier in this thread, getting SPI and DMA to work can be a bit tricky, especially if you want to do it for all of the Teensy boards...

I fumbled around with this for awhile before I was able to make it work sufficiently for adding Async Transfers to the SPI code base using DMA. And I should add I have not done much with it yet for SPI in slave mode. But I am interested for having it work between a Host like an RPI3 (or Odroid or UP board) and a Teensy.

Some side notes - Not sure if it will help you or not.

PUSHR - Is 32 bits, but only up to 16 are data bits, the other are control bits. Second side note: On T3.6 was able to setup DMA channels for 32 bits which allowed me to have the DMA also update the CS pins and the like. However this did not work on T3.2, and I don't remember if it worked on T3.5.

T3.5 - on SPI1/SPI2 is a challenge - more details in the threads tni mentioned.

SPI1/SPI2 on T3.6 - The PUSHR/POPR FIFO queues are only one entry deep. SPI0 is 4 entries deep. So if you are using Interrupts, you must be able to respond to the interrupt within one transfer time. So if you are doing ISR based setup better to use SPI0 if possible.
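To put rough numbers on "one transfer time", here is a small host-side calculation. The function names are made up, and the 180 MHz CPU clock in the usage note is an assumption (the usual T3.6 default).

```cpp
#include <cassert>
#include <cstdint>

// Time on the wire for one SPI frame, in nanoseconds.
inline std::uint64_t frameTimeNs(unsigned frameBits, std::uint64_t clockHz) {
    assert(clockHz > 0);
    return (static_cast<std::uint64_t>(frameBits) * 1000000000ull) / clockHz;
}

// Rough upper bound on how long an ISR may take to service the FIFO before
// data is lost: FIFO depth times the frame time.
inline std::uint64_t isrDeadlineNs(unsigned fifoDepth, unsigned frameBits, std::uint64_t clockHz) {
    return static_cast<std::uint64_t>(fifoDepth) * frameTimeNs(frameBits, clockHz);
}

// Whole CPU cycles that fit into a time window.
inline std::uint64_t cpuCyclesIn(std::uint64_t windowNs, std::uint64_t cpuHz) {
    return (windowNs * cpuHz) / 1000000000ull;
}
```

With 16-bit frames at the 6.25 MHz mentioned at the top of the thread, a one-entry FIFO leaves 2560 ns per frame, which is about 460 CPU cycles at 180 MHz for interrupt entry, the handler, and anything else that preempts it. A four-entry FIFO stretches the window to 10240 ns.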

Will continue to follow along as to see how it works out.

Kurt
 
Hi

I have tried multiple examples....
This code is from a forum user named WMXZ, and I had no luck with it. I got a lot of garbage when I tried copying the receive buffer to another "holder" for further data manipulation. I get data, but it's not consistent; it shifts a lot.
Code:
#define NDAT 8

int m_isMaster=0;
short xx[NDAT],yy[NDAT];

void logg(short *yy, int nn)
{ int ii;
  if(m_isMaster)
    Serial.printf("Master: ");
  else
    Serial.printf("Slave: ");
    
  for(ii=0;ii<nn;ii++) Serial.printf("%d ",yy[ii]); Serial.println();
}

void dma_ch0_isr(void)
{ DMA_CINT=0;
  DMA_CDNE=0;
}
void dma_ch1_isr(void)
{ DMA_CINT=1;
  DMA_CDNE=1;
  SPI0_MCR |= SPI_MCR_HALT | SPI_MCR_MDIS;
  logg(yy,NDAT);
}

void startXfer(void)
{  int ii;
//
 for(ii=0;ii<NDAT;ii++) yy[ii]=0;
 if(m_isMaster)
   for(ii=0;ii<NDAT;ii++) xx[ii]=1+ii;
 else
   for(ii=0;ii<NDAT;ii++) xx[ii]=1+2*ii;
   
  //spi SETUP  
  KINETISK_SPI0.MCR =  //SPI_MCR_MDIS |   // module disable 
    SPI_MCR_HALT |   // stop transfer
    SPI_MCR_PCSIS(0x1F); // set all inactive states high
  KINETISK_SPI0.MCR |= SPI_MCR_CLR_TXF;
  KINETISK_SPI0.MCR |= SPI_MCR_CLR_RXF;
  
  if(m_isMaster) KINETISK_SPI0.MCR |= SPI_MCR_MSTR;
  
#if F_BUS == 48000000
#define SPI_CLOCK   (SPI_CTAR_PBR(2) | SPI_CTAR_BR(0) | SPI_CTAR_DBR) //(48 / 5) * ((1+1)/2) = 9.6 MHz
#endif

  if(m_isMaster)
  KINETISK_SPI0.CTAR0 = SPI_CLOCK | 
  SPI_CTAR_FMSZ(15) |
  SPI_CTAR_PCSSCK(0) |
  SPI_CTAR_PASC(0) |
  SPI_CTAR_PDT(0) |
  SPI_CTAR_CSSCK(0) |
  SPI_CTAR_ASC(0) |
  SPI_CTAR_DT(0);
  else
  SPI0_CTAR0_SLAVE = SPI_CTAR_FMSZ(15);

  // start SPI
   KINETISK_SPI0.RSER = SPI_RSER_TFFF_DIRS | SPI_RSER_TFFF_RE | // transmit fifo fill flag to DMA
    SPI_RSER_RFDF_DIRS | SPI_RSER_RFDF_RE;  // receive fifo drain flag to DMA

  if(m_isMaster)
  KINETISK_SPI0.MCR = SPI_MCR_MSTR;
  else
  KINETISK_SPI0.MCR = 0;   

// set transmit
  if(m_isMaster)
      DMA_TCD0_DADDR=&SPI0_PUSHR;
  else
      DMA_TCD0_DADDR=&SPI0_PUSHR_SLAVE;
        DMA_TCD0_DOFF=0;
        DMA_TCD0_DLASTSGA= 0;

        DMA_TCD0_ATTR=1<<8|1;
        DMA_TCD0_NBYTES_MLNO=2;
        
        DMA_TCD0_SADDR=xx;
        DMA_TCD0_SOFF=2;
        DMA_TCD0_SLAST=-2*NDAT;
        
        DMA_TCD0_CITER_ELINKNO = DMA_TCD0_BITER_ELINKNO=NDAT;
        
        DMA_TCD0_CSR = DMA_TCD_CSR_INTMAJOR | DMA_TCD_CSR_DREQ;
        
  DMAMUX0_CHCFG0 = DMAMUX_DISABLE;
  DMAMUX0_CHCFG0 = DMAMUX_SOURCE_SPI0_TX | DMAMUX_ENABLE;

// set receive
  DMA_TCD1_SADDR=&SPI0_POPR;
  DMA_TCD1_SOFF=0;
        DMA_TCD1_SLAST= 0;

        DMA_TCD1_ATTR=1<<8|1;
        DMA_TCD1_NBYTES_MLNO=2;
        
        DMA_TCD1_DADDR=yy;
        DMA_TCD1_DOFF=2;
        DMA_TCD1_DLASTSGA=-2*NDAT;
        
        DMA_TCD1_CITER_ELINKNO = DMA_TCD1_BITER_ELINKNO=NDAT;
        
        DMA_TCD1_CSR = DMA_TCD_CSR_INTMAJOR | DMA_TCD_CSR_DREQ;
           
  DMAMUX0_CHCFG1 = DMAMUX_DISABLE;
  DMAMUX0_CHCFG1 = DMAMUX_SOURCE_SPI0_RX | DMAMUX_ENABLE;

  //start DMA
        DMA_SERQ = 0;
  NVIC_ENABLE_IRQ(IRQ_DMA_CH0);
        DMA_SERQ = 1;
  NVIC_ENABLE_IRQ(IRQ_DMA_CH1);
}

void setup(void)
{
    SIM_SCGC6 |= SIM_SCGC6_SPI0;
    SIM_SCGC6 |= SIM_SCGC6_DMAMUX;
    SIM_SCGC7 |= SIM_SCGC7_DMA;

    DMA_CR = 0;
    
//    while(!Serial);
    
    // here we must check if we are master or slave
    pinMode(23, INPUT_PULLUP);
    delay(10); // give some time to settle
    if (digitalRead(23)) m_isMaster=0; else m_isMaster=1;
    pinMode(23, INPUT); // to avoid too much current (is it necessary?)
    
    Serial.printf("isMaster = %d\n",m_isMaster);
    
  if(m_isMaster)
  {
    CORE_PIN2_CONFIG = PORT_PCR_SRE | PORT_PCR_DSE | PORT_PCR_MUX(2); // SRE slew rate enable
    CORE_PIN14_CONFIG = PORT_PCR_SRE | PORT_PCR_DSE | PORT_PCR_MUX(2); // DSE drive strength enable
    CORE_PIN7_CONFIG = PORT_PCR_SRE | PORT_PCR_DSE | PORT_PCR_MUX(2); 
    CORE_PIN8_CONFIG = PORT_PCR_MUX(2);
  }
  else
  {
    CORE_PIN2_CONFIG = PORT_PCR_MUX(2); 
    CORE_PIN14_CONFIG = PORT_PCR_MUX(2); 
    CORE_PIN7_CONFIG = PORT_PCR_SRE | PORT_PCR_DSE | PORT_PCR_MUX(2); 
    CORE_PIN8_CONFIG = PORT_PCR_MUX(2);
  }
    
}

void loop(void)
{ startXfer();
  delay(100);
}

And i have tried this library..
https://github.com/btmcmahan/Teensy-...ster/t3spi.cpp

This one is interrupt driven, and I get data to the Teensy very well. Sometimes it shifts and jumps back, but it works.
But there is a problem when returning data from the slave: it is sometimes offset by one or more bits.
The slave is sending 8-bit data and the master (RPi) is receiving 32-bit data, so I need to send 4 bytes for one master word. If I send txData[0]=0x01, txData[1]=0x00, txData[2]=0x00, txData[3]=0x00, the master gets rxBuf[1]=0x01. The strange thing is: what is inside rxBuf[0]? Why isn't this data in buffer 0?
And when it all goes haywire I start getting that 0x01 inside rxBuf[0], but now it's in the MSB! So rxBuf[0] is now 0x01000000. It looks like it shifts by one 8-bit byte... I don't know, this is getting to be too much for my brain and programming experience :)
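One way to read those two symptoms, offered as a guess: they differ only in how many stale bytes the slave clocks out before txData[0]. The host-side sketch below groups a byte stream into 32-bit words, first byte least significant, which is an assumption about how the Pi packs rxBuf; wordsFromStream is a made-up helper. Four stale bytes ahead of the data put 0x01 in the second word, while three stale bytes put it in the top byte of the first word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Group a received byte stream into 32-bit words, first byte = least
// significant (an assumption about how the master fills rxBuf).
inline std::vector<std::uint32_t> wordsFromStream(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> words;
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
        words.push_back(static_cast<std::uint32_t>(bytes[i]) |
                        (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
                        (static_cast<std::uint32_t>(bytes[i + 2]) << 16) |
                        (static_cast<std::uint32_t>(bytes[i + 3]) << 24));
    }
    return words;
}
```

If that is what is happening, the thing to chase is why the number of stale bytes changes, for example leftover entries in the slave TX FIFO from a previous transfer.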
 
I hope you tried the code also with commenting the call to 'logg' in 'dma_ch1_isr'

writing to usb serial out of an ISR is a NO-NO.
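The usual pattern is to have the ISR only set a flag and to do the printing from loop(). Below is a host-side sketch of that pattern with invented names. On the Teensy the flag would more typically be a volatile variable and the log line a Serial print; std::atomic and the vector are stand-ins so the sketch runs on a PC.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

std::atomic<bool> transferDone{false};  // set by the ISR, cleared by the main loop
std::vector<std::string> logLines;      // stands in for Serial output

// Stand-in for the DMA completion ISR: flag the event, do nothing slow.
inline void onDmaComplete() {
    transferDone.store(true);
}

// Called from loop(): does the slow logging outside interrupt context.
// Returns true if a completed transfer was handled.
inline bool pollAndLog() {
    if (!transferDone.exchange(false)) {
        return false;
    }
    logLines.push_back("transfer complete");
    return true;
}
```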
It was done in my original program only to test for short buffers and slow speed.

what is F_BUS in your case?
 
As I mentioned I was playing around a little bit awhile ago (maybe 5-6 months ago). Again I don't know if this helps or not.

I was experimenting with trying to make a version of SPI library that supported slave mode and ran into some similar issues.

That is, it is difficult to control what the slave sends back to the master. I did not have any problems having the slave receive data.

The issue was again what the slave should send back and the timing of it. That is when the master starts sending data to the slave, the data sent back is whatever currently is in the TX FIFO queue on the slave side. Not sure if this helps, but what i was trying back then was something like, having the host talk to the teensy as if there was a set of logical registers that it could read or write to.

At the time I was playing with a few different experiments controlling reading versus writing. One was to have a different command and the other was to use another IO line to control. But if we look at different commands: <command =1 write, 2= read> <starting bytes> < count of bytes>

Again writing worked fine, like lets write 4 bytes starting at register 0x80: 0x1, 0x80, 0x4, byte1, byte2, byte3, byte4... The slave had no problem getting this type data.

But now suppose I wish to do a Read operation for those same 4 bytes: the Master would maybe output:
0x2, 0x80, 0x4, 0, 0, 0, 0

And assume that the 4 register values were 1, 2, 3, 4

The question is what does the slave do to synchronize the data going back, we probably want the data bytes going back to be:
0, 0, 0, 1, 2, 3, 4

But how do you know when to fill the slave TX FIFO with the right data? Your code has no clue what to output until the 2nd byte has been received, which probably means the third byte has already started going out. So maybe you are safe to push data as soon as you see the 2nd byte's value (the starting register), and then hopefully you also receive the 3rd byte fast enough to know whether you need to push more data bytes back. But if your code does not respond quickly enough and the master has already begun the next entry, the slave will probably reuse whatever is in the SPI queue (the last entry)...
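The register-style protocol described above can be sketched on the host like this. It deliberately ignores the timing problem, which is the hard part, and only shows the packet layout and the ideal reply of zeros under the three header bytes followed by the register values. RegisterFile, handle and the constants are names invented for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCmdWrite = 0x1;
constexpr std::uint8_t kCmdRead = 0x2;
constexpr std::size_t kHeaderBytes = 3;  // <command> <start register> <count>

struct RegisterFile {
    std::array<std::uint8_t, 256> regs{};

    // Process one complete packet as clocked out by the master and return
    // the bytes the slave would ideally have clocked back in lockstep:
    // zeros under the header, then register data for a read.
    std::vector<std::uint8_t> handle(const std::vector<std::uint8_t>& packet) {
        std::vector<std::uint8_t> reply(packet.size(), 0);
        if (packet.size() < kHeaderBytes) {
            return reply;  // incomplete header: nothing to do
        }
        const std::uint8_t cmd = packet[0];
        const std::size_t start = packet[1];
        const std::size_t count = packet[2];
        for (std::size_t i = 0; i < count && kHeaderBytes + i < packet.size(); ++i) {
            const std::size_t reg = (start + i) % regs.size();
            if (cmd == kCmdWrite) {
                regs[reg] = packet[kHeaderBytes + i];
            } else if (cmd == kCmdRead) {
                reply[kHeaderBytes + i] = regs[reg];
            }
        }
        return reply;
    }
};
```

On real hardware the reply bytes have to be in the TX FIFO before the master clocks them, so the first data byte is only correct if the slave reacts within the time it takes the count byte to go by.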

But this was specific to the type of usage I was trying to do (i.e. variable input and output). My next thing I was going to try was to see if I was better off on maybe disabling the FIFO queues on the slave side and just double buffer.

Also, your mileage may vary depending on your data and control flow. If every time the master asserts your slave chip select you send some fixed number of bytes that does not depend on data received from the master, then once you get the CS you could hopefully preload the output FIFO with the data, so that the bytes are all picked up in the right order. You could probably easily use ISRs at that point to keep filling the remaining bytes if necessary. Not sure what additional complications DMA would add to this.

And sorry if these rambling here don't help in your cases.
 
Man, I really love your explanation, but unfortunately I don't know what to do with the info :) I was reading some other threads, and someone was explaining that there was a bug in the silicon... To me all this makes it seem like it's not doable, at least not as reliable communication...
The thing I'm working with is an app called Machinekit that runs on the RPi, and its real-time OS sends position data over SPI every millisecond. If there were an alternative for high-speed transfers I would use that, but SPI runs so fast :)
I will try some more things and we will see how it goes :)
 
If the data is mainly going from RPi -> Teensy, and you are not worried about getting information back from the Teensy, then I would think it would be straightforward.

I have been playing around with some different Teensy shields, like the one I soldered together this last week:
[Image: XYZ-Complete.jpg]
on which I screwed up some of the circuits... Or a T3.6 version:
[Image: T3.6-RPI-Hat-V3.jpg]
I have some jumpers or the like setup to allow me to play around with some different options, including SPI. So far for me, the best interface to use is Serial. I am setup with jumpers to connect the Serial port that is part of the RPI connector to Serial2 on the Teensy.

But maybe I need to experiment again. What data are you sending both ways?
 
What are those boards doing? Nice shields:) yeah... It would work only one way... But the teensy is returning actual position data... Rpi sends wanted position data and the teensy returns actual position :) besides position data the rpi sends data for outputs and teensy sends back status of inputs (home switches,...).
This is an already working project with a PIC32MX... But i want to make it with a teensy :)
 
Thought I would mention, that every once in awhile I play around with some of the DMA SPI slave code to see if I can make it work reasonably. Current test versions is to use two T3.6s connected to each other. Right now I have them sending 16 byte packets of data to each other... Note: the data from one does not currently influence the data being sent by the other. That is currently I don't have it setup where the Master makes a request, which then the slave returns data based on the stuff from master. I may do that next... Or may try that from an RPI (or ODroid or UP) to a Teensy.

Current Master test program:
Code:
//==============================================================
// SPI Master quick test - DMA version

#include <SPI.h>

EventResponder event;
uint8_t  xxx[16];
uint8_t  yyy[16];

void asyncEventResponder(EventResponderRef event_responder) {
  digitalWriteFast(2, HIGH);
  SPI.endTransaction();
  Serial.print("YYY: ");
  for (uint8_t i = 0; i < sizeof(yyy); i++) Serial.printf("%02x ", yyy[i]);
  Serial.println();
}
void setup() {
  while (!Serial && (millis() < 2000)) ;
  Serial.println("Test SPI DMA master");
  //SPI.setMISO(8);
  //SPI.setMOSI(7);
  SPI.setSCK(14);
  pinMode(2, OUTPUT);
  digitalWriteFast(2, HIGH);
  SPI.begin();
  event.attachImmediate(&asyncEventResponder);
  for (uint8_t i = 0; i < sizeof(xxx); i++) {
    xxx[i] = i;
  }
}

void loop() {
  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  digitalWriteFast(2, LOW);
  SPI.transfer(xxx, yyy, sizeof(xxx), event);
  delay(1000);
}
This uses the async version of the SPI transfer (i.e. DMA). It is not actually needed here, but I did it to show how it can work...

The Slave code is a little more complex here. I was curious if I could control what got sent back, and decide when the master signals that the transfer was about to start, so I have the Slave Chip select pin. I am using pin2... It must be a CS that can be used as a Slave CS (Could have used 10 here as well). There is still a lot of diagnostic code that is commented out in here, to help debug some of the DMA issues. Also I have an interrupt on the TX side as well as the RX side. The TX one is not needed, It will be triggered as soon as the last byte of the transfer is put into the TX FIFO. It is the RX one that lets you know that the whole transfer was completed. I left the TX one in as it could be used to say, I am now free to update the TX buffer.
Code:
//==============================================================
// SPI Slave quick test - DMA... See if I can use SPI system to initialize.
#include <SPI.h>
#include <DMAChannel.h>

DMAChannel   *_dmaTX = nullptr;
DMAChannel    *_dmaRX = nullptr;
uint8_t loop_count = 0;

uint8_t  xxx[16];
uint8_t  yyy[16];
volatile uint8_t spi_client_state = 0;
uint8_t previous_spi_client_state = 0;

void setup() {
  while (!Serial && (millis() < 2000)) ;
  pinMode(13, OUTPUT);
  pinMode(3, OUTPUT);
  for (int i = 0; i < 5; i++) {
    digitalWriteFast(13, HIGH);
    delay(250);
    digitalWriteFast(13, LOW);
    delay(250);
  }
  Serial.println("Test SPI DMA Slave");
//  SPI.setMISO(8);
//  SPI.setMOSI(7);
  SPI.setSCK(14);
  SPI.begin();  // This sets the MISO/MOSI/SCK pins and gets us access to memory.

  // Now lets convert this over to slave.
  KINETISK_SPI0.MCR =  SPI_MCR_HALT | SPI_MCR_PCSIS(0x1F); // stop transfer all states high
  SPI0_CTAR0_SLAVE = SPI_CTAR_FMSZ(7);                    // We are doing 8 bit transfers.

  // Now setup pin 2 as Slave select
  SPI.setCS(2); // sets the pin as a hardware chip select pin.
  attachInterrupt(2, &CSPinFalling, FALLING);

  _dmaRX = new DMAChannel();
  _dmaRX->disable();
  _dmaRX->attachInterrupt(&dma_rxisr);
  _dmaRX->interruptAtCompletion();
  _dmaRX->source((volatile uint8_t&)KINETISK_SPI0.POPR);
  _dmaRX->TCD->ATTR_SRC = 0;    //Make sure set for 8 bit mode...
  _dmaRX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI0_RX);
  _dmaRX->disableOnCompletion();

  _dmaTX = new DMAChannel();
  _dmaTX->destination((volatile uint8_t&)KINETISK_SPI0.PUSHR);
  _dmaTX->TCD->ATTR_DST = 0;    // Make sure set for 8 bit mode
  _dmaTX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI0_TX);
  _dmaTX->attachInterrupt(&dma_txisr);
  _dmaTX->interruptAtCompletion();
  _dmaTX->disableOnCompletion();

  _dmaTX->disable();

  KINETISK_SPI0.MCR =  SPI_MCR_HALT | SPI_MCR_PCSIS(0x1F); // stop transfer all states high
  KINETISK_SPI0.MCR |= SPI_MCR_CLR_TXF;
  KINETISK_SPI0.MCR |= SPI_MCR_CLR_RXF;


  for (uint8_t i = 0; i < sizeof(xxx); i++) {
    xxx[i] = 's';
  }
  //setupSlaveDMA();  // Setup once at startup.
}

void setupSlaveDMA() {
  // Now lets setup DMA

  _dmaTX->sourceBuffer(xxx, sizeof(xxx));

  _dmaRX->destinationBuffer((uint8_t*)yyy, sizeof(yyy));
  //dumpDMA_TCD("TX:", _dmaTX);
  //dumpDMA_TCD("RX:", _dmaRX);
  SPI0_MCR = 0;
  SPI0_RSER = SPI_RSER_RFDF_RE | SPI_RSER_RFDF_DIRS | SPI_RSER_TFFF_RE | SPI_RSER_TFFF_DIRS;
  //SPI0_SR = 0xFF0F0000;
  _dmaTX->enable();
  _dmaRX->enable();
  //Serial.printf("SPI MCR=%x RSER=%x SR=%x\n", SPI0_MCR, SPI0_RSER, SPI0_SR);
  //Serial.printf("DMA CR: %x ES: %x ERQ: %x EEI: %x ERR: %x\n", DMA_CR, DMA_ES, DMA_ERQ, DMA_EEI, DMA_ERR);
}

void loop() {
  uint8_t new_client_state = spi_client_state; // get it in one shot
  if (new_client_state != previous_spi_client_state) {
    switch (new_client_state) {
      case 0: /*Serial.println();*/ break;// break to new line
      case 1:
        break;
      case 2:
        digitalWriteFast(13, LOW);
        Serial.print("XXX: ");
        for (uint8_t i = 0; i < sizeof(xxx); i++) Serial.printf("%02x ", xxx[i]);
        Serial.print("\nYYY: ");
        for (uint8_t i = 0; i < sizeof(yyy); i++) Serial.printf("%02x ", yyy[i]);
        Serial.println("");
        spi_client_state = 0;
        new_client_state = 0;
        loop_count++;
        uint8_t val = loop_count;
        for (uint8_t i = 0; i < sizeof(xxx); i++) xxx[i] = val++;
        //setupSlaveDMA();  // Setup again after the previous one completed.

    }
    previous_spi_client_state = new_client_state;
  }
  delay(1);
}

typedef struct  __attribute__((packed, aligned(4))) {
  uint32_t SADDR;
  int16_t SOFF;
  uint16_t ATTR;
  uint32_t NBYTES;
  int32_t SLAST;
  uint32_t DADDR;
  int16_t DOFF;
  uint16_t CITER;
  int32_t DLASTSGA;
  uint16_t CSR;
  uint16_t BITER;
} TCD_DEBUG;

void dumpDMA_TCD(const char *psz, DMABaseClass *dmabc)
{
  Serial.printf("%s %08x %08x:", psz, (uint32_t)dmabc, (uint32_t)dmabc->TCD);
  TCD_DEBUG *tcd = (TCD_DEBUG*)dmabc->TCD;
  Serial.printf("%08x %04x %04x %08x %08x ", tcd->SADDR, tcd->SOFF, tcd->ATTR, tcd->NBYTES, tcd->SLAST);
  Serial.printf("%08x %04x %04x %08x %04x %04x\n", tcd->DADDR, tcd->DOFF, tcd->CITER, tcd->DLASTSGA,
                tcd->CSR, tcd->BITER);

}


void CSPinFalling() {
  // setup DMA slave transfer
  digitalWriteFast(3, HIGH);
  spi_client_state = 1; // we are now set to be in receive mode
  setupSlaveDMA();  // Re-arm the DMA channels for this transfer.
  digitalWriteFast(3, LOW);
  digitalWriteFast(13, HIGH);
}

//-------------------------------------------------------------------------
// DMA RX ISR
//-------------------------------------------------------------------------
void dma_rxisr(void) {
//  Serial.println(">");
  _dmaRX->clearInterrupt();
  _dmaRX->clearComplete();


  SPI0_RSER = 0;
  //port().MCR = SPI_MCR_MSTR | SPI_MCR_CLR_RXF | SPI_MCR_PCSIS(0x1F);  // clear out the queue
  SPI0_SR = 0xFF0F0000;
  //  port().CTAR0  &= ~(SPI_CTAR_FMSZ(8));   // Hack restore back to 8 bits
  spi_client_state = 2; // Say that we completed here.
}

//-------------------------------------------------------------------------
// DMA TX ISR
//-------------------------------------------------------------------------
void dma_txisr(void) {
//  Serial.println("!TX!");
/*  dumpDMA_TCD("TX:", _dmaTX);
  dumpDMA_TCD("RX:", _dmaRX);
  Serial.printf("SPI MCR=%x RSER=%x SR=%x\n", SPI0_MCR, SPI0_RSER, SPI0_SR);
  Serial.printf("DMA CR: %x ES: %x ERQ: %x EEI: %x ERR: %x\n", DMA_CR, DMA_ES, DMA_ERQ, DMA_EEI, DMA_ERR);
*/
  _dmaTX->clearInterrupt();
  _dmaTX->clearComplete();


}

If I decide to play around with this some more, maybe the Teensy could expose a set of logical registers that the host can set or query. I might set it up something like this:
Send a fixed-size command header with the command, start register, and register count. How big this message is may depend on how many registers I support.
You would assert and release the chip select around this header. After a very slight delay, I would then reselect the CS pin and, for logical writes, output the data to the slave, or buffer characters. At this point the slave would know how many characters the master is sending and can again set up its DMA transfers...
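To make the header idea a bit more concrete, here is a rough sketch of how the slave might decode such a header out of the DMA receive buffer. Everything in it is an assumption for illustration: the 3-byte layout, the command codes CMD_READ and CMD_WRITE, the 32-register limit, one byte per register, and the names RegHeader, parseHeader and payloadLength. None of this is in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical command codes; the thread does not define any.
enum RegCommand : uint8_t { CMD_READ = 0x01, CMD_WRITE = 0x02 };

// Hypothetical 3-byte header: command, first register, register count.
struct RegHeader {
  uint8_t command;
  uint8_t startReg;
  uint8_t count;
};

constexpr size_t kHeaderSize = 3;  // bytes in the fixed-size header (assumed)
constexpr uint8_t kNumRegs = 32;   // size of the logical register file (assumed)

// Decode a header from the receive buffer. Returns false if the buffer is
// too short, the command is unknown, or the register range does not fit.
bool parseHeader(const uint8_t *buf, size_t len, RegHeader &out) {
  assert(buf != nullptr);
  if (len < kHeaderSize) return false;
  out.command = buf[0];
  out.startReg = buf[1];
  out.count = buf[2];
  if (out.command != CMD_READ && out.command != CMD_WRITE) return false;
  if (out.count == 0) return false;
  // Widen before adding so startReg + count cannot wrap around.
  return static_cast<uint16_t>(out.startReg) + out.count <= kNumRegs;
}

// Number of payload bytes to expect in the second chip-select window,
// assuming one byte per register.
size_t payloadLength(const RegHeader &h) { return h.count; }
```

The idea would be to call parseHeader() from the RX completion handler and use payloadLength() as the size of the next DMA buffer, so the slave knows the transfer length before the master reselects CS.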
 
Forgot to show the outputs... On the Master side: just showing the data coming back from Slave:
Code:
YYY: 1c 1d 1e 1f 20 12 13 14 15 16 17 18 19 1a 1b 1c 
YYY: 1d 1e 1f 20 21 13 14 15 16 17 18 19 1a 1b 1c 1d 
YYY: 1e 1f 20 21 22 14 15 16 17 18 19 1a 1b 1c 1d 1e 
YYY: 1f 20 21 22 23 15 16 17 18 19 1a 1b 1c 1d 1e 1f 
YYY: 20 21 22 23 24 16 17 18 19 1a 1b 1c 1d 1e 1f 20 
YYY: 21 22 23 24 25 17 18 19 1a 1b 1c 1d 1e 1f 20 21 
YYY: 22 23 24 25 26 18 19 1a 1b 1c 1d 1e 1f 20 21 22

The Slave side shows both the data it received and the data it sent back:
Code:
XXX: 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 
YYY: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 
XXX: 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 
YYY: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 
XXX: 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 
YYY: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 
XXX: 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 
YYY: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 
XXX: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 
YYY: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f

Argh, looks like it is wrong again :( ... Need to look again at where the code updates the XXX array on the TX side, as the data is wrong when sent...
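A small helper can pin down where the two dumps diverge instead of eyeballing the hex. This is only a sketch, and the name firstMismatch is made up for illustration. It compares what the slave thinks it sent (an XXX line) against what the master received (a YYY line):

```cpp
#include <cstddef>
#include <cstdint>

// Returns the index of the first byte that differs between the two
// buffers, or len if all len bytes match.
size_t firstMismatch(const uint8_t *sent, const uint8_t *received, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (sent[i] != received[i]) return i;
  }
  return len;
}
```

Fed the first slave XXX line and the first master YYY line from the dumps above, this returns 5: the first five bytes arrive as sent and everything after that differs.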



screenshot.jpg

I have some of the SPI pins on both boards shown in the logic analyzer. Channel 7 shows when the handler I set up with attachInterrupt on the chip select pin fires on the falling edge. The pin is still left in SPI mode, but I do get this interrupt. During that pulse is when I update the output data for the code...

The text output shows I am updating the whole array, but the SPI data is not right... So maybe back to the drawing board again. Note: it works more easily when all of the data and registers are updated outside of this interrupt. Timings are much easier... Will probably revert to that...
 