teensy 3.0 SPI with DMA -- nice try

Status
Not open for further replies.

manitou

Senior Member+
Does anyone have a working snippet that uses DMA with SPI on the Teensy 3.0? Here is my non-working attempt ...

Code:
//  teensy  SPI + DMA   TODO
//   CS,MOSI,MISO,CLK  pins 10-13
//   dma ch0 xmit (mem to SPI)     ch1 recv  (SPI to mem)
//  try  byte, 16-bit, and 32-bit dma transfers  ? SPI 16-bit
//  do exchange, read/write 0xff,  write/sink read
// don't need ISR for now, spin wait
// ref https://github.com/hughpyle/teensy-i2s  our mem2mem


#include <SPI.h>

#define CSpin 10
#define RSER_RFDF_DIRS (1<<16)
#define RSER_RFDF_RE (1<<17)
#define RSER_TFFF_DIRS (1<<24)
#define RSER_TFFF_RE (1<<25)


volatile int DMAdone=0;
#define DMA_CINT_CINT(n) ((uint8_t)(n & 3)<<0) // Clear Interrupt Request


void dma_ch0_isr(void)
{
  DMAdone=1;
  DMA_CINT = DMA_CINT_CINT(0); // use the Clear Intr. Request register
}

void dma_ch1_isr(void)
{
  DMAdone=1;
  DMA_CINT = DMA_CINT_CINT(1); // use the Clear Intr. Request register
}

void spidma_init() {
		// set size, and SPI address  for xmit and recv dma
		//  TODO enable DMA in SPI regs ...
		SPI0_RSER =  RSER_RFDF_DIRS | RSER_RFDF_RE | RSER_TFFF_DIRS | RSER_TFFF_RE;
    	DMA_TCD0_DADDR = &SPI0_PUSHR;
    	DMA_TCD1_SADDR = &SPI0_POPR;
        DMA_TCD0_ATTR = DMA_TCD_ATTR_SSIZE(0) | DMA_TCD_ATTR_DSIZE(0); //8 bit
        DMA_TCD1_ATTR = DMA_TCD_ATTR_SSIZE(0) | DMA_TCD_ATTR_DSIZE(0); //8 bit
        DMA_TCD0_SLAST = 0;
        DMA_TCD1_SLAST = 0;
        DMA_TCD0_DOFF = 1;  // increment
        DMA_TCD1_DOFF = 1;  // increment
        DMA_TCD0_CITER_ELINKNO = 1;
        DMA_TCD1_CITER_ELINKNO = 1;
        DMA_TCD0_DLASTSGA = 0;
        DMA_TCD1_DLASTSGA = 0;
        DMA_TCD0_BITER_ELINKNO = 1;
        DMA_TCD1_BITER_ELINKNO = 1;
        DMAdone=0;
//      DMA_TCD1_CSR = DMA_TCD_CSR_INTMAJOR; // interrupt on major loop completion
}

void myspi_init() {
	// set up SPI speed and dma   ? FIFO
	SPI.begin();
	SPI.setBitOrder(MSBFIRST);
	SPI.setDataMode(SPI_MODE0);
	pinMode(CSpin,OUTPUT);

	spidma_init();
}

void spidma_transfer(char *inbuf, char *outbuf, int bytes) {
}

void spidma_write(char *outbuf, int bytes) {
	static int sink;
	digitalWrite(CSpin,LOW);
    DMA_TCD0_SADDR = outbuf;
    DMA_TCD1_DADDR = &sink;    // ignore bytes from SPI
    DMA_TCD0_NBYTES_MLNO = bytes;
    DMA_TCD1_NBYTES_MLNO = bytes;
    DMA_TCD1_CSR |= (1<<6) | (1<<3);  // active and clear SERQ
    DMA_TCD0_CSR |= (1<<6) | (1<<3);   // start transmit
	DMA_SERQ =0;  // enable channel
	DMA_SERQ =1;  // enable channel
	while (!(DMA_TCD0_CSR & DMA_TCD_CSR_DONE)) /* wait */ ;
	digitalWrite(CSpin,HIGH);
}

void spidma_read(char *inbuf, int bytes) {
	static int whatever = 0xffffffff;
	digitalWrite(CSpin,LOW);
    DMA_TCD0_SADDR = &whatever;
    DMA_TCD1_DADDR = inbuf;
    DMA_TCD0_NBYTES_MLNO = bytes;
    DMA_TCD1_NBYTES_MLNO = bytes;
    DMA_TCD1_CSR |= (1<<6);
    DMA_TCD0_CSR |= (1<<6);   // start transmit
	DMA_SERQ =0;  // enable channel
	DMA_SERQ =1;  // enable channel
	while (!(DMA_TCD1_CSR & DMA_TCD_CSR_DONE)) /* wait */ ;
	digitalWrite(CSpin,HIGH);
}


void setup() {
	Serial.begin(9600);
	while(!Serial.available()){
		Serial.println("hit a key");
		delay(1000);
	}
	Serial.read();
	Serial.println("ok");
	myspi_init();
	Serial.println(SPI0_CTAR0,HEX);
	Serial.println(SPI0_RSER,HEX);
		delay(2000);
}

void loop() {
	char buff[1000];
	unsigned int t1,t2;

	Serial.println("looping");
	t1 = micros();
	spidma_write(buff,sizeof(buff));
	t2 = micros() - t1;
	Serial.println(t2);
	Serial.println(SPI0_CTAR0,HEX);
	Serial.println(SPI0_RSER,HEX);
	delay(2000);
}
 
From looking at your code in spidma_write, you're waiting for channel 1 to finish, but you're using channel 0 to write the output. I suspect channel 1 is going through its nBytes MUCH faster than channel 0. Change the next-to-last line there to:

while (!(DMA_TCD0_CSR & DMA_TCD_CSR_DONE)) /* wait */ ;

to wait for the write to complete. (Nice timing on this: I've been going through the DMA docs tonight and am working on a set of classes/functions to control DMA, at least as far as I need in the short term for the FastSPI_LED library.)
 

Yes, I have tried checking DONE on channel 0, but then spidma_write() terminates even more quickly. I've tried waiting for DONE on both channels, disabling the RXFIFO, messin' with BWC (bandwidth control), and activating only DMA channel 0 ... neither very systematic nor successful. I continue to explore other combinations/settings.
 
I've added some DMAMUX setup (GitHub source updated), based on the K40 discussion/example here:
https://community.freescale.com/thread/304835

but it still seems to run at "memory" speed (too fast); the DMA is not being paced by the SPI.
I am now seeing an error on channel 1. Here are some registers after one iteration:
Code:
looping
66
SPI0_CTAR0 0x38011001
SPI0_RSER 0x3030000
SPI0_SR 0xC2080002
SPI0_MCR 0x80000000
SPI0_TCR 0x280000
DMA_TCD0_CSR 0x8
DMA_TCD1_CSR 0x88
DMA_ES 0x80000002

Experiments continue ...

(the I2S DMA example also used DMAMUX)
 
Just for the record, SPI+DMA is a lost cause. Here's a quote from the Freescale forum:
https://community.freescale.com/thread/314019
Freescale DSPI 'DMA' is a 'little bit inconvenient'. The loading of SPIx_PUSHR FIFO registers requires 32-bit writes, the top-half of which are SPI-controls. Thus, if you want to DMA-out a 'block', you have to intersperse your data as the 'bottom byte or word' in these 32-bit words, meaning your data-block is 'non contiguous'. This puts an 'extra step' in your data-block handling to interleave data & controls that MAY preclude any advantage you were hoping to gain from DMA.
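To make the quoted limitation concrete, here is a minimal host-side sketch of the interleaving it describes: each data byte lands in the low half-word of a 32-bit PUSHR word whose upper half carries the command bits. The bit positions (CONT = bit 31, CTAS = bits 30..28, EOQ = bit 27, PCS starting at bit 16) are assumed from the K20 DSPI documentation and the SPI_PUSHR_* macros in the Teensyduino core headers; `packPushr` is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed DSPI PUSHR layout (check against the K20 reference manual).
constexpr uint32_t PUSHR_CONT = 0x80000000u;  // keep PCS asserted after this frame
constexpr uint32_t PUSHR_EOQ  = 0x08000000u;  // end of queue
constexpr uint32_t pushrCtas(uint32_t n) { return (n & 7u) << 28; }      // CTAR select
constexpr uint32_t pushrPcs(uint32_t mask) { return (mask & 0x1Fu) << 16; } // 5 PCS lines assumed

// Hypothetical helper: expand a contiguous byte block into the non-contiguous
// command+data words a DMA channel would have to write to PUSHR. Every word
// except the last keeps CS asserted (CONT); the last one sets EOQ instead.
std::vector<uint32_t> packPushr(const uint8_t* data, std::size_t n,
                                uint32_t pcsMask, uint32_t ctar)
{
  std::vector<uint32_t> words;
  words.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    uint32_t cmd = pushrCtas(ctar) | pushrPcs(pcsMask);
    cmd |= (i + 1 < n) ? PUSHR_CONT : PUSHR_EOQ;
    words.push_back(cmd | data[i]);  // payload goes into the low half-word
  }
  return words;
}
```

The cost shows up in memory: every payload byte becomes four bytes of PUSHR words, so a 1000-byte block needs a 4000-byte staging buffer before the DMA can even start.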

sigh
 
I've succeeded partially. My experience is that 8-bit writes to PUSHR do work, but there has to be some glue code that manages start and end of a transmission. I can post the code later, but it's sketchy and requires some tweaking of the linker script. Apart from that, it happily feeds a display in the background.

Regards

Christoph
 
OK, I'll have to clean up the code a bit before I post it here, because it contains leftovers of my previous attempts at using the hardware chip selects (don't try that at home). But I can outline it first:

Major DMA Limitation
There's a bug in the silicon that makes the scatter-gather feature of the DMA useless. So we're limited to one major loop per SPI "transfer" (that is, a transfer of multiple words) unless an ISR reloads the TCD or channel linking is used.
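The one-major-loop limit matters because, with channel linking disabled (ELINK = 0), CITER/BITER is a 15-bit count; that is also why the driver code later in this thread rejects transfers of 0x8000 bytes or more. A minimal sketch of splitting a longer transfer into major loops that an ISR would program one after another, assuming 1-byte minor loops so that one iteration moves one byte; `splitMajorLoops` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// CITER/BITER are 15 bits wide when ELINK = 0 (assumption: 1-byte minor
// loops, so one major-loop iteration moves one byte).
constexpr uint32_t kMaxMajorLoop = 0x7FFF;

// Hypothetical helper: split a transfer of `total` bytes into the major-loop
// counts an ISR would load into CITER/BITER one after another, since
// scatter-gather cannot be used to chain the TCDs.
std::vector<uint16_t> splitMajorLoops(uint32_t total)
{
  std::vector<uint16_t> chunks;
  while (total > 0) {
    uint32_t n = (total > kMaxMajorLoop) ? kMaxMajorLoop : total;
    chunks.push_back(static_cast<uint16_t>(n));
    total -= n;
  }
  return chunks;
}
```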

SPI Limitations
  • The SPI hardware has chip select signals, but they need to be activated by asserting the appropriate bits in the upper half-word of PUSHR.
  • Short writes to PUSHR are filled with zeros, i.e. they will clear previously set bits in the upper half-word of PUSHR. It would be cool if the SPI kept these bits during short writes.
  • The above two points are also true for the transfer attribute select bits and other control bits in PUSHR.
In short, this thing can't do much more than an AVR SPI if you want DMA. In fact, the resulting code I've come up with looks pretty much like my AVR code that handles SPI transfers in the background. On the plus side, as we can't use the hardware CS signals, we are forced to implement our own CS scheme, which in turn can be anything we like. We can even place the chip selects on an I²C port expander, if such madness is necessary.

An SPI transfer is described by this class (could also be a struct, but I might create a proper interface some day):
Code:
class SpiTransfer
{
  public:
    class Callback
    {
      public:
      /**
        select: true if a chip should be selected
          false otherwise (deselect)
      **/
      virtual void operator()(bool select)
      {

      };
    };
    enum State
    {
      idle,
      done,
      txUnderflow,
      pending,
      inProgress
    };
    SpiTransfer(const uint8_t* pSource,
                const uint16_t& size,
                volatile uint8_t* pDest = NULL,
                const uint32_t& flags = 0, // might toss this out
                Callback* cb = NULL
    ) : m_state(State::idle),
    m_pSource(pSource),
    m_size(size),
    m_pDest(pDest),
    m_flags(flags),
    m_pNext(NULL),
    m_pCallback(cb) {};
    bool busy() const {return ((m_state == State::pending) || (m_state == State::inProgress));}
//  private:
    volatile State m_state;
    const uint8_t* m_pSource;
    uint16_t m_size;
    volatile uint8_t* m_pDest;
    uint32_t m_flags;
    SpiTransfer* m_pNext;
    Callback* m_pCallback;
};
SPI transfers read from SpiTransfer::m_pSource into PUSHR and from POPR into SpiTransfer::m_pDest. If m_pSource is NULL, the lower half of m_flags is used as a fill value (i.e. in order to transfer 6 kB of 0xFF's to a slave, you don't need to prepare a 6 kB buffer first). If m_pDest is NULL, POPR is read anyway, but the value is discarded (my DMA-SPI class has a static devNull member for that).
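The NULL handling comes down to picking a DMA address and an address increment for each side of the transfer. A host-side sketch of that decision, in the spirit of what beginNextTransfer() does with SOFF/DOFF in the driver code further down; the `DmaSide` struct and the `txSide`/`rxSide` helpers are hypothetical, and addresses are kept as plain pointers rather than the `uint32_t` register values.

```cpp
#include <cstdint>

// Hypothetical description of one side of a DMA channel: where it points
// and how far the address moves after each byte (SOFF or DOFF).
struct DmaSide {
  const volatile uint8_t* addr;
  int16_t offset;
};

// TX: walk the real buffer, or keep re-reading the single fill byte.
DmaSide txSide(const uint8_t* source, const uint8_t* fill)
{
  if (source != nullptr) return {source, 1};
  return {fill, 0};
}

// RX: walk the real buffer, or keep overwriting one scratch byte (devNull).
DmaSide rxSide(volatile uint8_t* dest, volatile uint8_t* devNull)
{
  if (dest != nullptr) return {dest, 1};
  return {devNull, 0};
}
```

An offset of 0 is what makes the fill value and the devNull sink work: the DMA keeps hitting the same byte for the whole major loop.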

The upper half-word of the flags was used to hold chip select and transfer attribute select bits, but that part is unused now.

The DMA-SPI manages an intrusive linked list of SpiTransfer objects; this is what m_pNext is used for. It points to the next transfer in the queue; if it is NULL, no further transfer follows.
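Because the list is intrusive, queueing a transfer needs no memory allocation: each transfer carries its own link, and the driver only keeps head and tail pointers. A minimal single-threaded sketch of such a FIFO, with hypothetical names `Node` and `TransferQueue` standing in for SpiTransfer and the driver; on the Teensy, push() and pop() would have to run with interrupts disabled, because the DMA ISR also takes transfers off the queue.

```cpp
// Stand-in for SpiTransfer: only the intrusive link matters here.
struct Node {
  int id;
  Node* m_pNext;
};

// Hypothetical FIFO of transfers linked through their own m_pNext pointers.
class TransferQueue {
 public:
  void push(Node& n) {
    n.m_pNext = nullptr;
    if (m_tail == nullptr) {
      m_head = &n;           // queue was empty
    } else {
      m_tail->m_pNext = &n;  // append behind the current tail
    }
    m_tail = &n;
  }
  Node* pop() {
    Node* n = m_head;
    if (n != nullptr) {
      m_head = n->m_pNext;
      if (m_head == nullptr) m_tail = nullptr;  // queue is empty again
    }
    return n;
  }
  bool empty() const { return m_head == nullptr; }

 private:
  Node* m_head = nullptr;
  Node* m_tail = nullptr;
};
```

One consequence of this design is that a transfer object must stay alive until it has left the queue, which is why the examples in this thread spin on busy() before a stack-allocated transfer goes out of scope.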

The callback class is just an abstract unary functor with a bool argument. If this argument is true, a chip must be selected (this is where we can implement our own chip select scheme to our taste); if it is false, the chip must be deselected. An implementation of this functor can also be used to call pre- and post-transfer code that does additional stuff like setting application flags and such. Beware: might be called in an ISR context. Or not. Depends.

Basic program flow during a transfer:
  • The transfer is registered with the DMA-SPI driver and started immediately (queue empty) or queued for later handling. The transfer is now State::pending.
  • When a transfer is started, it is State::inProgress. I might add a cancel() function that removes a transfer from the queue.
  • m_pCallback(true) is called.
  • The PUSHR FIFO is filled from m_pSource or from a fill variable. The DMA source increment size is adjusted accordingly.
  • Remaining bytes are handled by the DMA. The previous filling of the FIFO might not be necessary, it just felt right.
  • The last byte is not handled by the DMA, but in the Tx DMA channel ISR. This is also a leftover from my previous attempts. The Tx DMA channel ISR also clears the SPI's hardware request enable flag so that the SPI requests no more transfers (Transmit FIFO Fill Flag, TFFF).
  • The Rx DMA channel is configured to receive the requested number of bytes into the destination. This can be either a real buffer or a dump variable, with the destination increment size adjusted accordingly.
  • When all bytes have been received, the current transfer is State::done. The Tx FIFO is empty now (this is why it can be filled when a transfer is started, see above)
  • m_pCallback(false) is called.
  • A pending transfer is started.
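The flow above can be checked off-target with a small simulation. This is a hedged host-side sketch, not the driver itself: it assumes a loopback SPI (MOSI tied to MISO), so every received byte equals the byte just sent, uses a logging functor in place of a real chip select, and replaces the two DMA channels with a plain loop. `SimTransfer`, `LogCS` and `runTransfer` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

enum class State { idle, done, pending, inProgress };

// Hypothetical CS functor that records select ('S') and deselect ('D') calls.
struct LogCS {
  std::string log;
  void operator()(bool select) { log += select ? 'S' : 'D'; }
};

struct SimTransfer {
  const uint8_t* source;  // nullptr: send the fill value instead
  uint16_t size;
  uint8_t* dest;          // nullptr: discard received bytes
  uint8_t fill;
  LogCS* cs;
  State state;
};

// Walk one transfer through the states listed above.
void runTransfer(SimTransfer& t)
{
  t.state = State::inProgress;
  if (t.cs) (*t.cs)(true);                           // select the chip
  uint8_t devNull = 0;
  for (uint16_t i = 0; i < t.size; ++i) {
    uint8_t tx = t.source ? t.source[i] : t.fill;    // Tx DMA: buffer or fill byte
    uint8_t rx = tx;                                 // loopback: MISO sees MOSI
    if (t.dest) t.dest[i] = rx; else devNull = rx;   // Rx DMA: buffer or sink
  }
  (void)devNull;
  t.state = State::done;                             // all bytes received
  if (t.cs) (*t.cs)(false);                          // deselect the chip
}
```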

I've made an attempt to make this whole thing templated so that it's possible to create a DMA-SPI object for any existing hardware SPI. I did this because I expected to use chips with more than one hardware SPI, which became incredibly easy with the new Teensy 3.1. The same goes for the DMA channel numbers; these are also template parameters. I'll have a closer look at the new chip to see whether it has more than one DMAMUX, but that will be easy to add.

Again, I'll post the code when it's more readable and cleaned up. This summary already revealed some frightening frankensteinization which I want to remove first.
 
While cleaning up the code I noticed that it was way too complicated, so I simplified it a bit and split it into several files. First, some utility classes.

We need a callback object that handles the chip select signals, because the hardware chip selects (and other transfer attributes) cannot be used with DMA. The interface consists of operator()(bool select), which is called with select=true to assert a chip select, and with select=false to deassert it.

cs.h:
Code:
#ifndef _CS_H_
#define _CS_H_

#include <Arduino.h> // pinMode(), digitalWriteFast()

namespace SPI
{
/** Just the chip select interface.
    select == true: assert the chip select, false: deassert it.
    Pure virtual, so no out-of-line definition is needed for the vtable.
**/
class AbstractCS
{
  public:
    virtual void operator()(bool select) = 0;
    virtual ~AbstractCS() {}
};

/** Does nothing; useful when chip selects are managed manually. **/
class DummyCS : public AbstractCS
{
  public:
    void operator()(bool) {}
};

/** An active low chip select class. The constructor configures the pin
    as an output and drives it high (idle).
**/
template<unsigned int pin>
class ActiveLowCS : public AbstractCS
{
  public:
    ActiveLowCS()
    {
      pinMode(pin, OUTPUT);
      digitalWriteFast(pin, 1);
    }
    void operator()(bool select)
    {
      // active low: select drives the pin low, deselect drives it high
      digitalWriteFast(pin, select ? 0 : 1);
    }
};

} // namespace SPI

#endif // _CS_H_
So in order to create an active low chip select pin, just do
Code:
class CS9 : public SPI::ActiveLowCS<9> {};
CS9 cs9;
You can create multiple instances of CS9; each constructor configures the pin as an output driven high, which is harmless.

Now this doesn't have much to do with SPI or DMA. Here's my current Transfer descriptor class.
transfer.h:
Code:
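#ifndef _SPITRANSFER_H_
#define _SPITRANSFER_H_

#include <stddef.h> // NULL
#include <stdint.h> // uint8_t, uint16_t
#include "cs.h"     // SPI::AbstractCS
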
namespace SPI
{
class Transfer
{
  public:
    enum State
    {
      idle,
      done,
      pending,
      inProgress
    };
    Transfer(const uint8_t* pSource,
                const uint16_t& size,
                volatile uint8_t* pDest = NULL,
                const uint8_t& fill = 0,
                AbstractCS* cb = NULL
    ) : m_state(State::idle),
    m_pSource(pSource),
    m_size(size),
    m_pDest(pDest),
    m_fill(fill),
    m_pNext(NULL),
    m_pSelect(cb) {};
    bool busy() const {return ((m_state == State::pending) || (m_state == State::inProgress));}
//  private:
    volatile State m_state;
    const uint8_t* m_pSource;
    uint16_t m_size;
    volatile uint8_t* m_pDest;
    uint8_t m_fill;
    Transfer* m_pNext;
    AbstractCS* m_pSelect;
};

} // namespace SPI

#endif // _SPITRANSFER_H_
I might add a transfer attribute member to this if that becomes necessary.

I'll skip my SPI class and post my dmaSpi class in the next post. It's not hard to adapt it to the new spi_t that Paul added to teensyduino.
 
So now that we can describe which chip we want to read from or write to (using the CS classes) and where data comes from and goes to (using the transfer class), we can let the DMA handle things in the background.

dmaSpi.h:
Code:
#ifndef _DMASPI_H_
#define _DMASPI_H_

#include <dmamux/dmamux.h>
#include <edma/edma.h>
#include <dspi/dspi.h>

#include <util/atomic.h>

#define DMASPI_TXCHAN 1
#define DMASPI_RXCHAN 0
#define MAKE_DMA_CHAN_ISR(n) void dma_ch ## n ## _isr()
#define DMA_CHAN_ISR(n)  MAKE_DMA_CHAN_ISR(n)

template
<
  DSPI_REGS& spi,
  EDMA_REGS& dmaBase,
  uint8_t txDmaChannel,
  uint8_t rxDmaChannel
>
class DmaSpi
{
  public:
    static bool registerTransfer(SPI::Transfer& transfer)
    {
      if ((transfer.busy())
       || (transfer.m_size == 0)       // nothing to transfer, reject
       || (transfer.m_size >= 0x8000)) // max CITER/BITER count with ELINK = 0 is 0x7FFF, so reject
      {
        return false;
      }
      transfer.m_state = SPI::Transfer::State::pending;
      transfer.m_pNext = NULL;
      ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
      {
        if (m_pCurrentTransfer == NULL)
        {
          // no pending transfer
          m_pNextTransfer = &transfer;
          m_pLastTransfer = &transfer;
          beginNextTransfer();
        }
        else
        {
          // add to pending transfers
          if (m_pNextTransfer == NULL)
          {
            m_pNextTransfer = &transfer;
          }
          else
          {
            m_pLastTransfer->m_pNext = &transfer;
          }
          m_pLastTransfer = &transfer;
        }
      }
      return true;
    }
    static void begin()
    {
      // disable requests, clear relevant flags
      spi.MCR |= SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF;
      spi.RSER = 0;

      SIM_SCGC6 |= SIM_SCGC6_DMAMUX;
      SIM_SCGC7 |= SIM_SCGC7_DMA;

      DMA_ERR = 0x0F;
      DMAMUX0.CHCFG[txDmaChannel] = 0;
      DMAMUX0.CHCFG[rxDmaChannel] = 0;
      DMA_CINT = DMA_CINT_CINT(rxDmaChannel);

      // enable requests
      // Tx, select SPI Tx FIFO
      DMAMUX0.CHCFG[txDmaChannel] = DMAMUX_ENABLE | DMAMUX_SOURCE_SPI0_TX;
      // RX, select SPI RX FIFO
      DMAMUX0.CHCFG[rxDmaChannel] = DMAMUX_ENABLE | DMAMUX_SOURCE_SPI0_RX;
      NVIC_ENABLE_IRQ((IRQ_DMA_CH0 + rxDmaChannel)); // double parentheses needed by macro
      spi.RSER = SPI_RSER_TFFF_RE | SPI_RSER_TFFF_DIRS | SPI_RSER_RFDF_RE | SPI_RSER_RFDF_DIRS;

      // configure the TX transfer control descriptor (values that never change)
      m_dma.TCD[txDmaChannel].ATTR = DMA_TCD_ATTR_SSIZE(DMA_TCD_ATTR_SIZE_8BIT) | DMA_TCD_ATTR_DSIZE(DMA_TCD_ATTR_SIZE_8BIT);
      m_dma.TCD[txDmaChannel].NBYTES = 1;
      m_dma.TCD[txDmaChannel].SLAST = 0;
      m_dma.TCD[txDmaChannel].DADDR = (uint32_t)&spi.PUSHR;
      m_dma.TCD[txDmaChannel].DOFF = 0;
      m_dma.TCD[txDmaChannel].DLASTSGA = 0;
      m_dma.TCD[txDmaChannel].CSR = DMA_TCD_CSR_DREQ;

      m_dma.TCD[rxDmaChannel].SADDR = (uint32_t)&spi.POPR;
      m_dma.TCD[rxDmaChannel].SOFF = 0;
      m_dma.TCD[rxDmaChannel].ATTR = DMA_TCD_ATTR_SSIZE(DMA_TCD_ATTR_SIZE_8BIT) | DMA_TCD_ATTR_DSIZE(DMA_TCD_ATTR_SIZE_8BIT);
      m_dma.TCD[rxDmaChannel].NBYTES = 1;
      m_dma.TCD[rxDmaChannel].SLAST = 0;
      m_dma.TCD[rxDmaChannel].DLASTSGA = 0;
      m_dma.TCD[rxDmaChannel].CSR = DMA_TCD_CSR_DREQ | DMA_TCD_CSR_INTMAJOR;
    }
    bool busy() const
    {
      return (m_pCurrentTransfer != NULL);
    }
    static void m_rxIsr()
    {
      // transfer finished, start next one if available
      DMA_CINT = DMA_CINT_CINT(rxDmaChannel);
      DMA_CERQ = DMA_CERQ_CERQ(txDmaChannel);

      /** TBD: RELEASE SPI HERE IF APP REQUESTED IT**/
      /** TBD: IF NOT, PROCEED **/

      if (m_pCurrentTransfer->m_pSelect != NULL)
      {
        SPI::AbstractCS& cb = *(m_pCurrentTransfer->m_pSelect);
        cb(false);
      }
      m_pCurrentTransfer->m_state = SPI::Transfer::State::done;
      m_pCurrentTransfer = NULL; // clear only after the transfer has been marked done
      beginNextTransfer();
    }
    static void write(const uint32_t& flagsVal)
    {
      SPI::Transfer transfer(NULL, 1, NULL, flagsVal);
      registerTransfer(transfer);
      while(transfer.busy())
      ;
    }
    static uint8_t read(const uint32_t& flagsVal)
    {
      volatile uint8_t result;
      SPI::Transfer transfer(NULL, 1, &result, flagsVal);
      registerTransfer(transfer);
      while(transfer.busy())
      ;
      return result;
    }
  private:
    static void beginNextTransfer()
    {
      if (m_pNextTransfer == NULL)
      {
        /** TBD: UNLOCK SPI **/
        return;
      }
      m_pCurrentTransfer = m_pNextTransfer;
      m_pCurrentTransfer->m_state = SPI::Transfer::State::inProgress;
      m_pNextTransfer = m_pNextTransfer->m_pNext;
      if (m_pNextTransfer == NULL)
      {
        m_pLastTransfer = NULL;
      }

      /** TBD: Lock SPI **/

      /** Call Chip Select Callback **/
      if (m_pCurrentTransfer->m_pSelect != NULL)
      {
        SPI::AbstractCS& cb = *(m_pCurrentTransfer->m_pSelect);
        cb(true);
      }

      /** clear SPI flags **/
      spi.SR = 0xFF0F0000;
      spi.MCR |= SPI_MCR_CLR_RXF | SPI_MCR_CLR_TXF;
      spi.TCR = 0;

      /** configure Rx DMA **/
      if (m_pCurrentTransfer->m_pDest != NULL)
      {
        // real data sink
        m_dma.TCD[rxDmaChannel].DADDR = (uint32_t)(m_pCurrentTransfer->m_pDest);
        m_dma.TCD[rxDmaChannel].DOFF = 1;
      }
      else
      {
        // use devNull
        m_dma.TCD[rxDmaChannel].DADDR = (uint32_t)&m_devNull;
        m_dma.TCD[rxDmaChannel].DOFF = 0;
      }
      m_dma.TCD[rxDmaChannel].CITER = m_dma.TCD[rxDmaChannel].BITER = m_pCurrentTransfer->m_size;
      DMA_SERQ = rxDmaChannel;

      /** configure Tx DMA **/
      if (m_pCurrentTransfer->m_pSource != NULL)
      {
        // real data source
        m_dma.TCD[txDmaChannel].SADDR = (uint32_t)(m_pCurrentTransfer->m_pSource);
        m_dma.TCD[txDmaChannel].SOFF = 1;
      }
      else
      {
        // dummy source
        m_dma.TCD[txDmaChannel].SADDR = (uint32_t)&(m_pCurrentTransfer->m_fill);
        m_dma.TCD[txDmaChannel].SOFF = 0;
      }
      m_dma.TCD[txDmaChannel].CITER = m_dma.TCD[txDmaChannel].BITER = m_pCurrentTransfer->m_size;
      DMA_SERQ = txDmaChannel;
    }
    static constexpr EDMA<dmaBase>& m_dma = (EDMA<dmaBase>&)dmaBase;

    static Tcd m_tcd; // used for long transfers
    static SPI::Transfer* volatile m_pCurrentTransfer;
    static SPI::Transfer* volatile m_pNextTransfer;
    static SPI::Transfer* volatile m_pLastTransfer;
    static volatile uint32_t m_push;
    static uint8_t m_fill;
    static volatile uint8_t m_devNull;
};

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
Tcd DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_tcd;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
SPI::Transfer* volatile DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_pCurrentTransfer = NULL;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
SPI::Transfer* volatile DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_pNextTransfer = NULL;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
SPI::Transfer* volatile DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_pLastTransfer = NULL;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
volatile uint8_t DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_devNull;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
volatile uint32_t DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_push;

template<DSPI_REGS& spi_, EDMA_REGS& dmaBase, uint8_t txDmaChannel, uint8_t rxDmaChannel>
uint8_t DmaSpi<spi_, dmaBase, txDmaChannel, rxDmaChannel>::m_fill;

typedef DmaSpi<_SPI0, _DMA, DMASPI_TXCHAN, DMASPI_RXCHAN> DMASPI;

extern DMASPI dmaSpi;

#endif // _DMASPI_H_

and, unfortunately, we need a dmaSpi.cpp:
Code:
#include "dmaSpi.h"
DmaSpi<_SPI0, _DMA, DMASPI_TXCHAN, DMASPI_RXCHAN> dmaSpi;

DMA_CHAN_ISR(DMASPI_RXCHAN)
{
  DmaSpi<_SPI0, _DMA, DMASPI_TXCHAN, DMASPI_RXCHAN>::m_rxIsr();
}
The Transfer objects are handled by dmaSpi in the order they are registered using DmaSpi::registerTransfer(). The class manages an intrusive list of transfers (using SPI::Transfer::m_pNext). If the DmaSpi is not busy, a pending transfer is started using DmaSpi::beginNextTransfer(). This method just sets the DMA source and destination addresses for the transmit and receive channels, which are specified as template parameters. DmaSpi assumes that it is the only part of the application that uses these DMA channels, so some channel attributes remain constant throughout the runtime.

The first thing DmaSpi::beginNextTransfer() does is update the Transfer state to inProgress and do some queue management. It then calls the chip select object and clears pending SPI flags that could interfere with DMA operation. Then the channel source and destination attributes are updated and the request enable flags are set.

The DMA will now transmit and receive data using the SPI. When the RX DMA channel has finished (this also means that everything has been transmitted), its ISR is called (DmaSpi::m_rxIsr()). Requests are disabled, the chip select object is called and the next transfer is started.

Now to the initialization of the whole thing: this is in DmaSpi::begin(). It basically just clears all SPI flags that could interfere with the DMA, enables the DMA and DMAMUX, clears the channel configuration for TX and RX, and then configures them to be activated by the SPI TX FIFO FILL and RX FIFO DRAIN flags, respectively. The SPI is told not to generate an interrupt, but a DMA request. The rest is initialization of the transfer control descriptors with those values that don't change during runtime (and that's a lot!).

Hope this helps! I've not included a working example because I've made some modifications to mk20dx128.h, the linker script, and some other bits and pieces. BUT they are all cosmetic tweaks to make this template work. Translating this into a plain class should not be a problem. Here is how I use it:

Code:
class CS9 : public SPI::ActiveLowCS<9> {};
CS9 cs9;

uint8_t source[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint8_t dest[10] = {0};

SPI0.begin(); // configure as master, as usual
SPI0.CTAR[0] = SPI_CTAR_FMSZ(7) | SPI_CTAR_PBR(0) | SPI_CTAR_BR(0); // define transfer attributes

dmaSpi.begin();

SPI::Transfer transfer(source, 10, dest, 0xFF, &cs9);
dmaSpi.registerTransfer(transfer);
while(transfer.busy()); // you can also do something useful here, if the transfer is large
It's also possible to use NULL for source and destination addresses in the transfer constructor, the DmaSpi will handle that using the fill value given (0xFF in this example, for TX) and/or by discarding incoming data.

Regards

Christoph
 
One major problem with the above DMA SPI is that it's not very cooperative. You cannot "mix and match" with procedural SPI code that would, for example, simply write a byte or two to an SPI device. You'd have to set up a buffer with the data, create a suitable chip select class, and register a transfer.

I'm thinking about an easy way to do this, but it would involve creating a mutex for the SPI and a special lock class that operates on that mutex. When the application code (i.e. not the DMA SPI driver) wants to use the SPI, it would request ownership from the DMA SPI, wait for ownership, do whatever is necessary, return ownership to the DMA SPI driver and tell it to resume. Sounds simple, but I'm having a hard time coming up with "safe" code for that.
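One possible shape for that ownership handover, as a hedged host-side sketch: the DMA driver holds the bus while its queue is running, and application code takes it through an RAII lock that waits until the driver has released it. `SpiOwnership` and `SpiLock` are hypothetical names, and `std::atomic<bool>` stands in for whatever primitive (disabling interrupts, or LDREX/STREX) would be used on the Cortex-M4.

```cpp
#include <atomic>

// Hypothetical arbiter: whoever holds the flag may touch the SPI registers.
class SpiOwnership {
 public:
  bool tryAcquire() {
    bool expected = false;
    return m_owned.compare_exchange_strong(expected, true, std::memory_order_acquire);
  }
  void release() { m_owned.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> m_owned{false};
};

// Hypothetical RAII lock for application (non-ISR) code: waits until the DMA
// driver has released the bus and hands it back when the scope ends, after
// which the driver may resume its queue.
class SpiLock {
 public:
  explicit SpiLock(SpiOwnership& o) : m_o(o) {
    while (!m_o.tryAcquire()) {
      // spin: the DMA ISR can still run and release the bus meanwhile
    }
  }
  ~SpiLock() { m_o.release(); }
  SpiLock(const SpiLock&) = delete;
  SpiLock& operator=(const SpiLock&) = delete;

 private:
  SpiOwnership& m_o;
};
```

Spinning like this is only acceptable outside interrupt context: an ISR that waits on the lock would never let the owner finish, which is part of why "safe" code for this is hard to get right.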
 
Yes, it helps - where can I find the working code?

So did this help anybody yet?

Christoph,

I need some background SPI handling for a high performance data logger. So, yes, it helps. I would love to see the code working. I know I will have to modify SdFat and other involved components, but I am willing to do this.

Can you post a link to a working example or send it to me through a personal mail?

Thanks!

Jorg
 
Data logger: usually small amounts of data. It seems to me SPI DMA is beneficial for large blocks, like SD card sectors or Ethernet packets.

Why do we need scatter/gather DMA?
 
Before anyone wastes a lot of time trying to use the scatter/gather features in the DMA controller, check the errata from Freescale. Apparently that feature is broken in many of the chips.
 
Data logger


It's not about acquiring the values, it's about writing them out to the SD card. And to a radio, sending 32-byte bursts using a different CS pin, interleaved with SD card access.

Furthermore, I need the CPU free, as the data acquisition will almost saturate both I2C buses of a Teensy 3.1, generating transfer interrupts (>= 1 kHz accelerometer, >= 100 Hz gyro, >= 10 Hz pressure). These values need to be converted into float/double and formatted into packets, and then the data has to be shuffled out to the SD card and the radio. Quite some work.

So, I'd love to see SPI DMA working and - even more - to get rid of wait loops in SdFat. Will be quite some work to do, but I think it's worth it.

Anyone out there already working on a non-blocking SdFat implementation?
 
A non-blocking SDFat would be awesome, but I don't know if it's in the works anywhere.

I can post my current DMA SPI code here next week, because I don't have it on this machine. I might set up a repo then as well.
 
Testers needed

Well, here it is. I stripped all flexibility from my DMA SPI code, because it required modifications to the linker script and some teensyduino headers - I wanted to spare you that trouble. The DMA SPI is configured to use:
  • SPI0
  • DMA channel 0 for RX
  • DMA channel 1 for TX
  • SPI0_CTAR0 for default transfer attributes
The class DmaSpi0 provides a begin() method that configures SPI0, DMA and DMAMUX for operation. All you need to do is call begin() in setup() (or wherever you want to initialize it):
Code:
#include <DmaSpi.h>

void setup()
{
  DMASPI0.begin();
}
DMASPI0 is declared in DmaSpi.h, and defined in DmaSpi.cpp.

Transfers are managed with a DmaSpi0::Transfer class. The Transfer constructor has arguments for
  • data source: A pointer to const uint8_t, can be nullptr. In that case, the stuff value (see below) is used.
  • transfer size: Must be > 0 and < 0x8000. If this is violated, the transfer cannot be started.
  • data sink: A pointer to a writable uint8_t buffer, can be nullptr. In that case, received data is discarded.
  • a "stuff" value: When the data source is a nullptr, this value is sent instead of "real" data.
  • a chip select object: The mk20dxXXX chip has a bug that makes it impossible to use some features, which required this workaround (otherwise we could use the native chip select lines). Chip Select objects and classes are described below.
Chip Select classes inherit AbstractChipSelect, which is an abstract interface with methods select() and deselect(). You can implement anything you like as long as it has this interface - the DMA SPI will use it. There are two predefined classes that might be commonly used:
  • DummyChipSelect simply does nothing to select a chip. This is useful when you "just need SPI" with manual chip select management.
  • ActiveLowChipSelect is a template class that is used to create an active low chip select object. The pin (given as a template argument) is set to OUTPUT and idle (high) when the object is constructed; select() sets it low and deselect() sets it high again.

Usage:

In this example, the DMA SPI is used to transfer 100 bytes with value 0xFF to a chip that is selected with the LED line. Any returned data is discarded.
Code:
// create a chip select object. This one uses the pin that the built-in LED is connected to.
ActiveLowChipSelect<LED_BUILTIN> cs;

// create a transfer object
DmaSpi0::Transfer trx(nullptr, 100, nullptr, 0xFF, &cs);

// and register it. If the DMA SPI is idle, it will immediately start transmitting. Otherwise the transfer is added to a queue.
DMASPI0.registerTransfer(trx);

/** FREE CPU TIME YOU DIDN'T HAVE BEFORE! DO SOMETHING USEFUL! **/

// wait for the transfer to finish
while(trx.busy());
If you have meaningful data for your SPI slave, you just set the data source accordingly. Same for receiving data:
Code:
uint8_t source[128];
uint8_t destination[128];
DmaSpi0::Transfer trx(source, 128, destination, 0xFF, &cs);
...
while(trx.busy());

If you need special CTAR settings, you can create a chip select class that sets SPI0_CTAR0 accordingly before it selects your chip. The DMA SPI creates a backup of SPI0_CTAR0 before selecting a chip and restores it when the transfer is complete, so you cannot break things that way.
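A chip select class of that kind might look like the following host-side sketch. It assumes the select()/deselect() interface described above; the real AbstractChipSelect lives in the attached ChipSelect.h, so the stand-in declared here is an assumption, as are the fake register and pin variables that let the code run off-target. `CtarChipSelect` and `runWithCs` are hypothetical names; `runWithCs` mimics the backup and restore of SPI0_CTAR0 that the DMA SPI is described as doing.

```cpp
#include <cstdint>

// Stand-in for the interface in the attached ChipSelect.h (assumed shape).
class AbstractChipSelect {
 public:
  virtual void select() = 0;
  virtual void deselect() = 0;
  virtual ~AbstractChipSelect() {}
};

// Host stand-ins for SPI0_CTAR0 and a chip select pin (hypothetical).
volatile uint32_t fakeCtar0 = 0;
bool fakeCsLevel = true;  // idle high

// Hypothetical chip select that loads its own transfer attributes into
// CTAR0 right before pulling CS low.
class CtarChipSelect : public AbstractChipSelect {
 public:
  explicit CtarChipSelect(uint32_t ctar) : m_ctar(ctar) {}
  void select() override {
    fakeCtar0 = m_ctar;   // on the Teensy: SPI0_CTAR0 = m_ctar;
    fakeCsLevel = false;  // on the Teensy: digitalWriteFast(pin, 0);
  }
  void deselect() override {
    fakeCsLevel = true;   // on the Teensy: digitalWriteFast(pin, 1);
  }

 private:
  uint32_t m_ctar;
};

// Hypothetical driver-side wrapper: back up CTAR0, run the CS, restore.
void runWithCs(AbstractChipSelect& cs)
{
  uint32_t saved = fakeCtar0;
  cs.select();
  // the DMA transfer would run here
  cs.deselect();
  fakeCtar0 = saved;
}
```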

Suggestions and bug reports are highly welcome, especially for reading from a device, as I have not tried that (well, not really: I created a loopback by shorting MOSI and MISO). One of my plans is to make it more flexible by creating a template that lets you specify which SPI, which CTAR, and which DMA channels are used.

Regards

Christoph
 

Attachments

  • DmaSpi.cpp
  • DmaSpi.h
  • .h
Note: There is an attachment called ".h" in the previous post, I have no idea how that happened. The correct name is "ChipSelect.h" and I couldn't fix it by re-uploading, but the content is correct.
 
Is contention for specific DMA controller channels another example, like the SPI port, of the need for common resource management?
E.g., a need for "bool allocateDMAchannel();" and its inverse.
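One possible shape for such an allocator, as a minimal sketch: one bit per eDMA channel, with the channel count as a template parameter because it differs between chips. `DmaChannelPool` is a hypothetical name. Unlike the `bool allocateDMAchannel()` signature suggested above, `allocate()` returns the channel number (or -1), and a second overload claims a fixed channel, as a driver with hard-coded channels would need. On hardware, the read-modify-write of the mask would need interrupts disabled or an atomic operation.

```cpp
#include <cstdint>

// Hypothetical allocator: one bit per eDMA channel, set = in use.
template <unsigned kChannels>
class DmaChannelPool {
  static_assert(kChannels > 0 && kChannels <= 32, "mask is 32 bits wide");

 public:
  // Claims the lowest free channel and returns its number, or -1 if none is free.
  int allocate() {
    for (unsigned ch = 0; ch < kChannels; ++ch) {
      uint32_t bit = 1u << ch;
      if ((m_used & bit) == 0) {
        m_used |= bit;
        return static_cast<int>(ch);
      }
    }
    return -1;
  }
  // Claims a specific channel; fails if it is out of range or already taken.
  bool allocate(unsigned ch) {
    if (ch >= kChannels || (m_used & (1u << ch))) return false;
    m_used |= 1u << ch;
    return true;
  }
  // The inverse: frees a channel so another library can claim it.
  void release(unsigned ch) {
    if (ch < kChannels) m_used &= ~(1u << ch);
  }

 private:
  uint32_t m_used = 0;
};
```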
 
All internal peripherals are an example of that, because there is no theoretical limit on the number of libraries that might use a resource, and we can never be sure that a user doesn't use two libraries that access the same resource.

We have two options:
  • Let users decide if libraries go well together
  • Create some way of finding out at compile time if two libraries in a specific configuration go well together (static assertions? Lots of template magic)
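The second option might look roughly like this sketch: each library publishes the DMA channels it claims as a constexpr bitmask, and the sketch that combines libraries checks at compile time that no channel is claimed twice. This is a hypothetical convention, not an existing API; `DmaSpiChannels`, `AudioChannels` and `disjointAll` are made-up names.

```cpp
#include <cstdint>

// Hypothetical convention: each library exports the DMA channels it uses.
struct DmaSpiChannels { static constexpr uint32_t mask = (1u << 0) | (1u << 1); };
struct AudioChannels  { static constexpr uint32_t mask = (1u << 2); };

// Base case: a single accumulated mask cannot conflict with anything.
constexpr bool disjointAll(uint32_t) { return true; }

// True if no channel appears in more than one of the given masks.
template <typename... Rest>
constexpr bool disjointAll(uint32_t used, uint32_t next, Rest... rest)
{
  return (used & next) == 0 && disjointAll(used | next, rest...);
}

// Compilation fails if two libraries are configured onto the same channel.
static_assert(disjointAll(0u, DmaSpiChannels::mask, AudioChannels::mask),
              "two libraries claim the same DMA channel");
```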
 


Hi Christoph,

Can you package this into a library and then zip it? It makes it much easier to install as a new library if you use the Arduino IDE. Also add an example or two.

Thanks,
duff
 
What else (apart from zipping) do I have to do to turn this into a library? I usually only use the teensyduino files in my own build environment.

Note that the above code is not meant to be "used"; consider it alpha. Coming up with useful examples is tough, as they would all be tightly coupled to properly set-up hardware.
 