General Purpose DMA & USB DMA interactions

Oliver256

Hi everyone,

I have a question regarding the DMA capabilities of the Teensy 4.1. In particular, I’m interested in whether there is any interaction (performance- and timing-wise) between the General Purpose DMA and the USB DMA. As far as I understand from reviewing the documentation and this post from Paul (https://forum.pjrc.com/threads/66740-Teensy-4-1-USB-Serial-with-DMA), the USB DMA should be independent of the GP DMA. Do I understand this correctly?

Background:

I’m using a Teensy 4.1 to read data from two 16-bit ADCs (AD4004) at 400 kSPS. I’m using the DMA to generate the conversion clock for the ADCs and to read the ADC data from the SPI bus into a buffer.

To generate the conversion clock I’m using the periodic triggering capabilities of the DMA (together with a PIT timer) to turn a GPIO pin on and off. Everything works fine as long as I do not send data with Serial.write() from the main loop during sampling. I have confirmed that by reading the SPI data directly with my oscilloscope. I also checked the DMA write addresses during sampling (at a very low sample rate), and they look fine as well.

The plan is to have a continuous data stream to a PC. If I do so, the noise floor of the ADC rises in a random manner. I’m doing a 4096-point FFT on the PC, and every few FFTs the noise floor rises by up to 30 dB.

If I
  • sample data
  • stop sampling
  • send the data
  • sample data
  • ...

everything also works fine.

This behavior makes me wonder whether the timing of the sampling clock gets disturbed by the USB DMA.
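
To get a feel for how sensitive the noise floor is to sampling-clock timing, here is a minimal sketch of the common textbook approximation for jitter-limited SNR, SNR = -20*log10(2*pi*f_in*t_jitter). It assumes a sine input and random rms jitter as the only error source, which may or may not match what is happening here. The function name is made up for illustration.

```cpp
#include <cmath>

// Jitter-limited SNR in dB for a sine input of frequency inputHz that is
// sampled with an rms clock jitter of jitterRmsSeconds.
// Textbook approximation; assumes jitter is the only noise source.
double jitterLimitedSnrDb(double inputHz, double jitterRmsSeconds) {
  const double kPi = 3.14159265358979323846;
  return -20.0 * std::log10(2.0 * kPi * inputHz * jitterRmsSeconds);
}
```

Under this approximation every factor of 10 in jitter costs 20 dB, so a 30 dB rise would correspond to roughly 30 times more jitter if jitter were the only cause.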

At the moment I'm not able to share the code of the project.

I’m thankful for any input.

Greetings
Oliver
 
AFAIK: That post suggests there are a few separate DMA channels of different types for different purposes.

Again, AFAIK: when they run at the same time, any DMA needs exclusive bus access for the duration of its transfer, and that locks out the others until the transfer at hand is done.

So that could explain the behavior being observed.

KurtE wrote a Stream class backed by a RAM buffer, where 'prints' can go while they wait to be output to USB Serial. It was written to fill the buffer from start to end, tracking output during busy operations, and then to be dumped as a record of that process after it is complete.

That could be used during FFT data set collection. Before starting the next data set collection, the buffer could then be transferred to USB Serial, so the USB DMA is only active during other data manipulation and not while sampling.

I did a sample sketch with it. It does not wrap for continuous use, and the DUMP code didn't reset for reuse, but some edits looked doable and seemed to work before I left that sketch.

Here is where I left that sketch if it seems like a usable idea:
> see rstream.printf
> print_capture_data();
> it was also using _isr to add to a test version of a posted ringbuffer; rstream was added to see if buffering data was better than printing on the fly
Code:
// https://www.downtowndougbrown.com/2013/01/microcontrollers-interrupt-safe-ring-buffers/
// https://forum.pjrc.com/threads/54962-Interrupt-on-Rising-and-Falling-on-the-same-pin?p=301287&viewfull=1#post301287

// RAMStream: https://forum.pjrc.com/threads/68139-Teensyduino-File-System-Integration-including-MTP-and-MSC?p=297813&viewfull=1#post297813

#define RING_SIZE   64 // must be power of 2
typedef uint8_t ring_pos_t;
volatile ring_pos_t ring_head;
volatile ring_pos_t ring_tail;
volatile uint32_t ring_data[RING_SIZE];


uint32_t add(uint32_t vv) {
  ring_pos_t next_head = (ring_head + 1) & (RING_SIZE - 1);
  if (next_head != ring_tail) {
    /* there is room */
    ring_data[ring_head] = vv;
    ring_head = next_head;
    return 1;
  } else {
    /* no room left in the buffer */
    return 0;
  }
}

uint32_t remove( uint32_t &vv ) {
  if (ring_head != ring_tail) {
    vv = ring_data[ring_tail];
    ring_tail = (ring_tail + 1) & (RING_SIZE - 1);
    return 1;
  } else {
    return 0;
  }
}

// --------------------------------------------------
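// RAMStream: linear (non-wrapping) RAM buffer behind the Stream interface.
// Note the naming: tail_ is the write index, head_ is the read index.
class RAMStream : public Stream {
public:
  // overrides for Stream
  // available() also rewinds both indices once everything has been read
  virtual int available() { if ( tail_ == head_ ) tail_ = head_ = 0; return (tail_ - head_); }
  virtual int read() { return (tail_ != head_) ? buffer_[head_++] : -1; }
  virtual int peek() { return (tail_ != head_) ? buffer_[head_] : -1; }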

  // overrides for Print
  virtual size_t write(uint8_t b) {
    if (tail_ < buffer_size) {
      buffer_[tail_++] = b;
      return 1;
    }
    return 0;
  }

  enum { BUFFER_SIZE = 6 * 65535 };
  uint8_t buffer_[BUFFER_SIZE];
  //uint8_t *buffer_ = nullptr;
  uint32_t buffer_size = BUFFER_SIZE;
  uint32_t head_ = 0;
  uint32_t tail_ = 0;
};

RAMStream rstream;


uint8_t print_buffer[256];
void print_capture_data() {
  int avail;
  while ((avail = rstream.available())) {
    if (avail > (int)sizeof(print_buffer)) avail = sizeof(print_buffer);

    int avail_for_write = Serial.availableForWrite();
    if (avail_for_write < avail) avail = avail_for_write;
    rstream.readBytes(print_buffer, avail);
    Serial.write(print_buffer, avail);
    delayNanoseconds(50);
  }

  int ch;
  while ((ch = rstream.read()) != -1) {
    Serial.write(ch);
    delayNanoseconds(50);
  }
  Serial.print('\n');

}


// --------------------------------------------------

uint32_t tCnt = 1;
uint32_t tCntB = 1;
uint32_t eCnt = 0;
#include "IntervalTimer.h"
IntervalTimer tTimer;
IntervalTimer tTimerB;
void t_isr () {
  int nn = ARM_DWT_CYCCNT % 10;
  //if ( !(nn % 2) ) return;
  if ( nn <7 ) nn=0;
  else if ( !(tCnt%4) ) nn=1;
  for ( int ii = 0; ii < nn; ii++ )
    if ( add( tCnt ) == 1 ) tCnt++;
}

void t_isrB () {
  int nn = ARM_DWT_CYCCNT % 10;
  if ( !(nn % 2) ) return;
  for ( int ii = 0; ii < nn; ii++ )
    delayNanoseconds(10);
  tCntB++;
}

elapsedMillis emMeg;
void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  tTimer.begin(t_isr, 3);
  tTimerB.begin(t_isrB, 5);
  emMeg = 0;
}

//void yield() {}

uint32_t lCnt = 0;
uint32_t cCnt = 1;
void loop() {
  uint32_t tv = 0;
  uint32_t wCnt = 0;
  uint32_t emNow;
  lCnt++;
  while ( 1 == remove( tv ) ) {
    if ( cCnt != tv ) eCnt += 100000;
    cCnt++;
    wCnt++;
    //delayNanoseconds(100);
  }
  if ( 0 != eCnt ) {
    Serial.printf( "Error Cnt = %u", eCnt );
    Serial.printf( "Error: %u != %u", cCnt, tv );
    while (1);
  }

//  if ( wCnt > 1 ) Serial.printf( "l#= %u :w#= %u\n", lCnt, wCnt );
  if ( wCnt > 1 ) rstream.printf( "l#>= %u :w#= %u\n", lCnt, wCnt );
  if ( !(lCnt % 1000000) ) {
    print_capture_data();
    emNow = emMeg;
    Serial.printf( " -------------   >>>>>> Loop Cnt = %u 1M in %u ms :nxt cCnt@%u\n", lCnt, emNow, cCnt );
    emMeg = 0;
  }
  //else delayNanoseconds(1000);
}
 
Good luck - hopefully it makes sense and can be adapted for your use.

It has a fixed buffer allocated:
Code:
  enum { BUFFER_SIZE = 6 * 65535 };
  uint8_t buffer_[BUFFER_SIZE];

If only doing small groups that is overkill.

Testing done with _isr() [doing the ringbuffer writes] was not using DMA, so there was no conflict with USB - that was just to test both pieces of code. The RING and RAMStream are independent - I just put them there together for the test.

It was edited so that once the last byte has been removed, rstream.available() resets head & tail, so it is ready to start at the beginning again. It only adds to the end while space is there.
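
If wrap-around for continuous use is wanted, a minimal sketch of a wrapping byte buffer could look like the following. This is not KurtE's RAMStream, just a plain C++ illustration of the idea. The class name ByteRing is made up, and it is not interrupt safe as written.

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-size byte ring that wraps, so it can be used continuously.
// N must be a power of 2 so the index can be masked instead of using modulo.
// Here head_ counts bytes written and tail_ counts bytes read
// (the opposite naming of RAMStream in the sketch above).
template <std::size_t N>
class ByteRing {
  static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of 2");
public:
  // Returns false (and drops the byte) when the ring is full.
  bool push(uint8_t b) {
    if (size() == N) return false;
    buf_[head_ & (N - 1)] = b;
    ++head_;
    return true;
  }
  // Returns the oldest byte, or -1 when the ring is empty.
  int pop() {
    if (head_ == tail_) return -1;
    return buf_[tail_++ & (N - 1)];
  }
  // Number of bytes waiting to be read.
  std::size_t size() const { return head_ - tail_; }
private:
  uint8_t buf_[N];
  std::size_t head_ = 0;  // free-running write counter
  std::size_t tail_ = 0;  // free-running read counter
};
```

Because the counters run freely and are only masked on access, all N slots can be used and there is no reset step when the buffer runs empty.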
 
It is sort of hard to give specific suggestions on this without seeing more details.

For example, is your capture code using any interrupts? For example, if you are using a PIT timer to set a GPIO pin to control the DMA through an ISR handler, then the timing of the code could be influenced by other interrupts. That is, if an ISR is running at the same or a logically higher priority than yours, it will continue to run instead of yours... Or if it has higher priority (lower value), it will interrupt yours.

If this is the issue, then a few different options:
a) Set up a timer to control the IO pin directly, like a FlexPWM timer or GPT timer. These don't need an interrupt to be serviced...
b) Set the ISR of your timer to a higher priority (lower value). If it is an IntervalTimer, use the priority member function.

How are you doing continuous sampling? If you run it and restart it when done, then you have the same issues as above. Maybe set it up with a circular buffer that interrupts at certain points, like every half, but does not stop. Then have it transfer that section of data to Serial.
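
The bookkeeping for the half-buffer idea can be sketched in plain C++ without any hardware. This is only a model under assumptions: the struct name and its functions are made up, and on a real Teensy the shared flags would need to be volatile. As far as I know DMAChannel offers interruptAtHalf() alongside interruptAtCompletion(), but please check DMAChannel.h before relying on that.

```cpp
// Host-side model of one circular DMA buffer with two interrupts:
// "half done" (first half is stable) and "complete" (second half is stable).
// The DMA itself is never stopped or reprogrammed in this scheme.
struct HalfBufferTracker {
  // Call from the half-done interrupt.
  void onHalf() { markReady(0); }
  // Call from the completion interrupt.
  void onComplete() { markReady(1); }

  // Call from the main loop. Returns 0 or 1 for the half that should be
  // sent next, or -1 if that half has not been filled yet.
  int takeReady() {
    if (!ready_[next_]) return -1;
    int half = next_;
    ready_[half] = false;
    next_ ^= 1;  // halves are always consumed in alternating order
    return half;
  }

  // True once the DMA has refilled a half that was never sent.
  bool overrun() const { return overrun_; }

private:
  void markReady(int half) {
    if (ready_[half]) overrun_ = true;  // reader fell a full lap behind
    ready_[half] = true;
  }
  bool ready_[2] = {false, false};
  int next_ = 0;
  bool overrun_ = false;
};
```

The point of the design is that the ISR only sets a flag, so a late ISR delays the notification but does not change where the DMA writes the next sample.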

How are things transferred to Serial? If done at the ISR level and you do something like: Serial.write(buffer, 4096)
and there is not enough room on Serial to handle it, the code will wait until there is... which can screw things up. So you can try a few things like:
a) Memory buffers like @defragster mentioned. Maybe the ISR simply copies into a larger buffer and the main code transfers to Serial.
b) If you know that your sample speed is slower than USB speed, then maybe set up a DMA chain with 512-byte chunks, and assume you can simply do the Serial write directly from the ISR.

...
 
Just to clarify it a bit: I already used a buffer like defragster suggested.

> For example, is your capture code using any interrupts?
I’m using just one interrupt, for servicing the DMA channel that copies data from the SPI bus into a buffer. The buffer size is 128 * 512 bytes, so there is space for 16384 samples.

Code:
#define DMA_BUFFERLENGTH 128
#define DMA_BUFFERSIZE 128
DMAMEM uint32_t dmaBufferADC[DMA_BUFFERLENGTH * DMA_BUFFERSIZE] __attribute__((aligned(32)));

At a sample rate of 400 kSPS I get an interrupt every 320 µs (128 words per DMA block, one 32-bit word per sample from the 2 ADCs with 16 bits each).
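
A quick sanity check of these numbers, as a minimal sketch in plain C++ that also runs on a PC. It assumes one 32-bit word per sample instant (2 ADCs x 16 bit), which is how the buffer sizes above work out. The function names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: each sample instant produces one 32-bit word (2 ADCs x 16 bit).
constexpr uint32_t kBytesPerSample = 4;

// Time between DMA completion interrupts, in microseconds.
constexpr uint32_t interruptPeriodUs(uint32_t wordsPerBlock, uint32_t sampleRateHz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(wordsPerBlock) * 1000000u) / sampleRateHz);
}

// Sustained payload rate that has to go out over USB, in bytes per second.
constexpr uint32_t payloadBytesPerSecond(uint32_t sampleRateHz) {
  return sampleRateHz * kBytesPerSample;
}

// How long the whole buffer can absorb a stalled reader, in microseconds.
constexpr uint32_t bufferDepthUs(uint32_t blocks, uint32_t wordsPerBlock,
                                 uint32_t sampleRateHz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(blocks) * wordsPerBlock * 1000000u) / sampleRateHz);
}
```

With 128 blocks of 128 words at 400 kSPS this gives 320 µs per interrupt, 1.6 MB/s of payload, and about 41 ms of buffering before the writer catches up with the reader.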

Configuration of the DMA:
Code:
// Enable SPI RX DMA
  dmachannel_SPI_RX.begin(true);
  dmaChSPIRX_u32 = dmachannel_SPI_RX.channel;
  dmachannel_SPI_RX.source(LPSPI4_RDR);
  dmachannel_SPI_RX.destinationBuffer((uint32_t *)pDMABufferADC, DMA_BUFFERSIZE * 4);
  dmachannel_SPI_RX.interruptAtCompletion();
  dmachannel_SPI_RX.attachInterrupt(ISR_DMA_SPI_RX);
  dmachannel_SPI_RX.triggerAtHardwareEvent(DMAMUX_SOURCE_LPSPI4_RX);
  dmachannel_SPI_RX.enable();

In the DMA ISR the buffer pointer is advanced:

Code:
void ISR_DMA_SPI_RX()
{
#if DEBUG_ISR
  GPIO6_DR |= ((uint32_t)(1 << 29));
#endif
  dmachannel_SPI_RX.clearInterrupt();
  asm("dsb");
  writeCnt_u32 = (writeCnt_u32 + 1) % DMA_BUFFERLENGTH;
  asm("dsb");
  if (writeCnt_u32 != readCnt_u32)
  {
    // write new buffer address to the DMA
    dmachannel_SPI_RX.destinationBuffer((uint32_t *)(pDMABufferADC + (writeCnt_u32 * DMA_BUFFERSIZE)), DMA_BUFFERSIZE * 4);
    asm("dsb");
  }
  else
  {
    // Error is handled in the main loop (Just freezes the program and blinks a status led).
    bufferError = true;
  }
#if DEBUG_ISR
  GPIO6_DR &= ~((uint32_t)(1 << 29));
#endif
}


> For example, if you are using a PIT timer to set a GPIO pin to control the DMA through an ISR handler, then the timing of the code could be influenced by other interrupts. That is, if an ISR is running at the same or a logically higher priority than yours, it will continue to run instead of yours... Or if it has higher priority (lower value), it will interrupt yours.
> ...

I don’t use an interrupt for this. I just set up the PIT for a certain frequency (400 kHz, so PIT_LDVAL gets initialized with 0x176) and then I use the ability of the first 4 DMA channels to be triggered periodically, directly by the PIT.
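
For reference, the relation between the load value and the trigger rate can be sketched like this. The PIT counts down from LDVAL to 0, so one period is LDVAL + 1 clock cycles. Note that 0x176 only gives 400 kHz for a 150 MHz PIT clock. That clock value is inferred from the numbers here and is an assumption, so check how the PIT clock is configured in your setup. The function names are made up.

```cpp
#include <cstdint>

// Load value for a desired trigger rate: one period is LDVAL + 1 cycles.
constexpr uint32_t pitLoadValue(uint32_t pitClockHz, uint32_t triggerHz) {
  return pitClockHz / triggerHz - 1;
}

// Trigger rate that results from a given load value.
constexpr uint32_t pitTriggerHz(uint32_t pitClockHz, uint32_t ldval) {
  return pitClockHz / (ldval + 1);
}

// Assumed PIT clock of 150 MHz (not stated in the post).
constexpr uint32_t kAssumedPitClockHz = 150000000u;
static_assert(pitLoadValue(kAssumedPitClockHz, 400000u) == 0x176,
              "0x176 corresponds to 400 kHz at 150 MHz");
```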

Configuration of the DMA:
Code:
// Enable CNV pin on DMA write
  dmachannel_CNV_on.begin(true);
  dmaChCNVon_u32 = dmachannel_CNV_on.channel; 
  dmachannel_CNV_on.source(cnvHigh_u32);
  dmachannel_CNV_on.destination(GPIO1_DR);
  dmachannel_CNV_on.transferCount(1);
  dmachannel_CNV_on.triggerContinuously();
  // Trigger the DMA with a PIT timer (only DMA Ch. 0-3 have PIT trigger abilities)
  *((uint32_t *)(IMXRT_DMAMUX_ADDRESS + dmaChCNVon_u32 * 0x04)) |= DMAMUX_CHCFG_TRIG; 
  dmachannel_CNV_on.enable();

This DMA channel is used to set the GPIO pin high. There is a second one, almost identical, which sets the GPIO pin low.


> How are you doing continuous sampling? If you run it and restart it when done, then you have the same issues as above. Maybe set it up with a circular buffer that interrupts at certain points, like every half, but does not stop. Then have it transfer that section of data to Serial.

Sampling is started by starting the PIT timers in the main loop.
There is one more DMA channel involved, which is also PIT triggered. This one generates the SPI clock to read the data from the ADCs.
Code:
//  Enable SPI TX DMA Ch. 1
  dmachannel_SPI_TX.begin(true);
  dmaChSPITX_u32 = dmachannel_SPI_TX.channel;
  dmachannel_SPI_TX.source(dummyData_u32);
  dmachannel_SPI_TX.destination(LPSPI4_TDR);
  dmachannel_SPI_TX.transferCount(1);
  dmachannel_SPI_TX.triggerContinuously();
  // Trigger the DMA with a PIT timer (only DMA Ch. 0-3 have PIT trigger abilities)
  *((uint32_t *)(IMXRT_DMAMUX_ADDRESS + dmaChSPITX_u32 * 0x04)) |= DMAMUX_CHCFG_TRIG; 
  dmachannel_SPI_TX.enable();

This way continuous sampling is performed (forever if I don’t stop it), and the data from the buffer is moved to Serial in the main loop.


> How are things transferred to Serial? If done at the ISR level and you do something like: Serial.write(buffer, 4096)
> and there is not enough room on Serial to handle it, the code will wait until there is... which can screw things up.

The data is transferred in the main loop with Serial.write(buffer, 512). Before the transfer I check with Serial.availableForWrite() whether there is enough room on Serial. If not, no transfer is performed.
I also send two align bytes (a SOF, just ones) every 64 Serial writes in addition to the actual data, but I always check whether there is enough space available on Serial.

Code:
#define SOF_SIZE 8
const uint32_t sof_count_u32 = 64;

// Send Data to USB
  if (readCnt_u32 != writeCnt_u32)
  {
#if DEBUG_MAIN
    GPIO6_DR |= ((uint32_t)(1 << 20));
#endif // DEBUG
    uint8_t _sofSize_u8 = 0;
    if (!(sendCount_u32 % sof_count_u32))
    {
      _sofSize_u8 = SOF_SIZE;
      asm("dsb");
    }
    if ((Serial.availableForWrite() >= ((DMA_BUFFERSIZE * 4) + _sofSize_u8)))
    {
      if (_sofSize_u8 == SOF_SIZE)
      {
        Serial.write(pSOF_u8, SOF_SIZE);
      }
      arm_dcache_delete(pDMABufferADC + (readCnt_u32 * DMA_BUFFERSIZE), DMA_BUFFERSIZE * 4);
      Serial.write((uint8_t *)(pDMABufferADC + (readCnt_u32 * DMA_BUFFERSIZE)), (DMA_BUFFERSIZE * 4));
      readCnt_u32 = (readCnt_u32 + 1) % DMA_BUFFERLENGTH;
      sendCount_u32++;
    }
#if DEBUG_MAIN
    GPIO6_DR &= ~((uint32_t)(1 << 20));
#endif // DEBUG
  }
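
On the PC side, the SOF marker described above could be used to realign the byte stream. The following is only a sketch under assumptions: it takes the SOF to be 8 bytes of 0xFF (SOF_SIZE and "just ones"), and the function name is made up. Real ADC data can contain runs of 0xFF too, for example directly in front of the marker, so a robust receiver would also check that the next marker shows up where it is expected.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the SOF marker is kSofSize bytes of 0xFF.
constexpr std::size_t kSofSize = 8;

// Returns the index of the first byte after the first run of kSofSize
// bytes of 0xFF found at or after `from`, or -1 if there is no such run.
long findPayloadStart(const std::vector<uint8_t>& rx, std::size_t from = 0) {
  std::size_t run = 0;  // length of the current run of 0xFF bytes
  for (std::size_t i = from; i < rx.size(); ++i) {
    run = (rx[i] == 0xFF) ? run + 1 : 0;
    if (run == kSofSize) return static_cast<long>(i + 1);
  }
  return -1;
}
```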


> So you can try a few things like:
> a) Memory buffers like @defragster mentioned. Maybe the ISR simply copies into a larger buffer and the main code transfers to Serial.

This is the way I do it at the moment. The ISR is the one for dmachannel_SPI_RX (second code snippet).
 