Problem with Teensy 4.0 - ADC DMA buffer is correctly filled only once...

cyboff

Hello,

I have a problem with Teensy 4.0: the DMA buffer is correctly filled with the actual values from ADC1_R0 only the first time, from start until the completion interrupt (interruptAtCompletion()). After that, the DMA buffer is no longer updated... What am I missing? It worked fine on Teensy 3.2.

Here is my simplified sample code:
Code:
#include <ADC.h>
#include <DMAChannel.h>

const uint16_t buffer_size = 16;
DMAMEM static volatile uint16_t __attribute__((aligned(32))) dma_adc_buff1[buffer_size];
const int readPin_adc0 = A0;

ADC *adc = new ADC();
DMAChannel dma_ch1;

void interruptDMA();

void setup()
{
  while (!Serial && millis() < 5000)
    ;

  Serial.begin(9600);
  Serial.println("Starting");
  pinMode(readPin_adc0, INPUT_DISABLE);
  pinMode(LED_BUILTIN, OUTPUT);

  adc->adc0->setAveraging(1);
  adc->adc0->setResolution(12);
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED);
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED);

  dma_ch1.source((volatile uint16_t &)(ADC1_R0));
  dma_ch1.destinationBuffer((uint16_t *)dma_adc_buff1, buffer_size * 2);
  dma_ch1.interruptAtCompletion();
  dma_ch1.disableOnCompletion();
  dma_ch1.attachInterrupt(&interruptDMA);
  dma_ch1.triggerAtHardwareEvent(DMAMUX_SOURCE_ADC1);

  dma_ch1.enable(); // Enable the DMA channel
  adc->adc0->enableDMA();
  adc->adc0->startSingleRead(readPin_adc0);
  adc->adc0->startQuadTimer(500);
}

void loop()
{
  delay(1000);
  int i = 0;
  while (i < buffer_size)
  {
    Serial.printf("%u: %u ", i, dma_adc_buff1[i]);
    i++;
  }

  Serial.printf("ADC1_R0: %u \n", ADC1_R0);
  
  dma_ch1.enable(); // enable DMA again
}

void interruptDMA()
{
  //dma_ch1.clearComplete();
  dma_ch1.clearInterrupt();
  
#if defined(__IMXRT1062__) // Teensy 4.0
  asm("DSB");
#endif
}
 
My first try would be to add a call to dma_ch1.clearComplete();
right before the dma_ch1.enable();
 
Hey Kurt, thanks for your time. I already tried it (it's even there, commented out, in the code sample), but it didn't help...
Strangely enough, similar code works as it should on Teensy 3.2 (of course I had to change the source to ADC0_RA and DMAMUX_SOURCE_ADC0)...
 
OK, I found the reason why Pedro's ADC-with-DMA examples worked but mine didn't...
Adding this line before re-enabling DMA solved the issue:

Code:
if ((uint32_t)dma_adc_buff1 >= 0x20200000u)
    arm_dcache_delete((void *)dma_adc_buff1, sizeof(dma_adc_buff1));

But I have no idea why. Can somebody explain it to me, please?
 
Sorry, I missed seeing the DMAMEM in your example.

To me, the keyword DMAMEM translates to "almost the worst place to do DMA" ;)
External memory can be worse :D

The issue: these areas of memory are configured to use the hardware cache, and with the way we have the cache configured, the values stored in the cache and the values stored in the real physical memory do not necessarily match. Take the other direction first: if your code writes to a memory location, the cached value may be updated while the new value is not actually written to physical memory until the processor decides to reuse that cache line for another memory location. Only at that point does it flush the updated data in the cache back to physical memory.

Likewise, if something changes the actual values in physical memory while that area of memory is being cached, any instruction that asks for the current value of that location will keep getting the (stale) value from the cache.

Why this is important: most instructions that read or write memory locations go through the cache. BUT: DMA only talks to the actual physical memory.

So arm_dcache_delete says to delete that region of memory from the current cache, so that the next reads will not come from the cache and will therefore pick up the new values written by DMA.
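To make the explanation above concrete, here is a toy host-side model in plain C++. It is not Teensy code, and `ToySystem`, `cpu_read`, `dma_write` and `cache_delete` are hypothetical names. It only models the read side: a CPU read miss copies a whole 32-byte line into the cache, DMA writes go straight to RAM, and `cache_delete` plays the role of `arm_dcache_delete`.

```cpp
// Toy host-side model (NOT Teensy code): a data cache between the CPU and
// RAM, plus a DMA engine that writes RAM directly, bypassing the cache.
// Only the read side is modelled; CPU writes and dirty lines are ignored.
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>

constexpr std::size_t kLineSize = 32;  // Cortex-M7 D-cache line size in bytes
constexpr std::size_t kRamSize = 256;  // tiny "physical memory" for the demo

struct ToySystem {
  std::array<std::uint8_t, kRamSize> ram{};                          // physical memory
  std::map<std::size_t, std::array<std::uint8_t, kLineSize>> cache;  // line index -> copy

  // CPU read: on a miss the whole line is copied from RAM into the cache,
  // and later reads of that line are served from the cached copy.
  std::uint8_t cpu_read(std::size_t addr) {
    assert(addr < kRamSize);
    const std::size_t line = addr / kLineSize;
    auto it = cache.find(line);
    if (it == cache.end()) {
      std::array<std::uint8_t, kLineSize> copy{};
      std::memcpy(copy.data(), &ram[line * kLineSize], kLineSize);
      it = cache.emplace(line, copy).first;
    }
    return it->second[addr % kLineSize];
  }

  // DMA write: goes straight to RAM; the cache is not updated.
  void dma_write(std::size_t addr, std::uint8_t value) {
    assert(addr < kRamSize);
    ram[addr] = value;
  }

  // Rough analogue of arm_dcache_delete(): drop every cached line that
  // overlaps [addr, addr + size) without writing anything back.
  void cache_delete(std::size_t addr, std::size_t size) {
    for (std::size_t line = addr / kLineSize; line * kLineSize < addr + size; ++line)
      cache.erase(line);
  }
};
```

With this model, `dma_write(0, 11)` followed by `cpu_read(0)` returns 11 because the cache is cold. After `dma_write(0, 22)`, the next `cpu_read(0)` still returns 11 until `cache_delete(0, 32)` is called: the same "correct only once" symptom as in the first post.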

Notes:

1) Cache flushes and deletes work on 32 bytes at a time (one cache line), aligned on 32-byte boundaries. So arm_dcache_delete can be dangerous to use: it throws away the current cached contents of every line that overlaps your DMA buffer. That means it might throw away updated values just before the start or just after the end of the buffer, i.e. anything else that shares the first or last 32-byte line with it, such as heap bookkeeping pointers (if the buffer came from malloc...).

2) Buffers used for DMA into memory should be 32-byte aligned, ideally with a size that is a multiple of 32 bytes, so they do not share a cache line with anything else...

3) if ((uint32_t)dma_adc_buff1 >= 0x20200000u) was a quick and dirty test that the memory is not in DTCM (or ITCM), which are not cached.
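The arithmetic behind notes 1 and 2 can be checked on a PC. This is a plain C++ sketch, not a Teensy API; `cache_lines_touched` and `dma_buffer_is_line_aligned` are hypothetical helpers. It rounds the start address down to a 32-byte line (which, as far as I can tell, is how the Teensy core's cache helpers start their loop) and counts how many lines a delete over the buffer would touch.

```cpp
// Host-side sketch: which 32-byte cache lines does a cache delete over
// [addr, addr + size) touch, and does the buffer share a line with anything?
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCacheLine = 32;  // Cortex-M7 D-cache line size (bytes)

struct LineSpan {
  std::uint32_t first_line;  // address of the first cache line touched
  std::uint32_t line_count;  // number of 32-byte lines touched
};

// Round the start down and the end up to line boundaries, then count lines.
LineSpan cache_lines_touched(std::uint32_t addr, std::uint32_t size) {
  assert(size > 0);
  const std::uint32_t first = addr & ~(kCacheLine - 1);
  const std::uint32_t end = addr + size;  // one past the last byte of the buffer
  const std::uint32_t last_end = (end + kCacheLine - 1) & ~(kCacheLine - 1);
  return {first, (last_end - first) / kCacheLine};
}

// True only if deleting the buffer's lines cannot discard anyone else's data:
// both the start address and the length must be whole cache lines.
bool dma_buffer_is_line_aligned(std::uint32_t addr, std::uint32_t size) {
  return (addr % kCacheLine) == 0 && (size % kCacheLine) == 0;
}
```

For the buffer in the first post (16 uint16_t values = 32 bytes, declared aligned(32)) this gives exactly one line and the alignment check passes. The same 32 bytes starting 16 bytes later would span two lines, so a delete would also discard the 16 bytes of whatever sits on each side.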

So in the generic case, I should have used: arm_dcache_flush_delete((void *)dma_adc_buff1, sizeof(dma_adc_buff1));
instead, which first flushes each cache line back to memory before deleting it from the cache.
But that is probably slower, since it actually has to write to memory. FrankB had a version of a safe delete, which checked the first and last cache lines for alignment and flushed those. But that extended API was not incorporated.
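The "safe delete" idea can be sketched as a planning step. This is a hypothetical illustration, not FrankB's actual code and not a Teensy API: `plan_safe_delete` only computes which operation each line would get. On a Teensy, each `Delete` would correspond to `arm_dcache_delete` over that line and each `FlushDelete` to `arm_dcache_flush_delete`. One caveat: a flush writes back the whole 32-byte line, so if the buffer's own bytes in a partial line were dirty in the cache they could overwrite fresh DMA data; aligning the buffer (note 2) avoids the question entirely.

```cpp
// Hypothetical "safe delete" planner: lines fully inside the buffer are just
// deleted; a partially covered first/last line is flushed before deleting, so
// neighbouring data sharing that line is written back instead of discarded.
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kLineBytes = 32;  // Cortex-M7 D-cache line size

enum class LineOp { Delete, FlushDelete };

struct PlannedOp {
  std::uint32_t line_addr;  // start address of the cache line
  LineOp op;                // what to do with that line
};

std::vector<PlannedOp> plan_safe_delete(std::uint32_t addr, std::uint32_t size) {
  assert(size > 0);
  std::vector<PlannedOp> plan;
  const std::uint32_t end = addr + size;  // one past the last byte
  for (std::uint32_t line = addr & ~(kLineBytes - 1); line < end; line += kLineBytes) {
    const bool fully_covered = line >= addr && line + kLineBytes <= end;
    plan.push_back({line, fully_covered ? LineOp::Delete : LineOp::FlushDelete});
  }
  return plan;
}
```

An aligned 64-byte buffer yields two plain deletes; the same buffer shifted by 16 bytes yields a flush-delete, a delete, and another flush-delete.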

Hope that helps.

Kurt
 
Hey Kurt, thanks again for the comprehensive explanation! To be honest, I'm even more confused now :D I'm not very good at programming, so I was using the examples from the ADC library as a guide... But I always thought that DMAMEM was specifically NOT cached, to avoid the problems above... :D And even more confusing to me is that Teensy 3.2 did not require cache flushing to work properly...
 
RAM2 / DMAMEM is cached for sure, and both it and the 32KB data cache are unique to the T_4.x's 1062 architecture.

The cache does 'get in the way' with DMAMEM and needs that extra attention. But using RAM2 not only keeps RAM1 free for other uses by making use of the second 512KB, it also seems to minimize bus contention with code and data running from RAM1 (ITCM and DTCM).
 
Sorry, I know sometimes less is more...

If you look at the memory section of the T4.1 product page:
https://www.pjrc.com/store/teensy41.html#memory
everything except RAM1 (DTCM/ITCM) uses the cache.
The DTCM/ITCM is tightly coupled memory that runs at full speed and does not need it.
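A slightly less quick-and-dirty version of the `>= 0x20200000u` test can classify an address by region. This is a hedged host-side sketch: `classify` and `needs_cache_maintenance` are hypothetical names, and the region windows are my reading of the Teensy 4.x memory layout (ITCM at 0x00000000, DTCM at 0x20000000, RAM2/OCRAM at 0x20200000, flash at 0x60000000, T4.1 PSRAM at 0x70000000), so check them against the product page before relying on them.

```cpp
// Hedged sketch: classify a Teensy 4.x address by memory region.
// Region windows are assumptions based on the Teensy 4.x memory layout.
#include <cstdint>

enum class Region { ITCM, DTCM, RAM2, Flash, ExtPSRAM, Other };

constexpr Region classify(std::uint32_t addr) {
  if (addr < 0x00080000u) return Region::ITCM;                             // RAM1
  if (addr >= 0x20000000u && addr < 0x20080000u) return Region::DTCM;      // RAM1
  if (addr >= 0x20200000u && addr < 0x20280000u) return Region::RAM2;      // DMAMEM
  if (addr >= 0x60000000u && addr < 0x70000000u) return Region::Flash;     // PROGMEM
  if (addr >= 0x70000000u && addr < 0x80000000u) return Region::ExtPSRAM;  // EXTMEM (T4.1)
  return Region::Other;  // peripherals etc., not handled here
}

// Per the thread: everything except RAM1 (ITCM/DTCM) goes through the cache,
// so DMA buffers there need arm_dcache_* maintenance.
constexpr bool needs_cache_maintenance(std::uint32_t addr) {
  return classify(addr) == Region::RAM2 || classify(addr) == Region::Flash ||
         classify(addr) == Region::ExtPSRAM;
}
```

Making the functions `constexpr` lets the checks run at compile time, e.g. `static_assert(needs_cache_maintenance(0x20200000u), "")`.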

A few years ago, when I was trying to better understand how these different things work, I started a thread where I tried to document some of this:
https://forum.pjrc.com/threads/57326-T4-0-Memory-trying-to-make-sense-of-the-different-regions
But I probably only made some of these concepts clear as mud.
 