Concurrent DMA-READ and DMA-WRITE to same RAM area

Status
Not open for further replies.

Frank B

Senior Member
Hi,
the topic says it :)

First, please excuse that I can't show the code; nobody wants to read 3000 lines.
If it is really needed I can invest 2 hours next weekend and write a minimal example.

I already mentioned it some weeks ago:
If you use two channels, where the first reads and the other writes (asynchronously) to the same RAM area, the reading channel sometimes reads wrong values. Or the writing is corrupt (hard to say).

My Question: Is this
a) documented
b) a silicon bug ?

And, more important, is there a workaround ?

Background: I'm using DMA to continuously transfer a 153KB area to the ILI display (SPI) (-> reading channel).
To speed things up I fill certain areas in the RAM with a concurrent DMA channel.
The display shows random(!) black and white lines, ~2-3 times a second.

This does NOT happen, when filling the RAM with a loop in c.
 
Is the alignment and transfer size the same? If not, you might be seeing partial updates. Depending on how you set up the DMA arbitration, the channels could be switching back and forth, giving you the line pattern.
 
Hm. Yes, alignment is the same. Transfer size: not sure what you mean :) The reading transfer is 16-bit from a circular 320*240*2 byte buffer. It does not stop when it's done; it restarts from the beginning automatically (a continuous transfer). Target is SPI.

For my test:
The write goes from a constant 32-bit value (in RAM, not flash) to the same circular buffer (for testing it covers the full 320*240*2 bytes, too), but it transfers one line (160 * 32 bit) at a time with disableOnCompletion.
The software waits until the transfer is done and then restarts it with the next line.

Thank you for the hint - i'll try to use 16BIT for the writing, too. Let's see what happens :)
 
This is the first I've heard of such an issue.

Maybe try playing with the priority levels of the DMA channels. Like most DMA troubleshooting, this is pretty much working blind, but if it has any effect, maybe that will lead you to a better understanding of the problem?
 
@TNI: Using 16BIT DMA-write is even worse: More of these "corrupt" lines (again, random !!)
@Paul,I guess nobody tried something like this.. :) If you want to investigate it, i'll show you a minimal example next weekend.

I'll try setting the priority.

Edit: Hm, the manual says there is already a default-priority set.
So i just swapped the creation (initialization-) order. No change.

I'll try the ECP/DPA ("Preemption") Bits tomorrow..

Edit: @tni: 8-bit writes result in even more corrupt lines.
Interestingly, it is not single pixels that are black; when it happens, the whole rest of the display line is black.
This means it's the WRITING that does not work (otherwise more than one line would be black).
Then I checked the error flag for the writing DMA. It is not set.
 
It may or may not be related,
but there seems to be something special with DMA
1) you can easily read from or write to a buffer DMA is writing to (this is typically done in dual-buffer DMA applications) as long as you do not access the word DMA is using.
2) program (DMA?) always crashes when accessing memory location DMA is using (e.g. copying the data where DMA is writing to)

My experience is that as long as you avoid the memory location DMA is currently using, you can access the memory locations before and after it.
I have no Teensy with me, so I cannot prove it, but the DMA engine gives you the location it is reading from / writing to, so that could easily be tested.
 
Warning: you are all much more experienced with the DMA than I am. So take with grain of salt!

What I have mainly done with DMA is take your DMA version of the ILI9341 library and incorporate it into my ILI9341_t3n library, with a few caveats.
In particular I am mostly interested in one-shot updates of the screen. So I looked for a way to fix the issue you ran into where your update-once code actually updates twice, once with the non-DMA update and then once with DMA. I fixed it by outputting the first pixel using non-DMA, which updates the upper word of the pushr register so that the right stuff is set up properly (two-word output, CS, DC, CONT). Then I started the DMA, but removed the first word from the first buffer so it is not sent again. Can go into more details if interested...

But what I am wondering is, do both of these transfers go at about the same speed, or does one of them run slower? That is, I am guessing you have two DMAs, one going out to SPI and the other coming in from the SD card? It might work out that instead of using continuous updates you could play with single-shot updates.
Again, not sure if I am explaining this well or if this makes sense, but suppose both of these transfers are set up to use multiple DMASetting objects chained to each other.
You could hypothetically set up the DMASetting chain on the one filling your frame buffer so that an interrupt fires at some percentage of the fill, and your interrupt handler would then set up the one-shot update of the screen. That could be at either the end of one item or the midpoint of one item.

Again not sure if this makes sense or not...
 
What happens if you pause the SPI update to the screen? It seems the screen doesn't do double buffering.

With the default DMA configuration with no preemption, the DMA controller will be idle after each 2-byte SPI transfer. Your line draw DMA transfer will execute all at once in the middle of the SPI transfer, so the SPI transfer will get temporarily paused. 16-bit and 8-bit line transfers are slower, so the pause will be longer.

I strongly suspect there is no DMA corruption and you just see the effect of the pause and potentially partially updated lines (the first part of the line was already transmitted when you perform the line draw DMA).
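To put rough numbers on "16-bit and 8-bit line transfers are slower": a minimal sketch that only counts how many individual write transfers one display line needs at each width. The function name is made up for illustration, and the 640-byte line is taken from the 320-pixel, 2-bytes-per-pixel buffer described earlier. Actual pause times depend on bus and arbitration details that this does not model.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of individual write transfers needed to fill
// one display line of lineBytes bytes when each transfer moves widthBytes.
inline std::size_t transfersPerLine(std::size_t lineBytes, std::size_t widthBytes) {
  assert(widthBytes == 1 || widthBytes == 2 || widthBytes == 4); // 8/16/32-bit
  assert(lineBytes % widthBytes == 0);                           // whole transfers only
  return lineBytes / widthBytes;
}
```

For one 640-byte line this gives 160 transfers at 32 bit (the "160*32Bit" mentioned above), and two or four times as many at 16 and 8 bit, which fits the observation that narrower writes made things worse.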

BTW, memcpy is quite close in performance to 32-bit DMA transfers in my tests for RAM -> RAM transfers. It's much faster than copy loops (even unrolled). GCC doesn't want to utilize LDR / STR with automatic pointer increment.
 
@wmxz: that describes exactly what I'm seeing.

@tni, i'll make a test program.
i know that DMA is not faster - but it runs "in background" and i can do other things during the "memcopy/memset" :) So, it saves time this way.

It's not critical, and I CAN use a memcpy (indeed, I'm already doing this) - but I want to optimize some things, and this seemed to be an optimal place to begin with .. :)
 
Well, I'm not 100% sure whether this is the same effect or not, because this test is totally different from what I'm actually doing.

Code:
#include <DMAChannel.h>

DMAChannel dma1, dma2;

const unsigned ssize = 32;
const unsigned dsize = 8;
DMAMEM unsigned src;
DMAMEM unsigned srcbuf[ssize];
DMAMEM unsigned dstbuf[dsize];

void setup() {
  delay(1000);
  Serial.println("start");
  memset(srcbuf, 1, sizeof(srcbuf));
  memset(dstbuf, 9, sizeof(dstbuf));

  src = 0x12345678;
  dma1.begin(false);
  dma1.triggerContinuously();
  //dma1.disableOnCompletion();
  dma1.source(src);
  dma1.destinationBuffer(srcbuf, sizeof(srcbuf) );
  dma1.enable();

  dma2.begin(false);
  dma2.triggerContinuously();
  // dma2.disableOnCompletion();
  dma2.sourceBuffer(srcbuf, sizeof(srcbuf));
  dma2.destinationCircular(dstbuf, sizeof(dstbuf) );
  dma2.enable();
}


void loop() {
  for (unsigned i = 0; i < dsize; i++) {
    if (dstbuf[i] != src) {
      Serial.print(" ERROR at ");
      Serial.print(i);
      Serial.print(": ");
      Serial.print(dstbuf[i], HEX);
      Serial.println();
      delay(100);
    }
    if (dma1.complete()) Serial.println("DMA 1 complete");
    if (dma2.complete()) Serial.println("DMA 2 complete");
  }
  Serial.println();
}

would you expect that dstbuf[] is completely 0 (or do i have a bug in the code above)?

output is... 0(zero) for dstbuf.
- if you remove the dma2.enable, the output is as expected.
- if you test srcbuf[] instead, it contains all 0x12345678.. (which is correct)
- if you enable the "disableOnCompletion" and insert a delay(100); between both transfers, the output shows no errors. I hope this shows that the code is correct(?)

I'm not sure, if it is really the same as my SPI-Problem, and the test can be wrong.
My "real" program uses SPI as "final destination" and therefore SPI as trigger, too, and a single destination address.
 
2) program (DMA?) always crashes when accessing memory location DMA is using (e.g. copying the data where DMA is writing to)
IME, that works just fine. Do you have a test program?
Here is a test sketch that constantly reads a memory buffer that is written to by DMA. A FTM timer is used for DMA transfer triggering. It works perfectly fine.
Code:
#include <DMAChannel.h>
#include <array>

DMAChannel dma;

volatile uint32_t dma_source = 0;
std::array<volatile uint32_t, 1024> dma_dest = {};

void setupDMA() {
    dma.source(dma_source);
    dma.destinationBuffer(dma_dest.data(), 4*dma_dest.size());
    dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM1_CH0);
    dma.enable();

    FTM1_SC = 0x00;
    FTM1_CNT = 0x0000;
    FTM1_MOD = 30;
    FTM1_CNTIN = 0;
    FTM1_C0SC = FTM_CSC_CHIE | FTM_CSC_MSA | FTM_CSC_ELSA | FTM_CSC_DMA; // output compare, enable DMA
    FTM1_SC = FTM_SC_CLKS(1); // enable timer with busclock
}

void setup() {
    Serial.begin(9600);
    delay(2000);
    Serial.println("Hello");

    setupDMA();
    uint32_t ctr = 0;
    while(true) {
        ctr++;
        dma_source = ctr;
        volatile uint32_t dummy;
        for(size_t i = 0; i < 1000; i++) {
            for(uint32_t v : dma_dest) dummy = v;
        }
        Serial.print(ctr);
        Serial.print(" dummy: ");
        Serial.print(dummy);
        Serial.print("  DMA ptr: ");
        Serial.println((uint32_t) dma.destinationAddress(), HEX);
    }
}

void loop() {}
 
But he's right. I see blinking pixels on the display (very short and rare, due to the high refresh rate) if I use a "memset" instead of the writing DMA channel.
Tried my test program from post #11 ?
 
The sketch from #11 is buggy. dma1 is running continuously, so dma2 never gets a chance to run. Print dma2.destinationAddress(), it stays constant.
 
Hm, where does the zero content come from ?
there's a memset(dstbuf, 9, sizeof(dstbuf)); (before starting any channel)
 
BTW, don't use "destinationCircular()". It needs a buffer aligned to the buffer size (which must be a power of 2). The DMA controller masks the last bits of the buffer address. If the buffer isn't aligned properly, you end up writing to the wrong destination.
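A minimal sketch of the alignment rule described above, in plain C++ so it can be checked off-target. The function names are hypothetical, and moduloAdvance is only a simplified model of the masking (low address bits wrap, upper bits stay fixed), not the controller's exact behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if len is a nonzero power of two.
inline bool isPowerOfTwo(std::size_t len) {
  return len != 0 && (len & (len - 1)) == 0;
}

// True if a buffer of len bytes at addr satisfies the rule above:
// power-of-two size and start address aligned to that size.
inline bool circularBufferOk(std::uintptr_t addr, std::size_t len) {
  return isPowerOfTwo(len) && (addr & (len - 1)) == 0;
}

// Simplified model (assumption): only the low log2(len) address bits
// are allowed to change when the address advances by step bytes.
inline std::uintptr_t moduloAdvance(std::uintptr_t addr, std::size_t step, std::size_t len) {
  assert(isPowerOfTwo(len));
  const std::uintptr_t mask = len - 1;
  return (addr & ~mask) | ((addr + step) & mask);
}
```

Under this model, a 32-byte buffer that starts at 0x20000010 wraps from 0x2000001C to 0x20000000, i.e. 16 bytes in front of the buffer. Declaring the buffer with something like alignas(32) would be one way to meet the rule, assuming the linker section honours it.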
 
Thanks for the tip :) I learned something new.
But removing the "circular" does not change much in this case.
 
Ok, thanks for your time :) I'm pretty confident now, that it is a "silicon" problem, and not my code.
That's what i wanted to know.
I'll try to check the addresses before doing my writes - this will slow it down, and I hope using DMA still makes sense (saving time).
 
Change your dma2 code to:
Code:
  dma2.begin(false);
  dma2.triggerContinuously();
  dma2.sourceBuffer(srcbuf, sizeof(dstbuf));
  dma2.destinationBuffer(dstbuf, sizeof(dstbuf) );
  dma2.enable();

Your source adjustment is invalid and dma2 hangs (but not before reading from invalid memory from before srcbuf start).
 
It needs to be different buffer sizes! The goal is not to make the code run. The goal is to show a hardware problem.
If you use equal buffer sizes, the channels run synchronously and never use the same RAM location.

Can you explain where the "0" comes from, please? The only possibility is that dma2 writes them.
 
dma1 is using "sizeof(srcbuf)", dma2 uses "sizeof(dstbuf)" - so they are different.

What your code was doing is:

TCD->NBYTES = 4;
TCD->SLAST = -sizeof(srcbuf);
TCD->DLASTSGA = -sizeof(dstbuf);
TCD->BITER = sizeof(dstbuf) / 4;
TCD->CITER = sizeof(dstbuf) / 4;

So the DMA start address is adjusted too much and with each DMA iteration you wander towards the beginning of the memory. Starting with the second DMA iteration, you are not reading from srcbuf anymore, but from memory in front of it.
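The drift can be worked through with a small host-side model. This is only a sketch: the struct and function names are made up, a source offset of 4 bytes per minor loop is an assumption (it is not in the listing above), and the buffer sizes assume a 4-byte unsigned, so sizeof(srcbuf) is 128 and sizeof(dstbuf) is 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the source-side TCD fields discussed above.
struct TcdModel {
  std::int32_t soff;    // bytes added to the source address per minor loop (assumed 4)
  std::int32_t slast;   // adjustment applied to the source address after each major loop
  std::uint32_t biter;  // minor loops per major loop
};

// Source address, as a byte offset from the start of srcbuf,
// after n completed major loops.
inline std::int64_t sourceOffsetAfterMajorLoops(const TcdModel& t, std::uint32_t n) {
  assert(t.biter > 0);
  const std::int64_t perMajor = std::int64_t(t.soff) * t.biter + t.slast;
  return perMajor * n;
}
```

With SLAST = -128 and BITER = 8, each major loop advances 32 bytes and then steps back 128, so the source starts the second loop 96 bytes in front of srcbuf. With SLAST = -32 (source length equal to sizeof(dstbuf)) the offset returns to 0 every time.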
 
OK, that means that we need a better test :)

In the meantime I added the "check-address-and-wait-if-near" test :) to my main program. No random black lines anymore. @wmxz, thanks for the hint :)
Need to check now if it still saves time or my more simple 32-bit memset is better.
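For readers wondering what such a check could look like: the following is a guess at the idea, not the code from the actual program. All names and the guard parameter are hypothetical, offsets are bytes from the start of the frame buffer, and wrap-around at the end of the circular buffer is ignored for simplicity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: is the reading channel currently inside, or within
// guard bytes ahead of, the region [writeStart, writeStart + writeLen)
// that is about to be filled? Wrap-around is not handled in this sketch.
inline bool readerNearWrite(std::size_t readerOff, std::size_t writeStart,
                            std::size_t writeLen, std::size_t guard) {
  assert(writeLen > 0);
  const std::size_t lo = (writeStart > guard) ? writeStart - guard : 0;
  return readerOff >= lo && readerOff < writeStart + writeLen;
}
```

In a sketch, readerOff would presumably be derived from the reading channel's current address (along the lines of the destinationAddress() call used earlier, but for the source side), and the fill would be delayed while the function returns true.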
 