Concurrent DMA-READ and DMA-WRITE to same RAM area

Frank B · Mar 21, 2017

Hi,
the title says it all.

First, please excuse that I can't show the code; nobody wants to read 3000 lines.
If it is really needed, I can invest two hours next weekend and write a minimal example.

I already mentioned this some weeks ago:
If you use two channels, where the first reads from and the second (asynchronously) writes to the same RAM area, the reading channel sometimes reads wrong values. Or the writing is corrupted (hard to say).

My question: is this
a) documented
b) a silicon bug?

And, more importantly, is there a workaround?

Background: I'm using DMA to continuously transfer a 153 KB area to the ILI display over SPI (the reading channel).
To speed things up, I fill certain areas of that RAM with a concurrent DMA channel.
The display shows random(!) black and white lines, about 2-3 times a second.

This does NOT happen when filling the RAM with a loop in C.

tni · Mar 21, 2017

Are the alignment and transfer size the same? If not, you might be seeing partial updates. Depending on how you set up the DMA arbitration, the channels could be switching back and forth, giving you the line pattern.

Frank B · Mar 21, 2017

Hm. Yes, the alignment is the same. Transfer size: not sure what you mean.

The reading transfer is 16-bit, from a circular 320*240*2-byte buffer. It does not stop when it's done and automatically starts again from the beginning (a continuous transfer). The target is SPI.

For my test:
The writing channel copies a constant 32-bit value (in RAM, not flash) into the same circular buffer (320*240*2 bytes for this test, too), one display line (160 * 32 bit) per transfer, with disableOnCompletion.
The software waits until the line is done and then restarts the transfer for the next line.

Thank you for the hint, I'll try using 16-bit for the writing, too. Let's see what happens.

PaulStoffregen · Mar 21, 2017

This is the first I've heard of such an issue.

Maybe try playing with the priority levels of the DMA channels. Like most DMA troubleshooting, this is pretty much working blind, but if it has any effect, maybe that will lead you to a better understanding of the problem?

Frank B · Mar 21, 2017

@tni: Using 16-bit DMA writes is even worse: more of these "corrupt" lines (again, random!!)
@Paul: I guess nobody has tried something like this.

If you want to investigate it, I'll show you a minimal example next weekend.

I'll try setting the priority.

Edit: Hm, the manual says there is already a default priority set.
So I just swapped the creation (initialization) order. No change.

I'll try the ECP/DPA ("preemption") bits tomorrow.

Edit

@tni: 8-bit writes result in even more corrupt lines.
Interestingly, it's not single pixels that are black: when it happens, the whole rest of the display line is black.
This means it's the WRITING that does not work (otherwise more than one line would be black).
Then I checked the error flag of the writing DMA channel. It is not set.

WMXZ · Mar 22, 2017

It may or may not be related,
but there seems to be something special with DMA:
1) you can easily read from or write to a buffer DMA is writing to (this is typically done in dual-buffer DMA applications), as long as you do not access the word DMA is currently using.
2) the program (DMA?) always crashes when accessing the memory location DMA is using (e.g. copying the data where DMA is writing to).

My experience is that as long as you avoid the memory location DMA is about to use, you can read the memory locations just before and after it.
I have no Teensy with me, so I cannot prove it, but the DMA engine tells you the address it is reading from or writing to, so that could easily be tested.

KurtE · Mar 22, 2017

Warning: you are all much more experienced with DMA than I am, so take this with a grain of salt!

Most of my playing around with DMA has been taking your DMA version of the ILI9341 library and incorporating it into my ILI9341_t3n library, with a few caveats.
In particular, I am mostly interested in one-shot updates of the screen. So I looked for a way to fix the issue you ran into, where your update-once code actually updates twice: once with the non-DMA update and then once with the DMA update. I fixed it by outputting the first pixel without DMA, which updates the upper word of the PUSHR register so that it properly sets up the right stuff (two-word output, CS, DC, CONT). Then I started DMA, but with the first word removed from the first buffer. I can go into more detail if you are interested.

But what I am wondering is: do both of these transfers run at about the same speed, or does one of them run slower? My guess is that if you have two DMAs, one going out to SPI and the other coming in from the SD card, it might work out better to use single-shot updates instead of continuous updates.
Again, I'm not sure I am explaining this well or if this makes sense, but suppose both of these transfers are set up to use multiple DMASetting objects chained to each other.
Hypothetically, you could set up the DMASetting chain for the channel filling your frame buffer so that an interrupt fires when the fill is some percentage done, and your interrupt handler would then set up the one-shot update of the screen. That could be at either the end of one item or at the midpoint of one item.

Again, not sure if this makes sense or not...

tni · Mar 22, 2017

What happens if you pause the SPI update to the screen? It seems the screen doesn't do double buffering.

With the default DMA configuration with no preemption, the DMA controller will be idle after each 2-byte SPI transfer. Your line draw DMA transfer will execute all at once in the middle of the SPI transfer, so the SPI transfer will get temporarily paused. 16-bit and 8-bit line transfers are slower, so the pause will be longer.

I strongly suspect there is no DMA corruption and you are just seeing the effect of the pause, and potentially partially updated lines (the first part of the line has already been transmitted when you perform the line-draw DMA).
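Rough arithmetic for this point, using the numbers from earlier in the thread (320 x 240 pixels at 2 bytes each, one 640-byte line per fill): the narrower the element, the more element transfers the fill channel needs per line. This is a host-side C++ sketch, not Teensy code; the constant and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Display geometry from the thread: 320 x 240 pixels, 2 bytes per pixel.
constexpr std::size_t kWidth = 320;
constexpr std::size_t kHeight = 240;
constexpr std::size_t kBytesPerPixel = 2;

constexpr std::size_t kLineBytes = kWidth * kBytesPerPixel;  // 640 bytes per display line
constexpr std::size_t kFrameBytes = kLineBytes * kHeight;    // 153600 bytes (~153 KB)

// Number of DMA element transfers needed to fill one display line
// when each transfer moves `elementBytes` bytes (8, 16 or 32 bit).
std::size_t transfersPerLine(std::size_t elementBytes) {
    assert(elementBytes == 1 || elementBytes == 2 || elementBytes == 4);
    return kLineBytes / elementBytes;
}
```

A 32-bit fill needs 160 transfers per line, a 16-bit fill 320 and an 8-bit fill 640. If the SPI stream waits while the fill runs, the pause per line should grow roughly with these counts, which would fit the observation that 16-bit and 8-bit writes produce more bad lines. Real timing also depends on bus and arbitration details that this does not model.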

BTW, in my tests memcpy is quite close in performance to 32-bit DMA transfers for RAM -> RAM copies. It's much faster than copy loops (even unrolled ones). GCC doesn't want to utilize LDR / STR with automatic pointer increment.
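The CPU-side alternative to the DMA line fill (writing one constant 32-bit value over a whole line, as in the test described above) can lean on memcpy: write the pattern once, then keep copying the already-filled prefix forward, doubling it each time. A minimal sketch; `fill32` is a hypothetical helper, not part of Teensyduino, and no speed claim is implied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill `count` 32-bit words at `dst` with `pattern`.
// The first word is written directly; after that, the already-filled prefix
// is copied onto the next region with memcpy, doubling the filled part on
// each pass, so the bulk of the work is done by memcpy rather than a word loop.
void fill32(uint32_t *dst, std::size_t count, uint32_t pattern) {
    if (count == 0) return;
    dst[0] = pattern;
    std::size_t filled = 1;
    while (filled < count) {
        // Never copy more than is already filled (no overlap) or than remains.
        std::size_t chunk = (filled <= count - filled) ? filled : count - filled;
        std::memcpy(dst + filled, dst, chunk * sizeof(uint32_t));
        filled += chunk;
    }
    assert(filled == count);
}
```

For one display line this would be `fill32(line, 160, value)`, with `line` pointing at the start of that line in the frame buffer.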

tni · Mar 22, 2017

WMXZ said:
2) program (DMA?) always crashes when accessing memory location DMA is using (e.g. copying the data where DMA is writing to)

IME, that works just fine. Do you have a test program?

Frank B · Mar 22, 2017

@WMXZ: that describes exactly what I'm seeing.

@tni: I'll make a test program.
I know that DMA is not faster, but it runs "in the background" and I can do other things during the "memcpy/memset".

So it saves time this way.

It's not critical, and I CAN use a memcpy (indeed, I'm already doing this), but I want to optimize some things, and this seemed to be a good place to begin.

Frank B · Mar 22, 2017

Well, I'm not 100% sure whether this is the same effect or not, because this test is totally different from what I'm actually doing.

Code:

#include <DMAChannel.h>

DMAChannel dma1, dma2;

const unsigned ssize = 32;
const unsigned dsize = 8;
DMAMEM unsigned src;
DMAMEM unsigned srcbuf[ssize];
DMAMEM unsigned dstbuf[dsize];

void setup() {
  delay(1000);
  Serial.println("start");
  memset(srcbuf, 1, sizeof(srcbuf));
  memset(dstbuf, 9, sizeof(dstbuf));

  src = 0x12345678;
  // dma1: keep writing the single word `src` into every word of srcbuf
  dma1.begin(false);
  dma1.triggerContinuously();
  //dma1.disableOnCompletion();
  dma1.source(src);
  dma1.destinationBuffer(srcbuf, sizeof(srcbuf) );
  dma1.enable();

  // dma2: concurrently read srcbuf and write it into the (smaller) dstbuf
  dma2.begin(false);
  dma2.triggerContinuously();
  // dma2.disableOnCompletion();
  dma2.sourceBuffer(srcbuf, sizeof(srcbuf));
  dma2.destinationCircular(dstbuf, sizeof(dstbuf) );
  dma2.enable();
}


void loop() {
  for (unsigned i = 0; i < dsize; i++) {
    if (dstbuf[i] != src) {
      Serial.print(" ERROR at ");
      Serial.print(i);
      Serial.print(": ");
      Serial.print(dstbuf[i], HEX);
      Serial.println();
      delay(100);
    }
    if (dma1.complete()) Serial.println("DMA 1 complete");
    if (dma2.complete()) Serial.println("DMA 2 complete");
  }
  Serial.println();
}

Would you expect dstbuf[] to be completely 0 (or do I have a bug in the code above)?

The output is... 0 (zero) for dstbuf.
- If you remove the dma2.enable(), the output is as expected.
- If you test srcbuf[] instead, it contains all 0x12345678 (which is correct).
- If you enable the "disableOnCompletion" and insert a delay(100); between both transfers, the output shows no errors. I hope this shows that the code is correct(?)

I'm not sure if it is really the same as my SPI problem, and the test could be wrong.
My "real" program uses SPI as the "final destination", and therefore SPI as the trigger, too, with a single destination address.

tni · Mar 22, 2017

tni said:
WMXZ said:

2) program (DMA?) always crashes when accessing memory location DMA is using (e.g. copying the data where DMA is writing to)


IME, that works just fine. Do you have a test program?

Here is a test sketch that constantly reads a memory buffer that is written to by DMA. An FTM timer is used to trigger the DMA transfers. It works perfectly fine.

Code:

#include <DMAChannel.h>
#include <array>

DMAChannel dma;

volatile uint32_t dma_source = 0;
std::array<volatile uint32_t, 1024> dma_dest = {};

void setupDMA() {
    dma.source(dma_source);
    dma.destinationBuffer(dma_dest.data(), 4*dma_dest.size());
    dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM1_CH0);
    dma.enable();

    FTM1_SC = 0x00;
    FTM1_CNT = 0x0000;
    FTM1_MOD = 30;
    FTM1_CNTIN = 0;
    FTM1_C0SC = FTM_CSC_CHIE | FTM_CSC_MSA | FTM_CSC_ELSA | FTM_CSC_DMA; // output compare, enable DMA
    FTM1_SC = FTM_SC_CLKS(1); // enable timer with busclock
}

void setup() {
    Serial.begin(9600);
    delay(2000);
    Serial.println("Hello");

    setupDMA();
    uint32_t ctr = 0;
    while(true) {
        ctr++;
        dma_source = ctr;
        volatile uint32_t dummy;
        for(size_t i = 0; i < 1000; i++) {
            for(uint32_t v : dma_dest) dummy = v;
        }
        Serial.print(ctr);
        Serial.print(" dummy: ");
        Serial.print(dummy);
        Serial.print("  DMA ptr: ");
        Serial.println((uint32_t) dma.destinationAddress(), HEX);
    }
}

void loop() {}

Frank B · Mar 22, 2017

@tni: You're using only one DMA channel.

tni · Mar 22, 2017

Frank B said:
@tni: You're using only one DMA channel.

Yes. My interpretation of WMXZ's post is that he is using the CPU to read the memory.

Frank B · Mar 22, 2017

But he's right. If I use a "memset" instead of the writing DMA channel, I see blinking pixels on the display: very short and rare, due to the high refresh rate.
Did you try my test program from post #11?

tni · Mar 22, 2017

The sketch from #11 is buggy. dma1 is running continuously, so dma2 never gets a chance to run. Print dma2.destinationAddress(), it stays constant.

Frank B · Mar 22, 2017

Hm, then where does the zero content come from?
There's a memset(dstbuf, 9, sizeof(dstbuf)); before starting any channel.

tni · Mar 22, 2017

BTW, don't use "destinationCircular()". It needs a buffer aligned to the buffer size (which must be a power of 2). The DMA controller masks the last bits of the buffer address. If the buffer isn't aligned properly, you end up writing to the wrong destination.
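To see why the alignment matters, here is a simplified host-side model of modulo ("circular") addressing as described above: the low log2(size) address bits advance and wrap, while the upper bits stay frozen from the start address. The function names are illustrative, not part of DMAChannel, and the model ignores everything else the TCD does.

```cpp
#include <cassert>
#include <cstdint>

// Address the DMA would use after advancing `byteOffset` bytes from `start`
// with modulo addressing over `size` bytes (size must be a power of two):
// upper bits come from `start`, lower bits advance and wrap.
uint32_t moduloAddress(uint32_t start, uint32_t size, uint32_t byteOffset) {
    assert(size != 0 && (size & (size - 1)) == 0);  // power of two
    const uint32_t mask = size - 1;
    return (start & ~mask) | ((start + byteOffset) & mask);
}

// True if a buffer at `start` of `size` bytes can be used for circular
// addressing without the DMA writing outside of it.
bool circularBufferOk(uint32_t start, uint32_t size) {
    return size != 0 && (size & (size - 1)) == 0 && (start & (size - 1)) == 0;
}
```

With an aligned 32-byte buffer at 0x20000100, advancing 32 bytes wraps back to 0x20000100. With the same buffer starting at 0x20000110 (not a multiple of 32), advancing 16 bytes lands on 0x20000100, 16 bytes in front of the buffer, which is the "wrong destination" described above.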

Frank B · Mar 22, 2017

Thanks for the tip

I learned something new.
But removing the "circular" does not change much in this case.

Frank B · Mar 22, 2017

Ok, thanks for your time.

I'm pretty confident now that it is a "silicon" problem and not my code.
That's what I wanted to know.
I'll try to check the addresses before doing my writes. This will slow things down, and I hope using DMA still makes sense (saves time).

tni · Mar 22, 2017

Change your dma2 code to:

Code:

  dma2.begin(false);
  dma2.triggerContinuously();
  dma2.sourceBuffer(srcbuf, sizeof(dstbuf));
  dma2.destinationBuffer(dstbuf, sizeof(dstbuf) );
  dma2.enable();

Your source adjustment is invalid and dma2 hangs (but not before reading from invalid memory from before srcbuf start).

Frank B · Mar 22, 2017

It needs to be different buffer sizes! The goal is not to make the code run. The goal is to show a hardware problem.
If you use equal buffer sizes, the channels run synchronously and never use the same RAM location.

Can you explain where the "0"s come from, please? The only possibility is that dma2 writes them.

tni · Mar 22, 2017

dma1 is using "sizeof(srcbuf)", dma2 uses "sizeof(dstbuf)" - so they are different.

What your code was doing is:

TCD->NBYTES = 4;
TCD->SLAST = -sizeof(srcbuf);
TCD->DLASTSGA = -sizeof(dstbuf);
TCD->BITER = sizeof(dstbuf) / 4;
TCD->CITER = sizeof(dstbuf) / 4;

So the DMA start address is adjusted too much, and with each DMA iteration you wander towards the beginning of memory. Starting with the second DMA iteration, you are no longer reading from srcbuf but from the memory in front of it.
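The drift can be checked with plain arithmetic. This sketch assumes the source advances by 4 bytes per minor loop (SOFF = 4, presumably what sourceBuffer() sets for a 32-bit element type) and that SLAST is added once at the end of each major loop; the names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Values from the dma2 setup in the test sketch:
// srcbuf = 32 words (128 bytes), dstbuf = 8 words (32 bytes).
constexpr int32_t kNbytes = 4;                  // NBYTES: bytes per minor loop
constexpr int32_t kSrcBytes = 32 * 4;           // sizeof(srcbuf)
constexpr int32_t kDstBytes = 8 * 4;            // sizeof(dstbuf)
constexpr int32_t kIter = kDstBytes / kNbytes;  // BITER = CITER = 8
constexpr int32_t kSlast = -kSrcBytes;          // SLAST = -sizeof(srcbuf)

// Offset of the source start address relative to srcbuf after `majorLoops`
// complete major loops: each loop advances the source by kIter * kNbytes
// bytes, then SLAST is added once.
int32_t sourceStartOffset(int32_t majorLoops) {
    assert(majorLoops >= 0);
    return majorLoops * (kIter * kNbytes + kSlast);
}
```

Each major loop advances the source by only 8 * 4 = 32 bytes but then subtracts 128, so the start address moves 96 bytes down per loop; after the first loop it already points in front of srcbuf.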

Frank B · Mar 22, 2017

OK, that means we need a better test.

In the meantime, I added the "check-address-and-wait-if-near" test to my main program. No more random black lines. @WMXZ, thanks for the hint.

Now I need to check whether it still saves time, or whether my simpler 32-bit memset is better.
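A sketch of the kind of "check the address and wait if near" test described here, treating the frame buffer as circular. The helper `writeIsSafe` and the `guard` margin are hypothetical; on a Teensy, `readPos` would presumably be derived from the reading channel's current source address (DMAChannel's sourceAddress(), the counterpart of the destinationAddress() call in tni's sketch) minus the frame buffer start.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: may a write to [writeStart, writeStart + writeLen)
// start now, while another channel reads the same circular buffer of
// bufLen bytes and is currently at byte offset readPos?
// All positions are byte offsets from the start of the frame buffer.
bool writeIsSafe(std::size_t readPos, std::size_t writeStart, std::size_t writeLen,
                 std::size_t guard, std::size_t bufLen) {
    assert(bufLen > 0 && readPos < bufLen && writeStart < bufLen && writeLen <= bufLen);

    // How far the reader already is inside the write region (with wrap-around).
    std::size_t intoRegion = (readPos + bufLen - writeStart) % bufLen;
    if (intoRegion < writeLen) {
        return false;  // reader is inside the region: it would see a partial update
    }

    // How many bytes the reader still has to travel to reach the region.
    std::size_t ahead = (writeStart + bufLen - readPos) % bufLen;
    return ahead >= guard;  // unsafe if the reader arrives within `guard` bytes
}
```

The writer would spin while `writeIsSafe(...)` returns false and only then start the line fill. A larger guard is safer but makes the writer wait more often, which is exactly the time cost being weighed against the plain 32-bit memset.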
