Teensy 4.1 DMA priorities and preemption

miciwan · Sep 20, 2021

Does anyone have any practical experience with DMA, priorities and preemption on Teensy 4/4.1?

I have a process that uses DMA to read out FlexIO shift registers into one of several buffers in local memory. Once a buffer is full (an interrupt is triggered), I switch to the next one and copy the filled readout buffer into a larger one living in EXTMEM. Reading directly into the EXTMEM buffer doesn't really work; I believe EXTMEM is just too slow for the way the data is read.

Inside the interrupt handler fired when the readout buffer is full, I can just do a memcpy to that large buffer in EXTMEM and it works just fine. But since I also need to do some processing on the data in that buffer elsewhere, I would prefer not to stall the CPU and to use another DMA for that copy instead. This is where things start to break.

I can kick off that copy DMA, but when I do, the one that reads the data from FlexIO starts hitching: it skips some beats and misses some data. The main suspect is how DMA channels are handled internally. They aren't really running concurrently; rather, an arbitration mechanism picks which one executes in a given cycle. My thinking is that the copy DMA is running when the readout should be happening, which makes the readout miss some of the data.

Technically, the IMXRT1060 has two schemes for arbitrating DMA requests: priority-based (on by default) and round-robin. On top of that, the priority-based scheme allows preemption to be configured per channel: whether the channel can be suspended by a higher-priority request, and whether it can suspend lower-priority channels. But that doesn't really seem to do much: setting the "copy-out" DMA to a lower priority than the "readout" and setting its ECP bit (so that it can be suspended by higher-priority channels) doesn't change much. The timings *are* different, that's for sure, but the readouts still hitch.

Has anyone had any more luck with playing with the priority and preemption settings and would like to share their experiences?
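For reference, here is a minimal sketch of how the per-channel priority byte described above could be composed. The bit positions (ECP in bit 7, DPA in bit 6, CHPRI in bits 3:0) are my reading of the eDMA DCHPRIn register layout and should be checked against the reference manual. makeDchpri is a hypothetical helper for illustration, not something provided by the Teensy core.

```cpp
#include <cassert>
#include <cstdint>

// Assumed DCHPRIn bit layout (verify against the reference manual).
constexpr uint8_t kDchpriEcp = 1u << 7;       // ECP: channel may be suspended by a higher-priority channel
constexpr uint8_t kDchpriDpa = 1u << 6;       // DPA: channel may NOT suspend lower-priority channels
constexpr uint8_t kDchpriChpriMask = 0x0F;    // CHPRI: arbitration priority, higher value wins

// Hypothetical helper: build the byte to write to a channel's DCHPRIn register.
inline uint8_t makeDchpri(uint8_t priority, bool canBePreempted, bool canPreemptOthers) {
    assert(priority <= kDchpriChpriMask);     // CHPRI is a 4-bit field
    uint8_t value = priority & kDchpriChpriMask;
    if (canBePreempted) value |= kDchpriEcp;
    if (!canPreemptOthers) value |= kDchpriDpa;  // note the inverted sense of DPA
    return value;
}
```

Under these assumptions, the copy-out channel would get something like makeDchpri(1, true, false) and the readout channel makeDchpri(2, false, true), each written to the matching DCHPRIn register. As far as I know, priorities within a channel group have to be unique, otherwise the eDMA reports a configuration error.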

miciwan · Sep 22, 2021

OK, maybe a slightly different (though somewhat related) question:

Does anyone know if there's any control over arbitration of access to EXTMEM? For example, IOMUX has bits in the IOMUXC_GPR_GPR2 register to control the priorities of access through AHB and DMA. Is anyone aware of similar settings for FlexSPI, or some other mechanism that would make FlexSPI pick DMA requests with higher priority?

PaulStoffregen · Sep 23, 2021

miciwan said:
Has anyone had any more luck with playing with the priority and preemption settings and would like to share their experiences?

OctoWS2811 uses DMA priority order. Probably not relevant to your situation, but you did ask, and that is the one place I've definitely made use of the channel priority settings.

miciwan said:
Or some other mechanism that would make FlexSPI pick DMA requests with higher priority?

While I can't completely rule it out, I'm afraid that sort of functionality probably doesn't exist.

In general, the overall theme is that these chips are filled with really amazing peripherals and features, each of which seems to have been designed in relative isolation by different engineering teams or even different companies. So you don't tend to see features where a specific peripheral, such as FlexSPI, has any awareness of which other part of the chip is accessing it.

miciwan · Sep 23, 2021

Thanks for the answer Paul.

I believe the reason I'm not really seeing much of a difference when playing with priorities is related to that second question. While all the DMA machinery is doing the readouts and copies, I wanted to do some processing of the read-out data in EXTMEM, and that's what the "main thread" is doing. Contention in the access to EXTMEM is probably causing some stalls in the copy-out DMA, which in turn cause hitching in the read-out DMA. This seems to be consistent with what I'm seeing when I disable the processing part - the copy out becomes much faster and the read out hitches considerably less (though it's not completely eliminated).

Anyway, it looks like memcpy in the DMA completion handler it is. Not really a big deal, since I can do what I need with that setup just fine. DMAing it out was more of a test of how far this could be taken than anything else.
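A minimal, host-runnable sketch of the "switch buffers, then memcpy in the completion handler" arrangement described in this thread. All names and sizes here (readoutBuf, bigBuf, onReadoutComplete, kReadoutWords) are hypothetical; on a Teensy 4.1, bigBuf would be declared with EXTMEM, and the handler would also re-arm the readout DMA channel, which is only indicated by a comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kReadoutWords = 256;              // hypothetical readout buffer size
constexpr size_t kBigWords = 16 * kReadoutWords;   // hypothetical capacity of the large buffer

static uint32_t readoutBuf[2][kReadoutWords];      // fast local RAM, filled by the readout DMA
static uint32_t bigBuf[kBigWords];                 // would be EXTMEM on a Teensy 4.1
static volatile size_t fillIndex = 0;              // buffer the readout DMA is currently filling
static volatile size_t bigOffset = 0;              // next free word in bigBuf

// Meant to be called from the readout DMA completion interrupt.
// Returns the index of the buffer the readout DMA should fill next.
size_t onReadoutComplete() {
    const size_t full = fillIndex;                 // buffer that just finished filling
    const size_t next = full ^ 1u;
    fillIndex = next;
    // On hardware: point the readout DMA at readoutBuf[next] here, before the
    // copy, so the readout keeps running while the CPU copies the other buffer.

    const size_t offset = bigOffset;
    if (offset + kReadoutWords <= kBigWords) {     // drop the block if bigBuf is full
        std::memcpy(&bigBuf[offset], readoutBuf[full], sizeof readoutBuf[full]);
        bigOffset = offset + kReadoutWords;
    }
    return next;
}
```

The copy has to finish before the other buffer fills up, so the readout buffer size sets the time budget for the memcpy into the slower memory.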

miciwan · Sep 24, 2021

FWIW I dug it out in the docs. I won't have a chance to try it out today, but for reference:

https://www.nxp.com/docs/en/application-note/AN12437.pdf is a really good application note with details on the 1060 buses and how they are interconnected. The important bit is the interconnect diagram in that note (image not preserved here).

It shows that DMA and the core access EXTMEM through the same path, through SIM_M7 (and then SIM_EMS). The reference manual lists a bunch of registers for controlling the SIM_M7 fabric, most importantly, on page 1752, read_qos and write_qos for masters 0 and 1 (core and DMA), which define the arbitration priorities for reading and writing (and by default DMA has the lower priority).

miciwan · Sep 24, 2021

Oh, BTW (just to derail everything further), Paul, one thing that I noticed while looking into all this: startup.c clears the entire AHBCR register for FLEXSPI2, here:

https://github.com/PaulStoffregen/c...b277586610f43ebd95233e/teensy4/startup.c#L366

(and I just confirmed, it's all zeroed out at runtime), which disables the AHB prefetch, write buffering and caching using the prefetch buffer. Was this intentional or just a typo? I'm asking mainly in the context of the next few lines of that file, which actually set the size and priorities for these buffers.
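To make that runtime check repeatable, here is a small sketch that decodes an AHBCR value into the three features mentioned above. The bit masks follow the names I remember from NXP's SDK headers (CACHABLEEN 0x08, BUFFERABLEEN 0x10, PREFETCHEN 0x20) and are an assumption to verify against the reference manual. describeAhbcr is a hypothetical helper.

```cpp
#include <cstdint>
#include <string>

// Assumed FlexSPI AHBCR bit masks (verify against the reference manual).
constexpr uint32_t kAhbcrCachableEn   = 0x08;  // AHB read caching via the prefetch buffer
constexpr uint32_t kAhbcrBufferableEn = 0x10;  // AHB write buffering
constexpr uint32_t kAhbcrPrefetchEn   = 0x20;  // AHB read prefetch

// Hypothetical helper: summarize which AHB features an AHBCR value enables.
std::string describeAhbcr(uint32_t value) {
    auto flag = [value](uint32_t mask) { return (value & mask) ? "on" : "off"; };
    return std::string("prefetch=") + flag(kAhbcrPrefetchEn) +
           " bufferable=" + flag(kAhbcrBufferableEn) +
           " cacheable=" + flag(kAhbcrCachableEn);
}
```

On the board, the idea would be to print describeAhbcr(FLEXSPI2_AHBCR) over Serial; with the register cleared as described, all three should read "off".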
