T4.1: DMA for asynchronous memory copy from EXTMEM?

Netzapper

Member
Hi, I'm working on a digital synth with Teensy 4.1 as the core.

Part of my design loads a bunch of 4k-16k blocks of preprocessed synthesis input data (not PCM audio) from SD card to EXTMEM, and then needs to repeatedly copy individual blocks into an OCRAM buffer for use just in time as they're scheduled to be used in the synth. The data blocks get thoroughly math'd multiple times within the AudioStream::update(), so leaving them in EXTMEM is not viable. (In fact I may eventually buffer them a third time to TCM, although I understand the DMA engine doesn't have access there.)

For the moment at least, I have plenty of spare CPU time to just schedule the copy operations into the idle loop... but I really want to use the idle time to draw pretty pictures to the screen. So, DMA? It's been a net speedup every time I've been able to use it for asynchronous copies on other platforms.

I see the DMAChannel.h header. It would seem to me I could set the source buffer, destination buffer, and counts. Then have it self-disable or IRQ on finish (depending on how I do the rest of the accounting for blocks), and tell it to go and self-retrigger until complete. But I don't see a simple memory-to-memory copy like that in the few places I checked library code for DMA usage; the audio library seems to be all device-to-memory or memory-to-device.

But I've got questions...

How do I choose which DMA channel? How can I know which are currently reserved? Can I expect DMA to work "transparently" with the memory-mapped QPI running the EXTMEM chips? Is there an example of memory-memory DMA somewhere in the Teensy codebase?
 
In fact I may eventually buffer them a third time to TCM, although I understand the DMA engine doesn't have access there.
It does. DMA can access anywhere the CPU can (besides the CPU cache - which isn't directly addressable anyway).

How do I choose which DMA channel? How can I know which are currently reserved? Can I expect DMA to work "transparently" with the memory-mapped QPI running the EXTMEM chips? Is there an example of memory-memory DMA somewhere in the Teensy codebase?
Start with a DMASetting instance and set the parameters that aren't going to change between calls.
For the actual transfers, use a DMAChannel instance (which allocates a channel automatically), assign it the pre-configured DMASetting, attach an ISR, and call triggerManual() to start the transfer.

The point of using a DMASetting is so you can set the parameters for the next transfer while the previous one is running, without extra steps like allocating extra channels.
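
A minimal sketch of that sequence for a memory-to-memory copy, not a tested implementation. It assumes the Teensyduino core's DMAChannel.h, and every name in it (setupBlockCopy, startBlockCopy, blockCopyDone, and so on) is made up for illustration. Two things go beyond what is described above and are assumptions: the whole block is moved as one minor loop (NBYTES = block size, CITER = BITER = 1) so that a single triggerManual() copies everything, and the data cache is flushed and invalidated around the transfer because EXTMEM and OCRAM are both cached. The #else branch is only a plain memcpy stand-in so the calling code can be compiled off-target.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical names throughout; none of them come from the Teensy core.
volatile bool blockCopyDone = false;

#if defined(ARDUINO_TEENSY41)
#include <Arduino.h>
#include <DMAChannel.h>

static DMAChannel copyChannel;   // constructor claims a free DMA channel
static DMASetting copySetting;   // TCD image kept in RAM, reused for every copy
static void *pendingDst = nullptr;
static size_t pendingLen = 0;

static void copyIsr() {
  copyChannel.clearInterrupt();
  // Drop any cache lines the CPU loaded over the destination during the copy.
  arm_dcache_delete(pendingDst, pendingLen);
  blockCopyDone = true;
  asm volatile("dsb");           // let the interrupt clear complete before returning
}

void setupBlockCopy() {
  // Parameters that do not change between transfers.
  copySetting.TCD->SOFF = 4;     // advance the source 4 bytes per read
  copySetting.TCD->DOFF = 4;     // advance the destination 4 bytes per write
  copySetting.TCD->ATTR = DMA_TCD_ATTR_SSIZE(2) | DMA_TCD_ATTR_DSIZE(2); // 32-bit accesses
  copySetting.TCD->SLAST = 0;
  copySetting.TCD->DLASTSGA = 0;
  copySetting.TCD->CITER = 1;    // assumption: one minor loop moves the whole block
  copySetting.TCD->BITER = 1;
  copySetting.TCD->CSR = DMA_TCD_CSR_INTMAJOR;  // interrupt when the block is done
  copyChannel.attachInterrupt(copyIsr);
}

// dst and src should be 32-byte aligned and len a multiple of 32.
void startBlockCopy(void *dst, const void *src, size_t len) {
  blockCopyDone = false;
  pendingDst = dst;
  pendingLen = len;
  arm_dcache_flush(const_cast<void *>(src), len);  // push pending CPU writes out to PSRAM
  arm_dcache_delete(dst, len);                     // discard cached destination lines
  copySetting.TCD->SADDR = src;
  copySetting.TCD->DADDR = dst;
  copySetting.TCD->NBYTES = len;
  copyChannel = copySetting;     // load the prepared TCD into the channel
  copyChannel.triggerManual();   // software start request
}
#else
// Off-target stand-in: synchronous copy with the same interface.
void setupBlockCopy() {}

void startBlockCopy(void *dst, const void *src, size_t len) {
  std::memcpy(dst, src, len);
  blockCopyDone = true;
}
#endif
```

On the Teensy, setupBlockCopy() would be called once from setup(), then startBlockCopy(ocramBuf, extmemBlock, len) for each block, with the caller polling blockCopyDone (or doing the block accounting in the ISR instead). The channel number that was allocated can be read from copyChannel.channel. One caveat with this layout: a single large minor loop may hold off other DMA channels, audio I/O included, until it finishes, so smaller minor loops with re-triggering may turn out to be necessary.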

As far as making things more efficient, DMA still uses memory bandwidth - if the CPU is also accessing memory there may not be enough bus cycles to go around.
 
This is all very good information!

Is the TCM not on its own bus? Or is memory bandwidth shared between TCM and OCRAM?

I had hoped I could run UI update code, which is mostly SPI traffic to the OLED, while doing the DMA copies. But if I'm sharing bandwidth between DMA and code execution that might be less of a win than I'd hoped.

In any case, thank you so much for the point in the right direction!
 