Issues with DMAChannels

jmarsh

Well-known member
I'm having a problem related to Teensyduino's DMAChannels/eDMA module and am curious to hear any suggestions.

My project is for the T4. I have display driver code that constantly uses a pair of DMAChannels to render data. These channels are "bursty" - they activate infrequently to transfer large amounts of data from one part of memory to another.
I also have an I2S DAC connected (a PCM5102 but that's not particularly relevant) driven by the Audio library using an AudioOutputI2S object. This also uses a DMAChannel to feed the SAI TDR register as required to maintain continuous playback.

The problem: when the bursty DMA channels activate, they are given priority over the audio DMA channel. This lasts long enough that the audio output underruns (which the audio library doesn't attempt to handle) and it shuts off completely. The entire Audio library stops functioning due to the updates not being triggered.

The DMA channels operate in fixed priority mode. Higher channel numbers by default are given priority over lower ones. The DMAChannels are more-or-less initialized statically, which means their order is up to the whim of the compiler.
Inevitably the AudioOutputI2S object seems to get initialized first, meaning its DMAChannel ends up using channel 0 with the lowest priority.
It's a protected member of the class so it's inaccessible without breaking the API, meaning I can't use the DMAPriorityOrder() functions to give it higher priority than the display DMA channels.
The DMA engine has individual registers for setting each channel's priority, but you can't touch them once the DMA engine has been initialized without running the risk of triggering a priority conflict error (they are 8-bit registers and every channel must have a unique value).
The DMAChannel library only manages the lowest 16 channels; there are another 16 channels above those, but there is no API method to allocate them. If you use them you run the risk of stomping on someone else's code. Plus the upper 16 channels belong to group 1 while the lower 16 channels are all group 0, and group 1 by default has higher priority while I want my display channels to be lower priority.
I could reverse this by rewriting the DMA_CR register to switch the channel group priorities. But if another DMAChannel is initialized after this, it will reinitialize the DMA_CR register to the default setting (group 1 back to higher priority over group 0). Likewise for reconfiguring DMA_CR to active round-robin mode rather than fixed priority mode.

So in short, I'm stuck. I want my code to be portable and usable with other libraries, but the DMAChannel library isn't flexible enough and will potentially stomp on any workarounds I try to use.
 
You could try creating the AudioOutputI2S object last using new in your setup() code, along with the required AudioConnections to connect it to the rest of your design. I’m assuming here (no code posted…) there’s nothing else that will take on update responsibility.

I know Paul added some code to deal with updates not happening if the design had no object with update responsibility, e.g. with only USB output. I can’t find it in the codebase right now, so you may need to tinker a bit to get things happening in the order needed.

With the stock Audio library you’re OK to create objects at run time, just not to delete them.
 
…found it, it’s in where the AudioMemory is being set up. So, make sure you create your AudioOutputI2S before you call AudioMemory().
 
USB audio output works fine, that's how I tracked the problem down to the audio DMA channel being starved of runtime.
Dynamic object creation doesn't really fix the problem, it just shifts it; in this case my display driver is for a USB device, so in the ideal programming model the driver would only be instantiated when the device gets connected. Then the same issue would reappear.

The real problem is there's no way of properly setting DMA priorities, or alternatively specifying the desired channel to use* (since they effectively have fixed priorities). Even the existing DMAPriorityOrder functions are ineffective since they just swap the TCD pointers without transferring the contents, which wouldn't work with something like AudioOutputI2S since it only initializes the TCD one time during object creation.

(* Being able to specify a desired channel would allow periodic DMA triggering to be used, since that is limited to DMA channels 0-3 along with PIT channels 0-3.)
 
I was thinking in terms of a workaround for this:
The DMAChannels are more-or-less initialized statically, which means their order is up to the whim of the compiler.
Inevitably the AudioOutputI2S object seems to get initialized first, meaning its DMAChannel ends up using channel 0 with the lowest priority.
The mention of Paul’s timer mod to create updates if there’s no object with update responsibility was only to help with potential odd behaviour if you thought you’d created the AudioOutputI2S to pace the audio, when in fact you might not have done so.

Your post #4 appears to contradict #1 - first it was an issue with uncontrolled ordering of static objects (a known C++ issue), then you’ve started talking about things being instantiated at connection time.

Maybe you could allocate some DMA channels, then the AudioOutputI2S, then deallocate the DMA channels again, so the USB uses them? You’d need to look at the DMAchannel code to get an idea of whether that stands a chance of success.

It seems pretty unlikely, at least in the short term, that any changes to the Teensyduino API are going to happen, though you could code up what you want and submit a PR. But TD releases seem to take nearly a year each, and new functionality often takes several cycles before it’s adopted…
 
Trying to influence the dice roll of which DMA channel gets assigned to which object is not a solution. There isn't even a way to know which channel the AudioOutputI2S object has been assigned because it's a protected member.

I'm thinking maybe the simplest solution would be to add a "pre-emptible" bool to the DMAChannel constructor which would default to false (which matches existing behaviour). Channels initialized with that flag set would have preemption enabled and be allocated from 0-15 (group 0), while channels without the flag would have "can preempt" ability and be allocated from 16-31 (group 1). Channels could still be prioritized relative to other channels within the same group, if anyone is in fact using the PriorityOrder functions.
 
As noted above ... yes, your suggestion may be "best", but it's highly unlikely to get into Teensyduino soon, even less likely if you don't put the work in to generate a PR which meets Paul's criteria.

From your original post it sounds as if you have a use case very specific to you, maybe a product you want to market, or at least something you can publish on github; it's unclear if it's a library or an application. You would obviously like your code to stand a chance of still working for future releases of Teensyduino. As you also noted, within the existing API, you're stuck for a robust formal solution at this time. That's your baseline.

If you care to put in the work, you could try my suggested workarounds, or any others that occur to you. Yes, it will probably involve you unprotecting various classes so you can observe the system behaviour for various orders of execution, instantiation and so forth, but you can still re-test your system using a vanilla Teensyduino once you think you have it working.

Should you decide to do that, you will obviously have to look at the DMAChannel code very closely. A cursory inspection suggests that it's eminently possible to choose exactly which DMA channel is used (it's code, not a "dice roll"...), at least at construction of the AudioOutputI2S object, provided you do so using new in setup(). It'll be kludgy and risks not working if the DMAChannel code gets a major overhaul (unlikely, IMHO), but you really only need to re-test for each Teensyduino release.

If none of the above is palatable to you, then I'd have to agree you're right. You're stuck.
 
A solution to the specific problem in the opening post isn't what I created the thread for. I have other DMA-based code that performs table lookups by poking values into DMA TCDs and chaining them together; it's not triggered by external hardware signals so I'm pretty sure it would have the same issue if I tried to use audio at the same time.

We need a bulletproof way to create either a low priority DMA channel that won't block others, or a high priority channel that can't be blocked. At the moment we have neither, at best we can set relative priorities of channels that are created (but not yet initialized!) and accessible by the same piece of code, which is impractical when channel usage is sprinkled throughout libraries.
(The Audio library in particular always allocates a couple of channels regardless of them actually ever getting used, triggered merely by #including the Audio.h header.)

A cursory inspection suggests that it's eminently possible to choose exactly which DMA channel is used (it's code, not a "dice roll"...)
It's not when you have no idea how many static DMA channels there are in total, how many were initialized before you, or if some have been released causing the mapping to be fragmented...

The DMAChannel code as it currently exists is already broken. The T4 code also has at least one bug due to copy-pasting the T3 code. An overhaul (not that I would consider my proposed changes that extensive; handling shared IRQs is really not difficult) would not be a bad thing. At the very least it requires some attention.

Frankly I'm surprised this issue hasn't been discovered before given how common it is for projects to combine DMA driven ILI9XXX displays with the Teensy audio board...
Edit: now that I look around it has come up at least once, with the suggested solution being: using preemptive channels. "Audio can probably tolerate many microseconds of latency" not so much in my case, apparently.
 
Last edited:
Added support for pre-empt/non pre-emptible channels: https://github.com/A-Dunstan/cores/commit/331ad25279fd87351aee8e9ee6b69497339b5b41

The default is the old behaviour (non pre-emptible), so the Audio library and everything else that already uses DMAChannels will work as they did before. But now for my bursty channels, I can use channel.begin(false, true) to get a channel that will be both lower priority and pre-emptible.

Out of interest I timed how long the bursty transfers typically take to run, and they run for 666us at the beginning of each frame to transfer 32KB, then the two channels alternate transfers taking 333us to move 16KB each. So I suspect it's that initial long transfer at the beginning of the frame that is starving the I2S audio. Shrinking the size of transfers isn't an option since that increases the amount of interrupts per frame.
 
Looks good, from a glance at the code. Is there a "simple" scenario that would enable me to do a bit of testing? It's obviously a bit hardware dependent ... I have various audio boards and TFT screens I could roll out ... can SPI transfers to the screen be made to use DMA, e.g. save/restore an area? Thinking aloud here...
 
This example runs a (very stupid) DMA channel to demonstrate:
Code:
#include <DMAChannel.h>
#include <Audio.h>

DMAChannel dma(false);

AudioSynthWaveform waveform1;
AudioOutputI2S audio_out;
AudioConnection pc1(waveform1, 0, audio_out, 0);
AudioConnection pc2(waveform1, 0, audio_out, 1);

uint32_t g_int1;
uint32_t g_int2;

static void dma_isr(void) {
  static int count;
  if (++count == 1000000) {
    count = 0;
    digitalToggleFast(LED_BUILTIN);
  }
  dma.clearInterrupt();
}

void setup() {
  Serial.begin(0);
  while (!Serial);

  AudioMemory(15);
  waveform1.begin(WAVEFORM_SINE);
  waveform1.frequency(440);
  waveform1.amplitude(0.3);

  pinMode(LED_BUILTIN, OUTPUT);
  digitalWriteFast(LED_BUILTIN, LOW);

  // change second parameter here to true to make this channel pre-emptible
  dma.begin(false, false);
  dma.attachInterrupt(dma_isr);
  dma.triggerContinuously();
  dma.interruptAtCompletion();
  dma.sourceBuffer(&g_int1, sizeof(g_int1));
  dma.destinationBuffer(&g_int2, sizeof(g_int2));
  dma.triggerAtCompletionOf(dma);
  delay(3000);
  dma.triggerManual();
}

void loop() {}
If you leave dma.begin(false, false) as it is, that DMA channel will hog all the operation time and the sine wave audio will cut out as soon as the 3 second delay ends. Changing it to dma.begin(false, true) lets the audio continue as expected.
 
Back
Top