Issues with DMAChannels

I'm not very keen on that idea simply because it's not consistent with any other ISRs. If it was going to be changed in this way it would make more sense to me to move from using a simple function pointer to a function object to allow lambdas to be used (which makes this sort of thing sooo much easier to write).
I hear you - it absolutely isn't consistent. But the old API would have to be left in place anyway, and it's "simply" another overload of the attachInterrupt() method, I think. Unless I've missed something

The option to use lambdas would be fine, and as you say way more flexible. I seem to recall there was a lot of to-ing and fro-ing around doing this for the IntervalTimer library, so actually that's one existing example where the ISR / callback isn't a simple static vector. I don't recall the details, just that it was a bit more complex to design than was immediately apparent.

I'm sure you can appreciate the issues arising from dealing with two SAI peripherals, each of which can have two DMA interrupts. The same applies to the LPSPI, except there are even more of those...
 
Another option would be that instead of an array of isr_t, dma_handlers[] would be an array of DMAChannel* that gets auto-populated when a channel is assigned to a base class. The actual ISR would be a virtual function of DMAChannel and customizing it would require extending the class.

(This would fix what I believe is a bug in my current code: manually changing priorities can reassign channels without fixing the isr table...)
 
Last edited:
List of issues... trying to stay focused.

Audio library issues:
  • output DMA underrun handled badly, audio stops (msg #11 test case)
  • under-utilizes FIFO on Teensy 4
Core library DMAChannel issues or limitations:
  • public API for relative priority levels only works before DMA initialized
  • release() on uninitialized channel overwrites channel 0 (msg #37)
  • transferSize() needs update, 8 vs 16 bytes on Teensy 4.x vs 3.x (msg #50)
  • public API lacks explicit DMA priority levels
  • public API lacks access to preemption
Not on this list is supporting more than 16 channels, because of the interrupt complexities already mentioned (NXP really didn't design the hardware for more than 16 interrupts). Maybe in the distant future, but for the short term I really want to focus on these important issues for the base 16 channels.
 
I believe that you can't have the DMA pre-emption / priority without expanding to 32 channels. However, as far as I know @jmarsh has already done all the required work there, so the only risk is if he's messed up in some fashion. I don't believe that's massively likely, the only issues I've encountered have been due to design choices, not erroneous code. I have great respect for his design and coding capabilities :)

I've been using the changes for a while now, and @KenHahn and @defragster have also had some experience with it, as it's pretty much required to get simultaneous good graphics and audio performance on larger screens like the ST7796. All of which is going out of focus again, I know, but this really is a vital building block for future improvements. Perhaps a step slightly too far for Teensyduino 1.61 - I wouldn't argue with that. But definitely should be one of the first things in the 1.62 beta, IMHO.
 
Specifically the work that I've done is based around being able to allocate a channel as pre-emptible (assigned from the upper 16 channels) or non-pre-emptible (lower 16 channels). Without both of the hardcoded groups it can't work. We did add code to try avoiding allocating channels that share IRQs - if necessary that could become an allocation fail state and we ditch the IRQ sharing, but I don't really see what my existing code does (checks which DMA channel triggered the IRQ) as a problem.
 
Right now, I'm most concerned about this "entire Audio library stops functioning" problem (msg #1):

The problem: when the bursty DMA channels activate, they are given priority over the audio DMA channel. This lasts long enough that the audio output underruns (which the audio library doesn't attempt to handle) and it shuts off completely. The entire Audio library stops functioning due to the updates not being triggered.

Before API improvements for priority levels and preemption, and before better I2S FIFO utilization, if there are cases where bursty DMA hogging the DMA controller causes the audio library to stop functioning (not merely temporarily but "completely"), I want to address that first.

At first I believed msg #11 was code to reproduce this. But unless I've misunderstood, that code isn't "bursty" DMA. It appears to completely hog 100% DMA utilization forever. Yes, preemption would offer a solution to this problem. But that's not the issue I want to investigate right now...

I want to investigate the reported problem of "bursty" DMA causing audio to shut off completely. A lengthy burst of DMA is expected to temporarily stop audio, if the burst runs at higher priority. But after the burst is over, audio should continue. If the audio library isn't continuing after a burst of insufficient DMA service, fixing that is my top priority. I feel strongly this needs to be fixed first, before other work which would lessen the problem.

I'm also under time pressure on a 1.61 release, because we updated the bootloader for compatibility with more flash memory chips. The new bootloader is of course backwards compatibity to 1.60 and earlier, but only for standard mode. Lockable Teensy in secure mode needs an update to 1.61 to create compatible .ehex files.

If possible, I really want to get a fix into 1.61 for any case where the audio library doesn't resume running after a temporary period of DMA unavailable.

Is there code or known way to reproduce the problem of "entire Audio library stops functioning" with other "bursty" (not 100% usage) DMA?

Or if I've misunderstood the problem (maybe I've misunderstood words like "stops functioning" and "completely"), please explain if there is any reason to believe there is a defect in the audio library before I focus on DMAChannel. I want to be certain audio is always resuming after other temporary bursty access relinquishes the DMA controller.
 
If possible, I really want to get a fix into 1.61 for any case where the audio library doesn't resume running after a temporary period of DMA unavailable.
The easy fix is to simply set FCONT in SAI_TCR4, so that any I2S output will continue operation after a FIFO underrun rather than halting with an unhandled error pending. But it will stutter like crazy and not really address any of the root causes - underruns needs to be avoided, not just made acceptable.

The example code demonstrates a worst case scenario. It would still perform the same if the DMA channel had a large block size and was started using a periodic timer.
 
The easy fix is to simply set FCONT in SAI_TCR4,

Been working on this today. The solution isn't that simple. FCONT just controls whether it stops at FIFO empty versus FIFO underrun. Either way, output stops and to get output restarted one of the flag bits in I2S1_TCSR has be be cleared (write 1 to clear).

Turns out we didn't have defines for those bits. I've added them in the core library.


I added an interrupt to clear the FIFO underrun flag.


However, I'm not 100% sure this is the right way to handle this situation. My concern is a DMA channel is hogging all the DMA bandwidth and that results in an I2S FIFO underrun, the interrupt could run before DMA is really available, putting us into a tight loop of retriggering the underrun interrupt and (maybe) hogging the CPU too. I set the NVIC priority as low as possible to at least not block other interurpts, but still doesn't feel like a great solution.

Also really needs testing with some of the troublesome display DMA. Again, the goal isn't flawless audio. The goal here is just to have audio output recover when DMA bandwidth becomes available. I want to make sure this is 100% fixed before adjusting the FIFO watermark and improving DMA APIs for priority and preemption. Those will lessen and if audio is given priority completely prevent this problem, but if audio doesn't get highest priority (like when using OctoWS2811 which is really timing sensitive), I want be sure we have this underrun handling correct before working on that other stuff.
 
If the underrun ISR were to empty the receive FIFO and fill the transmit FIFO with zeroes, in addition to clearing the error bit, then the next error would occur after 32 samples or about 726μs. That should be enough to avoid CPU hogging, at least on a Teensy 4.x.

Doing that would mean knowing which Teensy you’re compiling for, so you don’t overrun the FIFOs … at which point you may as well fix the watermark setting too, as 95% of the work has been done, and you’re editing that file anyway.

The whole thing then needs to be done in all the SAI based audio files, because everything’s been copied, pasted and hacked about over the years.

Or you could decide that everyone’s managed to live with this for ages, and just get 1.61 (which is urgently needed to support more Flash parts) out without it. Then do a separate 1.62 on a slightly more leisurely timescale to address this properly.

If that’s the route chosen I’ll happily re-start my efforts to make a single SAI base class with everything else derived from it, which makes this fix a lot easier as it’s only needed in one place. But I will need help with testing, as I don’t have a raft of Teensys already connected to hardware so I can check I2S, I2S2, I2Squad, I2Shex, I2Soct, TDM, TDM2, (all in and out), PT8211, PT8211_2, PDM and PDM2. If that’s all of them…
 
Been working on this today. The solution isn't that simple. FCONT just controls whether it stops at FIFO empty versus FIFO underrun. Either way, output stops and to get output restarted one of the flag bits in I2S1_TCSR has be be cleared (write 1 to clear).
When I tested it this was not the case. SAI transmits zeroes when the FIFO underruns for as long as the warning flag stays active, when DMA eventually fills the transmit FIFO the warning flag automatically clears. So the audio continues without any CPU intervention.
The warning flag in TCSR cannot be cleared manually, it's read-only.
 
really needs testing with some of the troublesome display DMA
I've had a go at knocking together a piece of test code. For some reason it doesn't actually interfere with audio, but I post it here in case anyone can see where I've gone wrong. In theory the async display updates which take over 21ms ought to hog DMA enough to stop audio; I've also bumped the SPI speed to 40MHz, to no effect apart from speeding up the screen updates.

The code spits out various metrics so you don't necessarily need the actual hardware to know if things are working.
C++:
/*
 * Test for Audio DMA underrun recovery when
 * starved by display updates.
 *
 * Tested on Teensy 4.1, default settings
 */
#include <ST7735_t3.h>
#include <Audio.h>

//=========================================================

// GUItool: begin automatically generated code
AudioSynthWaveform       wav1;           //xy=201,155
AudioSynthWaveform       wav2;           //xy=227,195
AudioRecordQueue         queue;         //xy=417,91
AudioOutputI2S           i2sOut;         //xy=479,167

AudioConnection          patchCord1(wav1, 0, i2sOut, 0);
AudioConnection          patchCord2(wav1, queue);
AudioConnection          patchCord3(wav2, 0, i2sOut, 1);

AudioControlSGTL5000     audioShield;    //xy=472,250
// GUItool: end automatically generated code

//---------------------------------------------------------
void randomWav(AudioSynthWaveform& wav)
{
  wav.begin(0.5f,110.0f*random(1,7),WAVEFORM_SINE);
}

//=========================================================
// Change pin numbers to suit your hardware!
//                        CS DC RST
ST7735_t3 tft = ST7735_t3{ 9,10,22};
#define TFT_BL 4 // backlight pin
//---------------------------------------------------------
static elapsedMicros euTFT;
uint32_t lastTFT;
void randomRect(void)
{
  static elapsedMillis em = 0; // allow a gap between screen updates
  static bool updating = false;
 
  if (!tft.asyncUpdateActive())
  {
    if (updating)
    {
      updating = false;
      lastTFT = euTFT;
    }

    // stock library takes 21.3ms to update
    // 128x160 at 16MHz SPI speed, so...
    if (em > 30) // ...leave a gap for audio to recover
    {
      em = 0;
      euTFT = 0;
      updating = true;
      
      // choose new rectangle dimensions
      int16_t x=999,y=999,w,h;
      w = random(tft.width() / 2);
      h = random(tft.height() / 2);
      while (x+w >= tft.width())
        x = random(tft.width());
      while (y+h >= tft.height())
        y = random(tft.height());
 
      // write to frame buffer
      tft.fillRect(x,y,w,h,random(65536));   
 
      // update screen
      tft.updateScreenAsync();
    }
  }
}


//=========================================================
void setup()
{
  AudioMemory(20);
  audioShield.enable();
  audioShield.volume(0.05f); // low volume

  queue.begin(); // start check of audio stalling

  // standard TFT display setup
  tft.initR(INITR_BLACKTAB); // ST7735
  //tft.init(); // ST7789
  pinMode(TFT_BL,OUTPUT);
  digitalWriteFast(TFT_BL,HIGH);
  tft.fillScreen(0);
  tft.setRotation(1);
  tft.print("Hello world");

  // now prepare to use async updates
  tft.useFrameBuffer(true);
}

//=========================================================
elapsedMillis audioTimer;
int count, audioCount;
void loop()
{
  if (audioTimer >= 290) // roughly 100 audio updates
  {
    audioTimer = 0;
    if (random(2))
      randomWav(wav2);
    else     
      randomWav(wav1);

    Serial.printf("Count: %d; audio updates: %d; last TFT update took %uus\n", count, audioCount, lastTFT);
    count = 0; 
    audioCount = 0;   
  }

  // count audio updates
  if (queue.available())
  {
    audioCount++;
    queue.clear();   
  }
  randomRect();
  count++;
}
 
Here's my attempt at a new test case. It creates DMA transfer that repeatedly copies the same byte many times in a single minor loop. You can call dma_hog_bandwidth(ms) with the number of milliseconds you want it to consume all DMA bandwidth.

C++:
#include <DMAChannel.h>
#include <Audio.h>

DMAChannel dma;

AudioSynthWaveform waveform1;
AudioOutputI2S audio_out;
AudioConnection pc1(waveform1, 0, audio_out, 0);
AudioConnection pc2(waveform1, 0, audio_out, 1);
AudioControlSGTL5000 sgtl5000_1;

uint8_t src, dest;

static void dma_complete_isr(void) {
  digitalWriteFast(2, LOW);
  dma.clearInterrupt();
}

void dma_hog_bandwidth(float milliseconds) {
  dma.TCD->SADDR = &src;
  dma.TCD->SOFF = 0;
  dma.TCD->ATTR = DMA_TCD_ATTR_SSIZE(0) | DMA_TCD_ATTR_DSIZE(0);
  dma.TCD->NBYTES_MLNO = (uint32_t)(milliseconds * 18750.0);
  dma.TCD->SLAST = 0;
  dma.TCD->DADDR = &dest;
  dma.TCD->DOFF = 0;
  dma.TCD->CITER_ELINKNO = 1;
  dma.TCD->DLASTSGA = 0;
  dma.TCD->BITER_ELINKNO = 1;
  dma.TCD->CSR = DMA_TCD_CSR_INTMAJOR;
  dma.attachInterrupt(dma_complete_isr);
  digitalWriteFast(2, HIGH);
  dma.triggerManual();
}

void setup() {
  dma.begin();
  pinMode(2, OUTPUT); // pin 2 high during DMA
  digitalWriteFast(2, LOW);
  Serial.begin(0);
  //while (!Serial) ; // wait for serial monitor

  AudioMemory(15);
  sgtl5000_1.enable();
  sgtl5000_1.volume(0.8);
  waveform1.begin(WAVEFORM_SINE);
  waveform1.frequency(110);
  waveform1.amplitude(1.0);
}

void loop() {
  dma_hog_bandwidth(50);
  delay(200);
}

Here's how my scope sees it interfering with the audio output.

file.png
 
I committed a change to use most of the FIFO for transmit and about half for receive.


At least for transmitting, the new test case make it easy to experiment with how much DMA disruption the FIFO allows. Setting the watermark at 28 lets it maintain a smooth sine wave even with 0.33 ms DMA hogging.

file.png
 
Back
Top