Audio Library disable/enable interrupts, why?

Greg-R

Member
I'm working on a 4-output I2S class for the Open Audio Library. I'm trying to find out why some parts of the update() method in the 2-output I2S class are run with interrupts disabled. Here is an example from the 2-output I2S class update() with the disable/enables:

Code:
//now that we have our working memory, proceed with getting the audio data and processing
block_f32 = receiveReadOnly_f32(0); // input 0 = left channel

// Optional scaling for easy volume control.  Leave outputScale==1.0f for default
if (outputScale < 1.0f || outputScale > 1.0f)
    arm_scale_f32(block_f32->data, outputScale, block_f32->data, block_f32->length);
scale_f32_to_i32(block_f32->data, block_f32_scaled->data, audio_block_samples);

//now process the data blocks
__disable_irq();
if (block_left_1st == NULL) {
    block_left_1st = block_f32_scaled;
    block_left_offset = 0;
    __enable_irq();
} else if (block_left_2nd == NULL) {
    block_left_2nd = block_f32_scaled;
    __enable_irq();
} else {
    audio_block_f32_t *tmp = block_left_1st;
    block_left_1st = block_left_2nd;
    block_left_2nd = block_f32_scaled;
    block_left_offset = 0;
    __enable_irq();
    AudioStream_F32::release(tmp);
}

I did a search and found this comment by Chip Audette where he says the disable/enable interrupts are not necessary:

https://forum.pjrc.com/index.php?threads/custom-audio-object-causes-lock-ups.75891/#post-350039

But if I search through either the I16 or F32 libraries, this is often done in the update() methods. Why?

Thank you,
Greg
 
That sort of low level interrupt stuff will be because there is an associated DMA completion interrupt which swaps buffers over.
 
It's the simplest method of ensuring all of the block variables get updated atomically. If interrupts weren't disabled, another routine could run when only some of the variables had been changed.
 
Teensy has priority nested interrupts, and Audio uses 2 of the priority levels. The update() functions run from a low priority interrupt, which is the reason the audio library can process audio automatically while other code and Arduino libraries run. Because it's a low priority, nearly all other libraries needing interrupts can also still work.

Audio also uses a higher priority interrupt to respond to the DMA controller having completed moving data between buffers in memory and the actual audio inputs & outputs. Those DMA event interrupts can interrupt the lower priority update() functions. They do only the very quick work of swapping which memory the DMA controller will access in the near future.

When the lower priority code needs to hand off data, or otherwise set the conditions that the higher priority code will use when it runs, the higher priority interrupt needs to be prevented from running while the variables it will use are being altered, because they would be seen in an inconsistent state if the interrupt happened to run at just that moment.
 
So you need to block the isr() which is run by the DMA in the middle and at the end of the data buffer.
Thank you for the description of the audio system interrupt scheme! That is very helpful.

I'm trying to think of some way of diagramming or modelling this problem. Following the code back and forth between update() and isr(), with all the pointer manipulations, is not easy.

So how do you test or verify this sort of system? I would think that if the isr() ran during a crucial area of update(), the result would be horrendous audio distortion. But then, maybe the state of a particular system never actually gets there?
I removed the disables/enables in my quad output code, and it runs perfectly.
How would I create a system which is more likely to run the isr() during the update?
Or is the answer a collection of test cases?
 
Code protected inside a __disable_irq()/__enable_irq() bracket is called a critical section. Data structures shared with an ISR must be handled with critical sections; it's quite simple: the ISR and the critical sections then get serialized, executing atomically without interfering with each other.

Even if the data structure is a single memory location, it may need a critical section (if, for instance, you read it and then write a new value based on what you read).

Some processors require memory barrier or cache-maintenance calls as well, if the DMA engine and the processor see memory through different caching hardware.
 
So how do you test or verify this sort of system?

Maybe you're thinking of unit testing or other automated software tests? That sort of testing can be really valuable for higher level code, but it's not always so useful for low-level code, especially code that is essentially a device driver.

Even with higher level code, certain types of problems, like security vulnerabilities due to buffer overflow, can be notoriously difficult to find by automated testing alone.

I removed the disables/enables in my quad output code, and it runs perfectly.

You're playing with fire!

This is analogous to deleting seemingly unnecessary buffer size checking and then not observing any crashes during normal operation with ordinary data. Or even with unusual data. Modern software practice has a long legacy of buffer overflow bugs that went unnoticed for many years even in very widely used programs, even after extensive testing.

The extreme difficulty of race conditions in low level driver code makes those thorny buffer overflow problems seem simple. This is an incredibly hard problem.

Like how so much high level software today is written in memory safe languages to prevent buffer overflow problems, the usual approach to this sort of low level code is a conservative design where you either use a specific head-tail queue that's known to be safe, or you always disable interrupts while accessing the shared data.
 
When the lower priority code needs to hand off data or otherwise set the conditions that higher priority code will use when it runs, the higher priority interrupt needs to be prevented from running while the variables it will use are being altered and would be seen in an inconsistent state if it happened to run at just that moment.
Why isn't the higher priority IRQ simply masked instead of completely disabling all interrupts?
 
Removing the interrupt disables was a quick test to see if it would fail. Next, I am going to reduce the CPU frequency in steps to see if I can make it fail, and if it does fail, then add the interrupt disables back in to see if that fixes the problem.

One thing about this particular case seems like it should make it less dangerous. The calls to the output object's isr() from the DMA are synchronous, or at least semi-synchronous. It would seem they have to be, because the I2S needs to be fed data continuously in order not to have gaps in the output audio. So the interrupts are not like an asynchronous random button push; they can't be completely random with respect to the output data stream. It seems like they must occur within a restricted time interval.

So what causes the update() function to run? You mentioned it's from a lower priority interrupt, but it has to run on a regular schedule to allow for a continual flow of data. How is that timed?

The question I have now is why the update() function can't do all of the interleaving. Then the interrupts from the DMA which run the isr() (in the output object) wouldn't be necessary. The DMA would still be activated by the I2S.
There is an interesting bit of code in the isr():

Code:
       if (AudioOutputI2SQuad_F32::update_responsibility) AudioStream_F32::update_all();

What does that do?
 
IIRC, updates are triggered by the first input or output class that initializes; such classes typically have synchronous I2S interrupts. Every buffer-full of samples, I think, the class triggers a software interrupt on leaving its ISR. That software interrupt then calls all the update() methods in the network of audio objects.
 
Yup, that's how it works.

Newer versions of the core library have a fallback to IntervalTimer if no inputs or outputs took responsibility for calling update_all. That stuff is in AudioStream.cpp in the core library.


The question I have now is why can't the update() function do all of the interleaving.

Please understand I can't answer every question for lack of time, but will take just a moment on this one.

Maybe the interleaving work could be done in the lower priority update() rather than the high priority DMA event interrupt. How beneficial that would be is an open question.

Again I want to emphasize that much like use of memory safe languages to prevent buffer overflow bugs, the common practice with data sharing to interrupts is an overly conservative approach. Race conditions are virtually impossible to fully test. They tend to be extremely hard to reproduce after they do strike, which makes finding and fixing really, really difficult. An overly conservative approach that prevents those problems is usually considered a good trade-off, even if it costs slightly more CPU time, memory, latency for other interrupts, and so on.

Asking these questions is fine. Experimenting with stuff is fine. Maybe you will find a better way?
 
"Extremely hard to reproduce" and "conservative approach that prevents those problems".
That's what I was looking for, the specific justification for disabling/enabling interrupts.

So the interrupt disables are now well understood and will stay in the F32 output object. But I'm still going to proceed with more experiments later after the next release of T41 radio code (the reason why I developed the F32 quad output class).

In another thread you mentioned you are planning to expand the Teensy documentation. I vote for this:

https://www.pjrc.com/teensy/td_libs_AudioProcessorUsage.html

At the bottom, "Understanding Audio Library Scheduling".
Thank you for your comments and answers, cool stuff!

Greg
 
I have a question that I think is answered by this thread, so I’ll post my question and answer here and see if anyone can confirm it.

Q: Should the AudioEffectWaveShaper’s shape() method use __disable_irq() / __enable_irq() ?
A: Yes, otherwise it’s possible for this->waveshape to be in an inconsistent state (partially updated) when update() runs.

Specifically I’m looking at adding the protection around these lines in the shape() method, like so:
C++:
void AudioEffectWaveshaper::shape(float* waveshape, int length)
{
  ...
  __disable_irq();
  if(this->waveshape) {
    delete [] this->waveshape;
  }
  this->waveshape = new int16_t[length];
  for(int i = 0; i < length; i++) {
    this->waveshape[i] = 32767 * waveshape[i];
  }
  __enable_irq();
  ...
}

I’m hoping this might solve an occasional pop that I get in the audio output that’s really hard to reproduce. Or at least rule out another possible cause. Does this sound sensible?

(If it does, I think there might be a similarly unprotected update of a couple of variables here in AudioFilterStateVariable::frequency)
 
In general, any call to an audio object method is fairly likely to cause a more or less audible “pop”, even if the underlying code is optimally written (which as you note, this isn’t). Say your incoming audio waveform has reached an amplitude of 0.5; if your old waveshape[] maps that to 0.75, and the new maps it to 0.25, then … pop.

This specific code can be rewritten without resorting to the sledgehammer that is disabling interrupts:
C++:
void AudioEffectWaveshaper::shape(float* waveshape, int length)
{
  ...
  int16_t* tmp_waveshape = this->waveshape;
  this->waveshape = nullptr; // update() will now transmit silence
  // if(this->waveshape) { // no need, C++ allows delete of nullptr
  delete [] tmp_waveshape;
 
  tmp_waveshape = new int16_t[length];
  for(int i = 0; i < length; i++) {
    tmp_waveshape[i] = 32767 * waveshape[i];
  }
  // ***** this code needs protecting too! *****
  // set lerpshift to the number of bits to shift while interpolating
  // to cover the entire waveshape over a uint16_t input range
  int index = length - 1;
  lerpshift = 16;
  while (index >>= 1) --lerpshift;
    
  this->waveshape = tmp_waveshape; // update() can now run OK
  ...
}
Of course this now “risks” getting a 2.9ms silence, because an update() may happen in the middle of the waveshape change. You could use AudioNoInterrupts() in your sketch to prevent that, if it mattered.

For the filter code, yes, interrupts probably do need to be off as there’s no atomic way of updating two variables. But the expensive floating point calculations should be done outside that critical section, with only a simple copy of the results inside it.
 
Why not set up the new array before deleting the old one, so the critical moment boils down to simply assigning a new pointer to this->waveshape.
Because heap fragmentation. But it is an alternative approach, though the calculation of lerpshift still needs to be in the same critical section, because if your new shape table is smaller than the old one, and you get an update() with a small table and a big lerpshift, Bad Things could happen. So:
  1. create new array; compute new lerpshift
  2. assign nullptr to this->waveshape
  3. assign new lerpshift
  4. assign new array to this->waveshape
In this instance the null pointer acts to guard the critical section, because of the object's architecture - that's not always possible, and definitely needs proper comments in the code so future editors can appreciate the reasoning behind the ordering of the statements.
 
If that is really a concern, put waveshape and lerpshift into a struct and dynamically allocate them together, using a single pointer to access both values.
Heap fragmentation is really not a concern on the T4.x boards.
 
If that is really a concern, put waveshape and lerpshift into a struct and dynamically allocate them together, using a single pointer to access both values.
That would work, too.

Heap fragmentation is really not a concern on the T4.x boards.
I'm probably just paranoid about fragmentation, in most Teensy scenarios. But it's always possible, and impossible to prove that it can't happen, which is why you're only allowed statically allocated memory in medical software (which is where I come from, hence the paranoia!). In this case, I'd rather risk a rare dropout in the audio than deliberately write code which could contribute to fragmentation.

Curious to know why you think it's "really not a concern on the T4.x boards", though? There's nothing special about T4.x or its libraries which prevents a sufficiently pathological sketch fragmenting the heap to the point where a malloc() fails ... and there's probably a ton of places where that's not checked in libraries. In fact, the new in the above code can't throw an exception, so there's a crash waiting to happen; using malloc() and free() would actually work better there.
 
Curious to know why you think it's "really not a concern on the T4.x boards", though?
Because they have enough heap space that, unless you're specifically trying to cause fragmentation (i.e. constantly allocating differently sized blocks so that new space has to be used instead of reusing freed space), it's not going to happen. dlmalloc (included in newlib) is very tried and tested and has no trouble "binning" allocations of specific sizes.
There's nothing special about T4.x or its libraries which prevents a sufficiently pathological sketch fragmenting the heap to the point where a malloc() fails ... and there's probably a ton of places where that's not checked in libraries. In fact, the new in the above code can't throw an exception, so there's a crash waiting to happen; using malloc() and free() would actually work better there.
I don't really know what you mean here: exceptions are disabled, so new and malloc are basically the same thing, and checking for NULL being returned should be done for both.
(There's a bit more subtlety to this with regards to how classes should safely handle allocating additional memory in their constructors, but that's not relevant here, since only an array of native types is being allocated.)
 
How about using std::vector? I have used vector successfully (no detected failures), thanks to the nice features that regular arrays do not have. I've seen lots of notes of caution about using standard library stuff in embedded applications, but so far, so good with Teensy 4.1. Risky or not?
 
Heap allocation is never guaranteed to succeed. Heap fragmentation is a real thing, and dlmalloc is not a magical cure; I have seen it fragment memory too. So as long as you're doing a hobby project, or a music app that will just stop playing and can simply be restarted once the allocator fails, you can do this. But for critical systems that lives depend upon, you can't, and not only medical ones: dynamic memory allocation is forbidden in many other critical applications (mission-critical embedded systems, aerospace) where you can't afford the risk.
In my own embedded code I use static allocation whenever I can. (I don't use the Audio library, except for just part of the I2S output code.)
 
What @tomas said. And std::vector will be using new under the hood, so all you’re doing is losing control while gaining convenience. You can guess, but not know, when memory is going to be allocated and freed; if it’s in your app, in code you wrote, then you have some idea.

This is why I’d advocate against using dynamic memory wherever possible inside libraries - the user is likely to be unaware of it. Even if it’s documented, who reads documentation?!

I once wrote code for a medical device that started off being based on FreeRTOS, while negotiations were going on to license SafeRTOS, its pre-certified sibling. The latter does not have dynamic allocation…

All that being said, for many applications the relatively vast heap of Teensy 4.x combined with its modern heap implementation does seem to work stably, even when abused horribly. My virtual modular synth (which I really must get back to…) not only dynamically allocates audio objects at design time, but then creates and destroys them as notes start and end. And yes, it does use std::vector :)
 