Issues with DMAChannels

Just done a quick test, and all appears to be well for my use case :) I've been able to remove the machinations with dma_channel_allocated_mask, and audio and async display updates continue to co-exist peacefully.
 
Time for some hackery...
I wanted my code (that uses pre-emptible channels) to work with either the "new" DMA begin function (that has two arguments) or the original, so I made some templates:
Code:
template <typename C>
void DMABEGIN(C& dma, void (DMAChannel::*)(bool, bool)) {
  dma.begin(true, true);
}
template <typename C>
void DMABEGIN(C& dma, void (DMAChannel::*)(bool)) {
  dma.begin(true);
}
Then instead of calling begin() directly I use this:
Code:
DMAChannel ch(false);
...
DMABEGIN(ch, &DMAChannel::begin);
and the compiler magically works out whether to invoke the one or two parameter version. Templated functions means an error won't be thrown when the two-parameter version of begin() doesn't exist because that template won't be instantiated.
 
I think I've discovered a very big oversight in the audio library that is contributing to the "I2S loves to underrun when DMA is delayed" issue.
This is where the transmit watermark is set. Here's a quote from the reference manual:
The transmit FIFO request flag sets when the number of entries in any of the enabled
transmit FIFOs is less than or equal to the transmit FIFO watermark configuration and
clears when the number of entries in each enabled transmit FIFO is greater than the
transmit FIFO watermark configuration.
See the problem here? The watermark is being set at the minimum level - so I2S isn't requesting DMA until it's practically empty. In fact if you check the I2S-based SPDIF output modules, they use a value of 0 meaning they're basically running on empty the entire time.

The transmit FIFO watermark should be high so that the TX FIFO is kept as full as possible.
 
The transmit FIFO watermark should be high so that the TX FIFO is kept as full as possible.
Wouldn't this increase latency somehow? Some project needs minimum latency, others might need more tolerance to DMA underruns, so maybe the FIFO watermark should be somewhat configurable, preferably?

Marc
 
The latency of 1 audio block is already there: data is copied from the audioblock into a temporary buffer, which then gets DMA'd to I2S as it requests it.

What it looks like to me is that code from Teensy 3.x was blindly reused for Teensy 4.x despite the watermark setting operating differently.
 
Another fun bug discovered:
- create a static DMAChannel object, using the "DMAChannel(bool allocate)" method with allocate set to false. No initialization is performed, the members of the class will be set to zero (static memory is cleared at device startup).
- call release() on the object. DMA Channel 0 gets nuked despite not being owned by this DMAChannel instance.

The same thing can be triggered with dynamically allocated DMAChannel instances, except in that case the channel that gets nuked is random since it depends on the previous contents of the allocated memory. Note that delete'ing an instance causes the destructor to automatically call release().

The fix: set the default value for the "channel" class member to DMA_NUM_CHANNELS.
 
What it looks like to me is that code from Teensy 3.x was blindly reused for Teensy 4.x despite the watermark setting operating differently.
I kept meaning to look at this ... and now I have ... and I'm not sure I'm seeing what you're seeing.
  • both seem to use a FIFO watermark level of 1
  • Teensy 3.x have an 8-word FIFO, 4.x have 32
  • both the reference manuals seem to say that the FRF (FIFO request flag) is set when the level drops below the configured watermark
    • i.MX RT1060 rev 3 section 38.5.1.4.4
    • K66 rev 2 section 61.4.1
I agree a setting of 1 is bonkers, of course. It would seem more sensible to request when half-empty, and set the DMA channel's NBYTES to half a FIFO-full (4 words for 3.x, 16 for 4.x), and scale back the major loop count to match. I could be persuaded that higher watermarks with smaller transfers would be even better ... for some value of "better".
 
I agree a setting of 1 is bonkers, of course. It would seem more sensible to request when half-empty, and set the DMA channel's NBYTES to half a FIFO-full (4 words for 3.x, 16 for 4.x), and scale back the major loop count to match. I could be persuaded that higher watermarks with smaller transfers would be even better ... for some value of "better".
A value of 1 means there's no wiggle room at all - once a sample is sent, DMA must deliver the next sample within 1/f else an (unrecoverable) underrun will occur. This easily happens if DMA is busy doing anything at all besides waiting around for I2S to make a request. Even with pre-emption the latency to switch channels can be too long.
A value of 0 being used for SPDIF is especially nonsense, when each sample is 64 bits.

I wouldn't agree with changing NBYTES, that could cause other priority issues. Only the watermark needs to be increased so that the FIFO functions as a proper buffer.
 
Last edited:
Sure, I got that. But your post #35 suggests there’s a difference between the K66 and RT1062 watermarks, and I’m just not seeing that. It seems to have been done wrong for both…
 
It was based on the code comment in the T3.x configuration block:
Code:
I2S0_TCR1 = I2S_TCR1_TFW(1);  // watermark at half fifo size
If that is incorrect then yes, T3.x is broken too.
 
I've added this to my list of issues to investigate.

At this moment I'm testing a T4 bootloader chip update, so I can't look at this or much of anything else until next week.

While a bit embarrassing to admit, my gut feeling is this FIFO threshold setting goes all the way back to the earliest audio library code when Teensy 3.0 was the newest model. The MK20DX128 chip used on Teensy 3.0 had only 4 word FIFO.

Edit - note to future me, we might have I2S support for Teensy LC, even if few people (or anyone) really uses it. FIFO on Teensy LC is only 1 word. Testing should also look at whether I2S works at all on Teensy LC, and if it does, try not to break it.
 
Last edited:
At this moment I'm testing a T4 bootloader chip update, so I can't look at this or much of anything else until next week.
About that, when I was testing auto-suspending flash writing I quite often ended up getting the flash stuck in a state that the bootloader couldn't seem to handle - likely a suspended program operation was active when the bootloader took control, and more writing is not possible until the suspended op is cleared. It's definitely not a super-important case to handle but until a physical power cycle it does tend to make the Teensy act like it's bricked.
 
Reporting in here on some progress I've made on fixing FIFO lengths, and knock-on "improvements".

I started in on this, and found I was doing a lot of copy/paste, which really bugged me. So since it seems likely we'll need to do some pretty thorough testing with real hardware, I bit the bullet and did a thing which IMHO should have been done ages ago, i.e. make an SAI base class. It's still a work in progress, but should eventually allow getting rid of a huge bunch of extra near-duplicates of code for using I²S, TDM, etc. etc. on SAI2 as well as SAI1.

The work-in-progress can be found at https://github.com/h4yn0nnym0u5e/Audio/tree/feature/longer-FIFOs - I've done some testing with I2S, I2S2 and TDM, but it's early days yet.
 
I've just re-read all 46 message and tried to make a list of all the unresolved issues.

Audio library issues:
  • output DMA underrun handled badly, audio stops (msg #11 test case)
  • allocates some DMA channels which are never used (msg #8)
  • under-utilizes FIFO on Teensy 4
Core library DMAChannel issues or limitations:
  • public API for relative priority levels only works before DMA initialized
  • release() on uninitialized channel overwrites channel 0 (msg #37)
  • public API lacks explicit DMA priority levels
  • public API lacks access to preemption
Did I miss anything important? Other issues I saw mentioned have been fixed or are addressed by the new 1.09 bootloader.

Also mentioned in msg #21 is the need for rebuilding the VGA R2R ladder hardware. Is it this hardware described at this VGA_t4 library? I'd be happy to make a PCB and send you one. :)
 
That sounds about right to me, though @jmarsh will have the definitive opinion. I believe he’s done all the necessary work on the DMAChannel side, just a PR needed.

On the audio side, the stuff I’ve been doing has got way out of hand! I’ll try to make a branch which solely fixes the FIFO utilisation issue for any Teensy - I found there are depths of 32, 8 or 4, depending on the model. The spurious allocation of DMA channels is fixed in my PR#503.
 
@jmarsh, since the DMA ISR has to go through a dispatcher anyway because of the shared vectors, do you think it would be sensible to add an optional “context” parameter to the attached user ISR? Having just a single void* would make so much difference to e.g. the SAI-independent audio code…
 
I've just re-read all 46 message and tried to make a list of all the unresolved issues.

Audio library issues:
  • output DMA underrun handled badly, audio stops (msg #11 test case)
  • allocates some DMA channels which are never used (msg #8)
  • under-utilizes FIFO on Teensy 4
Core library DMAChannel issues or limitations:
  • public API for relative priority levels only works before DMA initialized
  • release() on uninitialized channel overwrites channel 0 (msg #37)
  • public API lacks explicit DMA priority levels
  • public API lacks access to preemption
Did I miss anything important? Other issues I saw mentioned have been fixed or are addressed by the new 1.09 bootloader.
Additional core issues off the top of my head:
- library only uses 16 out of 32 channels
- transferSize() thinks 16 bytes is a valid data size when it should be 8 (hardware implementation difference between Teensy3 vs Teensy4). There could be additional source/destination functions created to handle a uint64_t argument, but that's possibly not the best idea since it would cause API deviation between T3/T4.

Also mentioned in msg #21 is the need for rebuilding the VGA R2R ladder hardware. Is it this hardware described at this VGA_t4 library? I'd be happy to make a PCB and send you one. :)
The R2R my FlexioVGA code uses is only for 16 colors (4 bpp) but still manages to use 14 resistors:

Code:
/* R2R ladder:
 *
 * GROUND <------------- 536R ----*---- 270R ---*-----------> VGA PIN: R=1/G=2/B=3
 *                                |             |
 * INTENSITY (13) <---536R -------/             |
 *                                              |
 * COLOR: R=11/G=12/B=10  <-----536R------------/
 *
 * (duplicate above three times each for R, G and B - intensity input is shared between all)
 *
 * VSYNC (34) <---------------68R---------------------------> VGA PIN 14
 *
 * HSYNC (35) <---------------68R---------------------------> VGA PIN 13
 */

@jmarsh, since the DMA ISR has to go through a dispatcher anyway because of the shared vectors, do you think it would be sensible to add an optional “context” parameter to the attached user ISR? Having just a single void* would make so much difference to e.g. the SAI-independent audio code…
I'm not very keen on that idea simply because it's not consistent with any other ISRs. If it was going to be changed in this way it would make more sense to me to move from using a simple function pointer to a function object to allow lambdas to be used (which makes this sort of thing sooo much easier to write).
 
Back
Top