USB interface for multi channel outputs, not just stereo

Not sure why all the complication, all you have to do is set up an IntervalTimer for a frequency of 128 / 44100 and measure when it actually fires. If it's off by any amount (which it will be due to insufficient resolution) the update drift will accumulate and eventually cause an over or underrun.
 
No it won’t. There are mechanisms in place to feed back the drift amount to the host, so it accommodates the mismatch in clock speeds. This is what Paul’s original code failed to do, and @alex6679 has managed to accomplish.

Any two clocks are bound to disagree by some amount, the resolution available is pretty much irrelevant. My “complicated” investigation was merely intended to demonstrate (to my own satisfaction, at least) that in the face of a well-controlled disagreement, the old USB audio code fails, and the new code works.
 
No it won’t. There are mechanisms in place to feed back the drift amount to the host, so it accommodates the mismatch in clock speeds. This is what Paul’s original code failed to do, and @alex6679 has managed to accomplish.
This entire time, I have been talking about the original code. Because it is a problem that is easily mitigated by not relying on the undocumented IntervalTimer hack and following the instructions in the audio tool about having an unused I2S object.

TL,DR: it shouldn't be documented because it's a hacky fail-safe option that ensures something will at least be output, even if it's a glitchy mess full of dropouts. Better to follow the existing documentation and include an I2S object so a PLL gets used for timing instead.
 
Last edited:
This entire time, I have been talking about the original code. Because it is a problem that is easily mitigated by not relying on the undocumented IntervalTimer hack and following the instructions in the audio tool about having an unused I2S object.

TL,DR: it shouldn't be documented because it's a hacky fail-safe option that ensures something will at least be output, even if it's a glitchy mess full of dropouts. Better to follow the existing documentation and include an I2S object so a PLL gets used for timing instead.
Well, knowing how the original code works is a benchmark, though any deep discussion might be better reserved for a thread of its own, rather than a thread that's specifically addressing improvements...

I agree, the IntervalTimer thing is a nasty hack; I thought I'd suggested as much at the time it was added (TD1.58), but I can't find it. Somewhat aside from the "religious argument" about that aspect, my research efforts were triggered by post #243 where it was apparent that the hack isn't widely known.

I've added one further test variant to the ones I previously mentioned, which is actually a repeated setup but waiting longer for ill effects. And sure enough, just adding an unused I²S object is not the panacea for glitches in the stock USB audio cores. I set up to record for 15 minutes rather than just 20s, and started the recording quite soon after the Teensy had rebooted. Just over 5 minutes into the recording, the glitches showed up... For comparison, I did the same test with the new driver under discussion here: no glitches in the first 15 minutes.

From a test setup point of view, it's probably valuable to set the Teensy to a sample rate that's deliberately slightly off from 44100Hz, as that highlights issues with the rate synchronisation mechanisms. @alex6679, if you want to try this out, it's super easy. Just force use of the IntervalTimer hack by not including another object with update responsibility, and then modify this part of AudioStream.cpp, around line 79 or so. Here I've attempted to set the rate to 44101Hz, though due to the limited resolution you don't get exactly that:
C++:
    if (update_scheduled == false) {
        // if no hardware I/O has taken responsibility for update,
        // start a timer which will call update_all() at the correct rate
        IntervalTimer *timer = new IntervalTimer();
        if (timer) {
            float usec = 1e6 * AUDIO_BLOCK_SAMPLES / 44101;//AUDIO_SAMPLE_RATE_EXACT;
            timer->begin(update_all, usec);
            update_setup();
        }
    }

Here's an updated version of my test code, gives a bit of information about what's being tested:
C++:
#include <Audio.h>

// GUItool: begin automatically generated code
AudioSynthWaveform       wav;      //xy=1294,387
AudioRecordQueue         queue;         //xy=1452,289
AudioOutputI2S           i2sOut;           //xy=1506,364
AudioOutputUSB           usbOut;           //xy=1531,423

AudioConnection          patchCord1(wav, 0, i2sOut, 0);
AudioConnection          patchCord2(wav, 0, i2sOut, 1);
AudioConnection          patchCord3(wav, 0, usbOut, 0);
AudioConnection          patchCord4(wav, 0, usbOut, 1);
AudioConnection          patchCord5(wav, queue);

// GUItool: end automatically generated code

bool led = true;
int count;
void setup()
{
  AudioMemory(40);
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWrite(LED_BUILTIN,led);

  wav.begin(1.0f,441.0f,WAVEFORM_SINE);
  queue.begin();

  while (!Serial)
    ;

  // Hacky diagnostic of which library is in use:
#if defined(USB_AUDIO_NO_CHANNELS_480)
  Serial.printf("Improved audio library in use: %d channels",USB_AUDIO_NO_CHANNELS_480);
#else
  Serial.println("Stock audio library in use: 2 channels"); 
#endif // defined(USB_AUDIO_NO_CHANNELS_480)
 
  // figure out timing source and sample rate
  uint32_t cycles = IMXRT_PIT_CHANNELS->LDVAL;
  if (0 == cycles || cycles > 1'000'000'000)
  {
    // Retrieve exact register settings rather than relying on
    // correct setup - someone might have fiddled things!
    float nmult = CCM_ANALOG_PLL_AUDIO_NUM & CCM_ANALOG_PLL_AUDIO_NUM_MASK,
          ndiv  = CCM_ANALOG_PLL_AUDIO_DENOM & CCM_ANALOG_PLL_AUDIO_DENOM_MASK,
          nfact = CCM_ANALOG_PLL_AUDIO & CCM_ANALOG_PLL_AUDIO_DIV_SELECT(0xFFFFFFFF),
          n1 = 1+(CCM_CS1CDR & CCM_CS1CDR_SAI1_CLK_PRED(0xFFFFFFFF))/CCM_CS1CDR_SAI1_CLK_PRED(1),
          n2 = 1+(CCM_CS1CDR & CCM_CS1CDR_SAI1_CLK_PODF(0xFFFFFFFF));
    float rate = 24'000'000.0f * (nfact + nmult/ndiv) / n1 / n2 / 256.0f;
    Serial.printf("Using I²S timing at %.2fHz (%.2fHz)\n",rate,AUDIO_SAMPLE_RATE_EXACT);
  }
  else   
    Serial.printf("Using IntervalTimer at %.2fHz (%d cycles)\n",128*24'000'000.0f/cycles,cycles);
}

void loop()
{
  // monitor whether audio updates are
  // occurring, just in case
  if (queue.available())
  {
    count++;
    queue.readBuffer();
    queue.freeBuffer();
    if (count > 100)
    {
      count = 0;
      led = !led;
      digitalWrite(LED_BUILTIN,led); 
    }
  }
}
 
Hi, I am a bit late to the discussion. Here are my two cents to the topic:
As already mentioned it is completely normal that the clocks of the host and the Teensy don't match perfectly. I don't think that the clocks between audio host and audio interface ever match. There are different solutions to the problem and we use the asynchronous audio mode and the Teensy constantly tells the host to send audio data a bit faster or slower.
However, believe it or not, but it's just not that easy to tell if we e.g. slowly run out of buffered samples or if we approach a buffer overrun. The main reasons for buffer over- or underruns is not that the clock of the Teensy doesn't perfectly match the clock of the host, but that the assessment of the buffer level fails and/ or that the feedback to the host does not adapt fast enough.
However, of course it doesn't help if the IntervalTimer is not very accurate.
@h4yn0nnym0u5e I will give it a try and use the IntervalTimer for testing.
 
Hi @alex6679. Yes, absolutely appreciate it's not an easy problem!

I think we're all pretty much agreeing with one another about (a) the clocks will never exactly match, and (b) there are issues with the stock implementation of asynchronous mode. I'd hope that my approach to testing is either useful, or sparks an even better test method, so that you can check that any changes you make to your code haven't broken it. I do recall back with the previous attempts that it was very difficult to be sure that synchronisation was working, and I think that's borne out by the fact it took over 5 minutes running to show the stock code doesn't work. Equally, it's not surprising it's not really been an apparent issue, as the glitches can be hard to spot and may not even occur in shorter tests.

I haven't studied your feedback-generation code properly, so can't necessarily be of much help with it. I assume that it's trying to compute an accurate assessment of how many samples the Teensy will generate between successive USB interrupts? We ought to know exactly how many samples are generated per "Teensy second" (i.e. allowing for crystal inaccuracy) for I²S / TDM / internal S/PDIF / ADC / PDM, or for the IntervalTimer; external S/PDIF is another matter, but there is a sample rate register. If there were no USB ISR latency then a "simple" linear regression should easily compute the ratio of Teensy samples per host millisecond. Is that roughly what you're aiming at, along with discarding any outliers where the USB ISR is taken too late to give a helpful prediction?

It appears from my test of the stock code that a close enough estimate can be good for many minutes, so adapting very fast once sync seems to have been achieved may not be required. Also, I believe the USB spec is pretty tight, so the 1ms interval (or faster for higher data rates, of course) should be pretty accurate, which ought to help with detecting outliers.

I'm sure you've been through all this very thoroughly already ... just tossing ideas out, really.
 
Also, I believe the USB spec is pretty tight, so the 1ms interval (or faster for higher data rates, of course) should be pretty accurate, which ought to help with detecting outliers.
Not really. Lowest possible interrupt period for USB is 1/8000 (USB High Speed has 8 microframes per 1 msec frame) but the Teensy is operating as a device - that means the host controls the bus and polls the Teensy to send/receive. Ideally it uses the information in the endpoint descriptor to schedule transfers but that's not a hard guarantee (most hosts will treat the interval as a maximum time between polls) and transfers can also be delayed if the bus is heavily loaded.
If you wanted specific 1ms intervals you would be better off using the SOF (start of frame) interrupt, generated from the host as it drives the bus.
 
I assume that it's trying to compute an accurate assessment of how many samples the Teensy will generate between successive USB interrupts?
The concept is the same as with the original usb implementation: We have a target buffer level. The error is the deviation of the actual buffer level from the target. We use the error as the input for a PI controller and the feedback is the default feedback plus the output of the controller.
The difference of my approach is just how the buffer level is computed. I don't want to go too much into the details, but I don't think that we need to improve that concept. I just need to implement it a bit more efficiently to solve moti7j's problme and I have some ideas about how to do that.

I guess the problem with smaller block size is just how I compute the ring buffer size. E.g. for the input there must always at least a full block available in the buffer.
I just had a brief look at the code for the input buffer size of the usb audio input (usb_audio_interface.h line 93):
C++:
    //no buffered blocks computation
    //
    // +3:
    // +1 we round the number of needed blocks up
    // +1 because we store the block that is ready for transmission in the buffer
    // +1 just to have one block in reserve (prevents buffer overflows and comes at more or less no additional cost (at normal operation no additional latency + no additional memory needed))
    // *2: because to be symmetrically protected against buffer overflow (a high TARGET_RX_BUFFER_TIME_S only prevents buffer underflows)
    constexpr static uint16_t ringRxBufferSize = uint16_t(TARGET_RX_BUFFER_TIME_S / USBAudioInInterface::blockDuration) *2 +3;

At the current implementation I just make sure that there is always a block available for the audio pipeline. But I completely forgot that I also need to make sure that there is alway enough space for the samples that arrive from the host. Just a guess, but maybe that causes the problem for smaller block sizes. You can qickly check that by increasing 'ringRxBufferSize' by a few blocks (that does not increase the latency!). Maybe it solve the problem for the usb audio input.
I think next week-end I will have some time and I will have a closer look at the problem.
 
I just knew you'd be on top of the feedback mechanism!

I have a suspicion that the issue with small block sizes is not the block count in the ring buffer, but that very small blocks are smaller than the USB packet sizes. Your block count calculation is fairly generous, given the TARGET_RX_BUFFER_TIME_S is longer than the USB interval (1ms) and you then double that number, and then add 3! There ought not to be a problem with input - a 44-sample packet fills nearly 3 16-sample audio blocks, so just allocate them, keep track of the spare samples in the 3rd one, and transmit them in turn on the next update()s. For transmission it looked a bit more complex (though as I said I didn't really analyse the code deeply). It looks as if you clear out the ring buffer entirely if there aren't enough samples for the USB packet; with 64-sample blocks, one block is enough, but with 32 you need to leave the block in the ring buffer and send silence, expecting there to be 2 blocks on the next USB interrupt, when you'd be able to send about 1½ of them. That doesn't leave much slack, of course, so you could leave it until 3 blocks accumulate if you don't mind the extra 1.5ms latency.

This is the section of code that's raised my suspicions:
C++:
        if( avail==0 ||    !USBAudioOutInterface::isBlockReady(tBIdx,0)){
            //Something went wrong. We either did not receive a block, or a buffer underrun occured.
            //We will reset the buffer indices and offsets and transmit zeros.
            if( avail==0){
                devCounter=0;    //only reset in case of an underrun and not if USBAudioOutInterface did not receive data in 'update'
                txUsb_audio_underrun_count++;
                resetTransmissionIndex(virtualSamples, iBIdx, tBIdx, offset);       
                for (uint16_t idx =0; idx < USBAudioOutInterface::ringTxBufferSize; idx++){
                    USBAudioOutInterface::releaseBlocks(idx, noTransmittedChannels);
                }
            }
            const uint32_t numBytes = num*noTransmittedChannels*AUDIO_SUBSLOT_SIZE;
            memset(data, 0, numBytes);
        }
I think that whole while() loop shouldn't even start if the ring buffer doesn't have enough samples for the target packet size - just fill the packet with silence.

But I acknowledge I may have completely misread the code...

Oh, and I think I spotted a misspelling - you have methods called setBlockQuite() and setBlocksQuite(), I think you may mean Quiet rather than Quite; personally I prefer to use Silent anyway :)
 
I forgot, you already pointed out a suspicious section of the code. I’ll look into it over the weekend, and I’m sure I’ll also find some time to fix the spelling error. 😅
 
I got kinda intrigued, so had a look at the “small blocks” issue. This has resulted in a PR which I think fixes it. I’ve tested 44k/2ch/16-samples, plus 44/8/16 and 96/8/16, output and input (limited to first two channels). By no means exhaustive, but it was very close to working and the changes are fairly minor, and in only a couple of areas.

Firstly I fixed the ring buffer clearing on startup or re-sync: the buffer indexes were being adjusted, but all the buffered blocks were being discarded, not just the no-longer-needed ones.

Secondly, the transmit_flag and receive_flag system for detecting a stoppage of audio or USB and re-syncing wasn’t quite right. It worked only if audio updates were less frequent than USB interrupts; the new system should work for the high frequency audio updates, too.

Not claiming it’s perfect, but I hope it’s better!
 
Hi,
thanks for having a look at the problem.
The assumption that the interval between two update calls is always longer than the interval between the usb transmit/ receive interrupts is of course not true for small block sizes. Thank you for identifying the main problem.
Regarding clearing the ring buffer: I was never concerned with that in the past. My reasoning was that when something goes wrong, then it's not so important to make the glitch less audible and it's more important to reset everything to a clean, well defined state. The bigger problem in my opinion is that a buffer over- or underrun occured in the first place.
Anyway, in most cases you are right and it's fine to keep the buffer. Sending some samples twice in case of an underrun or skipping some samples if there is an overrun is normally better than clearing everything.
I found some other (minor) issue:
C++:
    uint32_t len=0;
    uint8_t *data = usb_audio_transmit_buffer;
    if (true || samplesAvail > target)
    {
        while (len < target) {
            uint32_t num = target - len;
            uint32_t avail = AUDIO_BLOCK_SAMPLES - offset;
            if( avail==0 ||    !USBAudioOutInterface::isBlockReady(tBIdx,0)){
                //Something went wrong. We either did not receive a block, or a buffer underrun occured.
                //We will reset the buffer indices and offsets and transmit zeros.
                if( avail==0){
                    devCounter=0;    //only reset in case of an underrun and not if USBAudioOutInterface did not receive data in 'update'
                    txUsb_audio_underrun_count++;
                    resetTransmissionIndex(virtualSamples, iBIdx, tBIdx, offset);       
                    for (uint16_t idx=tBIdx; idx != iBIdx;){
                        if (idx)
                            idx--;
                        else
                            idx = USBAudioOutInterface::ringTxBufferSize - 1;
                        USBAudioOutInterface::releaseBlocks(idx, noTransmittedChannels);
                    }
                }
                const uint32_t numBytes = num*noTransmittedChannels*AUDIO_SUBSLOT_SIZE;
                memset(data, 0, numBytes);
            }
            else {
                if (num > avail){
                    num = avail;
                }
                USBAudioOutInterface::copy_from_buffer(data, tBIdx, noTransmittedChannels, offset, num);
        
            }
            data += num*noTransmittedChannels*AUDIO_SUBSLOT_SIZE;
            len+=num;
            offset+=num;
    #ifndef ASYNC_TX_ENDPOINT
            if(devCounter == devCounterThrs){
                updateBufferOffset(sign, devCounter, offset);
            }
    #endif
            if (offset >= AUDIO_BLOCK_SAMPLES) {
                USBAudioOutInterface::tryIncreaseIdxTransmission(tBIdx,offset);
            }
        }
If offset+=num is larger than AUDIO_BLOCK_SAMPLES, then USBAudioOutInterface::tryIncreaseIdxTransmission(tBIdx,offset) does not work correctly.
Also, I think we can completely remove the block-releasing-loop for (uint16_t idx=tBIdx; idx != iBIdx;){. The blocks we release here are released anyway in the update function.
 
I found that the "clear all blocks" code was causing lasting problems with small block sizes, because the ring buffer could never accumulate enough blocks to fill a USB packet - it kept failing the !USBAudioOutInterface::isBlockReady(tBIdx,0) test. With a bigger change to the logic, it would be possible to clear all blocks, then detect the not enough blocks state, use none of them and send a silent packet, and resume sending when enough blocks are available; but you need to store even more state to do that. Not releasing blocks when overrun leaves you with big latency and a high likelihood of another overrun in the near future. All in all, I quite liked the philosophy of resetting the ring buffer to a sensible number of recent blocks - it just needed fixing! I guess you can get away with leaving the mapped-out blocks in the ring buffer and wait for update() to release them, though it's quite expensive in blocks if you set 8 channels, 96kHz, 16-sample blocks...

I'm not sure if the existing logic ever allows offset+=num to be greater than AUDIO_BLOCK_SAMPLES, except in the case where offset itself is greater than AUDIO_BLOCK_SAMPLES, which Never Happens. But defensive programming does dictate that the possibility should be catered for. It needs to be caught early, as in that case avail will be a huge underflowed number, and cause mayhem in USBAudioOutInterface::copy_from_buffer(data, tBIdx, noTransmittedChannels, offset, num);.

In a world with enough time to play, the ring buffer would probably become its own class, keep track of its own state, and deal internally with requests to transfer blocks to a packet (or vice versa) with whatever the Right Answer is (underflowed / not enough data, overflowed and needs pruning to sensible latency, or working as intended).
 
I've treated myself to a fairly cheap 2019 MacBook Pro to play with this stuff - so far it seems to be behaving fine, though I haven't really pushed things to the limits as yet.

One thing I have been doing is messing with emulating the Teensy 3.x setup, in particular the ~44117Hz sample rate. Not that it's necessarily worth supporting, but it's a good metric for moderate differences between host and device sample rates, which usually shows up as glitches eventually. This tend to be sooner with 44117Hz, which aids detection and debugging. I think it will potentially be relevant for externally-clocked systems, e.g. using S/PDIF.

With the existing code this actually works really badly, which I traced to getTransmissionTarget(). Turns out the count and cycleFinished stuff really doesn't like that sample rate, because it takes ages for the cycle to complete, rather than just a few USB frames. I've done a PR which seems to fix this issue, at least for the time being. It may be rendered redundant if the dynamic rate adjustment mentioned as a TODO is implemented, because that should automatically soak up any minor rate differences anyway.
 
Thanks for fixing that. Back then I only tested the common sampling frequencies. For now I merged your pull-request, but I hope that I'll soon find the time to refactor the function and to improve it. Shouldn't be too difficult. Unfortunately, I am still busy with two other projects.
 
Just given it a quick try, seems to be working well apart from generating a huge number of compile-time warnings due to a macro redefinition. I've submitted a PR to fix those.

Setting the sample rate to Teensy 3.x emulation, i.e. 44117.64706Hz, and instrumenting usb_audio_transmit_callback(), I can see that the buffer is staying nicely in sync:
1767994119455.png

Scope is set to infinite persistence, with the green trace showing the target samples output as a pulse length of roughly Nμs, and the blue trace the number of samples left in the buffer after transmission. You can see that it's transmitting either 44 or 45 samples, and the buffer is left with 24-154 samples or so. It's not going to be super-accurate, because I'm just using TeensyTimerTool in one-shot mode; the point is that it's not drifting outside the X1 / X2 cursors, so there's no indication of over- or under-run.
 
Thanks for fixing that. I took the abbreviations for the channel names directly from the USB specification and didn’t notice the warnings. Interestingly, the bottom center and back center channels are both called BC in the specification.
Thanks also for testing the new getTransmissionTarget() function.
 
Thanks for that. Yes, I did take the trouble to download the USB spec just in case there was something helpful on the page (thanks for putting it in the comments, really helpful!). Clearly they didn't think it through. It's a bit ambiguous for humans, as we might confuse Rear for Right, but at least compilers are happy.

Brief update: I've now tried this with my updated S/PDIF module which runs the audio engine and I²S at whatever the incoming data stream sends it, connected to a Teensy 4.0 sending audio at the Teensy 3.x rate. Everything is still working, which is good news. This is still based on using macOS and Audacity; I'll probably give it a go with Windows at some point.
 
Back
Top