USB interface for multi channel outputs, not just stereo

Hi all!
I was having issues with the original Teensy audio handling: the input and output were not staying in sync. That led me to this thread and to alex6679's code.
That implementation works!
I do believe the original Teensy audio library has an issue with the way the supposed async behaviour is implemented for simultaneous input + output.
Having fixed the poor sync I continued with other things...

I was trying to stream the Teensy's ADC along with I2S, but found that the ADC channel would drop samples (showing 128 zeros, followed by an overshoot of values). See the image. Audio shield line out 1 was tied to the ADC input pin via a resistor and to line in 0 via another resistor (hence the difference in signal scales, please don't mind it). The signal frequency was set at 44.1 kHz / 128 (≈344.5 Hz).
I noticed that the issue would be exacerbated at lower CPU speeds (I actually don't see it when running overclocked @ 816 MHz), and seeing that alex6679's code uses more CPU time, I had a look for bottlenecks.

Here are some benchmarks. All use a Teensy 4.0, audio shield, 600 MHz CPU speed, 2 ch, 44.1 kHz:
Running the original teensy audio and only using I2S
AudioConnection patchCord1(i2sIN, 0, usbOUT, 0);
AudioConnection patchCord2(i2sIN, 1, usbOUT, 1);
AudioConnection patchCord3(usbIN, 0, i2sOUT, 0);
AudioConnection patchCord4(usbIN, 1, i2sOUT, 1);
Using:
AudioProcessorUsageMax()
I get 0.04%.
Running the same using alex6679's code I get 3.20%.

Running the original teensy audio and streaming the ADC on 2 ch instead of i2sIN
AudioConnection patchCord2(adcIN, 0, usbOUT, 1);
I get 1.96%. And apparently no drops in ADC.
Running the same using alex6679's code I get 4.85%. And I get the issue (image above).

Full test code. Use #define USEADC to test with ADC instead of I2S:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

#define USEADC //toggle ADC vs I2S input

AudioInputI2S i2sIN;
AudioInputUSB usbIN;
AudioOutputI2S i2sOUT;
AudioOutputUSB usbOUT;
AudioConnection patchCord1(i2sIN, 0, usbOUT, 0);
#ifdef USEADC
AudioInputAnalog adcIN;
AudioConnection patchCord2(adcIN, 0, usbOUT, 1);
#else
AudioConnection patchCord2(i2sIN, 1, usbOUT, 1);
#endif
AudioConnection patchCord3(usbIN, 0, i2sOUT, 0);
AudioConnection patchCord4(usbIN, 1, i2sOUT, 1);
AudioControlSGTL5000 sgtl5000_1;

void setup() {
  AudioMemory(20); //enough for all tests
  sgtl5000_1.enable();
  sgtl5000_1.volume(0.15);
}

unsigned long counter = 0;
void loop() {
  if (millis() - counter > 1000) {
    Serial.println();
    Serial.print("CPU:");
    Serial.print(AudioProcessorUsage());
    Serial.print(",");
    Serial.print(AudioProcessorUsageMax());
    Serial.print(" ");
    Serial.print("Memory: ");
    Serial.print(AudioMemoryUsage());
    Serial.print(",");
    Serial.println(AudioMemoryUsageMax());
    counter = millis();
  }
}

AudioProcessorUsageMax() still shows a lot of headroom, but still... I get issues with the ADC.

I am not fluent in C/C++, nor familiar with this code base...
I did find that the smoothing of the update call times is taking up a lot of cycles! I assume the following lines are doing some sort of rolling average?
In usb_audio_interface.cpp, USBAudioInInterface::update:
History<50> historyUpdate = _lastCallUpdate.getHistory();
_updateCurrentSmoothPending = _lastCallUpdate.getLastCall<20>(historyUpdate, blockDuration*F_CPU_ACTUAL);
and USBAudioOutInterface::update:
History<50> historyUpdate = _lastCallUpdate.getHistory();
double updateCurrentSmooth = _lastCallUpdate.getLastCall<20>(historyUpdate, blockDuration*F_CPU_ACTUAL);
Reducing the history from 20 to 5 improves performance (I2S test case; from 3.20% to 1.50%)!

For reference, replacing History with current values:
_updateCurrentSmoothPending= t;
and
double updateCurrentSmooth = clockCount;
improves from 3.20% to 0.37%! So this History is a major burden.

Removing "History" makes the problem almost go away, but not fully.
I assume the remaining issue is with disabling interrupts or handling them in time? So I suspect the code inside disable_irq is the more critical part!?

As I didn't write the code base and am not familiar with it I wanted to ask you @alex6679 if it would be possible to replace the smoothing with a simpler approach:
weight = 0.2 // choose higher or lower for more or less smoothing
smooth_val = smooth_val * weight + new_val * (1 - weight)
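To illustrate the proposal, here is a minimal Python sketch of the idea (names are my own; the library itself is C++). A constant input is tracked exactly after convergence, but a steadily increasing input, such as call timestamps, is tracked with a constant lag:

```python
def ema_update(smooth_val, new_val, weight=0.2):
    # One step of the proposed exponential moving average.
    # Higher weight = more smoothing; weight = 0 just keeps the newest value.
    return smooth_val * weight + new_val * (1 - weight)

# A constant input converges to the input value:
s_const = 0.0
for _ in range(100):
    s_const = ema_update(s_const, 10.0)

# A linearly increasing input (step size d = 1) is tracked with a constant
# lag of d * weight / (1 - weight) = 0.25:
s_ramp = 0.0
for n in range(1000):
    s_ramp = ema_update(s_ramp, float(n))
lag = 999.0 - s_ramp
```

Note the constant lag on ramp-like inputs; that matters when filtering timestamps rather than durations.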
 
Hi,
thanks for your feedback. I can try to make the History more efficient. I implemented it to estimate the number of buffered samples in my S/PDIF resampling. There it was crucial to get a very accurate estimation of the last calls of the 'update' as well as the 'isr' functions. Maybe that accuracy is not needed for the USB feedback.
I am currently really busy, but I will try to find some time in the next few weeks. If you are motivated, you can do some experiments and test different methods of getting accurate estimates of the times when 'usb_audio_receive_callback()' and 'update' were called.
You can use my example https://github.com/alex6679/teensy-4-usbAudio/blob/main/src/main_usbInput.ino to evaluate different methods:

- Activate PLOT_REQUEST_FRREQ to see what sample frequency the Teensy requests from the host. That frequency should of course be quite stable.
- Activate PLOT_BUFFER to see the estimated number of samples in the buffer. The smoothed-samples plot should of course also be quite constant.
 
I believe I managed to implement a different filtering option.
I have to do some more testing and will report back.

Btw, small correction: the value for the original code, I2S only, in my post above is 0.12% and not 0.04%.
 
So I did some reworking of Alex's code.
At this time I don't precisely know how the rest of the code runs. I just worked on the filtering part and did testing.
Changes:
  • I removed the whole LastCall from the code and replaced it with an exponential moving average (i.e. now = before*weight + now*(1-weight)). Before, the filtering was done on the increasing and overflowing time; I rewrote it so that filtering is performed on durations, which was simpler and made more sense to me. I also made it possible to set different filter weights for the lastReceived and lastUpdated durations.
  • I identified a race condition in USBAudioInInterface::update, where I believe an interrupt could fire between uint32_t clockCount = ARM_DWT_CYCCNT; and the calculations for lastCallReceiveIsr. An update of lastCallReceive could then come in, making it more recent than _lastCallUpdate -> causing an underflow and therefore an outlier in the calculated times since last call. I believe this is part of the reason for the filtering and particularly the outlier detection. I noticed this after instrumenting the code, and after already reworking a lot of it, but I believe it was an issue in the original code as well. I fixed it in my code by moving the read under disable_irq, and I no longer detect it. Before, it would occur within tens of seconds; now I can't detect it after 20+ minutes.

I am attaching examples of the instrumented raw and filtered "time since" plots.

durations 1ms sampling a0b0 (raw).png
durations 1ms sampling a0b0.95.png
durations 1ms sampling a0.95b0.png

My rework seems to run ok. I tested for varying weights, but it seems to run even with no filtering (using the most recent value).
I am also attaching plots of REQUEST_FRREQ over time (when playing 2 ch, 44.1 kHz, Teensy audio adapter, 128 samples). I forget the actual plotting period (it took minutes to form each graph), but all graphs had the same one.
For some reason, the code before my changes would sometimes start REQUEST_FRREQ low, but only after uploading the code and running it for the first time; otherwise it would start normally... I didn't come across this with my changes.

del req_freq alex.png


del req_freq ab0.9.png


del req_freq 0.png


Observations:
  • Smoothing TimeSinceLastReceived makes REQUEST_FRREQ jagged! Raw (= most recent) is better!
  • I didn't see an effect when smoothing timeSinceLastUpdate.
  • I think buffers were fine

The max processing time using my code has gone down from 3.2+% to 0.16% (Paul's original implementation: ~0.12%). This holds with or without filtering, as it is a very simple calculation; I didn't actually pay much attention to optimization.
I am just about able to handle the ADC as well (at the 600 MHz setting).

@alex6679 let me know if you would like me to share the code. You can also play with weights.

Some other things to note for others who might be playing along. I had issues switching between Paul's and Alex's implementations due to changes to the USB descriptor and Windows stubbornly not detecting them. This is known. But at some point I had to change the Product ID to fake values to see the changes.
Initially I was a bit confused about how to run main_usbInput.ino. Obviously I put it into the folder layout Arduino requires, along with the needed files, but it took me a bit to figure out that music needs to be streamed to the Teensy :)

I am testing the Teensy with different Windows APIs and I am now running into an issue using WASAPI with Alex's code that I am not experiencing with Paul's. I will probably come back to write it up here.
 
Hi, thanks for investigating the code.
Some remarks from my side:
Before, the filtering was done on the increasing and overflowing time, but I rewrote it so that filtering is performed on durations, which was simpler and made more sense. I made it so that I can set different filter weights for last Received and last Updated durations.
Just filtering the time doesn't make sense, of course. In my first attempts I also filtered durations. Back then I used a second-order filter instead of exponential smoothing. But in the end the question is: how do you get an accurate time from a filtered duration? Also, it didn't work that well at suppressing outliers (I describe below what I mean by outlier).
I identified a race condition in USBAudioInInterface::update, where I believe an interrupt could fire between uint32_t clockCount = ARM_DWT_CYCCNT; and the calculations for lastCallReceiveIsr. An update of lastCallReceive could then come in, making it more recent than _lastCallUpdate -> causing an underflow and therefore an outlier in the calculated times since last call.
I tried to keep the section in which ISRs are disabled as short as possible. In principle it shouldn't be a problem if the 'usb samples receive' interrupt is called shortly after the time in update is measured. In that case the smoothed number of samples in the buffer should automatically be decreased (lastCallReceiveIsr > updateCurrentSmooth). But maybe there is a bug in the code. I'll have a look at that.
In general: I introduced the outlier detection because sometimes there are interrupts with a higher priority than 'update' or 'usb_audio_receive_callback'. Normally, e.g., update is called every 2.9 ms. If a higher-priority ISR is called just before update, then this update call is a bit late / is an outlier. The algorithm then computes too many samples for the smoothed buffer.
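A quick numeric illustration of the size of that effect (my own arithmetic, not library code): if one update call is delayed by 0.5 ms, a buffer estimate that trusts the raw timestamp is off by roughly 22 samples at 44.1 kHz.

```python
T = 2.9          # ms, nominal update period (128 samples at 44.1 kHz)
fs_khz = 44.1    # samples per millisecond

# Perfect call times, except the final update is delayed by a higher-priority ISR:
times = [k * T for k in range(10)]
times[-1] += 0.5  # 0.5 ms late

# Trusting the raw last timestamp makes the elapsed time look 0.5 ms longer
# than it really is, i.e. ~22 samples too many in the buffer estimate:
error_samples = (times[-1] - 9 * T) * fs_khz
```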

@moti7j If you share your code, I would have a closer look at it and test it.
 
I had not considered that the outliers could be caused by other interrupts introducing delays 🤔!
I believe I measured about 7 µs of delay due to interrupts in my tests of basic audio operations, but this could probably be much higher with added signal processing. So this is probably a valid point!

I should say that I found the code ran fine before; I am not criticizing anything, just looking for potential improvements and explaining myself :)

I am attaching the modified usb_audio_interface.cpp and usb_audio_interface.h. It is not the cleanest, and as I said, I don't have a full understanding of all the operations; I just tried to naively rework parts while keeping most things unchanged. I did rename a couple of variables to remove 'smoothing' from their names where they are no longer smoothed, though I can't guarantee I did that everywhere.
I also made a small change that prevents initialization from being counted as an underrun (rxUsb_audio_underrun_count++;).

Look for //// outlier detection could be added here for the place where outlier detection could be added.

I've also left the code instrumentation and a function to set weights set_smoothing_val (look for ///--instrumentation--/// in .h and .cpp).
Attached is also an .ino example file for altering weights on the fly (over serial) and plotting the instrumented values. You can send e.g.:
  • "a0.95" to set the smoothing_weight4Update to 0.95 and
  • "b0.95" to set the smoothing_weight4Receive, respectively.
values:
  • 0 disables filtering => most recent value
  • 0<val<1 sets filtering (higher value = more filtering; use 0.5 or more; 0.95 = 1 - 1/20 ≈ History<20>)
  • 1 disables filtering and instead uses expected values.
Arduino IDE 1.8.19 fails to send serial commands via the serial plotter when the Teensy is used as "Audio" => emulated serial. You can either use IDE 2, or set the USB type to "Serial + MIDI + Audio" and then select the serial COM port for the serial plotter.
 

Attachments

  • usb_audio_interface.cpp (41.6 KB)
  • usb_audio_interface.h (10 KB)
  • instrumentation.ino (5 KB)
Just a "side question" if I may: did anybody consider writing dedicated ASIO low latency driver for Teensy audio?
 
Thanks for the files. I’ll take a look at them as soon as I have some spare time.

I think I should have explained how the LastCall algorithm works before you started your evaluation. Unfortunately, I’m currently very busy and didn’t manage to find the time during the week.

Regarding the smoothing of durations and times:
I don’t think that smoothing or applying a low-pass filter to the timestamps at which, for example, the update or transmit interrupts are called makes sense, since these times should (obviously) increase linearly. It also doesn’t help to filter the duration between the transmit ISR and the update calls, because we don’t expect that duration to be constant. We expect it to look like a sawtooth signal.

At first, I tried filtering the duration between the update and the transmit ISR calls, because that duration should always be constant. However, these attempts weren’t very successful. In the end, I implemented the LastCall algorithm, which works a bit differently:
  1. We know the expected duration between the update calls (~2.9 ms), and we can use it to generate a “perfect” time sequence:
    0 ms, 1 × 2.9 ms, 2 × 2.9 ms, …, N × 2.9 ms.
  2. We then fit this perfect sequence to the last N samples of the measured sequence. Fitting is quite simple here: we just align the centers of mass of both sequences.
  3. Next, we remove outliers—i.e., the measured samples that deviate the most from the fitted perfect sequence.
  4. We perform a second fit, this time excluding the outliers and the corresponding samples of the perfect sequence.
  5. Finally, we use the latest sample of the perfect/fitted sequence as the time of the last update call.
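If I've read the description right, steps 1–5 can be condensed into a small sketch (a toy Python reconstruction of the description above, not the actual implementation; for brevity it removes only the single worst outlier):

```python
def last_call_estimate(measured, T):
    """Estimate the time of the last call from noisy timestamps 'measured',
    given the expected spacing T: fit a perfect sequence 0, T, 2T, ... by
    aligning centers of mass, drop the worst outlier, refit, and return the
    last sample of the fitted perfect sequence."""
    n = len(measured)
    perfect = [k * T for k in range(n)]

    # First fit: align centers of mass (means) of both sequences.
    offset = sum(measured) / n - sum(perfect) / n
    residuals = [m - (p + offset) for m, p in zip(measured, perfect)]

    # Remove the sample deviating most from the fitted perfect sequence.
    worst = max(range(n), key=lambda i: abs(residuals[i]))
    kept = [i for i in range(n) if i != worst]

    # Second fit without the outlier.
    offset2 = (sum(measured[i] for i in kept) / len(kept)
               - sum(perfect[i] for i in kept) / len(kept))

    # The fitted time of the last call.
    return perfect[-1] + offset2

# With one delayed call in an otherwise perfect sequence, the estimate
# recovers the true last-call time:
T = 2.9
measured = [k * T for k in range(10)]
measured[5] += 1.0            # outlier: one call delayed by 1 ms
est = last_call_estimate(measured, T)
```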

This approach is quite different from simply filtering the calls.

Let’s assume there is no noise and no outliers in the time sequence of update calls. The LastCall algorithm would then simply return the measured time of the last call.
If we just apply exponential smoothing, the result is always too far in the past:
smooth_val = smooth_val * weight + new_val * (1 - weight) < new_val
Also, applying a low-pass filter to the duration between the update and transmit ISR calls does not help either. It would just smooth out the expected sawtooth signal.

Anyway, I should revisit the algorithm. Your “before” plot of the current algorithm looks a bit strange. Also, there’s definitely room to improve the algorithm’s performance and make it faster.

Maybe it’s more efficient to work directly with the duration between the update and transmit ISR calls, as you suggested. Instead of using a standard low-pass filter, we could, for example, fit a linear function to the unwrapped sawtooth signal. Or, without unwrapping, we could fit a sawtooth function directly to the signal. Based on the expected durations between consecutive interrupt calls, we already know what this sawtooth should look like and we just need to find the time offset.

What do you think about that idea?
 
@tomas Not really. I use the Teensy in my hi-fi system, and low latency isn’t important for my application. Also, I’m already struggling to get the interface working with the standard driver. I really don’t need any additional challenges right now. 😅
 
@alex6679, just a quick observation relating to another thread started where the user may want to use USB, and does state a low-latency requirement ...

I had a quick play with your latest updated code (knowing full well that the existing USB audio code fails horribly with anything other than 128-sample audio blocks...), and I'm finding I can use 64-sample blocks but nothing smaller. I think 16 samples is the limit before other library objects start to break, though 8 may be achievable (?). Looking at your code it appears to allocate ring buffers with a size computed based on the block size, so it ought to work OK. I have given myself enough AudioMemory, so pretty sure that's not the issue.

I did take a look at usb_audio_transmit_callback() and can't see anything super-obvious wrong with it, though it may be relevant that a 64-sample block will always be enough to allow a USB transmission, whereas it'll take multiple blocks if they're smaller. That's assuming the 44- or 45-sample transmissions for 44.1kHz sample rates - the numbers will obviously change depending on the exact configuration.
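The 44-or-45 pattern mentioned above falls out of accumulating a fractional 44.1 samples per 1 ms USB frame. A small sketch of that arithmetic (my illustration of the general fractional-accumulator idea, not the library's code):

```python
def samples_per_frame(fs_hz=44100, frames_per_s=1000):
    # Integer accumulator: carries the fractional remainder from frame to
    # frame, so the long-run average hits fs_hz exactly.
    acc = 0
    while True:
        acc += fs_hz
        n = acc // frames_per_s   # whole samples available this frame
        acc -= n * frames_per_s
        yield n

gen = samples_per_frame()
counts = [next(gen) for _ in range(10)]
# Ten frames carry 441 samples in total: nine frames of 44 and one frame of 45.
```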
 
Ok, interesting. As you noticed, I didn't assume in the implementation that a block will always have 128 samples. But I also didn't put much thought into keeping it working for small block sizes (and I never tested smaller block sizes).
I did take a look at usb_audio_transmit_callback() and can't see anything super-obvious wrong with it, though it may be relevant that a 64-sample block will always be enough to allow a USB transmission, whereas it'll take multiple blocks if they're smaller.
That might be the problem. I'll have a look at it. I assume that it shouldn't be too difficult to fix.
Did you test both directions, receiving and sending audio to/from a USB host? Does the USB audio input also fail for smaller block sizes?
 
I didn’t test input, but I can later today.

I’ve taken another look, and am wondering if the way you deal with underflow might be the issue. If there isn’t enough audio data for the USB packet, everything gets dumped, whereas maybe the answer is to fill the packet with silence plus the available data. That empties the ring buffer, but it’s likely that there will be enough audio data available on the next USB update.
 
OK, input tested; 64-sample blocks work, 32 gives a sort of fizzing sound, 16 is silent. Just using the HardwareTesting/PassThroughUSB example, with the audio blocks bumped to 70, and selecting the Teensy as the PC's audio output. That was Windows 11, for what it's worth; output testing was Windows 10. I don't have a Mac, but could drag out the Linux box if needed.
 
Thanks for testing. I think I should first fix it for Windows 11. I assume it will then also work for other operating systems.
If there isn’t enough audio data for the USB packet, everything gets dumped, whereas maybe the answer is to fill the packet with silence plus the available data.
I'll have a look at that.
 
@alex6679 I must admit I am a bit confused about the callbacks and the update, so I have a hard time giving an opinion.
I was expecting the usb callbacks to be called every 1ms and the updates every 128/44100=2.9ms, so I was expecting ~3 steps in the saw tooth plot, but there are more!? So that is why I say I don't fully understand it all.

I ended up only trying to replace the processing bottleneck.
It seemed to me that LastCall's aligning of the centers of mass is effectively a rolling average, but one that compensates for the lag such averaging introduces by doing the alignment on the centers of mass.
There are places that use getLastCall<2>, which to me looked like minimal filtering? Or was this just for outlier detection?
It seemed to me that the smooth time is used only for the calculation of durations, which would effectively mean that the durations are filtered. I thought this was being done to get a filtered requested frequency, so it felt to me that it might need to be smooth, but I think I might be wrong here. If you give me some more pointers, I might be able to better understand how these durations are used.

I understand that some variability of times/durations might be introduced by other interrupts, but I don't really understand whether those should be filtered out. If yes, would it make more sense to make predictions based on previous buffers - numbers of samples rather than time durations? Again, apologies, I don't fully understand it all.

If it helps, I will say though that from my practical testing the audio was still stable despite some (undesirable) filtering being applied. So to me it indicates that it is perhaps not very sensitive to variations.

I am still experiencing issues with WASAPI that I don't experience otherwise, but only when run in exclusive mode. I'll write it up, hopefully tomorrow.
 
I was expecting the usb callbacks to be called every 1ms and the updates every 128/44100=2.9ms, so I was expecting ~3 steps in the saw tooth plot, but there are more!? So that is why I say I don't fully understand it all.
I didn't have a look at the code, but is the function we see in the saw tooth plot a downsampled/ aliased version of the original saw tooth? That would explain your observation.
If it helps, I will say though that from my practical testing the audio was still stable despite some (undesirable) filtering being applied. So to me it indicates that it is perhaps not very sensitive to variations.
That might be. Since it depends on the host and we can't test it for all hosts, I would still try to get an as-good-as-possible estimation of the currently buffered samples and thereby a smooth/ accurate feedback to the host.
Also, I don't know how the varying requested frequency is handled by the host. I guess in some cases the signal is resampled to match the currently requested frequency. In that case, the higher the variation of the feedback, the more distorted the signal gets. We therefore want to keep the feedback as stable as possible.
I am still experiencing issues with WASAPI that I don't experience otherwise. But only when run in exclusive mode. I'll write it up, hopefully tomorrow.
Maybe you can focus on the WASAPI problem and I can try to make the filtering of the duration / the stuff in the LastCall class more efficient?
 
I didn't have a look at the code, but is the function we see in the saw tooth plot a downsampled/ aliased version of the original saw tooth? That would explain your observation.
The sampling should be 1 ms, so not aliased. Having a second look I do see the 3 steps in the sawtooth (the rapid one), but I still have to understand the second, slower saw rate. I will try to find time to understand it.

Maybe you can focus on the WASAPI problem and I can try to make the filtering of the duration / the stuff in the LastCall class more efficient?
I tried a lot of things with WASAPI but I keep getting choppy data and I am a bit lost :(
@alex6679:
  1. Could you help me with how to switch back to usb1, so that I can test that? Does it involve changes to the usb descriptor?
  2. I don't know if the chopping comes from the PC or the Teensy side... Could you point me toward instrumenting the code to check for issues? I have been looking at the input and output .getStatus() and I don't see problems with usb_audio_overrun_count, usb_audio_underrun_count, num_skipped_Samples, num_send_one_more, num_send_one_less.

Here is a minimal reconstruction of the exclusive WASAPI issue...
Using a simple USB pass through:
C++:
// use e.g. USB Type: "Audio"
#include <Audio.h>
AudioInputUSB            usbIN;
AudioOutputUSB           usbOUT;
AudioConnection          patchCord1(usbIN, 0, usbOUT, 0); //USB pass through
AudioConnection          patchCord2(usbIN, 1, usbOUT, 1); //USB pass through
void setup() {
  AudioMemory(20);
}
void loop(){}

And python sounddevice for playing and recording:
Python:
import sounddevice as sd
import numpy as np
import matplotlib.pyplot as plt

duration = 0.5 # seconds
note_frequency=200 # Hz
VOLUME = 0.1 #0-1
use_api = 'WASAPI' #<- you can try using other APIs, like; 'MME', 'WDM-KS', 'DirectSound', 'WASAPI'

extra = sd.WasapiSettings(exclusive=True) if 'WASAPI' in use_api else None # using exclusive mode, when setting WASAPI, else None

# Find Teensy (Input and Output) in the WASAPI devices:
input_teensyWASAPI_id=None
output_teensyWASAPI_id=None
for api in sd.query_hostapis():
    if use_api in api['name']:
        for dev_id in api['devices']:
            device=sd.query_devices(dev_id)
            if 'Teensy' in device['name']:
                if device['max_input_channels']:
                    input_teensyWASAPI_id=dev_id
                if device['max_output_channels']:
                    output_teensyWASAPI_id=dev_id
if input_teensyWASAPI_id is None or output_teensyWASAPI_id is None:
    raise Exception("Teensy WASAPI input&output was not found!")

# Generate the waveform for playback:
sample_rate = sd.query_devices(input_teensyWASAPI_id)['default_samplerate'] # get the teensy's sampling rate
t = np.arange(int(sample_rate*duration))/sample_rate
outdata = (np.sin(2*np.pi*note_frequency*t)*(2**15*VOLUME)).astype(np.int16)

# Play and Record audio for the duration. Single channel.
input_data = sd.playrec(outdata,channels=1, samplerate=sample_rate, dtype=np.int16, blocking=True,\
                        device=(input_teensyWASAPI_id,output_teensyWASAPI_id), extra_settings = extra)

# Plot the input (recorded) and output (played) signals:
plt.figure(figsize=(12, 2.8))
plt.plot(input_data, label='recorded') # use plt.plot(t,input_data) for scale in seconds
plt.title("Played and recorded Audio (%s)"%(('exclusive ' if extra is not None else '') + use_api))
plt.plot(outdata, label='played') # use plt.plot(t,outdata) for scale in seconds
#plt.xlabel("Time [s]")
#plt.xlim(1500,5000) # zoom-in x axis for WASAPI. Shift higher for other APIs. Remove if plotting time!
plt.ylabel("Amplitude")
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()
This Python code automatically looks for a Teensy WASAPI device and sets it to exclusive mode, but you can change use_api to test with other APIs, etc.

I get this returned from the teensy:
exclusive wasapi issue_nonstart.png

Old segments seem to get inserted later on and the length of these chops seems to vary.

I don't get this issue with other APIs or with non-exclusive mode. The original Teensy code doesn't produce these chops either, so something changed.
If someone has ideas to test exclusive WASAPI some other way / with another library, that could be a useful check as well. I tried using Audacity but failed to get exclusive control.
 
Unfortunately, I am currently really busy with other stuff. I guess I can start having a look at the USB audio issues next weekend (not the upcoming one). I think I will first try to solve the problems related to small block sizes; then we need to improve the performance of 'LastCall'. After that I can try to help you with your WASAPI problems.
Btw, I wonder how your minimal C++ USB audio passthrough example can work. There is no audio object that triggers 'update'. Does that example work if you don't use WASAPI?
 
Recent versions (1.58 onwards) of Teensyduino spot the lack of an object with “update responsibility“ and put in an IntervalTimer to trigger updates.

Has the documentation been updated to match? Guess…
 
Recent versions (1.58 onwards) of Teensyduino spot the lack of an object with “update responsibility“ and put in an IntervalTimer to trigger updates.

Has the documentation been updated to match? Guess…
It's not particularly good at it though since IntervalTimer doesn't have sufficient resolution...
 
Well, it’s better than the old Teensy 3.x which could only manage 44117.647Hz or so. The PIT has a 24MHz clock, which I think means the sample rate ends up being about 44100.547Hz. Of course, that means that AUDIO_SAMPLE_RATE_EXACT is not exact, which could result in problems.
 
Well, it’s better than the old Teensy 3.x which could only manage 44117.647Hz or so. The PIT has a 24MHz clock, which I think means the sample rate ends up being about 44100.547Hz. Of course, that means that AUDIO_SAMPLE_RATE_EXACT is not exact, which could result in problems.
The sample rate is whatever the device on the other end of the USB connection decides to use. The problem is the audio updates don't occur precisely when they should which leads to underruns/dropouts in the USB audio stream... If the default sample rate was 48KHz things would be a lot simpler.
 
I defer to @alex6679 for the fine detail, but I'm pretty sure it's the Teensy which dictates the sample rate, leaving the host to do any resampling required to match its internal clock. This is one of the major contributions he's made, getting the feedback vastly improved so clicks and dropouts are essentially eliminated. If the Teensy can't dictate the sample rate, T3.x would never have worked, because it's forced to generate an extra 17.647 samples per second...

I agree, it's a challenge to compute the mismatch between Teensy and host, because of the somewhat variable interrupt latency. In an ideal world every interrupt source would come with a register which latches the ARM_DWT_CYCCNT value at the moment the request was asserted. We could then be 100% sure about the DMA interrupts which trigger audio updates, and somewhat sure about the 1ms USB interrupts (don't know if those are guaranteed super accurate from the host...). One small improvement that could be made is to modify AudioStream::update_all() to store a copy of ARM_DWT_CYCCNT, which will happen when the high priority ISR fires for the object with update responsibility, rather than relying on the much more variable value that will be available when the low priority software_isr() chains down all the audio objects. Still not perfect, but better.

As it is, we're forced to estimate as best we can. Judging from the above discussions (I've not looked closely at the code), that is costing a bit of CPU time, which might perhaps be improved on, but it's pretty effective in keeping things audibly OK. (I don't know what the WASAPI driver might be doing to break that - guessing it's a separate / specialist issue.) Obviously with uncertainty comes the requirement to keep "enough" samples buffered to cope with an "unexpected" request for more data - talking here from the Teensy to host point of view, the general argument can obviously be inverted :) . As a request every 1ms only needs 44 or 45 samples, the buffer needn't be huge; for the new code, it's done with a ring buffer of audio blocks long enough for 3.6ms+3 blocks, which is pretty unlikely ever to be fully used.
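Taking the sizing rule quoted above at face value, the ring buffer does indeed stay small. A quick back-of-the-envelope check (my arithmetic, assuming 44.1 kHz and 128-sample blocks):

```python
import math

fs_khz = 44.1      # samples per millisecond
block = 128        # samples per audio block
margin_ms = 3.6    # time margin from the sizing rule quoted above

# 3.6 ms of samples plus 3 blocks: ~158.8 + 384 ≈ 542.8 samples,
samples = margin_ms * fs_khz + 3 * block
# which rounds up to 5 audio blocks (~14.5 ms of audio).
blocks_needed = math.ceil(samples / block)
```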
 
OK, so in the Interests Of Science (or at least, vaguely reproducible results), I've done a bit of testing.
  • Teensy 4.0 or Teensy 3.2
  • Current "master" AudioStream sources, or alex6679's modified ones
    • slightly modified to toggle a pin on every update request at the trigger, not in the low-priority software ISR
  • various sample rates in the region of 44100Hz
  • with and without a "proper" update rate controller (I²S)
  • Windows 11 laptop
  • record Teensy USB output of 441Hz signal for 20s using GoldWave
Teensy 3.2 can of course only do 44117Hz; I can only test with the "master" branch as AFAIK alex6679 hasn't ported his updates to Teensy 3.x, and I can't see it being worth the effort since they're no longer available.

To find glitches, I applied a band-stop filter around 441Hz, then maximised the volume. They're often quite hard to hear, though they show up very well on a spectrogram.

Results:
  • Master branch
    • Teensy 3.2 with I²S timing gives glitches at ~0.2s intervals
    • Teensy 4.0 with IntervalTimer timing bodged to 44117Hz gives glitches at ~0.2s intervals
    • Teensy 4.0 with IntervalTimer timing bodged to 44101Hz gives glitches at ~2.5s intervals (a bit variable)
    • Teensy 4.0 with I²S timing gives no apparent glitches, but maybe 20s isn't long enough
  • alex6679 branch - Teensy 4.0 only, as previously noted
    • no glitches at 44100Hz with either timing source, or at 44117Hz
Using a frequency meter (uncalibrated), I got a toggle frequency of 172.26407Hz using I²S timing, which calculates out to 44099.604Hz (-9ppm). Interestingly the IntervalTimer-controlled run at 44100Hz didn't change the sample frequency all that much - 44099.523Hz. There must be an error in my calculation above.
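One candidate for that calculation error (an assumption of mine, worth checking against the i.MX RT reference manual, which states the PIT timeout period is LDVAL + 1 clock cycles): the test sketch divides by LDVAL rather than LDVAL + 1. With an assumed LDVAL of 69659 (a value I'm guessing here, chosen so the printed rate matches ~44100.55 Hz):

```python
F_PIT = 24_000_000   # PIT clock on the Teensy 4.x
BLOCK = 128          # samples per audio update
LDVAL = 69659        # assumed PIT load value for a ~44.1 kHz update rate

# What the sketch computes (128 * 24 MHz / LDVAL):
printed_rate = BLOCK * F_PIT / LDVAL        # ~44100.55 Hz
# Actual rate if the timeout is LDVAL + 1 cycles:
actual_rate = BLOCK * F_PIT / (LDVAL + 1)   # ~44099.91 Hz
```

44099.91 Hz sits within the meter's −9 ppm offset of the measured 44099.523 Hz, so an off-by-one here would roughly account for the discrepancy.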

Typical glitch (top channel) and filtered / maximised "finder" signal (bottom channel):
1764964037799.png


Test sketch:
C++:
#include <Audio.h>

// GUItool: begin automatically generated code
AudioSynthWaveform       wav;      //xy=1294,387
AudioRecordQueue         queue;         //xy=1452,289
//AudioOutputI2S           i2sOut;           //xy=1506,364
AudioOutputUSB           usbOut;           //xy=1531,423

//AudioConnection          patchCord1(wav, 0, i2sOut, 0);
//AudioConnection          patchCord2(wav, 0, i2sOut, 1);
AudioConnection          patchCord3(wav, 0, usbOut, 0);
AudioConnection          patchCord4(wav, 0, usbOut, 1);
AudioConnection          patchCord5(wav, queue);

// GUItool: end automatically generated code

bool led = true;
int count;
void setup()
{
  AudioMemory(40);
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWrite(LED_BUILTIN,led);

  wav.begin(1.0f,441.0f,WAVEFORM_SINE);
  queue.begin();

  while (!Serial)
    ;
  uint32_t cycles = IMXRT_PIT_CHANNELS->LDVAL;
  if (0 == cycles)
    Serial.printf("Using I²S timing at %.2fHz\n",AUDIO_SAMPLE_RATE_EXACT);
  else   
    Serial.printf("Using IntervalTimer at %.2fHz (%d cycles)\n",128*24'000'000.0f/cycles,cycles);
}

void loop()
{
  if (queue.available())
  {
    count++;
    queue.readBuffer();
    queue.freeBuffer();
    if (count > 100)
    {
      count = 0;
      led = !led;
      digitalWrite(LED_BUILTIN,led); 
    }
  }

}
 