How many audio channels can Teensy 4.x stream to a PC over USB?

kdharbert

I have an application requiring PC processing of a set of signals. They are well represented as audio signals and can be processed on a PC using audio software, so I want to use a Teensy as a USB audio device to make that happen. I can't tell whether Teensy can output more than two channels over USB, or whether it can run at a lower sampling rate than the default. Is this possible?
 
In theory, at least 8 channels should be achievable. There are threads at https://forum.pjrc.com/index.php?threads/6-channel-line-level-audio-interface-with-teensy-4-1.72887/ and https://forum.pjrc.com/index.php?th...-multi-channel-outputs-not-just-stereo.70176/.

In practice it’s a thorny issue requiring (I believe) some serious wrangling of the USB descriptors and feedback to make it work entirely glitch-free on all platforms. As you will find when you read those threads and follow the links, many people have expressed an interest, a few have made some effort, and no one has finished the job. They mostly just disappear - I’m still here, but freely confess to being unable to understand the USB descriptor stuff well enough to create a valid one for multi-channel audio. I have my suspicions that even the existing 2-channel one isn’t quite right…

If you try any of the modifications linked from the above threads, be aware that errors in the audio streams can take a while to manifest, and be very subtle. I think I got it to a point where it started to drop one sample every 17s, after being flawless for >2 minutes.
 
Wow. Thanks. Looks like most of the posts were doing multiple input and output. If I only care about sending audio to the PC, am I going to hit the problems you describe?
Any recommendations for another unit that might work better as a multichannel USB audio device?
 
You may be OK if you only stream to the PC, I really don’t recall. Also, it may work better on Linux - one person I collaborated with was using Linux, and disappeared, so it may have been working well enough for them at that point.

There’s another recent thread about sending multi-channel audio data via the USB serial - see https://forum.pjrc.com/index.php?threads/number-of-simultaneous-mics-possible.73469/. When it went silent there were issues using a Mac, but maybe not Windows. They could have been related to the use of Python, too.

Not sure about “another unit”; obviously there are lots of audio interfaces about, but they won’t be calibrated, so it rather depends on your “set of signals” as to whether they’d be useful.
 
I have my suspicions that even the existing 2-channel one isn’t quite right…

I'm pretty sure the USB descriptors are good, for bidirectional stereo in asynchronous mode. Async mode means the USB host is responsible for adjusting the data rate to match Teensy's sample rate. Descriptors are the easy part. Handling the data is also pretty simple, at least for cases where we can get USB packets large enough for 1 per millisecond.
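
For a rough sense of what "large enough" means, the packet-size arithmetic can be sketched in C. This is only an illustration: the function name is hypothetical and not from the Teensy core, and it assumes one packet per 1 ms with plain PCM samples.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the Teensy core: worst-case number of
 * bytes needed in one 1 ms isochronous packet.
 * 44.1 kHz does not divide evenly into 1 ms frames (44.1 samples each),
 * so the packet must hold the rounded-up sample count. */
uint32_t audio_packet_bytes(uint32_t sample_rate_hz,
                            uint32_t channels,
                            uint32_t bytes_per_sample)
{
    uint32_t samples_per_ms = (sample_rate_hz + 999u) / 1000u; /* ceiling */
    return samples_per_ms * channels * bytes_per_sample;
}
```

Under those assumptions, 16-bit stereo at 44.1 kHz needs 180 bytes per packet and 8 channels need 720 bytes. Both fit within the 1023-byte limit of a full-speed isochronous packet and the 1024-byte limit of a single high-speed transaction.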

But computing the asynchronous rate feedback to transmit to the PC is a tricky matter. It's a classic closed loop control problem. Responding too rapidly can create unstable behavior. Responding too slowly means the PC won't adjust in time to prevent a buffer overrun or underrun. The behavior when an overrun or underrun occurs is also a difficult matter, not handled very gracefully in the current code. An "easy" way of dealing with some of these issues is increasing buffer sizes, but that necessarily means more USB audio latency. A better way would be to discard or duplicate just 1 sample at or near overflow / underflow, rather than an entire audio block. But the fact remains that asynchronous rate feedback is a hard problem because we just don't know how the USB host really responds to the feedback endpoint packets. I learned the hard way that tuning it well for 1 system means it'll probably behave horribly on the others.
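
To make the control-loop idea concrete, here is a minimal proportional-only sketch in C. It is not the code in the Teensy core. The function name, the gain of 1/256 and the clamp of one sample are all assumptions chosen for illustration, and the gain is exactly the value that would need tuning per host.

```c
#include <stdint.h>

/* Hypothetical sketch, not the Teensy implementation.
 * nominal_q16: nominal samples per interval in 16.16 fixed point.
 * fill, target: current and desired receive-buffer fill, in samples.
 * Returns the rate to report to the host, in 16.16 fixed point. */
uint32_t feedback_update(uint32_t nominal_q16, int32_t fill, int32_t target)
{
    const int32_t gain_shift = 8;        /* assumed gain: 1/256 per sample of error */
    const int64_t limit_q16 = 1 << 16;   /* assumed clamp: at most +/-1 sample */

    /* Positive error: buffer is below target, so ask the host for more. */
    int64_t correction = ((int64_t)(target - fill) * 65536) / (1 << gain_shift);

    if (correction > limit_q16)  correction = limit_q16;
    if (correction < -limit_q16) correction = -limit_q16;

    return (uint32_t)((int64_t)nominal_q16 + correction);
}
```

A larger gain reacts faster but risks oscillation, a smaller one risks reaching overrun or underrun before the host has caught up. That is the trade-off described above.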

I chose asynchronous mode back in the Teensy 3.0 - 3.2 days, when we were using Cortex-M4 under 100 MHz and trying to enable people to create quite complex audio processing. Having the USB host do the difficult sample rate sync seemed like the only viable option. But in hindsight, it's not so easy because Windows, MacOS, and Linux ALSA, Linux PulseAudio, Linux Jack, Linux {insert latest thing} ... all have differences in timing of their response to the asynchronous rate feedback.

I'm considering changing to adaptive mode, where the USB host communicates data based on its sample rate clock and then the USB device becomes responsible for handling sample rate clock mismatch. While that puts a lot more CPU work onto the Teensy side, we could at least have it consistently implemented and (hopefully) the only variability between USB hosts would be how accurate their sample rate clocks are. Now that we have Cortex-M7 at 600 MHz, this doesn't seem like such a big deal CPU-wise.
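
One way the device could absorb the mismatch is to resample the incoming stream by a ratio that tracks the host's clock. The sketch below uses plain linear interpolation on mono 16-bit data purely to show the shape of the work. It is an assumption for illustration, not a proposed implementation: the function name is hypothetical, and a real version would likely need a better interpolation filter and would carry its read position across blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: resample mono 16-bit audio by linear interpolation.
 * ratio_q16 is the number of input samples consumed per output sample,
 * in 16.16 fixed point (65536 = no rate change).
 * Returns the number of output samples written. */
size_t resample_linear(const int16_t *in, size_t in_len,
                       int16_t *out, size_t out_cap,
                       uint32_t ratio_q16)
{
    uint64_t pos = 0;   /* read position in 16.16 fixed point */
    size_t n = 0;

    while (n < out_cap) {
        size_t i = (size_t)(pos >> 16);
        if (i + 1 >= in_len)
            break;      /* need both in[i] and in[i + 1] */

        int64_t frac = (int64_t)(pos & 0xFFFF);
        int32_t a = in[i];
        int32_t b = in[i + 1];

        /* 64-bit product: (b - a) * frac can exceed 32 bits. */
        out[n++] = (int16_t)(a + (int32_t)(((int64_t)(b - a) * frac) / 65536));
        pos += ratio_q16;
    }
    return n;
}
```

In practice the ratio would sit very close to 65536 and be nudged by a buffer-fill measurement, so the control problem does not go away. It just moves to the Teensy, where it only has to be tuned once.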
 
The other USB audio design decision to be made is explicit versus implicit sample rate feedback. We're using explicit feedback, which seemed like a good trade-off when Teensy had only 12 Mbit/sec USB. With implicit feedback, you must always have streaming in 1 specific direction, which ends up just being zeros if you didn't actually want to communicate audio in that direction. Explicit feedback is more bandwidth efficient, where each direction only transmits if actually needed. That seemed pretty desirable for projects that will also want to communicate a lot of data on Serial or MIDI while also streaming audio (probably in only 1 direction).

But now that we have 480 Mbit/sec USB, maybe implicit feedback would make more sense? Implicit would eliminate the need for the extra feedback endpoint.
 
I'm pretty sure the USB descriptors are good, for bidirectional stereo in asynchronous mode.
Well, I'm for sure no expert, but I confess to being confused by e.g. this section of the descriptors (usb_desc.c, starting at line 1595 or so at the time of writing):
    // Standard AS Isochronous Audio Data Endpoint Descriptor
    // USB DCD for Audio Devices 1.0, Section 4.6.1.1, Table 4-20, page 61-62
    9,                                      // bLength
    5,                                      // bDescriptorType, 5 = ENDPOINT_DESCRIPTOR
    AUDIO_RX_ENDPOINT,                      // bEndpointAddress
    0x05,                                   // bmAttributes = isochronous, asynchronous
    LSB(AUDIO_RX_SIZE), MSB(AUDIO_RX_SIZE), // wMaxPacketSize
    4,                                      // bInterval, 4 = every 8 micro-frames
    0,                                      // bRefresh
    AUDIO_SYNC_ENDPOINT | 0x80,             // bSynchAddress
    // Class-Specific AS Isochronous Audio Data Endpoint Descriptor
    // USB DCD for Audio Devices 1.0, Section 4.6.1.2, Table 4-21, page 62-63
    7,                                      // bLength
    0x25,                                   // bDescriptorType, 0x25 = CS_ENDPOINT
    1,                                      // bDescriptorSubtype, 1 = EP_GENERAL
    0x00,                                   // bmAttributes
    0,                                      // bLockDelayUnits, 1 = ms
    0x00, 0x00,                             // wLockDelay
So ... the comments say, and the format appears to bear out, that this is Audio DCD v1.0. But this version of an Endpoint descriptor doesn't deal in micro-frames, just Full Speed frames of 1ms, and bInterval "must be set to 1" according to DCD v1.0. If it were a v2.0 endpoint descriptor, the bLength would be 7, not 9, and a bInterval of 4 would indeed mean 1ms. And then in the class-specific descriptor (also the wrong length for DCD v2.0), the comment for bLockDelayUnits says 1 = ms (correct for DCD v1.0), but the number in there is 0. Probably irrelevant, as wLockDelay is 0.

I tried using USB Device Tree v3.8.2 to decode the descriptors, and it reports a bInterval of 4ms for the High Speed configuration.
 
The other USB audio design decision to be made is explicit versus implicit sample rate feedback.
It would be great to be able to palm the responsibility off onto the host, which is highly likely to have more computing grunt than the Teensy. But as you note, and as seemed to be the case when I was collaborating with @mcginty , the results are very variable between OSes. I'm using Windows 10, he was using Linux; I was unhappy, he was either happy or lost interest... If the Teensy has to do the sync to make things reliable then I guess we have to live with it. Either way, it seems apparent that really thorough testing is vital, because some imperfections are incredibly subtle - dropping a single sample is usually super-hard to spot. Sample-accurate transfer would be the ideal, especially for analytical use (the OP's project?).
 
For real time streaming (over USB or any other digital channel), sample accurate transfer depends upon transmitter and receiver sharing the same sample rate clock. We don't get a synchronous sample rate clock with USB audio (the spec does have an option for running in sync with the USB bitstream clock, but Teensy's hardware can't keep the USB and audio clocks in sync with each other).
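
To put numbers on that, here is the arithmetic for two free-running clocks as a small C sketch. The function name is hypothetical and the figures are illustrative, not measurements.

```c
/* Hypothetical helper: seconds between single-sample slips when two
 * free-running sample clocks differ by `ppm` parts per million.
 * ppm must be non-zero. */
double seconds_per_slip(double sample_rate_hz, double ppm)
{
    /* The clocks drift apart by (sample_rate_hz * ppm / 1e6) samples per second. */
    return 1.0 / (sample_rate_hz * ppm * 1e-6);
}
```

At 44.1 kHz a 100 ppm mismatch slips one sample roughly every 0.23 seconds if nothing corrects for it, and even a mismatch of about 1.3 ppm slips one sample roughly every 17 seconds.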

Yes, we're using USB Audio 1.0. When I wrote Teensy's original USB audio code in 2016, most Windows users were still on version 7, having refused to upgrade to Windows 8 and still feeling cautious about the new Windows 10 (the first version with built in support for USB Audio 2.0). Now Windows 7 & 8 are far enough in the past that considering using USB Audio 2.0 on Teensy 4.x probably makes sense, but using USB Audio 2.0 back then would have meant incompatibility for the majority of Windows users. All Teensy core library releases since 2016 have used basically the same USB audio code, of course with minor edits to use 4 byte asynchronous feedback at high speed rather than the 3 byte format of full speed.
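
For reference, the two feedback formats differ only in width and scaling: the USB 2.0 spec lays the value out as 10.14 fixed point in 3 bytes at full speed and 16.16 fixed point in 4 bytes at high speed. The C sketch below shows just that encoding. The function names are hypothetical and this is not the code in the Teensy core; which interval the value is measured over is left to the caller.

```c
#include <stdint.h>

/* Hypothetical helper: samples per interval as a fixed-point number with
 * `frac_bits` fractional bits, rounded to nearest.
 * intervals_per_sec is 1000 for a 1 ms interval. */
uint32_t feedback_fixed(uint32_t sample_rate_hz,
                        uint32_t intervals_per_sec,
                        unsigned frac_bits)
{
    uint64_t scaled = (uint64_t)sample_rate_hz << frac_bits;
    return (uint32_t)((scaled + intervals_per_sec / 2) / intervals_per_sec);
}

/* Hypothetical helper: store the low `nbytes` bytes of `value`,
 * least significant byte first, as USB expects. */
void feedback_pack(uint32_t value, uint8_t *buf, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++)
        buf[i] = (uint8_t)(value >> (8 * i));
}
```

Assuming 44.1 kHz and a 1 ms interval, `feedback_fixed(44100, 1000, 14)` gives 0x0B0666 for the 3-byte format and `feedback_fixed(44100, 1000, 16)` gives 0x2C199A for the 4-byte format.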

Indeed USB Audio 1.0 says "Must be set to 1" for bInterval, on page 61. But it was published in March 1998. It also says "The standard AS isochronous audio data endpoint descriptor is identical to the standard endpoint descriptor defined in Section 9.6.4, “Endpoint,” of the USB Specification..." (which at the time was USB 1.0, not even yet USB 1.1). The USB 2.0 spec was published 2 years later, in April 2000. The point is that bInterval is really defined by the USB spec, and USB Audio 1.0 is just repeating this info (from the original USB 1.0 spec). The meaning of bInterval values differs between 12 and 480 Mbit/sec speeds. I believe it's not a huge stretch of imagination to read the bInterval "Must be set to 1" written in 1998 as applying today only to 12 Mbit/sec speed.

I tried using USB Device Tree v3.8.2 to decode the descriptors, and it reports a bInterval of 4ms for the High Speed configuration.

This must be a bug in USB Device Tree v3.8.2. The meaning of bInterval for high speed is documented on page 271 of the USB 2.0 spec and specifically says a value of 4 means 8 microframes. It must be 1ms, because the code is all written for 1ms interval. You just wouldn't get usable USB audio on Teensy 4 if the actual interval were anything other than 1ms.
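
The USB 2.0 rule for isochronous endpoints is that the period is 2^(bInterval-1) frames at full speed or microframes at high speed. As a small C sketch (the function name is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: period of an isochronous endpoint in microseconds,
 * using the USB 2.0 definition of bInterval (valid range 1 to 16).
 * The unit is a 1000 us frame at full speed or a 125 us microframe at
 * high speed. Returns 0 for an out-of-range bInterval. */
uint32_t iso_interval_us(uint8_t bInterval, int high_speed)
{
    if (bInterval < 1 || bInterval > 16)
        return 0;

    uint32_t unit_us = high_speed ? 125u : 1000u;
    return unit_us << (bInterval - 1);
}
```

With bInterval = 4 at high speed this gives 8 x 125 us = 1000 us, the same 1 ms period as bInterval = 1 at full speed.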

[Attached image: 1703315824295.png]
 