Audio DSP and more

AaronD

New member
If I could do this in an 8-bit chip, I probably would, since I know how they work and how to use them: just bit-bang a handful of registers, and the peripherals start doing what I want. Data "magically" shows up in a specific register; and I can get an interrupt for that or poll the flag, depending on priority; and whatever I write to that register goes out on the wire. But that's apparently not how the 32-bit world works, if I'm to believe the complexity of everyone's SDKs even for basic things like bit-banging GPIO.

Using a Teensy 4.1:
  • I've looked at the Arduino framework and IDE, with the idea of modifying it to fit what I need, but I see a lot of moving parts that all have to work together and I'm having a hard time keeping track of them all.
  • I've looked at PlatformIO in VS Code, with its own version of the Arduino framework, and it's not much better. It does give me the main() function at least, so that's a plus!
  • I've looked at NXP's CMSIS, without Arduino at all, but their license makes me wonder who really owns my project.

How do I get from my 8-bit mentality to a working boilerplate for this project?

I have some pretty tight requirements that I think are possible in principle, but probably not with the standard libraries as they are.

Code:
            +-------------------------------------------+
            | UART                                      |
+-----+     | - Debug messages during early development |
| PC  |<--->| - 115200 8N1                              |
|     |     +-------------------------------------------+
|     |
|     |     +----------------------------------------+
|     |<--->| USB                                    |
+-----+     |  +---------------------------+         |      +-----------------+
            |  | Serial                    |<-------------->| Other functions |
            |  | - Debug messages          |         |      +-----------------+
            |  | - Manage DSP coefficients |         |
            |  | - Control other functions |<-------------------+
            |  +---------------------------+         |          |
            |                                        |          V
            |  +----------------------------------+  |      +------------------------------------------------+     +--------------------------------+
            |  | Audio                            |  |      | DSP                                            |     | TDM Audio CODEC                |
            |  | - 8 channels each direction      |<------->| - No buffer: single sample all the way through |<--->| - 8 channels each direction    |
            |  | - 32-bit integer or float, 96kHz |  |      | - 32-bit fixed-point or float, 96kHz           |     | - 24- or 32-bit integer, 96kHz |
            |  +----------------------------------+  |      +------------------------------------------------+     +--------------------------------+
            |                                        |
            +----------------------------------------+

The primary purpose is to undo the acoustic lowpass through a physical barrier that can't be removed. So there's a mic and a speaker with not much distance between them, but a fair amount of attenuation that is also frequency-dependent. I figure that the entire system needs to have about 200us (microseconds) of latency, analog-to-analog, including the DSP. To make that work, I think I have to:
  1. Run the CODEC at 96kHz or faster, both to have less time between samples and to allow a less aggressive anti-aliasing filter that itself has fewer samples of latency.
  2. Run the newest single sample all the way through the DSP, and send it out immediately. No buffer. This essentially creates a 3-sample pipeline, ignoring a handful-of-samples delay as part of the DSP chain, to fine-tune the 200us target:
Code:
 In  | 0 | 1 | 2 | 3 | 4 | 5 | ...
DSP  | - | 0 | 1 | 2 | 3 | 4 | ...
Out  | - | - | 0 | 1 | 2 | 3 | ...
The USB connection does not require low latency. It's mostly to record some useful taps in the DSP chain to analyze later, and to play back a signal to various points for better testing. Nothing live there. So USB can operate normally, except that it needs more bits, more samples, and more channels than the Teensy Audio Library provides. And it needs to interface with the single-sample-NOW, DSP code.

The obvious follow-on effect (to me at least) is that absolutely everything takes a back seat to the DSP code, including USB. I would think that TDM would just work in the background as a hardware function - set it up to run, and then the DSP code reads and writes directly to its data registers. (8-bit mentality) No need for DMA because there's no buffer. The code literally grabs each individual sample as it comes, and delivers each result "just in time". And of course, everything else needs to work around this top-priority ISR that fires constantly at 96kHz, and potentially takes a significant amount of time to finish.

About that "significant amount of time to finish", I'm also thinking to use the Teensy's on-board LED as a "DSP load indicator". Turn it off at the start of the ISR and on at the end, so if it goes out completely, it's trying to do too much. If it's still visible, it's probably okay. And an oscilloscope can use it to verify the actual sample rate.
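
For a rough sense of scale on that load indicator, here is a minimal sketch. It assumes the Teensy 4.1 is running at its default 600 MHz core clock (not stated above) and that the ISR fires once per 96 kHz sample; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values: 600 MHz is the Teensy 4.1 default core clock,
// 96 kHz is the sample rate discussed in the post.
constexpr std::uint32_t kCpuHz = 600000000u;
constexpr std::uint32_t kSampleRateHz = 96000u;

// CPU cycles available between two consecutive sample interrupts.
constexpr std::uint32_t cycles_per_sample() {
    return kCpuHz / kSampleRateHz;
}

// Fraction of time (0.0 to 1.0) the LED is lit when the ISR turns it
// off on entry and back on at exit.
double led_on_fraction(std::uint32_t isr_cycles) {
    assert(isr_cycles <= cycles_per_sample());  // more than this overruns the sample period
    return 1.0 - static_cast<double>(isr_cycles) / cycles_per_sample();
}
```

Under those assumptions there are 6250 cycles per sample, so an ISR that takes 3125 cycles would leave the LED lit half the time.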


I wouldn't necessarily have to use the graphical editor or any of the existing Audio Library to build my audio chain. I'm perfectly okay with doing something like this, using my own functions:
Code:
void DSP_ISR()
{
    LED_OFF();
    
    
    if (new_coefficients_available())
    {
        get_new_coefficients();
    }
    
    if (coefficients_requested())
    {
        send_coefficients();
    }
    
    
    //magic numbers are channel numbers
    
    float sample = from_codec(3);
    
    sample = process1(sample, process1_state, process1_coefficients);
    to_usb(sample, 2);
    
    sample += from_usb(5);
    sample = process3(process2(sample,
                      process2_state, process2_coefficients),
             process3_state, process3_coefficients);
    
    to_codec(sample, 1);
    
    
    LED_ON();
}
 
@AaronD

Your goals for your project are cool, but you're breaking most of the assumptions upon which the Teensy Audio Library and the Teensy USB Audio implementations are built. In particular:

* The Teensy audio library assumes that audio is processed in 128-point blocks.
* The Teensy audio library only runs at 44100 Hz
* The Teensy audio library only uses 16-bit integers for the data type

You can hack the first two items via guidance here in the forums. The 3rd item is deeper. Here, you'd have to switch to one of the community-built extensions (OpenAudio or Tympan). But, even if you do that, you're not going to like it because:

* All of these libraries are very inefficient for 1-point audio blocks

You'll pay the price in CPU usage. If you've got the spare CPU (Teensy 4.1 is fast), it's an easy price to pay. But, if you don't have the CPU, you should abandon using these libraries. You'll have to build your structure for interrupt-triggered processing routines. For someone with your skills, this part is probably well within reach.

Additionally, the Teensy USB Audio class is intimately tied to the Teensy Audio Library (and its assumption of two channels of 16-bit data at 44100). So, the existing USB Audio stuff will probably also not work for you. You'll have to build your own system. Even for a person with decent skills, I think that this part would be pretty hard.

Finally, what are you going to use for acquiring your audio samples? If you are going to use an audio codec IC, be aware that speaking of anti-alias filtering as if you need to apply the filtering yourself...well, that kind of language is a bit obsolete. The typical audio codec uses sigma-delta processing to digitize the audio (i.e., it samples at ultra-high sample rates and uses digital filters to decimate the data down to the requested sample rate). So, when you run an audio codec at "96 kHz", it's actually sampling at MHz speed internally and then it presents a filtered, decimated signal to you at the requested 96 kHz rate. They're amazing. Why should you care about this? Well, here's the last point that I wanted to make:

* Audio codec ICs have an internal latency that is usually 10-20 samples long, which is the audio processing pipeline for the decimation filter in the sigma-delta converter

So, if you're looking to minimize your signal acquisition time to get something close to 1-sample latency, you're going to have to switch away from audio codecs and use a dedicated A/D chip (one that avoids the typical sigma-delta latency). This is probably within your skill set, but it is another step away from the existing audio processing infrastructure offered by Teensy.

So, in summary, you'll need to base your system around a traditional A/D chip. And, given your desire for 1-sample processing, you'll probably need to abandon the Teensy audio library and the Teensy USB audio stuff. Really, then, it sounds like you're building from scratch. That's super cool...but, man, that's a lot of work in there to be done. And, if it were me, probably everything would be do-able with enough time...except the USB stuff. That USB stuff is a black art.

Chip
 
and whatever I write to that register goes out on the wire. But that's apparently not how the 32-bit world works

That's also not how USB or ethernet or even UART with FIFO work on 8 bit MCUs, for the 8 bit chips which have those more sophisticated peripherals.
 
The primary purpose is to undo the acoustic lowpass through a physical barrier that can't be removed. So there's a mic and a speaker with not much distance between them, but a fair amount of attenuation that is also frequency-dependent. I figure that the entire system needs to have about 200us (microseconds) of latency, analog-to-analog, including the DSP.

Maybe a good first question is whether this is even theoretically possible?

200us is the period of a 5 kHz waveform. Only 2 of the ~10 octaves humans can hear are above 5 kHz. But the effective delay of any realistic filter is going to be many times the period of whatever frequencies it filters or whose gain it changes.

My point is you may be imposing needlessly high requirements that only make the project far more difficult than it really needs to be... sort of like requiring switching the power to an incandescent light bulb in only microseconds, when the filament takes the better part of a second to change temperature enough to emit light. Audio is about the same, relatively low frequencies that propagate through air at a slow speed of about 1 foot per millisecond. Human perception of sound is also remarkably slow.

This doesn't mean you should write slow or inefficient code. But like any engineering, you should start first by considering the physical nature and fundamental requirements. There's little point to imposing tighter timing requirements than the nature of the problem to be solved really needs.
 
* The Teensy audio library assumes that audio is processed in 128-point blocks.
* The Teensy audio library only runs at 44100 Hz
* The Teensy audio library only uses 16-bit integers for the data type

You can hack the first two items via guidance here in the forums. The 3rd item is deeper. Here, you'd have to switch to one of the community-built extensions (OpenAudio or Tympan). But, even if you do that, you're not going to like it because:

* All of these libraries are very inefficient for 1-point audio blocks

You'll pay the price in CPU usage. If you've got the spare CPU (Teensy 4.1 is fast), it's an easy price to pay. But, if you don't have the CPU, you should abandon using these libraries. You'll have to build your structure for interrupt-triggered processing routines. For someone with your skills, this part is probably well within reach.

I figured as much. Buffered processing does tend to be more efficient than single-sample "flyby".

As for the first three points, I'm almost resigned to writing my own anyway, so it hardly matters what the existing library does. I'm still open to a single-sample, rate-agnostic library that uses 32-bit data or better, but I haven't found one yet. (and may not, because most things don't have my requirements)

I'm also aware that DSP functions have no concept of time, only samples, so that the coefficients have to be calculated with that in mind. You can't just throw a frequency at a rate-agnostic black box and expect it to work, but you *can* give it [frequency/rate], already computed. Or as I'm planning to do, calculate the coefficients externally in the low-priority code (including the sample rate of course), and then set a flag for the DSP_ISR to copy them synchronously into its working set.
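
One way that flag handoff could look, as a sketch only: the low-priority code fills a staging copy and raises a flag, and the ISR copies the staging set into its working set at the top of the next sample. The `Coefficients` layout and both function names are hypothetical, and the sketch assumes exactly one writer and one ISR.

```cpp
#include <atomic>

// Hypothetical biquad coefficient set; the real layout depends on the filter.
struct Coefficients {
    float b0, b1, b2, a1, a2;
};

static Coefficients staging = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};  // written only by low-priority code
static Coefficients working = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};  // used only by the DSP ISR
static std::atomic<bool> pending{false};

// Low-priority side: publish a freshly calculated set. Returns false if the
// previous set has not been picked up yet, so staging is never overwritten
// while the ISR might be copying it.
bool publish_coefficients(const Coefficients& c) {
    if (pending.load(std::memory_order_acquire)) {
        return false;
    }
    staging = c;
    pending.store(true, std::memory_order_release);
    return true;
}

// ISR side: call once at the top of each sample interrupt, so the working
// set only ever changes on a sample boundary.
void pickup_coefficients() {
    if (pending.load(std::memory_order_acquire)) {
        working = staging;
        pending.store(false, std::memory_order_release);
    }
}
```

The copy itself costs ISR time in proportion to the size of the set, so a large filter bank might be better served by swapping a pointer between two sets instead.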

Additionally, the Teensy USB Audio class is intimately tied to the Teensy Audio Library (and its assumption of two channels of 16-bit data at 44100). So, the existing USB Audio stuff will probably also not work for you. You'll have to build your own system. Even for a person with decent skills, I think that this part would be pretty hard.

I did manage to write a complete HID stack from scratch, on a PIC16F1454. (8-bit micro with hardware-adjusted internal clock to match the USB SOF rate, and 4 clocks per instruction at 48MHz, so 12MIPS actual) The official USB testing app on a dedicated PC (with PS/2 mouse and keyboard because it completely hoses the USB controller for anything else) eventually passed it, but I'm not looking forward to doing that again!

My first attempt at a scaled-down version of this project - stereo USB and 48kHz - was on a Raspberry Pi Pico, which itself is far better than the 8-bit stuff that I'm used to, but:
  • Once I got all of the supporting code working, it could barely keep up with a state-variable quadrature sinewave generator by itself.
  • The TinyUSB library that it uses has a bug in the Audio class, that doesn't allow it to restart after the host stops sending samples.
TinyUSB does, however, present the USB descriptors in a way that seems logical for me to understand and modify, so I think I got a correct set of descriptors for 48kHz 32-bit stereo in both directions, and I think it actually does that. Pull data out of the generic byte buffer and cast to int32_t, and it seemed to work...until I paused VLC and tried to play again.
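
The byte-buffer step can be sketched as below, assuming the buffer holds packed 32-bit PCM samples in little-endian order (the byte order USB audio uses for PCM). The function names are hypothetical and not part of TinyUSB. Assembling each sample from bytes avoids casting the buffer pointer, which can run into alignment and aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assemble one little-endian 32-bit sample from four bytes.
std::int32_t read_le_i32(const std::uint8_t* p) {
    const std::uint32_t u = static_cast<std::uint32_t>(p[0])
                          | (static_cast<std::uint32_t>(p[1]) << 8)
                          | (static_cast<std::uint32_t>(p[2]) << 16)
                          | (static_cast<std::uint32_t>(p[3]) << 24);
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);  // reinterpret the same bits as signed
    return s;
}

// Unpack n_bytes of packed samples into out; returns the sample count.
// The caller must provide room for n_bytes / 4 samples in out.
std::size_t unpack_samples(const std::uint8_t* buf, std::size_t n_bytes,
                           std::int32_t* out) {
    assert(n_bytes % 4 == 0);  // whole samples only
    const std::size_t n = n_bytes / 4;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = read_le_i32(buf + 4 * i);
    }
    return n;
}
```

For interleaved stereo, consecutive samples in `out` would alternate between left and right.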

Finally, what are you going to use for acquiring your audio samples? If you are going to use an audio codec IC, be aware that speaking of anti-alias filtering as if you need to apply the filtering yourself...well, that kind of language is a bit obsolete. The typical audio codec uses sigma-delta processing to digitize the audio (i.e., it samples at ultra-high sample rates and uses digital filters to decimate the data down to the requested sample rate). So, when you run an audio codec at "96 kHz", it's actually sampling at MHz speed internally and then it presents a filtered, decimated signal to you at the requested 96 kHz rate. They're amazing. Why should you care about this? Well, here's the last point that I wanted to make:

* Audio codec ICs have an internal latency that is usually 10-20 samples long, which is the audio processing pipeline for the decimation filter in the sigma-delta converter

So, if you're looking to minimize your signal acquisition time to get something close to 1-sample latency, you're going to have to switch away from audio codecs and use a dedicated A/D chip (one that avoids the typical sigma-delta latency). This is probably within your skill set, but it is another step away from the existing audio processing infrastructure offered by Teensy.

So, in summary, you'll need to base your system around a traditional A/D chip. And, given your desire for 1-sample processing, you'll probably need to abandon the Teensy audio library and the Teensy USB audio stuff. Really, then, it sounds like you're building from scratch. That's super cool...but, man, that's a lot of work in there to be done. And, if it were me, probably everything would be do-able with enough time...except the USB stuff. That USB stuff is a black art.

Chip

I'm thinking to use a standard audio CODEC like Cirrus and a few others make. (probably Cirrus) I'm talking about the digital anti-alias filter in the chip itself, not one that I would write, because it does directly affect the analog-to-analog latency, as you said. Reading the datasheets for a bunch of Cirrus chips tells me that the group delay for "Double Speed" is about half of what it is for "Single Speed", one way. Also note that "Double Speed" corresponds to higher sample rates, and thus less time per sample in addition to the group delay being fewer samples.

So at 96kHz, which would be "Double Speed", I think it works to use one of their sigma-delta chips. Most of my latency is going to be in the CODEC, but there's still enough left for the 3-sample DSP pipeline - | transfer ADC to DSP | process in DSP | transfer DSP to DAC | - and some wiggle room to fine-tune via an explicit handful-of-samples delay block.

With a CS42448, for example, running at 96kHz and without that explicit delay block, it comes out to 177us minimum latency, analog-to-analog:
  • Each sample is 1/96kHz = 10.4us
  • The ADC group delay is 9 samples for Double Speed (page 13), or 94us
  • The DAC group delay is 5 samples for Double Speed (page 17), or 52us
  • The 3-sample pipeline adds another 31us
If my target is *exactly* 200us, then I have 23us to add, so the explicit delay needs to be 2 samples. But that 200us is just a rough estimate and not exact. Like everything else in the chain, the final tuning comes last, when everything else is in place and working, and is based on measurement, not prediction.
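
The arithmetic above can be checked with a few lines. This is only a sketch; the 9- and 5-sample group delays are the datasheet figures quoted in the list, and the helper names are hypothetical.

```cpp
#include <cmath>

// Convert a delay in samples to microseconds at the given sample rate.
double samples_to_us(double samples, double rate_hz) {
    return samples * 1.0e6 / rate_hz;
}

// Whole samples of explicit delay that bring the total closest to the
// target, given the fixed delay already present in the chain.
int padding_samples(double target_us, int fixed_samples, double rate_hz) {
    const double remaining_us = target_us - samples_to_us(fixed_samples, rate_hz);
    return static_cast<int>(std::lround(remaining_us * rate_hz / 1.0e6));
}
```

With 9 (ADC) + 5 (DAC) + 3 (pipeline) = 17 fixed samples at 96 kHz, `samples_to_us(17, 96000.0)` is about 177.1 us and `padding_samples(200.0, 17, 96000.0)` gives 2, for a total of 19 samples, or roughly 198 us.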

Maybe a good first question is whether this is even theoretically possible?

200us is the period of a 5 kHz waveform. Only 2 of the ~10 octaves humans can hear are above 5 kHz. But the effective delay of any realistic filter is going to be many times the period of whatever frequencies it filters or whose gain it changes.

My primary concern is creating a comb filter in the (probably 1st order) acoustic crossover range. Most of the time, comb filters are not as bad as people freak out about - they happen all the time in the natural environment, due to multipath interference, and we're used to it - but they do create some information about that environment, and I don't want to do that here. I want it to sound like the electronics and the barrier are both not there at all.

It's also good to compare to physical acoustics, as you did, but my comparison was 1/4-wave, which comes out to a bit over 1kHz. (1/4-wave comes up a lot in acoustics, as a common threshold of what can be ignored) Setting the lower threshold at 1k instead of 5k leaves some more audible range to be concerned about.

My point is you may be imposing needlessly high requirements that only make the project far more difficult than it really needs to be... sort of like requiring switching the power to an incandescent light bulb in only microseconds, when the filament takes the better part of a second to change temperature enough to emit light. Audio is about the same, relatively low frequencies that propagate through air at a slow speed of about 1 foot per millisecond. Human perception of sound is also remarkably slow.

This doesn't mean you should write slow or inefficient code. But like any engineering, you should start first by considering the physical nature and fundamental requirements. There's little point to imposing tighter timing requirements than the nature of the problem to be solved really needs.

That is entirely possible. If the acoustic crossover ends up being around 500Hz, then it's probably fine. If it's 2k, then it might be a problem. This is still early in the project, so I don't actually know where it is yet. It's difficult enough that if it does turn out to be a problem, it could become a major delay in the moment, so I want to spend that time and effort now, when it isn't as critical.

And there's also the challenge of solving these edge-of-possibility type problems. :cool: Even if it turns out to be a non-issue, I've still learned something.
 
Wow, it's been a while! Lots of things happening in the meantime (and still not completely done yet), but I'm finally back on this now. And I'm having some trouble with what seems like it should be dead simple, just to get an idea of how things are supposed to work before I tear them apart.

There's no pre-fab option for Audio+Serial. Only Audio, or Serial, or Audio+MIDI+Serial. So I'm trying to make that, with all else being the same so far. As I understand this thread, it *should* work to just copy/paste and mash together the blocks for each module in usb_desc.h, and the rest of the code should "just figure it out" based on what ends up being #defined or not? Fix up some indices and counts, but otherwise unchanged?

That *almost* works, but the two problems I'm having are:
  1. I can't seem to create a new USB_AUDIO_SERIAL flag for my custom configuration, and so it reverts to USB_SERIAL instead, with no audio at all.
  2. Commandeering the existing USB_AUDIO flag to replace the SEREMU block with the CDC one (or remove MIDI from *that* one), leaves it *almost* working. print() works just fine, but read() doesn't work correctly. This also prevents it from accepting a new program automatically. Fortunately, the hardware button still works for that.
Here's my commandeered configuration, from usb_desc.h (comment out the relevant original block, of course, to avoid a conflict):
C:
// #elif defined(USB_AUDIO)
#elif defined(USB_MIDI_AUDIO_SERIAL)
  #define VENDOR_ID        0x16C0
  #define PRODUCT_ID        0x0484
  #define MANUFACTURER_NAME    {'T','e','e','n','s','y','d','u','i','n','o'}
  #define MANUFACTURER_NAME_LEN    11
  #define PRODUCT_NAME        {'T','e','e','n','s','y',' ','A','u','d','i','o','/','S','e','r','i','a','l'}
  #define PRODUCT_NAME_LEN    19
  #define EP0_SIZE        64
  #define NUM_ENDPOINTS         5
  #define NUM_INTERFACE        5
  #define CDC_IAD_DESCRIPTOR    1
  #define CDC_STATUS_INTERFACE    0
  #define CDC_DATA_INTERFACE    1    // Serial
  #define CDC_ACM_ENDPOINT    2
  #define CDC_RX_ENDPOINT       3
  #define CDC_TX_ENDPOINT       3
  #define CDC_ACM_SIZE          16
  #define CDC_RX_SIZE_480       512
  #define CDC_TX_SIZE_480       512
  #define CDC_RX_SIZE_12        64
  #define CDC_TX_SIZE_12        64
  #define AUDIO_INTERFACE    2    // Audio (uses 3 consecutive interfaces)
  #define AUDIO_TX_ENDPOINT     4
  #define AUDIO_TX_SIZE         180
  #define AUDIO_RX_ENDPOINT     4
  #define AUDIO_RX_SIZE         180
  #define AUDIO_SYNC_ENDPOINT    5
  #define ENDPOINT2_CONFIG    ENDPOINT_RECEIVE_UNUSED + ENDPOINT_TRANSMIT_INTERRUPT
  #define ENDPOINT3_CONFIG    ENDPOINT_RECEIVE_BULK + ENDPOINT_TRANSMIT_BULK
  #define ENDPOINT4_CONFIG    ENDPOINT_RECEIVE_ISOCHRONOUS + ENDPOINT_TRANSMIT_ISOCHRONOUS
  #define ENDPOINT5_CONFIG    ENDPOINT_RECEIVE_UNUSED + ENDPOINT_TRANSMIT_ISOCHRONOUS
And my testing code in main.cpp:
C:
void loop()
{
    // put your main code here, to run repeatedly:

    Serial.println("Hello USB!");
    while(Serial.available())
    {
        int rx_data = Serial.read();
        Serial.write(rx_data);
        Serial1.write(rx_data);
    }

    Serial1.println("Hello UART!");
    while(Serial1.available())
    {
        int rx_data = Serial1.read();
        Serial1.write(rx_data);
        Serial.write(rx_data);
    }

    digitalWrite(LED_BUILTIN, HIGH);
    delay(500);
    digitalWrite(LED_BUILTIN, LOW);
    delay(500);
}
The unchanged framework does work correctly, so it must be something I did, and the testing code works.
Any thoughts?
 