Thread: Teensy 4.0 Multiplexing 32 audio inputs

  1. #1
    Senior Member ghostintranslation
    Join Date
    Oct 2019
    Location
    Halifax, NS, Canada
    Posts
    100

    Teensy 4.0 Multiplexing 32 audio inputs

    Hi,

    I made a basic circuit with 4 multiplexers on A0, A1, A2 and A3, and I coded an AudioStream class that samples them at audio frequency so they can be plugged into any other audio object. 8 inputs can be sampled at 44.1kHz each, 16 at 22.05kHz, and up to 32 at 11.025kHz.
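
    As a rough check of these figures, here is a minimal host-side C++ sketch of the arithmetic. It assumes one ADC conversion per timer tick at 44100 * 8 = 352800 conversions per second, shared round-robin between all inputs. perInputRateHz is a hypothetical helper, not part of the linked code.

    ```cpp
    #include <cassert>

    // Assumption: total conversion budget when one pin is read per timer tick.
    constexpr double kConversionsPerSecond = 44100.0 * 8.0;  // 352800

    // Hypothetical helper: sample rate seen by each input when the conversion
    // budget is shared equally between `inputsCount` multiplexed inputs.
    double perInputRateHz(double conversionsPerSecond, int inputsCount) {
        assert(inputsCount > 0);
        return conversionsPerSecond / inputsCount;
    }
    ```

    With this model, 8, 16 and 32 inputs come out at 44100 Hz, 22050 Hz and 11025 Hz per input, which matches the rates quoted above.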

    This is part of a larger project for a Teensy 4.0 Eurorack modules base (actually v2 of it), for which it is useful to read signals up to 20kHz for FM for example.

    I tried many different things, and this is the best I could do so far. I used a PIT timer and the ADC lib, but I couldn't take advantage of DMA: nothing I tried worked, probably because I'm not used to working with it.

    The code reads the 4 analog pins, advances the multiplexers to the next channel, reads again, advances again, and so on, until it has 128 samples (or 64 or 32 when there are more than 8 inputs) for each input. The bottleneck is reading multiple pins "at the same time", but I'm almost sure someone will be able to add DMA or otherwise improve it, and possibly get 44.1kHz on all 32 inputs.

    Filtering can be set with a method, so that at slower sampling frequencies it smooths the signals better.
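
    A minimal host-side sketch of this kind of smoothing, assuming a one-pole low-pass of the same form as the accumulator update in the interrupt code posted later in this thread. The names OnePoleLowPass and adc12ToAudio are illustrative only; the actual setter in the linked class is not shown here.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Map an unsigned 12-bit ADC reading (0..4095) onto the signed 16-bit
    // audio range, as in the thread's "result * 16 - 32768" expression.
    inline float adc12ToAudio(uint16_t raw) {
        return raw * 16.0f - 32768.0f;
    }

    // Illustrative one-pole low-pass: y = c * x + (1 - c) * y_previous.
    // A smaller coefficient gives heavier smoothing; 1.0 passes x through.
    struct OnePoleLowPass {
        float coeff;
        float state;

        explicit OnePoleLowPass(float c) : coeff(c), state(0.0f) {
            assert(c > 0.0f && c <= 1.0f);
        }

        float process(float x) {
            state = coeff * x + (1.0f - coeff) * state;
            return state;
        }
    };
    ```

    Feeding a constant input into a filter with a coefficient of 0.5 moves the output halfway towards that input on every call, so lower coefficients suit the slower sampling rates where more smoothing is wanted.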

    Here is a quick demo video:
    https://www.youtube.com/watch?v=uRedUWVYY1Y

    And here is the code:
    https://gist.github.com/ghostintrans...d0f74df730480e

    I'd be happy to hear from anyone who can increase the sampling frequency when more than 1 pin is read (i.e. more than 8 inputs).

  2. #2
    Junior Member
    Join Date
    Jan 2023
    Location
    Perth, Western Australia
    Posts
    5
    It is very poorly documented, but many Teensy boards, including the 4.0, have two ADCs which can be used simultaneously.
    If you choose the appropriate analog input pins, you can read two pins at the same time using analogSynchronizedRead. Alternatively, you can schedule adc0 and adc1 to sample independently using startSingleRead and readSingle. This thread has some relevant discussion on the ADC library.

    Some analog pins are only connected to one of the ADCs, so you need to know which combinations of pins will work.
    There is some discussion about the Teensy 4.0 pins on this thread.
    Member @KurtE has kindly provided a reference table for Teensy 4. If you check the "Analog" column of sheet "T4" you can see that pins A0-A9 are connected to both ADCs, pins A10-A11 connect to the first ADC, and pins A12-A13 connect to the second ADC.

    Since you have used A0-A3 you should be safe to read any combinations of pins simultaneously, which should give you roughly twice the sampling throughput in your timer callback.
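
    The pin rules above can be sketched as a small host-side lookup. This is only a model of the table as described in this post (A0-A9 on both ADCs, A10-A11 on the first, A12-A13 on the second); the function names are hypothetical, and the real reference sheet should be checked before relying on it.

    ```cpp
    #include <cassert>

    // Bit flags for which ADC can reach a given analog pin.
    enum AdcMask : unsigned { kAdcFirst = 1u, kAdcSecond = 2u };

    // Hypothetical lookup, indexed by analog pin number (0 for A0, 1 for A1, ...).
    // Returns 0 for pins outside the A0..A13 range described in the post.
    unsigned adcMaskForAnalogPin(int analogIndex) {
        assert(analogIndex >= 0);
        if (analogIndex <= 9) return kAdcFirst | kAdcSecond;
        if (analogIndex <= 11) return kAdcFirst;
        if (analogIndex <= 13) return kAdcSecond;
        return 0u;
    }

    // True when the first pin is reachable by the first ADC and the second
    // pin by the second ADC, i.e. the pair suits a synchronized read.
    bool canReadSynchronized(int pinForFirstAdc, int pinForSecondAdc) {
        return (adcMaskForAnalogPin(pinForFirstAdc) & kAdcFirst) != 0u &&
               (adcMaskForAnalogPin(pinForSecondAdc) & kAdcSecond) != 0u;
    }
    ```

    Under this model any pair taken from A0-A3 passes the check in either order, whereas A12 on the first ADC paired with A10 on the second does not.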

  3. #3
    Senior Member ghostintranslation
    Join Date
    Oct 2019
    Location
    Halifax, NS, Canada
    Posts
    100
    Thanks for the suggestions,

    My understanding was that the Teensy 4.0 does have 2 ADCs, but that only ADC0 was available on pins A0 to A9, so that's one thing I've learned.

    I gave it a try by setting up adc1 like adc0 with the fastest settings, using adc0 on A0 and A2 and adc1 on A1 and A3, and calling adc->adc0->analogRead(A0) and adc->adc1->analogRead(A1), so that I can have 16 inputs across both ADCs. But when I do that it's still not fast enough.

    Then I tried with adc->analogSynchronizedRead(A0,A1) and that is actually faster. I had looked at that function's source code before and thought it would not be faster and I probably didn't try at the time... So now with this one I can have 16 inputs at 44.1kHz!

    But if I try to use it for 32 inputs, it's too slow and I still need to divide the sampling frequency by 4.

    This is how I set up adc0 and adc1:
    Code:
    adc = new ADC();
      adc->adc0->setAveraging(1);   // set number of averages
      adc->adc0->setResolution(12); // set bits of resolution
      adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED);
      adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED);
      adc->adc0->startSingleRead(A0);
      adc->adc1->setAveraging(1);   // set number of averages
      adc->adc1->setResolution(12); // set bits of resolution
      adc->adc1->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED);
      adc->adc1->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED);
      adc->adc1->startSingleRead(A1);
    This is what the code looks like in the interrupt with analogSynchronizedRead:

    Code:
      if (headQueueTempCount[muxIndex] < AUDIO_BLOCK_SAMPLES && headQueueTempCount[muxIndex+8] < AUDIO_BLOCK_SAMPLES ) {
        ADC::Sync_result result = adc->analogSynchronizedRead(A0,A1);
    
        for (int i = 0; i < downSamplingFactor; i++) {
          accumulator[muxIndex] = (lowPassCoeff[muxIndex] * (result.result_adc0* 16 - 32768)) + (1.0f - lowPassCoeff[muxIndex]) * accumulator[muxIndex];
          headQueueTemp[muxIndex][headQueueTempCount[muxIndex]] = accumulator[muxIndex];
          headQueueTempCount[muxIndex]++;
          
          accumulator[muxIndex+8] = (lowPassCoeff[muxIndex+8] * (result.result_adc1* 16 - 32768)) + (1.0f - lowPassCoeff[muxIndex+8]) * accumulator[muxIndex+8];
          headQueueTemp[muxIndex+8][headQueueTempCount[muxIndex+8]] = accumulator[muxIndex+8];
          headQueueTempCount[muxIndex+8]++;
        }
      }
    
      if (inputsCount > 16 && headQueueTempCount[muxIndex+16] < AUDIO_BLOCK_SAMPLES && headQueueTempCount[muxIndex+24] < AUDIO_BLOCK_SAMPLES ) {
        ADC::Sync_result result2 = adc->analogSynchronizedRead(A2,A3);
        
        for (int i = 0; i < downSamplingFactor; i++) {
          accumulator[muxIndex+16] = (lowPassCoeff[muxIndex+16] * (result2.result_adc0* 16 - 32768)) + (1.0f - lowPassCoeff[muxIndex+16]) * accumulator[muxIndex+16];
          headQueueTemp[muxIndex+16][headQueueTempCount[muxIndex+16]] = accumulator[muxIndex+16];
          headQueueTempCount[muxIndex+16]++;
          
          accumulator[muxIndex+24] = (lowPassCoeff[muxIndex+24] * (result2.result_adc1* 16 - 32768)) + (1.0f - lowPassCoeff[muxIndex+24]) * accumulator[muxIndex+24];
          headQueueTemp[muxIndex+24][headQueueTempCount[muxIndex+24]] = accumulator[muxIndex+24];
          headQueueTempCount[muxIndex+24]++;
        }
      }
    Regarding the asynchronous way, I can't actually do that because it needs to be timed precisely at 44100 * 8 = 352800Hz, or 4 pins to be read every 2.83us.

  4. #4
    Junior Member
    Join Date
    Jan 2023
    Location
    Perth, Western Australia
    Posts
    5
    The problem is that functions like analogRead and even analogSynchronizedRead will block the CPU in a loop while waiting for conversion to complete. You probably want to avoid doing this in a high-frequency timer interrupt.
    With the current refactored interrupt, each analogSynchronizedRead takes one conversion time, probably at least 0.7us, to complete. Two of these calls give you 1.4us of blocking every 2.83us. In other words, roughly 50% of the CPU time is spent waiting for ADC conversions in the interrupt handler.
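
    A minimal host-side sketch of that estimate. The 0.7us conversion time is the guess from this post rather than a measured value, and both helper names are made up for illustration.

    ```cpp
    #include <cassert>

    // Period of the sampling timer in microseconds, for a given per-input
    // audio rate and number of external multiplexer channels to scan.
    double tickPeriodUs(double audioRateHz, int muxChannels) {
        assert(audioRateHz > 0.0 && muxChannels > 0);
        return 1e6 / (audioRateHz * muxChannels);
    }

    // Fraction of each timer period spent busy-waiting in blocking ADC reads.
    double blockingFraction(int readsPerTick, double conversionUs, double periodUs) {
        assert(periodUs > 0.0);
        return readsPerTick * conversionUs / periodUs;
    }
    ```

    Calling tickPeriodUs(44100.0, 8) gives a period of about 2.83us, and blockingFraction(2, 0.7, period) then comes out just under one half.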

    Instead you could try using startSynchronizedSingleRead in the timer interrupt to start the conversion.
    Then in the ADC interrupt handler get the result via readSynchronizedSingle. Then start the next conversion, and handle its result, etc.
    Given that you always start adc0 and adc1 together, they should finish at the same time, but you probably still want to wait for both interrupts before reading the values. For example, use one ISR for both ADC0 and ADC1 and process the results on every second interrupt: zero a counter in the timer handler and increment it on each ADC interrupt. You can also use this counter to decide what to do next. If the count is 2, read and save the first pair of samples and request the next pair. If the count is 4, read and save the second pair of samples and update the analog input multiplexer.

    It's a bit messier to handle sampling in this asynchronous way, but if you can get it right you should have almost twice the CPU time left for other background processing, such as any Audio Library work. There will be some extra overhead for the additional interrupts, but I'll bet it's small compared to the savings you can achieve.
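
    The counting scheme described above can be modelled on a desktop as a small state machine. This is only a sketch of the bookkeeping: the type and action names are hypothetical, and on the Teensy the returned actions would map onto calls such as readSynchronizedSingle and startSynchronizedSingleRead.

    ```cpp
    #include <cassert>

    // What the shared ADC interrupt handler should do after each completion.
    enum class Action {
        None,                        // still waiting for the other ADC
        SaveFirstPairAndStartSecond, // count == 2
        SaveSecondPairAndAdvanceMux  // count == 4
    };

    struct SamplingState {
        int adcInterrupts = 0;  // ADC completions since the last timer tick
        int muxIndex = 0;       // external multiplexer channel, 0..7

        // Timer handler: reset the counter (and, on hardware, start the first pair).
        void onTimerTick() { adcInterrupts = 0; }

        // Shared handler for both ADC interrupts.
        Action onAdcInterrupt() {
            assert(adcInterrupts < 4);  // more than 4 per tick means an overrun
            ++adcInterrupts;
            if (adcInterrupts == 2) return Action::SaveFirstPairAndStartSecond;
            if (adcInterrupts == 4) {
                muxIndex = (muxIndex + 1) % 8;
                return Action::SaveSecondPairAndAdvanceMux;
            }
            return Action::None;
        }
    };
    ```

    Keeping the decision logic in one place like this makes it possible to test the interrupt ordering without the hardware attached.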

  5. #5
    Senior Member ghostintranslation
    Join Date
    Oct 2019
    Location
    Halifax, NS, Canada
    Posts
    100
    I tried the async approach as you suggested, but I couldn't get it right.

    I did try again with DMA, though, together with the timer provided by the ADC lib, and I get better results than before: I can sample the 32 inputs at 22.05kHz, so the sampling frequency is only divided by 2 now.

    Here is some simple procedural code if you want to try it; I haven't yet converted it into my Input class:

    Code:
    #include <ADC.h>
    #include "DMAChannel.h"
    
    DMAChannel dmaChannel1;
    DMAChannel dmaChannel2;
    ADC* adc;
    uint16_t adc1PinIndex = 0;
    uint16_t adc2PinIndex = 1;
    const uint16_t buffSize = 128;
    const uint16_t inputsCount = 32;
    uint16_t buffers[inputsCount][buffSize * 2] {{0}};
    uint16_t bufferCount[inputsCount] = {0};
    volatile uint16_t val1 = 0; // written by dmaChannel1
    volatile uint16_t val2 = 0; // written by dmaChannel2
    uint16_t isr1Count = 0;
    uint16_t isr2Count = 0;
    uint16_t muxIndex = 0;
    uint8_t pinToChannel[4] = {
      7, // 14/A0  AD_B1_02
      8,  // 15/A1  AD_B1_03
      12, // 16/A2  AD_B1_07
      11, // 17/A3  AD_B1_06
    };
    void setup() {
      Serial.flush();
      Serial.begin(9600);
      while (!Serial && millis() < 5000) ;
    
      pinMode(A0, INPUT);
      pinMode(A1, INPUT);
      pinMode(A2, INPUT);
      pinMode(A3, INPUT);
      pinMode(2, OUTPUT);
      pinMode(3, OUTPUT);
      pinMode(4, OUTPUT);
    
      // Reset multiplexer to channel 0
      digitalWriteFast(2, LOW);
      digitalWriteFast(3, LOW);
      digitalWriteFast(4, LOW);
    
      dmaChannel1.source((volatile uint16_t &)(ADC1_R0));
      dmaChannel1.destination((volatile uint16_t &)val1);
      dmaChannel1.transferSize(2);
      dmaChannel1.transferCount(1);
      dmaChannel1.interruptAtCompletion();
      dmaChannel1.attachInterrupt(isr1);
      dmaChannel1.triggerAtHardwareEvent(DMAMUX_SOURCE_ADC1);
      dmaChannel1.enable();
    
      dmaChannel2.source((volatile uint16_t &)(ADC2_R0));
      dmaChannel2.destination((volatile uint16_t &)val2);
      dmaChannel2.transferSize(2);
      dmaChannel2.transferCount(1);
      dmaChannel2.interruptAtCompletion();
      dmaChannel2.attachInterrupt(isr2);
      dmaChannel2.triggerAtHardwareEvent(DMAMUX_SOURCE_ADC2);
      dmaChannel2.enable();
    
      adc = new ADC();
      adc->adc0->setAveraging(1);   // set number of averages
      adc->adc0->setResolution(12); // set bits of resolution
      adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED);
      adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED);
      adc->adc0->enableDMA();
      adc->adc0->startSingleRead(A0);
    
      adc->adc1->setAveraging(1);   // set number of averages
      adc->adc1->setResolution(12); // set bits of resolution
      adc->adc1->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED);
      adc->adc1->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED);
      adc->adc1->enableDMA();
      adc->adc1->startSingleRead(A1);
    
      // Should be *16 to get 128 samples in 2902us with 32 inputs
      // but then it actually takes 3780us, so max is *8 to get 64 samples under 2902us
      adc->adc0->startTimer(AUDIO_SAMPLE_RATE * 16);
      adc->adc1->startTimer(AUDIO_SAMPLE_RATE * 16);
    }
    
    void loop() {
    }
    
    
    void isr1() {
      isr1Count++;
    
      if (isr1Count <= 2) {
        uint16_t inputIndex = muxIndex + 8 * adc1PinIndex;
    
        buffers[inputIndex][bufferCount[inputIndex]] = val1;
        bufferCount[inputIndex]++;
    
        if (adc1PinIndex == 0) {
          // Switching ADC1 mux to pin A2
          adc1PinIndex = 2;
          ADC1_HC0 = pinToChannel[adc1PinIndex] | ADC_HC_AIEN;
        }
      }
      processBothIsr();
      dmaChannel1.clearInterrupt();
      asm("DSB");
    }
    
    void isr2() {
      isr2Count++;
    
      if (isr2Count <= 2) {
        uint16_t inputIndex = muxIndex + 8 * adc2PinIndex;
    
        buffers[inputIndex][bufferCount[inputIndex]] = val2;
        bufferCount[inputIndex]++;
    
        if (adc2PinIndex == 1) {
          // Switching ADC2 mux to pin A3
          adc2PinIndex = 3;
          ADC2_HC0 = pinToChannel[adc2PinIndex] | ADC_HC_AIEN;
        }
      }
    
      processBothIsr();
      dmaChannel2.clearInterrupt();
      asm("DSB");
    }
    
    elapsedMicros timer;
    
    void processBothIsr() {
      if (isr1Count < 2 || isr2Count < 2) {
        return;
      }
    
      // Switching ADC1 mux to pin A0
      adc1PinIndex = 0;
      ADC1_HC0 = pinToChannel[adc1PinIndex] | ADC_HC_AIEN;
      
      // Switching ADC2 mux to pin A1
      adc2PinIndex = 1;
      ADC2_HC0 = pinToChannel[adc2PinIndex] | ADC_HC_AIEN;
      
      isr1Count = 0;
      isr2Count = 0;
    
      muxIndex++;
      muxIndex = muxIndex % 8;
    
      digitalWriteFast(2, muxIndex & 1);
      digitalWriteFast(3, muxIndex & 2);
      digitalWriteFast(4, muxIndex & 4);
    
      if (bufferCount[inputsCount - 1] >= buffSize) {
        // UNCOMMENT THIS TO LOOK AT THE SIGNALS
        //    for (int i = 0; i < buffSize; i++) {
        //      for (int j = 0; j < inputsCount; j++) {
        //        if (j % 8 == 0) {
        //          Serial.print(buffers[j][i]);
        //          Serial.print(",");
        //        }
        //      }
        //      Serial.println("");
        //    }
    
        // UNCOMMENT THIS TO LOOK AT THE TIMING
        Serial.println(timer);
        timer = 0;
    
        Serial.flush();
    
        for (int i = 0; i < inputsCount; i++) {
          bufferCount[i] = 0;
        }
      }
    }
    You can try this without the hardware: the code prints over serial the time, in microseconds, that it takes to collect 128 samples.
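
    For anyone following the indexing in the sketch above, here is a small host-side model of how a multiplexer channel and an ADC pin map to a buffer index and to the three select lines. The helper names are hypothetical; the formulas are the ones used in isr1, isr2 and processBothIsr.

    ```cpp
    #include <cassert>

    // Levels for the three multiplexer select pins (pins 2, 3 and 4 in the sketch).
    struct MuxSelect {
        bool s0;
        bool s1;
        bool s2;
    };

    // Decode a multiplexer channel (0..7) into its three select-line levels.
    MuxSelect selectLines(int muxIndex) {
        assert(muxIndex >= 0 && muxIndex < 8);
        return MuxSelect{(muxIndex & 1) != 0, (muxIndex & 2) != 0, (muxIndex & 4) != 0};
    }

    // Buffer index for a given multiplexer channel and ADC pin (0 for A0 .. 3 for A3).
    int inputIndex(int muxIndex, int adcPinIndex) {
        assert(muxIndex >= 0 && muxIndex < 8);
        assert(adcPinIndex >= 0 && adcPinIndex < 4);
        return muxIndex + 8 * adcPinIndex;
    }
    ```

    So inputs 0 to 7 come from the multiplexer on A0, 8 to 15 from A1, 16 to 23 from A2, and 24 to 31 from A3.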

    It looks like the bottleneck is the switching of the ADC muxes in isr1 and isr2:
    ADC1_HC0 = pinToChannel[adc1PinIndex] | ADC_HC_AIEN;
    and
    ADC2_HC0 = pinToChannel[adc2PinIndex] | ADC_HC_AIEN;

    These instructions do not seem to affect the timing in processBothIsr though, probably because that one runs less frequently.

    Is there a faster way to switch the muxes? Or any other ideas?

  6. #6
    Senior Member ghostintranslation
    Join Date
    Oct 2019
    Location
    Halifax, NS, Canada
    Posts
    100
    Strangely, if I run the ADCs like this:
    Code:
    adc->adc0->startContinuous(A0);
    adc->adc1->startContinuous(A1);
    instead of:
    Code:
    adc->adc0->startTimer(AUDIO_SAMPLE_RATE * 16);
    adc->adc1->startTimer(AUDIO_SAMPLE_RATE * 16);
    it manages to get 128 samples in 2403us, but with the timer the best I get is 3780us unless I comment out the ADC mux switching instructions.

    Any idea how to get it to run at AUDIO_SAMPLE_RATE * 16? It looks like the hardware is capable at least.
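
    To put those timings in context, here is a minimal host-side sketch of the arithmetic, assuming the printed values are microseconds (the sketch above times with elapsedMicros) and a target rate of 44100 Hz per input. Both helper names are hypothetical.

    ```cpp
    #include <cassert>

    // Time budget, in microseconds, to collect `samples` samples at `rateHz`.
    double blockTimeUs(int samples, double rateHz) {
        assert(samples > 0 && rateHz > 0.0);
        return samples * 1e6 / rateHz;
    }

    // Per-input sample rate actually achieved when `samples` samples
    // took `measuredUs` microseconds to collect.
    double effectiveRateHz(int samples, double measuredUs) {
        assert(samples > 0 && measuredUs > 0.0);
        return samples * 1e6 / measuredUs;
    }
    ```

    The budget for 128 samples at 44100 Hz is about 2902us. A measured 3780us therefore corresponds to roughly 33.9kHz per input, while 2403us corresponds to roughly 53.3kHz, which is why the continuous mode looks fast enough and the timer mode does not.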
