Fast Convolution Filtering with Teensy 4.0 and audio board

Status
Not open for further replies.
Convolution with an audio IR

Where can I find the "filter_convolution" routine and can it be used to convolve an audio IR ?
I need to convolve an audio signal (continuous) with an IR file (wav), using a Teensy 4.0 + Audio board.

Thanks, Luigi

Hi Frank: Good to see you have tried the convolution filter on T4. You might recall from another, older thread that I had "wrapped" your convolution routine into an Audio library object (filter_convolution). I just received a couple of T4s and have updated Arduino IDE/Teensyduino/Visual Micro to the new versions that support T4. Once I added my filter_convolution files to the Audio library folder in the new Arduino 1.8.9 folder, I was able to compile my test program with no errors, so it is looking good. Its great that you don't have to change the arm_math.h file to use CMSIS4/5 like you used to have to do with the T3.5/6.
I don't have an audio adapter board wired up to the T4 yet, so I can't actually test that the code is working, but I'm optimistic.
Cheers
 
I made minimal changes to use USB input and spdif3 output and I get lots of static (setup works fine without convolution). Tried two difference versions (first was 09/11/2019). Any suggestions or perhaps someone willing to review/test the code?
Code:
/*
 * (c) DD4WH 15/05/2020
 * 
 * Real Time PARTITIONED BLOCK CONVOLUTION FILTERING (STEREO)
 * 
 * thanks a lot to Brian Millier and Warren Pratt for your help!
 * 
 * using a guitar cabinet impulse response with up to about 20000 coefficients per channel
 * 
 * uses Teensy 4.1  and Teensy audio shield rev D, uses PSRAM chip(s) soldered to underside of T4.1
 * 
 * inspired by and uses code from wdsp library by Warren Pratt
 * https://github.com/g0orx/wdsp/blob/master/firmin.c
 * 
*********************************************************************************

   GNU GPL LICENSE v3

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>

 *********************************************************************************
 */

#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <arm_math.h>
#include <arm_const_structs.h> // in the Teensy 4.0 audio library, the ARM CMSIS DSP lib is already a newer version
#include <utility/imxrt_hw.h> // necessary for setting the sample rate, thanks FrankB !
//#include <T4_PowerButton.h> // for flexRamInfo

//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// USER DEFINES

// Choose only one of these impulse responses !

//#define IR1  // 512 taps // MG impulse response from bmillier github @44.1ksps
//#define IR2  // 4096 taps // impulse response @44.1ksps
//#define IR3  // 7552 taps // impulse response @44.1ksps
//#define IR4    // 17920 taps // impulse response 400ms @44.1ksps
//#define IR5    // 21632 taps // impulse response 490ms @44.1ksps
//#define IR6 // 5760 taps, 18.72% load vs. 15.00%
//#define IR7 // 22016 taps, 50.62% load, 93.48% RAM1, 32064 bytes free
//#define IR8 // 25552 taps, too much !
// about 25000 taps is MAXIMUM --> about 0.5 seconds
//#define LPMINPHASE512 // 512 taps minimum phase 2.7kHz lowpass filter
//#define LPMINPHASE1024 // 1024 taps minimum phase 2.7kHz lowpass filter
#define LPMINPHASE2048PASSTHRU // 2048 taps minimum phase 19.0kHz lowpass filter
//#define LPMINPHASE4096 // 4096 taps minimum phase 2.7kHz lowpass filter
const float32_t PROGMEM audio_gain = 2.5; // has to be adjusted from 1.0 to 10.0 depending on the filter gain / impulse resonse gain
///////////////////////////////////////////////////////////////////////////////////////////////////////////////

#if defined(IR1)
#include "impulse_response_1.h"
const int nc = 512; // number of taps for the FIR filter
#elif defined(IR2)
#include "impulse_response_2.h"
const int nc = 4096; // number of taps for the FIR filter
#elif defined(IR3)
#include "impulse_response_3.h"
const int nc = 7552; // number of taps for the FIR filter
#elif defined(IR5)
#include "impulse_response_5.h"
const int nc = 21632; // number of taps for the FIR filter
#elif defined(IR6)
#include "impulse_response_6.h"
const int nc = 5760; // number of taps for the FIR filter, 
#elif defined(IR7)
#include "impulse_response_7.h"
const int nc = 22016; // number of taps for the FIR filter, 
#elif defined(IR8)
#include "impulse_response_8.h"
const int nc = 25552; // number of taps for the FIR filter, 
#elif defined(LPMINPHASE512)
#include "lp_minphase_512.h"
const int nc = 512;
#elif defined(LPMINPHASE1024)
#include "lp_minphase_1024.h"
const int nc = 1024;
#elif defined(LPMINPHASE2048PASSTHRU)
#include "lp_minphase_2048passthru.h"
const int nc = 2048;
#elif defined(LPMINPHASE4096)
#include "lp_minphase_4096.h"
const int nc = 4096;
#else
#include "impulse_response_4.h"
const int nc = 17920; // number of taps for the FIR filter
#endif

extern "C" uint32_t set_arm_clock(uint32_t frequency);

//#define LATENCY_TEST
const double PROGMEM FHiCut = 2500.0;
const double PROGMEM FLoCut = -FHiCut;
// define your sample rate
const double PROGMEM SAMPLE_RATE = 44100;  
// the latency of the filter is meant to be the same regardless of the number of taps for the filter
// partition size of 128 translates to a latency of 128/sample rate, ie. to 2.9msec with 44.1ksps

// latency can even be reduced by setting partitionsize to 64
// however, this only works, if you set AUDIO_BLOCK_SAMPLES to 64 in AudioStream.h
const int PROGMEM partitionsize = 128; 

//#define DEBUG
#define FOURPI  (2.0 * TWO_PI)
#define SIXPI   (3.0 * TWO_PI)
#define BUFFER_SIZE partitionsize
int32_t sum;
int idx_t = 0;
int16_t *sp_L;
int16_t *sp_R;
uint8_t PROGMEM FIR_filter_window = 1;
const uint32_t PROGMEM FFT_L = 2 * partitionsize; 
float32_t mean = 1;
uint8_t first_block = 1; 
const uint32_t PROGMEM FFT_length = FFT_L;
const int PROGMEM nfor = nc / partitionsize; // number of partition blocks --> nfor = nc / partitionsize
//float DMAMEM cplxcoeffs[nc * 2]; // this holds the initial complex coefficients for the filter BEFORE partitioning
float32_t DMAMEM maskgen[FFT_L * 2];
//float32_t DMAMEM fmask[nfor][FFT_L * 2]; // 
float32_t fmask[nfor][FFT_L * 2]; // 
float32_t DMAMEM fftin[FFT_L * 2];
float32_t accum[FFT_L * 2];
float fftout[nfor][FFT_L * 2]; // 

int buffidx = 0;
int k = 0;
//int idxmask = nfor - 1;

uint32_t all_samples_counter = 0;
uint8_t no_more_latency_test = 0;

const uint32_t N_B = FFT_L / 2 / BUFFER_SIZE;
uint32_t N_BLOCKS = N_B;
float32_t DMAMEM float_buffer_L [BUFFER_SIZE * N_B];  
float32_t DMAMEM float_buffer_R [BUFFER_SIZE * N_B]; 
float32_t DMAMEM last_sample_buffer_L [BUFFER_SIZE * N_B];  
float32_t DMAMEM last_sample_buffer_R [BUFFER_SIZE * N_B]; 
// complex FFT with the new library CMSIS V4.5
const static arm_cfft_instance_f32 *S;
// complex iFFT with the new library CMSIS V4.5
const static arm_cfft_instance_f32 *iS;
// FFT instance for direct calculation of the filter mask
// from the impulse response of the FIR - the coefficients
const static arm_cfft_instance_f32 *maskS;

// this audio comes from the codec by I2S
//AudioInputI2S            i2s_in;
AudioRecordQueue         Q_in_L;
AudioRecordQueue         Q_in_R;
AudioMixer4              mixleft;
AudioMixer4              mixright;
AudioPlayQueue           Q_out_L;
AudioPlayQueue           Q_out_R;
//AudioOutputI2S           i2s_out;
AudioOutputSPDIF3        spdif3;         //xy=502,202
AudioInputUSB            usb1;           //xy=143,207

AudioConnection          patchCord1(usb1, 0, Q_in_L, 0);
AudioConnection          patchCord2(usb1, 1, Q_in_R, 0);
// convolution here
AudioConnection          patchCord3(Q_out_L, 0, mixleft, 0);
AudioConnection          patchCord4(Q_out_R, 0, mixright, 0);
AudioConnection          patchCord9(mixleft, 0,  spdif3, 1);
AudioConnection          patchCord10(mixright, 0, spdif3, 0);

void setup() {
  //Serial.begin(115200);
  //while(!Serial);

  AudioMemory(10); 
  delay(100);

  // Enable the audio shield, select input, and enable output
  //codec.enable();
  //codec.adcHighPassFilterDisable(); // 
  //codec.inputSelect(AUDIO_INPUT_LINEIN); // AUDIO_INPUT_MIC
  //codec.volume(0.6);

  set_arm_clock(600000000);

  /****************************************************************************************
     Audio Setup
  ****************************************************************************************/
  mixleft.gain(0, 1.0);
  mixright.gain(0, 1.0);

  //setI2SFreq(SAMPLE_RATE);

  /****************************************************************************************
     properly initialise variables in DMAMEM
  ****************************************************************************************/

  for(unsigned jj = 0; jj < nfor; jj++)
  {
    for(unsigned ii = 0; ii < FFT_L * 2; ii++)
    {
      fftout[jj][ii] = 0.1;
      fmask[jj][ii] = 0.0;
    }
  }

/*  for (unsigned i = 0; i < FFT_length * 2; i++)
  {
      cplxcoeffs[i] = 0.0;
  }
*/

  /****************************************************************************************
     init complex FFTs
  ****************************************************************************************/
  switch (FFT_length)
  {
    case 128:
      S = &arm_cfft_sR_f32_len128;
      iS = &arm_cfft_sR_f32_len128;
      maskS = &arm_cfft_sR_f32_len128;
      break;
    case 256:
      S = &arm_cfft_sR_f32_len256;
      iS = &arm_cfft_sR_f32_len256;
      maskS = &arm_cfft_sR_f32_len256;
      break;
  }

  /****************************************************************************************
     Calculate the FFT of the FIR filter coefficients once to produce the FIR filter mask
  ****************************************************************************************/
    init_partitioned_filter_masks();

#ifdef HAVE_SERIAL
    flexRamInfo();

    Serial.println();
    Serial.print("AUDIO_BLOCK_SAMPLES:  ");     Serial.println(AUDIO_BLOCK_SAMPLES);

    Serial.println();
#endif
    
  /****************************************************************************************
     begin to queue the audio from the audio library
  ****************************************************************************************/
  delay(100);
  Q_in_L.begin();
  Q_in_R.begin();

} // END OF SETUP
  elapsedMillis msec = 0;
void loop() {
  elapsedMicros usec = 0;
  // are there at least N_BLOCKS buffers in each channel available ?
    if (Q_in_L.available() > N_BLOCKS + 0 && Q_in_R.available() > N_BLOCKS + 0)
    {

      // get audio samples from the audio  buffers and convert them to float
      for (unsigned i = 0; i < N_BLOCKS; i++)
      {
        sp_L = Q_in_L.readBuffer();
        sp_R = Q_in_R.readBuffer();

        // convert to float one buffer_size
        // float_buffer samples are now standardized from > -1.0 to < 1.0
        arm_q15_to_float (sp_L, &float_buffer_L[BUFFER_SIZE * i], BUFFER_SIZE); // convert int_buffer to float 32bit
        arm_q15_to_float (sp_R, &float_buffer_R[BUFFER_SIZE * i], BUFFER_SIZE); // convert int_buffer to float 32bit
        Q_in_L.freeBuffer();
        Q_in_R.freeBuffer();
      }
 
      /**********************************************************************************
          Digital convolution
       **********************************************************************************/
      //  basis for this was Lyons, R. (2011): Understanding Digital Processing.
      //  "Fast FIR Filtering using the FFT", pages 688 - 694
      //  numbers for the steps taken from that source
      //  Method used here: overlap-and-save

      // ONLY FOR the VERY FIRST FFT: fill first samples with zeros
      if (first_block) // fill real & imaginaries with zeros for the first BLOCKSIZE samples
      {
        for (unsigned i = 0; i < partitionsize * 4; i++)
        {
          fftin[i] = 0.0;
        }
        first_block = 0;
      }
      else
      {  // HERE IT STARTS for all other instances
        // fill FFT_buffer with last events audio samples
        for (unsigned i = 0; i < partitionsize; i++)
        {
          fftin[i * 2] = last_sample_buffer_L[i]; // real
          fftin[i * 2 + 1] = last_sample_buffer_R[i]; // imaginary
        }
      }
    
      // copy recent samples to last_sample_buffer for next time!
      for (unsigned i = 0; i < partitionsize; i++)
      {
        last_sample_buffer_L [i] = float_buffer_L[i];
        last_sample_buffer_R [i] = float_buffer_R[i];
      }

      // now fill recent audio samples into FFT_buffer (left channel: re, right channel: im)
      for (unsigned i = 0; i < partitionsize; i++)
      {
        fftin[FFT_length + i * 2] = float_buffer_L[i]; // real
        fftin[FFT_length + i * 2 + 1] = float_buffer_R[i]; // imaginary
      }

#if defined(LATENCY_TEST)
        if(msec > 2000 && !no_more_latency_test)
        {
        // latency test
        fftin[42] = 10.0; fftin[44] = -10.0;
        fftin[43] = 10.0; fftin[45] = -10.0;
        no_more_latency_test = 1;
        }
        if(no_more_latency_test == 1) all_samples_counter += partitionsize; 
#endif       
      /**********************************************************************************
          Complex Forward FFT
       **********************************************************************************/
      // calculation is performed in-place the FFT_buffer [re, im, re, im, re, im . . .]
      arm_cfft_f32(S, fftin, 0, 1);
      for(unsigned i = 0; i < partitionsize * 4; i++)
      {
          fftout[buffidx][i] = fftin[i];
      }

      /**********************************************************************************
          Complex multiplication with filter mask (precalculated coefficients subjected to an FFT)
          this is taken from wdsp library by Warren Pratt firmin.c
       **********************************************************************************/
      k = buffidx;

      for(unsigned i = 0; i < partitionsize * 4; i++)
      {
          accum[i] = 0.0;
      }
      
      for(unsigned j = 0; j < nfor; j++)
      { 
          for(unsigned i = 0; i < 2 * partitionsize; i= i + 4 )
          {
            // doing 8 of these complex multiplies inside one loop saves a HUGE LOT of processor cycles
              accum[2 * i + 0] +=  fftout[k][2 * i + 0] * fmask[j][2 * i + 0] -
                                   fftout[k][2 * i + 1] * fmask[j][2 * i + 1];
              accum[2 * i + 1] +=  fftout[k][2 * i + 0] * fmask[j][2 * i + 1] +
                                   fftout[k][2 * i + 1] * fmask[j][2 * i + 0]; 
              accum[2 * i + 2] +=  fftout[k][2 * i + 2] * fmask[j][2 * i + 2] -
                                   fftout[k][2 * i + 3] * fmask[j][2 * i + 3];
              accum[2 * i + 3] +=  fftout[k][2 * i + 2] * fmask[j][2 * i + 3] +
                                   fftout[k][2 * i + 3] * fmask[j][2 * i + 2]; 
              accum[2 * i + 4] +=  fftout[k][2 * i + 4] * fmask[j][2 * i + 4] -
                                   fftout[k][2 * i + 5] * fmask[j][2 * i + 5];
              accum[2 * i + 5] +=  fftout[k][2 * i + 4] * fmask[j][2 * i + 5] +
                                   fftout[k][2 * i + 5] * fmask[j][2 * i + 4]; 
              accum[2 * i + 6] +=  fftout[k][2 * i + 6] * fmask[j][2 * i + 6] -
                                   fftout[k][2 * i + 7] * fmask[j][2 * i + 7];
              accum[2 * i + 7] +=  fftout[k][2 * i + 6] * fmask[j][2 * i + 7] +
                                   fftout[k][2 * i + 7] * fmask[j][2 * i + 6]; 
          }
          k = k - 1;
          if(k < 0)
          {
            k = nfor - 1;
          } 

      } // end nfor loop

      buffidx = buffidx + 1;
      if(buffidx >= nfor)
      {
          buffidx = 0;    
      } 
      /**********************************************************************************
          Complex inverse FFT
       **********************************************************************************/
      arm_cfft_f32(iS, accum, 1, 1);

      /**********************************************************************************
          Overlap and save algorithm, which simply means yóu take only half of the buffer
          and discard the other half (which contains unusable time-aliased audio).
          Whether you take the left or the right part is determined by the position
          of the zero-padding in the filter-mask-buffer before doing the FFT of the 
          impulse response coefficients          
       **********************************************************************************/
        for (unsigned i = 0; i < partitionsize; i++)
        {
          //float_buffer_L[i] = accum[partitionsize * 2 + i * 2 + 0];
          //float_buffer_R[i] = accum[partitionsize * 2 + i * 2 + 1];
          float_buffer_L[i] = accum[i * 2 + 0] * audio_gain;
          float_buffer_R[i] = accum[i * 2 + 1] * audio_gain;
        }

     /**********************************************************************************
          Serial print the first output samples in order to check for latency
       **********************************************************************************/
#if defined(LATENCY_TEST)
        if(no_more_latency_test == 1)
        {
          no_more_latency_test = 2;
          for(unsigned i = 0; i < partitionsize; i++)
          {
            if(accum[i * 2 + 0] * audio_gain > 0.1 || accum[i * 2 + 0] * audio_gain < -0.1)
            {
              Serial.print(i + all_samples_counter - 21); Serial.print("   left:   "); Serial.println(accum[i * 2 + 0]);
              Serial.print(i + all_samples_counter - 21); Serial.print("  right:   "); Serial.println(accum[i * 2 + 1]);
            }
          }
        }
#endif

       /**********************************************************************
          CONVERT TO INTEGER AND PLAY AUDIO - Push audio into I2S audio chain
       **********************************************************************/
      for (int i = 0; i < N_BLOCKS; i++)
        {
          sp_L = Q_out_L.getBuffer();    
          sp_R = Q_out_R.getBuffer();
          arm_float_to_q15 (&float_buffer_L[BUFFER_SIZE * i], sp_L, BUFFER_SIZE); 
          arm_float_to_q15 (&float_buffer_R[BUFFER_SIZE * i], sp_R, BUFFER_SIZE);
          Q_out_L.playBuffer(); // play it !  
          Q_out_R.playBuffer(); // play it !
        }

       /**********************************************************************************
          PRINT ROUTINE FOR ELAPSED MICROSECONDS
       **********************************************************************************/
#ifdef DEBUG
      sum = sum + usec;
      idx_t++;
      if (idx_t > 400) {
        mean = sum / idx_t;
        if (mean / 29.00 / N_BLOCKS * SAMPLE_RATE / AUDIO_SAMPLE_RATE_EXACT < 100.0)
        {
          Serial.print("processor load:  ");
          Serial.print (mean / 29.00 / N_BLOCKS * SAMPLE_RATE / AUDIO_SAMPLE_RATE_EXACT);
          Serial.println("%");
        }
        else
        {
          Serial.println("100%");
        }
        Serial.print (mean);
        Serial.print (" microsec for ");
        Serial.print (N_BLOCKS);
        Serial.print ("  stereo blocks    ");
        Serial.print("FFT-length = "); Serial.print(FFT_length);
        Serial.print(";   FIR filter length = "); Serial.println(nc);
        Serial.print("k = "); Serial.println(k);
        Serial.print("buffidx = "); Serial.println(buffidx);
        idx_t = 0;
        sum = 0;
      }
#endif
    } // end of audio process loop
    
      /**********************************************************************************
          Add button check etc. here
       **********************************************************************************/

}

void init_partitioned_filter_masks()
{
    for(unsigned j = 0; j < nfor;j++)
    {
      // fill with zeroes
      for (unsigned i = 0; i < partitionsize * 4; i++)
      {
          maskgen[i] = 0.0;  
      }
      // take part of impulse response and fill into maskgen
      for (unsigned i = 0; i < partitionsize; i++)
      {   
        // THIS IS FOR REAL IMPULSE RESPONSES OR FIR COEFFICIENTS
        // FOR COMPLEX USE THE PART BELOW
          // the position of the impulse response coeffs (right or left aligned)
          // determines the useable part of the audio in the overlap-and-save (left or right part of the iFFT buffer)
          maskgen[i * 2 + partitionsize * 2] = guitar_cabinet_impulse[i + j * partitionsize];  
      }
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////      
/*    
 *     THIS COMMENTED OUT PART IS FOR COMPLEX FILTER COEFFS
 *     
 *     // take part of impulse response and fill into maskgen
      for (unsigned i = 0; i < partitionsize * 2; i++)
      {
          // the position of the impulse response coeffs (right or left aligned)
          // determines the useable part of the audio in the overlap-and-save (left or right part of the iFFT buffer)
          maskgen[i + partitionsize * 2] = guitar_cabinet_impulse[i + j * partitionsize * 2];  
      }
      */
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////      
      // perform complex FFT on maskgen
      arm_cfft_f32(maskS, maskgen, 0, 1);
      // fill into fmask array
      for (unsigned i = 0; i < partitionsize * 4; i++)
      {
          fmask[j][i] = maskgen[i];  
      }    
    }
}

void setI2SFreq(int freq) {
  // thanks FrankB !
  // PLL between 27*24 = 648MHz und 54*24=1296MHz
  int n1 = 4; //SAI prescaler 4 => (n1*n2) = multiple of 4
  int n2 = 1 + (24000000 * 27) / (freq * 256 * n1);
  double C = ((double)freq * 256 * n1 * n2) / 24000000;
  int c0 = C;
  int c2 = 10000;
  int c1 = C * c2 - (c0 * c2);
  set_audioClock(c0, c1, c2, true);
  CCM_CS1CDR = (CCM_CS1CDR & ~(CCM_CS1CDR_SAI1_CLK_PRED_MASK | CCM_CS1CDR_SAI1_CLK_PODF_MASK))
       | CCM_CS1CDR_SAI1_CLK_PRED(n1-1) // &0x07
       | CCM_CS1CDR_SAI1_CLK_PODF(n2-1); // &0x3f 
//Serial.printf("SetI2SFreq(%d)\n",freq);
}
 
Hmm, strange.

I have never used USB audio with the Teensy and have no hardware compatible with SPDIF, so I cannot test that.

One idea is the sample rate: why did you comment out the I2S setting to 44100sps? The filters are calculated for that sample rate. However it should not lead to crackles are alike, it could just sound a little different with different sample rates. But I am not sure whether USB / SPDIF needs a dedicated fixed and accurate sample rate?

EDIT: Not sure at all, but I remember there was a bug in former Teensyduino versions making necessary the use of AudioControlSGTL5000 audio_codec; in the sketch, even if no SGTL5000 was used???
Did you try the latest TD version or does inserting that improve the situation?
 
I don't use I2S at all, but restoring that rate line made no difference. A big clue - disabling all the convolution code and just taking samples, converting to float and then back also has the distortion. But pass-through patchcords (no Q buffers) works.

I'll keep trying. Possibly you could test USB in to i2s out? Just connect to a PC and play some music.
 
Tested the USB_Passthrough example with Teensy 4.0 & Audio board Rev D, Arduino 1.8.13 TD 1.53, but I cannot get it to work.

* Compiled the sketch with USB type: Audio
* compiled fine
* PC recognizes Digital audio (Teensy audio)
* if I play a WAV file through that output, nothing happens at the output of the USB passthrough sketch (headphone I2S output)
* same result == no audio with your sketch above . . .

Sorry, but USB audio seems to be broken somehow ??? Or I am doing something wrong.
 
good that it works for you now!

Very interesting!

Would you be willing to prepare a very short How-To for your room correction for speakers somewhere?


EDIT: Strangely, USB audio seems to work exclusively if I use USB type: "Serial + Midi + Audio", it does not work at all with USB type: "Audio"
and the USB input is extremely loud, so I have to lower the PC sound source considerably in order not to overload the codec input
Very nice opportunity to learn about how to use USB audio with the Teensy, thanks for that ;-).
 
Last edited:
Ran into an issue - speaker/room correction is normally done with a separate impulse for each channel. And I don't understand the code well enough to change it.
 
Inside the Convolution loop, we are doing a complex FFT / inverse FFT routine. The real part of the input is one channel, the imaginary part of the input is the other channel. So in one loop, BOTH channels are processed with the same IR

The convolution is thus done with a complex coefficient / IR - set. So it is not possible with this setting to get different IRs applied to the specific channels.

Simplest and fastest solution to implement is just copy/paste the code loops, so that you use one IR for one channel and then just have another copy of the code for the other channel.

Downside:
* doubles the CPU load
* doubles the memory

Alternative:
* modify the code for usage of real FFT instead of complex FFT

BUT:
* if you use real FFT as implemented in the CMSIS lib:
* same CPU load for two IRs !!!
* double memory --> because the CMSIS real FFT routine needs double sized buffers

So, my pragmatic recommendation:
* just copy / paste the code for two channels
* but beware of the size limits for the IR, but you know that :)
 
I think, I have to correct myself (although I am not so sure, do not have the time at the moment to test this or properly think about it):

* the code as it is, does in fact offer the possibility to use different IRs for different channels
* you can see a commented part of the code in function init_partitioned_filter_masks() (COMPLEX FILTER COEFFS)
* you would have to modify the code, so that both IRs (for left and right speaker channels) are both stuffed into the maskgen buffer
* maskgen should then consist of even values --> IR left and odd values --> IR right

That is at least my working hypothesis. Good luck in trying it out! I would be very interested in your results!
 
> both IRs (for left and right speaker channels) are both stuffed into the maskgen buffer

I thought about this, but if this would work, how does it work as the code is now? The imaginary/odd part of maskgen is currently set to all zeros. So if the imaginary part is the IR for the right channel, then I'd expect the right channel output to be silent.

The commercial DRC device is 6144 taps/ch x 2 channels, so 8192 x 2 would be good. I think memory use will be OK.
 
You are probably right . . . I remember fiddling around a lot about this when I wrote the code . . . I need to have a look into my handwritten archive for the initial preparations/thoughts on that.

I always get confused, because my initial goal was not audio correction with IRs, but SDR DSP for I & Q signals (90 degrees out of phase), which have to be treated differently than independent left & right signals . . .

My time is restricted at the moment, sorry. So, maybe you can try out to fill the odd values with your second IR and just listen to what happens :)?

However, just doing the processing for 8192-long IRs two times seems feasible with T4.0, I think :). So, your choice is: doing it the elegant way or just doing it quickly. The audio quality will be the same ;-).
 
Agreed, unless someone with better understanding wants to help, creating a second fmask array and then running the existing multiply/ifft process twice better fits my limited understanding of this stuff.
 
Here is the alpha code if you need two impulse responses applied to two channels:
 

Attachments

  • convolution.zip
    64.4 KB · Views: 36
While 9216x2 taps is OK, it could go higher if DMAMEM could be used for some of the large arrays (eg fmask). But with USB input and spdif output, this causes some static. Any idea why or how to fix?
 
Neither DMAMEM nor using real ffts worked out. But I was successful in using the symmetry of fft output to reduce memory use. This resulted in > 12,000 taps with stereo input and separate impulse responses for each channel.
 
Impulse WAV files

You are probably right . . . I remember fiddling around a lot about this when I wrote the code . . . I need to have a look into my handwritten archive for the initial preparations/thoughts on that.

I always get confused, because my initial goal was not audio correction with IRs, but SDR DSP for I & Q signals (90 degrees out of phase), which have to be treated differently than independent left & right signals . . .

My time is restricted at the moment, sorry. So, maybe you can try out to fill the odd values with your second IR and just listen to what happens :)?

However, just doing the processing for 8192-long IRs two times seems feasible with T4.0, I think :). So, your choice is: doing it the elegant way or just doing it quickly. The audio quality will be the same ;-).

Hi Frank- I'm just getting back to the partitioned FFT routine that you wrote, and which I ported to an Audio library. I want to be able to load different impulse files into Teensy via SD card. I've done this successfully with a few open-source cabinet simulation files that I got on the internet.
It appears that the ones that you embedded in your program, as .h files, came from a company in Germany that sells cab. simulation files. I assume you wrote a program to strip the coefficients out of the WAV files that you got. I'd like to determine if my program can correctly read/strip coefficients from other cab.simulation files, such as the ones you have. Could I please get 1 or 2 of them from you- preferably the ones with the greatest number of filter taps?
For interest sake, I should mention that my part. FFT library works fine when using the PJRC Audio Shield but NOT at all when I tried to use 1) SPDIF_Async Audio input library or 2) PT8211 Audio output library.
I.e. just I2S in and I2S out work properly. I haven't figured out why yet.
Best regards
Brian
 
Hi Frank: Please disregard my last post. I have written a program that takes those IRx.h files, converts them into format needed for a WAV file, and writes them out to a WAV file on SD card.
Best regards
Brian
 
Are Inverse FFT tools available?

Hello All,
This post was the closest thing I've found to confirm that there is inverse FFT out there for the Teensy/Arduino environment. Does anyone know if there is a bit of an introduction or tutorial? I'm a newbie to audio DSP, but having just done the FFT part of Pauls Audio Tutorial, it's sort of irresistible to try and manipulate the FFT results and feed them back into the inverse FFT. Pointers? Thank You
 
Hi Amensch: The convolution filtering that DD4WH and I worked on use FFT routines from the CMSIS DSP library. The relevant CMSIS library is now part of the build process for the Teensy 4.x, so these routines can be used without much trouble with Teensy 4.x. The inverse FFT is actually performed using the exact same CMSIS FFT call with one parameter changing from a 0 to a 1.
This particular thread concerns uniformly-partitioned FFT convolution, which is more complicated. But, there is also a thread on the forum about more conventional convolution filtering using FFT/iFFT (also DD4WH and myself). Its mixed in with SDR topics, but lots of useful information there too:

https://forum.pjrc.com/threads/40590-Teensy-Convolution-SDR-(Software-Defined-Radio)?highlight=DD4WH

I wrote an article on the conventional FFT convolution filtering in Circuit Cellar #346. There is a lot of good info there. My contract with CC doesn't allow me to distribute my articles (PDFs) , but you could contact Circuit Cellar themselves.
I have some of the information you might be interested in on my github site. Both the earlier code, as well as the newer partitioned code/information are there:
https://github.com/bmillier/Teensy-FFT-Convolution-Filter
https://github.com/bmillier/FFT-Convolution-Filter-Uniformly-partitioned

I'm not a math whiz, and I learnt most of what I picked up for these projects from the following book by Steven Smith, which is EXCELLENT
http://www.dspguide.com/

Hope this helps some.
regards
 
Convolution noise

Hi Amensch: The convolution filtering that DD4WH and I worked on use FFT routines from the CMSIS DSP library. The relevant CMSIS library is now part of the build process for the Teensy 4.x, so these routines can be used without much trouble with Teensy 4.x. The inverse FFT is actually performed using the exact same CMSIS FFT call with one parameter changing from a 0 to a 1.
This particular thread concerns uniformly-partitioned FFT convolution, which is more complicated. But, there is also a thread on the forum about more conventional convolution filtering using FFT/iFFT (also DD4WH and myself). Its mixed in with SDR topics, but lots of useful information there too:

https://forum.pjrc.com/threads/40590-Teensy-Convolution-SDR-(Software-Defined-Radio)?highlight=DD4WH

I wrote an article on the conventional FFT convolution filtering in Circuit Cellar #346. There is a lot of good info there. My contract with CC doesn't allow me to distribute my articles (PDFs) , but you could contact Circuit Cellar themselves.
I have some of the information you might be interested in on my github site. Both the earlier code, as well as the newer partitioned code/information are there:
https://github.com/bmillier/Teensy-FFT-Convolution-Filter
https://github.com/bmillier/FFT-Convolution-Filter-Uniformly-partitioned

I'm not a math whiz, and I learnt most of what I picked up for these projects from the following book by Steven Smith, which is EXCELLENT
http://www.dspguide.com/

Hope this helps some.
regards

I implemented the convolution, it works but a noise appears with the output signal...may you help me to understand how to cancel this noise ?

Thanks and best regards
 
Status
Not open for further replies.
Back
Top