Forum Rule: Always post complete source code & details to reproduce any issue!
Page 1 of 2 1 2 LastLast
Results 1 to 25 of 33

Thread: Audio Equalizer using FIR

  1. #1
    Senior Member
    Join Date
    Dec 2016
    Posts
    131

    Audio Equalizer using FIR

    I have been putting together some communications object blocks with floating point data interfaces. This format of Audio objects has been discussed on this forum https://forum.pjrc.com/threads/40568...rary-Extension, and documented by Chip Audette in his OpenAudio_ArduinoLibrary library and the Tympan library. It occurred to me that one of the blocks might have application in the 16-bit audio applications and so I converted it to have a fixed point Q15 interface, the same as the Teensy Audio Library. This note is just about the 16-bit integer version and makes this block available for such use.

    The AudioFilterEqualizer designs, analyzes and applies a general FIR-filter-based multi-band equalizer. The number of bands is essentially unlimited (51 now) and the frequency limits of the bands are arbitrary. The relative gain for each band is set by specifying a dB level. The Fourier transform of this multi-band response is windowed by a Kaiser window that allows a side-lobe specified parameter to determine the trade-off between rate of transition between bands and the side-lobe levels for a steep transition. The number of coefficients (taps) of the FIR filter is variable from 4 to 250.

    Many details are covered in the introductory material to the .h file listed below. In summary, the equalizer object is instantiated as an AudioFilterEqualizer object with a single input and output. It needs no 'begin' function as it comes up running a 4-tap pass-through filter. The equalizer details are entered by a equalizerNew() function that can also be called on-the-fly to change settings. The equalizer is specified by a pair of float arrays supplied by the .INO. Likewise, an int16_t array is .INO supplied to hold the FIR coefficients.

    An equalizerResponse() function gives back the dB response that is actually achieved, including quantization effects. A float32_t array to hold these is again .INO supplied

    This block is more general than most graphic equalizers, in that the the lower attenuation level is not limited to -12 dB. This allows standard LP, HP, BP and BR responses to be included. The biggest limitation in using the equalizer is the low frequency response. This is limited by the number of IR taps. At a sample rate of 44.1 kHz the 200 tap response below 100 or 200 Hz is in the right direction, but somewhat approximate.There are ways to improve this, but it remains a factor. That said, for communications applications where the lowest frequency of interest is around 200 Hz, it is not a problem.

    The internally designed FIR filter is symmetric which means that the delay through the filter is constant for all frequencies (linear phase).

    Here is the include AudioFilterEqualizer_I16.h file, with supplementary info.
    Code:
    /*
     * AudioFilterEqualizer_I16
     *
     * Created: Bob Larkin   W7PUA   14 May 2020
     *
     * This is a direct translation of the receiver audio equalizer written
     * by this author for the open-source DSP-10 radio in 1999.  See
     * http://www.janbob.com/electron/dsp10/dsp10.htm and
     * http://www.janbob.com/electron/dsp10/uhf3_35a.zip
     * This version processes blocks of 16-bit integer audio (as opposed to
     * the Chip Audette style F32 floating point baudio.)
     *
     * Credit and thanks to PJRC, Paul Stoffregen and Teensy products for the audio
     * system and library that this is built upon as well as the float32
     * work of Chip Audette embodied in the OpenAudio_ArduinoLibrary. Many thanks
     * for the library structures and wonderful Teensy products.
     *
     * This equalizer is specified by an array of 'nBands' frequency bands
     * each of of arbitrary frequency span.  The first band always starts at
     * 0.0 Hz, and that value is not entered.  Each band is specified by the upper
     * frequency limit to the band.
     * The last band always ends at half of the sample frequency, which for 44117 Hz
     * sample frequency would be 22058.5.  Each band is specified by its upper
     * frequency in an .INO supplied array feq[].  The dB level of that band is
     * specified by a value, in dB, arranged in an .INO supplied array
     * aeq[].  Thus a trivial bass/treble control might look like:
     *   nBands = 3;
     *   feq[] = {300.0, 1500.0, 22058.5};
     *   float32_t bass = -2.5;  // in dB, relative to anything
     *   float32_t treble = 6.0;
     *   aeq[] = {bass, 0.0, treble};
     *
     * It may be obvious that this equalizer is a more general case of the common
     * functions such as low-pass, band-pass, notch, etc.   For instance, a pair
     * of band pass filters would look like:
     *   nBands = 5;
     *   feq[] = {500.0, 700.0, 2000.0, 2200.0, 22058.5};
     *   aeq[] = {-100.0, 0.0, -100.0, 2.0, -100.0};
     * where we added 2 dB of gain to the 2200 to 2400 Hz filter, relative to the 500
     * to 700 Hz band.
     *
     * An octave band equalizer is made by starting at some low frequency, say 40 Hz for the
     * first band.  The lowest frequency band will be from 0.0 Hz up to that first frequency.
     * Next multiply the first frequency by 2, creating in our example, a band from 40.0
     * to 80 Hz.  This is continued until the last frequency is about 22058 Hz.
     * This works out to require 10 bands, as follows:
     *   nBands = 10;
     *   feq[] = {    40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 2560.0, 5120.0, 10240.0, 22058.5};
     *   aeq[] = {  5.0,  4.0,  2.0,   -3.0, -4.0,  -1.0,    3.0, 6.0,    3.0,     0.5      };
     *
     * For a "half octave" equalizer, multiply each upper band limit by the square root of 2 = 1.414
     * to get the next band limit.  For that case, feq[] would start with a sequence
     * like 40, 56.56, 80.00, 113.1, 160.0, ... for a total of about 20 bands.
     *
     * How well all of this is achieved depends on the number of FIR coefficients
     * being used.  In the Teensy 3.6 / 4.0 the resourses allow a hefty number,
     * say 201, of coefficients to be used without stealing all the processor time
     * (see Timing, below).  The coefficient and FIR memory is sized for a maximum of
     * 250 coefficients, but can be recompiled for bigger with the define FIR_MAX_COEFFS.
     * To simplify calculations, the number of FIR coefficients should be odd.  If not
     * odd, the number will be reduced by one, quietly.
     *
     * If you try to make the bands too narrow for the number of FIR coeffficients,
     * the approximation to the desired curve becomes poor.  This can all be evaluated
     * by the function getResponse(nPoints, pResponse) which fills an .INO-supplied array
     * pResponse[nPoints] with the frequency response of the equalizer in dB.  The nPoints
     * are spread evenly between 0.0 and half of the sample frequency.
     *
     * Initialization is a 2-step process.  This makes it practical to change equalizer
     * levels on-the-fly.  The constructor starts up with a 4-tap FIR setup for direct
     * pass through.  Then the setup() in the .INO can specify the equalizer.
     * The newEqualizer() function has several parameters, the number of equalizer bands,
     * the frequencies of the bands, and the sidelobe level. All of these can be changed
     * dynamically.  This function can be changed dynamically, but it may be desireable to
     * mute the audio during the change to prevent clicks.
     * 
     * This 16-bit integer version adjusts the maximum coefficient size to scale16 in the calls
     * to both equalizerNew() and getResponse().  Broadband equalizers can work with full-scale
     * 32767.0f sorts of levels, where narrow band filtering may need smaller values to
     * prevent overload. Experiment and check carefully.  Use lower values if there are doubts.
     * 
     * For a pass-through function, something like this (which can be intermixed with fancy equalizers):
     * float32_t fBand[] = {10000.0f, 22058.5f}; 
     * float32_t dbBand[] = {0.0f, 0.0f};
     * equalize1.equalizerNew(2, &fBand[0], &dbBand[0], 4, &equalizeCoeffs[0], 30.0f, 32767.0f);
     *
     * Measured Q15 timing of update() for a 128 sample block on a T3.6:
     *     Fixed time 11.6 microseconds
     *     Per FIR Coefficient time 5.6 microseconds
     *     Total for 200 FIR Coefficients = 1140 microseconds (39.3% of fs=44117 Hz available time)
     *
     * Copyright (c) 2020 Bob Larkin
     * Any snippets of code from PJRC used here brings with it
     * the associated license.
     *
     * In addition, work here is covered by MIT LIcense:
     *
     * Permission is hereby granted, free of charge, to any person obtaining a copy
     * of this software and associated documentation files (the "Software"), to deal
     * in the Software without restriction, including without limitation the rights
     * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
     * copies of the Software, and to permit persons to whom the Software is
     * furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice shall be included in all
     * copies or substantial portions of the Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
     * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
     * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
     * SOFTWARE.
     */
    
    #ifndef _filter_equalizer_h
    #define _filter_equalizer_h
    
    #include "Arduino.h"
    #include "arm_math.h"
    #include "Audio.h"
    #include "AudioStream.h"
    
    #ifndef MF_PI
    #define MF_PI 3.1415926f
    #endif
    
    // Temporary timing test
    #define TEST_TIME_EQ 0
    
    #define EQUALIZER_MAX_COEFFS 250
    
    #define ERR_EQ_BANDS 1
    #define ERR_EQ_SIDELOBES 2
    #define ERR_EQ_NFIR 3
    
    class AudioFilterEqualizer : public AudioStream
    {
      public:
        AudioFilterEqualizer(void): AudioStream(1,inputQueueArray) {
          // Initialize FIR instance (ARM DSP Math Library) with default simple passthrough FIR
          if (arm_fir_init_q15(&fir_inst, nFIRused, (q15_t *)cf16, &StateQ15[0], AUDIO_BLOCK_SAMPLES)
               != ARM_MATH_SUCCESS) {
            cf16 = NULL;
          }
        }
    
        uint16_t equalizerNew(uint16_t _nBands, float32_t *feq, float32_t *adb,
                      uint16_t _nFIR, int16_t *_cf, float32_t kdb, float32_t scale16);
    
        void getResponse(uint16_t nFreq, float32_t *rdb, float32_t scale16);
    
        void update(void);
    
      private:
        audio_block_t *inputQueueArray[1];
        uint16_t block_size = AUDIO_BLOCK_SAMPLES;
        int16_t firStart[4] = {0, 32767, 0, 0};  // Initialize to passthrough
        int16_t* cf16 = firStart;        // pointer to current coefficients
        uint16_t nFIR  = 4;              // Number of coefficients
        uint16_t nFIRdesign = 3;         // used in designing filter
        uint16_t nFIRused = 4;           // Adjusted nFIR, nFIR-1 for nFIR odd.
        uint16_t nBands = 2;
        float32_t sample_rate_Hz = AUDIO_SAMPLE_RATE;
    
        //  *Temporary* - TEST_TIME allows measuring time in microseconds for each part of the update()
    #if TEST_TIME_EQ
        elapsedMicros tElapse;
        int32_t iitt = 999000;     // count up to a million during startup
    #endif
    
        // ARM DSP Math library filter instance
        arm_fir_instance_q15 fir_inst;
        int16_t StateQ15[AUDIO_BLOCK_SAMPLES + EQUALIZER_MAX_COEFFS];  // max, max
        
        /* float i0f(float x)  Returns the modified Bessel function Io(x).
         * Algorithm is based on Abromowitz and Stegun, Handbook of Mathematical
         * Functions, and Press, et. al., Numerical Recepies in C.
         * All in 32-bit floating point
         */
        float i0f(float x) {
          float af, bf, cf;
          if( (af=fabsf(x)) < 3.75f ) {
            cf = x/3.75f;
            cf = cf*cf;
            bf=1.0f+cf*(3.515623f+cf*(3.089943f+cf*(1.20675f+cf*(0.265973f+
                 cf*(0.0360768f+cf*0.0045813f)))));
          }
          else {
            cf = 3.75f/af;
            bf=(expf(af)/sqrtf(af))*(0.3989423f+cf*(0.0132859f+cf*(0.0022532f+
                 cf*(-0.0015756f+cf*(0.0091628f+cf*(-0.0205771f+cf*(0.0263554f+
                 cf*(-0.0164763f+cf*0.0039238f))))))));
          }
          return bf;
        }
    };
    #endif
    Next is the AudioFilterEqualizer_I16.cpp file.
    Code:
    /* AudioFilterEqualizer_I16.cpp
     *
     * Bob Larkin,  W7PUA  14 May 2020
     *
     * See AudioFilterEqualizer_I16.h for much more explanation on usage.
     * 
     * Copyright (c) 2020 Bob Larkin
     * Any snippets of code from PJRC and used here brings with it
     * the associated license.
     *
     * In addition, work here is covered by MIT LIcense:
     *
     * Permission is hereby granted, free of charge, to any person obtaining a copy
     * of this software and associated documentation files (the "Software"), to deal
     * in the Software without restriction, including without limitation the rights
     * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
     * copies of the Software, and to permit persons to whom the Software is
     * furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice shall be included in all
     * copies or substantial portions of the Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
     * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
     * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
     * SOFTWARE.
     */
    
    #include "AudioFilterEqualizer_I16.h"
    
    void AudioFilterEqualizer::update(void)  {
        audio_block_t *block, *block_new;
    
    #if TEST_TIME_EQ
      if (iitt++ >1000000) iitt = -10;
      uint32_t t1, t2;
      t1 = tElapse;
    #endif
        block = receiveReadOnly();
        if (!block) return;
    
        // Check for coefficients
        if (cf16 == NULL) {
          release(block);
          return;
        }
    
        // get a block for the FIR output
        block_new = allocate();
        if (block_new) {
            //apply the FIR
            arm_fir_q15(&fir_inst, block->data, block_new->data, AUDIO_BLOCK_SAMPLES);
            transmit(block_new); // send the FIR output
            release(block_new);
        }
        release(block);
        
    #if TEST_TIME_EQ
      t2 = tElapse;
      if(iitt++ < 0) {Serial.print("At AudioEqualizer end, microseconds = ");  Serial.println (t2 - t1); }
      t1 = tElapse;
    #endif
    }
    
    /* equalizerNew() calculates the Equalizer FIR filter coefficients. Works from:
     * uint16_t equalizerNew(uint16_t _nBands, float32_t *feq, float32_t *adb,
                          uint16_t _nFIR, int16_t *_cf, float32_t kdb, float32_t scale16)
     *   nBands   Number of equalizer bands
     *   feq      Pointer to array feq[] of nBands breakpoint frequencies, fractions of sample rate, Hz
     *   adb      Pointer to array aeq[] of nBands levels, in dB, for the feq[] defined frequency bands
     *   nFIR     The number of FIR coefficients (taps) used in the equalzer 
     *   cf       Pointer to an array of int16 to hold FIR coefficients
     *   kdb      A parameter that trades off sidelobe levels for sharpness of band transition.
     *            kdb=30 sharp cutoff, higher sidelobes
     *            kdb=60 slow cutoff, low sidelobes
     *   scale16  A float number that sets the maximum int value for coefficients. Max 32768.0f
     *
     * The arrays, feq[], aeq[] and cf[] are supplied by the calling .INO
     *
     * Returns: 0 if successful, or an error code if not.
     * Errors:  1 = ERR_EQ_BANDS =     Too many bands, 50 max
     *          2 = ERR_EQ_SIDELOBES = Sidelobe level out of range, must be > 0
     *          3 = ERR_EQ_NFIR =      nFIR out of range
     *
     * Note - This function runs at setup time, and there is no need to fret about
     * processor speed.  Likewise, local arrays are created on the stack and memory space is
     * available for other use when this function closes.
     */
    uint16_t AudioFilterEqualizer::equalizerNew(uint16_t _nBands,  float32_t* feq,  float32_t* adb,
                          uint16_t _nFIR,  int16_t* pcf16,  float32_t kdb,  float32_t scale16)  {
        uint16_t i, j;
        uint16_t nHalfFIR;
        float32_t beta, kbes;
        float32_t q, xj2, scaleXj2, WindowWt;
        float32_t cf[250];
        float32_t fNorm[50];   // Normalized to the sampling frequency
        float32_t aVolts[50];  // Convert from dB to "quasi-Volts"
    
        cf16 = pcf16;   // Set the private copies
        nFIR = _nFIR;
        nBands = _nBands;
    
        // Check range of nFIR. q15 FIR requires even number of coefficients,
        // but for historic reasons, we design odd number FIR.  So add a 
        // variable nFIRused that is even, and one more than the design value.
        if (nFIR<4 || nFIR>EQUALIZER_MAX_COEFFS)
            return ERR_EQ_NFIR;
    
        if (2*(nFIR/2) == nFIR)  {       // nFIR even
    		nFIRdesign = nFIR - 1;
    		nFIRused = nFIR;
        }
        else {                            // nFIR odd
            nFIRdesign = nFIR - 2;        // Avoid this
            nFIRused = nFIR - 1;
        }
        nHalfFIR = (nFIRdesign - 1)/2;  // If nFIRdesign=199, nHalfFIR=99
    
        for (int kk = 0; kk<nFIRdesign; kk++)  // To be sure, zero the coefficients
          cf[kk] = 0.0f;
    
        // Convert dB to Voltage ratios, frequencies to fractions of sampling freq
        if(nBands <2 || nBands>50)  return ERR_EQ_BANDS;
        for (i=0; i<nBands; i++)  {
           aVolts[i]=powf(10.0, (0.05*adb[i]));
           fNorm[i]=feq[i]/sample_rate_Hz;
        }
    
        /* Find FIR coefficients, the Fourier transform of the frequency
         * response. This is done by dividing the response into a sequence
         * of nBands rectangular frequency blocks, each of a different level.
         * We can precalculate the sinc Fourier transform for each rectangular band.
         * The linearity of the Fourier transform allows us to sum the transforms
         * of the individual blocks to get pre-windowed coefficients.
         *
         * Numbering example for nFIRdesign==199:
         * Subscript 0 to 98 is 99 taps;  100 to 198 is 99 taps; 99+1+99=199 taps
         * The center coef ( for nFIRdesign=199 taps, nHalfFIR=99 ) is a
         * special case that comes from sin(0)/0 and treated first:
         */
        cf[nHalfFIR] = 2.0f*(aVolts[0]*fNorm[0]);  // Coefficient "99"
        for(i=1; i<nBands; i++) {
            cf[nHalfFIR] += 2.0f*aVolts[i]*(fNorm[i]-fNorm[i-1]);
        }
        for (j=1; j<=nHalfFIR; j++) {          // Coefficients "100 to 198"
            q = MF_PI*(float32_t)j;
            // First, deal with the zero frequency end band that is "low-pass."
            cf[j+nHalfFIR] = aVolts[0]*sinf(fNorm[0]*2.0f*q)/q;
            //  and then the rest of the bands that have low and high frequencies
            for(i=1; i<nBands; i++)
                cf[j+nHalfFIR] += aVolts[i]*( (sinf(fNorm[i]*2.0f*q)/q) - (sinf(fNorm[i-1]*2.0f*q)/q) );
        }
    
        /* At this point, the cf[] coefficients are simply truncated sin(x)/x, creating
         * very high sidelobe responses. To reduce the sidelobes, a windowing function is applied.
         * This has the side affect of increasing the rate of cutoff for sharp frequency changes.
         * The only windowing function available here is that of James Kaiser.  This has a number
         * of desirable features.  These include being able to tradeoff sidelobe level
         * for rate of cutoff cutoff between frequency bands.
         * We specify it in terms of kdb, the highest sidelobe, in dB, next to a sharp cutoff. For
         * calculating the windowing vector, we need a Kaiser parameter beta, found as follows:
         */
        if (kdb<0) return ERR_EQ_SIDELOBES;
        if (kdb>50)
            beta = 0.1102f*(kdb-8.7f);
        else if  (kdb>20.96f && kdb<=50.0f)
            beta = 0.58417f*powf((kdb-20.96f), 0.4f) + 0.07886f*(kdb-20.96f);
        else
            beta=0.0f;
        // i0f is the floating point in & out zero'th order modified Bessel function
        kbes = 1.0f / i0f(beta);      // An additional derived parameter used in loop
    
        // Apply the Kaiser window, j = 0 is center coeff window value
        scaleXj2 = 2.0f/(float32_t)nFIRdesign;
        scaleXj2 *= scaleXj2;
        for (j=0; j<=nHalfFIR; j++) {  // For 199 Taps, this is 0 to 99
             xj2 = (int16_t)(0.5f+(float32_t)j);
             xj2 = scaleXj2*xj2*xj2;
             WindowWt=kbes*(i0f(beta*sqrtf(1.0-xj2)));
             cf[nHalfFIR + j] *= WindowWt;       // Apply the Kaiser window to upper half
             cf[nHalfFIR - j] = cf[nHalfFIR +j]; // and create the lower half
        }
    
        // Find the biggest to decide the scaling factor for the FIR filter.
        // Then we will scale the coefficients according to scale16
        float32_t cfmax = 0.0f;
        for (j=0; j<=nHalfFIR; j++)      // 0 to 99  for nFIRdesign=199
            if (cfmax < fabsf(cf[j])) cfmax=fabsf(cf[j]);
        // scale16 is a float number, such as 16384.0, that sets the maximum +/- 
        // value for coefficients.  This is a complex subject that needs more discussion
        // than we can put here.  The following scales the coefficients and converts to 16 bit:
        for (j=0; j<nFIRdesign; j++)
            cf16[j] = (int)(scale16*cf[j]/cfmax);
        // nFIRused id always even. nFIRdesign is always odd.  So add a zero
        cf16[nFIRdesign] = 0;
        // The following puts the numbers into the fir_inst structure
        arm_fir_init_q15(&fir_inst, nFIRused, (int16_t *)cf16, &StateQ15[0], (uint32_t)block_size);
        return 0;
    }
    
    /* Calculate response in dB.  Leave nFreq-point-result in array rdb[] that is supplied
     * by the calling .INO  See Parks and Burris, "Digital Filter Design," p27 (Type 1).
     */
    void AudioFilterEqualizer::getResponse(uint16_t nFreq, float32_t *rdb, float32_t scale16)  {
        uint16_t i, j;
        float32_t bt;
        float32_t piOnNfreq;
        uint16_t nHalfFIR;
        float32_t cf[nFIR];
    
        nHalfFIR = (nFIRdesign - 1)/2;  // If nFIRdesign=199, nHalfFIR=99
    
        for (i=0; i<nFIRdesign; i++)
           cf[i] = ((float32_t)cf16[i]) / scale16;
        piOnNfreq = MF_PI / (float32_t)nFreq;
        for (i=0; i<nFreq; i++) {
            bt = cf[nHalfFIR];          // Center coefficient
            for (j=0; j<nHalfFIR; j++)  // Add in the others twice, as they are symmetric
                bt += 2.0f*cf[j]*cosf(piOnNfreq*(float32_t)((nHalfFIR-j)*i));
            rdb[i] = 20.0f*log10f(fabsf(bt));     // Convert to dB
        }
    }
    And two simple .INO files to show usage follow
    Code:
    /* TestEqualizer2.ino  Bob Larkin 10 May 2020
     * This is a test of the Filter Equalizer for Teensy Audio.
     * It also tests the .getResponse() function for determining
     * the actual filter response in dB.
     * This version is for 16-bit Teensy Audio Library.
     * A float32 version is available.
     */
    
    #include "Audio.h"
    #include "AudioFilterEqualizer_I16.h"
    
    AudioInputI2S           i2s1;
    AudioSynthWaveformSine  sine1;      // Test signal
    AudioFilterEqualizer    equalize1;
    AudioRecordQueue        queue1;     // The LSB output
    AudioConnection         patchCord1(sine1,     0, equalize1, 0);
    AudioConnection         patchCord2(equalize1, 0, queue1,   0);
    
    // This 10-band octave band equalizer is set strangely in order to demonstrate the Equalizer
    float32_t fBand[] = {   40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 2560.0, 5120.0, 10240.0, 22058.5};
    float32_t dbBand[] = {10.0, 8.0,  6.0,  3.0,   -2.0,   0.0,    0.0,    6.0,    10.0,   -100};
    float32_t scaleCoeff = 16384.0f;    // Max allowed size of FIR coefficients, depends on equalizer use
    int16_t equalizeCoeffs[250];
    int16_t dt1[128];
    int16_t *pq1, *pd1;
    int16_t i;
    float32_t dBResponse[500];  // Show lots of detail for a plot
    
    void setup(void) {
      AudioMemory(10); 
      Serial.begin(300);  delay(1000);
      Serial.println("*** Test Audio Equalizer ***");
      // Sine wave is default +/- 8192 max/min
      sine1.frequency(1000.0f);
    
      // Initialize the equalizer with 10 bands, 200 FIR coefficients and -60 dB sidelobes, 16384 Max coefficient
      uint16_t eq = equalize1.equalizerNew(10, &fBand[0], &dbBand[0], 200, &equalizeCoeffs[0], 60, scaleCoeff);
      if (eq == ERR_EQ_BANDS)
        Serial.println("Equalizer failed: Invalid number of frequency bands.");
      else if (eq == ERR_EQ_SIDELOBES)
        Serial.println("Equalizer failed: Invalid sidelobe specification.");
      else if (eq == ERR_EQ_NFIR)
        Serial.println("Equalizer failed: Invalid number of FIR coefficients.");
      else
        Serial.println("Equalizer initialized successfully.");
    
      // Get frequency response in dB for 500 points, uniformly spaced from 0 to 21058 Hz
      // this is an AudioFilterEqualizer function called as
      //    void getResponse(uint16_t nFreq, float32_t *rdb, float32_t scale16);
      equalize1.getResponse(500, dBResponse, scaleCoeff);
      
      Serial.println("Freq Hz, Response dB");
      for(int16_t m=0; m<500; m++) {       // Print the response in Hz, dB, suitable for a spread sheet
        Serial.print((float32_t)m * 22058.5f / 500.0f);
        Serial.print(",");
        Serial.println(dBResponse[m], 7);
      }
      i = -10;
    }
    
    void loop(void) {
      if(i<0) i++;  // Get past startup data
      if(i==0) queue1.begin();
      
      // Print a 128 sample of the filtered output with sine1 as an input.
      // This "if" will be active for i == 0
      if (queue1.available() >= 1  &&  i >= 0) {
          pq1 = queue1.readBuffer();
          pd1 = &dt1[0];
          for(uint k=0; k<128; k++)
             *pd1++ = *pq1++;
          i=1;   // Only collect 1 block
          queue1.freeBuffer();
          queue1.end();  // No more data to queue1
      }
      if(i == 1) { 
        // Uncomment the next 3 lines to printout a sample of the sine wave.
        Serial.println("128 Samples:  ");
        for (uint k=0; k<128; k++)
           Serial.println (dt1[k]); 
        i = 2;  
      }
    }
    Code:
    /* TestEqualizer2Audio.ino  Bob Larkin 10 May 2020
     * This is a test of the Filter Equalizer for Teensy Audio.
     * Runs two different equalizers, switching every 5 seconds to demonstrate
     * dynamic filter changing.
     */
    
    #include "Audio.h"
    #include "AudioFilterEqualizer_I16.h"
    
    // Signals from left ADC to Equalizer to left DAC using Teensy Audio Adaptor
    AudioInputI2S           i2sIn;
    AudioFilterEqualizer    equalize1;
    AudioOutputI2S          i2sOut;
    AudioConnection         patchCord1(i2sIn,     0, equalize1, 0);
    AudioConnection         patchCord2(equalize1, 0, i2sOut,    0);
    AudioControlSGTL5000    codec;
    
    // Some sort of octave band equalizer as a one alternative, 10 bands
    float32_t fBand1[] = {    40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 2560.0, 5120.0, 10240.0, 22058.5};
    float32_t dbBand1[] = {10.0,  2.0, -2.0,  -5.0, -2.0,  -4.0,   -20.0,   6.0,    10.0,    -100};
    float32_t scaleCoeff = 32765.0f;    // Max allowed size of FIR coefficients; a smidge under 2^15
    
    // To contrast, put a strange bandpass filter as an alternative, 5 bands
    float32_t fBand2[] = {    300.0, 500.0, 800.0, 1000.0, 22058.5};
    float32_t dbBand2[] = {-60.0,  0.0,  -20.0,   0.0,  -60.0};
    
    int16_t equalizeCoeffs[200];
    int16_t k = 0;
    
    void setup(void) {
      AudioMemory(10); 
      Serial.begin(300);  delay(1000);
      Serial.println("*** Test Audio Equalizer ***");
      codec.enable();
      codec.inputSelect(AUDIO_INPUT_LINEIN); 
    
      // Initialize the equalizer with 10 bands, fBand1[] 199 FIR coefficients
      // -65 dB sidelobes, 16384 Max coefficient value
      uint16_t eq = equalize1.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 60, scaleCoeff);
      if (eq == ERR_EQ_BANDS)
        Serial.println("Equalizer failed: Invalid number of frequency bands.");
      else if (eq == ERR_EQ_SIDELOBES)
        Serial.println("Equalizer failed: Invalid sidelobe specification.");
      else if (eq == ERR_EQ_NFIR)
        Serial.println("Equalizer failed: Invalid number of FIR coefficients.");
      else
        Serial.println("Equalizer initialized successfully.");
        for (k=0; k<200; k++) Serial.println(equalizeCoeffs[k]);
    }
    
    void loop(void) {
      // Change between two filters every 5 seconds.
      
      // To run with just the 10-band equalizer, comment out the entire loop with "/* ... */"
    
      if(k == 0) {
         k = 1;
         equalize1.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 65, scaleCoeff);
      }
      else  {  // Swap back and forth
         k = 0;
         equalize1.equalizerNew(5, &fBand2[0], &dbBand2[0], 190, &equalizeCoeffs[0], 40, scaleCoeff);
      }
      delay(5000);   // Interrupts will keep the audio going
    }
    Bug or success reports on any of this would be appreciated would be appreciated on any of this. It is all a work in progress.

    Thanks, Bob

  2. #2
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Also, here is a frequency response for a octave band equalizer.
    Click image for larger version. 

Name:	OctaveBandEqualizer1.gif 
Views:	25 
Size:	50.7 KB 
ID:	20169

  3. #3
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    629
    Hi Bob,

    excellent work! Very nice and easy to use! As you requested, some first observations of a test with Teensy 4.1 & Teensy audio shield rev. D:

    * I had to adjust the input and volume levels of the codec
    * I added stereo support
    * lots of drizzling noise could be eliminated by setting the audio board codec to disable highpass filter (but that has nothing to do with your filter object, but with the hardware)
    * after that, the equalizer showed very good function and usage !
    * there is still some very silent audible noise present in silent passages --> would be good to compare this version with the float version in order to rule out effects of using int arithmetic!?
    * switching coeffs makes a faint click sound, probably unavoidable

    I wonder whether it would be good to substitute the FIR function with a floating-point FFT-iFFT convolution object for saving processor cycles?

    T4.1 (@600MHz) processor usage (stereo) is quite high
    Code:
    Proc = 6.21% (6.21%),  Mem = 4 (5)
    I put together the changes I mentioned above for your second example here:

    Code:
    /* TestEqualizer2Audio.ino  Bob Larkin 10 May 2020
     * This is a test of the Filter Equalizer for Teensy Audio.
     * Runs two different equalizers, switching every 5 seconds to demonstrate
     * dynamic filter changing.
     */
    
    #include <Audio.h>
    #include <AudioFilterEqualizer_I16.h>
    
    // Signals from left ADC to Equalizer to left DAC using Teensy Audio Adaptor
    AudioInputI2S           i2sIn;
    AudioFilterEqualizer    equalize1;
    AudioFilterEqualizer    equalize2;
    AudioOutputI2S          i2sOut;
    AudioConnection         patchCord1(i2sIn,     0, equalize1, 0);
    AudioConnection         patchCord1b(i2sIn,    1, equalize2, 0);
    AudioConnection         patchCord2(equalize1, 0, i2sOut,    0);
    AudioConnection         patchCord3(equalize2, 0, i2sOut,    1);
    AudioControlSGTL5000    codec;
    
    // Some sort of octave band equalizer as a one alternative, 10 bands
    float32_t fBand1[] = {    40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 2560.0, 5120.0, 10240.0, 22058.5};
    float32_t dbBand1[] = {10.0,  2.0, -2.0,  -5.0, -2.0,  -4.0,   -10.0,   6.0,    10.0,    -100};
    float32_t scaleCoeff = 32765.0f;    // Max allowed size of FIR coefficients; a smidge under 2^15
    
    // To contrast, put a strange bandpass filter as an alternative, 5 bands
    float32_t fBand2[] = {    300.0, 500.0, 800.0, 1000.0, 22058.5};
    float32_t dbBand2[] = {-60.0,  0.0,  -20.0,   0.0,  -60.0};
    
    int16_t equalizeCoeffs[200];
    int16_t k = 0;
    
    void setup(void) {
      AudioMemory(10); 
      Serial.begin(300);  delay(1000);
      Serial.println("*** Test Audio Equalizer ***");
      codec.enable();
      codec.inputSelect(AUDIO_INPUT_LINEIN); 
      codec.adcHighPassFilterDisable(); // necessary to suppress noise
      codec.lineInLevel(2);
      codec.volume(0.8);
    
      // Initialize the equalizer with 10 bands, fBand1[] 199 FIR coefficients
      // -65 dB sidelobes, 16384 Max coefficient value
      uint16_t eq = equalize1.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 60, scaleCoeff);
      if (eq == ERR_EQ_BANDS)
        Serial.println("Equalizer failed: Invalid number of frequency bands.");
      else if (eq == ERR_EQ_SIDELOBES)
        Serial.println("Equalizer failed: Invalid sidelobe specification.");
      else if (eq == ERR_EQ_NFIR)
        Serial.println("Equalizer failed: Invalid number of FIR coefficients.");
      else
        Serial.println("Equalizer initialized successfully.");
      eq = equalize2.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 60, scaleCoeff);
      if (eq == ERR_EQ_BANDS)
        Serial.println("Equalizer failed: Invalid number of frequency bands.");
      else if (eq == ERR_EQ_SIDELOBES)
        Serial.println("Equalizer failed: Invalid sidelobe specification.");
      else if (eq == ERR_EQ_NFIR)
        Serial.println("Equalizer failed: Invalid number of FIR coefficients.");
      else
        Serial.println("Equalizer initialized successfully.");
        for (k=0; k<200; k++) Serial.println(equalizeCoeffs[k]);
    }
    
    void loop(void) {
      // Change between two filters every 5 seconds.
      
      // To run with just the 10-band equalizer, comment out the entire loop with "/* ... */"
    
      if(k == 0) {
         k = 1;
         equalize1.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 65, scaleCoeff);
         equalize2.equalizerNew(10, &fBand1[0], &dbBand1[0], 200, &equalizeCoeffs[0], 65, scaleCoeff);
      }
      else  {  // Swap back and forth
          k = 0;
          equalize1.equalizerNew(5, &fBand2[0], &dbBand2[0], 190, &equalizeCoeffs[0], 40, scaleCoeff);
          equalize2.equalizerNew(5, &fBand2[0], &dbBand2[0], 190, &equalizeCoeffs[0], 40, scaleCoeff);
      }
      delay(10000);   // Interrupts will keep the audio going
        Serial.print("Proc = ");
        Serial.print(AudioProcessorUsage());
        Serial.print(" (");
        Serial.print(AudioProcessorUsageMax());
        Serial.print("),  Mem = ");
        Serial.print(AudioMemoryUsage());
        Serial.print(" (");
        Serial.print(AudioMemoryUsageMax());
        Serial.println(")");
        AudioProcessorUsageMaxReset();
        AudioMemoryUsageMaxReset();
    }

  4. #4
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Thanks, Frank. And as always, great comments. Thanks for running it on the T4. It was on my list, but I wanted to be sure that 3.6 was OK.

    It would be very useful and interesting to have a convolution version. I would want it to be a separate block to allow good comparisons.

    And the F32 version is running fine I won't post it to this thread, as it gets confusing as to which is which, but later today (Oregon time) I will get it posted separately to allow comparisons. A side issue with the F32 equalizer is that my measurements have it running quite a bit faster than the Q15. I have not fully understood that.

    73, Bob

  5. #5
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,919
    If your measurements are right it could be because the FPU adds a lot more float registers which can help the compiler to optimize it more.
    The FPU is very fast.

  6. #6
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Frank B here are microseconds, for F32 and Q15, for a 128 coefficient FIR with 128 updates. I played with this (see INO below) and it seems to be consistent and probably correct.

    F32 ARM FIR microseconds = 326
    Q15 ARM FIR microseconds = 746
    This is more than twice as fast for the F32. I looked at the C code for both, and they seem similar with emphasis on un-rolling the loops to get speed.

    I also ran simple, manually constructed nested loops to see how they did. I did not worry about getting into the right spot on the data, but did keep the right number of multiply and accumulates. This was interesting as the F32 code was quite a bit slower, but not the Q15 integer:
    F32 Manual FIR microseconds = 738
    Q15 Manual FIR microseconds = 827

    The test program was simple enough to not have errors, but maybe someone will find one!!
    Code:
    // Speed test of arm FIR for F32 and Q15
    #include <Audio.h>
    #include <arm_math.h>
    // #include <arm_const_structs.h>
    
    elapsedMicros tElapse;
    uint32_t t1, t2; 
    // *************** F32 *******************
    float32_t data[128];
    float32_t coef[128];
    float32_t dout[128];
    float32_t StateF32[256];
    arm_fir_instance_f32 fir_inst;
    
    // *************** Q15 ********************
    int16_t idata[128];
    int16_t icoef[128];
    int16_t idout[128];
    int16_t iStateF32[256];
    arm_fir_instance_q15 ifir_inst;
    
    void setup(void) {
        int i;
        Serial.begin(300); delay(1000);
    
        // For F32
        for (i = 0; i<128; i++)  data[i] = 0.1f+0.0001f*(float32_t)i;
        for (i = 0; i<128; i++)  coef[i] = 0.1f+0.00001f*(float32_t)i;
        arm_fir_init_f32(&fir_inst, 128, coef, &StateF32[0], 128);
    
        // For Q15
        for (i = 0; i<128; i++)  idata[i] = 10+i;
        for (i = 0; i<128; i++)  icoef[i] = 50+i;
        arm_fir_init_q15(&ifir_inst, 128, icoef, &iStateF32[0], 128);
    }
    
    void loop(void) {
        // ********************  F32  *********************************
        // Float32 ARM:
        t1 = tElapse;
        arm_fir_f32(&fir_inst, data, dout, 128);
        t2 = tElapse;
        Serial.print("F32 ARM FIR microseconds =   ");  Serial.println (t2 - t1);
    
        // Float32 manual, wrong but with the right number of multiply and adds
        t1 = tElapse;
        for(int j=0; j<128; j++) {
            dout[j] = 0.0f;
            for(int k=0; k<128; k++) {
               dout[j] += data[k] * coef[k];
            }
        }
        t2 = tElapse;
        Serial.print("F32 Manual FIR microseconds = ");  Serial.println (t2 - t1);
    
        // ***********************  Q15  ********************************
        // Integer 16 ARM:
        t1 = tElapse;
        arm_fir_q15(&ifir_inst, idata, idout, 128);
        t2 = tElapse;
        Serial.print("Q15 ARM FIR microseconds =   ");  Serial.println (t2 - t1);
    
        // Float32 manual, wrong but with the right number of multiply and adds
        t1 = tElapse;
        for(int j=0; j<128; j++) {
            idout[j] = 0.0f;
            for(int k=0; k<128; k++) {
               idout[j] += idata[k] * icoef[k];
            }
        }
        t2 = tElapse;
        Serial.print("Q15 Manual FIR microseconds = ");  Serial.println (t2 - t1);
        delay(2000);
    }
    Float 32 Routines for the Equalizer Below is a link to F32 versions of the equalizer. As most know, these require Chip's AudioStreaming arrangement for floating point. The zip includes just the .h and .cpp along with a support math set of files. This is not an Arduino library zip, I have a lot of organizing to get a whole group of F32 objects ready to go, so this is just the Equalizer, for those like Frank DD4WH, wanting to play with it. The Zip is at www.janbob.com/electron/teensy/AudioFilterEqualizer_F32.zip

    Bob

  7. #7
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Right at the bottom of the test code, I asked for a type conversion of 0.0f to int. idout[j] = 0.0f; should have been idout[j] = 0; The compiler knew what I meant, and so the answers didn't change when I fixed it.

  8. #8
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    629
    Quote Originally Posted by Bob Larkin View Post
    Float 32 Routines for the Equalizer Below is a link to F32 versions of the equalizer. As most know, these require Chip's AudioStreaming arrangement for floating point. The zip includes just the .h and .cpp along with a support math set of files. This is not an Arduino library zip, I have a lot of organizing to get a whole group of F32 objects ready to go, so this is just the Equalizer, for those like Frank DD4WH, wanting to play with it. The Zip is at www.janbob.com/electron/teensy/AudioFilterEqualizer_F32.zip

    Bob
    Hi Bob, thanks for the code on your homepage!
    However, I cannot manage to run it. I have the following problems:
    * I copied OpenAudio_ArduinoLibrary from chipaudettes github to my Arduino lib folder [however, I realized, that the lib is from 2017]
    * I copied the four files from your zip to the OpenAudio_ArduinoLibrary
    * I added reference to the *.h files to the OpenAudio_ArduinoLibrary.h
    * I added an #include <OpenAudio_ArduinoLibrary.h> to your example sketch
    * I modified the I2S input/output objects in the precompiler code
    * I modified "AudioFilterEqualizer" in the code to "AudioFilterEqualizer_F32"

    I get loads of errors/missing library code etc.

    Am I still missing something? Are you using the same OpenAudio_ArduinoLibrary as I am?

  9. #9
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Hi Frank - I just re-checked with, I believe, the setup you describe. The AudioFilterEqualizer_F32 compiled and ran. I believe my OpenAudio_Arduino is per Chip's https://github.com/chipaudette/OpenAudio_ArduinoLibrary

    What you describe sounds like what I experienced when I tried to use the Tympan Library. There, the collisions were primarily between SdFat_Gre #defines and the Teensy SD Library. The OpenAudio_Arduino does not have those library elements.

    The missing library issue might be the Bessel function. I included that directly into the private area of the Q15 equalizer, to be simple. But for F32 it is in a re-usable library mathDSP_F32.h and .cpp Those were in the Zip.

    I forgot to include an INO that runs for me. It is now at http://www.janbob.com/electron/teens...Equalizer1.ino

    Otherwise, we need more detail on the errors.

    Bob

  10. #10
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    629
    Hi Bob,

    really struggling here with the OpenAudio lib . . . there are several issues:

    * the codec code for tvl320 seems redundant, it is present in the standard audio lib AND in the OpenAudio lib, one of them has to be eliminated
    * I used a T4.1 . . . ;-). So it is pretty clear that code from 2017 cannot run with that, my fault!
    * with T3.6, I can compile! However, not tested yet
    * with T4.1, I need to rework the I2S input and output routines, because T4.x works quite differently in that respect. I will try that tomorrow (copy I2S T4.x code from original audio lib to OpenAudio lib) and report back!

  11. #11
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Hi Frank - Yes, I get it now! Fundamentally, the AudioInputI2S_F32 and AudioOutputI2S_F32 are not converted yet to handle T4.x. I had done all my testing on T3.6. I was able to get it to compile and run with a T4.0 and a rev D Audio Adaptor. Something like:
    * Remove input_i2s_f32.h and .cpp
    * Remove output_i2s_f32.h and .cpp
    and because of need for simple fixes
    * Remove synth_pinknoise_f32.h and .cpp
    * Remove synth_whitenoise_f32.h and .cpp

    *Then use the regular Teensy integer I2S input and output objects along with Chip's I16toF32 and F32toI16 conversion routines
    Code:
    AudioInputI2S            i2sIn;
    AudioConvert_I16toF32    convertItoF1;
    AudioConvert_I16toF32    convertItoF2;
    AudioFilterEqualizer_F32 equalize1;
    AudioFilterEqualizer_F32 equalize2;
    AudioConvert_F32toI16    convertFtoI1;
    AudioConvert_F32toI16    convertFtoI2;
    AudioOutputI2S           i2sOut;
    AudioConnection        patchCord1(i2sIn,        0, convertItoF1, 0);   
    AudioConnection        patchCord3(i2sIn,        1, convertItoF2, 0);
    AudioConnection_F32    patchCord5(convertItoF1, 0, equalize1,    0);
    AudioConnection_F32    patchCord7(convertItoF2, 0, equalize2,    0);
    AudioConnection_F32    patchCord9(equalize1,    0, convertFtoI1, 0);
    AudioConnection_F32    patchCordB(equalize2,    0, convertFtoI2, 0);
    AudioConnection        patchCordD(convertFtoI1, 0, i2sOut,       0);   
    AudioConnection        patchCordF(convertFtoI2, 0, i2sOut,       1);
    AudioControlSGTL5000   codec;
    I think this works. The resources are showing, first a 100 tap and and then a 200 tap FIR with stereo using basically your modified INO
    Code:
    Proc = 4.21 (7.24),  Mem = 4 (4)
    Proc = 7.24 (7.25),  Mem = 4 (4)
    It needs work, but this shows promise. Bob

  12. #12
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Going back to the processor load question, I looked at the T4.0 F32 numbers and they are interesting. First though, one should not try to generalize from these measurements. These numbers are based on measuring the microseconds required to update N FIR taps 128 times in a block. This activity is intense with "multiply and add" operations that are common with DSP and thus optimized in processors/compilers. I used 3 values of N like 30, 100, 200 and fit a straight line through the data. It was VERY close to a straight line, so this seems valid. The slope of the line gives the number of microseconds per tap, i.e., about the number of microseconds for 128 multiply and accumulates. The overhead is the intercept of the straight line and it is relatively small, in all cases. Here are the x128 data point update() times:
    T3.6, Fixed Point 5.6 microseconds per FIR tap
    T3.6, Floating Point 2.5 microseconds per FIR tap
    T4.0, Floating Point 0.44 microseconds per FIR tap
    This really useful speed increase shows that it is not just processor speed involved. This is seen more easily by dividing the microseconds by 128 to get (after inverting) roughly the number of multiply and adds possible per second.
    T3.6, Fixed Point, 22.8E6 / sec
    T3.6, Floating Point 51E6 / sec
    T4.0, Floating Point 290E6 / sec
    and if you divide that into the the processor instruction speed, you get a number that looks like instruction cycles per multiply and add:
    T3.6, Fixed Point, about 7
    T3.6, Floating Point about 3
    T4.0, Floating Point about 2
    Or so it seems. I left out the details here to see the big picture, but I'm happy to share if anyone is in need.
    Bob

  13. #13
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,919
    Here are the official numbers for floating point (That document is superseeded - I just found no newer one)
    http://infocenter.arm.com/help/index.../BEHJADED.html

    float multiply-add needs 3 cycles

  14. #14
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    I left out T4.0 fixed point. It is 0.41 microseconds per FIR tap, essentially the same as for Floating Point. Dazzling for both...

  15. #15
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,919
    found the list for non-fpu instructions:
    http://infocenter.arm.com/help/index.../BABBCJII.html

    Multiply accumulate : 1 cycle

    So there must be other effects. Esp if we keep in mind, that the T4 can do 2 instructions per cycle...

  16. #16
    Senior Member
    Join Date
    Dec 2016
    Posts
    131
    Thanks, Frank B. That 3 cycles (T3.6 Cortex-M4) shows that the ARM FIR routine is getting about all there is!

    For the T4.0 some combination of hardware, programmer and compiler has caught the fixed point times up to the float and both are very good.

  17. #17
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    454
    My desktop does a 65536 tap fir filter on 44 kHz stereo with 1.5% cpu (with BruteFIR). But I take it that a T4 couldn't do it, even at 100% cpu?

  18. #18
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,250
    Quote Originally Posted by jonr View Post
    But I take it that a T4 couldn't do it, even at 100% cpu?
    Nope, at least not directly. Perhaps it could be done using FFT & IFFT? (my guess is that's what the PC software is really doing)

    As a quick test to measure the direct way, I edited this line in filter_fir.h:

    Code:
    #define FIR_MAX_COEFFS 200
    Then I ran this trivial test program to run a filter filled with random data and print the CPU usage.

    Code:
    #include <Audio.h>
    
    AudioInputI2S    audioInput;
    AudioFilterFIR   myFilter;
    AudioOutputI2S   audioOutput;
    AudioControlSGTL5000 audioShield;
    AudioConnection c1(audioInput, 0, myFilter, 0);
    AudioConnection c2(myFilter, 0, audioOutput, 0);
    AudioConnection c3(myFilter, 0, audioOutput, 1);
    
    
    const int myInput = AUDIO_INPUT_LINEIN;
    
    int16_t coeffs[11000];
    
    void setup() {
      while (!Serial) ; // wait for serial monitor
      Serial.println("FIR filter CPU usage test\n");
      AudioMemory(20);
      audioShield.enable();
      audioShield.inputSelect(myInput);
      audioShield.volume(0.5);
      for (unsigned int i=0; i < sizeof(coeffs)/2; i++) {
        coeffs[i] = random();
      }
      myFilter.begin(coeffs, sizeof(coeffs)/2);
    }
    
    void loop() {
      static elapsedMillis msec;
      if (msec > 1000) {
        Serial.printf("CPU %.2f, Max %.2f\n",
          AudioProcessorUsage(), AudioProcessorUsageMax());
        msec = 0;
      }
    }
    Running Teensy 4.1 at 600 MHz, 11000 coefficients results in 97.5% CPU usage.

    Running at 912 MHz overclock, I could increase to 16800 coefficients for 97.9% CPU usage.

    Might also be worth mentioning the obvious delay effect. 16800 coefficients at 44100 Hz sample rate means the filter output is delayed by 0.38 seconds.

  19. #19
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    454
    Thanks. Some background on it is here: https://torger.se/anders/brutefir.html#bruteconv

    I use it for digital room correction (DRC) of speakers - which I highly recommend to anyone interested in sound quality. I use room equalization wizard (REW) to generate the impulse response then DRC (http://drc-fir.sourceforge.net/) to generate correction filters from the impulse response. There is some delay (eg, 100 msec) when processing with brutefir, but not an issue for playback.

    Even if one doesn't use convolution, REW is quite useful for equalizing speakers. It supports various equalizers - possibly it could generate filters for a teensy to apply.

  20. #20
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    454
    Looks like it's an application that needs floating point SIMD instructions.

  21. #21
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    629
    Uniformly Partitioned Convolution may be useful to reduce latency for large FIR filters.

    You can find Teensy 4 code for that in my github, the link is here:

    https://forum.pjrc.com/threads/57267...l=1#post220899

    If I remember correctly, it can be used with up to 20,000 coeffs (memory is the limit, NOT the MCU speed ;-)). [65,535 taps is not possible with this method, sorry :-)]

    Just provide the impulse response. Brian Millier has measured a latency of 15ms for an 8192tap FIR filter, but the latency should always be the same regardless of how many taps you use, because the code always uses a 128 sample block.

    Just be sure you use minimum phase coefficients to achieve that kind of latency, otherwise the latency will be no. of taps / 2.

    The theory is explained here with a bunch of references:

    https://github.com/df8oe/UHSDR/wiki/...ed-Convolution

  22. #22
    Senior Member
    Join Date
    Oct 2015
    Location
    Vermont, USA
    Posts
    252
    Regarding this side-track about super-long FIRs...why would someone want a 65536-tap FIR for audio processing? Even running at a 96 kHz sample rate, this gives a resolution of about 1.5 Hz (or maybe closer to 2.5 Hz if using a traditional FIR design method employing windowing). That is some seriously high resolution. Except for tuning your system's response in the deep bass (say, < 60 Hz), that seems super excessive to me.

    The downside of an overly-long FIR is that you might introduce audible smearing the time domain. If you don't have radical changes in response as a function of frequency, you won't have this problem, but it's still a risk. Just having the risk makes would make me uncomfortable in pursuing a super-long FIR for room correction. I tend to dislike any situation where my "cure" (ie, a long FIR) has the possibility of being more detrimental than "disease" (ie, the imprecision in one's room correction resulting from using a shorter FIR).

    Clearly I don't understand the goals of a super-long FIR. I'm not doubting that one can find them useful...I'm legitimately curious as to why you feel that these super-long FIRs are superior to shorter FIRs for your application?

    (Bonus points if your explanation also says why a bank of 1/3rd-octave IIR filters would also be inadequate!)

    Chip

  23. #23
    Senior Member
    Join Date
    Oct 2015
    Location
    Vermont, USA
    Posts
    252
    To partly answer my own question, I know that long FIR filters can have a role in creating reverb/echo effects. For example, if you want to go out and actually measure the reverb/echo in a real physical space (like a church or concert hall), you can capture that information as an impulse response. You can choose to apply that impulse response by using a super-long FIR. For many physical spaces, even 65K taps wouldn't be long enough! So, yeah, I get the need for super-long FIRs in that application.

    Similarly, simulating mic'd guitar cabinets...that's often done these days by measuring the impulse responses of an actual guitar cabinet with an actual microphone. Like with reverb/echo, this impulse response could be applied as an FIR. In the case of simulating a mic'd guitar cabinet, 65K taps seems a bit long, but I have heard of people doing fairly long FIRs for this purpose, so I can get that.

    But super-long FIRs for room correction? Very curious to hear the thinking!

    Chip

  24. #24
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    629
    Chip,

    you already answered part of your own question. Some add-ons from my personal view:

    * commercially distributed guitar cabinet impulses have different lengths, some of them up to 0.5 seconds, and they often have higher sample rates than standard 44.1kHz thus further increasing the number of coeffs. And, you really need partitioned convolution (and minimum phase coeffs!) to filter that with low latency (< 20msec)
    * as you already said, reverb IR have much higher length, in the order of some seconds, so these IRs are even longer (and cannot yet be played with the T4, because of memory constraints: I tried using PSRAM, but it is not fast enough in order to be useable for low latency partitioned convolution filters)
    * for Software Defined Radio, I need steep skirt filtering, but that is in the order of 4096 or 8192 coeffs, anything larger is overkill. So no need for superlarge filters there
    * I also see the interest in long FIR filter performance of the T4 as a test of performance in terms of CPU power and memory (without having a particular application in mind), and it is (for me) also educational to see the large advances (in terms of CPU load) of convolution filtering compared to classical FIRs

    But I think we are getting off the track of this thread.

    So, for the Audio Equalizer by Bob Larkin and in general for your OpenAudioLib, it would be very handy to have a FIR convolution filter block (well, maybe two: one real and one complex) in order to substitute the standard FIR filtering with convolution filtering. As far as I know from the literature and from my own experience, convolution is computationally less intense whenever you use a number of taps less than about 60 (but that figure depends largely on the lib/optimizations you use). BUT (and thats the reason why your OpenAudioLib is really awesome!) we need float processing, it cannot be done in fixed point. And maybe it would also be worth thinking about using a different FFT lib, because the CMSIS lib is restricted to a maximum of 4096 point FFT.

    All the best,
    Frank DD4WH

  25. #25
    Senior Member
    Join Date
    Jul 2020
    Posts
    135
    That's the complex FFT limit, you can synthesize a real FFT of size N using a complex FFT of size N/2, so 8192 point real FFT can be done,
    though I'm not sure arm_rfft_fast_f32 goes above 4096, in theory it should be able to use 4096 sized complex call and thus handle 8192.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •