FIR taps and use of CMSIS arm_fir_q15

Status
Not open for further replies.

PhilGraham

Hey all,

  1. I'm new here and haven't even ordered hardware yet.
  2. I searched the forum before asking, so as not to reinvent the wheel.
  3. I posed this question to Paul back during the Teensy 3.6 Kickstarter, and he suggested I post it here.

Background:
A very common FIR filter size in the pro audio world is 384 taps at a 48 kHz sample rate with 16-bit depth. I want to see whether the Teensy 3.6 can process an FIR this big in real time, in either integer or floating point. I have attached an Excel file with a representative set of taps (taken from a correction curve applied at a real event).

Questions:
  1. Can someone test these filter taps for me to see the processing overhead on a 3.6?
  2. Looking at the CMSIS documentation, there is a function "arm_fir_q15" which has a 64-bit accumulator. Can I simply swap this into "Filter_Fir"? I'd rather not have to apply 8 bits of input scaling to keep the accumulator from wrapping.
  3. How much of a performance penalty is there for using "arm_fir_q15" versus "arm_fir_fast_q15"?
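
On question 2, one rough way to judge how much headroom a tap set needs is its worst-case gain: the sum of the absolute tap values, with 32768 counted as 1.0. The sketch below is plain C++ for a desktop compiler, not Teensy or CMSIS code, and `worstCaseGainQ15` and `headroomBits` are hypothetical helper names made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Worst-case linear gain of a Q15 tap set: sum of |h[k]| with 32768 == 1.0.
// A full-scale input with the "worst" sign pattern can reach this gain.
double worstCaseGainQ15(const int16_t *taps, size_t numTaps) {
  int64_t sum = 0;
  for (size_t k = 0; k < numTaps; k++) {
    // Widen first so that abs(-32768) cannot overflow a 16-bit value.
    sum += std::abs(static_cast<int32_t>(taps[k]));
  }
  return static_cast<double>(sum) / 32768.0;
}

// Number of right-shift bits on the input that keeps the worst-case
// output within full scale (0 if the gain is already 1.0 or less).
int headroomBits(double gain) {
  if (gain <= 1.0) return 0;
  return static_cast<int>(std::ceil(std::log2(gain)));
}
```

The first ten taps of the attached filter already sum to 56249 in magnitude, a worst-case gain of about 1.72, so this tap set needs at least one bit of headroom before the remaining 374 taps are even counted. Real program material rarely hits the worst case, so treat the result as an upper bound.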

Thanks in advance,
 

Attachments

  • 384 taps FIR.xls
Good news. It works. Well, at least CPU-wise. I didn't pass any audio through the filter... just did a quick CPU usage test.

First, though, you need to edit this line in the library to raise the maximum filter size:

https://github.com/PaulStoffregen/Audio/blob/master/filter_fir.h#L34
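
A sketch of what that edit might look like. The macro name `FIR_MAX_COEFFS` and the stock value of 200 are assumptions from memory of the Audio library, so check the linked line in your own copy before changing anything.

```cpp
// filter_fir.h in the Audio library.
// ASSUMPTION: the limit is a macro named FIR_MAX_COEFFS with a stock value of 200.
// #define FIR_MAX_COEFFS 200   // stock limit, too small for a 384-tap response
#define FIR_MAX_COEFFS 384      // raised so fir1.begin(BigFilterResponse, 384) is accepted
```

If the constant also sizes the filter's internal state buffer, as it appears to, RAM use grows by roughly two bytes per added tap for each FIR object.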

I copied your 384 tap filter into this simple test:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

// GUItool: begin automatically generated code
AudioInputI2S            i2s1;           //xy=218,105
AudioFilterFIR           fir1;           //xy=379,110
AudioOutputI2S           i2s2;           //xy=545,112
AudioConnection          patchCord1(i2s1, 0, fir1, 0);
AudioConnection          patchCord2(fir1, 0, i2s2, 0);
AudioConnection          patchCord3(fir1, 0, i2s2, 1);
AudioControlSGTL5000     sgtl5000_1;     //xy=359,183
// GUItool: end automatically generated code

const int myInput = AUDIO_INPUT_LINEIN;
//const int myInput = AUDIO_INPUT_MIC;

// Huge 384 tap filter from:
// https://forum.pjrc.com/threads/41114-FIR-taps-and-use-of-CMSIS-arm_fir_q15
const int16_t BigFilterResponse[384] = {
  17525, 15233, -8989, 157, 2735, 987, 5595, -1294, 313, 3421,
  -68, -2135, 783, 1939, -447, -870, -227, -1084, -878, 425,
  227, -363, -122, -415, -700, 186, 577, -242, -540, -149,
  -129, -137, 266, 651, 787, 556, 107, 164, 655, 730,
  474, 312, 43, -84, 245, 288, -166, -169, 203, 61,
  -228, -8, 185, -43, -127, 191, 403, 163, -137, 65,
  503, 487, 244, 382, 495, 130, -81, 215, 368, 126,
  -58, -54, -25, 24, 83, 122, 144, 75, -14, 76,
  197, 122, 55, 139, 157, 90, 113, 139, 61, -13,
  -36, -41, -17, -16, -67, -89, -106, -172, -167, -80,
  -106, -229, -264, -219, -189, -171, -176, -197, -185, -174,
  -175, -110, -44, -89, -130, -62, -5, -16, -5, 21,
  16, 19, 41, 55, 54, 21, -29, -32, -19, -50,
  -66, -44, -70, -120, -99, -63, -85, -114, -119, -124,
  -121, -111, -112, -102, -87, -92, -86, -63, -75, -102,
  -96, -99, -124, -118, -88, -83, -91, -94, -95, -79,
  -55, -59, -63, -37, -23, -25, -5, 14, 12, 18,
  35, 44, 51, 56, 52, 61, 71, 60, 59, 85,
  86, 61, 59, 68, 61, 57, 63, 63, 63, 68,
  71, 80, 85, 72, 69, 91, 97, 84, 85, 90,
  80, 75, 78, 73, 70, 66, 52, 47, 53, 49,
  45, 52, 49, 37, 39, 43, 33, 23, 21, 22,
  29, 29, 16, 13, 18, 5, -9, 1, 8, -1,
  1, 8, 4, 0, 2, 0, -3, -6, -10, -6,
  -1, -9, -13, -2, -4, -18, -16, -8, -15, -22,
  -22, -26, -31, -31, -29, -23, -17, -21, -20, -6,
  0, -1, 9, 17, 15, 21, 33, 36, 39, 47,
  46, 41, 44, 46, 46, 49, 49, 46, 46, 43,
  35, 38, 41, 30, 23, 30, 29, 21, 20, 22,
  21, 22, 18, 12, 18, 22, 10, 5, 12, 9,
  -2, -3, 1, -4, -13, -18, -16, -17, -27, -32,
  -27, -31, -44, -42, -32, -37, -42, -36, -38, -44,
  -39, -36, -43, -43, -40, -43, -46, -49, -53, -50,
  -48, -54, -52, -41, -41, -44, -34, -29, -35, -30,
  -23, -29, -26, -18, -25, -26, -15, -20, -30, -17,
  -10, -25, -25, -11, -17, -26, -14, -9, -20, -20,
  -10, -13, -21, -23, -19, -15, -20, -27, -18, -6,
  -16, -24, -9, -3, -14, -13, -2, -6, -10, -4,
  -3, -8, -4, -5};


void setup() {
  // Audio connections require memory to work.  For more
  // detailed information, see the MemoryAndCpuUsage example
  AudioMemory(12);

  // Enable the audio shield, select input, and enable output
  sgtl5000_1.enable();
  sgtl5000_1.inputSelect(myInput);
  sgtl5000_1.volume(0.5);

  // Start the FIR filter
  fir1.begin(BigFilterResponse, 384);
}

elapsedMillis msec=0;

void loop() {
  if (msec > 2000) {
    msec = 0;
    Serial.print("CPU = ");
    Serial.print(AudioProcessorUsage());
    Serial.print(" (");    
    Serial.print(AudioProcessorUsageMax());
    Serial.println(" max)");
  }
}

When running on Teensy 3.6 at 180 MHz, it prints this:

Code:
CPU = 15.48 (15.61 max)

Here's what I get when running on Teensy 3.2 at 96 MHz:

Code:
CPU = 32.03 (32.18 max)

I didn't actually listen to the results... just ran this with the input & output disconnected. But as a quick sanity check, it looks like both boards can run this filter without any trouble.
 
Those results were with arm_fir_fast_q15().

I tried switching to the slower arm_fir_q15(). Teensy 3.2 can't do 384 taps with it. I found that 220 taps already results in 92.5% CPU usage.

Teensy 3.6 is able to run 384 taps with arm_fir_q15(). It consumes 76.6% CPU.

Again, these are just blind (or deaf) CPU usage tests using whatever small amount of noise is present from the SGTL5000 ADC with its inputs unconnected. I didn't do anything to confirm the filter is really correct, not clipping, or actually sounds however it's supposed to sound.
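
To compare the two functions across boards, the CPU percentages above can be turned into approximate cycles per tap. This is a back-of-envelope sketch in plain C++: `cyclesPerTap` is a hypothetical helper, it assumes a sample rate of 44100 Hz (roughly the Audio library's rate, not the 48 kHz in the original question), and it charges all audio-library overhead to the filter.

```cpp
// Approximate CPU cycles spent per filter tap per output sample.
//   cpuPercent   : value printed by AudioProcessorUsage()
//   clockHz      : CPU clock frequency in Hz
//   sampleRateHz : audio sample rate in Hz (assumed, see text)
//   numTaps      : FIR length
double cyclesPerTap(double cpuPercent, double clockHz,
                    double sampleRateHz, int numTaps) {
  double cyclesPerSecond = (cpuPercent / 100.0) * clockHz;  // cycles used by audio
  double tapsPerSecond = sampleRateHz * numTaps;            // multiply-accumulates needed
  return cyclesPerSecond / tapsPerSecond;
}
```

With the figures above this gives about 1.6 cycles per tap for arm_fir_fast_q15() on Teensy 3.6 (15.48% at 180 MHz, 384 taps), about 1.8 on Teensy 3.2 (32.03% at 96 MHz, 384 taps), and about 8.1 for arm_fir_q15() on Teensy 3.6 (76.6% at 180 MHz, 384 taps), so the 64-bit accumulator version costs very roughly five times as much per tap in these tests.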
 
I'm curious to hear your experience, if you actually try using these long filters.

Does arm_fir_q15() vs arm_fir_fast_q15() make any difference in practice? Or does it only matter if your impulse response is scaled so the output has gain? I really wish I had a lot more dev time available to really explore such things... but with a ton of other more urgent stuff to do, I really depend on feedback from people like you. ;)
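
To make the accumulator question concrete, here is a desktop C++ model of one output sample computed two ways: with a 64-bit accumulator that saturates at the end, and with a 32-bit accumulator that wraps. It is only an illustration of the arithmetic, not the CMSIS source, and `firSampleWide` and `firSampleNarrow` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Clamp a wide value to the signed 16-bit (Q15) range.
static int16_t saturateQ15(int64_t v) {
  if (v > 32767) return 32767;
  if (v < -32768) return -32768;
  return static_cast<int16_t>(v);
}

// One output sample using a 64-bit accumulator: the sum cannot overflow
// for any realistic tap count, and the result is saturated to Q15.
int16_t firSampleWide(const int16_t *coeffs, const int16_t *history, size_t numTaps) {
  int64_t acc = 0;
  for (size_t k = 0; k < numTaps; k++) {
    acc += static_cast<int32_t>(coeffs[k]) * history[k];
  }
  return saturateQ15(acc >> 15);
}

// One output sample using a 32-bit accumulator that wraps on overflow.
// Unsigned arithmetic is used so the wrap is well defined in C++; the
// cast back to signed wraps on mainstream compilers (guaranteed from C++20).
int16_t firSampleNarrow(const int16_t *coeffs, const int16_t *history, size_t numTaps) {
  uint32_t acc = 0;
  for (size_t k = 0; k < numTaps; k++) {
    int32_t product = static_cast<int32_t>(coeffs[k]) * history[k];
    acc += static_cast<uint32_t>(product);
  }
  int32_t wrapped = static_cast<int32_t>(acc);
  return saturateQ15(wrapped >> 15);
}
```

With small signals and modest gain the two functions return the same value. Once the running sum exceeds the 32-bit range, the wide version clips at full scale while the narrow version wraps to a wrong and possibly opposite-sign value, which is the case where the choice of function would be audible.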
 
Paul,

It will be a while before I can check concretely. I'll let you know. I ordered a 3.6 today. As for tap length, 384 is common in pro audio because the delay overhead is generally not a substantial concern.

Given the high-performance floating-point FFT testing elsewhere on the forum, I would be interested to know whether a floating-point implementation of FIR filters would allow even more taps.

For these loudspeaker applications there is generally a mix of IIR, minimum-phase, and non-minimum-phase filtering needs.
 