Different-Range FFT Algorithm

Is your application music?
What speed do you need between samples?
If you can tolerate 70 ms per sample, duff's guitar tuner would be simple and accurate, down to 20Hz.

How accurate are you wanting to be?
How low (in Hz) do you need to detect?

And Happy New Year!
 
If I have a single-order analog low-pass filter (at around 1 kHz) on the signal before I pass it to the Teensy, would that obviate using a digital filter

No. A single order filter isn't even close to what you'd need. Maybe a 6th or 8th order filter would be in the right range. Maybe...
 
Okay. I think I'll investigate both FFT decimation per Mr. Stoffregen's suggestion and using the guitar tuner code. I'll just have to see what works better. Again, thanks!

Wow. I had no idea a single order filter would be such a bad idea. I guess digital really is the way to go, then. I don't want to physically Make an eighth-order low-pass filter haha.
 
@lychrel, I don't know that I got a response...
How precise (in percent, or in Hz) do you need to be, and in what low and high range (Hz)?
 
@lychrel, I don't know that I got a response...
How precise (in percent, or in Hz) do you need to be, and in what low and high range (Hz)?

I'd like to be, essentially, as precise as possible. (Definitely more precise than 47-Hz bins, though I guess I could interpolate...)

My range is 0 to 1 kHz.
 
I'd like to be, essentially, as precise as possible.

Have you run the File > Examples > Audio > Analysis > NoteFrequency example yet? How about with the your own sounds instead of Duff's guitar?
 
Have you run the File > Examples > Audio > Analysis > NoteFrequency example yet? How about with the your own sounds instead of Duff's guitar?

Actually, I got stuck running it because of a "no serial port name defined" error. I trouble-shot from this post, and that got rid of the error, but I still couldn't get any serial output.
 
@lychrel, I process the fft1024 results (a set of 512 bins, each representing a 43Hz chunk of the spectrum.)
With sign-wave input, I can extract pitch within a percent or two.

Signal complexity and cleanliness will affect performance.

Notefreq delivers great results as well. Usually more accurate, and slower.

Still having trouble getting serial debugging working?

Are you on a Mac?
 
but I still couldn't get any serial output.

Teensy becomes a serial devices a few seconds *after* the upload completes.

Before you upload again, turn off the "auto" mode in the Teensy Loader window. Then press the button on your Teensy. You'll see the Teensy Loader will detect the board, but programming won't happen because auto mode is off. During this time, Teensy will *NOT* be a serial device.

While Teensy is in this mode, click Arduino's Tools > Ports menu. Make a mental note (or a screenshot) of the ports. Those are the ones which are *not* Teensy. Windows is the hardest, since they all look the same with only the number being different.

Then click Upload again. After Teensy Loader completes the programming and you see "reboot ok", wait a few seconds. Click the Tools > Ports menu. You should see a new port appear (on Macs, you sometimes get 2 of them, one with "tty" and one with "cu"... either works from Arduino, but some other software only works with the "cu" one). Select the new port that you know is Teensy. Then open the serial monitor.
 
Hey everyone!

I'm trying to implement Paul's second suggestion in post #3 of this thread, except with a decimation of two. I've filtered the incoming data well enough, but I'm having trouble using the arm_cfft_radix4_q15() function in my sketch. I expect this has to do with my copying and pasting from .h/.cpp files without fully understanding the code. I hope someone can help me out!

Here's my arduino sketch:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>
// Found a suggestion by Paul in a forum to do this
extern "C" {
  #include <../Audio/utility/sqrt_integer.h>
  #include <../Audio/utility/dspinst.h>
}

// Filter the incoming audio from the mic
and pass it to the queue (also, play it through the mic)
AudioInputI2S            i2s1;           //xy=83,98
AudioFilterBiquad        biquad1;        //xy=273,100
AudioRecordQueue         queue1;         //xy=448,100
AudioOutputI2S           i2s2;           //xy=449,189
AudioConnection          patchCord1(i2s1, 0, biquad1, 0);
AudioConnection          patchCord2(biquad1, queue1);
AudioConnection          patchCord3(biquad1, 0, i2s2, 0);
AudioConnection          patchCord4(biquad1, 0, i2s2, 1);
AudioControlSGTL5000     sgtl5000_1;     //xy=355,391



void setup() {
  AudioMemory(60);

  sgtl5000_1.enable();  // Enable the audio shield
  sgtl5000_1.inputSelect(AUDIO_INPUT_MIC);
  sgtl5000_1.volume(0.5);

  // Butterworth filter, 12 db/octave
  biquad1.setLowpass(0, 800, 0.707);

  queue1.begin();

}

int i;
// initialize the buffer that will be passed to arm_cfft_radix4_q15()
int buffer2[1024];
// used for filling up buffer2
int count = 0;

void loop() {
  if (queue1.available() >= 2) {
    byte buffer[512];
    // Fetch 2 blocks from the audio library and copy
    // into a 512 byte buffer.  The Arduino SD library
    // is most efficient when full 512 byte sector size
    // writes are used.
    memcpy(buffer, queue1.readBuffer(), 256);
    queue1.freeBuffer();
    memcpy(buffer + 256, queue1.readBuffer(), 256);
    queue1.freeBuffer();
    
    // Copy every 2nd int from the buffer into new buffer
    for (i = 0; i < 512; i++) {
      // check if index is a multiple of 2
      if ((i % 2) == 0) {
      // if it is, copy the number into the new buffer, using new counting system
      buffer2[count] = buffer[i];
      count = count + 1;
      }
    }
    
    // Once new buffer fills, run it through arm_cfft_radix4_q15()
    if (count == 1024) {
      arm_cfft_radix4_q15(&fft_inst, buffer);
      for (int i=0; i < 512; i++) {
        uint32_t tmp = *((uint32_t *)buffer + i); // real & imag
        uint32_t magsq = multiply_16tx16t_add_16bx16b(tmp, tmp);
        output[i] = sqrt_uint32_approx(magsq);
      }
    }
  }
}

This yields the error:

Code:
BiquadFilter_ManualFFT_Test: In function 'void loop()':
BiquadFilter_ManualFFT_Test:69: error: 'fft_inst' was not declared in this scope
       arm_cfft_radix4_q15(&fft_inst, buffer);
                            ^
BiquadFilter_ManualFFT_Test:73: error: 'output' was not declared in this scope
         output[i] = sqrt_uint32_approx(magsq);
         ^
'fft_inst' was not declared in this scope

What must I defined for "fft_inst" and "output" in order to avoid these errors and get my code to work? I've been unable to find the origin of "fft_inst" in the github Audio Library, or understand it's function. I've checked the CMSIS documentation but I've found little help there.

Should I simply initialize "output" with

Code:
int output[];

?

Thanks in advance!

David
 
I don't see any call to the initialization function
arm_cfft_radix4_init_q15 ( arm_cfft_radix4_instance_q15 * S, uint16_t fftLen, uint8_t ifftFlag, uint8_t bitReverseFlag )
to establish an instance of the the arm_cfft_radix4_q15 and set up the FFT parameters.

For example, in the class declaration in analyze_fft1024.h you will find in the initialization:
arm_cfft_radix4_init_q15(&fft_inst, 1024, 0, 1);
and then in the private declarations at the bottom:
arm_cfft_radix4_instance_q15 fft_inst;

You need to do something similar in your code:

Code:
    blah
    arm_cfft_radix4_instance_q15 myFFT; // declare myFFT
    blah

    void setup(){
      blah
      blah
      arm_cfft_radix4_init_q15(&myFFT, 1024, 0, 1);    // initialize
      blah
    }

    void loop(){
      blah
      arm_cfft_radix4_q15(&myFFT, buffer);                // do it to it
      blah
    }
 
To add to my previous reply, here is a development sketch I put together for test purposes last month. Note that it uses a lot of my own library functions (not included), but you should get the idea of initializing the arm_math ffts..
Code:
// Prototyping sketch for quadrature spectrum analyzer using complex "grabber" 
// Creates a pair of quadrature sinusoids, grabs complex blocks of 256 samples, windows and uses
// arm_math fft, reorders the data, computes the log_2 of magsq, smooths the results with first-order IIR filter,
// and  displays spectrum -22.05kHz to +22.05kHz on graphics board.
// Derek Rowell   July 2016
// 
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>
#include "grabber_complex_256.h"
#include "sdrMath.h"
const int myInput = AUDIO_INPUT_LINEIN;
//
AudioInputI2S            audioInput; 
AudioSynthWaveformSine   I_input1;             // quadrature sinusoid 1         
AudioSynthWaveformSine   Q_input1;          
AudioSynthWaveformSine   I_input2;             // quadrature sinusoid 2         
AudioSynthWaveformSine   Q_input2; 
AudioMixer4              mixer1;
AudioMixer4              mixer2;
AudioGrabberComplex256   myData;
//
AudioConnection          c1(I_input1, 0, mixer1, 0);
AudioConnection          c2(I_input2, 0, mixer1, 1);
AudioConnection          c3(Q_input1, 0, mixer2, 0);
AudioConnection          c4(Q_input2, 0, mixer2, 1);
AudioConnection          c5(mixer1,   0, myData, 0);
AudioConnection          c6(mixer2,   0, myData, 1);
AudioControlSGTL5000     audioShield;
//
arm_cfft_radix2_instance_q15 myFFT;
int16_t FFTbuffer[512];
int16_t log2Mag[256];

//----------------------------------------------------------------------------
void setup() {
  Serial.begin(57600);
  delay(2000);
  audioShield.enable();
  AudioMemory(12);
  audioShield.volume(0.5);
  audioShield.inputSelect(myInput);
//
  uint16_t Frequency1 = 19000;  float Amplitude1 = 0.6;
  uint16_t Frequency2 = 18000;  float Amplitude2 = 0.6/8.0;  // log difference should be 3
  AudioNoInterrupts();                     // Set up a pair of quadrature sinusoids
  mixer1.gain(0, 1.0);  mixer1.gain(1, 1.0);
  mixer2.gain(0, 1.0);  mixer2.gain(1, 1.0);
  I_input1.frequency(Frequency1); I_input1.amplitude(Amplitude1); I_input1.phase(45);
  Q_input1.frequency(Frequency1); Q_input1.amplitude(Amplitude1); Q_input1.phase(315);
  I_input2.frequency(Frequency2); I_input2.amplitude(Amplitude2); I_input2.phase(0);
  Q_input2.frequency(Frequency2); Q_input2.amplitude(Amplitude2); Q_input2.phase(90);
  AudioInterrupts();
//
  arm_cfft_radix2_init_q15(&myFFT, 256, 0, 1);                   // Initialize the complex FFT function
  windowComplex256_16(FFTbuffer, HAMMING);                       // Create a Hamming data window
}

//----------------------------------------------------------------------------------
void loop() {
   myData.grab(FFTbuffer);                                       // "grab" a 256 complex sample buffer
   windowComplex256_16(FFTbuffer, APPLY);                        // apply a Hamming window function
   arm_cfft_radix2_q15(&myFFT, FFTbuffer);                       // Use arm_math complex FFT
   for (int i=0; i<256; i++) {                   
     int16_t Re = FFTbuffer[2*i];
     int16_t Im = FFTbuffer[2*i+1];
     uint32_t magsq = (Re*Re + Im*Im)>>12;
     log2Mag[i] = fastLog2_32(magsq)>>1;                         // log(x^2) = 2log(x)
   }
   FFTshift_16(log2Mag, 256);                                    // Reorder the +ve  and -ve frequencies
   smoothVector(log2Mag, 256, (uint16_t)(0.8*16384));
   displaySpectrum(log2Mag, 256);
   delay(10);
}
 
Thanks Derek, that's much appreciated.

I put in the two initializing functions, and the errors have disappeared. However, I'm still not getting anything out of my serial moniter!

Do I need to apply a window to the data before performing the fft, as is done in analyze_fft1024.cpp? I've thrown some lines of code from analyze_fft1024.cpp into mine, but the compiler doesn't recognize "window".

This error confuses me, as window is defined in analyze_fft1024.h, which is included in Audio.h. And for that matter, the fft initializations that you brought up are also included in analyze_fft1024.h!

What's going on here? Is analyze_fft1024.h really being included in my sketch?

Thanks again - I've included my current code and error message.

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>
// Found a suggestion by Paul in a forum to do this
extern "C" {
  #include <../Audio/utility/sqrt_integer.h>
  #include <../Audio/utility/dspinst.h>
}

// Basically: Filter the incoming audio and pass it to the queue (also, play it through the mic)
AudioInputI2S            i2s1;           //xy=83,98
AudioFilterBiquad        biquad1;        //xy=273,100
AudioRecordQueue         queue1;         //xy=448,100
AudioOutputI2S           i2s2;           //xy=449,189
AudioConnection          patchCord1(i2s1, 0, biquad1, 0);
AudioConnection          patchCord2(biquad1, queue1);
AudioConnection          patchCord3(biquad1, 0, i2s2, 0);
AudioConnection          patchCord4(biquad1, 0, i2s2, 1);
AudioControlSGTL5000     sgtl5000_1;     //xy=355,391


// Establish an instance of the fft function
arm_cfft_radix4_instance_q15 fft_inst;

// window function
static void apply_window_to_fft_buffer(void *buffer, const void *window)
{
  int16_t *buf = (int16_t *)buffer;
  const int16_t *win = (int16_t *)window;;
  for (int i=0; i < 1024; i++) {   
    int32_t val = *buf * *win++;
    // *buf = signed_saturate_rshift(val, 16, 15);
    *buf = val >> 15;
    buf += 2;
  }
}

void setup() {
  AudioMemory(60);

  sgtl5000_1.enable();  // Enable the audio shield
  sgtl5000_1.inputSelect(AUDIO_INPUT_MIC);
  sgtl5000_1.volume(0.5);

  // Butterworth filter, 12 db/octave
  biquad1.setLowpass(0, 800, 0.707);

  queue1.begin();

  //define fft params
  arm_cfft_radix4_init_q15(&fft_inst, 1024, 0, 1);

}

int i;
// initialize the buffer that will be passed to arm_cfft_radix4_q15()
short int buffer2[1024];
// initialize output vector for fft results
int output[512];
// float for storing and printing individual fft results
float n;

void loop() {
  // used for filling up buffer2
  int count = 0;
  if (queue1.available() >= 2) {
    byte buffer[512];
    // Fetch 2 blocks from the audio library and copy
    // into a 512 byte buffer.  The Arduino SD library
    // is most efficient when full 512 byte sector size
    // writes are used.
    memcpy(buffer, queue1.readBuffer(), 256);
    queue1.freeBuffer();
    memcpy(buffer + 256, queue1.readBuffer(), 256);
    queue1.freeBuffer();
    // Copy every 2nd int from the buffer into new buffer
    for (i = 0; i < 512; i++) {
      // check if index is a multiple of 2
      if ((i % 2) == 0) {
      // if it is, copy the number into the new buffer, using new counting system
      buffer2[count] = buffer[i];
      count = count + 1;
      }
    }
    // Once new buffer fills, run it through arm_cfft_radix4_q15()
    if (count == 1024) {
      
      // appy fft window
      if (window) apply_window_to_fft_buffer(&fft_inst, window);
      
      // perform fft operation
      arm_cfft_radix4_q15(&fft_inst, buffer2);
      for (i = 0; i < 512; i++) {
        uint32_t tmp = *((uint32_t *)buffer2 + i); // real & imag
        uint32_t magsq = multiply_16tx16t_add_16bx16b(tmp, tmp);
        output[i] = sqrt_uint32_approx(magsq);
      }
      // Print FFT output
      Serial.print("FFT: ");
      for (i = 0; i < 512; i++) {
        n = output[i];
        if (n >= 0.01) {
          Serial.print(n);
          Serial.print(" ");
        } else {
          Serial.print("  -  "); // don't print "0.00"
        }
      }
      Serial.println();
    }
  }
}

Code:
BiquadFilter_ManualFFT_Test: In function 'void loop()':
BiquadFilter_ManualFFT_Test:91: error: 'window' was not declared in this scope
       if (window) apply_window_to_fft_buffer(&fft_inst, window);
           ^
'window' was not declared in this scope
 
I'm sitting on my boat, with a small screen laptop, so it's a bit awkward to do a detailed analysis of your code. Nevertheless,I can see a number of issues:
1) You do not define a window function anywhere. Which function do you want to use : Hanning, Hamming, etc? And where is it defined? That's why you are getting the error message. Go back to the .h and .cpp files to see how they are defined.
You do not need a window function - the fft function works perfectly well without it - suggest you debug without one. Are you familiar with the purpose (and compromise) of windowing to minimize spectral leakage, and the rational for choosing one function over another. I usually just use a Hamming window unless I have a reason to choose another.

2) For a 1024 length FFT you need 8 data blocks (8x128 samples) in a buffer of length 2048. I don't see anywhere that you are doing this.

3) I don't understand your code to insert your data into the even-odd structure of the input fft buffer. There is a much simpler way to do it. It doesn't seem to work as I read it, and it seems to be the wrong length. How about something like:
for (int i = 0; i<1024; i++){
buffer[2*i] = sample;
buffer[2*i+1] = 0;
}

4) The line
AudioConnection patchCord2(biquad1, queue1);
needs input/output specification
AudioConnection patchCord2(biquad1, 0, queue1, 0);



Do you need a length 1024 FFT? Do you have a reason for this data resolution?

It seems that your sketch does exactly the same thing as the (working) prototype sketch in my previous post - what are you going to to with the FFT when you get it?
 
Now that I'm home, I remembered that you are trying to do decimation by two before the 1024 FFT. But this means you will need 16 audio blocks per FFT, because now you will only get 64 samples/block! You will still need a length 2048 buffer for the FFT data.

Aside: I am currently writing decimation by 2 and 4 audio library functions, with 8th-order Chebyshev II filters, and which will transmit partially filled blocks, for processor load reduction in communication quality speech processing (<5kHz).
 
I'll test these FFT-plus-decimation objects when you have them. I'll be interested to see the differences in freq defection accuracy and speed. DerekR, will you continue the reuse-the-last-half-of-the-buffer approach? It should theoretically be twice as precise and half as fast.
 
Hi,
I'm writing a FFT class with a .decimateSignal(int M) function too (M can be 2, 4, 8, 16 ,32) with cascaded arm_fir_decimate_q15 calls for the anti-aliases filters...
Maybe we should share !
 
Suffering from congenital laziness, what I am working on is simply to produce a modified version the biquad filter audio function/class which has a set of fixed Chebyshev coefficients (derived from MATLAB's fdatool) and to decimate the output block before transmitting it as a partially filled block. (I personally don't have any use for decimation by more than 4, because the samples per block becomes way too small :) )

My application is time-domain processing, not FFT analysis, so I'm also looking at interpolation filters in case I need to reconstitute the original sampling rate.

I am currently questioning whether the processing overhead in the decimation process is justified by any potential saving in processing time in subsequent filtering operations.
 
Thanks again for the help, I've successfully decimated now.

For posterity, the issue was solely an error in buffer sizing. I just talked myself through the number of bytes in each buffer and how those buffers are used, and the problem became apparent.

Working on getting better pitch detection now, as well as how to deal with the 50ms delay that comes with the decimation. This is too high a delay for live musical applications.
 
DerekR, that looks very useful! And I like the "eureka!" tone of your post, even if you did reinvent the wheel. :)

I'll parse your work ASAP.
 
Just getting back to fft algorithms, as a long time user of the FFT, but having never coded the algorithm, it seems the code of choice is FFTW downloadable at http://www.fftw.org/. This is the routine used in matlab and octave and is open source. It allows an arbitrary number of points in the FFT, and as long as the number is factorable, there is a speed increase as compared to the DFT. The more factorable, the better. Power of 2 is obviously the fastest. My guess is that the code is too big for the teensy and thus the traditional power of 2 algorithm is better for memory usage. Paul has probably gone through all these trade-offs prior to coding the audio library.
 
I'd like to use this as a guitar tuner as part of a bigger project (on T3.2) . That is, I would only want the notefreq routine to run when I was using the tuner function, and not taking up any processor time otherwise. Ideally, that would be a notefreq.end() function. I tried gating the notefreq input using a mixer, but when I set mixer gain =0, the YIN routine is still running and using 80% of CPU time (comparable to when it is actually measuring an actual guitar signal).
Is there any way of doing this?
Thanks.
 
I'd like to use this as a guitar tuner as part of a bigger project (on T3.2) . That is, I would only want the notefreq routine to run when I was using the tuner function, and not taking up any processor time otherwise. Ideally, that would be a notefreq.end() function. I tried gating the notefreq input using a mixer, but when I set mixer gain =0, the YIN routine is still running and using 80% of CPU time (comparable to when it is actually measuring an actual guitar signal).
Is there any way of doing this?
Thanks.
I have made a tuner using the noteFreq and it works good but your right about the cpu usage being too high. If your comfortable with editng the source files you can do this to get the cpu usage down:

In analyze_notefreq.cpp edit line 107, 113, 119 125 too:
Code:
[COLOR=#333333][FONT=Consolas]x += [/FONT][/COLOR][FONT=Consolas][COLOR=#0086b3]16[/COLOR][/FONT][COLOR=#333333][FONT=Consolas];[/FONT][/COLOR]
 
Back
Top