A New Accurate FFT Interpolator for Frequency Estimation

Status
Not open for further replies.

DerekR

Well-known member
Last night I was able to derive the relationship between the three largest amplitude lines in the FFT of a sinusoid under an implicit "rect" window (described by a sinc (or sin(x)/x) function), and under a raised cosine Hanning window (the sum of three shifted sinc functions), when the spectral peak lies between the FFT lines. In particular I was able to show that the ratio of the two largest amplitude lines allows the true frequency to be very easily computed.
If the largest amplitude A(L) occurs at line L in the FFT, and the second largest at line L+1, we can define the ratio R = A(L)/A(L+1), and the true frequency f of the input is simply:

f = (L +/- 1/(1+R))* Fsample/fftLength.......................(1)
for the implicit rect window, and

f = (L +/- (2-R)/(1+R))* Fsample/fftLength.................(2)
if the data is windowed with a Hanning window before the FFT, and where it is + if A(L+1) > A(L-1), and - if A(L-1)>A(L+1).
It doesn't get much easier!
I've reinvented the wheel many times in my career, and I found out today that (1) was proposed by Jain et al in 1979, and is known as Jain's algorithm, however I can find no reference to (2).
I've spent the day today simulating (1) and (2) in MATLAB, and writing Teensy Audio objects for both. I've looked at the effect of non-sinusoidal input waveforms (square wave and triangle), and additive noise. My conclusion is that the Hanning windowed version (2) is least sensitive to the spectral leakage from harmonics in the non-sinusoidal waveforms, and to additive random noise, and adjacent contaminating signals for a couple of reasons. I'll be doing more simulations and will follow up with the full derivation of (1) and (2).

I have attached a .zip file with fully functional .h, .cpp, and example .ino files for
FFTInterpHann1024 and FFTInterpHann256 Audio objects. You will see that they give sub-Hz accuracy for both FFT lengths, but that the 256 unsmoothed output has a bit more jitter. You can also add noise to look for bias in the estimates. This is a work-in-progress, and I would be delighted if they could be given a real workout in practical applications.
 

Attachments

  • AccurateFFTInterpolator.zip
    15.3 KB · Views: 315
Last edited:
Last night I was able to derive the relationship between the three largest amplitude lines in the FFT of a sinusoid under an implicit "rect" window (described by a sinc (or sin(x)/x) function), and under a raised cosine Hanning window (the sum of three shifted sinc functions), when the spectral peak lies between the FFT lines. In particular I was able to show that the ratio of the two largest amplitude lines allows the true frequency to be very easily computed.
If the largest amplitude A(L) occurs at line L in the FFT, and the second largest at line L+1, we can define the ratio R = A(L)/A(L+1), and the true frequency f of the input is simply:

f = (L +/- 1/(1+R))* Fsample/fftLength.......................(1)
for the implicit rect window, and

f = (L +/- (2-R)/(1+R))* Fsample/fftLength.................(2)
if the data is windowed with a Hanning window before the FFT, and where it is + if A(L+1) > A(L-1), and - if A(L-1)>A(L+1).
It doesn't get much easier!
I've reinvented the wheel many times in my career, and I found out today that (1) was proposed by Jain et al in 1979, and is known as Jain's algorithm, however I can find no reference to (2).
I've spent the day today simulating (1) and (2) in MATLAB, and writing Teensy Audio objects for both. I've looked at the effect of non-sinusoidal input waveforms (square wave and triangle), and additive noise. My conclusion is that the Hanning windowed version (2) is least sensitive to the spectral leakage from harmonics in the non-sinusoidal waveforms, and to additive random noise, and adjacent contaminating signals for a couple of reasons. I'll be doing more simulations and will follow up with the full derivation of (1) and (2).

I have attached a .zip file with fully functional .h, .cpp, and example .ino files for
FFTInterpHann1024 and FFTInterpHann256 Audio objects. You will see that they give sub-Hz accuracy for both FFT lengths, but that the 256 unsmoothed output has a bit more jitter. You can also add noise to look for bias in the estimates. This is a work-in-progress, and I would be delighted if they could be given a real workout in practical applications.

Very nice indeed,
(you may edit the '/' to be '(')
You can find the formula for the Hann window in Grandke 1983 IEEE TIM (OT: I prefer the term "Hann" window as the name is "von Hann" and not Hanning)

You may know it, but I would like to point out that the Hann window could be carried out in Frequency domain (formula 7 in Grandke, 1983) (2 additions an two shifts only, NO window storage required). if this is more efficient has to be tested.
 
WMXZ,
Thanks and wow - I was not aware of the Grandke paper. It shows that I've reinvented the wheel for the millionth time! :( My scheme is buried as equation (10) in the Grandke paper - is there nothing new in the world for me to discover? ;)

Let's call it the Grandke method...

But seriously, the good news is that the Grandke paper confirms all the hunches that led me to look at using the Hann window function before the FFT, and the conclusions from my MATLAB simulations. It is definitely superior to the Jain interpolation.

An aside: both Grandke and Jain use the small angle approximation sin(x) ~ x in their derivation. My derivation does not need this.

And yes, I was aware of Grandke's equation (7), in fact I mentioned it in the first sentence of my post. My Teensy code was blatantly stolen from Paul's AudioAnalyzeFFTxxxx objects, and uses time-domain windowing for convenience. (This whole project, from concept to posting the code took less than 24 hours!)

The to-do list includes adding a threshold condition to prevent garbage estimates when no signal is present, and adding a "fftshift" to prevent potential problems with very low frequencies.

BTW - I don't understand your comment "(you may edit the '/' to be '(')". Is the offending / in the post or the code?

Thanks again.

ps - I agree with your comment on Hann vs Hanning. It's a bad joke based on its similarities with the Hamming windows, which are also raised cosines.
 
Last edited:
WMXZ,
BTW - I don't understand your comment "(you may edit the '/' to be '(')". Is the offending / in the post or the code?

it is the formula in the post in the "(L +/- 1/(1+R))", and, yes, I realize that "/" and "(" are adjacent on the keyboard
 
No, that was deliberate; by "+/-" I meant "plus or minus" depending on which side of the peak L the second largest component lies. I tried to explain it by saying "it is + if A(L+1) > A(L-1), and - if A(L-1)>A(L+1)".
 
I was able to test it out with a few things I've been working on. The good news is its fast, fairly accurate at 1024 fft. The problems I found using this are mag threshold you talked about and not knowing what values to trust (analysis confidence). Also as you mention anything below the first bin frequency cannot be detected using the current method you developed but sounds like your on to cracking that nut also? The jitter at lower frequencies is more pronounced even with smoothing.

As for testing I was able to test it with the noteFrequency example and it worked pretty good on real guitar sample (will not work out of the box you need to shorten the samples) but when trying to use it with guitar synthesis it has some serious problems that the yin algorithm was better at finding the fundamental so you definitely need to come up with a good threshold limiting and confidence. I'll post some test sketches for you later today I just got back from 2 week trip and I just moved so I need to unpack today.
 
Here is test sketch that should work out of the box (if you have the FFTInterpHann1024 library installed). This just compares the yin and your spectral algorithm with the AudioSynthKarplusStrong object. This uses IntervalTimer to play a single note which every 100msec so there is some dead time. I filter out wild estimates also. I'll have another sketch using the noteFrequency example shortly.

Code:
#include <Audio.h>
#include <Wire.h>
#include <SD.h>
#include <SPI.h>
#include <SerialFlash.h>
#include "FFTInterpHann1024.h"


#define NOTE_E2   82.41
#define NOTE_F2   87.31
#define NOTE_Fs2  92.50
#define NOTE_G2   98.00
#define NOTE_Gs2 103.82
#define NOTE_A2  110.00
#define NOTE_As2 116.54
#define NOTE_B2  123.47
#define NOTE_C3  130.81
#define NOTE_Cs3 138.59
#define NOTE_D3  146.83
#define NOTE_Ds3 155.56
#define NOTE_E3  164.81
#define NOTE_F3  174.61
#define NOTE_Fs3 185.00
#define NOTE_G3  196.00
#define NOTE_Gs3 207.65
#define NOTE_A3  220.00
#define NOTE_As3 233.08
#define NOTE_B3  246.94
#define NOTE_C4  261.63
#define NOTE_Cs4 277.18
#define NOTE_D4  293.66
#define NOTE_Ds4 311.13
#define NOTE_E4  329.63
#define NOTE_F4  349.23
#define NOTE_Fs4 369.99
#define NOTE_G4  392.00
#define NOTE_Gs4 415.30
#define NOTE_A4  440.00
#define NOTE_As4 466.16
#define NOTE_B4  493.88


AudioAnalyzeNoteFrequency notefreq;
AudioSynthKarplusStrong   string;
AudioMixer4               mixer;
AudioOutputAnalog         dac;
FFTInterpHann1024         myFreqEstimator;


AudioConnection           patchCord1(string, 0, mixer,           0);
AudioConnection           patchCord2(mixer,  0, myFreqEstimator, 0);
AudioConnection           patchCord3(mixer,  0, notefreq,        0);
AudioConnection           patchCord4(mixer,  0, dac,             0);


IntervalTimer playNoteTimer;


float GuitarNote;
bool  printCR1;
bool  printCR2;


void playNote(void) {
  string.noteOn(GuitarNote, .7);
  digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
}


void setup() {
  printCR1 = false;
  printCR2 = false;
  pinMode(LED_BUILTIN, OUTPUT);
  AudioMemory(35);
  GuitarNote = NOTE_A4;
  while (!Serial);
  delay(500);
  notefreq.begin(.15);
  mixer.gain(0, 0.15);
  playNoteTimer.priority(144);
  playNoteTimer.begin(playNote, 100000);
}


void loop() {


  if (myFreqEstimator.available()) {
    float alpha = .8;
    float newEstimate  = myFreqEstimator.read();                                       // Fetch a new estimate
    float smoothedEstimate    = alpha * smoothedEstimate + (1.0 - alpha) * newEstimate;// Apply first-order IIR smoothing filter
    // this just filters out the wild estimates
    if (newEstimate < (float)(GuitarNote + 10.0) && newEstimate > (float)(GuitarNote - 10.0)) {
      printCR2 = true;
      if (printCR1 == true) { printCR1 = false; Serial.println(); }
      Serial.printf("Source Frequency:\t%3.2f | Spectural Estimate:\t%3.2f | Smoothed Estimate:\t%3.2f\n", GuitarNote, newEstimate, smoothedEstimate);
    }
  }


  if (notefreq.available()) {
    float note = notefreq.read();
    float prob = notefreq.probability();
    printCR1 = true;
    if (printCR2 == true) { printCR2 = false; Serial.println(); }
    Serial.printf("Source Frequency:\t%3.2f | Yin Estimate:\t\t%3.2f | Probability:\t\t%3.2f\n", GuitarNote, note, prob);
  }


}
edit: fixed sketch
 
Last edited:
The Grandke paper looks interesting based on your description. Since I am not a member of IEEE, I don't have a copy. Quite possibly a similar scheme was developed by Bruel and Kjear and reported by Randall as shown in the attachment. I have used the scheme for more accurate frequencies, but have not used the amplitude correction feature. Only used it in matlab however.

A quick note regarding the hanning, or any other window. Anders Brandt wrote a signal processing toolbox for matlab or octave called abravibe. This is an open source toolbox. A copy of his hanning window code is shown:

Code:
function w = ahann(N);
%AHANN Calculate a Hanning window
%
%           w = ahann(N);
%
%           w           Hanning time window
%
%           N           Blocksize, length of w
%
% Calculates a (correct!) Hanning window with length(N), in a column 
% vector. The ENBW is 1.5*df.
%
% See also aflattop winacf winenbw


% Copyright (c) 2009-2011 by Anders Brandt
% Email: abra@iti.sdu.dk
% Version: 1.0 2011-06-23   
% This file is part of ABRAVIBE Toolbox for NVA

a0=.5;
a1=.5;
t=(0:pi/N:pi-pi/N)';
w=a0-a1*cos(2*t);

This would be the same as the matlab or octave hanning window with the 'periodic' option. This makes the window start with the first non zero term and end with a zero in the time domain. According to Anders (and his book) this is the correct way to do it. Realistically, I don't expect there to be much difference, but we might as well do it right if we have the option.

The hanning window from octave is:

Code:
## Copyright (C) 1995-2015 Andreas Weingessel
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn  {Function File} {} hanning (@var{m})
## @deftypefnx {Function File} {} hanning (@var{m}, "periodic")
## @deftypefnx {Function File} {} hanning (@var{m}, "symmetric")
## Return the filter coefficients of a Hanning window of length @var{m}.
##
## If the optional argument @qcode{"periodic"} is given, the periodic form
## of the window is returned.  This is equivalent to the window of length
## @var{m}+1 with the last coefficient removed.  The optional argument
## @qcode{"symmetric"} is equivalent to not specifying a second argument.
##
## For a definition of the Hanning window see, e.g.,
## @nospell{A.V. Oppenheim & R. W. Schafer},
## @cite{Discrete-Time Signal Processing}.
## @end deftypefn

## Author: AW <Andreas.Weingessel@ci.tuwien.ac.at>
## Description: Coefficients of the Hanning window

function c = hanning (m, opt)

  if (nargin < 1 || nargin > 2)
    print_usage ();
  endif

  if (! (isscalar (m) && (m == fix (m)) && (m > 0)))
    error ("hanning: M must be a positive integer");
  endif

  N = m - 1;
  if (nargin == 2)
    switch (opt)
      case "periodic"
        N = m;
      case "symmetric"
        ## Default option, same as no option specified.
      otherwise
        error ('hanning: window type must be either "periodic" or "symmetric"');
    endswitch
  endif

  if (m == 1)
    c = 1;
  else
    m = m - 1;
    c = 0.5 - 0.5 * cos (2 * pi * (0 : m)' / N);
  endif

endfunction


%!assert (hanning (1), 1);
%!assert (hanning (2), zeros (2,1));
%!assert (hanning (15), flip (hanning (15)), 5*eps);
%!assert (hanning (16), flip (hanning (16)), 5*eps);
%!test
%! N = 15;
%! A = hanning (N);
%! assert (A(ceil (N/2)), 1);

%!assert (hanning (15), hanning (15, "symmetric"));
%!assert (hanning (16)(1:15), hanning (15, "periodic"));
%!test
%! N = 16;
%! A = hanning (N, "periodic");
%! assert (A(N/2 + 1), 1);

%!error hanning ()
%!error hanning (0.5)
%!error hanning (-1)
%!error hanning (ones (1,4))
%!error hanning (1, "invalid");

I am interested in hearing back how the Bruel & Kjear technique for frequency line interpolation compares with the DerekR/Grandke10 one mentioned above. It may be identical?
 

Attachments

  • FreqInterpolateRandall.pdf
    1.3 MB · Views: 271
The Grandke paper looks interesting based on your description. Since I am not a member of IEEE, I don't have a copy. Quite possibly a similar scheme was developed by Bruel and Kjear and reported by Randall as shown in the attachment. I have used the scheme for more accurate frequencies, but have not used the amplitude correction feature. Only used it in matlab however.
....

I am interested in hearing back how the Bruel & Kjear technique for frequency line interpolation compares with the DerekR/Grandke10 one mentioned above. It may be identical?

If you search google scholar, you may find non-ieee pdf. This is how I got it.

Unfortunately, the Bruel & Kjaer technique is given as a graph, and not as a formula. But knowing the reputation of B&K, I think they have a good method.
 
Duff:
Thanks a lot! I am surprised that it did as well as it did with a non-pure input. The problem with the smoothed output is the IIR filter start-up (initial condition) transient. With alpha --> 1.0 it can take many iterations to reach steady-state. If you look at the results you will see that the first result is 0.2 times the estimated value - which is correct for alpha = 0.8.

I should make a couple of comments about the algorithm:
1) Both Grandke and Jain point out the importance of incorporating enough data in the FFT. In particular they say the data record should encompass at least 20 cycles of the frequency of interest. I'm not sure that this altogether necessary, but if we look at a 100Hz signal, 20 cycles would require a record of 0.2 seconds - or 8820 samples at 44100 samples/sec. And of course 50Hz would require double that.
One way to get around the need for all this data is to decimate (downsample) the data to effectively decrease the sampling rate.
I have written a pair of Audio decimate/interpolate objects that allow down/upsampling by factors of 2, 4, 8 and today I'm looking at using decimation with the fft interpolator to look at this effect on low frequency estimation. Of course it takes longer to accumulate the 1024 samples so the update time also increases accordingly.

2) I never really intended this function to be used for musical pitch estimation. It is part of a general DSP library that I am creating for scientific/engineering/educational use. In fact, my whole interest in the Teensy revolves around developing a method for rapid development/deployment of scientific lab instrumentation and analysis methods. I see Paul's work in creating the Audio library as a fantastic framework for achieving this.
The problems I see with the blind application of the FFT interpolation method is in the nature of the musical signal itself. As an old musician (alto sax and flute), I'm certainly aware of the the structure of a musical note - including frequency changes in the onset and vibrato. I wonder about the "cleanliness" of the FFT. You folk will certainly know more about this than I do :)

Jake:
I'll most certainly be taking a look at this. I have family staying with us right now and, the house is a (happy) zoo, but I'll get back to you asap.
I located a pdf of the Grandke paper on the web. PM me with your email and I'll forward it to you...

I have half written a derivation of the method in "not very" technical terms. Although the actual derivation is quite brief, I have included a longish preamble explaining spectral spreading and the nature of the windowed spectrum to help understand what is really going on. Unfortunately I still had to allude to frequency domain convolution with Dirac delta functions, and other mathematical nastiness. I'll post it as soon as it's finished.
 
Question: my application is to determine the speed of an object as best I can, based on the peak frequency of an X or K-band doppler radar return (in my case the doppler signal is generally between 50 Hz and 1 kHz). The return is quite noisy (nothing like a clean sine wave) but the interesting signal is the one with the largest amplitude. Is this new algorithm a good choice to estimate the peak frequency despite some noise?
Right now I'm using what I assume is Paul's stock Audio code with myFFT.windowFunction(AudioWindowHanning1024) not because I really understand it, but just because that's what was in an example that I found.
 
Last edited:
Yep. This algorithm can detect largest peaks of your signal. I gave a try to modify the code for detect next three peaks also.
 
@Wicky21,
It is definitely not "nice" to write a nasty PM calling me "rude", when I have already spent several hours writing code specifically for you! I most certainly do not appreciate that! And here you are again, in one of my threads, after I suggested you look at this method - asking again for somebody to write your code for you. We don't work like that around here.

It has always been my intention to extend this algorithm to identify N major peaks in the spectrum, and report the interpolated frequency, magnitude, and phase of those peaks. I need it for work on pitch-shifting and time compression/expansion using a phase vocoder. It's easy to do, and it's near the top on my to-do list... but I need to finish up (and publish on this forum) one other new Audio library project first.

And I need to publish, in this thread, the underlying theory for this interpolation algorithm as I promised a year ago :) As they say: so much to do, so little time...

Wait a week or two and you might have the answer, but don't ask me ever again to look at your code.
 
New Code for Multiple Peaks - First Pass

I have included my first pass at detecting and reporting multiple spectral peaks in the magnitude spectrum. It appears to be working well. The code is located here: View attachment FFTFreqEstimatorMultiplePeaks.zip, and contains both 256 and 1024 length code (in separate folders) with test sketches that should demonstrate the use.

To do:
1) Report the magnitude (and possibly phase) of the peaks when they lie between lines. I have already written code (for another project) to do interpolation between spectral lines using Whittaker (cardinal) interpolation, and I just need to incorporate that.
2) Investigate the use of the Hilbert transform to produce an analytic signal, which hopefully should minimize image reflection from the negative frequency components at very low frequencies. (I also have the transform code, from yet another project)
3) Include a detection threshold to prevent noise peaks from being included.

Incidentally, run the test code with the sawtooth and square wave inputs, and note that the sawtooth has all harmonics reported, while the square wave shows only the odd harmonics - which is just what my buddy Jean-Baptiste Fourier predicts... :D

As always, comments and suggestions are welcome.
Derek
 
Last edited:
Wow great work. I have a sample .wav file which i'm trying to detect fundamental and harmonics. But when i use that .wav file with you'r code it's not properly identify fundamental and other harmonics. Check out the attached .WAV file. If i analyze that with Audacity i get something like below figure.

Mid_Female55.jpg

View attachment WAV_file.zip
 
@Wicky21
I took a listen and did some analysis in MATLAB on your .wav file, and it's no wonder that my method (and probably many other methods) will not work on your data!

1) The audio is so bad I couldn't understand what it is, it sounds like a child's voice crying or singing???
2) The signal-to-noise-ratio (SNR) is horrendous! The Audacity spectrum also shows this.

The assumptions behind my method are that a) the signal consists of a set of isolated sinusoids (whose windowed spectra do not overlap, and b) the SNR is "reasonable" for all components.
The present version of the code is indiscriminate, that is it will produce results regardless of whether or not the signal is actually present or is of sufficient amplitude for the results to be valid.
For example, as far as I could tell the first couple of seconds in your .wav file has no signal - but if you ask for N peaks, you will get N values, and all are complete garbage! Similarly, if the noise has large peaks in its spectrum, they will also generate garbage output.

My to-do list in a previous post said that I intend to look into a threshold detector before a peak is reported. Your data indicates the importance of this.

I really can't help you much more until I understand exactly what you are trying to do, what output you need, how that output will be used in subsequent processing or displayed, etc etc, etc. For example, are you simply going to display your data as a spectrogram such as the MATLAB analysis of your data file produced:
Wicky21Spectrogram.jpg
or is this an intermediate step, and you will be doing further analysis using the peak frequencies found?

BTW - I also intend to modify the code so that it does a STFT (short-term-Fourier-transform) which is a sliding FFT, and produce a new set of peaks for each audio block received. In other words for a length 1024 FFT, instead of having to wait 11.6 msecs you will get a result every 2.9 msecs. (The existing code is effectively a STFT with an overlap of 4 blocks)
 
Last edited:
Derek

I'm working in the field of Entomology which doing researches about mosquitoes. Im trying to analyze the mosquito wing beat frequencies. Since at research labs mostly doing by using biological methods and feature extractions.
What i have attached file is an mosquito buzzing sound and i'm trying to identify the fundamental and harmonics of the wing beat sounds. That's what i'm exactly trying to do. Actually i need to identify the maximum peaks of the .wav file.
Before real time processing my first target is to get an output of .wav file as Fundamental and other harmonic values. Something like Audacity display maximum peaks. No need to get an output every 1024 block. But after analyzing the whole .wav just give an output.
 
Derek Could you please explain about the new interpolation method your doing Whittaker (cardinal) interpolation ? Is that only for report magnitude of the lines or is that for frequency interpolation? I'm eager to know. In what sort of applications it can be use? Also in the Grandke method it can interpolate the exact frequency as well. But even from that method also magnitude can be track. What would be the advantage of using Whittaker (cardinal) interpolation over Grandke method?
 
Status
Not open for further replies.
Back
Top