Bob Larkin
Well-known member
Here is an 1024 point Fourier Transform that improves the dynamic range of the Teensy Audio library class AudioAnalyzeFFT1024. This increase is due to using floating point processing inside the update() function. As a library element it is pure I16 for the audio data. The standard I16 library FFT uses 16-bit arithmetic for the FFT that limits the RMS output range to (-16384, 16383) setting the dynamic range to about 85 dB.
The next pair of graphs show the output, using the I16 library on the right and the new FFT, nicknamed HD on the left. These are the result of generating a sine wave 1 dB below DAC/ADC full scale, outputting this via a SGTL5000 DAC, sending the analog signal wire to the SGTL5000 ADC and then to the FFT. This complicated path is the result of some other experiments but was left as it represents a good measure of several blocks.
The orange curves are for the sine wave. The spikes to the right of the main spike are the harmonics that we might want to measure. The green curves are the result of turning off the sine wave. The seven or so bumps in the left green curve are not part of the signal generation, but rather are artifacts of the SGTL5000 ADC when running with no input. These are called "idle tones," and have been discussed before. All of these curves include power averaging of 16 FFTs. This makes the various effects easier to see, and generally reduces noise on the spectral data.
Here are the two files for the AudioAnalyzeFFT_HD1024 class.
A couple of comments. The functions are very similar to those of the regular I16 FFT, but include averaging (the averageTogether
), use a half length FFT that roughly doubles the speed, provide for selection of power or dBFS output in addition to the default RMS output, and have a different selection of windowing functions that are, I think, quite versatile.
I have been using this and it works for me, so I wanted to make it available to others.
The next pair of graphs show the output, using the I16 library on the right and the new FFT, nicknamed HD on the left. These are the result of generating a sine wave 1 dB below DAC/ADC full scale, outputting this via a SGTL5000 DAC, sending the analog signal wire to the SGTL5000 ADC and then to the FFT. This complicated path is the result of some other experiments but was left as it represents a good measure of several blocks.
The orange curves are for the sine wave. The spikes to the right of the main spike are the harmonics that we might want to measure. The green curves are the result of turning off the sine wave. The seven or so bumps in the left green curve are not part of the signal generation, but rather are artifacts of the SGTL5000 ADC when running with no input. These are called "idle tones," and have been discussed before. All of these curves include power averaging of 16 FFTs. This makes the various effects easier to see, and generally reduces noise on the spectral data.
Here are the two files for the AudioAnalyzeFFT_HD1024 class.
C++:
/* analyze_fft_hd1024.h Converted from Teensy F32 OpenAudio_ArduinoLibrary
* that was conveerted from PJRC's Teensy Audio Library.
*
* Audio Library for Teensy
* Copyright (c) 2014, Paul Stoffregen, paul@pjrc.com
*
* Development of this audio library was funded by PJRC.COM, LLC by sales of
* Teensy and Audio Adaptor boards. Please support PJRC's efforts to develop
* open source software by purchasing Teensy or other PJRC products.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice, development funding notice, and this permission
* notice shall be included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
/* Translated from I16 to F32. Bob Larkin 16 Feb 2021
* Then converted back to I16 Input (June 2025), but all F32 internal processing.
* This increases the dynamic range of the FFT by 8 bits or so (about 48 dB).
* Does real input FFT of 1024 points. Output is not audio, and is magnitude
* only. Multiple output formats of RMS (same as I16 version, and default),
* Power or dBFS (full scale). Output can be bin by bin or a pointer to
* the output array is available. Several window functions are provided by
* in-class design, or a custom window can be provided from the INO.
*
* Functions (See comments below and #defines:
* bool available()
* float read(unsigned int binNumber)
* float read(unsigned int binFirst, unsigned int binLast)
* int windowFunction(int wNum)
* int windowFunction(int wNum, float _kdb) // Kaiser only
* float* getData(void)
* float* getWindow(void)
* void putWindow(float *pwin)
* void setOutputType(int _type)
* void averageTogether(uint8_t n)
*
* Timing, max is longest update() time.
* T4.0 Windowed, Power Out, 69 uSec
* T4.0 Windowed, dBFS Out, 148 uSec
* Scaling:
* The input range is -32768 to 32767. Normally the
* full scale is -1.0 to +1.0. This is an unscaled FFT and for a sine
* wave centered in frequency on a bin and of FS amplitude, the power
* at that center bin will grow by 1024^2 = 1048576 without windowing.
* Windowing loss cuts this down. The RMS level can grow to 1024.
* The dBFS has been scaled to make this max value 0 dBFS by
* removing 60.2 dB. With floating point, the dynamic range is maintained
* no matter how it is scaled, but this factor needs to be considered
* when building the INO.
*/
// Converted to using half-size FFT for real input, with no zero inputs.
// See E. Oran Brigham and many other FFT references. 16 March 2021 RSL
#ifndef analyze_fft1024_hd_h_
#define analyze_fft1024_hd_h_
#include <Arduino.h> // github.com/PaulStoffregen/cores/blob/master/teensy4/Arduino.h
#include <AudioStream.h> // github.com/PaulStoffregen/cores/blob/master/teensy4/AudioStream.h
#include "arm_math.h"
#include "mathDSP_F32.h"
#if defined(__IMXRT1062__)
#include "arm_const_structs.h"
#endif
// Doing an FFT with NFFT real inputs
#define NFFT 1024
#define NFFT_M1 NFFT-1
#define NFFT_D2 NFFT/2
#define NFFT_D2M1 (NFFT/2)-1
#define NFFT_X2 NFFT*2
#define FFT_PI 3.14159265359f
#define FFT_2PI 6.28318530718f
// Alternate output formats, FFT_RMS default
#define FFT_RMS 0
#define FFT_POWER 1
#define FFT_DBFS 2
#define NO_WINDOW 0
#define AudioWindowNone 0
#define AudioWindowHanning1024 1
#define AudioWindowKaiser1024 2
#define AudioWindowBlackmanHarris1024 3
class AudioAnalyzeFFT_HD1024 : public AudioStream {
public:
AudioAnalyzeFFT_HD1024() : AudioStream(1, inputQueueArray) {
// __MK20DX128__ T_LC; __MKL26Z64__ T3.0; __MK20DX256__T3.1 and T3.2
// __MK64FX512__) T3.5; __MK66FX1M0__ T3.6; __IMXRT1062__ T4.0 and T4.1
#if defined(__IMXRT1062__)
// Teensy4 core library has the right files for new FFT
// arm CMSIS library has predefined structures of type arm_cfft_instance_f32
Sfft = arm_cfft_sR_f32_len512; // Like this. Changes with size <<<
#else
arm_cfft_radix2_init_f32(&fft_inst, NFFT_D2, 0, 1); // for T3.x (check radix2/radix4)<<<
#endif
// This class is always 128 block size. Any sample rate. No use of "settings"
useHanningWindow();
// Factors for using half size complex FFT
for(int n=0; n<NFFT_D2; n++) {
sinN[n] = sinf(FFT_PI*((float)n)/((float)NFFT_D2));
cosN[n] = cosf(FFT_PI*((float)n)/((float)NFFT_D2));
}
}
// Inform that the output is available for read()
bool available() {
if (outputflag == true) {
outputflag = false;
return true;
}
return false;
}
// Output data from a single bin is transferred
float read(unsigned int binNumber) {
if (binNumber>NFFT_D2M1 || binNumber<0) return 0.0;
return output[binNumber];
}
// Return sum of several bins. Normally use with power output.
// This produces the equivalent of bigger bins.
float read(unsigned int binFirst, unsigned int binLast) {
if (binFirst > binLast) {
unsigned int tmp = binLast;
binLast = binFirst;
binFirst = tmp;
}
if (binFirst > NFFT_D2M1) return 0.0;
if (binLast > NFFT_D2M1) binLast = NFFT_D2M1;
float sum = 0.0f;
do {
sum += output[binFirst++];
} while (binFirst <= binLast);
return sum;
}
int windowFunction(int wNum) {
if(wNum == AudioWindowKaiser1024) // Changes with size <<<
return -1; // Kaiser needs the kdb
windowFunction(wNum, 0.0f);
return 0;
}
int windowFunction(int wNum, float _kdb) {
float kd;
pWin = window;
if(wNum == NO_WINDOW)
pWin = NULL;
else if (wNum == AudioWindowKaiser1024) { // Changes with size <<<
if(_kdb<20.0f)
kd = 20.0f;
else
kd = _kdb;
useKaiserWindow(kd);
}
else if (wNum == AudioWindowBlackmanHarris1024) // Changes with size <<<
useBHWindow();
else
useHanningWindow(); // Default
return 0;
}
// Fast pointer transfer. Be aware that the data will go away
// after the next 512 data points occur.
float* getData(void) {
return output;
}
// You can use this to design windows
float* getWindow(void) {
return window;
}
// Bring custom window from the INO
void putWindow(float *pwin) {
float *p = window;
for(int i=0; i<NFFT; i++)
*p++ = *pwin++;
}
// Output RMS (default) Power or dBFS
void setOutputType(int _type) {
outputType = _type;
}
// Output power (non-coherent) averaging
void averageTogether(uint8_t n) {
nAverage = (int)n;
}
virtual void update(void);
private:
float output[NFFT_D2];
// The next could be done in output[] but it would need to be read in
// 9 mSec for 44.1 sample rate. Use extra 512 floats to remove that.
float sumSq[NFFT_D2]; // Accumulates averages of outputs
float window[NFFT_D2]; // All windows assumed symmetrical about the middle.
float *pWin = window;
float fft_buffer[NFFT];
float saveHalf[NFFT_D2];
// The cosN and sinN would seem to be the same values as twidddle factors.
// Someday look at this and see if they can be stolen from arm math/DSP.
float cosN[NFFT_D2]; // Used for un-ravelling half-size real FFFT
float sinN[NFFT_D2];
// Total memory just over 4*(6*512 + 1024) = 16 KBytes
uint8_t state = 0;
bool outputflag = false;
audio_block_t *inputQueueArray[1];
#if defined(__IMXRT1062__)
// For T4.x, 512 length for real 1024 input, etc.
// const static arm_cfft_instance_f32 arm_cfft_sR_f32_len512;
arm_cfft_instance_f32 Sfft;
#else
arm_cfft_radix2_instance_f32 fft_inst; // Check radix2/radix4 <<<
#endif
int outputType = FFT_RMS; //Same type as I16 version init
int nAverage = 1;
int count = 0; // used to average for nAverage of powers
// The Hann window is a good all-around window
void useHanningWindow(void) {
for (int i=0; i < NFFT_D2; i++) {
float kx = FFT_2PI/((float)NFFT_M1); // 0.006141921 for 1024
window[i] = 0.5f*(1.0f - cosf(kx*(float)i));
}
}
// Blackman-Harris produces a first sidelobe more than 90 dB down.
// The price is a bandwidth of about 2 bins. Very useful at times.
void useBHWindow(void) {
for (int i=0; i < NFFT_D2; i++) {
float kx = FFT_2PI/((float)NFFT_M1); // 0.006141921 for 1024
int ix = (float) i;
window[i] = 0.35875f -
0.48829f*cosf( kx*ix) +
0.14128f*cosf(2.0f*kx*ix) -
0.01168f*cosf(3.0f*kx*ix);
}
}
/* The windowing function here is that of James Kaiser. This has a number
* of desirable features. The sidelobes drop off as the frequency away from a transition.
* Also, the tradeoff of sidelobe level versus cutoff rate is variable.
* Here we specify it in terms of kdb, the highest sidelobe, in dB, next to a sharp cutoff. For
* calculating the windowing vector, we need a parameter beta, found as follows:
*/
void useKaiserWindow(float kdb) {
float32_t beta, kbes, xn2;
mathDSP_F32 mathEqualizer; // For Bessel function
if (kdb < 20.0f)
beta = 0.0;
else
beta = -2.17+0.17153*kdb-0.0002841*kdb*kdb; // Within a dB or so
// Note: i0f is the fp zero'th order modified Bessel function (see mathDSP_F32.h)
kbes = 1.0f / mathEqualizer.i0f(beta); // An additional derived parameter used in loop
for (int n=0; n<NFFT_D2; n++) {
xn2 = 0.5f+(float32_t)n;
float kx = 4.0f/(((float)NFFT_M1) * ((float)NFFT_M1)); // 0.00000382215877f for 1024
xn2 = kx*xn2*xn2;
window[NFFT_D2M1 - n]=kbes*(mathEqualizer.i0f(beta*sqrtf(1.0-xn2)));
}
}
};
#endif
C++:
/* analyze_fft_hd1024.cpp Converted from Teensy I16 Audio Library
* and then back to I16 audio format.
* This version uses integer I16 inputs. See comments at analyze_fft1024_F32.h
* Converted to use half-length FFT 17 March 2021 RSL
*
* Conversion parts copyright (c) Bob Larkin 2021
*
* Audio Library for Teensy 3.X
* Copyright (c) 2014, Paul Stoffregen, paul@pjrc.com
*
* Development of this audio library was funded by PJRC.COM, LLC by sales of
* Teensy and Audio Adaptor boards. Please support PJRC's efforts to develop
* open source software by purchasing Teensy or other PJRC products.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice, development funding notice, and this permission
* notice shall be included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include <Arduino.h>
#include "analyze_fft_hd1024.h"
// Move audio data from an int16_t audio_block_t to the float FFT instance buffer.
// This is for 128 numbers per block, only. We are using a 512 point complex FFT to
// compute a 1024 point real FFT. The CMSIS library wants the complex input data
// as 2*fftLen interleaved values as
// {real[0], imag[0], real[1], imag[1],..}
// Conveniently, the half-size FFT algorithm (see Brigham) wants the 1024 real
// inputs as even numbered indices in the real array and odd numbered indices in the
// imag array. Therefore we just take the input data directly to fft_buffer[] and
// it is in the right order to produce 512 pairs of complex inputs.
//static void copy_to_fft_buffer(float32_t *destination, const void *source) {
// void int16_t *src = (const int16_t *)source;
// float *dst = (float *)destination;
static void copy_to_fft_buffer(float32_t *destination, int16_t *source) {
int16_t *src = (int16_t *)source;
float *dst = destination;
for (int i=0; i < 128; i++) {
*dst++ = 0.000030517578*(float32_t)*src++; // real sample for half-length FFT
}
}
void AudioAnalyzeFFT_HD1024::update(void) {
audio_block_t *block;
float outDC=0.0f;
float magsq=0.0f;
block = AudioStream::receiveReadOnly();
if (!block) return;
switch (state) {
case 0:
copy_to_fft_buffer(fft_buffer + 0, block->data);
state = 1;
break;
case 1:
copy_to_fft_buffer(fft_buffer + 128, block->data);
state = 2;
break;
case 2:
copy_to_fft_buffer(fft_buffer + 2*128, block->data);
state = 3;
break;
case 3:
copy_to_fft_buffer(fft_buffer + 3*128, block->data);
state = 4;
break;
case 4:
// Do in-place FFT.
#if defined(__IMXRT1062__)
// Teensyduino core for T4.x supports arm_cfft_f32
// arm_cfft_f32 (const arm_cfft_instance_f32 *S, float32_t *p1, uint8_t ifftFlag, uint8_t bitReverseFlag)
arm_cfft_f32 (&Sfft, fft_buffer, 0, 1);
#else
// For T3.x go back to old (deprecated) style (check radix2/radix4)<<<
arm_cfft_radix2_f32(&fft_inst, fft_buffer);
#endif
// FFT output is now in fft_buffer.
// Now the post FT processing for using half-length transform
// FFT was in state==7, but it loops around to here. Does some
// zero data at startup that is harmless. See Brigham, p166.
count++; // Next do non-coherent averaging
for(int i=0; i<NFFT_D2; i++) {
if(i>0) {
float rns = (fft_buffer[2*i] + fft_buffer[NFFT-2*i]);
float ins = (fft_buffer[2*i+1] + fft_buffer[NFFT-2*i+1]);
float rnd = (fft_buffer[2*i] - fft_buffer[NFFT-2*i]);
float ind = (fft_buffer[2*i+1] - fft_buffer[NFFT-2*i+1]);
float xr = rns + cosN[i]*ins - sinN[i]*rnd; // Real output
float xi = ind - sinN[i]*ins - cosN[i]*rnd; // Imaj output
magsq = xr*xr + xi*xi; // power
}
else {
magsq = outDC*outDC; // The DC term power
}
if(count==1) // Starting new average
sumSq[i] = magsq;
else if (count<=nAverage) // Adding on to average
sumSq[i] += magsq;
} // End, loop over all 512 powers. fft_buffer[] is now free
// Shift data for overlapping FFT
for(int ii=0; ii<512; ii++)
fft_buffer[ii] = saveHalf[ii];
// Get input data to fft_buffer[]
copy_to_fft_buffer(fft_buffer + 4*128, &block->data[0]);
state = 5;
break;
case 5:
copy_to_fft_buffer(fft_buffer + 5*128, block->data);
if (count >= nAverage) { // Average is finished
// Set outputflag false here to minimize reads of output[] data
// when it is being updated.
outputflag = false;
count = 0;
float inAf = 1.0f/(float)nAverage;
float kMaxDB = 60.206; // 20.0*log10f((float)NFFT)
for(int i=0; i<NFFT_D2; i++) {
if(outputType==FFT_RMS)
output[i] = sqrtf(inAf*sumSq[i]);
else if(outputType==FFT_POWER)
output[i] = inAf*sumSq[i];
else if(outputType==FFT_DBFS) {
if(sumSq[i]>0.0f)
output[i] = 10.0f*log10f(inAf*sumSq[i]) - kMaxDB; // Scaled to FS sine wave
else
output[i] = -193.0f; // lsb for 23 bit mantissa
}
else
output[i] = 0.0f;
} // End, set output[i] over all NFFT_D2
outputflag = true;
} // End of average is finished
state = 6;
break;
case 6:
copy_to_fft_buffer(fft_buffer + 6*128, block->data);
state = 7;
break;
case 7:
copy_to_fft_buffer(fft_buffer + 7*128, block->data);
for(int ii=0; ii<NFFT_D2; ii++)
saveHalf[ii] = fft_buffer[ii + 512];
// Apply windowing to data. window[] is symmetrical about the middle
if(pWin) {
for(int ii=0; ii<512; ii++) {
fft_buffer[ii] *= window[ii];
fft_buffer[NFFT_M1 - ii] *= window[ii];
}
}
// Special case is DC
outDC = 0.0f;
for(int i=0; i<NFFT; i++)
outDC += fft_buffer[i];
outDC /= ((float)NFFT);
state = 4; // Go back around for new 512 points, 50% data overlap
break;
} // End switch(state)
AudioStream::release(block); // We are now storing in fft_buffer, not block array
} // End update()
I have been using this and it works for me, so I wanted to make it available to others.