Genericize FFT? Also window functions representation

MarkT

Well-known member
Could the AudioAnalyzeFFT256/1024 not be subclasses of a generic FFT class, able to handle a wider range of powers of two?

I was looking at the implementation using arm_cfft_radix4_q15, whereas the fastest method seems to be arm_rfft_q15
as only real values are fed to the FFT? I benchmarked rfft as faster than cfft on Teensy 4.0 by about 50% - although it may be less
numerically accurate due to the pre- and post-processing.

The arm_cfft_radixX_XXX functions are also deprecated, according to the documentation I found. The other methods like
arm_cfft_q15 and arm_rfft_q15 allow powers of 2 for FFT size, not limited to powers of 4.


The implementation of window functions as precompiled tables becomes an issue with this, if you want to support 32/64/128/...../4096/...
sizes you'd need to be able to generate them at runtime for the windows/sizes used.

I note the existing tables are wasteful of space in that they don't take advantage of the even symmetry of a window function.
They are also not DFT-even style which may cause some issues in some contexts.

And lastly the window functions aren't structs/objects so there's no record of their noise bandwidth factor which is needed
for measuring noise rather than signal peaks.
 
I have a fairly stable implementation now for an improved FFT along these lines, although it
turns out only the T4.x have a decent implementation of arm_rfft_q31 with a wide range of
points from 32..8192 (The T3 seems to be limited to 32/128/512 and there seem to be issues
with it (doesn't match T4 behaviour).

Branch is at https://github.com/MarkTillotson/Audio/tree/generic_fft
if you want to give it a go, theres docs and example now.

It auto-generates windows, knows about window processing gain and noise-bandwidth and compensates for these,
and you can read either power spectrum or power spectral density, as dB for power or magnitude for amplitude.
You can set the nominal full-scale voltage to calibrate should you want this.

Example included as Audio/Analysis/GenericFFT which compares it against AudioAnalyzeFFT1024 for a set of
fixed tones and noise-floor.
 
Comparisons of the generic fft v. the existing FFT1024, for several windows, with
input being 5 test tones at -1dB, -20dB, -40dB, -60dB and -80dB compared to full scale sine.

fft_comparisons.png

Notable is the variation in gain in FFT1024 with different windows and its poor noise floor.
 
The last time I played around with the ARM CMSIS FFT on the Teensy it was with the T3.6. I found that a Real FFT took about 1.02mSec for FFT1024 with a Hanning window. A friend is working over the same calculations and tested the code on a T4.1 and found that the Real FFT32 took about 0.243 mSec---roughly in proportion to the clock speed difference. The calculations were a pretty straightforward translation of a Matlab script for determining power density in turbulence calculations. We stuck with 32-bit floats at the time, since this was in the pre-T4.x era and we didn't have access to a double precision FPU.

I'm curious about the relative speeds of the rfft_q15 and rfft_fast_f32 versions on the T4.X. IIRC, the q15 versions were widely used before FPUs were available--but I'm not familiar with the history. I've only used the FFTs in scientific calculations and most often left Matlab to make the decisions about algorithms.
 
I think I did measure the performance of the various FFT types a while back on the T4, though I guess it matters more
on slower chips - I only have T4.0 and T3.2 at the moment to play with.

The theoretical number of bits needed for an 2^N-point FFT for a signal of M bits is close to M + N/2, since the
smallest possible signal is one LSB over 2^N samples, which gives a flat spectrum at -6M -3(N-1) dB, which for
16 bit signal and 1024 point FFT = -6.16 -3.9 = -123dB

Or put another way the quantization noise is -6M-2 = -98dB, and that's spread over 512 frequency bins,
so each sees 1/512 of that (-27dB relative to -98dB total noise = -125dB noise floor in the power spectrum)
[ the true noise power spectral density depends also on the window and sample rate ]

So to get full information from 16 bit and 1024 FFT needs 125/6 bits at least (21). The FFT1024 in the
audio library is actually 15 bits, not 16, as it uses q15 fixed point values which range from -2.0 to +2.0.

So you only get about -90dB noise floor at best from the data representation, though the q15 fft primitives
lose several more bits in practice due to rounding errors (and it doesn't compensate for window attenuation
- as that would overflow the q15 representation - so you might get another 5 to 13dB loss from that). Practical
measurements with FFT1024 and flattop window show a noise floor of -78dB or so, rather than the theoretical
possible -125dB.

Using Python I determined 24 bits is a good minimum number for representing FFT output at 1024 points and
16 bit data:

fft_precision.png
Note that due to the reasons above the actual Audio lib fft1024 is considerably worse than the worst of these
graphs. Using q31 versions of FFT I seem to be able to get about -110dB noise floor or thereabouts with
a window function.

The graphs use 6 test tones at -1/-20/-40/-60/-80/-100dB and some band limited noise (amplitude 10 LSBs
before low-pass filtering). Python code available if anyone wants.
 
Back
Top