FFT Number of samples

Status
Not open for further replies.

imagiro1

I think I've been thinking about this wrong. In the FFT_1024 function, what does the 1024 refer to? Originally I thought it referred to a buffer size of 1024. But that doesn't make sense if you can run FFT_1024 on an audio signal (44 kHz). So does 1024 refer to the bin size relative to the sampling frequency, i.e. 44 kHz / 1024?

If so, is it possible to specify the buffer size, or how much time to sample before performing the FFT?

My idea/project involves reducing the sampling frequency, performing FFTs every X milliseconds, and trying to bin the results. This idea isn't really based around audio. But the audio library is perfect for sampling/FFT/filtering/etc.

Thanks.
 
In the FFT_1024 function, what does the 1024 refer to?

1024 is the number of audio samples it analyzes. At 44.1 kHz, each FFT result represents the spectrum based on 23.2 ms of time. By default a Hanning window scaling is applied, so you're getting results that mostly represent the middle ~13 ms of that time. This is covered in detail in the tutorial. Check out pages 27-29 in the tutorial PDF, or watch that section of the video walkthrough.

https://www.pjrc.com/store/audio_tutorial_kit.html
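
The 23.2 ms figure is just the frame length divided by the sample rate, and the "44 kHz / 1024" number from the question is the spacing between frequency bins. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not part of the audio library.

```cpp
#include <cassert>
#include <cstddef>

// Length of time covered by one FFT frame, in milliseconds.
double frameDurationMs(std::size_t fftSize, double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    return 1000.0 * static_cast<double>(fftSize) / sampleRateHz;
}

// Spacing between adjacent frequency bins, in Hz.
double binWidthHz(std::size_t fftSize, double sampleRateHz) {
    assert(fftSize > 0);
    return sampleRateHz / static_cast<double>(fftSize);
}
```

With 1024 samples at 44100 Hz, `frameDurationMs` gives about 23.2 ms and `binWidthHz` gives about 43 Hz per bin.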

Fundamentally speaking, FFT math is complex. Not necessarily "complex" as in hard to understand, though it certainly can seem that way from the terribly written explanations in many academic textbooks, and sites like Wikipedia which essentially copy textbooks rather than trying to explain. FFT is "complex" in that the numbers, both input and output, are real+imaginary numbers (or 2D vectors or single frequency waveforms or however you like to think of real+imaginary numbers).

The FFT1024 feature in the Teensy audio lib deals only with ordinary real numbers. Internally it's feeding audio data into the FFT as real numbers. The imaginary part of the input is set to zero. The FFT math gives a complex (real+imaginary) number output for each frequency bin. Conceptually, each frequency has an amplitude (or "magnitude" would be the more mathematically correct term) and a phase shift which is relative to the 23.2 ms time period where the analysis was done. That is *why* FFT output must be real+imaginary numbers; you simply can't represent amplitude *and* phase shift with a single number! The audio library's FFT1024 is written with the assumption you're doing music visualization or other spectral analysis where you only care about how intense each frequency bin is, but you couldn't care less about the relative timing or phase shift between each frequency bin. So it combines the real & imaginary numbers into only a single "magnitude" output for each bin. The downside is you can't get the phase info (at least not using the simple object from the library) but the library is simpler to use for most ordinary projects where the phase info isn't important.
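
To make the magnitude/phase idea concrete, here is a small sketch in plain C++. It computes a single frequency bin with a slow, naive DFT (not an FFT, and not the library's code) and then pulls the magnitude and phase out of the complex result. `dftBin` and `makeCosine` are hypothetical helper names chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive single-bin DFT: correlates the input with one complex sinusoid.
std::complex<double> dftBin(const std::vector<double>& x, std::size_t k) {
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    const double total = static_cast<double>(x.size());
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle =
            -2.0 * pi * static_cast<double>(k) * static_cast<double>(n) / total;
        sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return sum;
}

// Test signal: a cosine completing `cycles` periods in the frame,
// shifted by `phaseRad` radians.
std::vector<double> makeCosine(std::size_t length, double cycles, double phaseRad) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(length);
    for (std::size_t n = 0; n < length; ++n) {
        x[n] = std::cos(2.0 * pi * cycles * static_cast<double>(n) /
                        static_cast<double>(length) + phaseRad);
    }
    return x;
}
```

For a 64-sample cosine with 3 cycles and a 0.5 radian shift, `std::abs(dftBin(x, 3))` is the magnitude (32, which is half the frame length times the amplitude) and `std::arg(dftBin(x, 3))` recovers the 0.5 radian phase. Keeping only the `std::abs` part is the kind of reduction described above, where the phase is thrown away.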

FFT has a special property if you give it only real-number input (the imaginary part of all 1024 inputs is zero). The 2nd half of the output is a redundant mirrored copy of the first half (strictly, its complex conjugate, so the magnitudes are identical). So you put in 1024 real-only numbers, and you get 1024 real+imaginary complex numbers out, but only the first half of those numbers are meaningful. Many textbooks go into rigorous proofs of this property, which is great if you're a mathematician, but a distraction if you only want to learn how to actually use FFT. Rather than talk in terms of hard-to-follow equations, I'll briefly mention this is related to Nyquist sampling theory, which says time-sampled (real, not imaginary numbers) data can only represent frequencies up to 1/2 of the sample rate. Again textbooks go into too much math to prove a point I believe pretty much everyone accepts, that 44 kHz sample data represents 22 kHz of audio bandwidth.

This is the reason why virtually all FFT implementations give you half as many numbers output as the number you see written in their descriptions. FFT1024 means it takes in 1024 audio samples. You get 512 (or sometimes 513 depending on the code) frequency bin numbers output.
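
The mirror property is easy to check numerically. The sketch below uses a slow O(N²) DFT purely for illustration (a real FFT gives the same numbers, just faster); `naiveDft` and `uniqueBinCount` are names invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Slow reference DFT of a real-valued input. Returns all N complex bins.
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x) {
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    const std::size_t size = x.size();
    std::vector<std::complex<double>> out(size);
    for (std::size_t k = 0; k < size; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < size; ++n) {
            // Reduce k*n modulo N so the angle stays small and accurate.
            const double angle = -2.0 * pi * static_cast<double>((k * n) % size) /
                                 static_cast<double>(size);
            sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Bins 0 (DC) through N/2 (Nyquist) are the only independent ones.
std::size_t uniqueBinCount(std::size_t fftSize) {
    return fftSize / 2 + 1;
}
```

For any real input, bin `k` and bin `N - k` come out as complex conjugates of each other, so their magnitudes match. Counting DC through the Nyquist bin gives 513 values for a 1024-point transform; code that drops the Nyquist bin reports 512.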

Hopefully this explains what the 1024 means. If you haven't read the tutorial or watched the video, please do. I put quite a bit more info in there about the practical realities of actually using FFT. Unless you're analyzing waveforms which are perfectly phase sync'd to the FFT (which is pretty much never for any ordinary signals), you really do need to use a window to avoid the spectral leakage problem. Again, hopefully I explained that well enough in the tutorial material?
 
My idea/project involves reducing the sampling frequency,

Something to remember is the need for filtering before discarding samples, if you're going to turn something like 44.1 kHz sampled data into a slower rate like 11.025 kHz. To achieve good results, you must first low-pass filter. Conceptually (without difficult math to rigorously prove the point) the original signal has 22 kHz bandwidth. You need to reduce the bandwidth to the 5.5 kHz range. Only then can you discard 3 out of every 4 samples. If you neglect the filtering, any bandwidth from 5.5 to 22 kHz becomes terrible aliasing and utterly corrupts the DC to 5.5 kHz signal.

Fortunately there are 3 types of filters in the audio lib you can use pretty easily. Many people have neglected this step when writing code from scratch, in a sort of wishful thinking all that extra computation can be avoided. It can't. Nyquist sampling theory applies!
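
As a rough illustration of the filter-then-discard order, here is a standalone sketch in plain C++. It does not use the audio library's filter objects; it assumes a 63-tap Hamming-windowed sinc FIR with a 4 kHz cutoff, which is just one reasonable choice for going from 44.1 kHz to 11.025 kHz. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hamming-windowed sinc low-pass FIR, normalized to unity gain at DC.
std::vector<double> designLowPass(std::size_t taps, double cutoffHz, double sampleRateHz) {
    assert(taps >= 3 && taps % 2 == 1);
    const double pi = std::acos(-1.0);
    const double fc = cutoffHz / sampleRateHz;  // cutoff in cycles per sample
    const double mid = static_cast<double>(taps - 1) / 2.0;
    std::vector<double> h(taps);
    double sum = 0.0;
    for (std::size_t n = 0; n < taps; ++n) {
        const double t = static_cast<double>(n) - mid;
        const double sinc = (t == 0.0) ? 2.0 * fc
                                       : std::sin(2.0 * pi * fc * t) / (pi * t);
        const double hamming = 0.54 - 0.46 * std::cos(2.0 * pi * static_cast<double>(n) /
                                                      static_cast<double>(taps - 1));
        h[n] = sinc * hamming;
        sum += h[n];
    }
    for (double& c : h) c /= sum;
    return h;
}

// Low-pass filter, then keep one output per `factor` input samples.
// Output starts once the filter is fully loaded, so there is no start-up transient.
std::vector<double> filterAndDecimate(const std::vector<double>& x,
                                      const std::vector<double>& h,
                                      std::size_t factor) {
    assert(factor > 0 && !h.empty());
    std::vector<double> y;
    for (std::size_t n = h.size() - 1; n < x.size(); n += factor) {
        double acc = 0.0;
        for (std::size_t k = 0; k < h.size(); ++k) acc += h[k] * x[n - k];
        y.push_back(acc);
    }
    return y;
}

// The shortcut: discard samples with no filtering at all.
std::vector<double> decimateOnly(const std::vector<double>& x, std::size_t factor) {
    assert(factor > 0);
    std::vector<double> y;
    for (std::size_t n = 0; n < x.size(); n += factor) y.push_back(x[n]);
    return y;
}

// Unit-amplitude sine test tone.
std::vector<double> makeTone(double freqHz, double sampleRateHz, std::size_t length) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(length);
    for (std::size_t n = 0; n < length; ++n) {
        x[n] = std::sin(2.0 * pi * freqHz * static_cast<double>(n) / sampleRateHz);
    }
    return x;
}

// Root-mean-square level of a block of samples.
double rms(const std::vector<double>& x) {
    if (x.empty()) return 0.0;
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return std::sqrt(sum / static_cast<double>(x.size()));
}
```

Feeding an 8 kHz tone through `decimateOnly(x, 4)` leaves it at nearly full level, now masquerading as a 3025 Hz tone (11025 - 8000), while `filterAndDecimate` with `designLowPass(63, 4000.0, 44100.0)` knocks it down to almost nothing and passes a 1 kHz tone essentially untouched.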
 
Hi Paul,

your explanation of the FFT is really great and helpful!

Maybe it could be put at a more prominent place, i.e. the description of the FFT256 / FFT1024 in the Audio GUI object?

I wish I had read something like that before performing my own first FFT ;-). Would have saved me a lot of time and frustration (the tutorial came into existence later) :).

All the best,

Frank DD4WH
 
Excellent. I couldn't agree more with the filtering and imaginary numbers.

Windowing is still hard for me to visualize. This link helped me clear things up a bit.
https://elsl.ooo/2016/02/05/short-time-fourier-windowing.html

Hopefully you can clear up a few more things.

Per the manual, "the Teensy Audio Library uses 50% overlap in its fft1024 object". Is this only if I use a Windowing function or always?

Let's say for example I'm trying to detect a "woof" or a "meow" using a mic. As FFTs come rolling in, I look at the 100Hz bin and 1kHz bin to decide which sound I picked up (I know this isn't correct). Since looking at 2 bins isn't enough, I start looking at more and more frequencies to decide on the sound. This is more and more power the uC uses.
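
For reference, mapping a frequency to a bin and adding up a range of bins might look something like the sketch below. It is plain C++ working on a vector of magnitudes, not a call into the audio library, and `binIndex` and `bandSum` are names made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest bin for a given frequency (bin width is sampleRate / fftSize).
std::size_t binIndex(double freqHz, std::size_t fftSize, double sampleRateHz) {
    assert(freqHz >= 0.0 && freqHz <= sampleRateHz / 2.0);
    return static_cast<std::size_t>(
        std::lround(freqHz * static_cast<double>(fftSize) / sampleRateHz));
}

// Sum of magnitudes over bins first..last inclusive.
double bandSum(const std::vector<double>& magnitudes, std::size_t first, std::size_t last) {
    assert(first <= last && last < magnitudes.size());
    double sum = 0.0;
    for (std::size_t i = first; i <= last; ++i) sum += magnitudes[i];
    return sum;
}
```

With a 1024-point FFT at 44.1 kHz, 100 Hz lands in bin 2 and 1 kHz in bin 23, so the low end of the spectrum gets very few bins to work with.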

I would like to keep the power consumption down. So things I see to look at are overlap and sampling rate. Obviously putting the uC to sleep would be done. I've looked at changing the sampling rate. But can the overlap be changed?

I haven't looked at the power consumption of the uC. I think in the long run, I would have to move to something smaller (K22F?). I'm really using it because this is an audio-ish project and your Audio library (I think) is rather perfect and easy to use.

Thanks again for all the hard work!
 
Per the manual, "the Teensy Audio Library uses 50% overlap in its fft1024 object". Is this only if I use a Windowing function or always?

Always.

Using no window scaling (a "rectangular" window) is almost never useful for this sort of system where you're analyzing music or other real-time sounds.


But can the overlap be changed?

If "can" includes the ability to edit the audio library source, perhaps make an alternate FFT1024 object, then yes.

If "can" only allows for calling the functions provided by the library, then no, not so much.
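
For anyone unsure what 50% overlap means in practice: each 1024-sample frame reuses the last 512 samples of the previous frame and adds 512 new ones. The sketch below shows only that bookkeeping in generic C++; it is not the library's internal code, and the function names are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start index of each analysis frame when consecutive frames are
// spaced `hop` samples apart (hop = frameSize / 2 means 50% overlap).
std::vector<std::size_t> frameStarts(std::size_t totalSamples,
                                     std::size_t frameSize,
                                     std::size_t hop) {
    assert(frameSize > 0 && hop > 0);
    std::vector<std::size_t> starts;
    for (std::size_t s = 0; s + frameSize <= totalSamples; s += hop) {
        starts.push_back(s);
    }
    return starts;
}

// Time between successive FFT results, in milliseconds.
double updateIntervalMs(std::size_t hop, double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    return 1000.0 * static_cast<double>(hop) / sampleRateHz;
}
```

With a hop of 512 at 44.1 kHz, a new result arrives roughly every 11.6 ms. A modified object using a hop of 1024 (no overlap) would run half as many FFTs, at the cost of giving less weight to sounds that fall near the frame edges where the window tapers to zero.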
 
A Fourier Transform assumes the data you are trying to analyze is periodic. If the beginning point and the ending point are not the same value (as they would be if the data were truly periodic), then you have introduced a discontinuity, or a step, in your data. Think about your data as undulations along a string where the Fourier Transform tries to tie the ends of the string together. If the beginning point of your dataset is not identical to the endpoint, you have inadvertently introduced a step in your data. Cutting the data off abruptly like this is equivalent to multiplying it by a rectangular window, whose spectral response is sin(x)/x, which looks like a lot of ringing in the spectrum. This effect is usually called spectral leakage, and it is closely related to the Gibbs phenomenon. In order to avoid the introduction of this “ringing” in the frequency spectrum, you multiply your dataset with a window which is designed to “gracefully” force the beginning point and endpoint to the same value (of zero) without significantly influencing the spectral characteristics of your data. I hope this explanation helps to visualize what is actually occurring when you do the calculation.
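
The effect can be seen with a few lines of plain C++. This sketch assumes a symmetric Hann window and a naive single-bin DFT, with helper names made up for the example. It compares how much energy shows up in a bin far away from the tone, with and without the window.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Symmetric Hann window: tapers smoothly to zero at both ends.
std::vector<double> hannWindow(std::size_t length) {
    assert(length >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> w(length);
    for (std::size_t n = 0; n < length; ++n) {
        w[n] = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(n) /
                                     static_cast<double>(length - 1)));
    }
    return w;
}

// Element-wise product of a signal and a window of the same length.
std::vector<double> applyWindow(const std::vector<double>& x, const std::vector<double>& w) {
    assert(x.size() == w.size());
    std::vector<double> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) y[n] = x[n] * w[n];
    return y;
}

// Sine completing `cycles` periods in the frame (need not be a whole number).
std::vector<double> makeSine(std::size_t length, double cycles) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(length);
    for (std::size_t n = 0; n < length; ++n) {
        x[n] = std::sin(2.0 * pi * cycles * static_cast<double>(n) /
                        static_cast<double>(length));
    }
    return x;
}

// Magnitude of one DFT bin, computed the slow way.
double binMagnitude(const std::vector<double>& x, std::size_t k) {
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle = -2.0 * pi * static_cast<double>(k) * static_cast<double>(n) /
                             static_cast<double>(x.size());
        sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return std::abs(sum);
}
```

A tone with exactly 10 cycles in a 256-sample frame joins up with itself and puts essentially nothing in bin 40. A tone with 10.5 cycles has mismatched ends and leaks noticeably into bin 40 unless the Hann window is applied first, in which case the leakage drops by well over a factor of ten.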
 