lychrel – Audio pitch detection is a key part of my application. My evolving understanding of FFT (and Teensy’s use of it) may help you. I hope others will confirm my understanding (or refute/improve), and help me extend it.
Here’s my train of thought:
FFT (fast Fourier transform) is a fast way of computing a DFT (discrete Fourier transform).
- A fairly-readable overview can be found at: blogs.mathworks.com/steve/2014/09/23/sinusoids-and-fft-frequency-bins/
- (I’ve read a dozen such overviews over the last year… I’ve found that my view of “fairly-readable” evolves.)
There are key characteristics (constraints) of FFT and the Teensy Audio Library when being used for audio frequency discovery:
- Paul S. set the sample rate to 44,100 Hz, the same as an audio CD. Using a single sample rate throughout the Audio Library allowed a simpler design than a flexible sample rate would.
- When using FFT, the maximum detectable frequency is ½ of the sample rate (someone named Nyquist gets credit for this). So the maximum frequency discoverable with Teensy’s Audio Library is 44,100/2 = 22,050 Hz.
- FFT uses a concept of “bins” to store frequency “hits”: it uses math to find a set of sine waves that, if added together, would reproduce the input sound wave, and each bin reports how strong the sine waves in its slice of the frequency range are.
- - The number of bins used by FFT is flexible, and the Audio Library offers two options: 256 and 1024. (Strictly, these are the number of input samples per transform; the output covers half as many distinct frequency bins, from 0 Hz up to the Nyquist limit.) Choosing a power-of-two size allows dramatic math efficiencies, making it the fast Fourier transform (FFT): the transform can be split again and again into half-size transforms that share intermediate results, so most of the multiplications in a plain DFT never have to be done.
- The FFT “bin size” is calculable by [Sample Rate]/[Number-of-Bins]. So, on Teensy Audio, using the FFT1024 object, bin size = 43.1 Hz/bin (from 44,100 Hz/1024 bins).
- [Disclaimer: my chosen terms may not be precise – audio engineers please correct me where it adds value. I hope my high-level picture is correct, despite imprecise terms]
- With a bin size of 43 Hz, I have been able to detect the pitch of sine waves above about 200 Hz, with roughly 1% accuracy. (That is, with an input sine wave of 440 Hz fed into an FFT1024, plus some additional calculations, I arrive at a frequency between 436 and 444 Hz.)
- Below about 5 times the bin size, frequency determination is more variable or impossible in my hands.
- Smaller bin size (and presumably, lower detectable pitch) can be achieved by changing one of the two numbers that affect bin size:
- - Increase the number of bins: 1024 is the largest that is readily available in Teensy Audio
- - Decrease the sample rate (paradoxical, but true): the sample rate is fixed at 44,100 Hz in Teensy Audio
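The arithmetic in the list above can be checked with a few lines of desktop Python. This is only a sketch for verifying the numbers, not Teensy code; the function names are made up for illustration, and the 44,100 Hz and 1024 figures are the ones quoted above.

```python
SAMPLE_RATE_HZ = 44_100   # Audio Library sample rate quoted above
FFT_SIZE = 1024           # samples per transform for FFT1024


def nyquist_hz(sample_rate_hz):
    """Highest frequency the sampled signal can represent."""
    return sample_rate_hz / 2


def bin_width_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


def nearest_bin(freq_hz, sample_rate_hz, fft_size):
    """Index of the bin whose centre is closest to freq_hz."""
    return round(freq_hz / bin_width_hz(sample_rate_hz, fft_size))


width = bin_width_hz(SAMPLE_RATE_HZ, FFT_SIZE)
print(f"Nyquist limit: {nyquist_hz(SAMPLE_RATE_HZ):.0f} Hz")
print(f"Bin width:     {width:.2f} Hz")
print(f"440 Hz lands nearest bin {nearest_bin(440, SAMPLE_RATE_HZ, FFT_SIZE)}")
print(f"5 x bin width: {5 * width:.0f} Hz")
```

The last line is the "about 5 times the bin size" rule of thumb: it comes out a little over 200 Hz, which matches where detection started working for me.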
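So: to do this using FFT1024 in Teensy Audio, Paul suggests that our best choice is to decimate the samples, that is, keep only every Nth sample. Decimating by powers of 2 (every 2nd sample, every 4th, every 8th, etc.) makes it easier to keep track of the sample batches. One caveat as I understand it: any content above the new, lower Nyquist limit has to be filtered out before samples are dropped, or it folds back (aliases) into the lower bins.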
Teensy offers a batch size of 128 samples.
Decimation is not built into the Audio Library, but could be added (I think). It would require using a pair of “queue” objects, one of output type and one of input type, with the decimation done between the two.
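Here is a minimal sketch of the block-by-block logic that would sit between the two queues, written in desktop Python so it can be tested away from the hardware. On the Teensy the 128-sample blocks would come from a record-queue object and go out through a play-queue object (AudioRecordQueue and AudioPlayQueue, if I have the library's names right); none of that is shown here. The class and function names are hypothetical, and averaging each group of samples is my assumption for a crude low-pass step, not something Paul specified.

```python
BLOCK_SAMPLES = 128   # Audio Library block size mentioned above
FFT_SIZE = 1024       # samples the FFT needs per frame


def decimate_block(block, factor):
    """Average each group of `factor` samples into one output sample.

    The averaging acts as a crude low-pass filter (an assumption of this
    sketch); a proper anti-alias filter would do better.
    """
    assert len(block) % factor == 0, "factor must divide the block length"
    return [sum(block[i:i + factor]) / factor
            for i in range(0, len(block), factor)]


class DecimatingCollector:
    """Collects decimated samples until one FFT frame is full."""

    def __init__(self, factor, frame_size=FFT_SIZE):
        self.factor = factor
        self.frame_size = frame_size
        self.pending = []

    def push_block(self, block):
        """Feed one input block; return a full frame when ready, else None."""
        self.pending.extend(decimate_block(block, self.factor))
        if len(self.pending) >= self.frame_size:
            frame = self.pending[:self.frame_size]
            self.pending = self.pending[self.frame_size:]
            return frame
        return None


def blocks_per_frame(factor, block_samples=BLOCK_SAMPLES, frame_size=FFT_SIZE):
    """How many input blocks are needed to fill one FFT frame."""
    return frame_size * factor // block_samples


collector = DecimatingCollector(factor=2)
frame = None
blocks_used = 0
while frame is None:
    # Silence stands in for real audio blocks here.
    frame = collector.push_block([0.0] * BLOCK_SAMPLES)
    blocks_used += 1
print(blocks_used, len(frame))   # 16 blocks in, 1024 samples out
```

Because 128 is a power of 2, any power-of-2 decimation factor divides a block evenly, so no samples straddle a block boundary. That is the bookkeeping advantage mentioned above.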
The following table lists the possible results of decimation on parameters that matter to my solution:
Hence the trade-off between low-note-detectability (small bin size), and time-to-acquire-sample.
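The trade-off can be recomputed from the formulas above. The sketch below (desktop Python, hypothetical function names) assumes a 1024-sample FFT at 44,100 Hz and uses my "about 5 bin widths" rule of thumb for the lowest detectable pitch, which is an empirical guess rather than a hard limit.

```python
SAMPLE_RATE_HZ = 44_100
FFT_SIZE = 1024
LOWEST_PITCH_BINS = 5   # rule of thumb from above: about 5 bin widths


def decimation_row(factor):
    """Key figures for one decimation factor."""
    rate = SAMPLE_RATE_HZ / factor          # effective sample rate
    bin_width = rate / FFT_SIZE             # Hz per bin
    return {
        "factor": factor,
        "rate_hz": rate,
        "bin_hz": bin_width,
        "lowest_pitch_hz": LOWEST_PITCH_BINS * bin_width,
        "acquire_ms": 1000 * FFT_SIZE / rate,   # time to collect one frame
    }


def decimation_table(factors=(1, 2, 4, 8)):
    return [decimation_row(f) for f in factors]


print("factor  rate(Hz)  bin(Hz)  lowest(Hz)  acquire(ms)")
for row in decimation_table():
    print(f"{row['factor']:>6}  {row['rate_hz']:>8.0f}  {row['bin_hz']:>7.2f}"
          f"  {row['lowest_pitch_hz']:>10.1f}  {row['acquire_ms']:>11.1f}")
```

Each doubling of the decimation factor halves the bin size and doubles the time needed to fill one FFT frame.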
In parallel, here’s what I’ve found for frequencies of some musical inputs:
Trumpet (or soprano voice): Min 160 Hz, Max 1880 Hz
Trombone (or bass voice): Min 80 Hz, Max 700 Hz
Tuba: 40 Hz? (one octave lower?)
So, for my brass instrument work:
- I’d like to detect pitch correctly down to less than 100 Hz (to use the full range of trumpet or trombone)
- I want to be able to detect a note change as fast as possible. In the case of a trumpet, faster than 10 times per second would be best. Good trumpet players can run scales at least that fast.
- So my conclusion: for a trumpet pitch detector, I think I want to decimate by 2, resulting in a lowest detectable pitch of roughly 100 Hz and a time-to-collect-samples of 46.4 ms.
For this reason, I think it’s worth implementing the queue→decimate→queue concept.
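As a sanity check on that choice, here is a small sketch that picks the smallest decimation factor meeting both requirements. It is desktop Python with hypothetical names; the 100 ms budget is my reading of "10 notes per second", the instrument minimums are the rough figures listed above, and the 5-bin rule is again only a rule of thumb.

```python
SAMPLE_RATE_HZ = 44_100
FFT_SIZE = 1024
LOWEST_PITCH_BINS = 5   # rule of thumb: about 5 bin widths


def lowest_pitch_hz(factor):
    """Approximate lowest detectable pitch after decimating by `factor`."""
    return LOWEST_PITCH_BINS * SAMPLE_RATE_HZ / factor / FFT_SIZE


def acquire_ms(factor):
    """Time to collect one 1024-sample frame after decimating by `factor`."""
    return 1000 * FFT_SIZE * factor / SAMPLE_RATE_HZ


def smallest_usable_factor(min_note_hz, max_acquire_ms, factors=(1, 2, 4, 8)):
    """Smallest factor that reaches min_note_hz within the time budget."""
    for factor in factors:
        if (lowest_pitch_hz(factor) <= min_note_hz
                and acquire_ms(factor) <= max_acquire_ms):
            return factor
    return None   # nothing in the list satisfies both constraints


for name, min_hz in (("trumpet", 160), ("trombone", 80), ("tuba", 40)):
    print(name, smallest_usable_factor(min_hz, max_acquire_ms=100))
```

Under these assumptions the trumpet comes out at decimate-by-2, the trombone would need decimate-by-4 (which uses up nearly all of the 100 ms just collecting samples), and nothing in the list handles a 40 Hz tuba note inside 100 ms.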
It must be fast. In addition to the time to decimate, each pitch reading on the Teensy has to fit in the following:
- sample-acquisition time (table above),
- FFT calculation (audio library)
- FFT interpretation (my code)
- Other application processing (my code)
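For the "FFT interpretation" step, one common way to get finer resolution than a whole bin is to fit a parabola through the strongest bin and its two neighbours. The sketch below is desktop Python, not Teensy code, and shows that technique on a synthetic 440 Hz sine. It assumes a Hann window and log magnitudes, and it computes a handful of DFT bins directly instead of running a real FFT. It illustrates one possible approach and is not necessarily the "additional calculations" I described earlier.

```python
import cmath
import math

SAMPLE_RATE_HZ = 44_100
FFT_SIZE = 1024


def hann_windowed_sine(freq_hz, n=FFT_SIZE, rate=SAMPLE_RATE_HZ):
    """Test signal: a sine wave with a Hann window applied."""
    return [math.sin(2 * math.pi * freq_hz * i / rate)
            * 0.5 * (1 - math.cos(2 * math.pi * i / n))
            for i in range(n)]


def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly (slow, but no libraries)."""
    n = len(samples)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                   for i, s in enumerate(samples)))


def estimate_pitch_hz(magnitudes, rate=SAMPLE_RATE_HZ, n=FFT_SIZE):
    """Refine the strongest bin with a parabola through it and its neighbours."""
    k = max(range(1, len(magnitudes) - 1), key=lambda i: magnitudes[i])
    # Log magnitudes; the tiny constant avoids log(0) on empty bins.
    a, b, c = (math.log(magnitudes[j] + 1e-12) for j in (k - 1, k, k + 1))
    denom = a - 2 * b + c
    if denom == 0:
        return k * rate / n             # flat top: fall back to the bin centre
    offset = 0.5 * (a - c) / denom      # fraction of a bin, -0.5 to 0.5
    return (k + offset) * rate / n


samples = hann_windowed_sine(440.0)
magnitudes = [bin_magnitude(samples, k) for k in range(64)]   # bins 0..63 only
print(f"Estimated pitch: {estimate_pitch_hz(magnitudes):.1f} Hz")
```

On a clean sine this should land within a few hertz of 440, well inside a single 43 Hz bin. Real instrument tones with strong harmonics would need extra care in choosing which peak is the fundamental.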
Paul and other experts… am I on the right track?
Dave