# Possible to modify Teensy Audio library to add other sized FFT's?

• 09-09-2019, 04:32 PM
gregallenwarner
I've been looking into the Teensy Audio library for my project, and I need 4 FFT's running on 4 analog inputs in parallel. I'm concerned with latency, as what I'm doing more or less requires realtime response to changes in frequencies. I'm not sure what the latency of the current FFT is, but I noticed the only options are a 256-point and a 1024-point FFT. I'm concerned neither of these will be fast enough. I'm less concerned with resolution and more concerned with getting the response time down to a few milliseconds, if that's possible. I'm running on a Teensy 3.6 if that makes any difference.

Is it possible to modify the Teensy Audio library to add additional FFT objects of varying size, such as a 64-point FFT? Also, how would I calculate the length of time it takes the Teensy to finish processing one frame of data, depending on the number of points in the FFT?

Thanks.
• 09-09-2019, 05:42 PM
WMXZ
As designed, the audio library works in blocks of 128 (16-bit) samples, so the latency is 128 / 44.1 kHz = 2.9 ms.
If you process the data with 50% overlap, you end up with four 256-point FFTs (one per input) every 2.9 ms.

To reduce the latency below 2.9 ms, you will need to modify the audio library (one constant).
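
As a rough sketch of the arithmetic above, the block latency is just the block length divided by the sample rate. The 44100 Hz constant is an assumption taken from the rounded 44.1 kHz figure in this post, and `blockLatencyMs` is a hypothetical helper name, not something from the Audio library.

```cpp
// Assumed nominal sample rate; the thread rounds it to 44.1 kHz.
constexpr double kSampleRateHz = 44100.0;

// Time needed to collect one audio block, in milliseconds.
// blockSamples plays the role of the library's block-size constant.
constexpr double blockLatencyMs(int blockSamples) {
    return 1000.0 * blockSamples / kSampleRateHz;
}

// 128 samples at 44.1 kHz is roughly 2.9 ms, as stated above.
static_assert(blockLatencyMs(128) > 2.90 && blockLatencyMs(128) < 2.91,
              "128-sample block should take about 2.9 ms");

// Halving the block size halves the latency (roughly 1.45 ms).
static_assert(blockLatencyMs(64) > 1.45 && blockLatencyMs(64) < 1.46,
              "64-sample block should take about 1.45 ms");
```

This only covers the time to fill one block; any extra delay from the FFT itself or from averaging comes on top.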

Concerning FFT speed, there should be no issue as long as you don't require double-precision float (16-bit integer is fastest).
The number of FFT operations is typically somewhere between N*log2(N) and N*log8(N) complex multiply-add operations, depending on which radix the FFT implements.

So a 64-point FFT would be about 2*N = 128 operations, and a 512-point FFT would be 3*N = 1536 operations (both can be done with radix-8 FFTs).
Other FFT sizes need a mixed-radix or a lower-radix implementation, but the lower the radix, the slower the FFT.
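
The estimate above can be written out as a small helper. This is only a sketch of the N*log_r(N) rule of thumb from this post, not a cycle-accurate model; `fftStages` and `fftOpsEstimate` are hypothetical names, and sizes that are not an exact power of the radix return 0 to flag that a mixed-radix or lower-radix FFT would be needed.

```cpp
#include <cstdint>

// Number of radix-r stages in an N-point FFT, i.e. log_r(N).
// Returns 0 when N is not an exact power of the radix.
inline uint32_t fftStages(uint32_t n, uint32_t radix) {
    if (radix < 2 || n < 2) return 0;
    uint32_t stages = 0;
    while (n > 1 && n % radix == 0) {
        n /= radix;
        ++stages;
    }
    return (n == 1) ? stages : 0;
}

// Rough count of complex multiply-add operations: N * log_r(N).
inline uint32_t fftOpsEstimate(uint32_t n, uint32_t radix) {
    return n * fftStages(n, radix);
}
```

With these, `fftOpsEstimate(64, 8)` gives 128 and `fftOpsEstimate(512, 8)` gives 1536, matching the figures above, while `fftOpsEstimate(256, 8)` returns 0 because 256 is not a power of 8; the same size with radix 4 gives 1024.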
• 09-09-2019, 05:47 PM
gregallenwarner
Thanks for the info. 2.9 ms is plenty fast I think for my application.

Is there anything I need to do to specify that the library utilize 50% overlap? Or does it do this by default? Also, integer math is fine, so do I need to specify that so it doesn't use float?

Thanks again.
• 09-09-2019, 07:10 PM
WMXZ
As I recall, the 256-point FFT does indeed use 50% overlap, but you can check that in the audio library.
In any case, it is good to start with the example code and the audio GUI, and to have a look at the code of the different audio objects. They all work with the same data-driven paradigm.
• 09-09-2019, 08:02 PM
defragster
When I looked at FFT256 for speed once, I found that it sets averaging to 8 by default, so that needs to be changed for it to actually complete faster than an FFT1024.

See: ...\hardware\teensy\avr\libraries\Audio\analyze_fft256.h

This needs to be called with '1': void averageTogether(uint8_t n) {

The excerpt below shows the default setting of 'naverage = 8;', which the call above fixes. There is also this, which suggests that smaller 64-sample blocks are possible?:
Code:

```
class AudioAnalyzeFFT256 : public AudioStream
{
public:
        AudioAnalyzeFFT256() : AudioStream(1, inputQueueArray),
          window(AudioWindowHanning256), count(0), outputflag(false) {
                arm_cfft_radix4_init_q15(&fft_inst, 256, 0, 1);
#if AUDIO_BLOCK_SAMPLES == 128
                prevblock = NULL;
                naverage = 8;
#elif AUDIO_BLOCK_SAMPLES == 64
                prevblocks[0] = NULL;
                prevblocks[1] = NULL;
                prevblocks[2] = NULL;
#endif
```
• 09-09-2019, 08:17 PM
gregallenwarner
Good find! Thanks for that info.

How would I implement no averaging in my code? Is there a #DEFINE or a function parameter to control this? Or would I need to modify the library files directly?
• 09-09-2019, 08:25 PM
defragster
As noted in post #5: this needs to be called with '1': void averageTogether(uint8_t n) {

So in setup() :: myFFT.averageTogether(1);

That will have each read count as a data point rather than the default 8 reads for average before calling it data. Wow - my sketch for that is last touched March 2016.
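
To put rough numbers on that, here is a small sketch of how the averaging count stretches the time between results. It assumes, as described earlier in the thread, that one FFT is computed per incoming block and that a result is reported after `naverage` of them; the 44100 Hz rate and the helper name `fftResultIntervalMs` are assumptions for illustration, not part of the library.

```cpp
// Approximate time between new FFT results, in milliseconds.
// Assumes one FFT per incoming block and `naverage` FFTs averaged
// together before a result is made available.
inline double fftResultIntervalMs(int blockSamples, int naverage,
                                  double sampleRateHz = 44100.0) {
    return 1000.0 * blockSamples * naverage / sampleRateHz;
}
```

Under those assumptions, `fftResultIntervalMs(128, 8)` comes out at about 23.2 ms for the default setting, and `fftResultIntervalMs(128, 1)` at about 2.9 ms after calling `averageTogether(1)`.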
• 09-09-2019, 08:29 PM
gregallenwarner
Ok, I see it now. Thanks!
• 09-10-2019, 06:25 AM
WMXZ
Quote:

Originally Posted by defragster
Also there is this - that suggest smaller 64 blocks possible?:

That is the setting for the Teensy LC (TLC).
It is defined in AudioStream.h:
Code:

```
#ifndef AUDIO_BLOCK_SAMPLES
#if defined(__MK20DX128__) || defined(__MK20DX256__) || defined(__MK64FX512__) || defined(__MK66FX1M0__)
#define AUDIO_BLOCK_SAMPLES  128
#elif defined(__MKL26Z64__)
#define AUDIO_BLOCK_SAMPLES  64
#endif
#endif
```
So inserting
Code:

`#define AUDIO_BLOCK_SAMPLES  64`
before these lines would do the trick
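
A stripped-down illustration of why that works: the `#ifndef` guard only supplies a default when nothing has defined the constant earlier in the same file. The snippet below is a sketch that reduces the guard to a single default value and is not the actual header; `blockSamples` is a hypothetical accessor added only so the result can be checked.

```cpp
// Override placed before the guarded default, as suggested above.
#define AUDIO_BLOCK_SAMPLES 64

// Same guard pattern as in the AudioStream.h excerpt, reduced to one
// default for illustration. It is skipped because the macro already exists.
#ifndef AUDIO_BLOCK_SAMPLES
#define AUDIO_BLOCK_SAMPLES 128
#endif

// The earlier definition wins.
static_assert(AUDIO_BLOCK_SAMPLES == 64, "override should take precedence");

// Hypothetical accessor so the value can be inspected at run time.
constexpr int blockSamples() { return AUDIO_BLOCK_SAMPLES; }
```

Note that a `#define` placed only in the sketch would not reach the library's own source files, since those are compiled separately, which is presumably why the edit goes into AudioStream.h itself.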
• 09-10-2019, 07:09 AM
defragster
Quote:

Originally Posted by WMXZ
That is the setting for TLC
it is defined in AudioStream.h

...

Cool - I didn't go looking after it caught my eye.
• 09-10-2019, 01:21 PM
gregallenwarner
Quote:

Originally Posted by WMXZ
Code:

```
#ifndef AUDIO_BLOCK_SAMPLES
#if defined(__MK20DX128__) || defined(__MK20DX256__) || defined(__MK64FX512__) || defined(__MK66FX1M0__)
#define AUDIO_BLOCK_SAMPLES  128
#elif defined(__MKL26Z64__)
#define AUDIO_BLOCK_SAMPLES  64
#endif
#endif
```

I noticed all the #if's look somewhat like chip model numbers, is that right? So is there anything particularly hardware-dependent when it comes to the block size? For instance, how the library utilizes DMA channels, or some other thing? Or does this merely have to do with the amount of SRAM these chips have available?

I feel like I don't really know what I'm talking about; I'm just speculating. I'm just cautious that changing something in the library doesn't break my Teensy setup.
• 09-10-2019, 04:14 PM
WMXZ
Quote:

Originally Posted by gregallenwarner
I noticed all the #if's look somewhat like chip model numbers, is that right? So is there anything particularly hardware-dependent when it comes to the block size? For instance, how the library utilizes DMA channels, or some other thing? Or does this merely have to do with the amount of SRAM these chips have available?

I feel like I don't really know what I'm talking about; I'm just speculating. I'm just cautious that changing something in the library doesn't break my Teensy setup.

Yes, those are the different Teensy models, and yes, the limitation is RAM, but don't be shy about modifying the code; nothing will break. There is a button to reprogram a crashing Teensy.
On a T3.6 I use my own version of AudioStream.h with a block size of 6*128 int32 samples (compared to 128 int16 samples).
• 09-10-2019, 05:08 PM
gregallenwarner
So if I change AUDIO_BLOCK_SAMPLES to 64, what effect would that have on the AudioAnalyzeFFT256 object? Would it still perform a 256 point FFT, just now called every 64 samples instead of 128, resulting in 75% overlap?
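
For reference, the overlap figure in that question follows from simple arithmetic, sketched below. It assumes the FFT window advances by exactly one audio block per FFT, which is my reading of the library rather than something confirmed in this thread; `overlapFraction` and `blocksPerWindow` are hypothetical helper names.

```cpp
// Fraction of each FFT window shared with the previous window when the
// window advances by hopSamples between successive FFTs.
inline double overlapFraction(int fftSize, int hopSamples) {
    return 1.0 - static_cast<double>(hopSamples) / fftSize;
}

// Number of audio blocks needed to fill one FFT window.
inline int blocksPerWindow(int fftSize, int blockSamples) {
    return fftSize / blockSamples;
}
```

With a 256-point window, a hop of 128 samples gives 0.5 (50%) and a hop of 64 samples gives 0.75 (75%). `blocksPerWindow(256, 64)` is 4, i.e. the current block plus three earlier ones, which is consistent with the three `prevblocks` entries in the constructor excerpt posted earlier.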

Also, how much processing power does the Teensy 3.6 have? Is it fast enough to compute four separate 256-point FFT's every 64 samples? Would I have much power left to run anything like displays or sound generation?
• 09-11-2019, 07:50 AM
defragster
Quote:

Originally Posted by gregallenwarner
So if I change AUDIO_BLOCK_SAMPLES to 64, what effect would that have on the AudioAnalyzeFFT256 object? Would it still perform a 256 point FFT, just now called every 64 samples instead of 128, resulting in 75% overlap?

Also, how much processing power does the Teensy 3.6 have? Is it fast enough to compute four separate 256-point FFT's every 64 samples? Would I have much power left to run anything like displays or sound generation?

Perhaps try it with the default 256 setup first, and pull the code from \hardware\teensy\avr\libraries\Audio\examples\MemoryAndCpuUsage\MemoryAndCpuUsage.ino to see how much CPU is left. Then, for the smaller blocks, follow that #ifdef to see whether it has any other special cases, and, as @WMXZ noted, give it a try. Changes can always be backed out or edited if it compiles and uploads but doesn't work; if it does work, go on from there.

There was a recent use of FFT4096, and that ran … but trying two of them IIRC needed more time than was available between sample sets, and the 'MemoryAndCpuUsage' code showed that. Search for that recent FFT4096 thread and it will have details showing what the Teensy can do when running those larger ones; ideally the smaller ones will have less trouble. In short, a T_3.6 can do a 4096 and a 1024, but not two 4096's, as was tried …