This is a new thread about an optimized version of the fft. In this case the 256 point fft.

I decided not to continue the old thread about the 1024 point fft, since a lot of the discussion was oriented towards general fft usage and related items, and not necessarily about the optimized version.

I did not test all possible average values extensively. If anybody wants to test, please report any errors.

Anyway here are the results for the 256 point FFT:

The code can be optimized further, I could not get the compiler to generate the code I wanted. By inserting assemly code, I could easily save another 2400 cycles.Code:// all numbers for averaging of 8. // All cycles are approximate figures based on the cycle counter. // other interrupt/dma sources have not been explicitly disabled while testing. // // old optimized delta // copying data 11432 8688 -24% // applying window 22816 11472 -50% // fft 180496 90248 -50% // integration+output 17506 11080 -36% // // total 232250 121488 -47% // // max cpu @ 72 MHz 14% 10% -29% or -4%point

I mentioned earlier about more balanced cpu usage over the cycles. I eventually abandoned that idea, since the code became very unreadible (yes even more than my code in the attachment). The cpu usage for 8 time averaging is alternatingly 4% and 10% between cycles.

As to what I did

copying data: writing two values at the same time saved little.

applying window: again applying the window for two values at once.

fft: by using a complex fft for two real fft's simultaneously, I save 50%

integration: No need to disentangle the fft spectra in most cases.

output: replaced the division by a multiplication

Defragster mentioned something in this post about not being able to verify the improvement. The following example shows in my eyes a reasonable way to verify. As said the cycles are alternatingly 4% and 10%, so 7% on average. Since the original fft was 14%, this matches nicely with the 50% reduction in cycles. The free CPU time on average therefore goes from 86% to 93%, so an 8% increase. By executing the following example, the counter is able to count up to 178070 and 191350, for respectively the old and the optimized version (@72 MHz). So based on a counter, I see a 7.5% increase, which matches nicely with the predicted value.

Finally the code is attached. It is a drop in replacement for the analyze_fft256.c/.h files.Code:#include <Audio.h> #include <Wire.h> #include <SPI.h> #include <SD.h> AudioInputI2S i2sin; AudioAnalyzeFFT256 fft; AudioConnection pc1(i2sin, fft); void setup() { AudioMemory(12); fft.averageTogether(8); } void loop() { uint32_t cntr = 0; // make sure we start at a proper point while (!fft.available()); // just keep counting until next fft data available while (!fft.available()) cntr++; // finally output data Serial.println(cntr); }

Edit: For the latest code, see this post