This is a new thread about an optimized version of the fft. In this case the 256 point fft.
I decided not to continue the old thread about the 1024 point fft, since a lot of the discussion was oriented towards general fft usage and related items, and not necessarily about the optimized version.
I did not test all possible average values extensively. If anybody wants to test, please report any errors.
Anyway here are the results for the 256 point FFT:
The code can be optimized further, I could not get the compiler to generate the code I wanted. By inserting assemly code, I could easily save another 2400 cycles.
I mentioned earlier about more balanced cpu usage over the cycles. I eventually abandoned that idea, since the code became very unreadible (yes even more than my code in the attachment). The cpu usage for 8 time averaging is alternatingly 4% and 10% between cycles.
As to what I did
copying data: writing two values at the same time saved little.
applying window: again applying the window for two values at once.
fft: by using a complex fft for two real fft's simultaneously, I save 50%
integration: No need to disentangle the fft spectra in most cases.
output: replaced the division by a multiplication
Defragster mentioned something in this post about not being able to verify the improvement. The following example shows in my eyes a reasonable way to verify. As said the cycles are alternatingly 4% and 10%, so 7% on average. Since the original fft was 14%, this matches nicely with the 50% reduction in cycles. The free CPU time on average therefore goes from 86% to 93%, so an 8% increase. By executing the following example, the counter is able to count up to 178070 and 191350, for respectively the old and the optimized version (@72 MHz). So based on a counter, I see a 7.5% increase, which matches nicely with the predicted value.
Finally the code is attached. It is a drop in replacement for the analyze_fft256.c/.h files.
Edit: For the latest code, see this post
I decided not to continue the old thread about the 1024 point fft, since a lot of the discussion was oriented towards general fft usage and related items, and not necessarily about the optimized version.
I did not test all possible average values extensively. If anybody wants to test, please report any errors.
Anyway here are the results for the 256 point FFT:
Code:
// all numbers for averaging of 8.
// All cycles are approximate figures based on the cycle counter.
// other interrupt/dma sources have not been explicitly disabled while testing.
//
// old optimized delta
// copying data 11432 8688 -24%
// applying window 22816 11472 -50%
// fft 180496 90248 -50%
// integration+output 17506 11080 -36%
//
// total 232250 121488 -47%
//
// max cpu @ 72 MHz 14% 10% -29% or -4%point
I mentioned earlier about more balanced cpu usage over the cycles. I eventually abandoned that idea, since the code became very unreadible (yes even more than my code in the attachment). The cpu usage for 8 time averaging is alternatingly 4% and 10% between cycles.
As to what I did
copying data: writing two values at the same time saved little.
applying window: again applying the window for two values at once.
fft: by using a complex fft for two real fft's simultaneously, I save 50%
integration: No need to disentangle the fft spectra in most cases.
output: replaced the division by a multiplication
Defragster mentioned something in this post about not being able to verify the improvement. The following example shows in my eyes a reasonable way to verify. As said the cycles are alternatingly 4% and 10%, so 7% on average. Since the original fft was 14%, this matches nicely with the 50% reduction in cycles. The free CPU time on average therefore goes from 86% to 93%, so an 8% increase. By executing the following example, the counter is able to count up to 178070 and 191350, for respectively the old and the optimized version (@72 MHz). So based on a counter, I see a 7.5% increase, which matches nicely with the predicted value.
Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
AudioInputI2S i2sin;
AudioAnalyzeFFT256 fft;
AudioConnection pc1(i2sin, fft);
void setup() {
AudioMemory(12);
fft.averageTogether(8);
}
void loop() {
uint32_t cntr = 0;
// make sure we start at a proper point
while (!fft.available());
// just keep counting until next fft data available
while (!fft.available())
cntr++;
// finally output data
Serial.println(cntr);
}
Finally the code is attached. It is a drop in replacement for the analyze_fft256.c/.h files.
Edit: For the latest code, see this post
Attachments
Last edited: