I optimized the 1024-point FFT a little.
The value reported by processorUsageMax() drops from 68% to a mere 12% (72 MHz Teensy 3.1).
Most of this comes from distributing the processing load evenly over all time slots.
The FFT algorithm itself has also changed; I got it about 30% more efficient.
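To illustrate why spreading the work lowers processorUsageMax(): that figure tracks the busiest single update() call, so doing the whole transform in one call gives a high peak even if the average load is modest. The following is only a minimal sketch of the scheduling idea, assuming the work can be cut into equal chunks; SlicedWork and peakPerCall are hypothetical names for illustration and are not taken from the attached code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical scheduler: spreads `totalUnits` of work evenly over `slots`
// consecutive update() calls instead of doing it all in a single call.
class SlicedWork {
public:
    SlicedWork(std::size_t totalUnits, std::size_t slots)
        : total_(totalUnits), slots_(slots) {
        assert(slots > 0);
    }

    // Called once per audio block; returns the units handled in this call.
    std::size_t update() {
        const std::size_t begin = total_ * slot_ / slots_;
        const std::size_t end = total_ * (slot_ + 1) / slots_;
        // ... process work items [begin, end) here ...
        slot_ = (slot_ + 1) % slots_;
        return end - begin;
    }

private:
    std::size_t total_;
    std::size_t slots_;
    std::size_t slot_ = 0;
};

// Largest amount of work done in any single call over one full frame.
std::size_t peakPerCall(std::size_t totalUnits, std::size_t slots) {
    SlicedWork work(totalUnits, slots);
    std::size_t peak = 0;
    for (std::size_t i = 0; i < slots; ++i) {
        peak = std::max(peak, work.update());
    }
    return peak;
}
```

With 1024 units of work, peakPerCall(1024, 1) is 1024 while peakPerCall(1024, 4) is 256: the total is unchanged, but the worst-case call is four times cheaper.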
This table presents the approximate number of cycles used in the current and the new version.
Code:
                     old     new   delta
copying data:       8288    4452    -46%
applying window:   13073    9270    -23%
fft:               98030   67812    -31%
calc. magnitude:   12652   11856     -6%
I think another 5000 cycles or so can still be gained by unrolling loops and improving memory access.
I hope I caught all overflow problems due to bit growth. If anybody wants to try the code, I would like to hear about any such problems you run into.
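For anyone testing for overflow, some background on bit growth: an N-point DFT output can be up to N times larger than the largest input sample, which is log2(N) extra bits, so 16-bit fixed-point data needs scaling somewhere. A common remedy is to halve the values after each radix-2 stage. The sketch below only demonstrates the effect on a single butterfly sum; the function names are made up for illustration, and it does not show how the attached code handles scaling.

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit intermediate into 16 bits, wrapping on overflow the way a
// plain narrowing store does on typical two's-complement targets.
int16_t wrap16(int32_t x) {
    return static_cast<int16_t>(static_cast<uint16_t>(x));
}

// Butterfly sum without scaling: the true sum can exceed the int16_t range.
int16_t sumUnscaled(int16_t a, int16_t b) {
    return wrap16(int32_t(a) + int32_t(b));
}

// Butterfly sum with a divide-by-two: the result always fits in int16_t.
int16_t sumScaled(int16_t a, int16_t b) {
    return wrap16((int32_t(a) + int32_t(b)) / 2);
}

// Worst-case growth of an N-point DFT in bits, since
// |X[k]| <= N * max|x[n]| (N assumed to be a power of two).
int worstCaseGrowthBits(unsigned n) {
    assert(n > 0);
    int bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}
```

Adding 20000 and 20000 wraps to -25536 without scaling but gives 20000 with it. For N = 1024 the worst case is 10 bits of growth, so inputs that use the full 16-bit range cannot pass through unscaled.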
The files can simply replace the existing analyze_fft1024.cpp and analyze_fft1024.h in the Audio library.
In the header file only two changes were made: two more audio_block_t's, and the CMSIS FFT was reduced from 1024 to 256 points.
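One general way a 256-point FFT can serve a 1024-point analysis is a radix-4 decimation-in-time split: take four quarter-size transforms of the samples x[4m + r] for r = 0..3, then combine them with twiddle factors. The sketch below shows that textbook technique only, under the assumption that this is roughly what the header change reflects; it is not the attached code. It uses std::complex<double> and a naive DFT as a stand-in for the CMSIS routine, and the names dft, dftFromQuarters and maxError are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT, standing in for a real quarter-size FFT routine.
std::vector<cd> dft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            out[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t % n) / double(n));
        }
    }
    return out;
}

// N-point DFT built from four N/4-point DFTs of the samples x[4m + r]:
// X[k] = sum over r of W^(r*k) * F_r[k mod N/4], with W = exp(-2*pi*i/N).
std::vector<cd> dftFromQuarters(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    assert(n % 4 == 0);
    const std::size_t q = n / 4;
    const double pi = std::acos(-1.0);
    std::vector<cd> sub[4];
    for (std::size_t r = 0; r < 4; ++r) {
        std::vector<cd> part(q);
        for (std::size_t m = 0; m < q; ++m) part[m] = x[4 * m + r];
        sub[r] = dft(part);
    }
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t r = 0; r < 4; ++r) {
            out[k] += std::polar(1.0, -2.0 * pi * double(r * k) / double(n)) * sub[r][k % q];
        }
    }
    return out;
}

// Largest absolute difference between two spectra of equal length.
double maxError(const std::vector<cd>& a, const std::vector<cd>& b) {
    assert(a.size() == b.size());
    double err = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        err = std::max(err, std::abs(a[i] - b[i]));
    }
    return err;
}
```

Comparing dftFromQuarters(x) against dft(x) with maxError on a test signal is a quick way to check that the combination step is right before moving to fixed point.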
In the original file Paul(?) mentions a TODO:
// TODO: support averaging multiple copies
I have some ideas for implementing this; it might take only about 4000 cycles and would support variable-length averaging. But first I need to do more testing on the core of the FFT.
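As a rough picture of what variable-length averaging could look like, the sketch below accumulates a configurable number of magnitude frames in 32-bit sums and emits their mean. It is only an assumption-laden illustration: SpectrumAverager and its interface are hypothetical, and it is not the implementation planned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical averager: collects `frames` magnitude spectra and outputs
// their per-bin mean. 32-bit sums hold up to 65536 frames of 16-bit values.
class SpectrumAverager {
public:
    SpectrumAverager(std::size_t bins, unsigned frames)
        : sum_(bins, 0), frames_(frames) {
        assert(frames > 0 && frames <= 65536);
    }

    // Add one magnitude frame. Returns true when `out` holds a fresh average.
    bool add(const std::vector<uint16_t>& mag, std::vector<uint16_t>& out) {
        assert(mag.size() == sum_.size());
        for (std::size_t i = 0; i < sum_.size(); ++i) sum_[i] += mag[i];
        if (++count_ < frames_) return false;

        out.resize(sum_.size());
        for (std::size_t i = 0; i < sum_.size(); ++i) {
            out[i] = static_cast<uint16_t>(sum_[i] / frames_);
            sum_[i] = 0;  // start the next averaging window
        }
        count_ = 0;
        return true;
    }

private:
    std::vector<uint32_t> sum_;
    unsigned frames_;
    unsigned count_ = 0;
};
```

With two bins and an averaging length of 2, feeding {10, 20} and then {30, 40} yields {20, 30} on the second call; the first call returns false because the window is not yet full.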
EDIT: for the latest code see post: https://forum.pjrc.com/threads/27905-1024-point-FFT-30-more-efficient-%28experimental%29?p=66678&viewfull=1#post66678