
# Thread: 512 point FFT, maybe using latest CMSIS

1. ## 512 point FFT, maybe using latest CMSIS

The analyze_fft1024.cpp/h class uses arm_cfft_radix4_q15 from an old version of CMSIS. There is a corresponding function for f32, which is what I am interested in here. In the old CMSIS functions, it seems like 256 and 1024 long FFTs are supported, but not length 512. I would like to do FFTs of length 512 on Teensy 4, and I think this is supported by arm_cfft_f32.c in the current version of CMSIS, but I do not quite know how best to bring this into my project, or how else to do a 512-length FFT efficiently.

Is there an easy way of doing efficient FFTs of length 512 using the available libraries?

Or, is there some reasonably straightforward way of making use of the functions from the newer CMSIS library together with Teensy 4 in the Arduino environment on Windows? The whole CMSIS is huge and I do not quite know where to start. Should it be used as a library in the Arduino environment, or should it be put under src in my project?

2. I also looked into this a while back, see threads
and

3. By the way, the reason for this:
> Originally posted by PerM:
> it seems like 256 and 1024 long FFTs are supported, but not length 512.

is that the library routine used is radix-4: it handles only powers of 4, not all powers of 2. The recursive
decomposition in radix 4 requires only sign changes and swapping real<->imaginary, since the 4th
roots of unity are 1, i, -1, -i (or 1, j, -1, -j if you use engineering notation). The radix-4 FFT is somewhat more
efficient than radix-2 because it needs half as many stages (log4 N instead of log2 N).
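A minimal plain-C sketch of the radix-4 butterfly (a 4-point DFT) may make this concrete. It is illustrative code, not the CMSIS implementation; the `cplx` type and the `dft4` name are hypothetical. Because the twiddle factors are the 4th roots of unity, the only "multiplications" are by ±j, which reduce to swapping the real and imaginary parts plus a sign change.

```c
/* Minimal complex type for illustration (hypothetical, not a CMSIS type). */
typedef struct { float re, im; } cplx;

/* 4-point DFT, i.e. one radix-4 butterfly:
     X[k] = sum_{n=0..3} x[n] * (-j)^(n*k)
   Only additions, subtractions and real/imag swaps are needed. */
void dft4(const cplx x[4], cplx X[4])
{
    cplx a = { x[0].re + x[2].re, x[0].im + x[2].im };  /* x0 + x2 */
    cplx b = { x[0].re - x[2].re, x[0].im - x[2].im };  /* x0 - x2 */
    cplx c = { x[1].re + x[3].re, x[1].im + x[3].im };  /* x1 + x3 */
    cplx d = { x[1].re - x[3].re, x[1].im - x[3].im };  /* x1 - x3 */

    X[0] = (cplx){ a.re + c.re, a.im + c.im };
    X[2] = (cplx){ a.re - c.re, a.im - c.im };
    /* X1 = b - j*d, where -j*d = (d.im, -d.re): a swap plus a sign change */
    X[1] = (cplx){ b.re + d.im, b.im - d.re };
    /* X3 = b + j*d, where  j*d = (-d.im, d.re) */
    X[3] = (cplx){ b.re - d.im, b.im + d.re };
}
```

For the real input x = {1, 2, 3, 4}, `dft4` gives X = {10, -2 + 2j, -2, -2 - 2j}. A full radix-4 FFT applies this butterfly repeatedly (with general twiddle multiplications between stages), which is what restricts its length to powers of 4.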

Hopefully at least one of the threads I linked mentions the real FFT primitives arm_xxxxx_rfft_xxx, which
can be more efficient in space and time for real signals. The real FFT of size N uses a complex FFT of
size N/2 internally, so it is roughly twice as efficient as naively running a size-N complex FFT with zero imaginary parts. The FFT is inherently a complex transform.

Note that the q15 versions of the FFT lose a lot of precision when applied to 16-bit signals; consider the
q31 or f32 versions if you don't want to lose information.

4. Thanks Mark!

You are right that my input data is real, so the rfft functions would be a better choice than cfft. But I do want to use floats, and the arm_rfft_init_f32/arm_rfft_f32 functions still seem to have the radix-4 limit, even though the corresponding q31 functions might not. I have not tested it yet, but my guess is that the newer arm_rfft_fast_f32 is not supported out of the box.

Am I right in concluding that your examples (e.g. the zip archive) do not make use of the newest CMSIS, but could still support 512-length FFTs, since the fixed-point arm_rfft_init_q31 you use does not seem to have the radix-4 limitation? Or did you find some way of importing CMSIS 1.9.0, which seems to be the current version?

5. I've only used the system as it comes out of the box; I believe there are various issues with upgrading CMSIS
that affect other parts of the system (Paul will know the details). I used arm_rfft_q31, which is radix-2.

However, you want 512, which is supported by arm_rfft_f32 (it is implemented on top of a 256-point complex FFT):
the real FFTs implemented on top of radix-4 complex FFTs are of size (4^n)*2.
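The (4^n)*2 rule can be checked with a couple of lines of C. This is a minimal sketch; `is_power_of_4` and `real_len_fits_radix4` are hypothetical helper names, not CMSIS functions. A real FFT of length N that runs one complex FFT of length N/2 can use a radix-4-only complex FFT exactly when N/2 is a power of 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a power of 4 (1, 4, 16, 64, 256, 1024, ...). */
static bool is_power_of_4(uint32_t n)
{
    bool power_of_2 = n != 0 && (n & (n - 1)) == 0;
    /* The single set bit of a power of 4 sits at an even bit position. */
    return power_of_2 && (n & 0x55555555u) != 0;
}

/* True if a real FFT of length n, built on one complex FFT of length n/2,
   can use a radix-4-only complex FFT, i.e. n = 2 * 4^k. */
bool real_len_fits_radix4(uint32_t n)
{
    return n % 2 == 0 && is_power_of_4(n / 2);
}
```

By this rule 128, 512 and 2048 qualify (their complex halves are 64, 256 and 1024), while 256 and 1024 as real lengths do not.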

6. It seems like arm_rfft_fast_f32 is indeed supported! And it can handle the following FFT lengths: 32, 64, 128, 256, 512, 1024, 2048, 4096.
This is great since it means I can write code that can easily support a few lengths (particularly 256, 512 and 1024) that differ by a factor of 2 rather than 4.

I have not yet changed my code to use arm_rfft_fast_f32, but it should be reasonably straightforward.

Thanks again, Mark, for pointing me in the direction of the rfft functions.
