CMSIS-DSP library support

Would it be possible to integrate this library, as it is currently designed, with the Teensy 3.0?
Sorry if the question is stupid; I am a beginner in this world.
 
If you do anything with the math library, even pretty simple stuff, I hope you'll consider posting about it.

This library is pretty complex, so even pretty simple "how to" info might really help everyone who tries to use it.
 
Here is an NXP link to some DSP benchmarks https://community.nxp.com/thread/327833 (fft, mult, sin, cos, fir, biquad cascade)
I ported some of the tests from the zip file and ran them on a T3.5 @ 120 MHz (IDE 1.8.5/1.42).

Code:
- arm_mult_f32         -  1.086 us ; // real float32        8
- arm_mult_f32         -  4.852 us ; // real float32       64
- arm_mult_f32         - 17.704 us ; // real float32      256
- arm_mult_f32         - 69.027 us ; // real float32     1024
- arm_mult_q31         -  1.671 us ; // real q31            8
- arm_mult_q31         -  6.891 us ; // real q31           64
- arm_mult_q31         - 24.553 us ; // real q31          256
- arm_mult_q31         - 95.128 us ; // real q31         1024
- arm_mult_q15         -  1.086 us ; // real q15            8
- arm_mult_q15         -  5.303 us ; // real q15           64
- arm_mult_q15         - 19.729 us ; // real q15          256
- arm_mult_q15         - 77.460 us ; // real q15         1024
- arm_sin_cos_f32      -  0.579 us ; // real float32                
- arm_sin_cos_q31      -  0.671 us ; // real q31_t                  
- arm_cfft_radix2_q15  -   54.8 us ; // real q15_t             64
- arm_cfft_radix2_q15  -  257.0 us ; // real q15_t            256
- arm_cfft_radix2_q15  - 1169.7 us ; // real q15_t           1024
- arm_cfft_radix4_q15  -   34.5 us ; // real q15_t             64
- arm_cfft_radix4_q15  -  166.5 us ; // real q15_t            256
- arm_cfft_radix4_q15  -  784.7 us ; // real q15_t           1024
- arm_cfft_radix2_q31  -  102.6 us ; // real q31_t             64
- arm_cfft_radix2_q31  -  516.1 us ; // real q31_t            256
- arm_cfft_radix2_q31  - 2489.1 us ; // real q31_t           1024
- arm_cfft_radix4_q31  -   73.6 us ; // real q31_t             64
- arm_cfft_radix4_q31  -  390.8 us ; // real q31_t            256
- arm_cfft_radix4_q31  - 1947.9 us ; // real q31_t           1024
- arm_cfft_radix2_f32  -   71.1 us ; // real float32_t         64
- arm_cfft_radix2_f32  -  361.2 us ; // real float32_t        256
- arm_cfft_radix2_f32  - 1710.7 us ; // real float32_t       1024
- arm_cfft_radix4_f32  -   44.1 us ; // real float32_t         64
- arm_cfft_radix4_f32  -  220.9 us ; // real float32_t        256
- arm_cfft_radix4_f32  - 1079.8 us ; // real float32_t       1024
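For anyone who wants to reproduce numbers like these, here is a minimal sketch of a per-call timing loop. It is not the NXP benchmark code: `mult_f32_ref` and `time_mult_us` are hypothetical names, the kernel is a plain-C stand-in rather than the CMSIS routine, and the standard `clock()` replaces whatever timer the original tests used.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Plain-C stand-in for the kernel under test (NOT the CMSIS routine):
   element-wise product, same argument order as arm_mult_f32. */
void mult_f32_ref(const float *src_a, const float *src_b, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src_a[i] * src_b[i];
    }
}

/* Average time of one kernel call in microseconds, taken over `reps` calls.
   clock() is the portable stand-in here; on a Teensy one would read
   micros() or a cycle counter instead. */
double time_mult_us(const float *src_a, const float *src_b, float *dst,
                    size_t n, unsigned reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++) {
        mult_f32_ref(src_a, src_b, dst, n);
    }
    clock_t stop = clock();
    return (double)(stop - start) * 1e6 / (double)CLOCKS_PER_SEC / (double)reps;
}
```

Averaging over many repetitions matters because a single 8-element call takes only about a microsecond, which is at the resolution limit of most timers. An optimising compiler may also shorten or drop repeated identical calls, so the result is best treated as a rough figure.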
 
Hi,

thanks for your tests!

It seems the thread and the link are very old and relate to a very old version of CMSIS; some or most of those functions are now deprecated and have been superseded by new functions (e.g. arm_cfft_f32).

There are (much?) faster and more accurate versions of CMSIS available now, and they have been used on the Teensy. Jan has a detailed description of how to use them:

https://forum.pjrc.com/threads/4059...Defined-Radio)?p=129081&viewfull=1#post129081

All the best,

Frank DD4WH
 
It seems the thread and the link are very old and relate to a very old version of CMSIS; some or most of those functions are now deprecated and have been superseded by new functions

Tests were run with the latest IDE, 1.8.5/1.42. The arm_math.h in the current IDE appears to be V1.1.0.
 
Yes, you are right, the recent IDE contains a very old version of CMSIS (and thus arm_math.h).

Therefore I provided the link describing how to use a newer version, which provides faster (and more accurate) algorithms, e.g. for FFTs.
 
No, they only apply to the version given.

But there was a recent post (a few weeks ago, if I remember correctly) by somebody who successfully updated to version 5 by editing only one or two more lines. Unfortunately, I could not find it with a forum search.
 
Using the same benchmark (deprecated functions) and upgrading to v4.5 (https://github.com/ARM-software/CMSIS), most of the FFT functions are faster:
Code:
          T3.5 @ 120 MHz, arm_math.h V1.4.5
          - arm_mult_f32         -  0.919 us ; // real float32        8
          - arm_mult_f32         -  5.248 us ; // real float32       64
          - arm_mult_f32         - 20.090 us ; // real float32      256
          - arm_mult_f32         - 79.456 us ; // real float32     1024
          - arm_mult_q31         -  1.187 us ; // real q31            8
          - arm_mult_q31         -  5.984 us ; // real q31           64
          - arm_mult_q31         - 22.431 us ; // real q31          256
          - arm_mult_q31         - 88.219 us ; // real q31         1024
          - arm_mult_q15         -  1.086 us ; // real q15            8
          - arm_mult_q15         -  5.179 us ; // real q15           64
          - arm_mult_q15         - 19.215 us ; // real q15          256
          - arm_mult_q15         - 75.363 us ; // real q15         1024
          - arm_sin_cos_f32      -  1.337 us ; // real float32
          - arm_sin_cos_q31      -  2.506 us ; // real q31_t
          - arm_cfft_radix2_q15  -   46.0 us ; // real q15_t             64
          - arm_cfft_radix2_q15  -  214.9 us ; // real q15_t            256
          - arm_cfft_radix2_q15  -  982.8 us ; // real q15_t           1024
          - arm_cfft_radix4_q15  -   27.0 us ; // real q15_t             64
          - arm_cfft_radix4_q15  -  136.4 us ; // real q15_t            256
          - arm_cfft_radix4_q15  -  668.6 us ; // real q15_t           1024
          - arm_cfft_radix2_q31  -  106.6 us ; // real q31_t             64
          - arm_cfft_radix2_q31  -  542.2 us ; // real q31_t            256
          - arm_cfft_radix2_q31  - 2643.6 us ; // real q31_t           1024
          - arm_cfft_radix4_q31  -   58.9 us ; // real q31_t             64
          - arm_cfft_radix4_q31  -  316.2 us ; // real q31_t            256
          - arm_cfft_radix4_q31  - 1585.1 us ; // real q31_t           1024
          - arm_cfft_radix2_f32  -   57.8 us ; // real float32_t         64
          - arm_cfft_radix2_f32  -  291.5 us ; // real float32_t        256
          - arm_cfft_radix2_f32  - 1417.6 us ; // real float32_t       1024
          - arm_cfft_radix4_f32  -   37.6 us ; // real float32_t         64
          - arm_cfft_radix4_f32  -  186.1 us ; // real float32_t        256
          - arm_cfft_radix4_f32  -  893.1 us ; // real float32_t       1024

For V5.3 (https://github.com/ARM-software/CMSIS_5) I had to #ifdef out all of hardware/teensy/avr/cores/teensy3/core_cm4_simd.h to get it to compile.
Code:
  T3.5 @ 120 MHz, arm_math.h V1.5.3
- arm_mult_f32         -  0.919 us ; // real float32        8
- arm_mult_f32         -  5.248 us ; // real float32       64
- arm_mult_f32         - 20.089 us ; // real float32      256
- arm_mult_f32         - 79.454 us ; // real float32     1024
- arm_mult_q31         -  1.338 us ; // real q31            8
- arm_mult_q31         -  6.140 us ; // real q31           64
- arm_mult_q31         - 22.564 us ; // real q31          256
- arm_mult_q31         - 88.342 us ; // real q31         1024
- arm_mult_q15         -  0.944 us ; // real q15            8
- arm_mult_q15         -  4.922 us ; // real q15           64
- arm_mult_q15         - 18.558 us ; // real q15          256
- arm_mult_q15         - 73.103 us ; // real q15         1024
- arm_sin_cos_f32      -  1.254 us ; // real float32                
- arm_sin_cos_q31      -  2.506 us ; // real q31_t                  
- arm_cfft_radix2_q15  -   45.3 us ; // real q15_t             64
- arm_cfft_radix2_q15  -  213.7 us ; // real q15_t            256
- arm_cfft_radix2_q15  -  988.5 us ; // real q15_t           1024
- arm_cfft_radix4_q15  -   27.2 us ; // real q15_t             64
- arm_cfft_radix4_q15  -  136.8 us ; // real q15_t            256
- arm_cfft_radix4_q15  -  658.5 us ; // real q15_t           1024
- arm_cfft_radix2_q31  -  117.3 us ; // real q31_t             64
- arm_cfft_radix2_q31  -  606.3 us ; // real q31_t            256
- arm_cfft_radix2_q31  - 2985.4 us ; // real q31_t           1024
- arm_cfft_radix4_q31  -   58.4 us ; // real q31_t             64
- arm_cfft_radix4_q31  -  314.3 us ; // real q31_t            256
- arm_cfft_radix4_q31  - 1577.9 us ; // real q31_t           1024
- arm_cfft_radix2_f32  -   57.9 us ; // real float32_t         64
- arm_cfft_radix2_f32  -  291.6 us ; // real float32_t        256
- arm_cfft_radix2_f32  - 1417.7 us ; // real float32_t       1024
- arm_cfft_radix4_f32  -   39.0 us ; // real float32_t         64
- arm_cfft_radix4_f32  -  192.5 us ; // real float32_t        256
- arm_cfft_radix4_f32  -  919.5 us ; // real float32_t       1024
Some FFTs are slower than with v4.5.
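To put a number on the differences between two listings, a small helper along these lines may be handy. It is only a sketch, and `percent_change` is a hypothetical name, not part of CMSIS.

```c
#include <assert.h>

/* Relative change from old_us to new_us in percent; positive means slower. */
double percent_change(double old_us, double new_us)
{
    assert(old_us > 0.0);
    return (new_us - old_us) / old_us * 100.0;
}
```

With the figures above, the 1024-point arm_cfft_radix2_q31 goes from 2643.6 us to 2985.4 us, which is about 12.9% slower in V5.3. The 1024-point arm_cfft_radix4_f32 goes from 893.1 us to 919.5 us (about 3.0% slower), while the 1024-point arm_cfft_radix4_q15 improves slightly from 668.6 us to 658.5 us (about 1.5% faster).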

Comparative anatomy

The Teensy audio library uses arm_cfft_radix4_q15(). So here are some performance comparisons.
Code:
q15 radix 4  1024 FFT, MCU @120MHz

MCU    microseconds REVERSEBITS arm_math.h
T3.5     784.7       860.4        1.1.0   GCC Faster
         668.6       726.9        1.4.5
         658.5       717.0        1.5.3
K64F     635.7       691.6        1.4.5   mbed ARM CC, new arm_cfft_q15()  640.4 us
MK70F    755.0                    1.1.0?  NXP DSP benchmark, IAR CC

recent DSP benchmarks or in perf.txt
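Since all the rows above are at 120 MHz, the times can also be expressed as CPU cycles, which makes them easier to compare with parts running at other clock rates. A minimal sketch follows; `us_to_cycles` and `cycles_per_point` are hypothetical helper names.

```c
#include <assert.h>

/* Microseconds at a given core clock (MHz) converted to CPU cycles. */
double us_to_cycles(double time_us, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return time_us * clock_mhz;
}

/* Average cycles spent per FFT point. */
double cycles_per_point(double time_us, double clock_mhz, unsigned fft_len)
{
    assert(fft_len > 0);
    return us_to_cycles(time_us, clock_mhz) / (double)fft_len;
}
```

For the T3.5 rows, 784.7 us at 120 MHz is about 94,164 cycles (roughly 92 cycles per point for a 1024-point FFT), and 658.5 us is about 79,020 cycles (roughly 77 cycles per point).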
 
Here's the Version 5.3 performance for the floating point real and complex FFT routines using the Teensy 3.6. The forward and inverse rfft (real FFT) use arm_rfft_fast_f32. The forward and inverse cfft (complex FFT) use arm_cfft_f32.

It is difficult to compare these results directly with Manitou's figures above because these were obtained using a Teensy 3.6 @ 180 MHz. Taking the 1417.7 us listed above (the 1024-point arm_cfft_radix2_f32 entry) as the baseline for a 1024-point forward transform, Version 5.3 is approximately 1.6 times faster when using the latest algorithms and taking the difference in clock speeds into account.
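The clock adjustment can be sketched as follows. This assumes the cycle count for a routine is the same on both boards, which ignores differences such as flash wait states, so it is only a rough normalisation. `scale_to_clock` and `clock_adjusted_speedup` are hypothetical helper names.

```c
#include <assert.h>

/* Time the same cycle count would take at a different core clock. */
double scale_to_clock(double time_us, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0 && to_mhz > 0.0);
    return time_us * from_mhz / to_mhz;
}

/* Speed-up of a new measurement over a baseline taken at another clock. */
double clock_adjusted_speedup(double base_us, double base_mhz,
                              double new_us, double new_mhz)
{
    assert(new_us > 0.0);
    return scale_to_clock(base_us, base_mhz, new_mhz) / new_us;
}
```

For example, the 1417.7 us baseline measured at 120 MHz scales to about 945 us at 180 MHz, and the measured 180 MHz time from Timing.txt would go in as `new_us`.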
 

Attachments

  • Timing.txt
    334 bytes · Views: 149