Thread: CMSIS-DSP library supports

  1. #1

    CMSIS-DSP library supports

    Would it be possible to integrate this library with the Teensy 3.0 as it is currently designed?
    Sorry if the question is stupid; I am a beginner in this world.

  2. #2
    Senior Member
    Join Date
    Nov 2012
    Posts
    1,136
    The ARM Cortex-M4 math library is installed as part of Teensyduino 1.15.
    See my message #42 in this thread: http://forum.pjrc.com/threads/14845. There I posted some of the CMSIS examples, which run on Teensy 3.

    Pete

  3. #3
    Many thanks, I missed it.

  4. #4
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,309
    If you do anything with the math library, even pretty simple stuff, I hope you'll consider posting about it.

    This library is pretty complex, so even pretty simple "how to" info might really help everyone who tries to use it.

  5. #5
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,084
    Quote Originally Posted by PaulStoffregen View Post
    If you do anything with the math library, even pretty simple stuff, I hope you'll consider posting about it.

    This library is pretty complex, so even pretty simple "how to" info might really help everyone who tries to use it.
    Here is an NXP link to some DSP benchmarks (FFT, mult, sin, cos, FIR, biquad cascade): https://community.nxp.com/thread/327833
    I ported some of the tests from the zip file and ran them on a T3.5 @ 120 MHz (Arduino IDE 1.8.5 / Teensyduino 1.42).

    Code:
    - arm_mult_f32         -  1.086 us ; // real float32        8
    - arm_mult_f32         -  4.852 us ; // real float32       64
    - arm_mult_f32         - 17.704 us ; // real float32      256
    - arm_mult_f32         - 69.027 us ; // real float32     1024
    - arm_mult_q31         -  1.671 us ; // real q31            8
    - arm_mult_q31         -  6.891 us ; // real q31           64
    - arm_mult_q31         - 24.553 us ; // real q31          256
    - arm_mult_q31         - 95.128 us ; // real q31         1024
    - arm_mult_q15         -  1.086 us ; // real q15            8
    - arm_mult_q15         -  5.303 us ; // real q15           64
    - arm_mult_q15         - 19.729 us ; // real q15          256
    - arm_mult_q15         - 77.460 us ; // real q15         1024
    - arm_sin_cos_f32      -  0.579 us ; // real float32                
    - arm_sin_cos_q31      -  0.671 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   54.8 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  257.0 us ; // real q15_t            256
    - arm_cfft_radix2_q15  - 1169.7 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   34.5 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -  166.5 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  784.7 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -  102.6 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  516.1 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 2489.1 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   73.6 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  390.8 us ; // real q31_t            256
    - arm_cfft_radix4_q31  - 1947.9 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -   71.1 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -  361.2 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 1710.7 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -   44.1 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -  220.9 us ; // real float32_t        256
    - arm_cfft_radix4_f32  - 1079.8 us ; // real float32_t       1024
    Last edited by manitou; 07-19-2018 at 09:53 AM.
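    For readers who want to collect per-call timings like the ones above, here is a minimal host-side sketch of the usual pattern: run the function many times and divide the elapsed time by the repetition count. It is not the NXP benchmark code. `mult_f32_ref` is a hypothetical plain C++ stand-in for the element-wise multiply that arm_mult_f32 performs, and `time_mult_us` is an illustrative helper name. On a Teensy you would call the CMSIS function itself and read a hardware timer instead of std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arm_mult_f32: dst[i] = a[i] * b[i].
void mult_f32_ref(const float* a, const float* b, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * b[i];
    }
}

// Average time per call, in microseconds, over `reps` calls on a block
// of `block` samples. Illustrative helper, not part of any library.
double time_mult_us(std::size_t block, int reps) {
    assert(block > 0 && reps > 0);
    std::vector<float> a(block, 1.5f), b(block, 2.0f), dst(block, 0.0f);
    volatile float sink = 0.0f;  // keeps the compiler from dropping the work

    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        mult_f32_ref(a.data(), b.data(), dst.data(), block);
        sink = dst[block - 1];
    }
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;

    return std::chrono::duration<double, std::micro>(t1 - t0).count() / reps;
}
```

    Calling `time_mult_us(1024, 1000)` gives one averaged figure for a 1024-sample block. Host numbers are of course not comparable with the Teensy figures in the table.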

  6. #6
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    407
    Hi,

    thanks for your tests!

    It seems the thread and the link are very old and relate to a very old version of CMSIS; some or most of those functions are now deprecated and have been superseded by newer ones (e.g. arm_cfft_f32).

    (Much?) faster and more accurate versions of CMSIS are available now and have been used on the Teensy. Jan has a detailed description of how to use one:

    https://forum.pjrc.com/threads/40590...l=1#post129081

    All the best,

    Frank DD4WH

  7. #7
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,084
    Quote Originally Posted by DD4WH View Post
    It seems the thread and the link are very old and relate to a very old version of CMSIS; some or most of those functions are now deprecated and have been superseded by newer ones
    The tests were run with the latest IDE, 1.8.5/1.42. The arm_math.h shipped with the current IDE appears to be V1.1.0.

  8. #8
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    407
    Yes, you are right, the current IDE contains a very old version of CMSIS (and thus of arm_math.h).

    That is why I linked to instructions for using a newer version, which provides faster (and more accurate) algorithms, e.g. for FFTs.

  9. #9
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,084
    Quote Originally Posted by DD4WH View Post
    That is why I linked to instructions for using a newer version, which provides faster (and more accurate) algorithms, e.g. for FFTs.
    I see there is now a version 5: https://github.com/ARM-software/CMSIS_5
    Do your instructions for updating the Teensy core CMSIS includes/libs apply to version 5 as well?

  10. #10
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    407
    No, they only apply to the version given.

    But there was a recent post (a few weeks ago, if I remember correctly) from somebody who successfully updated to version 5 by editing only one or two more lines. Unfortunately, I could not find it with a forum search . . .

  11. #11
    Senior Member DD4WH's Avatar
    Join Date
    Oct 2015
    Location
    Central Europe
    Posts
    407
    Found it:

    https://forum.pjrc.com/threads/44570-Request-update-CMSIS-DSP-(arm_math-h)?p=182720&viewfull=1#post182720


    I personally have successfully used v4.5, but not version 5, so experiment yourself :-)
    Last edited by DD4WH; 07-19-2018 at 03:13 PM.

  12. #12
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,084
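    Using the same benchmark (deprecated functions) after upgrading to v4.5 (https://github.com/ARM-software/CMSIS), most functions are faster, the FFTs in particular. Exceptions are arm_cfft_radix2_q31, both arm_sin_cos variants, and arm_mult_f32 at 64 points and above, which are slower.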
    Code:
              T3.5 @ 120 MHz, arm_math.h V1.4.5
              - arm_mult_f32         -  0.919 us ; // real float32        8
              - arm_mult_f32         -  5.248 us ; // real float32       64
              - arm_mult_f32         - 20.090 us ; // real float32      256
              - arm_mult_f32         - 79.456 us ; // real float32     1024
              - arm_mult_q31         -  1.187 us ; // real q31            8
              - arm_mult_q31         -  5.984 us ; // real q31           64
              - arm_mult_q31         - 22.431 us ; // real q31          256
              - arm_mult_q31         - 88.219 us ; // real q31         1024
              - arm_mult_q15         -  1.086 us ; // real q15            8
              - arm_mult_q15         -  5.179 us ; // real q15           64
              - arm_mult_q15         - 19.215 us ; // real q15          256
              - arm_mult_q15         - 75.363 us ; // real q15         1024
              - arm_sin_cos_f32      -  1.337 us ; // real float32
              - arm_sin_cos_q31      -  2.506 us ; // real q31_t
              - arm_cfft_radix2_q15  -   46.0 us ; // real q15_t             64
              - arm_cfft_radix2_q15  -  214.9 us ; // real q15_t            256
              - arm_cfft_radix2_q15  -  982.8 us ; // real q15_t           1024
              - arm_cfft_radix4_q15  -   27.0 us ; // real q15_t             64
              - arm_cfft_radix4_q15  -  136.4 us ; // real q15_t            256
              - arm_cfft_radix4_q15  -  668.6 us ; // real q15_t           1024
              - arm_cfft_radix2_q31  -  106.6 us ; // real q31_t             64
              - arm_cfft_radix2_q31  -  542.2 us ; // real q31_t            256
              - arm_cfft_radix2_q31  - 2643.6 us ; // real q31_t           1024
              - arm_cfft_radix4_q31  -   58.9 us ; // real q31_t             64
              - arm_cfft_radix4_q31  -  316.2 us ; // real q31_t            256
              - arm_cfft_radix4_q31  - 1585.1 us ; // real q31_t           1024
              - arm_cfft_radix2_f32  -   57.8 us ; // real float32_t         64
              - arm_cfft_radix2_f32  -  291.5 us ; // real float32_t        256
              - arm_cfft_radix2_f32  - 1417.6 us ; // real float32_t       1024
              - arm_cfft_radix4_f32  -   37.6 us ; // real float32_t         64
              - arm_cfft_radix4_f32  -  186.1 us ; // real float32_t        256
              - arm_cfft_radix4_f32  -  893.1 us ; // real float32_t       1024
    For V5.3, I had to #ifdef out the entire contents of hardware/teensy/avr/cores/teensy3/core_cm4_simd.h to get the build to compile.
    https://github.com/ARM-software/CMSIS_5
    Code:
      T3.5 @ 120 MHz, arm_math.h V1.5.3
    - arm_mult_f32         -  0.919 us ; // real float32        8
    - arm_mult_f32         -  5.248 us ; // real float32       64
    - arm_mult_f32         - 20.089 us ; // real float32      256
    - arm_mult_f32         - 79.454 us ; // real float32     1024
    - arm_mult_q31         -  1.338 us ; // real q31            8
    - arm_mult_q31         -  6.140 us ; // real q31           64
    - arm_mult_q31         - 22.564 us ; // real q31          256
    - arm_mult_q31         - 88.342 us ; // real q31         1024
    - arm_mult_q15         -  0.944 us ; // real q15            8
    - arm_mult_q15         -  4.922 us ; // real q15           64
    - arm_mult_q15         - 18.558 us ; // real q15          256
    - arm_mult_q15         - 73.103 us ; // real q15         1024
    - arm_sin_cos_f32      -  1.254 us ; // real float32                
    - arm_sin_cos_q31      -  2.506 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   45.3 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  213.7 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  988.5 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   27.2 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -  136.8 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  658.5 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -  117.3 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  606.3 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 2985.4 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   58.4 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  314.3 us ; // real q31_t            256
    - arm_cfft_radix4_q31  - 1577.9 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -   57.9 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -  291.6 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 1417.7 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -   39.0 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -  192.5 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -  919.5 us ; // real float32_t       1024
    Some FFTs are slower than with V4.5, e.g. arm_cfft_radix2_q31 and arm_cfft_radix4_f32.

    Comparative anatomy

    The Teensy audio library uses arm_cfft_radix4_q15(). So here are some performance comparisons.
    Code:
    q15 radix-4 1024-point FFT, MCU @ 120 MHz (all times in microseconds)

    MCU      time     REVERSEBITS   arm_math.h   notes
    T3.5     784.7    860.4         1.1.0        GCC Faster
             668.6    726.9         1.4.5
             658.5    717.0         1.5.3
    K64F     635.7    691.6         1.4.5        mbed ARM CC; new arm_cfft_q15(): 640.4 us
    MK70F    755.0                  1.1.0?       NXP DSP benchmark, IAR CC
    Last edited by manitou; 07-23-2018 at 02:13 PM.
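    To put the table above in terms of CPU cycles and relative gains, the arithmetic can be sketched as follows. The helper names `us_to_cycles` and `speedup` are made up for illustration; the only inputs are the times and the 120 MHz clock quoted in the table.

```cpp
#include <cassert>

// Convert a measured time to CPU cycles: microseconds * MHz = cycles.
double us_to_cycles(double us, double clock_mhz) {
    assert(us >= 0.0 && clock_mhz > 0.0);
    return us * clock_mhz;
}

// Speed-up of a new measurement relative to an old one (> 1 means faster).
double speedup(double old_us, double new_us) {
    assert(old_us > 0.0 && new_us > 0.0);
    return old_us / new_us;
}
```

    For the T3.5 rows, `us_to_cycles(784.7, 120.0)` is about 94,164 cycles for the 1.1.0 library, `speedup(784.7, 668.6)` is roughly 1.17 for 1.4.5, and `speedup(784.7, 658.5)` is roughly 1.19 for 1.5.3.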

  13. #13
    Good comparison, but I would instead recommend running arm_cfft_f32 for Version 5.3, as the radix2 and radix4 functions have been deprecated.
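    For anyone moving to arm_cfft_f32: it transforms an interleaved complex buffer (re0, im0, re1, im1, ...) in place. The sketch below is not CMSIS code. It is a naive reference DFT in plain C++ using the same buffer layout, which may be handy for checking FFT output on a PC. The function name `dft_interleaved` is made up for illustration, and the CMSIS call in the comment assumes the 1024-point instance from arm_const_structs.h, so check it against the CMSIS version you actually install.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) forward DFT on an interleaved complex buffer
// {re0, im0, re1, im1, ...}; the result replaces the input.
// Assumed CMSIS counterpart for N = 1024 (unverified here):
//   arm_cfft_f32(&arm_cfft_sR_f32_len1024, buf, 0 /*forward*/, 1 /*bit reversal*/);
void dft_interleaved(std::vector<float>& buf) {
    assert(buf.size() % 2 == 0);
    const std::size_t n = buf.size() / 2;
    const double pi = 3.14159265358979323846;
    std::vector<float> out(buf.size());

    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            // Twiddle factor exp(-j * 2*pi * k * t / n)
            const double ang = -2.0 * pi * double(k) * double(t) / double(n);
            const double c = std::cos(ang);
            const double s = std::sin(ang);
            re += buf[2 * t] * c - buf[2 * t + 1] * s;
            im += buf[2 * t] * s + buf[2 * t + 1] * c;
        }
        out[2 * k] = float(re);
        out[2 * k + 1] = float(im);
    }
    buf = out;
}
```

    Feeding it a constant signal should leave all the energy in bin 0, which is a quick sanity check before comparing against the library output.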

  14. #14
    Here's the Version 5.3 performance for the floating-point real and complex FFT routines on the Teensy 3.6. The forward and inverse rfft (real FFT) use arm_rfft_fast_f32. The forward and inverse cfft (complex FFT) use arm_cfft_f32.

    It is difficult to compare these results directly with manitou's figures above, because mine were obtained on a Teensy 3.6 @ 180 MHz. Taking 1417.7 us as the baseline for a 1024-point forward transform (in manitou's V5.3 table that figure is listed under arm_cfft_radix2_f32, not arm_cfft_radix4_f32, which is 919.5 us), Version 5.3 is approximately 1.6 times faster when using the latest algorithms and taking the difference in clock speeds into account.
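    The clock-speed adjustment mentioned above can be sketched like this. It assumes execution time is inversely proportional to clock frequency, which ignores any other differences between the two boards, so treat it as a rough estimate only. `rescale_us` is a made-up helper name.

```cpp
#include <cassert>

// Estimate the time a measurement taken at from_mhz would need at to_mhz,
// assuming time scales inversely with clock frequency (a simplification).
double rescale_us(double us, double from_mhz, double to_mhz) {
    assert(us >= 0.0 && from_mhz > 0.0 && to_mhz > 0.0);
    return us * from_mhz / to_mhz;
}
```

    With the 1417.7 us baseline measured at 120 MHz, `rescale_us(1417.7, 120.0, 180.0)` comes out at roughly 945 us, and it is this rescaled value that a 180 MHz measurement should be compared against.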
    Attached Files
