
Thread: CMSIS 5.9.0 + CMSIS-DSP 1.12

  1. #1
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409


    Alcon

    The latest version of CMSIS-DSP is no longer easy to port to Teensy without going through and building your own arm_math.a files. As of version 5.5.1, DSP is broken out from CMSIS_5 into its own repository.

    From the 5.5.1 release notes, the following folders have been removed:

    CMSIS/Lib/ (superseded by CMSIS/DSP/Lib/)
    CMSIS/DSP_Lib/ (superseded by CMSIS/DSP/)

    and the following folder is deprecated:

    CMSIS/Include/ (superseded by CMSIS/DSP/Include/ and CMSIS/Core/Include/)
    In previous versions, Arm's CMSIS provided precompiled GCC binaries for arm_math. With the latest versions you have to build your own libraries. However, I think I figured out how you can do that.

    With that said, after some poking around I found that version 5.5.0 provided a uVision project file that you can use to compile the library binaries:
    1. Download the MDK µVision® IDE from the Arm Keil website:
    https://www2.keil.com/mdk5/uvision/ (Overview)

    https://www.keil.com/download/product/ (download MDK-ARM - you will have to register)

    2. Download the Arm GNU Toolchain (in my case I picked 11.3.1, to be in line with what we are currently using):
    https://developer.arm.com/downloads/...hain-downloads

    3. Then I downloaded the latest versions of CMSIS_5 and CMSIS-DSP and put them on my D: drive (it made my life easier).

    4. I copied the GCC project file, made some major edits to cover all the new library functions, and edited the file paths for each board I wanted to compile for.
    a. In [Project -> Manage -> Project Items] I deleted all the groups and recreated them using the new groups and files from the latest CMSIS-DSP source folder.
    [Attached image: Capture.PNG]

    5. Then I selected the target for the processor I wanted to compile for, clicked it and chose Options for Target. In the CC tab I edited the file paths:

    [Attached image: Capture1.PNG]

    Then I built the target and got the compiled library binary. However, the library files are now about 9 MB versus the 2-3 MB they were before, so I'm not sure all the compile options are correct; they are the defaults that came with the project file.

    The last step was to update the arm_structs.h and arm_constants.h files in the cores.

    I will post the files in a repository by the end of the day if anybody wants to do some further testing.

    EDIT: Ref this post: https://forum.pjrc.com/threads/71074...l=1#post312904
    Last edited by mjs513; 09-20-2022 at 10:11 PM.

  2. #2
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    Using @manitou's dsp benchmark sketch:
    Code:
    Teensy Micromod
    
    - arm_mult_f32         -  0.092 us ; // real float32        8
    - arm_mult_f32         -  0.418 us ; // real float32       64
    - arm_mult_f32         -  1.539 us ; // real float32      256
    - arm_mult_f32         -  6.019 us ; // real float32     1024
    - arm_mult_q31         -  0.185 us ; // real q31            8
    - arm_mult_q31         -  1.072 us ; // real q31           64
    - arm_mult_q31         -  4.112 us ; // real q31          256
    - arm_mult_q31         - 16.273 us ; // real q31         1024
    - arm_mult_q15         -  0.150 us ; // real q15            8
    - arm_mult_q15         -  0.804 us ; // real q15           64
    - arm_mult_q15         -  3.044 us ; // real q15          256
    - arm_mult_q15         - 12.004 us ; // real q15         1024
    - arm_sin_cos_f32      -  0.170 us ; // real float32                
    - arm_sin_cos_q31      -  0.180 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -    5.8 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -   28.0 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  133.9 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -    2.8 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -   14.5 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -   71.2 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -    8.7 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -   43.9 us ; // real q31_t            256
    - arm_cfft_radix2_q31  -  213.7 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -    4.2 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -   22.6 us ; // real q31_t            256
    - arm_cfft_radix4_q31  -  114.1 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -    5.4 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -   28.2 us ; // real float32_t        256
    - arm_cfft_radix2_f32  -  140.0 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -    3.2 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -   17.0 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -   85.9 us ; // real float32_t       1024
    - arm_cfft_q15         -   71.4 us ; // real q15_t           1024
    - arm_cfft_q31         -  114.1 us ; // real q31_t           1024
    - arm_cfft_f32         -   87.0 us ; // real float32_t       1024
    Code:
    Teensy 3.5
    
    - arm_mult_f32         -  0.911 us ; // real float32        8
    - arm_mult_f32         -  5.354 us ; // real float32       64
    - arm_mult_f32         - 20.587 us ; // real float32      256
    - arm_mult_f32         - 81.519 us ; // real float32     1024
    - arm_mult_q31         -  1.078 us ; // real q31            8
    - arm_mult_q31         -  6.222 us ; // real q31           64
    - arm_mult_q31         - 23.861 us ; // real q31          256
    - arm_mult_q31         - 94.415 us ; // real q31         1024
    - arm_mult_q15         -  1.044 us ; // real q15            8
    - arm_mult_q15         -  5.010 us ; // real q15           64
    - arm_mult_q15         - 18.660 us ; // real q15          256
    - arm_mult_q15         - 73.144 us ; // real q15         1024
    - arm_sin_cos_f32      -  1.253 us ; // real float32                
    - arm_sin_cos_q31      -  1.962 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   44.7 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  207.2 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  929.9 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   25.0 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -  124.7 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  609.8 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -   88.7 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  448.6 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 2178.0 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   48.7 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  257.5 us ; // real q31_t            256
    - arm_cfft_radix4_q31  - 1284.1 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -   63.0 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -  323.0 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 1587.3 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -   36.7 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -  182.9 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -  880.8 us ; // real float32_t       1024
    - arm_cfft_q15         -  616.0 us ; // real q15_t           1024
    - arm_cfft_q31         - 1281.6 us ; // real q31_t           1024
    - arm_cfft_f32         -  990.0 us ; // real float32_t       1024
    For comparison, see this link:
    https://forum.pjrc.com/threads/24037...l=1#post183614

    Compared with those earlier results, some functions are slower and some are faster.
    Last edited by mjs513; 09-20-2022 at 10:14 PM.
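To put the two tables side by side, here is a small host-side C++ sketch that computes how many times faster the Micromod ran than the Teensy 3.5 for a few of the 1024-point rows. The times are copied from the tables above; the struct, array and function names are only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Times in microseconds, copied from the two benchmark tables (1024-point runs).
struct BenchRow {
  const char *name;
  double micromod_us;  // Teensy Micromod (Cortex-M7)
  double t35_us;       // Teensy 3.5 (Cortex-M4F)
};

const BenchRow kRows[] = {
    {"arm_mult_f32 1024", 6.019, 81.519},
    {"arm_cfft_radix2_f32 1024", 140.0, 1587.3},
    {"arm_cfft_radix4_f32 1024", 85.9, 880.8},
    {"arm_cfft_f32 1024", 87.0, 990.0},
    {"arm_cfft_q15 1024", 71.4, 616.0},
};

// How many times faster the "fast" run was than the "slow" run.
double speedup(double slow_us, double fast_us) { return slow_us / fast_us; }

// Print the Teensy 3.5 -> Micromod ratio for each row.
void print_speedups() {
  for (const BenchRow &r : kRows)
    std::printf("%-26s %6.2fx\n", r.name, speedup(r.t35_us, r.micromod_us));
}
```

Calling print_speedups() gives ratios between roughly 8.6x (arm_cfft_q15) and 13.5x (arm_mult_f32). The same helper also works within one board, e.g. speedup(140.0, 85.9) compares radix-2 against radix-4 for the 1024-point f32 FFT on the Micromod.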

  3. #3
    Senior Member
    Join Date
    Sep 2021
    Posts
    111
    Quote (originally posted by mjs513):
    Alcon

    The latest version of CMSIS DSP is no longer easy to port to Teensy without going through and generating your arm_math.a files.
    Paul has a script for that. I had it, but deleted it. Maybe you can find it on GitHub.
    I guess he must have run it for the new compiler(?) I don't think that the lib is still the same as for the old compiler. (One could compare the files, but I don't bother.)

  4. #4
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    Quote (originally posted by Mcu32):
    Paul has a script for that. I had it, but deleted it. Maybe you can find it on GitHub.
    I guess he must have run it for the new compiler(?) I don't think that the lib is still the same as for the old compiler. (One could compare the files, but I don't bother.)
    No, the library has completely changed from the version that is currently in the Teensy cores. If you compare what was in, say, version 5.3 with what is there now, you will see there are close to 50% more files than before.

    Even if I could find the script, it was probably based on the older versions and would be incomplete. I just posted everything to GitHub, including the libraries plus the updated core files. It is pretty much just a copy-and-paste operation now.

    https://github.com/mjs513/Teensy-DSP-1.12-Updates

    Basically, just copy the files in the TeensyX Files folder into the associated core directory wherever you placed your Teensyduino (TD) install. For me that would be
    ..\arduino-1.8.19-1131\hardware\teensy\avr\cores\teensy3 or teensy4 directories

    Then copy the library files in the Precompiled Binaries folder to:
    ..\arduino-1.8.19-1131\hardware\tools\arm\arm-none-eabi\lib

    Then you are good to go. No need to edit the files.

    There are benchmarks you can run, but I haven't gotten around to porting them. Right now I'm recovering from getting shots in my eye, and it's a bit annoying.
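After copying the files, one quick way to check that the new library is linked and behaving is to compare a single function against a plain C++ reference. The sketch below is a hypothetical reference for arm_mult_q15, written to match how I understand its generic C path (multiply, shift right by 15, saturate to 16 bits); the helper names are made up, and the exact behaviour should be confirmed against the CMSIS-DSP source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Saturate a 32-bit value to the signed 16-bit range, like __SSAT(x, 16).
static int16_t sat_q15(int32_t x) {
  if (x > INT16_MAX) return INT16_MAX;
  if (x < INT16_MIN) return INT16_MIN;
  return static_cast<int16_t>(x);
}

// Hypothetical reference for an element-wise Q15 multiply: the 32-bit product
// is shifted right by 15 (arithmetic shift with GCC) and then saturated.
void ref_mult_q15(const int16_t *a, const int16_t *b, int16_t *dst, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = sat_q15((static_cast<int32_t>(a[i]) * b[i]) >> 15);
}

// Count the elements that differ between two Q15 buffers.
size_t count_mismatches(const int16_t *x, const int16_t *y, size_t n) {
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i)
    if (x[i] != y[i]) ++bad;
  return bad;
}
```

On the Teensy you would run arm_mult_q15(a, b, out, n) and ref_mult_q15(a, b, ref, n) on the same inputs and print count_mismatches(out, ref, n). Including -32768 * -32768 in the inputs is useful because it exercises the saturation (the result clips to 32767).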

  5. #5
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    @PaulStoffregen

    In a very old forum post (https://forum.pjrc.com/threads/24037...ll=1#post34228) you asked:
    If you do anything with the math library, even pretty simple stuff, I hope you'll consider posting about it.

    This library is pretty complex, so even pretty simple "how to" info might really help everyone who tries to use it.
    I hope this begins to answer some of it.
    Last edited by mjs513; 09-21-2022 at 12:45 PM.

  6. #6
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    @PaulStoffregen

    I'm wondering if the following is correct for the Teensy 4.x; I'd like to reduce that 9 MB file size at some point.

    Misc controls
    Code:
    -fno-strict-aliasing -ffunction-sections -fdata-sections -mfpu=fpv5-sp-d16 -mfloat-abi=hard -ffp-contract=off
    Compiler Control String
    Code:
    -c -mcpu=cortex-m7 -mthumb -gdwarf-2 -MD -Wall -O3 -I ..\..\..\Core\Include -I D:\CMSIS-DSP-1.12.0\Source -I D:\CMSIS-DSP-1.12.0\Include -I D:\CMSIS_5-5.9.0\CMSIS\Core\Include -I D:\CMSIS-DSP-1.12.0\PrivateInclude -fno-strict-aliasing -ffunction-sections -fdata-sections -mfpu=fpv5-sp-d16 -mfloat-abi=hard -ffp-contract=off -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\lib\gcc\arm-none-eabi\11.3.1\include" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include\c++\11.3.1" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include\c++\11.3.1\arm-none-eabi" -D__UVISION_VERSION="537" -D__GCC -D__GCC_VERSION="1131" -DARMCM7_SP -DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -DARM_MATH_LOOPUNROLL -o *.o
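The control string above also defines ARM_MATH_ROUNDING. As I understand the CMSIS-DSP sources, that flag switches some float-to-fixed conversions (arm_float_to_q15, for example) from plain truncation to rounding half away from zero. Here is a hypothetical host-side model of the two paths so the difference is easy to see; it is not the library code itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a float -> Q15 conversion with and without rounding.
// rounding == true  : add +/-0.5 before the cast (round half away from zero)
// rounding == false : the cast alone truncates toward zero
// Both paths saturate to the Q15 range, like __SSAT(x, 16).
int16_t float_to_q15(float in, bool rounding) {
  float scaled = in * 32768.0f;
  if (rounding) scaled += (scaled > 0.0f) ? 0.5f : -0.5f;
  if (scaled >= 32767.0f) return 32767;    // clamp before casting
  if (scaled <= -32768.0f) return -32768;
  return static_cast<int16_t>(scaled);     // truncates toward zero
}
```

With rounding, 0.1f (3276.8 after scaling) becomes 3277; without it the cast truncates to 3276. One LSB per conversion is small, but truncation also biases every result toward zero.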

  7. #7
    Senior Member
    Join Date
    Sep 2021
    Posts
    111
    About ffp-contract: https://kristerw.github.io/2021/11/09/fp-contract/
    Well... "off" stops the compiler from fusing a multiply and an add into a single (faster) fused multiply-add instruction.
    You should decide what is more important in CMSIS math/DSP (which is mostly written for speed, if I'm correct?).
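To see what is at stake with -ffp-contract, here is a small host-runnable C++ sketch (not CMSIS code) of a*b + c computed with two roundings versus one. With -ffp-contract=off GCC keeps the multiply and the add separate; when contraction is allowed it may emit a fused multiply-add (VFMA on the Cortex-M7's FPv5 FPU), which is faster and rounds only once, so results can differ in the last bits.

```cpp
#include <cassert>
#include <cmath>

// a*b + c as two operations: the product is rounded to float first.
// 'volatile' forces that intermediate rounding so the compiler cannot fuse
// the two steps, mimicking what -ffp-contract=off guarantees.
float mul_then_add(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// a*b + c with a single rounding at the end, as a fused multiply-add does.
// std::fma has a float overload since C++11.
float fused_mul_add(float a, float b, float c) { return std::fma(a, b, c); }
```

With a = 1 + 2^-13 and b = 1 - 2^-13 the exact product is 1 - 2^-26, which rounds to 1.0f, so mul_then_add(a, b, -1.0f) returns 0 while fused_mul_add(a, b, -1.0f) returns -2^-26. Neither is "wrong"; the fused result is simply closer to the exact value, just not bit-identical to the unfused one.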

  8. #8
    Senior Member+ mjs513's Avatar
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    OK, I fixed the issue I was having with the 9 MB precompiled binary sizes and reduced them to under 4 MB. GitHub is updated with the changes.

    If anyone has any fun test sketches, please give it a try, or post your examples for the fun of it.

  9. #9
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    8,409
    Out of my tendency for torture, I ran an example I found online that uses a 697 Hz tone generated in Audacity: https://m0agx.eu/2018/05/23/practica...ing-cmsis-dsp/.

    Porting the example to the T4, I get the same bin for the peak frequency as they got: bin 44. As for the graph, since I am a glutton for punishment, I used @KrisKasprzak's ILI9341_controls library to generate the following graph:
    [Attached image: IMG-0743.jpg (graph)]

    If you are interested, here is the file:
    Code:
    // https://m0agx.eu/2018/05/23/practical-fft-on-microcontrollers-using-cmsis-dsp/
    
    #include <ILI9341_t3.h>           // fast display driver lib
    #include <ILI9341_t3_Controls.h>
    #include <font_Arial.h>           // custom fonts that ships with ILI9341_t3.h
    
    // you must create and pass fonts to the function
    #define FONT_TITLE Arial_16
    #define FONT_DATA Arial_10
    
    // For the Adafruit shield, these are the default.
    #define TFT_DC  9
    #define TFT_CS 10
    
    // The display uses hardware SPI (pins 13, 12, 11) plus the CS/DC pins above
    
    // defines for graph location and scales
    #define X_ORIGIN    50      //GraphXLoc
    #define Y_ORIGIN    200     //GraphYLoc
    #define X_WIDE       250    //GraphWidth
    #define Y_HIGH      150     //GraphHeight
    #define X_LOSCALE   0       //XAxisLow
    #define X_HISCALE   300
    #define X_INC       50       //XAxisInc
    #define Y_LOSCALE   0
    #define Y_HISCALE   10000
    #define Y_INC       2000
    
    #define TEXTCOLOR C_WHITE
    #define GRIDCOLOR C_GREY
    #define AXISCOLOR C_GREEN
    #define BACKCOLOR C_BLACK
    #define PLOTCOLOR C_DKGREY
    #define VOLTSCOLOR C_RED
    #define SINCOLOR C_YELLOW
    #define COSCOLOR C_BLUE
    
    // used to monitor elapsed time (not used in this sketch)
    unsigned long oldTime;
    
    // create a variable for each data point
    //float x, volts;
    
    // create an ID for each data to be plotted
    int fftID;
    
    // create the display object
    ILI9341_t3 Display(TFT_CS, TFT_DC);
    
    // create the cartesian coordinate graph object
    CGraph MyGraph(&Display, X_ORIGIN, Y_ORIGIN, X_WIDE, Y_HIGH, X_LOSCALE, X_HISCALE, X_INC, Y_LOSCALE, Y_HISCALE, Y_INC);
    
    //=== End Graph setup
    
    #include "arm_math.h"
    #include "arm_const_structs.h"
    #include "testData.h"
    
    #define printf Serial.printf
    
    #define FFT_SIZE 256
     
    typedef struct {
      const char *desc;
      unsigned char *data;
    } test_wave_t;
     
    static const test_wave_t WAVES[] = {
        { "697Hz", __697hz_raw },
    };
     
    void fft_test(void){
      static arm_rfft_instance_q15 fft_instance;
      static q15_t output[FFT_SIZE*2]; //has to be twice FFT size
      static q15_t pResult;
      uint32_t index;
      
      arm_status status;
     
      status = arm_rfft_init_q15(&fft_instance, FFT_SIZE /*FFT length*/, 0 /*forward FFT*/, 1 /*output bit order is normal*/);
      printf("FFT init %d\n", status);
     
      for (uint32_t i = 0; i < sizeof(WAVES)/sizeof(WAVES[0]); i++){
     
        uint32_t c_start = micros();
     
        arm_rfft_q15(&fft_instance, (q15_t*)WAVES[i].data, output);
     
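        // output[] holds interleaved real/imaginary pairs, so this takes |x| of the
        // first FFT_SIZE values (index j belongs to frequency bin j/2)
        arm_abs_q15(output, output, FFT_SIZE);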
     
        uint32_t c_stop = micros();
     
        //printf("%s: %lu us\n", WAVES[i].desc, (unsigned long)(c_stop - c_start));
     
        for (uint32_t j = 0; j < FFT_SIZE; j++){
          //printf("%d, %d\n ", j, output[j]);
            MyGraph.setX(j);
            MyGraph.plot(fftID, output[j]);
        }
        printf("\n");
    
        arm_max_q15(output, FFT_SIZE, &pResult, &index);
        //printf("Max Val: %d, Bin: %lu\n", pResult, (unsigned long)index);
        Display.setCursor(50,20);
        Display.setTextColor(C_YELLOW);
        Display.print("Peak at Bin "); Display.print(index);
      }
    }
    void setup() {
      Serial.begin(9600);
      while(!Serial){};
    
      // fire up the display
      Display.begin();
      Display.setRotation(1);
      Display.fillScreen(C_BLACK);
    
      // initialize the graph object
      MyGraph.init("", "bin", "mag", TEXTCOLOR, GRIDCOLOR, AXISCOLOR, BACKCOLOR, PLOTCOLOR, FONT_TITLE, FONT_DATA);
      fftID = MyGraph.add("mag", SINCOLOR);
      MyGraph.drawGraph();  //draws empty graph
      
      fft_test();
      
    }
    
    void loop() {
      // put your main code here, to run repeatedly:
    
    }
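One thing to keep in mind with the sketch: arm_rfft_q15 writes interleaved real/imaginary pairs, so running arm_abs_q15 over the first FFT_SIZE values plots absolute real and imaginary parts side by side rather than true bin magnitudes. Below is a host-side float reference for the usual magnitude-then-peak steps (on the board, arm_cmplx_mag_q15 and arm_max_q15 cover these), plus the bin-to-frequency conversion. The helper names are hypothetical, and since the thread does not state the sample rate of the 697 Hz test file, it is left as a parameter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Magnitude of each complex bin from an interleaved [re0, im0, re1, im1, ...]
// buffer: mag[k] = sqrt(re^2 + im^2).
void complex_magnitude(const float *interleaved, float *mag, size_t bins) {
  for (size_t k = 0; k < bins; ++k) {
    float re = interleaved[2 * k];
    float im = interleaved[2 * k + 1];
    mag[k] = std::sqrt(re * re + im * im);
  }
}

// Index of the largest magnitude (the first one wins on ties).
size_t peak_bin(const float *mag, size_t bins) {
  size_t best = 0;
  for (size_t k = 1; k < bins; ++k)
    if (mag[k] > mag[best]) best = k;
  return best;
}

// Centre frequency of bin k for an fft_len-point FFT sampled at sample_rate_hz.
double bin_to_hz(size_t k, double sample_rate_hz, size_t fft_len) {
  return k * sample_rate_hz / fft_len;
}
```

For example, if the recording were sampled at 8 kHz (an assumption, not stated in the thread), a 256-point FFT has bins 31.25 Hz apart; 697 Hz then falls nearest bin 22 (centred at 687.5 Hz), whose real part would sit at index 44 of the interleaved buffer.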

  10. #10
    Senior Member
    Join Date
    May 2022
    Posts
    163
    Very nice. Now plot that in dB, using log10: dBout = 20 * log10(mag). Then you will be able to detect any low-level spurs in the output, since the log scale lets you see big and small components at the same time.
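A minimal host-side sketch of the dB conversion suggested above, 20 * log10(mag), with a floor so empty bins don't hit log10(0). On the Teensy the q15 magnitudes would first be converted to float (CMSIS-DSP has arm_q15_to_float for that); the function name, the reference level and the -120 dB floor are my own choices, not from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Convert linear magnitudes to decibels: db = 20 * log10(mag / ref).
// Zero (or tiny) magnitudes are clamped to floor_db instead of -infinity.
void magnitude_to_db(const float *mag, float *db, size_t n, float ref = 1.0f,
                     float floor_db = -120.0f) {
  for (size_t i = 0; i < n; ++i) {
    float ratio = mag[i] / ref;
    db[i] = (ratio > 0.0f) ? 20.0f * std::log10(ratio) : floor_db;
    if (db[i] < floor_db) db[i] = floor_db;
  }
}
```

Passing ref = 32768 gives values relative to q15 full scale, but because arm_rfft_q15 scales its output internally, treat the numbers as relative dB rather than calibrated dBFS.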
