Thread: CMSIS 5.9.0 + CMSIS-DSP 1.12

  1. #1 mjs513 (Senior Member+; Join Date: Jul 2014; Location: New York; Posts: 8,465)

    Alcon

    The latest version of CMSIS-DSP is no longer easy to port to Teensy without going through and generating your own arm_math library (.a) files. DSP is now broken out from CMSIS5 into its own repository as of version 5.5.1.

    The following folders have been removed:
    CMSIS/Lib/ (superseded by CMSIS/DSP/Lib/)
    CMSIS/DSP_Lib/ (superseded by CMSIS/DSP/)
    The following folders are deprecated (the same entry is listed again for 5.5.1):
    CMSIS/Include/ (superseded by CMSIS/DSP/Include/ and CMSIS/Core/Include/)
    In previous versions, Arm CMSIS provided pre-compiled GCC binaries for arm_math. With the latest versions you have to build your own libraries. However, I think I've broken the code on how you can do that.

    That said, after some poking around I found that version 5.5.0 provided a µVision project file that you can use to compile the library binaries:
    1. Download the MDK µVision® IDE from the Arm Keil website:
    https://www2.keil.com/mdk5/uvision/ (Overview)

    https://www.keil.com/download/product/ (download MDK-ARM - you will have to register)

    2. Download the Arm GNU Toolchain (in my case I picked 11.3.1, to be in line with what we are currently using):
    https://developer.arm.com/downloads/...hain-downloads

    3. Then I downloaded the latest versions of CMSIS5 and CMSIS-DSP and put them on my D: drive (it made my life easier).

    4. I copied the GCC project file, made some major edits to cover all the new library functions, and edited the file paths for each board I wanted to compile for.
    a. In [Project -> Manage -> Project Items] I deleted all the groups and recreated them using the new groups and files from the latest CMSIS-DSP source folder.
    [Image: Capture.PNG]

    5. Then I selected the processor I wanted to compile for, left-clicked, and selected "Options for target". In the CC tab I edited the file paths:

    [Image: Capture1.PNG]

    Then I built the target and got the compiled binary for the library. However, the library files are now 9 MB versus the 2-3 MB they were before, so I'm not sure all the compile options are correct; they are the defaults that come with the project file.

    The last step was to update the arm_structs.h and arm_constants.h files in the cores.

    I will post the files to a repository by the end of the day if anybody wants to do some further testing.

    EDIT: Ref this post: https://forum.pjrc.com/threads/71074...l=1#post312904
    Last edited by mjs513; 09-20-2022 at 11:11 PM.

  2. #2 mjs513
    Using @manitou's DSP benchmark sketch:
    Code:
    Teensy MicroMod
    
    - arm_mult_f32         -  0.092 us ; // real float32        8
    - arm_mult_f32         -  0.418 us ; // real float32       64
    - arm_mult_f32         -  1.539 us ; // real float32      256
    - arm_mult_f32         -  6.019 us ; // real float32     1024
    - arm_mult_q31         -  0.185 us ; // real q31            8
    - arm_mult_q31         -  1.072 us ; // real q31           64
    - arm_mult_q31         -  4.112 us ; // real q31          256
    - arm_mult_q31         - 16.273 us ; // real q31         1024
    - arm_mult_q15         -  0.150 us ; // real q15            8
    - arm_mult_q15         -  0.804 us ; // real q15           64
    - arm_mult_q15         -  3.044 us ; // real q15          256
    - arm_mult_q15         - 12.004 us ; // real q15         1024
    - arm_sin_cos_f32      -  0.170 us ; // real float32                
    - arm_sin_cos_q31      -  0.180 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -    5.8 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -   28.0 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  133.9 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -    2.8 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -   14.5 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -   71.2 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -    8.7 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -   43.9 us ; // real q31_t            256
    - arm_cfft_radix2_q31  -  213.7 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -    4.2 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -   22.6 us ; // real q31_t            256
    - arm_cfft_radix4_q31  -  114.1 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -    5.4 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -   28.2 us ; // real float32_t        256
    - arm_cfft_radix2_f32  -  140.0 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -    3.2 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -   17.0 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -   85.9 us ; // real float32_t       1024
    - arm_cfft_q15         -   71.4 us ; // real q15_t           1024
    - arm_cfft_q31         -  114.1 us ; // real q31_t           1024
    - arm_cfft_f32         -   87.0 us ; // real float32_t       1024
    Code:
    Teensy 3.5
    
    - arm_mult_f32         -  0.911 us ; // real float32        8
    - arm_mult_f32         -  5.354 us ; // real float32       64
    - arm_mult_f32         - 20.587 us ; // real float32      256
    - arm_mult_f32         - 81.519 us ; // real float32     1024
    - arm_mult_q31         -  1.078 us ; // real q31            8
    - arm_mult_q31         -  6.222 us ; // real q31           64
    - arm_mult_q31         - 23.861 us ; // real q31          256
    - arm_mult_q31         - 94.415 us ; // real q31         1024
    - arm_mult_q15         -  1.044 us ; // real q15            8
    - arm_mult_q15         -  5.010 us ; // real q15           64
    - arm_mult_q15         - 18.660 us ; // real q15          256
    - arm_mult_q15         - 73.144 us ; // real q15         1024
    - arm_sin_cos_f32      -  1.253 us ; // real float32                
    - arm_sin_cos_q31      -  1.962 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   44.7 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  207.2 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  929.9 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   25.0 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -  124.7 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  609.8 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -   88.7 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  448.6 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 2178.0 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   48.7 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  257.5 us ; // real q31_t            256
    - arm_cfft_radix4_q31  - 1284.1 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -   63.0 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -  323.0 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 1587.3 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -   36.7 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -  182.9 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -  880.8 us ; // real float32_t       1024
    - arm_cfft_q15         -  616.0 us ; // real q15_t           1024
    - arm_cfft_q31         - 1281.6 us ; // real q31_t           1024
    - arm_cfft_f32         -  990.0 us ; // real float32_t       1024
    For comparison, see this link:
    https://forum.pjrc.com/threads/24037...l=1#post183614

    Compared with those results, some functions are slower and some are faster.
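Besides timing, it can help to check that the rebuilt library still computes the same thing. A minimal sketch of scalar reference versions of the two multiply benchmarks follows, based on the documented behavior of arm_mult_f32 (pDst[n] = pSrcA[n] * pSrcB[n]) and arm_mult_q15 (the Q15 product shifted right by 15 and saturated to 16 bits). The function names are hypothetical and the code does not call CMSIS-DSP; it is only something to diff the library output against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for arm_mult_f32: element-wise product.
void ref_mult_f32(const float *a, const float *b, float *dst, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * b[i];
    }
}

// Saturate a 32-bit intermediate to the Q15 range [-32768, 32767].
static int16_t sat_q15(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(x);
}

// Hypothetical scalar reference for arm_mult_q15.
// Q15 * Q15 gives a Q30 product; shifting right by 15 brings it back to Q15,
// and saturation handles the one overflow case (-1.0 * -1.0).
void ref_mult_q15(const int16_t *a, const int16_t *b, int16_t *dst, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        int32_t prod = static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
        dst[i] = sat_q15(prod >> 15);
    }
}
```

Running the reference and the CMSIS call on the same buffers and comparing element by element shows whether the new binaries behave like the old ones, independent of the speed numbers.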
    Last edited by mjs513; 09-20-2022 at 11:14 PM.

  3. #3 Senior Member (Join Date: Sep 2021; Posts: 161)
    Quote (originally posted by mjs513):
    Alcon

    The latest version of CMSIS DSP is no longer easy to port to Teensy with out going through and generating your arm_math.a files.
    Paul has a script for that. I had it, but deleted it. Maybe you can find it on GitHub.
    I guess he must have run it for the new compiler(?) I don't think that the lib is still the same as for the old compiler. (One could compare the files, but I won't bother.)

  4. #4 mjs513
    Quote (originally posted by Mcu32):
    Paul has a script for that. I had it, but deleted it. Maybe you can find it on GitHub.
    I guess he must have run it for the new compiler(?) I don't think that the lib is still the same as for the old compiler. (One could compare the files, but I won't bother.)
    No, the library has completely changed from the version that is currently in the Teensy cores. If you compare, say, version 5.3 with now, you will see that there are close to 50% more files than there were before.

    Even if I could find the script, it was probably based on the older versions and would be incomplete. I just posted everything to GitHub, including the libraries plus the updated core files. It's pretty much just a copy-and-paste operation now.

    https://github.com/mjs513/Teensy-DSP-1.12-Updates

    Basically, just copy the files in the TeensyX Files folders to the associated cores wherever you placed your Teensyduino (TD) install. For me that would be the
    ..\arduino-1.8.19-1131\hardware\teensy\avr\cores\teensy3 or teensy4 directories

    Then copy the lib files in the Precompiled Binaries folder to:
    ..\arduino-1.8.19-1131\hardware\tools\arm\arm-none-eabi\lib

    Then you are good to go. No need to edit the files.

    There are benchmarks you can run, but I haven't gotten around to porting them. Right now I'm recovering from getting shots in my eye and it's a bit annoying.

  5. #5 mjs513
    @PaulStoffregen

    In a very old forum post (https://forum.pjrc.com/threads/24037...ll=1#post34228) you asked:
    If you do anything with the math library, even pretty simple stuff, I hope you'll consider posting about it.

    This library is pretty complex, so even pretty simple "how to" info might really help everyone who tries to use it.
    Hope this begins to answer some of it.
    Last edited by mjs513; 09-21-2022 at 01:45 PM.

  6. #6 mjs513
    @PaulStoffregen

    Wondering if the following is correct for the Teensy 4s. I would like to reduce that 9 MB file size at some point.

    Misc controls
    Code:
    -fno-strict-aliasing -ffunction-sections -fdata-sections -mfpu=fpv5-sp-d16 -mfloat-abi=hard -ffp-contract=off
    Compiler Control String
    Code:
    -c -mcpu=cortex-m7 -mthumb -gdwarf-2 -MD -Wall -O3 -I ..\..\..\Core\Include -I D:\CMSIS-DSP-1.12.0\Source -I D:\CMSIS-DSP-1.12.0\Include -I D:\CMSIS_5-5.9.0\CMSIS\Core\Include -I D:\CMSIS-DSP-1.12.0\PrivateInclude -fno-strict-aliasing -ffunction-sections -fdata-sections -mfpu=fpv5-sp-d16 -mfloat-abi=hard -ffp-contract=off -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\lib\gcc\arm-none-eabi\11.3.1\include" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include\c++\11.3.1" -I"D:\Program Files (x86)\Arm GNU Toolchain arm-none-eabi\11.3 rel1\arm-none-eabi\include\c++\11.3.1\arm-none-eabi" -D__UVISION_VERSION="537" -D__GCC -D__GCC_VERSION="1131" -DARMCM7_SP -DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -DARM_MATH_LOOPUNROLL -o *.o

  7. #7 Senior Member
    About ffp-contract: https://kristerw.github.io/2021/11/09/fp-contract/
    Well... "off" stops the compiler from contracting a multiply and an add into one fused multiply-add (FMA) instruction.
    You should decide what matters more in CMSIS math/DSP (which, if I'm correct, is mostly written for speed).
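A small illustration of what contraction changes numerically: when it is allowed, GCC may turn a*b + c into a single fused multiply-add (the Cortex-M4F and M7 FPUs have one), which rounds once instead of twice. The plain C++ below is not CMSIS code. It forces the two-rounding path by storing the product in a volatile, which keeps the compiler from fusing it, and compares that with std::fma. The input values are only an illustrative choice picked so the two results differ.

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to float before the add.
// The volatile store stops the compiler from contracting a*b + c into an FMA,
// which is what -ffp-contract=off guarantees for ordinary expressions.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused: a*b + c with a single rounding at the end (what contraction may emit).
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With x = 1 + 2^-13 and c = -(1 + 2^-12), the unfused version returns exactly 0 (the 2^-26 term of x*x is rounded away), while the fused version returns 2^-26. Fused results are usually slightly more accurate; the practical cost of allowing contraction is that results are no longer bit-identical with builds that do not fuse.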

  8. #8 mjs513
    OK, fixed the issue I was having with the 9 MB precompiled binary sizes. Reduced them to <4 MB. Updated GitHub with the changes.

    If anyone has any fun test sketches, please give it a try, or post your examples for the fun of it.

  9. #9 mjs513
    Out of this tendency I have for torture, I ran an example I found online using a 697 Hz tone generated with Audacity: https://m0agx.eu/2018/05/23/practica...ing-cmsis-dsp/.

    Porting the example to the T4, I get the same bin for the max frequency as they got: bin 44. As for the graph, since I am a glutton for punishment, I used @KrisKasprzak's ILI9341_controls library to generate the following graph:
    [Image: IMG-0743.jpg]

    If you are interested, here is the file:
    Code:
    // https://m0agx.eu/2018/05/23/practical-fft-on-microcontrollers-using-cmsis-dsp/
    
    #include <ILI9341_t3.h>           // fast display driver lib
    #include <ILI9341_t3_Controls.h>
    #include <font_Arial.h>           // custom fonts that ships with ILI9341_t3.h
    
    // you must create and pass fonts to the function
    #define FONT_TITLE Arial_16
    #define FONT_DATA Arial_10
    
    // For the Adafruit shield, these are the default.
    #define TFT_DC  9
    #define TFT_CS 10
    
    
    // defines for graph location and scales
    #define X_ORIGIN    50      //GraphXLoc
    #define Y_ORIGIN    200     //GraphYLoc
    #define X_WIDE       250    //GraphWidth
    #define Y_HIGH      150     //GraphHeight
    #define X_LOSCALE   0       //XAxisLow
    #define X_HISCALE   300
    #define X_INC       50       //XAxisInc
    #define Y_LOSCALE   0
    #define Y_HISCALE   10000
    #define Y_INC       2000
    
    #define TEXTCOLOR C_WHITE
    #define GRIDCOLOR C_GREY
    #define AXISCOLOR C_GREEN
    #define BACKCOLOR C_BLACK
    #define PLOTCOLOR C_DKGREY
    #define VOLTSCOLOR C_RED
    #define SINCOLOR C_YELLOW
    #define COSCOLOR C_BLUE
    
    // used to monitor elapsed time
    unsigned long oldTime;
    
    // create a variable for each data point
    //float x, volts;
    
    // create an ID for each data to be plotted
    int fftID;
    
    // create the display object (hardware SPI, with the CS/DC pins above)
    ILI9341_t3 Display(TFT_CS, TFT_DC);
    
    // create the cartesian coordinate graph object
    CGraph MyGraph(&Display, X_ORIGIN, Y_ORIGIN, X_WIDE, Y_HIGH, X_LOSCALE, X_HISCALE, X_INC, Y_LOSCALE, Y_HISCALE, Y_INC);
    
    //=== End Graph setup
    
    #include "arm_math.h"
    #include "arm_const_structs.h"
    #include "testData.h"
    
    #define printf Serial.printf
    
    #define FFT_SIZE 256
     
    typedef struct {
      const char *desc;
      unsigned char *data;
    } test_wave_t;
     
    static const test_wave_t WAVES[] = {
        { "697Hz", __697hz_raw },
    };
     
    void fft_test(void){
      static arm_rfft_instance_q15 fft_instance;
      static q15_t output[FFT_SIZE*2]; // complex output (interleaved real/imag), so twice the FFT size
      static q15_t pResult;
      uint32_t index;
      
      arm_status status;
     
      status = arm_rfft_init_q15(&fft_instance, FFT_SIZE /*FFT length*/, 0 /*forward FFT*/, 1 /*output bit order is normal*/);
      printf("FFT init %d\n", status);
     
      for (uint32_t i = 0; i < sizeof(WAVES)/sizeof(WAVES[0]); i++){
     
        uint32_t c_start = micros();
     
        arm_rfft_q15(&fft_instance, (q15_t*)WAVES[i].data, output);
     
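        // output[] is interleaved (re0, im0, re1, im1, ...), so this takes |re| and |im|
        // separately, and the index from arm_max_q15 below is an array index,
        // not a complex bin number
        arm_abs_q15(output, output, FFT_SIZE);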
     
        uint32_t c_stop = micros();
     
        printf("%s %lu us\n", WAVES[i].desc, (unsigned long)(c_stop - c_start));
     
        for (uint32_t j = 0; j < FFT_SIZE; j++){
          //printf("%d, %d\n ", j, output[j]);
            MyGraph.setX(j);
            MyGraph.plot(fftID, output[j]);
        }
        printf("\n");
    
        arm_max_q15(output, FFT_SIZE, &pResult, &index);
        //printf("Max Val: %d, Bin: %d\n", pResult, index);
        Display.setCursor(50,20);
        Display.setTextColor(C_YELLOW);
        Display.print("Peak at Bin "); Display.print(index);
      }
    }
    void setup() {
      Serial.begin(9600);
      while(!Serial){};
    
      // fire up the display
      Display.begin();
      Display.setRotation(1);
      Display.fillScreen(C_BLACK);
    
      // initialize the graph object
      MyGraph.init("", "bin", "mag", TEXTCOLOR, GRIDCOLOR, AXISCOLOR, BACKCOLOR, PLOTCOLOR, FONT_TITLE, FONT_DATA);
      fftID = MyGraph.add("mag", SINCOLOR);
      MyGraph.drawGraph();  //draws empty graph
      
      fft_test();
      
    }
    
    void loop() {
      // put your main code here, to run repeatedly:
    
    }

  10. #10 Senior Member (Join Date: May 2022; Posts: 189)
    Very nice. Now plot that in dB, using log10(): dBout = 20 * log10(mag). Then you will be able to detect any low-level spurs in the output, since the log scale lets you see big and small components at the same time.
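A minimal sketch of that conversion in plain C++, using std::log10 rather than a CMSIS function. The floor value is an arbitrary assumption that keeps empty bins from producing -infinity on the plot. For the q15 output in the sketch above, the values would first need converting to float (CMSIS-DSP's arm_q15_to_float does that).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Convert linear magnitudes to dB: 20 * log10(mag).
// floor_mag clamps zero/tiny bins so log10 never sees 0; the default of 1e-6
// (i.e. -120 dB) is just a plotting choice, not something CMSIS-DSP defines.
void mag_to_db(const float *mag, float *db, std::size_t n, float floor_mag = 1e-6f) {
    assert(floor_mag > 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        db[i] = 20.0f * std::log10(std::max(mag[i], floor_mag));
    }
}
```

A factor of 10 in magnitude becomes a 20 dB step, which is why a spur 1000 times smaller than the peak still shows up clearly, 60 dB down, on a dB plot.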

  11. #11 mjs513
    I've been playing a bit more with the CMSIS update using a Teensy MicroMod. This time I used a test case from a MATLAB example: https://github.com/Zafar577/MATLAB-DSP/blob/main/TTT.m. It's basically a sine wave plus noise.

    I also ported the window functions from the GNU Radio library, which supports:
    Code:
            WIN_NONE = -1,       //!< don't use a window
            WIN_HAMMING = 0,     //!< Hamming window; max attenuation 53 dB
            WIN_HANN = 1,        //!< Hann window; max attenuation 44 dB
            WIN_HANNING = 1,     //!< alias to WIN_HANN
            WIN_BLACKMAN = 2,    //!< Blackman window; max attenuation 74 dB
            WIN_RECTANGULAR = 3, //!< Basic rectangular window; max attenuation 21 dB
            WIN_KAISER = 4, //!< Kaiser window; max attenuation see window::max_attenuation
            WIN_BLACKMAN_hARRIS = 5, //!< Blackman-harris window; max attenuation 92 dB
            WIN_BLACKMAN_HARRIS =
                5,            //!< alias to WIN_BLACKMAN_hARRIS for capitalization consistency
            WIN_BARTLETT = 6, //!< Bartlett (triangular) window; max attenuation 26 dB
            WIN_FLATTOP = 7,  //!< flat top window; useful in FFTs; max attenuation 93 dB
            WIN_NUTTALL = 8,  //!< Nuttall window; max attenuation 114 dB
            WIN_BLACKMAN_NUTTALL = 8, //!< Nuttall window; max attenuation 114 dB
            WIN_NUTTALL_CFD =
                9, //!< Nuttall continuous-first-derivative window; max attenuation 112 dB
            WIN_WELCH = 10,  //!< Welch window; max attenuation 31 dB
            WIN_PARZEN = 11, //!< Parzen window; max attenuation 56 dB
            WIN_EXPONENTIAL =
                12, //!< Exponential window; max attenuation see window::max_attenuation
            WIN_RIEMANN = 13, //!< Riemann window; max attenuation 39 dB
            WIN_GAUSSIAN =
                14,         //!< Gaussian window; max attenuation see window::max_attenuation
            WIN_TUKEY = 15, //!< Tukey window; max attenuation see window::max_attenuation
    and added them to the test sketch.
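For reference, the Hamming window from that list can be generated with the standard symmetric formula w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). This is a plain C++ sketch, not the ported GNU Radio code from the test sketch; as far as I know GNU Radio's WIN_HAMMING uses the same 0.54/0.46 coefficients, but treat that as an assumption. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hamming window of length n: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)).
// Endpoints are 0.08; for odd n the centre sample is exactly 1.0.
std::vector<float> hamming(std::size_t n) {
    assert(n >= 2);
    const double pi = std::acos(-1.0);
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1)));
    }
    return w;
}

// Apply the window in place: an element-wise multiply of signal and window.
void apply_window(float *x, const std::vector<float> &w) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        x[i] *= w[i];
    }
}
```

On the Teensy, the element-wise multiply in apply_window is the same operation arm_mult_f32 performs, so the windowing step could also be done with that call once the coefficients are precomputed.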

    Just for reference, MATLAB shows the maximum at a frequency of 500 Hz.
    [Image: Capture1.PNG]

    Using a 32-byte Hamming window, I get the same result.

    The first chart shows the raw signal (yellow) and the windowed signal (red):
    [Image: IMG-0750.jpg]

    And the spectrum:
    [Image: IMG-0751.jpg]

    for which I get:
    Code:
    Index: 128, freq: 500.000000,  MaxValue: 247.432098
    If you're interested, here is the sketch:
    sig-noise.zip
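For reading an "Index/freq" line like the one above: the centre frequency of FFT bin k is k * fs / N, where fs is the sample rate and N the FFT length. The sketch itself is only attached, so its sample rate is not shown here; purely for illustration, the example assumes a 2048-point FFT (the size in the timing printouts later in the thread) and an 8 kHz sample rate, which is one combination that maps index 128 to 500 Hz. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Centre frequency (Hz) of FFT bin `bin` for an FFT of length `fft_len`
// fed with samples at `sample_rate_hz`. Bin spacing is sample_rate_hz / fft_len.
double bin_to_hz(uint32_t bin, double sample_rate_hz, uint32_t fft_len) {
    assert(fft_len > 0);
    return bin * sample_rate_hz / fft_len;
}

// Inverse: nearest bin index for a given frequency.
uint32_t hz_to_bin(double freq_hz, double sample_rate_hz, uint32_t fft_len) {
    assert(sample_rate_hz > 0.0);
    return static_cast<uint32_t>(freq_hz * fft_len / sample_rate_hz + 0.5);
}
```

Under those assumed values the bin spacing is 8000 / 2048 = 3.90625 Hz, so a tone that falls between bin centres shows up in the nearest bin, and some of its energy leaks into neighbouring bins (the leakage a window like Hamming is meant to reduce).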

  12. #12 KurtE (Senior Member+; Join Date: Jan 2014; Posts: 11,128)
    Quote (originally posted by mjs513):
    Out of this tendency I have for torture I ran an example I found on line using a 697hz tone that was generated from audacity: https://m0agx.eu/2018/05/23/practica...ing-cmsis-dsp/.

    Porting the example to for the T4 - get the same bin for the max frequency as they got, bin 44. As for the graph - since I am a glutton for punishment I used @KrisKasprzak ILI9341_controls library to generate the following graph:
    [Image: IMG-0743.jpg]
    Looks good!

    I am sure glad that I don't have that same tendency.

  13. #13 mjs513
    As a quick last test of the updated DSP on the T4, I ran the Waterfall waveform audio example using this as the input sound: https://www.youtube.com/watch?v=PAsMlDptjx8. It still worked with the updated DSP. OK, done playing.

  14. #14 mjs513
    Since I am the curious type, I wanted to see whether the DSP Benchmark sketch or the sketch in post #11 would run on the T3.x. They should run unmodified. What I found was that the T3.x core looks like it only uses a subset of the FFT functions in the DSP library. I was getting this error:
    Code:
    fatal error: arm_const_structs.h: No such file or directory
        4 | #include "arm_const_structs.h"
    Sure enough, that header is only included in the Teensy4 core and not in the Teensy3 core. @PaulStoffregen, I am assuming that was intentional?

    Anyway, after updating the Teensy3 core the same way I did the Teensy4 core, here are the comparisons for the sketch in post #11:
    Code:
    Teensy 3.2 (96 MHz)
    ---------------------------------------------------
    Press anykey to continue
    time to copy test array to FFT Array: 132
    64
    Time to setup window: 1782 (microseconds)
    Press anykey to continue
    FFT in Frequency Domain next
    FFT init 0
    Time to perform 2048 FFT: 52388 (microseconds)
    Index: 128, freq: 500.000000,  MaxValue: 247.432098
    =============================================
    Teensy 3.6 (180 MHz)
    ---------------------------------------------------
    Press anykey to continue
    time to copy test array to FFT Array: 70
    64
    Time to setup window: 185 (microseconds)
    Press anykey to continue
    FFT in Frequency Domain next
    FFT init 0
    Time to perform 2048 FFT: 1103 (microseconds)
    Index: 128, freq: 500.000000,  MaxValue: 247.432114
    
    ===========================================
    Teensy MicroMod (600 MHz)
    ---------------------------------------------------
    Press anykey to continue
    time to copy test array to FFT Array: 9
    64
    Time to setup window: 36 (microseconds)
    Press anykey to continue
    FFT in Frequency Domain next
    FFT init 0
    Time to perform 2048 FFT: 203 (microseconds)
    Index: 128, freq: 500.000000,  MaxValue: 247.432098
    Unfortunately, if you try to use even the FFT audio example, there is not enough space for it to compile, even when the core is not updated.

    Out of curiosity, I changed the clock to 150 MHz on the Teensy MicroMod:
    Code:
    Press anykey to continue
    time to copy test array to FFT Array: 35
    64
    Time to setup window: 147 (microseconds)
    Press anykey to continue
    FFT in Frequency Domain next
    FFT init 0
    Time to perform 2048 FFT: 803 (microseconds)
    Index: 128, freq: 500.000000,  MaxValue: 247.432098
    It still seems to be faster than the T3.6 at 180 MHz.
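The 150 MHz comparison can be made a bit more explicit by converting each "Time to perform 2048 FFT" figure into approximate CPU cycles (microseconds times MHz). This is only a rough per-clock measure, since it ignores flash wait states, caches, and memory speed, and the helper name is made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Approximate CPU cycles spent: elapsed microseconds * clock in MHz.
// Rough per-clock comparison only; memory wait states and caches are ignored.
uint64_t approx_cycles(uint64_t elapsed_us, uint64_t clock_mhz) {
    assert(clock_mhz > 0);
    return elapsed_us * clock_mhz;
}
```

With the numbers posted above, the T3.6 at 180 MHz spends about 1103 x 180 = 198,540 cycles on the FFT, versus 803 x 150 = 120,450 for the MicroMod at 150 MHz and 203 x 600 = 121,800 at 600 MHz. So per clock the Cortex-M7 needs roughly 40% fewer cycles here, and its cycle count barely changes between 150 and 600 MHz.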

  15. #15 mjs513
    Comparisons for the DSP_Benchmark sketch:
    Code:
    Teensy 3.2 (96 MHz)
    ---------------------------------------------------
    - arm_mult_f32         -  4.763 us ; // real float32        8
    - arm_mult_f32         - 31.081 us ; // real float32       64
    - arm_mult_f32         - 121.313 us ; // real float32      256
    - arm_mult_f32         - 482.219 us ; // real float32     1024
    - arm_mult_q31         -  1.327 us ; // real q31            8
    - arm_mult_q31         -  7.760 us ; // real q31           64
    - arm_mult_q31         - 29.816 us ; // real q31          256
    - arm_mult_q31         - 118.040 us ; // real q31         1024
    - arm_mult_q15         -  1.181 us ; // real q15            8
    - arm_mult_q15         -  6.152 us ; // real q15           64
    - arm_mult_q15         - 23.196 us ; // real q15          256
    - arm_mult_q15         - 91.366 us ; // real q15         1024
    - arm_sin_cos_f32      - 28.143 us ; // real float32                
    - arm_sin_cos_q31      -  2.965 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   55.0 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  255.8 us ; // real q15_t            256
    - arm_cfft_radix2_q15  - 1161.7 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   30.9 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -  152.2 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  732.1 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -  113.3 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  570.7 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 2776.5 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   63.6 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  326.3 us ; // real q31_t            256
    - arm_cfft_radix4_q31  - 1592.7 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -  930.8 us ; // real float32_t         64
    - arm_cfft_radix2_f32  - 5170.9 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 27314.8 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -  654.5 us ; // real float32_t         64
    - arm_cfft_radix4_f32  - 3772.4 us ; // real float32_t        256
    - arm_cfft_radix4_f32  - 21554.9 us ; // real float32_t       1024
    - arm_cfft_q15         -  732.1 us ; // real q15_t           1024
    - arm_cfft_q31         - 1582.2 us ; // real q31_t           1024
    - arm_cfft_f32         - 20333.8 us ; // real float32_t       1024
    - arm_rfft_fast_f32    -  527.7 us ; // real float32_t         64
    - arm_rfft_fast_f32    - 2702.8 us ; // real float32_t        256
    - arm_rfft_fast_f32    - 13909.8 us ; // real float32_t       1024
    
    
    =============================================
    Teensy 3.6 (180 MHz)
    ---------------------------------------------------
    - arm_mult_f32         -  0.606 us ; // real float32        8
    - arm_mult_f32         -  3.564 us ; // real float32       64
    - arm_mult_f32         - 13.705 us ; // real float32      256
    - arm_mult_f32         - 54.269 us ; // real float32     1024
    - arm_mult_q31         -  0.717 us ; // real q31            8
    - arm_mult_q31         -  4.142 us ; // real q31           64
    - arm_mult_q31         - 15.884 us ; // real q31          256
    - arm_mult_q31         - 62.853 us ; // real q31         1024
    - arm_mult_q15         -  0.628 us ; // real q15            8
    - arm_mult_q15         -  3.275 us ; // real q15           64
    - arm_mult_q15         - 12.348 us ; // real q15          256
    - arm_mult_q15         - 48.641 us ; // real q15         1024
    - arm_sin_cos_f32      -  0.640 us ; // real float32                
    - arm_sin_cos_q31      -  0.941 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -   26.8 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -  127.5 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  607.5 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -   15.3 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -   77.0 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -  376.6 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -   59.3 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -  303.0 us ; // real q31_t            256
    - arm_cfft_radix2_q31  - 1484.3 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -   31.0 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -  165.0 us ; // real q31_t            256
    - arm_cfft_radix4_q31  -  822.9 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -   42.7 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -  220.3 us ; // real float32_t        256
    - arm_cfft_radix2_f32  - 1082.1 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -   22.8 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -  116.0 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -  564.6 us ; // real float32_t       1024
    - arm_cfft_q15         -  360.9 us ; // real q15_t           1024
    - arm_cfft_q31         -  764.0 us ; // real q31_t           1024
    - arm_cfft_f32         -  538.0 us ; // real float32_t       1024
    - arm_rfft_fast_f32    -   22.1 us ; // real float32_t         64
    - arm_rfft_fast_f32    -   98.5 us ; // real float32_t        256
    - arm_rfft_fast_f32    -  388.0 us ; // real float32_t       1024
    
    ===========================================
    Teensy MicroMod (600 MHz)
    ---------------------------------------------------
    - arm_mult_f32         -  0.093 us ; // real float32        8
    - arm_mult_f32         -  0.421 us ; // real float32       64
    - arm_mult_f32         -  1.541 us ; // real float32      256
    - arm_mult_f32         -  6.022 us ; // real float32     1024
    - arm_mult_q31         -  0.182 us ; // real q31            8
    - arm_mult_q31         -  1.069 us ; // real q31           64
    - arm_mult_q31         -  4.109 us ; // real q31          256
    - arm_mult_q31         - 16.269 us ; // real q31         1024
    - arm_mult_q15         -  0.153 us ; // real q15            8
    - arm_mult_q15         -  0.805 us ; // real q15           64
    - arm_mult_q15         -  3.045 us ; // real q15          256
    - arm_mult_q15         - 12.006 us ; // real q15         1024
    - arm_sin_cos_f32      -  0.170 us ; // real float32                
    - arm_sin_cos_q31      -  0.180 us ; // real q31_t                  
    - arm_cfft_radix2_q15  -    5.8 us ; // real q15_t             64
    - arm_cfft_radix2_q15  -   28.0 us ; // real q15_t            256
    - arm_cfft_radix2_q15  -  133.9 us ; // real q15_t           1024
    - arm_cfft_radix4_q15  -    2.8 us ; // real q15_t             64
    - arm_cfft_radix4_q15  -   14.5 us ; // real q15_t            256
    - arm_cfft_radix4_q15  -   71.2 us ; // real q15_t           1024
    - arm_cfft_radix2_q31  -    8.7 us ; // real q31_t             64
    - arm_cfft_radix2_q31  -   44.0 us ; // real q31_t            256
    - arm_cfft_radix2_q31  -  214.5 us ; // real q31_t           1024
    - arm_cfft_radix4_q31  -    4.2 us ; // real q31_t             64
    - arm_cfft_radix4_q31  -   22.7 us ; // real q31_t            256
    - arm_cfft_radix4_q31  -  114.6 us ; // real q31_t           1024
    - arm_cfft_radix2_f32  -    5.4 us ; // real float32_t         64
    - arm_cfft_radix2_f32  -   27.5 us ; // real float32_t        256
    - arm_cfft_radix2_f32  -  135.1 us ; // real float32_t       1024
    - arm_cfft_radix4_f32  -    3.2 us ; // real float32_t         64
    - arm_cfft_radix4_f32  -   17.1 us ; // real float32_t        256
    - arm_cfft_radix4_f32  -   86.3 us ; // real float32_t       1024
    - arm_cfft_q15         -   71.0 us ; // real q15_t           1024
    - arm_cfft_q31         -  114.6 us ; // real q31_t           1024
    - arm_cfft_f32         -   87.0 us ; // real float32_t       1024
    - arm_rfft_fast_f32    -    3.4 us ; // real float32_t         64
    - arm_rfft_fast_f32    -   15.0 us ; // real float32_t        256
    - arm_rfft_fast_f32    -   60.7 us ; // real float32_t       1024

  16. #16 manitou (Senior Member+; Join Date: Jan 2013; Posts: 2,729)
    Thanks for all the heavy lifting to get DSP libs updated and built!!

    I tracked DSP arm_math.h versions from the comments in arm_math.h. In your GitHub repo the version is "Revision: V.1.5.1"; in the CMSIS-DSP/Include/arm_math.h repository it is "@version V1.10.0"; and the forum post suggests DSP-1.12.

    I realize it is cosmetic, but what do you think the proper version number is for your arm_math.h?

  17. #17 mjs513
    Quote (originally posted by manitou):
    Thanks for all the heavy lifting to get DSP libs updated and built!!

    I tracked DSP arm_math.h versions from the comments in arm_math.h. In your github, version is "Revision: V.1.5.1", in the CMSIS-DSP/Include/arm_math.h repository it is "@version V1.10.0" and forum post suggests DSP-1.12

    I realize it is cosmetic, but what do you think the proper version number is for your arm_math.h?
    Well, that's kind of an interesting question. arm_math.h is really different from before; in 1.10 it's:
    Code:
    #include "arm_math_types.h"
    #include "arm_math_memory.h"
    
    #include "dsp/none.h"
    #include "dsp/utils.h"
    
    #include "dsp/basic_math_functions.h"  
    #include "dsp/interpolation_functions.h"
    #include "dsp/bayes_functions.h"
    #include "dsp/matrix_functions.h"
    #include "dsp/complex_math_functions.h"
    #include "dsp/statistics_functions.h"
    #include "dsp/controller_functions.h"
    #include "dsp/support_functions.h"
    #include "dsp/distance_functions.h"
    #include "dsp/svm_functions.h"
    #include "dsp/fast_math_functions.h"
    #include "dsp/transform_functions.h"
    #include "dsp/filtering_functions.h"
    #include "dsp/quaternion_math_functions.h"
    Most of what is in the current arm_math.h is now in arm_math_types.h, so I kind of just left it at the current rev.

    I probably should spend a bit more time on it, but no one else seemed to be playing along.

    The other thing is that for NEON processors it will include Bayes functions, and for Helium processors it will include vector math and MVE functions, which I have currently included.

    One question for you, though: it currently supports float64_t, but that is not defined for Teensy.

    If you have any suggestions or recommendations, I can make the changes. It's relatively easy to recompile now that I broke the code.

    Mike
