
Thread: T3: linking libarm_cortexM3l_math.a

  1. #51
    Senior Member
    Join Date
    Mar 2013
    Posts
    149
    Quote Originally Posted by mbustosorg
    As for the market, in the Bay Area artistic community, this kind of audio analysis / visualization is what I'm hearing a lot about. The basic functionality with Arduino is all well and good but people are clamoring for something that can bridge the gap (like adrianfreed talks about) and keep the small footprint.

    Yes, mbustosorg, watch out for CNMAT to host a BARCMUT meeting to pull this community together later in the summer.
    Last edited by adrianfreed; 06-06-2013 at 05:11 AM.

  2. #52
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    26,577
    I just posted 1.15-rc1 with the math library included.

    I tweaked the header files slightly, so they always work for Teensy 3.0 regardless of whether you define ARM_MATH_CM4 or other stuff. I tested only briefly, and only on Linux, with a few of Pete's samples.

    Please give 1.15-rc1 a try. I'm very open to ideas on how this math stuff should be included. My hope is to keep the interface stable once 1.15 is fully released, so please give it a try now while there's still time to easily make changes.

  3. #53
    Junior Member
    Join Date
    Dec 2012
    Location
    Oakland, CA
    Posts
    15
    Hi Paul,

    I tested the CMSIS examples on the new beta and they worked fine. I'm trying to add an analogRead to the example and I'm getting some linking errors that I'm not familiar with. Without the analogRead call, it links fine.

    Thanks,
    mauricio

    Code:
    #define ARM_MATH_CM4
    #include "arm_math.h"
     
    #define TEST_LENGTH_SAMPLES 2048
    #include "arm_fft_sine_data.h"
    
    // NOTE: q15_t is int16_t in arm_math.h
    uint32_t fftSize = 512; 
     
    /* ------------------------------------------------------------------ 
    * Global variables for FFT Bin Example 
    * ------------------------------------------------------------------- */ 
    uint32_t ifftFlag = 0; 
    uint32_t doBitReverse = 1; 
     
    uint32_t testInputIndex = 0;
    float32_t testInput[TEST_LENGTH_SAMPLES];
    static float32_t testOutput[TEST_LENGTH_SAMPLES/2]; 
     
    void setup() {
      Serial.begin(19200);
      pinMode(13, OUTPUT);
      for (int i=0; i < 10; i++) {
        Serial.println(" start program ");
        delay(1000);
      }
    }
     
    volatile bool pit3Triggered = false; // written in pit3_isr(), read in loop()
    
    extern "C" {
    //! Audio input interrupt handler running at 15kHz
    void pit3_isr(void)
    {
      pit3Triggered = true;
      digitalWrite(13, HIGH);
      digitalWrite(13, LOW);
      PIT_TFLG3 = 1;
    }
    
    void startup_late_hook(void) {
      // This is called from mk20dx128.c
      //Turn on interrupts:
      SIM_SCGC6 |= SIM_SCGC6_PIT;
      // turn on PIT
      PIT_MCR = 0x00;
      NVIC_ENABLE_IRQ(IRQ_PIT_CH3);
      
      PIT_LDVAL3 = 3200 - 1; // PIT channel 3 period for the 15kHz frame timer: 48MHz / 15kHz = 3200 counts
      PIT_TCTRL3 = 0x2; // enable Timer 3 interrupts
      PIT_TCTRL3 |= 0x1; // start Timer 3
      PIT_TFLG3 |= 1;
    }
    }
    
    void loop() {
     
      float32_t maxValue; 
      float32_t length = 256.0;
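      if (pit3Triggered) {
    	pit3Triggered = false; // clear the flag so each timer tick is sampled once
    	int sample;
    	sample = analogRead (14);
    	// divide in floating point: sample / 1024 in integer math is 0 for 10-bit readings
    	testInput[testInputIndex] = sample / 1024.0f * 10.0f;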
    	testInputIndex++;
    	if (testInputIndex >= TEST_LENGTH_SAMPLES) {
    	  testInputIndex = 0;
    	}
      }
        
      if (testInputIndex == 0) {
    	arm_cfft_radix4_instance_f32 fft_inst;  /* CFFT Structure instance */
    	arm_cfft_radix4_init_f32(&fft_inst, length, ifftFlag, doBitReverse);
      
    	uint32_t startTime, fftTime, magTime, maxTime;
    	Serial.println("Start"); 
    	startTime = millis();
    	/* Process the data through the CFFT/CIFFT module */ 
    	arm_cfft_radix4_f32(&fft_inst, testInput_f32_10khz);
    	fftTime = millis();
    	/* Process the data through the Complex Magnitude Module for  
    	   calculating the magnitude at each bin */ 
    	arm_cmplx_mag_f32(testInput_f32_10khz, testOutput, fftSize);  
    	magTime = millis();
    	/* Calculates maxValue and returns corresponding BIN value */ 
    	arm_max_f32(testOutput, fftSize, &maxValue, &testIndex); 
    	maxTime = millis();
    	Serial.println("End");  
    
    	Serial.println(fftTime - startTime);
    	Serial.println(magTime - fftTime);
    	Serial.println(maxTime - magTime);
    	Serial.println("TOTAL: ");
    	Serial.println(maxTime - startTime);
    
    	Serial.print("MaxValue: ");
    	Serial.println(maxValue);
    	Serial.print("MaxIndex: ");
    	Serial.println(testIndex);
    
    	Serial.print("Magnitudes: ");
    	for (int j=0; j < length / 2; j++) {
    	  Serial.print(j);
    	  Serial.print(", ");
    	  Serial.println(testOutput[j]);
    	}
      }  
    }
    Code:
    /Applications/Development/Arduino/Arduino1.0.5.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: fftTest.cpp.elf section `.bss' will not fit in region `RAM'
    /Applications/Development/Arduino/Arduino1.0.5.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: region `RAM' overflowed by 7916 bytes
    collect2: error: ld returned 1 exit status

  4. #54
    I have never used this arm_math library before, but I'm following this thread with great interest.

    Hopefully I can help out here. Mauricio, I think your code just runs out of RAM.
    You did the following:
    Code:
    #define TEST_LENGTH_SAMPLES 2048
    float32_t testInput[TEST_LENGTH_SAMPLES];
    static float32_t testOutput[TEST_LENGTH_SAMPLES/2];
    A float32_t is four bytes, so you allocate 2048 * 4 + (2048 / 2 ) * 4 = 12288 bytes here.

    I could not find the arm_fft_sine_data.h file, but I guess it looks like arm_fft_bin_data.c, which would mean another 2048 * 4 = 8192 bytes. That would put you a few thousand bytes over the 16384 bytes of RAM the Teensy 3.0 has available.
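
The arithmetic above can be checked at compile time. This is a minimal sketch, assuming the 16384-byte RAM figure quoted above and assuming the data header holds a 2048-entry float array. The constant and function names are made up for illustration, and the real headroom is smaller because the core, the stack and other buffers need RAM too.

```cpp
#include <cstddef>

typedef float float32_t;  // arm_math.h defines float32_t as float

constexpr std::size_t kTestLengthSamples = 2048;  // TEST_LENGTH_SAMPLES in the sketch
constexpr std::size_t kRamBytes = 16384;          // Teensy 3.0 RAM, as quoted above

// Hypothetical helper: bytes for an N-sample input array plus an N/2 output array.
constexpr std::size_t bufferBytes(std::size_t samples) {
  return samples * sizeof(float32_t) + (samples / 2) * sizeof(float32_t);
}

// testInput + testOutput as declared in the sketch.
constexpr std::size_t kSketchBuffers = bufferBytes(kTestLengthSamples);

// Assumption: the included data header adds another 2048 floats.
constexpr std::size_t kHeaderData = kTestLengthSamples * sizeof(float32_t);

constexpr std::size_t kTotalStatic = kSketchBuffers + kHeaderData;

// True when the static arrays alone already exceed the RAM size.
constexpr bool kOverflows = kTotalStatic > kRamBytes;
```

Under these assumptions, shrinking TEST_LENGTH_SAMPLES to 512 would bring the two sketch arrays down to bufferBytes(512) bytes, which leaves far more room for everything else.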

    By the way, your length variable is a float, which is unnecessary:
    Code:
    float32_t length = 256.0;
    arm_cfft_radix4_init_f32(&fft_inst, length, ifftFlag, doBitReverse);
    The length argument should be a uint16_t according to the header file.

    In the following code:
    Code:
    /* Calculates maxValue and returns corresponding BIN value */ 
    	arm_max_f32(testOutput, fftSize, &maxValue, &testIndex);
    You copy the fast Fourier transform result to testOutput, but fftSize is only 512, and testOutput was previously allocated with 2048 / 2 = 1024 entries, so you allocate double the size you need. (I'm not sure how the library deals with the symmetry of the FFT, though. As you only use an FFT length of 256, the part from 256:511 will most likely be a mirror of the left part, so perhaps it would make sense to copy only the initial 256 entries.)
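
On the symmetry question: for a purely real input, the magnitude spectrum of an N-point transform mirrors around N/2, so only the first N/2 bins carry new information. The sketch below illustrates this with a naive DFT in plain C++. It is not the CMSIS routine, and all function names are made up for illustration. For the library itself, the complex FFT works in place on an interleaved real/imaginary buffer of 2 * N floats, and arm_cmplx_mag_f32 then writes N magnitudes.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Test signal: a unit-amplitude sine completing `cycles` periods over n samples.
std::vector<float> makeSine(std::size_t n, unsigned cycles) {
  const double twoPi = 6.283185307179586;
  std::vector<float> x(n);
  for (std::size_t t = 0; t < n; ++t) {
    x[t] = static_cast<float>(
        std::sin(twoPi * cycles * static_cast<double>(t) / static_cast<double>(n)));
  }
  return x;
}

// Naive O(N^2) DFT of a real signal, returning one magnitude per bin.
std::vector<float> dftMagnitudes(const std::vector<float>& realInput) {
  const std::size_t n = realInput.size();
  const double twoPi = 6.283185307179586;
  std::vector<float> mag(n);
  for (std::size_t k = 0; k < n; ++k) {
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
      const double angle =
          -twoPi * static_cast<double>(k * t) / static_cast<double>(n);
      acc += static_cast<double>(realInput[t]) *
             std::complex<double>(std::cos(angle), std::sin(angle));
    }
    mag[k] = static_cast<float>(std::abs(acc));
  }
  return mag;
}

// Index of the largest magnitude, searching only the non-mirrored lower half.
std::size_t peakBin(const std::vector<float>& mag) {
  std::size_t best = 0;
  for (std::size_t k = 1; k < mag.size() / 2; ++k) {
    if (mag[k] > mag[best]) best = k;
  }
  return best;
}
```

With a 64-sample sine of 5 cycles, bin 5 and bin 59 hold the same magnitude, which is why searching (or printing) only the lower half is usually enough for real input.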

    Hopefully this helps, perhaps someone who has experience using this library can help out on how large the output vector should be.
    Last edited by iwanders; 06-07-2013 at 11:26 AM.

  5. #55
    Junior Member
    Join Date
    Dec 2012
    Location
    Oakland, CA
    Posts
    15
    Thank you for showing me the error of my ways. I was indeed careless and shouldn't post so late (only in the mornings from now on).

    Fixed and working.

    Thanks again,
    mauricio

  6. #56
    Senior Member
    Join Date
    Nov 2012
    Location
    Boston, MA, USA
    Posts
    1,114
    I came across a Freescale application note about CMSIS on ARM Cortex-M4 which seems like a useful introduction:
    http://www.freescale.com/files/micro...ote/AN4489.pdf

  7. #57
    Senior Member
    Join Date
    Apr 2014
    Location
    -
    Posts
    9,756
    Today I saw this topic, and since I'm playing with FreeIMU and Madgwick I did a little benchmark to see whether things could be sped up with the DSP.
    Maybe the vector math could be sped up drastically.

    Here's a "quick and dirty" benchmark for sqrt and 1/sqrt.
    Results first (-Os, 96 MHz, Teensy 3.1):

    Code:
    1000x dspSqrt:7580us.  Result:21065.8378906
    1000x sqrt   :11700us.  Result:21065.8378906
    
    1000x 1 / dspSqrt:9302us.  Result  :4294967295.
    1000x 1 / sqrt   :23203us.  Result  :4294967295.
    1000x invSqrt    :4576us.  Result:4294967295.
    Source:
    Code:
    #include <math.h>
    #include <arm_math.h>
    
    HardwareSerial Uart = HardwareSerial();
    
    inline float dspSqrt(float x){
    	float result;
    	arm_sqrt_f32(x, &result);
    	return result;
    }
    
    
    inline float invSqrt(float x) {
    	float halfx = 0.5f * x;
    	float y = x;
    	long i= *(long*)&y;
    	i = 0x5f375a86 - (i>>1);
    	y = *(float*)&i;
    	y = y * (1.5f - (halfx * y * y));
    	return y;	
    }
    
    inline float dspInvSqrt(float x){
    	float result;
    	arm_sqrt_f32(x, &result);
    	return 1 / result;
    }
    
    void setup() {
    	Uart.begin(115200);
    }
    
    
    void loop(){
     int time;
     volatile float f;
     
     
     time = micros();
     f=0;
     for (int i=0; i<1000; i++) {
    	f += dspSqrt(i);
     }
     time = micros() -time;
     
     Uart.print("\r\n1000x dspSqrt:");
     Uart.print(time);
     Uart.print("us.  Result:");
     Uart.println(f,7);
     
     time = micros();
     f=0;
     for (int i=0; i<1000; i++) {
    	f += sqrt(i);
     }
     time = micros() -time;
     Uart.print("1000x sqrt   :");
     Uart.print(time);
     Uart.print("us.  Result:");
     Uart.println(f,7);
     
     
     
     
     time = micros();
     f=0;
     for (int i=0; i<1000; i++) {
    	f += dspInvSqrt(i);
     }
     time = micros() -time;
     Uart.print("\r\n1000x 1 / dspSqrt:");
     Uart.print(time);
     Uart.print("us.  Result  :");
     Uart.println(f);
     
     time = micros();
     f=0;
     for (int i=0; i<1000; i++) {
    	f += 1/sqrt(i);
     }
     time = micros() -time;
     Uart.print("1000x 1 / sqrt   :");
     Uart.print(time);
     Uart.print("us.  Result  :");
     Uart.println(f);
     
    
     time = micros();
     f=0;
     for (int i=0; i<1000; i++) {
    	f += invSqrt(i);
     }
     time = micros() -time;
     Uart.print("1000x invSqrt    :");
     Uart.print(time);
     Uart.print("us.  Result:");
     Uart.println(f);
    
    
    
     
     while(1);
    }
    ...but invSqrt is faster than arm_math (?)
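
One thing worth checking in the benchmark above: every loop starts at i = 0, and 1 / sqrt(0) is +infinity in IEEE floating point, so the two sqrt-based inverse sums can never be finite. That may be why the inverse results all print the same saturated-looking value, although how the print routine formats such numbers is not confirmed here. A minimal sketch in plain C++ (the function name is made up for illustration):

```cpp
#include <cmath>

// Sums 1/sqrt(i) for i in [first, last), mirroring the benchmark loop.
float sumInvSqrt(int first, int last) {
  float f = 0.0f;
  for (int i = first; i < last; i++) {
    f += 1.0f / sqrtf(static_cast<float>(i));
  }
  return f;
}
```

Calling sumInvSqrt(0, 1000) gives an infinite sum, while sumInvSqrt(1, 1000) stays finite, so starting the timed loops at 1 would make the printed results comparable.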
    Last edited by Frank B; 04-15-2014 at 04:50 PM.

  8. #58
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    26,577
    That invSqrt() looks like an older version of FreeIMU. Please get the latest from here:

    https://github.com/PaulStoffregen/FreeIMU

    Part of the reason it's so fast is its low accuracy. It only performs one iteration of the Newton-Raphson approximation.

  9. #59
    Senior Member
    Join Date
    Apr 2014
    Location
    -
    Posts
    9,756
    Thank you, Paul! The C++ warning regarding "evil" operations is gone now, and the speed is identical.
    I updated my "benchmark".

    I think I'll use the "dspSqrt" version from above for my new project. It is not much slower but gives better results (I hope).

    But there is another warning:

    Code:
    In file included from C:\Arduino\hardware\teensy\cores\teensy3/WProgram.h:15:0,
                     from C:\Arduino\hardware\teensy\cores\teensy3/Arduino.h:1,
                     from arm_math.ino:5:
    C:\Arduino\hardware\teensy\cores\teensy3/wiring.h:42:0: warning: "PI" redefined [enabled by default]
    In file included from arm_math.ino:3:0:
    C:\Arduino\hardware\teensy\cores\teensy3/arm_math.h:303:0: note: this is the location of the previous definition

    ---

    I don't want to optimize too much, because I'm sure now that the Teensy is fast enough.
    Reading the MPU6050 and HMC5883L over I2C (400 kHz), plus the "Madgwick AHRS" 9-axis algorithm and a few other tasks, takes only 1.3 ms so far. Plenty of time left for other things.
    My goal is to build my "Balancing Bot V3" (V1 with Raspberry + Arduino Nano (Mega328) is here: http://www.youtube.com/watch?v=n-noFwc23y0 or Blog; V2 was the same but without the Raspberry),
    with more features and eventually, this time, with only one wheel.

    The Teensy 3 is great!!
    Last edited by Frank B; 04-15-2014 at 08:39 PM.

  10. #60
    Senior Member
    Join Date
    Apr 2014
    Location
    -
    Posts
    9,756
    Wow.. playing with the dsp is fun :-)

    this:
    Code:
    inline void deg2rad_vect(float32_t *fvect){
     float32_t m[3] = { M_PI / 180,  M_PI / 180, M_PI / 180 };
     arm_mult_f32( m, &fvect[0], &fvect[0], 3);
    }
    is 10 times faster than
    f[0] = f[0] * M_PI / 180;f[1] = f[1] * M_PI / 180;f[2] = f[2] * M_PI / 180;

    I personally don't need these optimizations, but it's fun to find out what the DSP can do.
    I think there are many more things to "teensy-"optimize in FreeIMU. AHRSupdate() is worth a look.
    If somebody is interested we can open a new thread.

  11. #61
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    26,577
    Quote Originally Posted by Frank B
    I think there are much more things to "teensy-"optimize in FreeIMU.
    I'm currently working on too many other things to do much with FreeIMU lately. But if you fork the github code, just send any well tested changes as pull requests and I'll merge them.

    https://github.com/PaulStoffregen/FreeIMU

  12. #62
    Senior Member
    Join Date
    Apr 2014
    Location
    -
    Posts
    9,756
    Hi, here are the first changes:

    https://github.com/FrankBoesing/free...aster...master

    20% speedup of the calculation. It is not entirely tested, but it should give the same results.

    Unfortunately I can't test it with "real" flying hardware.
    Last edited by Frank B; 04-18-2014 at 09:07 PM.

  13. #63
    Junior Member
    Join Date
    Oct 2014
    Posts
    2
    Hi,

    It seems that the CMSIS lib that comes with Teensyduino is version 1.1.0. They are now at 1.4.4, and I'm very interested in using the newer and more convenient complex FFT functions (where you don't have to init and the radix is selected automatically). I have tried to use an updated CMSIS lib, but without luck.
    What I did:
    1. Downloaded the latest CMSIS-DSP.
    2. Changed the boards.txt file so that teensy3.build.additionalobject1 links the new libarm_cortexM4l_math.a.
    3. Updated the arm_math and core header files in hardware/teensy/cores/teensy3/.
    4. Included the teensy3 "fix" in arm_math (I could not see any Teensy-related edits in the other header files).

    Something like this now compiles:

    Code:
    arm_cfft_instance_f32 fft_inst;
    arm_cfft_f32(&fft_inst, buffer_f, 0, 1);
    But the actual FFT calculation brings the MCU to a grinding halt. Nothing happens after that. Any suggestions are greatly appreciated.

    Cheers,
    Lars
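
A possible cause, offered as a guess rather than a confirmed diagnosis: the snippet passes an arm_cfft_instance_f32 that was declared but never filled in, so the table pointers inside it are garbage. In CMSIS-DSP 1.4.x the "no init" style relies on constant instances declared in arm_const_structs.h, such as arm_cfft_sR_f32_len256. A sketch of the call under that assumption (FFT_LEN, the buffer declaration and runFft are illustrative names, a 256-point transform is assumed, and this has not been run on a Teensy):

```cpp
#include <arm_math.h>
#include <arm_const_structs.h>  // predefined constant FFT instances (CMSIS-DSP 1.4.x)

#define FFT_LEN 256

// Interleaved complex data: re0, im0, re1, im1, ... (2 floats per point).
static float32_t buffer_f[2 * FFT_LEN];

void runFft() {
  // Pass the library's constant instance for this length instead of an
  // uninitialized local arm_cfft_instance_f32.
  arm_cfft_f32(&arm_cfft_sR_f32_len256, buffer_f,
               0 /* forward transform */, 1 /* bit-reversed output order fixed */);
}
```

If the length needs to change, the matching constant instance for that length has to be passed, and the buffer must hold twice that many floats.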

  14. #64
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    26,577
    I tried one of the newer versions some time ago. Unfortunately, it expanded the maximum FFT size by increasing the size of the lookup tables by 4X (even when computing a smaller FFT), so the compiled code could not fit into Teensy 3.0 or 3.1.

  15. #65
    Junior Member
    Join Date
    Oct 2014
    Posts
    2
    Thanks, Paul. But unless I did something wrong or missed something, it actually compiled and fit into the Teensy 3.0. The sketch would run as normal until it tried to use the new arm_cfft_x function. Please forgive any ignorance here; this is not within my comfort zone.
