
Thread: Help converting arm assembler for teensy 3.6

  1. #1

    I have been using this sine approximation by Jasper Vijn as a function, which works well. http://www.coranac.com/2009/07/sines/

    Code:
    //fast sine approximation by Jasper Vijn
    /// A sine approximation via a third-order approx.
    /// @param x    Angle (with 2^15 units/circle)
    /// @return     Sine value (Q12)
    static inline int32_t SIN3(int32_t x) __attribute__((always_inline, unused));
    static inline int32_t SIN3(int32_t x)
    {
        // S(x) = x * ( (3<<p) - (x*x>>r) ) >> s
        // n : Q-pos for quarter circle             13
        // A : Q-pos for output                     12
        // p : Q-pos for parentheses intermediate   15
        // r = 2n-p                                 11
        // s = n+p+1-A                              17
    
        static const int qN = 13, qA= 12, qP= 15, qR= 2*qN-qP, qS= qN+qP+1-qA;
    
        x= x<<(30-qN);          // shift to full s32 range (Q13->Q30)
    
        if( (x^(x<<1)) < 0)     // test for quadrant 1 or 2
            x= (1<<31) - x;
    
        x= x>>(30-qN);
    
        return x * ( (3<<qP) - (x*x>>qR) ) >> qS;
    }
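
The shift amounts in the comments follow from scaling the cubic implied by the `S(x)` comment. As a sketch of the reasoning (the normalized angle $z$ is a symbol introduced here for illustration; it does not appear in the original code):

```latex
\begin{align*}
z &= \frac{x}{2^{n}} \qquad \text{(a quarter circle maps to } z = 1\text{)} \\
S_3(z) &= \tfrac{1}{2}\, z \left(3 - z^{2}\right) \\
2^{A}\, S_3(z)
  &= 2^{A} \cdot \frac{x}{2^{n}} \cdot \frac{1}{2}\left(3 - \frac{x^{2}}{2^{2n}}\right) \\
  &= x \left(3 \cdot 2^{p} - \frac{x^{2}}{2^{\,2n-p}}\right) \cdot 2^{\,A-n-p-1}
\end{align*}
```

Reading off the exponents gives r = 2n - p = 11 and a final right shift of s = n + p + 1 - A = 17 for n = 13, p = 15, A = 12, which matches `qS= qN+qP+1-qA` in the code.
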
    I would like to use the assembler version; it is supposed to be even faster than a lookup table with linear interpolation.
    I have no idea how to translate it into code that can be used in the Arduino IDE. I do use a couple of other assembler functions that I adapted from the PJRC audio library, but they look nothing like this!

    Code:
    @ ARM assembly version, using n=13, p=15, A=12
    
    @ A sine approximation via a third-order approx.
    @ @param r0   Angle (with 2^15 units/circle)
    @ @return     Sine value (Q12)
        .arm
        .align
        .global isin_S3a
    isin_S3a:
        mov     r0, r0, lsl #(30-13)
        teq     r0, r0, lsl #1
        rsbmi   r0, r0, #1<<31
        mov     r0, r0, asr #(30-13)
        mul     r1, r0, r0
        mov     r1, r1, asr #11
        rsb     r1, r1, #3<<15
        mul     r0, r1, r0
        mov     r0, r0, asr #17
        bx      lr
    Cheers to anyone who can help me learn how to translate it!

  2. #2
    Senior Member | Join Date: Jan 2013 | Posts: 830
    The C version is fine. GCC generates almost identical code compared to the assembly version (uses 'eors' instead of 'teq').

    The C version can be inlined, so it's potentially faster than the assembly version.
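
To make the correspondence with the assembly listing in post #1 concrete, here is a sketch that lines each C statement up with one instruction from that listing. The name `isin_s3_c` is made up for illustration. The unsigned intermediate is an addition, used so the left shifts are well defined in standard C; the conversion back to signed and the right shifts of negative values assume GCC's behaviour (two's-complement wrap, arithmetic shift). Note also that the listing is marked `.arm`, while the Cortex-M4 on the Teensy 3.6 executes only Thumb-2 code, so the listing would need adapting before it could run on that board.

```c
#include <stdint.h>

/* Hypothetical name; mirrors the isin_S3a listing, one instruction per statement.
 * Input:  angle with 2^15 units per circle.
 * Output: sine value in Q12 (4096 represents 1.0). */
static inline int32_t isin_s3_c(int32_t angle)
{
    uint32_t u = (uint32_t)angle << (30 - 13);   /* mov   r0, r0, lsl #(30-13) */
    if ((u ^ (u << 1)) & 0x80000000u)            /* teq   r0, r0, lsl #1       */
        u = 0x80000000u - u;                     /* rsbmi r0, r0, #1<<31       */

    /* Assumption: two's-complement conversion and arithmetic right shift (GCC). */
    int32_t x = (int32_t)u >> (30 - 13);         /* mov   r0, r0, asr #(30-13) */
    int32_t t = x * x;                           /* mul   r1, r0, r0           */
    t >>= 11;                                    /* mov   r1, r1, asr #11      */
    t = (3 << 15) - t;                           /* rsb   r1, r1, #3<<15       */
    x = t * x;                                   /* mul   r0, r1, r0           */
    return x >> 17;                              /* mov   r0, r0, asr #17      */
}
```

With this sketch, inputs of 0, 8192 (a quarter circle) and 24576 (three quarters) return 0, 4096 and -4096. An input of 4096 (45 degrees) returns 2816, against an ideal value of about 2896, which shows the size of the third-order approximation error.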
    Last edited by tni; 05-19-2017 at 06:25 PM.

  3. #3
    Originally posted by tni:
    The C version is fine. GCC generates almost identical code compared to the assembly version (uses 'eors' instead of 'teq').

    The C version can be inlined, so it's potentially faster than the assembly version.
    Thanks, I will leave it as is for now then.
