Forum Rule: Always post complete source code & details to reproduce any issue!
Page 1 of 2 1 2 LastLast
Results 1 to 25 of 26

Thread: A little competition :-)

  1. #1
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920

    A little competition :-)

    A little competition:

    You can win nothing but glory, honor, and your mention in the source code.

    Code:
    /* From coder.h, ORIGINAL:
     do y <<= n, clipping to range [-2^30, 2^30 - 1] (i.e. output has one guard bit) 
    */
    #define CLIP_2N_SHIFT(y, n) {                   \
            int sign = (y) >> 31;                   \
            if (sign != (y) >> (30 - (n)))  {       \
                (y) = sign ^ (0x3fffffff);          \
            } else {                                \
                (y) = (y) << (n);                   \
            }                                       \
        }
    Wanted is a terribly fast(er) implementation of this macro:

    The rules:
    a) Understand the macro.
    b) Assembler is expressly permitted, but not a must. If c-code is used: Use GCC 4.8.4
    c) Prove that your solution is faster. And correct.
    d) There may be no quicker solution. That is bad luck.
    e) It is fun to you.

    Explanation: This macro is used in the Helix codecs (compressed audio). Not very often. But I do not like it. :-)

    Edit : it's for Teensy 3. y is signed int (32 Bit), n too, but only positive values for n. We use GCC 4.8.4.
    n is NOT compile-time constant.
    Last edited by Frank B; 12-04-2014 at 09:29 PM.

  2. #2
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,492
    Is "n" a variable or compile-time constant?

  3. #3
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    n is a variable and not constant.
    hey it should'nt be too easy :-)

  4. #4
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    3,831
    Be sure to label your code as for ARM only (int is 16-its on AVR platforms), or perhaps change the int's to int32_t or long. Obviously if people write in ASM, it will tie you to the ARM 32-bit architecture.

    Also, on some platforms, signed shift right might not sign extend, but those systems may be fairly rare these days. I wrote a C front end for a machine (long since RIP) that did not do arithmetic shifts, and the ISO C standard explicitly says that it is implementation defined. Most implementations support arithmetic shifts, but you are not guaranteed about this.

  5. #5
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Ok, it's for Teensy 3. y is signed int (32 Bit), n too, but only positive values for n. We use GCC 4.8.4.
    Last edited by Frank B; 12-04-2014 at 09:28 PM.

  6. #6
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,492
    Well, I just tried ARM's SSAT instruction, and I can confirm it does not work.

    ARM's documentation isn't clear (at least to me) about how it saturates. But a little testing shows the shift is done first without saturation, and then the result is saturated.

  7. #7
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Yes, i tried SSAT too, with no luck. OK: i'm willing to accept SSAT - if, at the end of the day - it sounds (the sound-output .. uh..my english is not good enough) )equal or better. Maybe this is the case ?
    There is an other Macro "CLIPTOSHORT", and i used SSAT there. It did not make any difference.

    Edit: i'm NOT a thumb-assembler-expert.
    Last edited by Frank B; 12-04-2014 at 09:42 PM.

  8. #8
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,492
    What values are actually used for "n" ?

    If it's only a few cases, perhaps each can be optimized?

  9. #9
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    It's midnight here.. i have to go to work tomorrow...so please understand, that i can't answer more questions now :-)

  10. #10
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Quote Originally Posted by PaulStoffregen View Post
    What values are actually used for "n" ?

    If it's only a few cases, perhaps each can be optimized?
    Whoops , to be honest : I do not know.

  11. #11
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    Not, that I can contribute to this optimization, but I find interesting, what the complier does
    Code:
    #define CLIP_2N_SHIFT(y, n) {   \
        int sign = (y)>>31;         \
        if(sign != (y)>>(30-(n))){  \
            (y)=sign ^ 0x3fffffff;  \
        } else {                    \
            (y)=(y)<<(n);           \
        }                           \
    }
    
    int F1(int y, int n)
    { 	CLIP_2N_SHIFT(y,n);
    	return y;
    }
    becomes (in principle)
    Code:
      15              		.text
      16              		.align	2
      17              		.global	F1
      18              		.thumb
      19              		.thumb_func
      20              		.type	F1, %function
      21              	F1:
      22              		@ args = 0, pretend = 0, frame = 0
      23              		@ frame_needed = 0, uses_anonymous_args = 0
      24              		@ link register save eliminated.
      25 0000 C1F11E02 		rsb	r2, r1, #30
      26 0004 40FA02F2 		asr	r2, r0, r2
      27 0008 C317     		asrs	r3, r0, #31
      28 000a 9A42     		cmp	r2, r3
      29 000c 1ABF     		itte	ne
      30 000e 83F04040 		eorne	r0, r3, #-1073741824
      31 0012 C043     		mvnne	r0, r0
      32 0014 8840     		lsleq	r0, r0, r1
      33 0016 7047     		bx	lr
      34              		.size	F1, .-F1
      35              		.ident	"GCC: (GNU Tools for ARM Embedded Processors) 4.8.4 20140725 (release) [ARM/embedded-4_8-br
    It is always instructive to see what assembler will do with your code.

  12. #12
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    It seems that the following macro is about 6% faster
    Code:
    #define CLIP_2N_SHIFT2(y, n) {            \
    	(y) = (y)<<(n);	                  \
    	if((y)>  1<<30 -1) (y) =  1<<30-1;\
    	if((y)< -1<<30)    (y) = -1<<30;  \
    }
    Also, the numbers of instructions are fewer.

  13. #13
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Quote Originally Posted by WMXZ View Post
    It seems that the following macro is about 6% faster
    Code:
    #define CLIP_2N_SHIFT2(y, n) {            \
    	(y) = (y)<<(n);	                  \
    	if((y)>  1<<30 -1) (y) =  1<<30-1;\
    	if((y)< -1<<30)    (y) = -1<<30;  \
    }
    Also, the numbers of instructions are fewer.
    Further proof that simple code also is better: -)
    Code:
    	lsls	r0, r0, r1
    	cmp	r0, #536870912
    	bgt	.L7
    	cmp	r0, #-1073741824
    	it	lt
    	movlt	r0, #-1073741824
    	bx	lr
    .L7:
    	mov	r0, #536870912
    	bx	lr
    Btw, GCC 4.9 produces the same output.

  14. #14
    Junior Member
    Join Date
    Apr 2014
    Location
    Seattle, WA
    Posts
    12
    You forgot that the original code correctly saturates when the left-shift overflows. It's the same problem that Paul describes above with __SSAT(y<<n, 31). Bits are lost during the shift before saturation takes place.

    Here is an alternate method:

    Code:
    int CLIP_2N_SHIFT(int x, int n)
    { 
    	int y = __SSAT(x << n, 31);
    	if (x != y >> n)
    		y = 0x3fffffff ^ (x >> 31);
    	return y;
    }
    that can be implemented in two less instructions than the original, if the constant is hoisted outside the loop:

    Code:
    // outside the loop
            MVN      r12,#0xc0000000
    
    // r2 = CLIP_2N_SHIFT(r0, r1)
            LSL      r2,r0,r1
            SSAT     r2,#31,r2
            ASR      r1,r2,r1
            CMP      r1,r0
            IT       NE
            EORNE    r2,r12,r0,ASR #31

  15. #15
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Thanks sublinear,
    i don't think that the constant can moved outside.
    But one cycle less is one cycle less ( Again , it 's not worth it to really optimize it, but it's a brain teaser :- )

    There is a similar macro, which is used more often, without shift:
    Code:
    //clip to [-2^n, 2^n-1], valid range of n = [1, 30]
    #define CLIP_2N(y, n) { \
    	int sign = (y) >> 31;  \
    	if (sign != (y) >> (n))  { \
    		(y) = sign ^ ((1 << (n)) - 1); \
    	} \
    }

  16. #16
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    Quote Originally Posted by Frank B View Post
    But one cycle less is one cycle less ( Again , it 's not worth it to really optimize it, but it's a brain teaser :- )
    Nevertheless, it was a nice exercise and I refreshed my (limited) Assembler skills.
    Also concerning optimization, here a bit, there a bit, could result in something useful.

    BTW: Anyone knows how to program in C, or tell the compiler the following:
    Code:
            IT       NE
            EORNE    r2,r12,r0,ASR #31

  17. #17
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Quote Originally Posted by WMXZ View Post
    Nevertheless, it was a nice exercise and I refreshed my (limited) Assembler skills.
    Also concerning optimization, here a bit, there a bit, could result in something useful.
    I agree.

    My Assembler skills are 20..25 years old (z80, 6502)..since then..nothing. The only ARM on the market was the ACORN., i think....

    Code:
            IT       NE
            EORNE    r2,r12,r0,ASR #31
    AFAIK: IT is "IF-THEN", make the next instruction(s) conditional.It is not a "real" instruction and does not need space.
    EOR NE is then the conditional instruction, only executed when NE flag is set ( i can be wrong )

    So.. i think there is no direct translation to c.
    EOR is ^ in C

    In pseudocode: if last operation was (edit: NOT) "eqal" then r2 = r12 ^ (r0 << 31) (i hope this is correct ?!)
    Last edited by Frank B; 12-05-2014 at 08:05 PM.

  18. #18
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    @Frank
    Is the following not what you wanted (here only for bit 9 and with positive and negative numbers), or do I miss something?

    Code:
    int nn;
    void setup()
    { 
    }
    
    void loop()
    {
      int ii,jj,nn;
      char text[80];
      int yy1,yy2;
    
      while(1)
      {  delay(1000);
        nn=2;
        for(ii=0; ii<100;ii+=5)
        {
          asm volatile("ssat %0, %1, %2" : "=r" (yy1) : "I" (9), "r" (-ii<<nn));
          asm volatile("ssat %0, %1, %2" : "=r" (yy2) : "I" (9), "r" (ii<<nn));
          sprintf(text,"%d %d %x %d %x",ii,yy1,yy1,yy2,yy2);
          Serial.println(text);
        }
      }
    }
    which results to
    Code:
    0 0 0 0 0
    5 -20 ffffffec 20 14
    10 -40 ffffffd8 40 28
    15 -60 ffffffc4 60 3c
    20 -80 ffffffb0 80 50
    25 -100 ffffff9c 100 64
    30 -120 ffffff88 120 78
    35 -140 ffffff74 140 8c
    40 -160 ffffff60 160 a0
    45 -180 ffffff4c 180 b4
    50 -200 ffffff38 200 c8
    55 -220 ffffff24 220 dc
    60 -240 ffffff10 240 f0
    65 -256 ffffff00 255 ff
    70 -256 ffffff00 255 ff
    75 -256 ffffff00 255 ff
    80 -256 ffffff00 255 ff
    85 -256 ffffff00 255 ff
    90 -256 ffffff00 255 ff
    95 -256 ffffff00 255 ff
    It seems to me, that there is no need to check on negative numbers, or am I wrong?

    The assembler has only two lines (apart from moving data to and from registers)
    Code:
      60 0024 04FA02F2 		lsl	r2, r4, r2
      61              	@ 110 "src/main.c" 1
      62 0028 02F30802 		ssat r2, #9, r2
      63              	@ 0 "" 2
      64              		.thumb

  19. #19
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Quote Originally Posted by WMXZ View Post
    Code:
      62 0028 02F30802 		ssat r2, #9, r2
    almost..but now replace #9 with 2n

  20. #20
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    Quote Originally Posted by Frank B View Post
    almost..but now replace #9 with 2n
    Is my #9 (limiting to +255/-256) not equivalent to your (2^30-1/ -2^30) of post #1?

    The left shift is, because not constant, anyway outside ssat.

  21. #21
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Code:
    int nn;
    void setup()
    { 
      delay(1000);
    }
    
    /* From coder.h, ORIGINAL:
     do y <<= n, clipping to range [-2^30, 2^30 - 1] (i.e. output has one guard bit) 
    */
    #define CLIP_2N_SHIFT(y, n) {                   \
            int sign = (y) >> 31;                   \
            if (sign != (y) >> (30 - (n)))  {       \
                (y) = sign ^ (0x3fffffff);          \
            } else {                                \
                (y) = (y) << (n);                   \
            }                                       \
        }
    
    void loop()
    {
      int ii,nn;
      int yy1,yy2;
    
      while(1)
      {  
        Serial.println();
        delay(1000);
        nn=0;
        for(ii=((1<<30)-5); ii<((1<<30)+5);ii++)
        {
          asm volatile("ssat %0, %1, %2" : "=r" (yy1) : "I" (30), "r" (-ii<<nn));
          asm volatile("ssat %0, %1, %2" : "=r" (yy2) : "I" (30), "r" (ii<<nn));
          int z1 = ii;
          CLIP_2N_SHIFT(z1,nn);
          int z2 = -ii;
          CLIP_2N_SHIFT(z2,nn);
          Serial.printf("%x: %d %x %d %x   !=    %x %x\r\n",ii,yy1,yy1,yy2,yy2, z1, z2);
     
        }
      }
    }
    hmm...

  22. #22
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    Quote Originally Posted by Frank B View Post
    [code]

    hmm...
    but replace by 30 by 31 (one bit above the maximum number, in my previous case 9 for -256 to 255 (256 =2^8, right?)
    as in
    Code:
    int nn;
    void setup()
    { 
      delay(1000);
    }
    
    /* From coder.h, ORIGINAL:
     do y <<= n, clipping to range [-2^30, 2^30 - 1] (i.e. output has one guard bit) 
    */
    #define CLIP_2N_SHIFT(y, n) {                   \
            int sign = (y) >> 31;                   \
            if (sign != (y) >> (30 - (n)))  {       \
                (y) = sign ^ (0x3fffffff);          \
            } else {                                \
                (y) = (y) << (n);                   \
            }                                       \
        }
    
    void loop()
    {
      int ii,nn;
      int yy1,yy2;
    
      while(1)
      {  
        Serial.println();
        delay(1000);
        nn=0;
        for(ii=((1<<30)-5); ii<((1<<30)+5);ii++)
        {
          asm volatile("ssat %0, %1, %2" : "=r" (yy1) : "I" (31), "r" (-ii<<nn));
          asm volatile("ssat %0, %1, %2" : "=r" (yy2) : "I" (31), "r" (ii<<nn));
          int z1 = -ii;
          CLIP_2N_SHIFT(z1,nn);
          int z2 = ii;
          CLIP_2N_SHIFT(z2,nn);
          Serial.printf("%x: %x %x   ==    %x %x\r\n",ii,yy1,yy2, z1, z2);
     
        }
      }
    }
    na also

  23. #23
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    6,920
    Quote Originally Posted by WMXZ View Post
    but replace by 30 by 31 (one bit above the maximum number, in my previous case 9 for -256 to 255 (256 =2^8, right?)
    as in
    Code:
    int nn;
    void setup()
    { 
      delay(1000);
    }
    
    /* From coder.h, ORIGINAL:
     do y <<= n, clipping to range [-2^30, 2^30 - 1] (i.e. output has one guard bit) 
    */
    #define CLIP_2N_SHIFT(y, n) {                   \
            int sign = (y) >> 31;                   \
            if (sign != (y) >> (30 - (n)))  {       \
                (y) = sign ^ (0x3fffffff);          \
            } else {                                \
                (y) = (y) << (n);                   \
            }                                       \
        }
    
    void loop()
    {
      int ii,nn;
      int yy1,yy2;
    
      while(1)
      {  
        Serial.println();
        delay(1000);
        nn=0;
        for(ii=((1<<30)-5); ii<((1<<30)+5);ii++)
        {
          asm volatile("ssat %0, %1, %2" : "=r" (yy1) : "I" (31), "r" (-ii<<nn));
          asm volatile("ssat %0, %1, %2" : "=r" (yy2) : "I" (31), "r" (ii<<nn));
          int z1 = -ii;
          CLIP_2N_SHIFT(z1,nn);
          int z2 = ii;
          CLIP_2N_SHIFT(z2,nn);
          Serial.printf("%x: %x %x   ==    %x %x\r\n",ii,yy1,yy2, z1, z2);
     
        }
      }
    }
    na also
    Ok.. hast du nn=1 versucht ?

  24. #24
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,721
    Quote Originally Posted by Frank B View Post
    Ok.. hast du nn=1 versucht ?
    OK, one has to test also if shift overflows the 32 bit boundary. So it comes back to sublinear's solution.

  25. #25
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,492
    Does this need to support all 31 possible values of "n"? Or are only certain, compile-time constant values used?

    If only 2 or 3 values are ever used for n, and they're constants, maybe each case could be optimized separately?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •