Forum Rule: Always post complete source code & details to reproduce any issue!
Results 1 to 12 of 12

Thread: Bitcrush library is not correct

  1. #1
    Member
    Join Date
    Aug 2016
    Location
    Texas
    Posts
    31

    Bitcrush library is not correct

    I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

    The problem with this method is that although the variable is an unsigned int, the value is still 2's compliment. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

    As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

    I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.

  2. #2
    Senior Member
    Join Date
    Jul 2014
    Posts
    3,316
    Quote Originally Posted by FRS View Post
    I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

    The problem with this method is that although the variable is an unsigned int, the value is still 2's compliment. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

    As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

    I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
    I know Bitcrush only from Wikipedia, but if your observation is correct, why not simply applying the shifts to signed integers?
    Last edited by WMXZ; 08-08-2016 at 03:46 PM.

  3. #3
    Member
    Join Date
    Aug 2016
    Location
    Texas
    Posts
    31
    I can only assume that the author of the bitcrush library was trying to simplify the code by shifting bits right and then left to clear out unused bits. In their comments an unsigned int is used so the sign bit is not bringing in ones when shifting right.

    If you only shift right/left when the sign bit is set, then you are only affecting the "-" portion of the waveform, and you would still be bringing in 0's instead of 1's when shifting left.

    Rather than shift right/left to remove the unused bits, a better solution is to use a for-loop to clear/set bits one at a time from bit 0 to bit ? defined by the crushBit value. Also testing first if sampleSquidge bit 15 is 0, then clear the unused bits, else, set the unused bits.

    Notice that by throwing away LSB's using a process like bitcrush, you create a sort of noise gate for any lower level signals as well.
    But as the library currently is, the lower the input signal the more ON the negative portion of the waveform becomes.

  4. #4
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    Quote Originally Posted by FRS View Post
    I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
    You can use division / multiplication. E.g. to crush 2 bits: "v = v / 4 * 4".

  5. #5
    Senior Member
    Join Date
    Dec 2014
    Posts
    134
    division is super processor heavy.

  6. #6
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    9,056
    ..depends..up to 11(?) cycles..for int ...this is not very heavy ;-)

  7. #7
    Senior Member
    Join Date
    Dec 2014
    Posts
    134
    It is compared to all other options like bit shifting or look up tables etc.. especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible in a small chip.

  8. #8
    Senior Member
    Join Date
    Jul 2014
    Posts
    3,316
    Quote Originally Posted by MacroMachines View Post
    It is compared to all other options like bit shifting or look up tables etc.. especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible in a small chip.
    Have a look into the compiler output, my hunch is that the compiler translates division and multiplications by power of two constants into shifts.
    IMO, if you really need speed, it is always good to analyze the results of the compiler. I always generate assembler listings for this purpose.

  9. #9
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    Quote Originally Posted by MacroMachines View Post
    division is super processor heavy.
    Huh??? Decent compilers haven't used division instructions for constants in decades.

    Code:
    int testDiv4(int v) {
        return v / 4;
    }
    
    uint32_t testDiv4u(uint32_t v) {
        return v / 4;
    }
    Disassembly:

    Code:
    00000000 <testDiv4>:
       0:	2800      	cmp	r0, #0
       2:	bfb8      	it	lt
       4:	3003      	addlt	r0, #3
       6:	1080      	asrs	r0, r0, #2
       8:	4770      	bx	lr
       a:	bf00      	nop
    
    00000000 <testDiv4u>:
       0:	0880      	lsrs	r0, r0, #2
       2:	4770      	bx	lr
    They don't for non-power of 2 constants either:
    Code:
    int testDiv7(int v) {
        return v / 7;  
    }
    
    00000000 <testDiv7>:
       0:	4b03      	ldr	r3, [pc, #12]	; (10 <_Z8testDiv7i+0x10>)
       2:	fb83 2300 	smull	r2, r3, r3, r0
       6:	4403      	add	r3, r0
       8:	17c0      	asrs	r0, r0, #31
       a:	ebc0 00a3 	rsb	r0, r0, r3, asr #2
       e:	4770      	bx	lr
      10:	92492493 	.word	0x92492493

  10. #10
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    24,784
    The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

    Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.

  11. #11
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    4,088

    Cool

    Quote Originally Posted by PaulStoffregen View Post
    The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

    Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.
    But generally even then, you need to have detailed analysis tools to see what is usually going on.

    And trust me, even if you work for the company that produces the chips, there are often little performance gaps that are not mentioned in the manual, including the internal manuals that you need so sign a NDA to read.

    In terms of division/modulus by a constant, if the constant is a power of 2, and the type is an unsigned type, it almost always will generate a shift right or an and immediate. If the machine has a multiply high instruction (nxn multiply that gives you the upper n-bits instead of the lower n-bits, or all 2n-bits in a pair of registers), you can do a multiply high by the reciprocal to do the division. Within GCC, this is mostly controlled by the RTX_COSTS macro where the compiler judges whether to issue the straight forward instruction, or a sequence of instructions to replace the original instruction. However there can be a trap in doing so, if you need a lot more registers to do the replacement instructions, you can cause more register spilling which can cause everything to slow down, if you added the spilling in the hot loop.
    Last edited by MichaelMeissner; 08-15-2016 at 02:51 PM.

  12. #12
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    A little benchmark (loop time in microseconds):
    DivMul: 13686
    Mask: 20531
    Shift: 20533

    Code:
    #include <limits>
    
    __attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_divmul(int16_t v)  {
        return v / 4 * 4;  
    }
    
    __attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_msk(int16_t v) {
        if(v < 0) return ((v-1) | 0x3u) + 1;
        else return v & ~0x3u;
    }
    
    __attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_shift(int16_t v) {
        if(v < 0) return -(-v >> 2 << 2);
        else return v >> 2 << 2;
    }
    
    __attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4() {
        auto start_time = micros();
        for(int16_t i = std::numeric_limits<int16_t>::min();
            i < std::numeric_limits<int16_t>::max();
            i++)
        {
            volatile int16_t res = crush4_divmul(i);
        }
        auto end_time = micros();
        Serial.print("DivMul: ");
        Serial.println(end_time - start_time);
    }
    
    __attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_msk() {
        auto start_time = micros();
        for(int16_t i = std::numeric_limits<int16_t>::min();
            i < std::numeric_limits<int16_t>::max();
            i++)
        {
            volatile int16_t res = crush4_msk(i);
        }
        auto end_time = micros();
        Serial.print("Mask: ");
        Serial.println(end_time - start_time);
    }
    
    __attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_shift() {
        auto start_time = micros();
        for(int16_t i = std::numeric_limits<int16_t>::min();
            i < std::numeric_limits<int16_t>::max();
            i++)
        {
            volatile int16_t res = crush4_shift(i);
        }
        auto end_time = micros();
        Serial.print("Shift: ");
        Serial.println(end_time - start_time);
    }
    
    void setup() {
        while(1) {
            b_brush4();
            b_brush4_msk();
            b_brush4_shift();
            delay(5000);
        }
    }
    
    void loop() {
    }

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •