Bitcrush library is not correct

Status
Not open for further replies.

FRS

Active member
I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

The problem with this method is that although the variable is an unsigned int, the value is still 2's compliment. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
 
I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

The problem with this method is that although the variable is an unsigned int, the value is still 2's compliment. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.

I know Bitcrush only from Wikipedia, but if your observation is correct, why not simply applying the shifts to signed integers?
 
Last edited:
I can only assume that the author of the bitcrush library was trying to simplify the code by shifting bits right and then left to clear out unused bits. In their comments an unsigned int is used so the sign bit is not bringing in ones when shifting right.

If you only shift right/left when the sign bit is set, then you are only affecting the "-" portion of the waveform, and you would still be bringing in 0's instead of 1's when shifting left.

Rather than shift right/left to remove the unused bits, a better solution is to use a for-loop to clear/set bits one at a time from bit 0 to bit ? defined by the crushBit value. Also testing first if sampleSquidge bit 15 is 0, then clear the unused bits, else, set the unused bits.

Notice that by throwing away LSB's using a process like bitcrush, you create a sort of noise gate for any lower level signals as well.
But as the library currently is, the lower the input signal the more ON the negative portion of the waveform becomes.
 
I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
You can use division / multiplication. E.g. to crush 2 bits: "v = v / 4 * 4".
 
It is compared to all other options like bit shifting or look up tables etc.. especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible in a small chip.
 
It is compared to all other options like bit shifting or look up tables etc.. especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible in a small chip.

Have a look into the compiler output, my hunch is that the compiler translates division and multiplications by power of two constants into shifts.
IMO, if you really need speed, it is always good to analyze the results of the compiler. I always generate assembler listings for this purpose.
 
division is super processor heavy.
Huh??? Decent compilers haven't used division instructions for constants in decades.

Code:
int testDiv4(int v) {
    return v / 4;
}

uint32_t testDiv4u(uint32_t v) {
    return v / 4;
}

Disassembly:

Code:
00000000 <testDiv4>:
   0:	2800      	cmp	r0, #0
   2:	bfb8      	it	lt
   4:	3003      	addlt	r0, #3
   6:	1080      	asrs	r0, r0, #2
   8:	4770      	bx	lr
   a:	bf00      	nop

00000000 <testDiv4u>:
   0:	0880      	lsrs	r0, r0, #2
   2:	4770      	bx	lr

They don't for non-power of 2 constants either:
Code:
int testDiv7(int v) {
    return v / 7;  
}

00000000 <testDiv7>:
   0:	4b03      	ldr	r3, [pc, #12]	; (10 <_Z8testDiv7i+0x10>)
   2:	fb83 2300 	smull	r2, r3, r3, r0
   6:	4403      	add	r3, r0
   8:	17c0      	asrs	r0, r0, #31
   a:	ebc0 00a3 	rsb	r0, r0, r3, asr #2
   e:	4770      	bx	lr
  10:	92492493 	.word	0x92492493
 
The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.
 
The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.

But generally even then, you need to have detailed analysis tools to see what is usually going on.

And trust me, even if you work for the company that produces the chips, there are often little performance gaps that are not mentioned in the manual, including the internal manuals that you need so sign a NDA to read.;)

In terms of division/modulus by a constant, if the constant is a power of 2, and the type is an unsigned type, it almost always will generate a shift right or an and immediate. If the machine has a multiply high instruction (nxn multiply that gives you the upper n-bits instead of the lower n-bits, or all 2n-bits in a pair of registers), you can do a multiply high by the reciprocal to do the division. Within GCC, this is mostly controlled by the RTX_COSTS macro where the compiler judges whether to issue the straight forward instruction, or a sequence of instructions to replace the original instruction. However there can be a trap in doing so, if you need a lot more registers to do the replacement instructions, you can cause more register spilling which can cause everything to slow down, if you added the spilling in the hot loop.
 
Last edited:
A little benchmark (loop time in microseconds):
DivMul: 13686
Mask: 20531
Shift: 20533

Code:
#include <limits>

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_divmul(int16_t v)  {
    return v / 4 * 4;  
}

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_msk(int16_t v) {
    if(v < 0) return ((v-1) | 0x3u) + 1;
    else return v & ~0x3u;
}

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_shift(int16_t v) {
    if(v < 0) return -(-v >> 2 << 2);
    else return v >> 2 << 2;
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4() {
    auto start_time = micros();
    for(int16_t i = std::numeric_limits<int16_t>::min();
        i < std::numeric_limits<int16_t>::max();
        i++)
    {
        volatile int16_t res = crush4_divmul(i);
    }
    auto end_time = micros();
    Serial.print("DivMul: ");
    Serial.println(end_time - start_time);
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_msk() {
    auto start_time = micros();
    for(int16_t i = std::numeric_limits<int16_t>::min();
        i < std::numeric_limits<int16_t>::max();
        i++)
    {
        volatile int16_t res = crush4_msk(i);
    }
    auto end_time = micros();
    Serial.print("Mask: ");
    Serial.println(end_time - start_time);
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_shift() {
    auto start_time = micros();
    for(int16_t i = std::numeric_limits<int16_t>::min();
        i < std::numeric_limits<int16_t>::max();
        i++)
    {
        volatile int16_t res = crush4_shift(i);
    }
    auto end_time = micros();
    Serial.print("Shift: ");
    Serial.println(end_time - start_time);
}

void setup() {
    while(1) {
        b_brush4();
        b_brush4_msk();
        b_brush4_shift();
        delay(5000);
    }
}

void loop() {
}
 
Status
Not open for further replies.
Back
Top