View Full Version : Bitcrush library is not correct



FRS
08-08-2016, 04:13 PM
I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

The problem with this method is that although the variable is an unsigned int, the value is still 2's complement. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
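A minimal sketch of the fix described (hypothetical names, not the library's actual code): test the sign bit, then either clear the unused low bits (positive sample) or set them (negative sample), so both halves of the waveform truncate toward zero.

```cpp
#include <cstdint>

// Sign-aware truncation of the low "crushed" bits of a 16-bit sample.
// Positive samples get the unused bits cleared; negative (sign bit set)
// samples get them set, keeping the 2's-complement value near zero.
uint16_t crushSample(uint16_t sample, int bitsToDrop) {
    uint16_t mask = (uint16_t)((1u << bitsToDrop) - 1);  // the bits to discard
    if (sample & 0x8000)          // sign bit set: negative sample
        return sample | mask;     // fill unused bits with 1's
    else
        return sample & ~mask;    // fill unused bits with 0's
}
```

With this, an idle-input sample of 0xFFFF (i.e. -1) stays -1 instead of being pushed toward negative full scale as more bits are crushed.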

WMXZ
08-08-2016, 04:42 PM
I just wanted to point out that the Bitcrush library is incorrect, due to the way bits are shifted right and then left to truncate the lower bits.

The problem with this method is that although the variable is an unsigned int, the value is still 2's complement. So when the sign bit is set to 1, 1's should be entered into the unused bits instead of 0's.

As the library currently is, if no signal is feeding the bitcrusher, the more bits that are reduced the louder the noise becomes.

I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.

I know Bitcrush only from Wikipedia, but if your observation is correct, why not simply apply the shifts to signed integers?

FRS
08-08-2016, 05:02 PM
I can only assume that the author of the bitcrush library was trying to simplify the code by shifting bits right and then left to clear out the unused bits. In the comments an unsigned int is used, so the sign bit does not bring in ones when shifting right.

If you only shift right/left when the sign bit is set, then you are only affecting the "-" portion of the waveform, and you would still be bringing in 0's instead of 1's when shifting left.

Rather than shifting right/left to remove the unused bits, a better solution is to use a for-loop to clear/set the bits one at a time, from bit 0 up to the bit defined by the crushBit value: first test whether bit 15 of sampleSquidge is 0; if it is, clear the unused bits, otherwise set them.

Notice that by throwing away LSB's with a process like bitcrush, you also create a sort of noise gate for any lower-level signals.
But as the library currently is, the lower the input signal, the more the negative portion of the waveform is forced on (toward negative full scale).
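The for-loop approach described above can be sketched as follows (hypothetical function, reusing the sampleSquidge/crushBit names from the post):

```cpp
#include <cstdint>

// Walk from bit 0 up to the bit given by crushBits, clearing each one when
// the sample is positive (bit 15 clear) and setting it when the sample is
// negative, so both halves of the waveform truncate toward zero.
uint16_t crushLoop(uint16_t sampleSquidge, int crushBits) {
    for (int b = 0; b < crushBits; b++) {
        if (sampleSquidge & 0x8000)           // bit 15 set: negative sample
            sampleSquidge |= (1u << b);       // set the unused bit
        else
            sampleSquidge &= ~(1u << b);      // clear the unused bit
    }
    return sampleSquidge;
}
```

In practice a single mask built from crushBits does the same job without the loop, but the per-bit version mirrors the description above directly.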

tni
08-09-2016, 11:09 PM
I've created my own fix for this by testing the sign bit, then either clearing or setting the bits that should be ignored.
You can use division / multiplication. E.g. to crush 2 bits: "v = v / 4 * 4".
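This works because signed integer division in C/C++ rounds toward zero, so the positive and negative halves of the waveform truncate symmetrically. A generalized sketch for n crushed bits (hypothetical function name):

```cpp
#include <cstdint>

// Crush the low 'bits' bits of a sample via divide-then-multiply.
// Signed division rounds toward zero, so no sign test is needed, and
// the compiler turns the power-of-two divide into shifts anyway.
int16_t crushDivMul(int16_t v, int bits) {
    int16_t step = (int16_t)(1 << bits);
    return (int16_t)(v / step * step);
}
```

For example, with 2 bits crushed, -7 becomes -4 (toward zero) rather than -8, and an idle-input sample of -1 becomes 0 instead of a loud negative step.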

MacroMachines
08-14-2016, 10:48 PM
division is super processor heavy.

Frank B
08-14-2016, 10:52 PM
..depends.. up to 11(?) cycles for an int division... this is not very heavy ;-)

MacroMachines
08-14-2016, 11:04 PM
It is, compared to all the other options like bit shifting or look-up tables etc., especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible on a small chip.

WMXZ
08-15-2016, 05:51 AM
It is, compared to all the other options like bit shifting or look-up tables etc., especially when you need to do it as quickly as possible to allow as much audio generation/processing as possible on a small chip.

Have a look at the compiler output; my hunch is that the compiler translates divisions and multiplications by power-of-two constants into shifts.
IMO, if you really need speed, it is always good to analyze the results of the compiler. I always generate assembler listings for this purpose.

tni
08-15-2016, 09:34 AM
division is super processor heavy.
Huh??? Decent compilers haven't used division instructions for constants in decades.



int testDiv4(int v) {
    return v / 4;
}

uint32_t testDiv4u(uint32_t v) {
    return v / 4;
}


Disassembly:



00000000 <testDiv4>:
0: 2800 cmp r0, #0
2: bfb8 it lt
4: 3003 addlt r0, #3
6: 1080 asrs r0, r0, #2
8: 4770 bx lr
a: bf00 nop

00000000 <testDiv4u>:
0: 0880 lsrs r0, r0, #2
2: 4770 bx lr


They don't for non-power-of-2 constants either:


int testDiv7(int v) {
    return v / 7;
}

00000000 <testDiv7>:
0: 4b03 ldr r3, [pc, #12] ; (10 <_Z8testDiv7i+0x10>)
2: fb83 2300 smull r2, r3, r3, r0
6: 4403 add r3, r0
8: 17c0 asrs r0, r0, #31
a: ebc0 00a3 rsb r0, r0, r3, asr #2
e: 4770 bx lr
10: 92492493 .word 0x92492493

PaulStoffregen
08-15-2016, 03:15 PM
The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.

MichaelMeissner
08-15-2016, 03:44 PM
The audio lib has cycle counting stats you can query as current and worst case CPU usage. I put that in from the very beginning, to make these sorts of trade-offs easy to investigate.

Of course, disassembling the compiler's output is the best way... if you want to put in that kind of work.

But generally even then, you need detailed analysis tools to see what is actually going on.

And trust me, even if you work for the company that produces the chips, there are often little performance gaps that are not mentioned in the manual, including the internal manuals that you need to sign an NDA to read. ;)

In terms of division/modulus by a constant: if the constant is a power of 2 and the type is unsigned, the compiler will almost always generate a shift right or an AND immediate.

If the machine has a multiply-high instruction (an n×n multiply that gives you the upper n bits instead of the lower n bits, or all 2n bits in a pair of registers), it can do a multiply-high by the reciprocal to perform the division.

Within GCC, this is mostly controlled by the RTX_COSTS macro, where the compiler judges whether to issue the straightforward instruction or a replacement sequence of instructions. There is a trap in doing so, however: if the replacement sequence needs a lot more registers, it can cause extra register spilling, and if that spilling lands in a hot loop, everything slows down.
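As a sketch of this multiply-high-by-reciprocal technique (matching the 0x92492493 constant in the testDiv7 disassembly posted above), a signed division by 7 can be done without a divide instruction:

```cpp
#include <cstdint>

// Signed v / 7 via multiply-high by the reciprocal, mirroring the ARM
// smull/add/asr sequence from the testDiv7 disassembly above.
int32_t div7(int32_t v) {
    const int32_t M = (int32_t)0x92492493u;          // ceil(2^34 / 7), as signed
    int32_t hi = (int32_t)(((int64_t)v * M) >> 32);  // smull: high 32 bits of v*M
    hi += v;                                         // add r3, r0
    return (hi >> 2) - (v >> 31);                    // asr #2, then subtract sign
}
```

The final subtraction of the sign (v >> 31 is 0 or -1) adjusts the result so it rounds toward zero, matching C's signed division semantics.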

tni
08-15-2016, 03:58 PM
A little benchmark (loop time in microseconds):
DivMul: 13686
Mask: 20531
Shift: 20533



#include <limits>

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_divmul(int16_t v) {
    return v / 4 * 4;
}

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_msk(int16_t v) {
    if (v < 0) return ((v - 1) | 0x3u) + 1;
    else return v & ~0x3u;
}

__attribute__((optimize("O2"))) __attribute__((always_inline)) int16_t crush4_shift(int16_t v) {
    if (v < 0) return -(-v >> 2 << 2);
    else return v >> 2 << 2;
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4() {
    auto start_time = micros();
    for (int16_t i = std::numeric_limits<int16_t>::min();
         i < std::numeric_limits<int16_t>::max();
         i++)
    {
        volatile int16_t res = crush4_divmul(i);
    }
    auto end_time = micros();
    Serial.print("DivMul: ");
    Serial.println(end_time - start_time);
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_msk() {
    auto start_time = micros();
    for (int16_t i = std::numeric_limits<int16_t>::min();
         i < std::numeric_limits<int16_t>::max();
         i++)
    {
        volatile int16_t res = crush4_msk(i);
    }
    auto end_time = micros();
    Serial.print("Mask: ");
    Serial.println(end_time - start_time);
}

__attribute__((optimize("O2"))) __attribute__((noinline)) void b_brush4_shift() {
    auto start_time = micros();
    for (int16_t i = std::numeric_limits<int16_t>::min();
         i < std::numeric_limits<int16_t>::max();
         i++)
    {
        volatile int16_t res = crush4_shift(i);
    }
    auto end_time = micros();
    Serial.print("Shift: ");
    Serial.println(end_time - start_time);
}

void setup() {
    while (1) {
        b_brush4();
        b_brush4_msk();
        b_brush4_shift();
        delay(5000);
    }
}

void loop() {
}