And also: what's the type of a sample?



mykle
04-02-2016, 11:57 PM
Hi,

Okay, last dumb question, but I swear I cannot find this in the docs, and I'm having a hard time finding it on GitHub either. The 16-bit audio samples pointed to by block->data[] ... what is their type? uint16_t or int16_t? Or something else? I'm getting bizarre crashes while trying to do straightforward things with samples, and I think it may come down to incorrect type casting.

Thanks for the third time,
-mykle-

WMXZ
04-03-2016, 10:20 PM
Does the following help? From teensy3/core/AudioStream.h:

typedef struct audio_block_struct {
    unsigned char ref_count;
    unsigned char memory_pool_index;
    int16_t data[AUDIO_BLOCK_SAMPLES];
} audio_block_t;
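Here is a minimal host-side sketch of why the signedness matters. It mirrors the struct above. AUDIO_BLOCK_SAMPLES = 128 is an assumption for the sketch (the library's usual default), and both helper functions are hypothetical. The same 16 bits read as -1000 through int16_t but as 64536 through uint16_t. A wrong cast therefore turns a quiet negative sample into a near-full-scale positive one.

```cpp
#include <cstdint>

// Assumed block size for this sketch only; the real value comes from AudioStream.h.
#ifndef AUDIO_BLOCK_SAMPLES
#define AUDIO_BLOCK_SAMPLES 128
#endif

// Mirrors the struct quoted above: samples are signed 16-bit values.
typedef struct audio_block_struct {
    unsigned char ref_count;
    unsigned char memory_pool_index;
    int16_t data[AUDIO_BLOCK_SAMPLES];
} audio_block_t;

// Hypothetical helper: read sample i correctly, as a signed value.
inline int32_t sample_as_signed(const audio_block_t *block, int i) {
    return block->data[i];
}

// Hypothetical helper: the same bits misread as unsigned (the bug to avoid).
// Converting a negative int16_t to uint16_t wraps modulo 65536.
inline int32_t sample_as_unsigned(const audio_block_t *block, int i) {
    return (uint16_t)block->data[i];
}
```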

Frank B
04-03-2016, 10:46 PM
Oops... that is VERY old. WMXZ, are you still using such an old lib? :)
There have been many improvements since then...

https://github.com/PaulStoffregen/cores/blob/master/teensy3/AudioStream.h#L49-L55

mykle
04-04-2016, 05:44 AM
Thanks for the pointer. So it does appear that these are signed int16_t values, which of course makes perfect sense for audio.

Unfortunately, that's what I already thought they were, so this doesn't get me any closer to understanding my problem. :/

The crux of the problem is that this line of code works fine:

block->data[i] = buffer[cursor] = ((s0 + s1) / 2);
While this one, which ought to do about the same thing, causes noise glitches and an eventual crash:

block->data[i] = buffer[cursor] = (int16_t)((s0 + s1) * 0.5);

buffer, s0 and s1 are all declared int16_t. I'm using Arduino 1.6.8 with the Teensyduino add-on.
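One way to check that the two lines really should agree is to run both expressions with a desktop compiler. The sketch below copies them into two hypothetical functions, keeping the unsuffixed 0.5 from the post. It then compares them over a grid of int16_t inputs that includes both extremes.

```cpp
#include <cstdint>

// Integer midpoint, as in the line that works. The int16_t operands are
// promoted to int before the addition; integer division truncates toward zero.
static inline int16_t mid_int(int16_t s0, int16_t s1) {
    return (int16_t)((s0 + s1) / 2);
}

// Floating-point midpoint, as in the line that misbehaves. The conversion
// back to int16_t also truncates toward zero.
static inline int16_t mid_float(int16_t s0, int16_t s1) {
    return (int16_t)((s0 + s1) * 0.5);
}

// Hypothetical test helper: counts input pairs (stepping through the full
// int16_t range by `step`) for which the two versions disagree.
static long count_mismatches(int step) {
    long mismatches = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a += step) {
        for (int32_t b = INT16_MIN; b <= INT16_MAX; b += step) {
            if (mid_int((int16_t)a, (int16_t)b) != mid_float((int16_t)a, (int16_t)b))
                ++mismatches;
        }
    }
    return mismatches;
}
```

Under these assumptions the two agree for every input pair. The sum of two int16_t values is exactly representable as a double, and both integer division and the floating-to-integer cast truncate toward zero. So the arithmetic itself is an unlikely explanation for the glitches.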

The second version, obviously, uses floating-point math instead of integer math. Integer math would be good enough in this simple case, but I want to make the floating-point version work so that I can do more complex computations at this spot.

Maybe floating point is just too slow, and somehow that makes the audio engine crash? But it's just one little multiply... are there any known issues or bugs with floating-point code compiled for the Teensy?

MichaelMeissner
04-04-2016, 06:08 AM
Floating point is emulated in software on the Teensy. I'd figure it takes thousands or perhaps tens of thousands of instructions to convert (s0+s1) to floating point, do the multiplication and convert it back to a 16-bit integer. If you are using an older version of the IDE, using 0.5 will make it a double instead of a float, which is even slower, since it has to calculate more bits of precision. In the future, the so-called Teensy 3.x++ will have hardware single-precision floating point, and the code will run faster.
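The float-versus-double point can be checked at compile time. This sketch assumes a standard C++ compiler with default flags. GCC's -fsingle-precision-constant option changes how unsuffixed literals are treated, so a particular Teensy build may behave differently. half_single is a hypothetical helper.

```cpp
#include <type_traits>

// In standard C++ an unsuffixed literal like 0.5 has type double; 0.5f is float.
// (GCC's -fsingle-precision-constant option changes this, so check your build flags.)
static_assert(std::is_same<decltype(0.5), double>::value, "0.5 is a double");
static_assert(std::is_same<decltype(0.5f), float>::value, "0.5f is a float");

// The usual arithmetic conversions then set the precision of the whole expression:
// int * double is computed in double, int * float in float.
static_assert(std::is_same<decltype(3 * 0.5), double>::value, "int * double -> double");
static_assert(std::is_same<decltype(3 * 0.5f), float>::value, "int * float -> float");

// Hypothetical helper that keeps the halving in single precision.
inline float half_single(int sum) { return sum * 0.5f; }
```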

Note that on a machine with a 16-bit int, like the 8-bit AVR in a Teensy 2.0, the addition s0+s1 can overflow before the division is done, because the calculation is done in 16-bit arithmetic. On a 32-bit machine like an ARM, the int16_t operands are promoted to 32-bit int, so the sum does not overflow.
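Here is a host-side sketch of the overflow difference. The 16-bit case is modeled with unsigned arithmetic, because real signed overflow is undefined behavior in C++. Both function names are hypothetical.

```cpp
#include <cstdint>

// Models a machine whose int is 16 bits wide: s0 + s1 is computed in 16 bits
// and wraps around, which is what you typically observe in practice.
inline int16_t midpoint_16bit_int(int16_t s0, int16_t s1) {
    uint16_t wrapped = (uint16_t)((uint16_t)s0 + (uint16_t)s1);  // sum modulo 65536
    int16_t sum = (int16_t)wrapped;  // GCC converts out-of-range values modulo 2^16
    return (int16_t)(sum / 2);
}

// On a 32-bit ARM, int is 32 bits, so the promoted operands cannot overflow
// before the division.
inline int16_t midpoint_32bit_int(int16_t s0, int16_t s1) {
    int32_t sum = (int32_t)s0 + (int32_t)s1;
    return (int16_t)(sum / 2);
}
```

For example, averaging 30000 and 30000 gives 30000 with a 32-bit int. The 16-bit model instead wraps the sum to -5536 and returns -2768.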

WMXZ
04-04-2016, 08:23 AM
Frank B: well, I'm not using this Audio library myself (it is not suited to my application/hardware), so I did not realize that I had looked up an old, local version.

I'm not very happy with the way software updates are distributed (i.e. teensyduino.exe), so I update manually by downloading from GitHub, with all the risks of work in progress. Consequently, I only do it when bugs have been fixed and discussed on the forum.

PaulStoffregen
04-04-2016, 04:04 PM
MichaelMeissner wrote: "Figure it will take thousands or perhaps 10s of thousands instructions to convert (s0+s1) to floating point, do the multiplication and convert it back to 16-bit integer."

Actually, about 150 to 160 cycles.

Software-computed floating point isn't fast, but it's nowhere near these dire estimates! Especially on ARM, the single-cycle 32x32 multiply and the single-cycle shift by any number of bits make implementing floating point fairly efficient.

Here's a quick-and-dirty test, which prints 161 cycles.



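void setup() {
    // Enable the DWT unit and its free-running CPU cycle counter
    ARM_DEMCR = ARM_DEMCR_TRCENA;
    ARM_DWT_CTRL = ARM_DWT_CTRL_CYCCNTENA;
}

int16_t s1=4, s2=7;

void loop() {
    // Change the inputs on every pass so the compiler cannot precompute the result
    s1 *= 11;
    s2 *= 13;
    uint32_t cy1 = ARM_DWT_CYCCNT;
    // volatile forces the result to actually be computed and stored
    //volatile int s3 = (s1 + s2) / 2;    // integer version, for comparison
    volatile int s3 = (s1 + s2) * 0.5f;   // single-precision float version being timed
    uint32_t cy2 = ARM_DWT_CYCCNT;
    Serial.printf("s3 = %d, cycles = %lu\n", s3, (unsigned long)(cy2 - cy1));
    delay(100);
}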

MichaelMeissner
04-04-2016, 04:11 PM
Yeah, it was just a guess on my part. I was thinking more of the case where you use double precision (which requires multi-precision integer arithmetic) and divide rather than multiply. Also, some of the microprocessors I have worked on in the past didn't even have hardware divide.

It is good to see it isn't that bad (at least for add, subtract, and multiply). I wonder whether the soft-fp library on ARM is hand-tuned or just the generic libgcc soft-fp code. I also wonder whether it supports full IEEE rounding modes, NaNs, infinities, denormals and signed zeros, all of which make IEEE arithmetic 'fun'.

mykle
04-04-2016, 07:07 PM
So I wonder whether I should be using floating point in this algorithm at all. All I want to do is average together a set of samples, using an adjustable coefficient for each sample in the set. It's easy to do, and clear to understand, if I multiply each sample by a coefficient between 0 and 1. But that's floating point, and it causes my firmware to crash.
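The floating-point version described here might look like the sketch below. The function name, the assumption that the coefficients sum to about 1.0, and the clamping are illustrative choices, not code from the thread.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical helper: weighted average with float coefficients in [0, 1].
// Assumes the coefficients sum to about 1.0; the result is clamped to the
// int16_t range in case they do not.
inline int16_t weighted_average_float(const int16_t *samples, const float *coeffs, size_t count) {
    float acc = 0.0f;
    for (size_t i = 0; i < count; ++i)
        acc += samples[i] * coeffs[i];   // int16_t * float is computed in float
    if (acc > 32767.0f)  acc = 32767.0f;
    if (acc < -32768.0f) acc = -32768.0f;
    return (int16_t)acc;                 // truncates toward zero
}
```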

I could also multiply by an integer between 0 and 100 and then integer-divide the result by 100... or I could multiply by a coefficient between 0 and 256 and then right-shift the result by 8 bits. But at that point I'm basically doing fixed-point math by hand.
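Here are the two integer alternatives, sketched for a single sample. Both helper names are hypothetical. The Q8 version treats 256 as 1.0, so 128 means 0.5.

```cpp
#include <cstdint>

// Option 1: coefficient as a percentage (0..100), then an integer divide.
inline int16_t scale_percent(int16_t sample, int32_t percent) {
    return (int16_t)((int32_t)sample * percent / 100);
}

// Option 2: coefficient in Q8 fixed point (0..256 means 0.0..1.0), then a
// right shift by 8 bits, which divides by 256.
inline int16_t scale_q8(int16_t sample, int32_t coeff_q8) {
    return (int16_t)(((int32_t)sample * coeff_q8) >> 8);
}
```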

Actually, what about fixed-point? Does the Arduino compiler have types for that? Or is there a good fixed-point library? Fixed point might be the ideal compromise here.

At any rate, even if floating point is emulated and relatively slow, I still don't get why, in this one simple case, it's failing so spectacularly. I can see how the glitchy noise I'm hearing could be caused by floating-point rounding errors or something, but that still doesn't explain why my entire firmware crashes after a half-dozen notes.

PaulStoffregen
04-05-2016, 02:18 AM
I don't know why it's crashing either. Seems like that shouldn't happen. But small details matter, which is why we have the Forum Rule for these sorts of situations. Without the complete code, all anyone can do is say "that really shouldn't happen".

Likewise, that one line change shouldn't cause "noise-glitchiness". There's probably some other subtle detail, which nobody can see or investigate.

The arm_math.h library (aka CMSIS DSP) has lots of stuff for fixed point math. It's not exactly easy to use. I rarely use it.
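For a feel of what fixed-point math looks like without the library, here is a hand-rolled Q15 multiply. Q15 is the format CMSIS DSP uses for its q15_t type (an int16_t where 32768 stands for 1.0). This sketch does not call arm_math.h, and the names q15, q15_multiply and float_to_q15 are hypothetical.

```cpp
#include <cstdint>

// Q15: an int16_t where 32768 represents 1.0, so the range is [-1.0, 1.0).
typedef int16_t q15;

inline q15 q15_multiply(q15 a, q15 b) {
    int32_t product = (int32_t)a * b;            // Q15 * Q15 -> Q30
    int32_t result = product >> 15;              // Q30 -> Q15
    if (result > INT16_MAX) result = INT16_MAX;  // only -1.0 * -1.0 overflows
    return (q15)result;
}

// Convert a float coefficient to Q15; x must be in [-1.0, 1.0).
inline q15 float_to_q15(float x) { return (q15)(x * 32768.0f); }
```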

Cortex-M4 has a fast integer divide instruction, so the savings from right-shifting instead of dividing are small. If you do extra work to be able to shift later, it can even end up slower. When everything else works out the same (or differs only by constants), a right shift is a little faster.
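Before swapping a division for a shift, note that the two do not always give the same answer on signed values. This host-side sketch uses hypothetical helpers div256 and shift8. The shift behavior assumes GCC's arithmetic right shift, which C++20 also guarantees.

```cpp
#include <cstdint>

// Division by a power of two truncates toward zero.
inline int32_t div256(int32_t x) { return x / 256; }

// An arithmetic right shift rounds toward minus infinity, so for negative x
// the result can be one less than x / 256.
inline int32_t shift8(int32_t x) { return x >> 8; }
```

So x >> 8 is biased toward negative values by up to one LSB compared with x / 256. For audio this is often negligible, but the two are not drop-in replacements when results must match exactly.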

There are also special DSP instructions that do a 32x16 multiply to a 48-bit result, with the low 16 bits discarded (effectively a right shift by 16), all in a single cycle. Several objects in the library use these to speed things up. I do not know of any easy, automated way to achieve such optimizations.
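As a rough illustration of what such an instruction computes, here is a portable C++ model (the function name is hypothetical). On Cortex-M4 the SMULWB instruction performs this operation. Whether a compiler will actually emit it for code like this is not guaranteed.

```cpp
#include <cstdint>

// Signed 32x16 multiply that keeps only the top 32 bits of the 48-bit
// product, i.e. the low 16 bits are discarded (a right shift by 16).
inline int32_t multiply_32x16_top(int32_t a, int16_t b) {
    int64_t product = (int64_t)a * b;  // the full product fits in 48 bits
    return (int32_t)(product >> 16);
}
```

For example, with b = 16384 the result is a / 4, because 16384 / 65536 = 1/4.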

jonr
04-05-2016, 03:23 AM
I've used fixed point extensively, but I never saw a need for a library. Just remember that after a multiply the "bin points" (binary points) add, and after a divide they subtract. Add and subtract leave them unchanged, as long as both operands use the same bin point. So in your case, you might use bin 0 AD values multiplied by a bin 8 coefficient, yielding a bin 8 result. Sum all of these, divide by the count, and then right-shift by 8 to convert back to bin 0.
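Here is that recipe sketched in C++. "Bin 8" is read here as 8 fractional bits, so a coefficient of 256 means 1.0. The function name and argument layout are hypothetical. With coefficients up to 256, the 32-bit sum cannot overflow for up to 256 full-scale samples.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Bin 0 samples times bin 8 coefficients give bin 8 products; sum them,
// divide by the count (still bin 8), then shift right by 8 to get back to bin 0.
inline int16_t average_bin8(const int16_t *samples, const uint16_t *coeffs_bin8, size_t count) {
    assert(count > 0);                                  // avoid dividing by zero
    int32_t sum = 0;                                    // bin 8 accumulator
    for (size_t i = 0; i < count; ++i)
        sum += (int32_t)samples[i] * coeffs_bin8[i];    // bin 0 * bin 8 -> bin 8
    sum /= (int32_t)count;                              // still bin 8
    return (int16_t)(sum >> 8);                         // bin 8 -> bin 0
}
```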

On the other hand, if speed isn't an issue, then floating point is convenient.