ARM processors' "Saturated arithmetic"

stevech · Nov 26, 2014

Old to many, but new to me, are the ARM processors' "Saturated arithmetic" machine instructions for integers. I wonder if compilers could use these instructions if the variables were declared that way.
http://en.wikipedia.org/wiki/Saturation_arithmetic
http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/CIHDDJGG.html

PaulStoffregen · Nov 26, 2014

No. Michael posted the gcc extension some time ago. I tried, but the gcc we're using does not support it.

stevech · Nov 26, 2014

Is the purpose of Saturated Arithmetic, in a software sense, to allow someone to do signed and unsigned integer math with wanton abandon, knowing that over and underflows just yield the max?

I don't see why I would want that, but it must have some wide appeal to become ARM instruction set members.

mxxx · Nov 27, 2014

it says in the wikipedia article in post #1: mostly DSP stuff, where you want to avoid things to wrap around. makes sense to me.

i think i've used arm_add_q15 etc before for creating waveforms (but that's something different from the 'machine instructions' i suppose).
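To make the wrap-around vs. saturate difference concrete, here is a minimal portable C sketch for a single 16-bit sample. The names add_wrap and add_sat are made up for illustration; this is not how arm_add_q15 is implemented, only the behavior a saturating add is meant to have.

```c
#include <stdint.h>

/* Wrapping add: the sum is truncated to 16 bits. The final conversion back to
   int16_t is implementation-defined in C, but wraps on gcc and clang. */
int16_t add_wrap(int16_t a, int16_t b) {
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Saturating add: compute in 32 bits, then clamp to the int16_t range. */
int16_t add_sat(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

With two loud samples, add_wrap(30000, 10000) comes out as -25536 (a nasty click in audio), while add_sat(30000, 10000) clips to 32767.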

PaulStoffregen · Nov 27, 2014

stevech said:
Is the purpose of Saturated Arithmetic, in a software sense, to allow someone to do signed and unsigned integer math with wanton abandon, knowing that over and underflows just yield the max?

Usually it's used with signed integers.

The way it's implemented on ARM Cortex-M4 is pretty much the opposite of allowing you to do computations with wanton abandon. DSP chips are made that way. ARM's saturating support is limited, so it takes careful planning and optimization to make really effective use of it.

stevech said:
I don't see why I would want that, but it must have some wide appeal to become ARM instruction set members.

It's extremely useful for signal processing algorithms. Saturation is used extensively in the audio library. For example, look for signed_saturate_rshift(), which is just a wrapper around the SSAT instruction.

https://github.com/PaulStoffregen/Audio/blob/master/mixer.cpp#L66

Generally, the way it works is to use 32 or even 64 bits for intermediate calculations, with careful planning to avoid overflow, and then use the saturate instruction at the end to reduce the result to the data size you want. You could achieve the same with a couple of conditional checks (is the result greater than the max output, or less than the min (negative) output?), but conditional tests take time. The SSAT instruction shifts the result down and sets it to the max if the shifted result is greater than the max, or to the min if it's less than the (negative) min, using only a single clock cycle, without a slow change in program flow (which requires the ARM's 3-stage pipeline to refill).
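For comparison, the "couple of conditional checks" version could look roughly like the sketch below. The function name saturate_rshift_portable is hypothetical and this is not the library's code; it only shows the shift-then-clamp behavior described above, assuming bits is in 1..32 and rshift is in 0..31.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for a shift-and-saturate step:
   shift val right by rshift, then clamp to a signed range of `bits` bits.
   Assumes 1 <= bits <= 32 and 0 <= rshift <= 31. */
int32_t saturate_rshift_portable(int32_t val, unsigned bits, unsigned rshift) {
    int32_t max = (int32_t)((1u << (bits - 1)) - 1u);
    int32_t min = -max - 1;
    /* Right shift of a negative value is implementation-defined in C;
       gcc and clang do an arithmetic (sign-preserving) shift. */
    int32_t shifted = val >> rshift;
    if (shifted > max) return max;   /* the two branches that the */
    if (shifted < min) return min;   /* single instruction avoids */
    return shifted;
}
```

So saturate_rshift_portable(100000, 16, 0) gives 32767 and saturate_rshift_portable(-100000, 16, 0) gives -32768, while an in-range value such as saturate_rshift_portable(1000, 16, 2) just comes back shifted, as 250.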

For example, in the state variable filter, all the coefficients are carefully planned so that the result, after all the math with 32*32->64 multiplies and shifts, can't overflow 32 bits even in the worst case, and the desired result is scaled to exactly 29 of those 32 bits.

https://github.com/PaulStoffregen/Audio/blob/master/filter_variable.cpp#L74
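As a rough illustration of a 32*32->64 multiply followed by a shift, the plain C below widens both operands to 64 bits and keeps the upper 32 bits of the product. The name mul_32x32_high is made up for this sketch, and it is not taken from the filter code; whether a compiler turns it into a single multiply instruction on Cortex-M4 is not guaranteed.

```c
#include <stdint.h>

/* 32x32 -> 64-bit signed multiply, keeping the upper 32 bits of the product.
   The 64-bit intermediate cannot overflow for any pair of int32_t inputs.
   Right shift of a negative value is implementation-defined in C;
   gcc and clang do an arithmetic shift. */
int32_t mul_32x32_high(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product >> 32);
}
```

For instance, mul_32x32_high(0x40000000, 0x40000000) is 2^60 shifted down by 32, which is 268435456 (2^28). The planning work is in choosing coefficient scaling so that results like this land where you want them in the 32-bit word.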

Then at the end, the SSAT instruction is used to discard the extra 13 bits kept during the intermediate calculations, and the 16 bit output automatically saturates if anything ended up in those top 3 bits. I spent a couple solid days writing just those couple dozen lines of code, and about a week on the longer version that recomputes the coefficients on every sample. The DSP math and saturation instructions and other special instructions, and 2-samples-per-register packing, let you achieve dramatically faster signal processing.
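The bit budget above (29 significant bits = 16 output bits + 13 extra bits, with 3 bits of headroom on top) can be worked through in plain C. This is only a sketch of the arithmetic with an invented function name, reduce_29_to_16; the real code does this step with SSAT rather than with branches.

```c
#include <stdint.h>

/* Reduce a 32-bit intermediate whose wanted signal occupies 29 bits
   down to a 16-bit sample: drop the 13 extra low bits, then clamp.
   Anything that spilled into the top 3 headroom bits ends up clipped. */
int16_t reduce_29_to_16(int32_t acc) {
    /* Arithmetic shift assumed for negative values (true on gcc and clang). */
    int32_t shifted = acc >> 13;
    if (shifted > INT16_MAX) return INT16_MAX;
    if (shifted < INT16_MIN) return INT16_MIN;
    return (int16_t)shifted;
}
```

A full-scale 29-bit input of (1 << 28) - 1 maps to 32767 without clipping, whereas 1 << 30, which sits in the headroom bits, saturates to 32767 instead of wrapping.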

ARM's beautiful marketing info makes it sound like there's some sort of magical DSP capability that magically makes some types of integer math faster. It does, but only with a LOT of careful planning and optimization, and of course they gloss over that part about how it's hard work to make use of it.

Well, unless you use an already-written library. Then it's easy. As far as I know, only 3 such libraries exist: the Teensy Audio Library, a similar commercial audio library from a company called DSP Concepts, and the ARM DSP math library, which gives you optimized matrix math functions.

Someday gcc is supposed to get a saturated variable type. But I can't imagine it'll be very efficient on Cortex-M4.
