One register used both as input and output?

Kuba0040 · Dec 11, 2021

Hello,
I am trying to write a fast function that computes (a*b>>32)+sum, where a, b and sum are uint32_t variables. With help from people here I have already figured out the a*b>>32 part. Now I want to add the accumulation step. After checking the instruction set for the ARM Cortex-M7 CPU (Teensy 4.0) I came across the UMLAL instruction. It multiplies Rm*Rs and adds the 64-bit product to the 64-bit value held in the RdLo and RdHi registers. Perfect! So I tried to use it in my code, but unfortunately it doesn't work.

Code:

static inline uint32_t unsigned_multiply_accumulate_32x32_rshift32(uint32_t out, uint32_t a, uint32_t b)
{
    uint32_t junk; //Just a trash can register to throw in data we don't need
    asm volatile("umlal    %[junk], %[out], %[a], %[b]\n\t"
         : [junk] "=&r" (junk), [out] "=&r" (out)
         : [a] "r" (a), [b] "r" (b));
    return out;
}

This may look weird, but the idea is this: first we set RdHi to the number we want to accumulate, then a*b produces a 64-bit result that gets added into RdLo and RdHi. Then we return just RdHi, effectively shifting the result by >>32. That's why the accumulate value goes into RdHi and not RdLo: so it doesn't get shifted away.

I hope that made sense.
Now the issue I am having is that I don't know how to specify (in the "r" and "=&r" constraints) that a register is used as an input and is then written to. Is this even possible, or am I too far down the rabbit hole?
Thank You for the help.

MarkT · Dec 11, 2021

From utility/dspinst.h in the Audio library:

Code:

// computes sum += ((a[15:0] * b[15:0]) + (a[31:16] * b[31:16]))
static inline int64_t multiply_accumulate_16tx16t_add_16bx16b(int64_t sum, uint32_t a, uint32_t b)
{
	asm volatile("smlald %Q0, %R0, %1, %2" : "+r" (sum) : "r" (a), "r" (b));
	return sum;
}

You just need to swap smlald for umlal, I think, and change the type of sum.
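Applied to the function in the original question, that suggestion might look like the sketch below. It is only a sketch under some assumptions: the name umlal_accumulate_rshift32 is made up for illustration, the accumulator is placed in the high word of a uint64_t so the low word starts at zero, and the inline assembly is used only when the compiler reports Thumb-2 (as GCC does for the Cortex-M7); otherwise it falls back to plain C that gives the same result. The "+r" constraint marks the operand as both read and written, and %Q0 / %R0 select the low and high registers of the 64-bit operand. Note that UMLAL also reads RdLo, so the low word has to start at a known value rather than being an uninitialized scratch variable.

```c
#include <stdint.h>

/* Hypothetical helper (name is illustrative): returns ((a * b) >> 32) + sum,
 * wrapping modulo 2^32 like ordinary uint32_t arithmetic. */
static inline uint32_t umlal_accumulate_rshift32(uint32_t sum, uint32_t a, uint32_t b)
{
    /* Put sum in the high word (RdHi); the low word (RdLo) starts at zero. */
    uint64_t acc = (uint64_t)sum << 32;
#if defined(__GNUC__) && defined(__thumb2__)
    /* "+r": acc is read and written by the instruction.
     * %Q0 = low register of acc (RdLo), %R0 = high register of acc (RdHi). */
    asm("umlal %Q0, %R0, %1, %2" : "+r" (acc) : "r" (a), "r" (b));
#else
    /* Portable fallback for other targets: same 64-bit multiply-accumulate. */
    acc += (uint64_t)a * b;
#endif
    /* Keep only the high word, i.e. shift the 64-bit result right by 32. */
    return (uint32_t)(acc >> 32);
}
```

For example, umlal_accumulate_rshift32(5, 0x80000000, 4) should give 7, since 0x80000000 * 4 is 2^33, whose upper 32-bit word is 2.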

UhClem · Dec 11, 2021

It has been a while since I have looked in this particular rabbit hole, so I have forgotten most of what I learned.

Pretty much everything I learned, I learned here: https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss5.3

It includes an example of using a register as both input and output.
