I am trying to do an unsigned version of this assembler code in the audio library, but I really have no idea how it works.

The code is:
Code:
// computes (((int64_t)a[31:0] * (int64_t)b[31:0]) >> 32)
static inline int32_t multiply_32x32_rshift32(int32_t a, int32_t b) __attribute__((always_inline, unused));
static inline int32_t multiply_32x32_rshift32(int32_t a, int32_t b)
{
#if defined (__ARM_ARCH_7EM__)
	int32_t out;
	asm volatile("smmul %0, %1, %2" : "=r" (out) : "r" (a), "r" (b));
	return out;
#elif defined(KINETISL)
	return ((int64_t)a * (int64_t)b) >> 32;
#endif
}
I do not need the Kinetis part.

Is it even worth doing? Or would it be just as fast to use a 64-bit variable and shift it?
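
For comparison, the plain C alternative I have in mind would look something like this. It is only a sketch: the name multiply_u32x32_rshift32_c is my own placeholder, not something from the audio library, and I have not measured it against the asm version.

```c
#include <stdint.h>

// computes (((uint64_t)a[31:0] * (uint64_t)b[31:0]) >> 32)
// hypothetical helper name, not part of the audio library
static inline uint32_t multiply_u32x32_rshift32_c(uint32_t a, uint32_t b)
{
	// widen both operands to 64 bits so the full product is kept,
	// then keep only the upper 32 bits of the result
	return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

For example, 0xFFFFFFFF times 0xFFFFFFFF is 0xFFFFFFFE00000001, so this should return 0xFFFFFFFE.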

I could be wrong, but I think I need to use the umull instead of smmul.

What I naively tried:
change the return type and inputs to uint32_t,
change smmul to umull,
change out to uint32_t

Of course, that did not work.
What else am I missing here?
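
My current guess, from reading the instruction description, is that umull writes the full 64-bit product into two destination registers (low word and high word), while smmul only has one destination, so a straight swap leaves the asm with too few operands. Below is a rough sketch of what I think the unsigned version would need. The function name is my own placeholder, and the non-ARM branch is only an assumption so that it compiles on other targets.

```c
#include <stdint.h>

// computes (((uint64_t)a[31:0] * (uint64_t)b[31:0]) >> 32)
// hypothetical helper name, not part of the audio library
static inline uint32_t multiply_u32x32_rshift32_umull(uint32_t a, uint32_t b)
{
#if defined (__ARM_ARCH_7EM__)
	uint32_t lo, hi;
	// umull RdLo, RdHi, Rn, Rm: needs two outputs, unlike smmul
	__asm__ volatile("umull %0, %1, %2, %3" : "=r" (lo), "=r" (hi) : "r" (a), "r" (b));
	(void)lo; // low 32 bits are discarded, only the high word is wanted
	return hi;
#else
	// fallback (my assumption) so the sketch builds on non-ARM targets
	return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
#endif
}
```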