Kuba0040
Hello!
Recently, in one of my projects, I ran into a situation where I need to quickly multiply two unsigned 32-bit numbers (uint32_t) together and then shift the result 32 bits to the right. The ARM Cortex-M7 CPU (Teensy 4.0) has a DSP instruction called "smmul" that performs a similar operation, but on signed numbers. There is no instruction that directly computes (uint32_t*uint32_t)>>32. However, there is an instruction called "umaal" (You can learn more about it here). It multiplies two unsigned 32-bit values and adds the two 32-bit values already held in its destination registers, producing a 64-bit result (plain "umull" does the multiplication without the additions). Because the Cortex-M7 is a 32-bit CPU, each register can only hold 32 bits, so the result is split into a low register (bits 0-31) and a high register (bits 32-63). If both destination registers start at zero and we read just the high register and ignore the low one, we get exactly what we are looking for: (uint32_t*uint32_t)>>32.
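For comparison, here is a minimal portable-C sketch of the same arithmetic, assuming only `<stdint.h>`. The 64-bit product is split into the two 32-bit halves described above and only the high half is kept. `mul_u32_high` is a hypothetical name. In my understanding GCC for the Cortex-M7 usually compiles this pattern to a single `umull`, but that is worth confirming in the disassembly before reaching for inline assembly.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the register split: the 64-bit product
 * is viewed as a high word (bits 32-63) and a low word (bits 0-31),
 * and only the high word is returned. */
static inline uint32_t mul_u32_high(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * (uint64_t)b; /* full 64-bit result */
    uint32_t lo = (uint32_t)product;               /* bits 0-31, not needed */
    uint32_t hi = (uint32_t)(product >> 32);       /* bits 32-63 */
    (void)lo;                                      /* low half is ignored */
    return hi;                                     /* == (a*b) >> 32 */
}
```

For example, `mul_u32_high(0xFFFFFFFFu, 0xFFFFFFFFu)` returns `0xFFFFFFFEu`, because the full product is `0xFFFFFFFE00000001`.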
So here are my questions:
How do I use this instruction in the same style as the DSP instructions in the Audio library?
Basically, how do I specify which assembly operand corresponds to which of my variables? Also, what do the % signs mean here?
Example:
Code:
// computes (((int64_t)a[31:0] * (int64_t)b[31:0]) >> 32)
static inline int32_t multiply_32x32_rshift32(int32_t a, int32_t b) __attribute__((always_inline, unused));
static inline int32_t multiply_32x32_rshift32(int32_t a, int32_t b)
{
#if defined (__ARM_ARCH_7EM__)
    int32_t out;
    asm volatile("smmul %0, %1, %2" : "=r" (out) : "r" (a), "r" (b));
    return out;
#elif defined(KINETISL)
    return ((int64_t)a * (int64_t)b) >> 32;
#endif
}
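For reference, here is a sketch of the same function written with GCC's symbolic operand names, assuming GCC-compatible inline assembly (as used by the arm-none-eabi toolchain). In GCC extended asm, `%0`, `%1`, `%2` refer to the operands in the order they are listed: outputs (after the first colon) first, then inputs (after the second colon). `"=r"` means a write-only output in a general-purpose register and `"r"` means an input in a register. Writing `[name]` before a constraint gives that operand a label that can be used as `%[name]` instead of a number. `multiply_32x32_rshift32_named` is a hypothetical name, and the `#else` branch is a portable fallback added so the sketch also builds off-target.

```c
#include <stdint.h>

/* Hypothetical variant of multiply_32x32_rshift32 using named operands. */
static inline int32_t multiply_32x32_rshift32_named(int32_t a, int32_t b)
{
#if defined(__ARM_ARCH_7EM__)
    int32_t out;
    __asm__("smmul %[res], %[x], %[y]"
            : [res] "=r"(out)           /* output operand, was %0 */
            : [x] "r"(a), [y] "r"(b));  /* input operands, were %1 and %2 */
    return out;
#else
    /* Portable fallback; relies on >> of a negative value being an
     * arithmetic shift, which is what GCC and Clang do. */
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
#endif
}
```

`volatile` is left out here because `smmul` has no side effects beyond writing `out`, which lets the compiler drop or move the instruction when that is safe; the Audio library's version keeps `volatile`.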
Secondly, how do I ignore the low register? Are there any registers in the Cortex-M7 that ignore writes, so that I could use one as a dummy destination and just throw the unneeded low half into it, like a trashcan?
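As far as I know, 32-bit ARM (including the Cortex-M7) has no register that discards writes, unlike the zero register on some other architectures (e.g. `xzr` on AArch64). A common workaround is to bind the unwanted low half to a local scratch variable that is never read: the compiler still has to allocate a register for it during the instruction, but that register is free again right afterwards. The sketch below assumes GCC-style inline asm; `mul_u32_high_umaal` is a hypothetical name. Both accumulators are passed with `"+r"` (read and written) and start at zero, because `umaal` adds them to the product.

```c
#include <stdint.h>

/* Hypothetical helper: returns ((uint64_t)a * b) >> 32. */
static inline uint32_t mul_u32_high_umaal(uint32_t a, uint32_t b)
{
#if defined(__ARM_ARCH_7EM__)
    uint32_t lo = 0; /* scratch: receives bits 0-31, never read afterwards */
    uint32_t hi = 0; /* must start at 0, since umaal adds it to the product */
    /* umaal RdLo, RdHi, Rn, Rm computes RdHi:RdLo = Rn * Rm + RdHi + RdLo.
     * "+r" marks lo and hi as read-and-written operands %0 and %1. */
    __asm__("umaal %0, %1, %2, %3"
            : "+r"(lo), "+r"(hi)
            : "r"(a), "r"(b));
    (void)lo; /* explicitly ignore the low half */
    return hi;
#else
    /* Portable fallback for builds without the DSP extension or off-target. */
    return (uint32_t)(((uint64_t)a * b) >> 32);
#endif
}
```

Since (2^32-1)^2 + 2(2^32-1) = 2^64-1, the two additions can never overflow the 64-bit result, so the accumulators could also carry something useful (such as a rounding term) instead of zero.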
Thank you for the help.