Optimized function for array multiplication

Lesept

Well-known member
Hi
Is there an optimized function for the element-wise multiplication of 2 arrays (dot product)?
The ESP32 provides one: ESP32 optimized dot product. I guess this kind of function is used in audio processing applications.

More generally, is there somewhere an extensive description of the Teensy-specific API?
For example, I was looking for the checkStaticMemory() function (seen here) but couldn't find how to use it.

Thanks for your help.
 
The Teensy is an ARM-based product; a quick search on this board indicates that the ARM CMSIS DSP libraries should be available for you to use.
Within those, arm_dot_prod_f32 looks like it would do what you want, assuming you are using floats. There are equivalent versions for other data types.
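
A note on terminology, since the question mentions both: an element-wise multiplication produces one output per pair of inputs, while a dot product sums those products into a single value. CMSIS-DSP covers the first with arm_mult_f32 and the second with arm_dot_prod_f32. The sketch below shows what each computes, using portable stand-ins so it compiles anywhere. The names dot_prod_f32_ref, mult_f32_ref and demo_dot_and_mult are hypothetical and only mirror the CMSIS parameter order; on a Teensy you would include arm_math.h and call the library functions instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference version with the same parameter order as
// arm_dot_prod_f32: sums a[i] * b[i] over blockSize elements.
void dot_prod_f32_ref(const float *pSrcA, const float *pSrcB,
                      uint32_t blockSize, float *result) {
    float sum = 0.0f;
    for (uint32_t i = 0; i < blockSize; ++i) {
        sum += pSrcA[i] * pSrcB[i];  // multiply-accumulate
    }
    *result = sum;
}

// Hypothetical reference version with the same parameter order as
// arm_mult_f32: writes one product per input pair into pDst.
void mult_f32_ref(const float *pSrcA, const float *pSrcB,
                  float *pDst, uint32_t blockSize) {
    for (uint32_t i = 0; i < blockSize; ++i) {
        pDst[i] = pSrcA[i] * pSrcB[i];
    }
}

// Usage: blockSize is an element count, not a size in bytes.
void demo_dot_and_mult() {
    const float a[3] = {1.0f, 2.0f, 3.0f};
    const float b[3] = {4.0f, 5.0f, 6.0f};

    float dot = 0.0f;
    dot_prod_f32_ref(a, b, 3, &dot);
    assert(dot == 32.0f);  // 1*4 + 2*5 + 3*6

    float prod[3];
    mult_f32_ref(a, b, prod, 3);
    assert(prod[0] == 4.0f && prod[1] == 10.0f && prod[2] == 18.0f);
}
```

The reference loops are only there to pin down the semantics; any speed gain comes from the optimized library versions, not from this code.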
 
The PJRC Audio library makes extensive use of ARM DSP operations for processing pairs of 16-bit values packed into 32-bit words. I'm not sure of the nature of those operations, but as @AndyA notes, it is worth looking into.
 
Searching for arm_dot_prod_q7 shows it does indeed exist. In CMSIS the floating-point and fixed-point types are indicated by f16, f32, f64, q31, q15, q7 in the function name...

But check the documentation; you might not want fixed-point. Still, you can look at the code for these to get tips about what optimizes well for ARM architectures.
 
Thanks, I guess that they mean:
  • f16: float 16 bits
  • f32: float 32 bits
  • f64: float 64 bits
  • q31: signed int 64 bits ?
  • q15: signed int 32 bits ?
  • q7: signed int 8 bits ?
Am I correct?
 
  • f16: float 16 bits
  • f32: float 32 bits (float)
  • f64: float 64 bits (double)
  • q31: int32_t
  • q15: int16_t
  • q7: int8_t
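
One thing worth adding: the q types use those integer storage types, but the values are interpreted as fixed-point fractions in the range [-1, 1), with the number after the q giving the count of fractional bits. A minimal sketch of that interpretation follows. The helper names are hypothetical and are not CMSIS-DSP functions; they only illustrate the scaling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of CMSIS-DSP): a qN value is a signed
// integer that represents raw / 2^N, so the range is [-1, 1).
float q7_to_float(int8_t raw) {
    return static_cast<float>(raw) / 128.0f;            // 2^7
}

float q15_to_float(int16_t raw) {
    return static_cast<float>(raw) / 32768.0f;          // 2^15
}

double q31_to_double(int32_t raw) {
    return static_cast<double>(raw) / 2147483648.0;     // 2^31
}

// Usage: the same fraction 0.5 in each format, plus the lower bound.
void check_q_formats() {
    assert(q7_to_float(64) == 0.5f);
    assert(q15_to_float(16384) == 0.5f);
    assert(q31_to_double(1073741824) == 0.5);
    assert(q7_to_float(-128) == -1.0f);  // most negative q7 value
}
```

So a q15 buffer holding plain integer sample counts would need rescaling before the fixed-point functions give meaningful results.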
 
Just a remark.
I see on the CMSIS link provided above that the arm_dot_prod_f32 is defined with const arguments:
Code:
void arm_dot_prod_f32 (const float32_t *pSrcA, const float32_t *pSrcB, uint32_t blockSize, float32_t *result)

When I use it and compile my code, I get a warning:
C:\Users\***\Documents\Arduino\Aidge_Teensy_CNN\src\dnn\include\kernels\macs.hpp:19:22: warning: invalid conversion from 'const float*' to 'float32_t*' {aka 'float*'} [-fpermissive]
19 | arm_dot_prod_f32(inputs, weights, sizeof inputs, &temp);
| ^~~~~~
| |
| const float*
In file included from C:\Users\***\Documents\Arduino\Aidge_Teensy_CNN\src\dnn\include\kernels\macs.hpp:5,
from C:\Users\***\Documents\Arduino\Aidge_Teensy_CNN\src\dnn\include\kernels\fullyconnected.hpp:7,
from C:\Users\***\Documents\Arduino\Aidge_Teensy_CNN\src\dnn\src\forward.cpp:27:
C:\Users\***\AppData\Local\Arduino15\packages\teensy\hardware\avr\1.59.0\cores\teensy4/arm_math.h:2644:15: note: initializing argument 1 of 'void arm_dot_prod_f32(float32_t*, float32_t*, uint32_t, float32_t*)'
2644 | float32_t * pSrcA,
| ~~~~~~~~~~~~^~~~~
as if the declaration the compiler finds does not take const pointers.
Indeed, when I look in the arm_math.h file, I find:
Code:
/**
   * @brief Dot product of floating-point vectors.
   * @param[in]  pSrcA      points to the first input vector
   * @param[in]  pSrcB      points to the second input vector
   * @param[in]  blockSize  number of samples in each vector
   * @param[out] result     output result returned here
   */
  void arm_dot_prod_f32(
  float32_t * pSrcA,
  float32_t * pSrcB,
  uint32_t blockSize,
  float32_t * result);
No const here.

So, are the versions in the Teensy core older versions?
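
In the meantime, one possible workaround is a thin wrapper that casts the const away only at the call boundary. This is a sketch under the assumption that the function really only reads its inputs, which is what the @param[in] tags in the header suggest. legacy_dot_prod_f32 is a hypothetical stand-in for the non-const prototype above so the example compiles off-target; dot_const and dot_array are hypothetical names too. Note also that the third argument is a number of samples: in the call from the log, inputs is a const float*, so sizeof inputs gives the size of the pointer rather than the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same non-const signature as the
// arm_dot_prod_f32 declaration in the header quoted above.
void legacy_dot_prod_f32(float *pSrcA, float *pSrcB,
                         uint32_t blockSize, float *result) {
    float sum = 0.0f;
    for (uint32_t i = 0; i < blockSize; ++i) {
        sum += pSrcA[i] * pSrcB[i];
    }
    *result = sum;
}

// Hypothetical wrapper: takes const inputs and an explicit element count.
// The const_cast is acceptable only because the callee never writes
// through pSrcA or pSrcB (assumption based on the @param[in] tags).
float dot_const(const float *a, const float *b, uint32_t count) {
    float result = 0.0f;
    legacy_dot_prod_f32(const_cast<float *>(a), const_cast<float *>(b),
                        count, &result);
    return result;
}

// For true arrays, the element count can be deduced at compile time,
// which avoids sizeof mistakes altogether.
template <std::size_t N>
float dot_array(const float (&a)[N], const float (&b)[N]) {
    return dot_const(a, b, static_cast<uint32_t>(N));
}

// Usage with const data, as in a weights table.
void check_wrapper() {
    const float weights[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float inputs[4]  = {0.5f, 0.5f, 0.5f, 0.5f};
    assert(dot_array(weights, inputs) == 5.0f);
    assert(dot_const(weights, inputs, 2) == 1.5f);  // first two elements only
}
```

When only a pointer is available, as in macs.hpp, the element count has to be passed in separately.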
 
Thanks for the answer.
Using these functions enables a great speed gain, so it would be worth the effort of using the latest versions. How can I do this?
 
Did you benchmark the new versions? I'm curious how you know they are faster. I wouldn't have expected them to be very different.
 