Linear Algebra Library for Teensy 4.1 - Robotics Project

Status
Not open for further replies.

pablodelgad2

New member
Hello,

I am working on a project to control a 4-degree-of-freedom robot with a Teensy 4.1. I will use the Recursive Newton-Euler Algorithm to estimate the system parameters. However, I want to know if there is a math library that takes full advantage of the Teensy 4.1 for matrix multiplication, since I will be multiplying 4x4 and 6x6 matrices in the algorithm and I want it to run as fast as possible. I am using the Arduino IDE to program the Teensy. I would really appreciate it if someone could help me.

Regards,

Pablo Delgado.
 
Take a look at the file hardware/Teensy/avr/cores/Teensy4/arm_math.h in your Arduino installation folder. This file contains declarations for functions that are part of the ARM CMSIS DSP library. Among them is arm_mat_mult_f32(), which multiplies matrices of 32-bit floating-point numbers. There are also functions for 32-bit and 16-bit fixed-point values. Search online for the ARM documentation on CMSIS DSP and its example programs. If you haven't already searched this forum, do that too, using the search tool at the top right of the forum pages.
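
As a rough illustration of what calling arm_mat_mult_f32() looks like, here is a minimal sketch. The helper name mat4_multiply is made up for this example, and the plain-loop fallback is my own addition so that the same code also builds on a PC where arm_math.h is not available. I am assuming that a plain #include <arm_math.h> is enough on a Teensy 4.x build; check your installation if it is not.

```cpp
#include <cstddef>

// Use CMSIS DSP when its header is available (assumed to be the case on a
// Teensy 4.x build); otherwise fall back to plain loops so the code still
// compiles off-target.
#if defined(__has_include)
#  if __has_include(<arm_math.h>)
#    include <arm_math.h>
#    define HAVE_CMSIS_DSP 1
#  endif
#endif

// Hypothetical helper: out = a * b for row-major 4x4 float matrices.
// 'out' must not alias 'a' or 'b'. Returns true on success.
bool mat4_multiply(float *a, float *b, float *out) {
#ifdef HAVE_CMSIS_DSP
    // CMSIS DSP describes each matrix with a small instance struct that
    // stores the dimensions and a pointer to the row-major data.
    arm_matrix_instance_f32 A, B, C;
    arm_mat_init_f32(&A, 4, 4, a);
    arm_mat_init_f32(&B, 4, 4, b);
    arm_mat_init_f32(&C, 4, 4, out);
    return arm_mat_mult_f32(&A, &B, &C) == ARM_MATH_SUCCESS;
#else
    // Portable fallback (not part of CMSIS): textbook triple loop.
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a[r * 4 + k] * b[k * 4 + c];
            }
            out[r * 4 + c] = sum;
        }
    }
    return true;
#endif
}
```

In an Arduino sketch this would be called from setup() or loop() with three float[16] arrays. The same pattern should carry over to 6x6 matrices by changing the dimensions passed to arm_mat_init_f32().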
 
Thank you very much for the suggestion, I will definitely look at it. Perhaps you could answer this question: when the Arduino IDE compiles the project, does the compiler pick up these functions automatically to take advantage of the Teensy, or will I have to call them explicitly in my code?
 
I'm not sure if you are replying to my comment about CMSIS DSP, or to @brtaylor's comment on the Eigen library. For CMSIS DSP, the functions are optimized for the particular ARM core you are using. For T4.0/4.1, that is the Cortex-M7. For T3.5/3.6, it is the Cortex-M4F. I looked at the Eigen webpage, and it is also highly optimized. I don't think you would have to do anything in particular for it to "take advantage" of the Teensy.
 
Correct, my understanding is that Eigen creates highly optimized code, but does not leverage any chip-specific instructions the way CMSIS DSP does.
 
Thank you both very much. Sorry, I thought I had replied to each of your comments. I will definitely look at the Eigen library suggested by @brtaylor. I really appreciate the help!
 
Hi,

I just did a quick benchmark of the CMSIS arm_mat_mult_f32() function for multiplying 4x4 float32 matrices on a T4. In my test setup, it is about 7% faster than a hard-coded matrix multiplication (with both input matrices on the stack). That is a bit disappointing, but maybe the difference grows as the matrices get larger. I did not try matrix/vector multiplication, but I suspect the improvement of the CMSIS method over a hard-coded one will be even smaller.
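
For anyone who wants to repeat this kind of comparison, a timing helper along the following lines may be a starting point. This is not the code used for the measurement above; the name average_ns is invented for this sketch, and it uses std::chrono so that it also runs on a PC. On the Teensy itself it is probably more practical to swap the clock for micros() or the Cortex-M7 cycle counter.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: call 'fn' 'iterations' times and return the average
// wall-clock time per call in nanoseconds. 'iterations' must be > 0.
template <typename F>
double average_ns(F &&fn, std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Each candidate (a wrapper around arm_mat_mult_f32(), a hand-written loop, and so on) can be passed in as a lambda. The result of each multiplication should be written somewhere the compiler cannot discard, for example a volatile variable, otherwise the optimizer may remove the work being timed.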
 
@vindar
There is nothing magic about linear algebra, so I would not expect the CMSIS library to be significantly faster than your own code. In most cases the compiler is smart enough.
If you have not done so yet, have a look at the CMSIS source code to see how simply it is done. Very often the only difference is loop unrolling, which the compiler can also do, given the right optimisation settings.
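
To make the point about unrolling concrete, here is a small sketch of "own code" with the matrix sizes fixed at compile time. The function name mat_mul is just an illustration, and whether the compiler actually unrolls these loops depends on the optimisation level, so treat it as something to benchmark rather than a guaranteed win.

```cpp
#include <cstddef>

// Hypothetical fixed-size multiply: out (R x C) = a (R x K) * b (K x C),
// all matrices stored row-major in flat arrays. Because R, K and C are
// template parameters, every loop bound is a compile-time constant, which
// gives the optimizer the chance to unroll the loops itself.
// 'out' must not alias 'a' or 'b'.
template <std::size_t R, std::size_t K, std::size_t C>
inline void mat_mul(const float *a, const float *b, float *out) {
    for (std::size_t r = 0; r < R; ++r) {
        for (std::size_t c = 0; c < C; ++c) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                sum += a[r * K + k] * b[k * C + c];
            }
            out[r * C + c] = sum;
        }
    }
}
```

For the sizes mentioned at the top of the thread this would be called as mat_mul<4, 4, 4>(a, b, out) or mat_mul<6, 6, 6>(a, b, out).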
 
@WMXZ
Yes indeed, but I was hoping there was some secret ARM instruction for parallel multiplication that the compiler was not aware of... Apparently not.
 