Teensy 4.0 FPU

orac

Member
As my little project takes shape, I have been learning a lot along the way. It's a bit of a learning curve going from PICAXE BASIC to C, but that was expected.
Anyway, I have been playing with hardware a little bit and am now looking at things like timing and maths. The Teensy 4.0 has an FPU that I would like to make use of, but I am struggling to find a tutorial on how to do so.
I did some work a few years ago with the Micromega FPU co-processor, which had its own instruction set, and was wondering if that was the case here as well.

I have seen some comments about not being able to pass FPU variables into ISRs, which may or may not be an issue, but knowing will allow me to work around the issue if necessary.
Finally, does anyone know how fast/slow it is? The maths I am thinking about doing is going to be time critical, and being able to deduct the calculation time from the result would be helpful (there may be questions about cooling at overclocked speeds later).
I do have a basic USB oscilloscope, so I can make some measurements myself, but I am not really sure how good it is, as it only cost about £30.

Thanks
 
You don't say which development environment you are using.

If it's Arduino IDE + Teensyduino, you are already using the FPU automatically, no need to take extra steps. The compiler already generates proper FPU instructions for you every time you have "float" or "double" variables/expressions (floats are faster, even on the T4.x).
 
The Teensy 4.0 has an FPU that I would like to make use of, but I am struggling to find a tutorial on how to do so.

Just use "float" or "double" variables in your code. You don't need to do anything special. The FPU is used automatically.

There is no tutorial because you don't have to do anything other than just use floating point variables in your code.


I have seen some comments about not being able to pass FPU variables into ISRs

You can use floating point within interrupts. It does come with a very minor performance cost, as the processor has to save more registers onto the stack. But that happens automatically as you use them (ARM's term is "lazy stacking"), so you really don't need to worry unless you're writing code that really pushes the performance limits.

Of course, technically speaking, interrupt functions are never passed inputs of any type. They always are defined with no inputs. Usually interrupt functions read static or global variables, which can be any type.


Finally, does anyone know how fast/slow it is?

It's fast.

For 32 bit float, the speed is approximately the same as ordinary 32 bit integer math. Sometimes 8 and 16 bit integers are slower if the compiler has to add logical AND instructions to mask results to 8 or 16 bits, so it's important to compare with native 32 bit integers.

However, the M7 processor has 2 integer execution units but only 1 FPU. So with integer math, depending on lots of complicated factors, you might get 2 integer operations performed per 600 MHz clock cycle. With 32 bit float, you can be sure you'll get at most 1 per clock.

64 bit double (usually) runs at half the speed of 32 bit float. The main gotcha is that the compiler assumes constants are double unless you append "f". The compiler also promotes math to 64 bits if any of the inputs are 64 bit double. So if you want maximum performance and 32 bit float is good enough, remember to write constants like 1.0 as "1.0f". Of course, if using the C library math functions, remember the normal names are 64 bit double and the ones ending in "f" are 32 bit. So for sine and cosine, call sinf() and cosf().
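To make the constant and library-name gotchas concrete, here is a minimal sketch. The function names (scale_promoted, scale_float, unit_circle_check, check_scaling) are made up for illustration, and the code is plain C++ so it can also be tried on a desktop compiler. It only shows which type each expression ends up as; it does not measure speed.

```cpp
#include <cassert>
#include <math.h>
#include <type_traits>

// A bare constant like 1.5 is a double, so x is promoted to double,
// the multiply is done in 64 bits, and the result is narrowed back to float.
inline float scale_promoted(float x) {
    return x * 1.5;
}

// With the "f" suffix the whole expression stays in 32 bit float.
inline float scale_float(float x) {
    return x * 1.5f;
}

// sinf() and cosf() are the 32 bit float versions named in the post above.
inline float unit_circle_check(float angle) {
    float s = sinf(angle);
    float c = cosf(angle);
    return s * s + c * c;   // should be very close to 1.0f
}

// The compiler itself confirms the promotion rule at compile time.
static_assert(std::is_same<decltype(1.0f * 1.5), double>::value,
              "float * double constant is double");
static_assert(std::is_same<decltype(1.0f * 1.5f), float>::value,
              "float * float constant stays float");

// Both versions give the same answer for this input; only the type of
// the intermediate maths differs.
inline void check_scaling() {
    assert(scale_promoted(2.0f) == scale_float(2.0f));
}
```

If you want the compiler to point these out for you, GCC has a -Wdouble-promotion warning, though whether you can easily enable it depends on your build setup.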


The maths I am thinking about doing is going to be time critical, and being able to deduct the calculation time from the result would be helpful (there may be questions about cooling at overclocked speeds later).
I do have a basic USB oscilloscope, so I can make some measurements myself, but I am not really sure how good it is, as it only cost about £30.

If you're accustomed to PIC programming, my guess is you'll be quite pleasantly surprised by the performance of Cortex-M7. I don't believe those old PICAXE parts can run CoreMark... but compare with 8 bit boards like Arduino Mega to get an idea of the incredible difference.

The processor has a 32 bit cycle counter, called ARM_DWT_CYCCNT. The best way to benchmark this sort of code is to read the cycle counter before and after your performance critical math. If necessary, store the difference in a variable, so you can print it to the serial monitor later at a less performance critical time.
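A rough sketch of that before/after pattern follows. ARM_DWT_CYCCNT is the name given above; everything else (read_cycles, elapsed_cycles, cycles_to_ns, time_multiply) is a hypothetical helper invented for this example. The #else branch is only a stand-in so the arithmetic can be tried on a desktop compiler, and the 600 MHz default is taken from the clock speed mentioned earlier. Bear in mind that reading the counter itself costs cycles, so small differences include some measurement overhead.

```cpp
#include <cassert>
#include <cstdint>

// On a Teensy (where Arduino.h has already defined ARM_DWT_CYCCNT) this
// reads the real counter. Elsewhere it returns a fake count.
inline uint32_t read_cycles() {
#if defined(ARM_DWT_CYCCNT)
    return ARM_DWT_CYCCNT;
#else
    static uint32_t fake = 0;   // stand-in counter for desktop builds only
    return fake += 100;
#endif
}

// Unsigned subtraction still gives the right difference if the 32 bit
// counter wrapped around once between the two reads.
inline uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
    return end - start;
}

// Convert a cycle count to nanoseconds for a CPU clock given in MHz.
inline uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_mhz = 600) {
    assert(cpu_mhz != 0);
    return static_cast<uint32_t>((static_cast<uint64_t>(cycles) * 1000u) / cpu_mhz);
}

// Time one float multiply. The result is stored in a volatile so the
// compiler cannot throw the calculation away.
inline uint32_t time_multiply(float a, float b) {
    volatile float sink;
    uint32_t t0 = read_cycles();
    sink = a * b;
    uint32_t t1 = read_cycles();
    (void)sink;
    return elapsed_cycles(t0, t1);
}
```

In a real sketch you would store the returned cycle count and print it later, outside the time-critical section, as described above.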
 
One quick note: for interrupt routines using static or global variables, be sure to use the volatile keyword on the variables accessed by the interrupt functions. That way the compiler will assume the variables can change in unknown ways (i.e. in the interrupt function) and will reload the variable instead of keeping it in a register.
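As an illustration of the volatile advice, here is a small sketch of an interrupt function handing a float to the main code through globals. All names (latest_reading, reading_ready, on_sample_isr, take_reading, simulate_one_interrupt) are hypothetical, and the "interrupt" is just called by hand so the code also runs on a desktop. On real hardware you would attach on_sample_isr to a timer or pin interrupt, and for data wider than one 32 bit value you would likely need to briefly disable interrupts while copying.

```cpp
#include <cassert>

// Shared between the interrupt function and the main code, so both are
// volatile: the compiler must reload them on every access.
volatile float latest_reading = 0.0f;
volatile bool  reading_ready  = false;

// Hypothetical interrupt function: takes no inputs and returns nothing.
// It uses floating point and publishes its result through the globals.
void on_sample_isr() {
    latest_reading = latest_reading + 0.5f;
    reading_ready = true;
}

// Called from the main code: copies the value out and returns true if a
// new reading has arrived since the last call.
bool take_reading(float &out) {
    if (!reading_ready) return false;
    reading_ready = false;
    out = latest_reading;
    return true;
}

// Desktop stand-in for "an interrupt happened": call the ISR by hand,
// then check that the main code can see the new reading.
void simulate_one_interrupt() {
    on_sample_isr();
    float value = 0.0f;
    bool got = take_reading(value);
    assert(got);
    (void)value;
}
```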
 
Thank you for the responses, they are very helpful.
Forgot to mention that at the moment I am using Arduino IDE with Teensyduino (thinking about trying something else at a later date).
As the final project is going to have at least 2 analogue inputs, I started there. This won't even make it into the final project, but it gives me a good idea of where to go and what to do next.

Code:
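  val0 = analogRead(0);                 // read ADC channel 0
  val1 = analogRead(1);                 // read ADC channel 1
  clock1 = ARM_DWT_CYCCNT;              // cycle count before the maths
  test = val0 * volt;                   // int * float (or double)
  clock2 = ARM_DWT_CYCCNT;              // cycle count after the maths
  Serial.print(clock1);
  Serial.print("  :  ");
  Serial.println(clock2);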

The ADC readings are written into int variables, while test and volt are either float or double.
The cycle counter mentioned by Paul shows that the calculation takes 2 cycles in this case, whether float or double. I am assuming that one cycle is taken by writing the cycle count to a variable. If this is correct then, as expected, this simple sum took just one cycle to complete.

Another quick question, I assume that it will follow the order of operations for equations, or do I need to separate out equations to allow them to execute in the correct order?
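For reference, C and C++ do apply the usual order of operations, so an equation can normally be written on one line. A minimal sketch, with made-up function names (apply_calibration, left_to_right, half_wrong, half_right, check_precedence), showing the two basic rules plus one thing worth watching for when ints and floats are mixed:

```cpp
#include <cassert>

// Multiplication and division bind tighter than addition and subtraction,
// so this is offset + (gain * raw), not (offset + gain) * raw.
inline float apply_calibration(float raw, float gain, float offset) {
    return offset + gain * raw;
}

// Operators of equal precedence are grouped left to right: (a / b) * c.
inline float left_to_right(float a, float b, float c) {
    return a / b * c;
}

// Dividing two ints gives an int, so 1 / 2 is already 0 before it is
// converted to float. Making one operand a float fixes that.
inline float half_wrong() { return 1 / 2; }
inline float half_right() { return 1.0f / 2; }

// Brackets are only needed to override the default grouping.
inline void check_precedence() {
    assert(apply_calibration(2.0f, 3.0f, 1.0f) == 1.0f + (3.0f * 2.0f));
    assert((1.0f + 3.0f) * 2.0f == 8.0f);
}
```

Adding brackets anyway does no harm and can make the intent clearer.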

Again thank you, this has been very helpful in fleshing out the plans and ideas I have for my current project and gives me insight into what is possible in future ones.
 
You probably already have it, but for others out there, 'The C Programming Language' by Kernighan and Ritchie, second edition 1988, pretty much handles these kinds of questions. I still use the 1978 first edition to refresh my ailing brain, as the only differences are additional rules added for ANSI C. It always fascinates me, reading this book, how well the original developers of the C programming language designed and documented something that is as valid now as it was ~ 50 years ago and will probably be the universal low level programming language forever.

To get into understanding the Arduino libraries you need to learn C++.

In the Arduino IDE, for specific Arduino functions right click on something in red print in your code like 'pinMode' and the installed reference library should open.
 
I would not recommend the original C manual any more. Too much has changed since it was published. Instead, one of the newer books that at least starts with ISO C99 would be a better choice.

And note, if you are going back to original sources, the C on UNIX version 6, which was before the K&R book (UNIX version 7), was even more primitive in terms of the C language. There were two AT&T compilers: the original Ritchie compiler, and then the later so-called Portable C Compiler by Steve Johnson.

I was on the original ANSI C X3J11 committee that transformed the K&R C into the first C standard (first ANSI C89 which was a USA standard, and then replaced with a few changes with ISO C90 standard which is an international standard). I was one of three people that were at the original meeting, and still on the committee when the C89/C90 standards were ratified, though over time, I represented two different employers (Data General and then Open Software Foundation).

I also implemented a C compiler just based on the K&R book for Data General, and there are several glaring errors in the book that were at odds with the AT&T C compilers it was in theory documenting. The one I remember the most was the syntax chart did not allow you to declare functions that returned pointers to functions (i.e. the signal function in the C standard and UNIX and later LINUX/POSIX).

I haven't been involved with standards since I left OSF. My work with GCC at Cygnus Solutions, and then Red Hat, AMD, and now IBM has primarily been in the back end, and not the front end.
 