Hi! Please advise about casting

Status
Not open for further replies.

foxfyre3

New member
Liking the Teensy 4, but I have a question about performance.
I don't care about memory usage, I only want speed!
1. Does it take time to cast to/from uint16_t, int8_t, etc?
2. Shall I just use int or uint32_t and int32_t? It's a 32-bit processor, right?
3. Can anybody recommend a decompiler for this platform if possible?
 
Does it take time to cast to/from uint16_t, int8_t, etc?

The processor has additional byte and 16-bit instructions, so if you are lucky the cast doesn't cost anything, but I'd rather not rely on that.

Shall I just use int or uint32_t and int32_t? It's a 32-bit processor, right?

Yes, it is a 32-bit processor. uint32_t is the native-sized integer and should be fastest. C++ also defines these seldom-used types:

  • uint_fast8_t
  • uint_fast16_t
  • uint_fast32_t
Depending on the processor, they translate to the fastest type with at least the requested size. For a Teensy 4 all of those translate to uint32_t.
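
To make the "at least the requested size" point concrete, here is a minimal sketch. The helper name popcount8 is made up for illustration. The static_asserts only check the minimum widths the standard promises; the actual widths depend on the toolchain (on a desktop compiler they may differ from the Teensy 4 values mentioned above).

```cpp
#include <cstdint>

// Each "fast" type only promises a minimum width; the real width is
// chosen by the toolchain for the target processor.
static_assert(sizeof(uint_fast8_t)  >= 1, "uint_fast8_t holds at least 8 bits");
static_assert(sizeof(uint_fast16_t) >= 2, "uint_fast16_t holds at least 16 bits");
static_assert(sizeof(uint_fast32_t) >= 4, "uint_fast32_t holds at least 32 bits");

// Hypothetical helper: counts the set bits in a byte.
// The counter only needs 8 bits, but uint_fast8_t lets the compiler
// pick whatever register-friendly width it prefers.
inline uint_fast8_t popcount8(uint8_t v) {
    uint_fast8_t n = 0;
    while (v != 0) {
        v = static_cast<uint8_t>(v & (v - 1));  // clear the lowest set bit
        ++n;
    }
    return n;
}
```

The data itself (the uint8_t argument) keeps its exact size; only the working variable uses the "fast" type.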

Can anybody recommend a decompiler for this platform if possible?
  • Teensyduino generates a *.lst file, which contains the generated assembly and can be opened with any text editor
  • To test short snippets the compiler explorer is very convenient.
  • This online disassembler is also quite useful
See this wiki post for more information: https://github.com/TeensyUser/doc/wiki/GCC
 
Thank you so much, you rock!
I will use 32-bit types. Does signedness affect the speed, i.e. uint32_t vs int32_t?
I don't mean casting, or trying to store negative numbers in unsigned types.
 
Not that I'm aware of. Anyway, you should always use the type you need. If the value is signed, use a signed type; if not, use an unsigned type.
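
A small sketch of why the type should follow the data rather than a guess about speed. The function names are made up for illustration. Both versions are a single subtraction, but the unsigned one wraps around instead of going negative.

```cpp
#include <cstdint>

// Unsigned subtraction wraps modulo 2^32 instead of going negative.
inline uint32_t diff_unsigned(uint32_t a, uint32_t b) {
    return a - b;
}

// Signed subtraction gives the negative result you would expect.
inline int32_t diff_signed(int32_t a, int32_t b) {
    return a - b;
}
```

So diff_unsigned(3, 5) yields 4294967294, while diff_signed(3, 5) yields -2.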

Of course, I don't know what you are intending to do but it is always good to consider this famous quote from D. Knuth:
“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”

Here is an interesting article: https://stackify.com/premature-optimization-evil/
 
As a general rule, uint32_t is most efficient. If your numbers are never negative, use uint32_t.

Usually there is no speed penalty for int32_t. Any tricks to avoid int32_t will usually cost far more than simply using int32_t.

Likewise, 32 bit floating point is almost as fast as integers due to the FPU. The FPU adds many extra registers that aren't usually used for integers, so float can run surprisingly fast. So if you're working with data that naturally is fractions or decimal numbers, extra code to shoehorn numerical data into integers usually ends up running slower than simply using 32 bit float.

The one exception for float speed is interrupts. The ARM core uses "lazy stacking" for the FPU registers, which results in extra pushes and pops to memory once you use the FPU inside the interrupt. It is best to use only integers in interrupt handlers.

64 bit double is also implemented by the FPU, but at half the speed and twice the register pressure of 32 bit float. Beware of the language rules that promote math to 64 bits when you use decimal constants without a trailing "f"; the suffix keeps them at 32 bits.
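
The promotion rule is easy to check at compile time. This is a minimal sketch with made-up function names, assuming default compiler settings (no flags that change how floating point constants are treated).

```cpp
#include <type_traits>

// 0.5 is a double constant, so float * 0.5 is computed as double.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float * double constant promotes to double");

// 0.5f is a float constant, so the math stays in 32 bit float.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float * float constant stays float");

// Multiplies in double, then converts the result back to float.
inline float half_via_double(float x) {
    return static_cast<float>(x * 0.5);
}

// Multiplies in float from start to finish.
inline float half_via_float(float x) {
    return x * 0.5f;
}
```

Both functions return the same value here; the difference is only in which precision the multiply runs at.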

Usually 8 and 16 bit integers are as fast as 32 bits, but in some cases the compiler must add logical AND instructions. Function inputs are one of the most common examples. But if your code calls a function so often that the call overhead is substantial, you should probably make it an inline function anyway if you care about performance.
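
Here is a rough sketch of where the extra masking comes from. The function names are made up, and whether the compiler really emits an extra instruction depends on optimization and inlining, so check the *.lst file or the compiler explorer for your actual code.

```cpp
#include <cstdint>

// The sum is computed in a 32 bit register, but the result must wrap
// at 256, so the compiler has to truncate it (for example an AND with
// 0xFF or a UXTB instruction) before returning.
inline uint8_t add_u8(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);
}

// The 32 bit version needs no truncation: the register already holds
// exactly the value to return.
inline uint32_t add_u32(uint32_t a, uint32_t b) {
    return a + b;
}
```

For example, add_u8(200, 100) gives 44 because the result wraps, while add_u32(200, 100) gives 300.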

Just to make this already-long post complete, the other special case is the DSP extension instructions, which support packing two 16 bit signed integers into 32 bit registers. The compiler doesn't use these instructions automatically. You have to use inline assembly (or call inline functions that wrap that inline asm) to make use of these very special features. When done very carefully, certain types of signal processing algorithms can run much faster with 16 bit signed integers. The audio library makes extensive use of this technique, if you're interested in seeing an example.
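
To illustrate the packing idea only, here is a portable sketch that emulates a dual 16 bit multiply-add in plain C++. It is not the audio library's code and it does not use the DSP instructions; the names pack16 and dual_mul_add are made up. On the real hardware, an instruction such as SMUAD would do the work of dual_mul_add in one step.

```cpp
#include <cstdint>

// Pack two signed 16 bit values into one 32 bit word:
// 'lo' goes in bits 0..15, 'hi' in bits 16..31.
inline uint32_t pack16(int16_t lo, int16_t hi) {
    return static_cast<uint32_t>(static_cast<uint16_t>(lo)) |
           (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16);
}

// Portable emulation of a dual multiply-add:
// returns lo(a) * lo(b) + hi(a) * hi(b).
// Assumes two's complement conversion to int16_t (true on the usual
// compilers) and ignores the one overflow corner case where all four
// halves are -32768.
inline int32_t dual_mul_add(uint32_t a, uint32_t b) {
    const int16_t a_lo = static_cast<int16_t>(a & 0xFFFFu);
    const int16_t a_hi = static_cast<int16_t>(a >> 16);
    const int16_t b_lo = static_cast<int16_t>(b & 0xFFFFu);
    const int16_t b_hi = static_cast<int16_t>(b >> 16);
    return static_cast<int32_t>(a_lo) * b_lo +
           static_cast<int32_t>(a_hi) * b_hi;
}
```

For example, dual_mul_add(pack16(3, -4), pack16(5, 6)) computes 3 * 5 + (-4) * 6 = -9. The speedup on the Teensy comes from handling both 16 bit lanes with one register and one instruction.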

Donald Knuth's classic "premature optimization" quote is good advice. But you do need to choose your data types somehow on the first pass. Go with uint32_t for all unsigned integers (unless you need 64 bits), int32_t when you need negative numbers, and don't be shy about using 32 bit float when appropriate, since you have an FPU which implements most basic operations in a single cycle.
 
The other cool thing about the T4 is that it can do two things at once. The processor can execute two instructions in parallel in short stretches when they don't depend on each other's results.
 