Teensy 4 ordering of instructions

Status
Not open for further replies.

hemsy

It's my understanding that the Teensy 4 is capable of running two parallel lines of code. I have a couple of questions about that:

1. Does the compiler or the processor automatically manage that?

2. What if I need to control the order in which a couple of instructions execute? For example:

a = SPI0_POPR;
b = SPI0_POPR;

Depending on which instruction is executed first, a and b could be swapped.

Is there a good tutorial anywhere for learning how to make the best use of this kind of processor?
 
It's my understanding that the Teensy 4 is capable of running two parallel lines of code.

It can sometimes execute 2 instructions in the same cycle (its Cortex-M7 core is dual-issue).

Sometimes 1 line of C or C++ source code becomes a single instruction, but often the compiler needs many instructions to implement a single line.
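As a rough illustration of one line becoming many instructions: the Cortex-M7 has a 32-bit divide instruction but no 64-bit one, so GCC typically turns a 64-bit integer division into a call to an ARM EABI runtime helper (`__aeabi_uldivmod`). The functions below are hypothetical examples, and the exact code generated depends on the compiler version and flags, so check the disassembly if it matters.

```c
#include <stdint.h>

/* Hypothetical helper: average of a 64-bit running total.
 * One line of C, but on a 32-bit ARM core the 64-bit division is
 * usually a call into a runtime library routine, not one instruction. */
uint64_t average_u64(uint64_t total, uint64_t count)
{
    return count ? total / count : 0;  /* avoid dividing by zero */
}

/* By contrast, a 32-bit unsigned divide can usually be a single UDIV
 * instruction on Cortex-M7 (plus the compare/branch for the zero check). */
uint32_t average_u32(uint32_t total, uint32_t count)
{
    return count ? total / count : 0;
}
```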

1. Does the compiler or the processor automatically manage that?

They both do, sort of.

Ultimately the processor manages whether it can run 2 instructions in the same cycle.

The compiler does many transformations and optimizations to your code to try to maximize the number of independent instructions, so the processor can do this as much as possible.
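A minimal sketch of what "independent instructions" means, in plain standard C: the first loop below is one long dependency chain, because every addition needs the previous sum. The second keeps two separate sums, so the additions for even and odd elements don't depend on each other and a dual-issue core has more opportunity to overlap them. Whether that actually happens depends on what the compiler emits (it may already rearrange the first loop on its own), so treat this as an illustration, not a guaranteed speedup.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the result of the previous
 * add, so the additions form a single serial chain. */
uint32_t sum_serial(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two accumulators: s0 and s1 are updated independently, giving the
 * processor pairs of adds with no dependency between them. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)          /* odd length: one element left over */
        s0 += v[i];
    return s0 + s1;     /* unsigned wraparound gives the same total */
}
```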


2. What if I need to control the order in which a couple of instructions execute?

This is very rarely an issue in practice. A tremendous amount of work has gone into the design of the processor, the compiler and the software you use, so things are done the proper way by default.

Both the compiler and the processor are designed around rules about which operations must be done in order and which can be performed in any order, as long as the final result is the same.

The processor has 3 different types of memory: Normal, Device and Strongly-ordered. ARM calls the one you want "strongly ordered" (if you decide to search). The processor also has special barrier instructions (DMB, DSB and ISB); DSB, for example, waits until all pending memory accesses have completed.
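For reference, this is roughly how code requests one of those barrier instructions. CMSIS-based projects provide `__DSB()`, `__DMB()` and `__ISB()`; the sketch below avoids depending on CMSIS headers by using GCC inline assembly when compiling for an ARMv7-M/ARMv7E-M target, and falls back to GCC's `__atomic_thread_fence` builtin elsewhere so it still builds on a PC. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Data Synchronization Barrier. On ARMv7-M, "dsb" waits until all
 * earlier memory accesses have completed. The "memory" clobber also
 * stops the compiler from moving memory accesses across this point. */
static inline void data_sync_barrier(void)
{
#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7M__)
    __asm__ volatile ("dsb 0xF" ::: "memory");
#else
    /* Host fallback so the sketch compiles on a PC: a full fence. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical example: fill a buffer, then publish a "ready" flag.
 * The barrier makes sure the buffer writes have completed before the
 * flag write is performed. */
void publish(volatile uint32_t *buf, uint32_t n, volatile uint32_t *ready)
{
    for (uint32_t i = 0; i < n; i++)
        buf[i] = i * 3u;
    data_sync_barrier();
    *ready = 1u;
}
```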

The compiler has a "volatile" keyword for memory accesses which must not be optimized away. The compiler also has a feature called a "memory barrier". Unfortunately the name is very similar to what the hardware uses, and the concept is similar too, but it applies only to the compiler's optimizer.
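A small sketch of both compiler-side tools, using GCC syntax. `volatile` makes every access to that object actually happen, in program order relative to other volatile accesses. The empty `asm volatile("" ::: "memory")` statement is the usual GCC compiler-only barrier: it emits no instruction, but the optimizer may not move ordinary memory reads or writes across it. The macro, variable and function names are made up for illustration.

```c
#include <stdint.h>

/* GCC compiler barrier: no machine instruction is emitted, but the
 * "memory" clobber makes the compiler finish pending stores before it
 * and reload memory values after it. It does NOT stop the processor
 * from reordering; that needs a hardware barrier instruction. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

static uint32_t shared_data;            /* ordinary variable */
static volatile uint32_t shared_flag;   /* every access really happens */

void write_then_signal(uint32_t value)
{
    shared_data = value;    /* plain store: could be reordered or delayed... */
    COMPILER_BARRIER();     /* ...but not past this point */
    shared_flag = 1u;       /* volatile store, emitted where written */
}

uint32_t read_data(void) { return shared_data; }
uint32_t read_flag(void) { return shared_flag; }
```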


Depending on which instruction is executed first, a and b could be swapped.

a = SPI0_POPR;
b = SPI0_POPR;

Again, in nearly all cases the compiler and processor will just do the right thing.

In this case of hardware registers (though these particular names are Teensy 3.x registers), the definitions use "volatile", and the memory region they're in is treated as strongly ordered by the processor.
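One detail worth adding about the two-line example: writing the reads as separate statements is what gives them a defined order. If both reads appear in the same expression, for example `x = (SPI0_POPR << 16) | SPI0_POPR;`, C does not specify which read happens first, even with volatile. The sketch below uses a hypothetical `fake_popr()` function standing in for a FIFO "pop" register so it can run on a PC.

```c
#include <stdint.h>

/* Hypothetical stand-in for a receive-FIFO "pop" register: each read
 * returns the next value, the way reading SPI0_POPR pops the RX FIFO. */
static const uint32_t fake_fifo[] = { 0x11u, 0x22u, 0x33u, 0x44u };
static unsigned fake_head;

static uint32_t fake_popr(void)
{
    return fake_fifo[fake_head++ % 4u];
}

/* Separate statements: the first read always lands in *a, the second in *b. */
void read_two(uint32_t *a, uint32_t *b)
{
    *a = fake_popr();
    *b = fake_popr();
}

/* Combining two reads safely: sequence them with separate statements
 * instead of writing (pop() << 16) | pop(), whose evaluation order
 * C leaves unspecified. */
uint32_t read_word(void)
{
    uint32_t hi = fake_popr();
    uint32_t lo = fake_popr();
    return (hi << 16) | lo;
}
```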

But if code like this reads ordinary RAM, both the compiler and processor "know" they may change the order of memory accesses. Sometimes the compiler applies pretty amazing optimizations, never even writing the data out to RAM if it "knows" you will only read it back in certain ways.
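A sketch of the difference, assuming an optimizing compiler such as GCC: with a plain pointer, the compiler is allowed to treat two back-to-back stores as one, because ordinary code can only observe the final value. With a volatile pointer, both stores must be performed, in order. Reading the results back gives the same value either way; the difference only shows up in the generated code, or to hardware that sees each write.

```c
#include <stdint.h>

/* Ordinary RAM: the optimizer may emit only the final store, since
 * nothing in this function reads the intermediate value. */
void store_twice_plain(uint32_t *p)
{
    *p = 1u;
    *p = 2u;
}

/* Volatile: both stores are emitted, first 1 then 2. This matters for
 * a hardware register, where each write can have its own effect. */
void store_twice_volatile(volatile uint32_t *p)
{
    *p = 1u;
    *p = 2u;
}
```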


Is there a good tutorial anywhere for learning how to make the best use of this kind of processor?

No, there aren't really any simple tutorials, mainly because this sort of thing is almost never an issue for ordinary coding.

You can find quite a lot of info about how the processor really works in the ARMv7-M Architecture Reference Manual, document number DDI0403E (the easiest way to search for it). But it is not easy to read: it's full of arcane details that few people really need.

Likewise, the gcc compiler has extensive documentation. But again, much of it is written in a fairly advanced reference-manual format rather than as easy-to-follow tutorials.
 