I tried to measure the maximum speed at which I can toggle a GPIO pin (full speed, endless loop).
What is strange:
- if I use the LIB function - 10 times SLOWER
- if I implement my own, similar function - more than 10 times FASTER
Here is the code I use to test (just the loop; the pin is configured as an output first):
Code:
/* helper function to set GPIO Output register before configuring mode */
void GPIO_setOutValue(uint8_t pin, uint8_t val)
{
    const struct digital_pin_bitband_and_config_table_struct *p;
    uint32_t mask;

    if (pin >= CORE_NUM_DIGITAL) return;
    p = digital_pin_to_info_PGM + pin;
    mask = p->mask;
    // pin is configured for output mode
    if (val) {
        *(p->reg + 0x21) = mask; // set register
    } else {
        *(p->reg + 0x22) = mask; // clear register
    }
}

void GPIO_testSpeed(void) {
#if 1
    /* this is 10x faster! assuming, this code runs on ITCM */
    while (1) {
        GPIO_setOutValue(32, arduino::HIGH);
        GPIO_setOutValue(32, arduino::LOW);
    }
#else
    /* this is 10x slower! assuming the function sits on external flash (and is not cached or running full speed) */
    while (1) {
        digitalWrite(32, arduino::HIGH);
        digitalWrite(32, arduino::LOW);
    }
#endif
}
The results I get at the nominal 600 MHz MCU core clock:
Code:
LIB code        My code
---------------------------
11.11 MHz       149.7 MHz
BTW: with the MCU overclocked, e.g. to 800 MHz, I get a 201 MHz GPIO toggle frequency.
This is a huge difference! (The GPIO function in my own project file is far faster.)
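To put these numbers in terms of core clock cycles, here is a small sketch that divides the core clock by the measured toggle frequency. The helper name `cyclesPerPeriod` is made up for illustration, and I am assuming the 201 MHz figure was measured with my own function:

```cpp
// Core clock cycles per full output period (one HIGH write plus one LOW write).
constexpr double cyclesPerPeriod(double coreMHz, double toggleMHz) {
    return coreMHz / toggleMHz;
}

// Measured values from the table above.
constexpr double libCycles = cyclesPerPeriod(600.0, 11.11);  // about 54
constexpr double myCycles  = cyclesPerPeriod(600.0, 149.7);  // about 4
// ASSUMPTION: the overclocked result was measured with my own function.
constexpr double ocCycles  = cyclesPerPeriod(800.0, 201.0);  // about 4

static_assert(libCycles > 53.9 && libCycles < 54.1, "LIB loop: roughly 54 cycles per period");
static_assert(myCycles > 3.9 && myCycles < 4.1, "own loop: roughly 4 cycles per period");
```

So my version needs about 4 core cycles per period (2 per write), while the LIB version needs about 54. The 800 MHz figure also works out to about 4 cycles, so the fast loop seems to scale with the core clock.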
My assumption
The only explanation I can come up with:
The function "digitalWrite()" used to toggle a GPIO pin comes from the LIB.
My function "GPIO_setOutValue()", which is part of my project (a source file in my sketch), is very similar and does not differ much in the instructions it executes.
But the difference could be:
- all the LIB functions sit in external flash and are executed from there
- my sketch code and the code in my own source files are copied from flash to ITCM and executed from ITCM afterwards (no latency)
No idea how to confirm this (OK, I could have a look at the generated *.MAP or *.LST file).
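Besides the map file, one thing I could try is to look at a function's address at runtime and see which memory region it falls into. A minimal sketch, assuming the memory layout of the Teensy 4.x linker script for the i.MX RT1062 (ITCM at 0x00000000, DTCM at 0x20000000, OCRAM at 0x20200000, external flash at 0x60000000). The names `MemRegion` and `memoryRegionOf` are hypothetical, and the ranges would need to be checked against the reference manual of the actual MCU:

```cpp
#include <cstdint>

enum class MemRegion { ITCM, DTCM, OCRAM, Flash, Other };

// Map an address to a coarse memory region.
// ASSUMPTION: 512 KB windows for ITCM/DTCM/OCRAM and FlexSPI flash at
// 0x60000000, as in the Teensy 4.x linker script; adjust for other boards.
constexpr MemRegion memoryRegionOf(uint32_t addr) {
    return (addr < 0x00080000u)                        ? MemRegion::ITCM
         : (addr >= 0x20000000u && addr < 0x20080000u) ? MemRegion::DTCM
         : (addr >= 0x20200000u && addr < 0x20280000u) ? MemRegion::OCRAM
         : (addr >= 0x60000000u && addr < 0x70000000u) ? MemRegion::Flash
                                                       : MemRegion::Other;
}
```

On the target I would pass the function address, e.g. `memoryRegionOf((uint32_t)&GPIO_setOutValue)`, and print the result over serial. The same should work for any LIB function whose address can be taken, unless it is inlined.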
I know the MCU has just a very tiny internal flash, mainly used for the bootloader (and I assume the internal flash is never overwritten). All the code sits
in an external flash memory device: the bootloader executes from there or loads some pieces of code into the internal ITCM (which is the fastest memory).
I think I found a statement like: "the code of your sketch is loaded into ITCM", but that could also mean: "the code called from the sketch, in LIBs, is still located in slow external flash memory".
But which code is loaded into ITCM, and which is instead executed via the very slow external interface (to the external flash ROM)? No idea.
How can I control which code is loaded into ITCM and executed from there (including code from the LIB)?
(I see different code addresses in the generated files, so part of the code is internal (copied) and another part is still external (fetched via the slow interface).)
Or, worst case: the ICache is not enabled or not configured (MPU) for code executed from external flash memory. (How can I confirm that the ICache is enabled for the external code location?)
Any idea how to control the use of internal ITCM vs. external flash memory (e.g. using __attribute__(()) )?
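To show what kind of control I have in mind: a sketch using GCC section attributes, assuming a Teensy-4.x-style core whose linker script copies a `.fastrun` section into ITCM and leaves `.flashmem` in external flash (that core wraps these as `FASTRUN` and `FLASHMEM`). On another core the section names will differ. `MY_ITCM`, `MY_FLASH`, `toggleMask` and `setupMask` are names I made up for illustration:

```cpp
#include <cstdint>

// ASSUMPTION: section names ".fastrun" / ".flashmem" as in the Teensy 4.x core.
// On non-ARM hosts the macros expand to nothing so the file still compiles.
#if defined(__arm__)
#define MY_ITCM  __attribute__((section(".fastrun"), noinline))
#define MY_FLASH __attribute__((section(".flashmem"), noinline))
#else
#define MY_ITCM
#define MY_FLASH
#endif

// Hot path: requested to be placed in (and run from) ITCM.
MY_ITCM uint32_t toggleMask(uint32_t state, uint32_t mask) {
    return state ^ mask;  // flip the bits selected by mask
}

// Cold path: requested to stay in external flash.
MY_FLASH uint32_t setupMask(uint8_t bit) {
    return (bit < 32u) ? (1u << bit) : 0u;  // single-bit mask, 0 if out of range
}
```

Whether the attribute is honored can then be checked in the *.MAP file by looking at the addresses of the two functions. This only covers functions I compile myself; code inside a prebuilt LIB would keep whatever section it was built with.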
Which code coming from a LIB is still external?
Are the ICache and DCache enabled (and configured via the MPU) for external code/data locations?
Why this dramatic speed difference (when toggling a GPIO pin with two similar functions)?
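For the global cache enables, a first check could be to read the Cortex-M7 SCB Configuration and Control Register (CCR, at 0xE000ED14), where bit 17 (IC) enables the instruction cache and bit 16 (DC) enables the data cache. A minimal sketch, assuming a Cortex-M7 core; the helper names are hypothetical:

```cpp
#include <cstdint>

// Cortex-M7 SCB->CCR: bit 16 = DC (D-cache enable), bit 17 = IC (I-cache enable).
constexpr uint32_t CCR_DC = 1u << 16;
constexpr uint32_t CCR_IC = 1u << 17;

// Decode a CCR value that was read from the register.
constexpr bool icacheEnabled(uint32_t ccr) { return (ccr & CCR_IC) != 0; }
constexpr bool dcacheEnabled(uint32_t ccr) { return (ccr & CCR_DC) != 0; }

#if defined(__arm__)
// Target only: read the live register (SCB base 0xE000ED00 + offset 0x14).
inline uint32_t readCCR() {
    return *reinterpret_cast<volatile uint32_t *>(0xE000ED14u);
}
#endif
```

On the target I would print `icacheEnabled(readCCR())` and `dcacheEnabled(readCCR())`. This only shows the global enable bits; whether the external flash region is actually cacheable also depends on the MPU region attributes, which this sketch does not decode.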