Scriptable (C++) API / Linkable Loadable Code at Runtime

svenk · Nov 27, 2023

I have a fairly large, working C++ code base running on the Teensy 4.1, referred to as the firmware in the following. It talks to a number of embedded digital and analog circuits and is primarily controlled through a simple custom remote procedure call protocol over TCP/IP. To get rid of Ethernet latencies in certain user-defined cases, I want to make the firmware scriptable, i.e. allow users to send small snippets of code which are put in RAM and called back when needed.

I am really open regarding the technology used to realize this. MicroPython could be a mature option (https://github.com/orgs/micropython/discussions/9607) but would require me to expose the existing C++ API of the firmware to the mpy runtime. Various existing Lua VMs for microcontrollers go in the same direction.

I wonder whether I could instead load (cross-)compiled C++ code snippets (i.e. the corresponding machine code) directly. I understand that this is halfway to over-the-air firmware flashing, but hopefully without the pain of replacing the full flash image (and also much faster, since I don't even want to write my snippets to flash). If the users have exactly the same code base in a working PlatformIO installation, cleverly extracting the relevant "user customizable" symbols from the build and sending them to the Teensy should do it, shouldn't it? Problems will probably arise with position-independent code (PIC) if the user code is slightly more complex than a single subroutine.

I acknowledge that my proposed method is very hacky by nature and that a proper microcontroller OS such as Zephyr could provide similar things out of the box (e.g. https://docs.zephyrproject.org/latest/samples/subsys/llext/shell_loader/README.html) or, even simpler, would just allow the execution of suitable programs within the OS. However, I want to avoid using a microcontroller OS at this stage.

So here is my question: has anybody done similar things on Teensy and can share code snippets, suggestions or useful links?

jmarsh · Nov 28, 2023

You would need to at least reprogram the MPU since the ITCM (the RAM where executable code lives) was recently patched to be read-only during regular operation.

svenk · Nov 28, 2023

Thank you! The MPU reconfiguration was also addressed by PaulStoffregen in an old post (for Teensy 3) and this seems possible in a similar way for Teensy 4.1.

After digging deeper into the subject, I found people in this forum trying to achieve similar things, for instance executing code which is loaded from RAM; see also PaulStoffregen's amazing 2014 code for the Teensy 3 on how to execute code from RAM.

Quite surprisingly, compiling and linking such code actually seems possible, but the ITCM/DTCM separation is a big show stopper for me, given that we cannot execute machine code from any memory except ITCM, right?

Furthermore, I wonder if (and why) this statement by PaulStoffregen still holds true:

Achieving any sort of integration between the RAM-based code and flash based code will make the arcane syntax of linker scripts and C function pointer type casting look like a walk in the park!

Since the topic has been discussed many times and apparently nobody has done it (or at least nobody has shared a code snippet), I guess I will opt for something different (such as the mpy VM).

PS: Something interesting I found externally about RAM-loaded STM32F7 programs: https://vivonomicon.com/2020/09/10/...-programs-and-using-tightly-coupled-memories/ and https://github.com/WRansohoff/STM32F7_ramloaders. The STM32 is not exactly the Teensy's Cortex-M7, and obviously it is also bound to the TCM RAM banks.

jmarsh · Nov 28, 2023

svenk said:
Since the topic has been discussed many times and apparently nobody has done it (or at least nobody has shared a code snippet), I guess I will opt for something different (such as the mpy VM).

The short version is that loading/executing dynamic code without any sort of MMU available to remap memory is more trouble than it's worth. You've only got 1MB of RAM total to play with, already split into three separate regions (ITCM, DTCM and RAM) based on the needs of the original program so cramming more code into ITCM can break far too many things.

PaulStoffregen · Nov 29, 2023

svenk said:
Quite surprisingly, compiling and linking such code actually seems possible, but the ITCM/DTCM separation is a big show stopper for me, given that we cannot execute machine code from any memory except ITCM, right?

When talking of what can versus can not be done, it's important to keep in mind whether the limit is the hardware capability or only the artificial limits imposed by the MPU.

If you configure the MPU differently, the hardware can indeed execute instructions from DTCM (the portion of RAM1 not partitioned as ITCM) or RAM2. I personally have tested both and I can confirm they do work. I haven't tested executing from PSRAM, but I'm pretty sure it is also possible if the MPU were configured to allow it.

The Cortex-M7 MPU's main purpose is to restrict access. Teensy's default startup code configures the MPU to disallow executing code from memory regions normally used for data. This is meant as a proactive security measure which reduces the odds of buffer overflows or other programming mistakes becoming security vulnerabilities. I hope you can understand how this default setting is valuable for most applications?
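To make the MPU point concrete, here is a minimal host-side sketch of how an ARMv7-M region attribute word (MPU_RASR) is put together, showing that "executable or not" comes down to the single XN (execute never) bit. The bit positions are the architectural ones; the helper names (`size_field`, `make_rasr`, `is_executable`) are invented for this example and are not the Teensy core's macros, and the memory attributes passed in are purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// ARMv7-M MPU_RASR bit fields (architectural bit positions).
constexpr std::uint32_t RASR_ENABLE = 1u << 0;   // region enable
constexpr std::uint32_t RASR_B      = 1u << 16;  // bufferable
constexpr std::uint32_t RASR_C      = 1u << 17;  // cacheable
constexpr std::uint32_t RASR_AP_RW  = 3u << 24;  // AP = 0b011: full read/write access
constexpr std::uint32_t RASR_XN     = 1u << 28;  // execute never

// SIZE field (bits 5:1): the region covers 2^(SIZE+1) bytes.
// Hypothetical helper, takes log2 of the region size in bytes.
inline std::uint32_t size_field(std::uint32_t log2_bytes) {
    assert(log2_bytes >= 5 && log2_bytes <= 32);  // 32 bytes .. 4 GB
    return (log2_bytes - 1u) << 1;
}

// Hypothetical helper: build a RASR value for a read/write region.
// mem_attr carries the TEX/S/C/B bits chosen by the caller.
inline std::uint32_t make_rasr(std::uint32_t log2_bytes, bool executable,
                               std::uint32_t mem_attr) {
    return RASR_ENABLE | size_field(log2_bytes) | RASR_AP_RW | mem_attr |
           (executable ? 0u : RASR_XN);
}

// A region allows instruction fetches only if XN is clear.
inline bool is_executable(std::uint32_t rasr) {
    return (rasr & RASR_XN) == 0;
}
```

On real hardware such a value would be written to the MPU region registers with the MPU briefly disabled, followed by the usual barriers; that part is deliberately left out of this sketch.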

svenk · Dec 14, 2023

I wrote a demonstrator, available at https://github.com/svenk/teensy-ramloader, which can load statically linked machine code at a given address, and I'm really happy with it. I can execute code from the stack and from global variables, so if I understand correctly, the memory I load code from is not even the ITCM but the DTCM. For some reason, I cannot run code from RAM2. I would have assumed that running a function similar to https://github.com/PaulStoffregen/cores/blob/master/teensy4/startup.c#L280 and setting all relevant SCB_MPU_RASR values without NOEXEC should do it, but apparently it does not.

I guess I will have to read a bit more of the ARMv7-M Architecture Reference Manual, but I am quite optimistic; everything is working pretty well.

By the way, I don't load ELF files or symbol tables, just raw machine code. As long as people don't include huge libraries, one can go pretty far with a few kB of code.
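For readers wondering what "loading raw machine code and calling it" involves, here is a minimal sketch. It is not taken from the repository above. It copies a blob into a buffer and computes the address to jump to. One detail is easy to miss: Cortex-M cores only execute Thumb code, so bit 0 of the function pointer must be set. The names `snippet_fn`, `load_snippet` and `as_function`, and the `int(int)` signature, are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical calling convention agreed on between firmware and snippet.
using snippet_fn = int (*)(int);

// Copy a raw machine-code blob into dest and return the entry address
// with the Thumb bit (bit 0) set, as required for branches on Cortex-M.
inline std::uintptr_t load_snippet(void* dest, std::size_t dest_size,
                                   const void* blob, std::size_t blob_size,
                                   std::size_t entry_offset) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(dest);
    assert(base % 4 == 0);           // keep the code word aligned
    assert(blob_size <= dest_size);  // blob must fit into the buffer
    assert(entry_offset < blob_size);
    assert(entry_offset % 2 == 0);   // Thumb instructions are halfword aligned
    std::memcpy(dest, blob, blob_size);
    return (base + entry_offset) | 1u;
}

// Turn the entry address into a callable pointer. Only meaningful on the
// target, and only once the memory is executable and the caches are in sync.
inline snippet_fn as_function(std::uintptr_t entry) {
    return reinterpret_cast<snippet_fn>(entry);
}
```

On the target, the call would then look like `as_function(entry)(21)`. Anything the snippet references outside itself (firmware functions, globals) has to be resolved at link time against the exact firmware build, which is where the statically linked approach gets fragile.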

jmarsh · Dec 14, 2023

RAM2 is configured as cacheable so you would need to:
a) flush the memory from the data cache after filling it with the executable code
b) invalidate the memory in the instruction cache before running it
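The two steps above can be sketched roughly as follows. The Cortex-M7 caches work on 32-byte lines, so the loaded range first has to be widened to whole cache lines. The actual cache operations are passed in as callbacks here because the sketch is meant to be readable and testable off-target; `CacheOps`, `cache_lines` and `publish_code` are hypothetical names, not part of the Teensy core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Cortex-M7 L1 cache line size in bytes.
constexpr std::uintptr_t kCacheLine = 32;

struct LineRange {
    std::uintptr_t first;  // address of the first cache line touched
    std::size_t count;     // number of cache lines touched
};

// Widen [addr, addr + size) to whole cache lines.
inline LineRange cache_lines(std::uintptr_t addr, std::size_t size) {
    const std::uintptr_t first = addr & ~(kCacheLine - 1);
    if (size == 0) return {first, 0};
    const std::uintptr_t last = (addr + size - 1) & ~(kCacheLine - 1);
    return {first, static_cast<std::size_t>((last - first) / kCacheLine) + 1};
}

// Hypothetical hooks; on the target these would perform the by-address
// cache maintenance operations and DSB/ISB barriers.
struct CacheOps {
    void (*clean_dcache_line)(std::uintptr_t line_addr);
    void (*invalidate_icache_line)(std::uintptr_t line_addr);
    void (*barrier)();
};

// a) write the freshly copied code out of the data cache, then
// b) drop any stale copies from the instruction cache.
inline void publish_code(std::uintptr_t addr, std::size_t size,
                         const CacheOps& ops) {
    assert(ops.clean_dcache_line && ops.invalidate_icache_line && ops.barrier);
    const LineRange r = cache_lines(addr, size);
    for (std::size_t i = 0; i < r.count; ++i)
        ops.clean_dcache_line(r.first + i * kCacheLine);
    ops.barrier();
    for (std::size_t i = 0; i < r.count; ++i)
        ops.invalidate_icache_line(r.first + i * kCacheLine);
    ops.barrier();
}
```

On a Teensy 4.x, the data-cache half can probably be covered by the core's existing `arm_dcache_flush()` helper. For the instruction cache, the simplest route is likely to invalidate it entirely rather than line by line. Both suggestions are untested assumptions here.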
