How to execute code in SRAM (Teensy 3)

Status
Not open for further replies.

mntmn

New member
Hey there,

I'm building a small but usable computer/terminal from a Teensy 3. So far the Teensy directly outputs a (40x40) colored VGA character matrix, interfaces an SD card and I can happily type away on screen using a directly interfaced PS/2 keyboard.

Everything works fine so far, but my goal is to have a little compiler or at least an assembler running on the device itself. I wrote a little hex monitor and ported DARM to the device (http://darm.re/) so I can see what's going on in memory. My challenge now is to figure out how to execute Thumb-2 machine code from RAM.

Here's one of my failing approaches (the CPU freezes, and I don't know what its state is at that point because I don't have JTAG yet):

Code:
volatile uint8_t testAsmFunc[] = {
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP     -- ignore the asm below, it shouldn't be reached anyway right now --
  0x4f, 0xf0, 0x00, 0x50, // F04F 5000 MOV R0, #0x20000000
  0x4f, 0xf4, 0x40, 0x71, // F44F 7140 MOV R1, #768
  0x08, 0x44, // 4408 ADD R0, R0, R1
  0x4f, 0xf0, 0x03, 0x01, // F04F 0103 MOV R1, #3 <-- char
  0x01, 0x60, // STR R1, [R0] 
  0x70, 0x47, // BX LR
};

void (*testFuncPtr)() = (void(*)())(testAsmFunc);

void executeAsm() {
  testFuncPtr();
}

void executeAsm2() {
  asm("mov r0, #0x20000000\nsub r0, r0, #474\nblx r0\n");
}

So in the first executeAsm() I just try to let GCC generate the code to execute the instructions in testAsmFunc[]. Disassembled, this looks like:

Code:
00000000 <_Z10executeAsmv>:
   0:	b508      	push	{r3, lr}
   2:	4b02      	ldr	r3, [pc, #8]	; (c <_Z10executeAsmv+0xc>)
   4:	681b      	ldr	r3, [r3, #0]
   6:	4798      	blx	r3
   8:	bd08      	pop	{r3, pc}
   a:	bf00      	nop
   c:	00000000 	.word	0x00000000

Which looks kinda OK to me, but I'm not an expert (yet). The expected result is that nothing special should happen because the CPU should just see the BX LR and return.

In executeAsm2(), I just tried to load the target address manually and branch to it (I dumped the actual address from hardware before). I know this is not a feasible approach, though.

So my real question boils down to:

What is the proper way to jump to a block of memory that contains Thumb-2 instructions and come back safely, in C?

I also read that to execute Thumb-2 code, it should be aligned on odd addresses. But the Cortex-M4 only knows about Thumb/-2 instructions anyway, am I right (no ARM mode)? Does this matter?

Any help or pointers in the right direction are greatly appreciated.

Best,
Lukas
 
At least one problem is the use of uint8_t. Thumb instructions must be aligned to 16 bit boundaries. You should use an array of uint16_t, so the compiler will align the data properly in memory.

There's a tiny bit of ASM code executed from RAM in hardware/teensy/cores/teensy3/eeprom.c. Maybe looking at that known-good example might help?

Edit: Also, something to notice in that example is setting the LSB of the function's address. It takes a bunch of C type casting: turn the array address into an integer, bitwise OR it with 1 to set the LSB, then cast the result to a function pointer with the correct inputs and output. The uint16_t array will be 16 bit aligned, so the LSB of its address is always 0. But you must set the LSB to 1 because the Cortex-M executes only in Thumb mode. It doesn't support the other ARM mode, but for "compatibility" with other ARM chips, it will generate a fault exception if you branch to an address with a zero in the LSB. Normally the compiler handles these arcane details automatically, but if you're going to fill an array in RAM with instructions and branch to them, you must handle this stuff manually. Also notice the source code for those 8 instructions is at the end of the file. That code in eeprom.c definitely does work, so use it to get started.
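
As a rough illustration of the casting described above, here is a minimal sketch using a uint16_t array. The names ram_code, thumb_fn, thumb_entry and run_ram_code are made up for this example and are not taken from eeprom.c. The actual call is only compiled for Thumb targets, so the address arithmetic can also be checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Thumb opcodes stored as halfwords, so the array is at least 2-byte aligned.
   Not const: on the Teensy a const array would likely end up in flash. */
static uint16_t ram_code[] = {
    0xbf00, /* NOP   */
    0x4770, /* BX LR */
};

typedef void (*thumb_fn)(void);

/* Hypothetical helper: turn an aligned buffer address into a callable
   Thumb function pointer by setting bit 0. */
static thumb_fn thumb_entry(const uint16_t *code)
{
    uintptr_t addr = (uintptr_t)code;
    assert((addr & 1u) == 0);      /* halfword aligned, so bit 0 starts clear */
    return (thumb_fn)(addr | 1u);  /* bit 0 set: branch target is Thumb code */
}

static void run_ram_code(void)
{
#if defined(__thumb__)
    thumb_entry(ram_code)();       /* compiler emits BLX to the odd address */
#endif
}
```

If the buffer is filled at run time rather than initialized statically, it is probably wise to issue DSB and ISB barriers before the call, though I have not verified whether the Teensy 3 strictly needs them.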
 
The following code works for me, nothing happens, but there is no crash :)
First the code is 16 bit aligned, then later testFuncPtr has bit 0 set to 1.

Code:
volatile uint8_t testAsmFunc[] __attribute__ ((aligned (2))) = {
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP
  0x70, 0x47, // 4770 BX LR
  0x00, 0xbf, // BF00 NOP     -- ignore the asm below, it shouldn't be reached anyway right now --
  0x4f, 0xf0, 0x00, 0x50, // F04F 5000 MOV R0, #0x20000000
  0x4f, 0xf4, 0x40, 0x71, // F44F 7140 MOV R1, #768
  0x08, 0x44, // 4408 ADD R0, R0, R1
  0x4f, 0xf0, 0x03, 0x01, // F04F 0103 MOV R1, #3 <-- char
  0x01, 0x60, // STR R1, [R0] 
  0x70, 0x47, // BX LR
};

void (*testFuncPtr)() = (void(*)())((uint32_t) testAsmFunc | 0x1);

void executeAsm() {
  testFuncPtr();
}
 
Hey Paul, mlu,

Thank you very much. It works perfectly now. I think the most important bit was OR-ing the 1. I also switched to uint16_t.

So now I just have to port/write a Thumb-2 assembler to run on the Teensy ;)

Cheers,
Lukas
 
... but my goal is to have a little compiler or at least an assembler running on the device itself. I wrote a little hex monitor and ported DARM to the device (http://darm.re/) so I can see what's going on in memory.
There is/was a port of the Python small memory VM for Teensy2/AVR - one of the pyMite ports.

It would be more than GREAT if a pyMite adaptation was done for the more-resource-capable Teensy3. Just a simple subset. Don't need all the esoteric features in even pyMite.
Besides being appealing to the student world, this would also allow for portable code among PCs and various mid-high end embedded processors, that are not up in the Linux class.
 
So now I just have to port/write a Thumb-2 assembler to run on the Teensy ;)

I hope you'll post about that when it's up and running?

If you post a good write-up on a blog or website, I'll bet sites like Hack-a-Day would love to cover the story....
 
Is there anything like this that would also work to execute from ram?


Code:
__attribute__ ((section(".ramcode")))
void xxx()
{
...
}
 
I haven't checked the details in the linker file, where the section must be defined in the block that gets relocated to RAM, but I think there should be. If needed I can look it up later tonight.

Code:
__attribute__ ((section(".ramfunc")))
void xxx()
{
...
}

EDIT: I checked the actual linker files, and the section should be .fastrun:

Code:
__attribute__ ((section(".fastrun")))
void xxx()
{
...
}
 
Thanks. I understand that this would be useful in two cases: 1) if you are writing to flash (and therefore can't execute from flash), or 2) if you want your code to run slightly faster (maybe).

From other sources, I understand that these extra attributes may be needed to ensure the function really runs from RAM:

__attribute__ ((section(".fastrun"), noinline, noclone, optimize("Os")))
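
For what it's worth, here is a minimal sketch of how that attribute list might be wrapped in a macro and applied. RAMFUNC and checksum_from_ram are hypothetical names for this example, and it assumes the linker script copies .fastrun into RAM at startup as described above. As far as I know, noinline keeps the body from being inlined into a caller that lives in flash, and noclone stops GCC from creating specialized copies elsewhere; I can't say for sure why optimize("Os") is included.

```c
#include <stdint.h>

/* Assumption: .fastrun is relocated to RAM by the startup code. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".fastrun"), noinline, noclone, optimize("Os")))
#else
#define RAMFUNC __attribute__((noinline)) /* PC build: ordinary placement */
#endif

/* Hypothetical example function that should execute from RAM. */
RAMFUNC uint32_t checksum_from_ram(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

Note that anything the function calls must also live in RAM if the goal is to keep running while flash is being written.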
 