Thread: How to execute code in SRAM (Teensy 3)

  1. #1
    Junior Member
    Join Date
    Sep 2013
    Location
    Berlin
    Posts
    2

    Hey there,

    I'm building a small but usable computer/terminal from a Teensy 3. So far the Teensy directly outputs a (40x40) colored VGA character matrix, interfaces an SD card and I can happily type away on screen using a directly interfaced PS/2 keyboard.

    Everything works fine so far, but my goal is to have a little compiler or at least an assembler running on the device itself. I wrote a little hex monitor and ported DARM to the device (http://darm.re/) so I can see what's going on in memory. My challenge now is to figure out how to execute Thumb-2 machine code from RAM.

    Here's one of my failing approaches (the CPU freezes; I don't know what its state is at that point because I don't have JTAG yet):

    Code:
    volatile uint8_t testAsmFunc[] = {
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP     -- ignore the asm below, it shouldn't be reached anyway right now --
      0x4f, 0xf0, 0x00, 0x50, // F04F 5000 MOV R0, #0x20000000
      0x4f, 0xf4, 0x40, 0x71, // F44F 7140 MOV R1, #768
      0x08, 0x44, // 4408 ADD R0, R0, R1
      0x4f, 0xf0, 0x03, 0x01, // F04F 0103 MOV R1, #3 <-- char
      0x01, 0x60, // 6001 STR R1, [R0]
      0x70, 0x47, // 4770 BX LR
    };
    
    void (*testFuncPtr)() = (void(*)())(testAsmFunc);
    
    void executeAsm() {
      testFuncPtr();
    }
    
    void executeAsm2() {
      asm("mov r0, #0x20000000\nsub r0, r0, #474\nblx r0\n");
    }
    In the first function, executeAsm(), I just let GCC generate the code that calls the instructions in testAsmFunc[]. Disassembled, this looks like:

    Code:
    00000000 <_Z10executeAsmv>:
       0:	b508      	push	{r3, lr}
       2:	4b02      	ldr	r3, [pc, #8]	; (c <_Z10executeAsmv+0xc>)
       4:	681b      	ldr	r3, [r3, #0]
       6:	4798      	blx	r3
       8:	bd08      	pop	{r3, pc}
       a:	bf00      	nop
       c:	00000000 	.word	0x00000000
    Which looks kinda OK to me, but I'm not an expert (yet). The expected result is that nothing special should happen because the CPU should just see the BX LR and return.

    In executeAsm2(), I just tried to load the target address manually and branch to it (I dumped the actual address from hardware before). I know this is not a feasible approach, though.

    So my real question boils down to:

    What is the proper way to jump to a block of memory that contains Thumb-2 instructions and come back safely, in C?

    I also read that to execute Thumb-2 code, it should be aligned on odd addresses. But the Cortex-M4 only knows about Thumb/-2 instructions anyway, am I right (no ARM mode)? Does this matter?

    Any help or pointers in the right direction are greatly appreciated.

    Best,
    Lukas

  2. #2
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,579
    At least one problem is the use of uint8_t. Thumb instructions must be aligned to 16 bit boundaries. You should use an array of uint16_t, so the compiler will align the data properly in memory.

    There's a tiny bit of ASM code executed from RAM in hardware/teensy/cores/teensy3/eeprom.c. Maybe looking at that known-good example might help?

    Edit: Something else to notice in that example is how it sets the LSB of the function's address: a series of C type casts turns the array address into an integer, a bitwise OR sets the LSB, and further casts turn the result into a function pointer with the correct inputs and output. The uint16_t array will be 16-bit aligned, so the LSB of its address is always 0. But you must set the LSB to 1, because the Cortex-M executes only in Thumb mode. It doesn't support the other ARM mode, but for "compatibility" with other ARM chips, it will generate a fault exception if you branch to an address with a zero in the LSB. Normally the compiler handles these arcane details automatically, but if you're going to fill an array in RAM with instructions and branch to them, you must handle this stuff manually. Also notice that the source code for those 8 instructions is at the end of the file. That code in eeprom.c definitely does work, so use it to get started.
    Last edited by PaulStoffregen; 09-30-2013 at 11:40 AM.
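    The cast sequence described above can be sketched roughly as follows. This is a minimal illustration, not the code from eeprom.c: the names void_fn, ram_code, thumb_entry and run_ram_code are made up for this example, and the actual call is guarded by __ARM_ARCH_7EM__ on the assumption that the compiler defines it when targeting a Cortex-M4, so the snippet still compiles (and does nothing) on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef void (*void_fn)(void);

/* Machine code stored as 16-bit halfwords, so the compiler aligns it to 2 bytes.
 * Not const: the array must live in RAM (.data), not in flash. */
uint16_t ram_code[] = {
    0xBF00, /* NOP   */
    0x4770, /* BX LR */
};

/* Turn the address of a halfword-aligned buffer into a Thumb function pointer. */
void_fn thumb_entry(const uint16_t *code)
{
    uintptr_t addr = (uintptr_t)code;  /* array address -> integer */
    assert((addr & 1u) == 0);          /* buffer must be halfword aligned */
    return (void_fn)(addr | 1u);       /* set bit 0, then integer -> function pointer */
}

/* Branch into the RAM buffer. Only compiled in for a Cortex-M4 build
 * (assumption: the toolchain defines __ARM_ARCH_7EM__ for that target). */
void run_ram_code(void)
{
#if defined(__thumb__) && defined(__ARM_ARCH_7EM__)
    thumb_entry(ram_code)();           /* compiler emits BLX with bit 0 set */
#endif
}
```

    On the Teensy, calling run_ram_code() should simply return, since the buffer contains only a NOP followed by BX LR.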

  3. #3
    Senior Member
    Join Date
    Aug 2013
    Location
    Gothenburg, Sweden
    Posts
    293
    The following code works for me: nothing visible happens, but there is no crash.
    First the array is aligned to 16 bits, then testFuncPtr gets bit 0 set to 1.

    Code:
    volatile uint8_t testAsmFunc[] __attribute__ ((aligned (2))) = {
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP
      0x70, 0x47, // 4770 BX LR
      0x00, 0xbf, // BF00 NOP     -- ignore the asm below, it shouldn't be reached anyway right now --
      0x4f, 0xf0, 0x00, 0x50, // F04F 5000 MOV R0, #0x20000000
      0x4f, 0xf4, 0x40, 0x71, // F44F 7140 MOV R1, #768
      0x08, 0x44, // 4408 ADD R0, R0, R1
      0x4f, 0xf0, 0x03, 0x01, // F04F 0103 MOV R1, #3 <-- char
      0x01, 0x60, // 6001 STR R1, [R0]
      0x70, 0x47, // 4770 BX LR
    };
    
    void (*testFuncPtr)() = (void(*)())((uint32_t) testAsmFunc | 0x1);
    
    void executeAsm() {
      testFuncPtr();
    }
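
    For anyone converting a byte array like this one to uint16_t: the Cortex-M fetches instructions little-endian, so each byte pair 0x70, 0x47 becomes the halfword 0x4770, and a 32-bit Thumb-2 instruction is stored as two consecutive halfwords (0x4f, 0xf0, 0x00, 0x50 becomes 0xF04F, 0x5000). A small sketch of that conversion, with bytes_to_halfwords being a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine little-endian byte pairs into 16-bit halfwords.
 * Returns the number of halfwords written to out. */
size_t bytes_to_halfwords(const uint8_t *bytes, size_t nbytes, uint16_t *out)
{
    assert(nbytes % 2 == 0);  /* Thumb code is always a whole number of halfwords */
    size_t n = nbytes / 2;
    for (size_t i = 0; i < n; i++) {
        /* low byte comes first in memory, high byte second */
        out[i] = (uint16_t)(bytes[2 * i] | ((uint16_t)bytes[2 * i + 1] << 8));
    }
    return n;
}
```

    The resulting halfwords match the opcode column in the comments of the array above (4770, BF00, F04F 5000, and so on).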

  4. #4
    Junior Member
    Join Date
    Sep 2013
    Location
    Berlin
    Posts
    2
    Hey Paul, mlu,

    Thank you very much. It works perfectly now. I think the most important bit was OR-ing the 1. I also switched to uint16_t.

    So now I just have to port/write a Thumb-2 assembler to run on the Teensy

    Cheers,
    Lukas

  5. #5
    Senior Member
    Join Date
    Jun 2013
    Location
    So. Calif
    Posts
    2,828
    Quote: Originally posted by mntmn
    ... but my goal is to have a little compiler or at least an assembler running on the device itself. I wrote a little hex monitor and ported DARM to the device (http://darm.re/) so I can see what's going on in memory.
    There is/was a port of the Python small memory VM for Teensy2/AVR - one of the pyMite ports.

    It would be more than GREAT if a pyMite adaptation was done for the more-resource-capable Teensy3. Just a simple subset. Don't need all the esoteric features in even pyMite.
    Besides being appealing to the student world, this would also allow for portable code among PCs and various mid-high end embedded processors, that are not up in the Linux class.

  6. #6
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,579
    Please start a new thread about Python on the Suggestions forum.

    It's important enough for its own thread. Tacked on here (a resolved minor coding issue), it's likely to be lost and forgotten.

  7. #7
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,579
    Quote: Originally posted by mntmn
    So now I just have to port/write a Thumb-2 assembler to run on the Teensy
    I hope you'll post about that when it's up and running?

    If you post a good write-up on a blog or website, I'll bet sites like Hack-a-Day would love to cover the story....

  8. #8
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    253
    Is there anything like this that would also work to execute from ram?


    Code:
    __attribute__ ((section(".ramcode")))
    void xxx()
    {
    ...
    }

  9. #9
    Senior Member
    Join Date
    Aug 2013
    Location
    Gothenburg, Sweden
    Posts
    293
    I haven't checked the details in the linker file, where the section must be defined
    inside the block that gets relocated to RAM, but I think it should be the following. If needed, I can look it up later tonight.

    Code:
    __attribute__ ((section(".ramfunc")))
    void xxx()
    {
    ...
    }
    EDIT: I checked the actual linker files, and it should be .fastrun:

    Code:
    __attribute__ ((section(".fastrun")))
    void xxx()
    {
    ...
    }
    Last edited by mlu; 05-27-2015 at 07:10 AM.

  10. #10
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    253
    Thanks. I understand that this would be useful in two cases: 1) if you are writing to flash (and therefore can't execute from flash), or 2) if you want your code to run slightly faster (maybe).

    From other sources, I understand that this may be needed to ensure the function actually runs from RAM:

    __attribute__ ((section(".fastrun"), noinline, noclone, optimize("Os")))
    Last edited by jonr; 05-27-2015 at 02:37 PM.
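
    Applied to a complete function, that attribute list might look like the sketch below. RAM_FUNC and sum_bytes are hypothetical names chosen for this example; the macro expands to nothing on non-ARM compilers so the file also builds on a PC. Whether the function really ends up in RAM still depends on the linker script placing .fastrun in the region that is copied to RAM at startup, as discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: apply the attributes only when building for ARM with GCC. */
#if defined(__arm__) && defined(__GNUC__)
#define RAM_FUNC __attribute__((section(".fastrun"), noinline, noclone, optimize("Os")))
#else
#define RAM_FUNC
#endif

/* Example function placed in the .fastrun section on the target.
 * noinline/noclone keep the compiler from merging a copy of it into a caller in flash. */
RAM_FUNC uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

    Callers use it like any other function; the compiler and linker take care of the Thumb bit in its address.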

  11. #11
    Senior Member
    Join Date
    Nov 2012
    Location
    Boston, MA, USA
    Posts
    1,108
    As this 2013 thread has been exhumed, it is worth searching the forum for the FASTRUN feature, which has been added in the meantime. An example is at
    https://forum.pjrc.com/threads/27690...ll=1#post64142
