Is it possible to boot from SD Card?

Status
Not open for further replies.
Is it possible to boot from the SD card?
Has anyone done this?
Does anyone know how to make this work?
Thanks,
John

There is no support in the current tools.

There have been people over the years that have tried to do it, and have met with varying levels of success. Mostly though I don't see them after a bit. Perhaps their method works, and they don't need to improve on it, perhaps they left Teensy for something else, perhaps it is just too complicated, and they gave up.

I could imagine ways to do it, by having a sketch with a tiny kernel that copies data from the SD card into its respective locations, and possibly updates the flash memory so you don't need the SD card the next time. You would have to make sure that the kernel itself is not overwritten by the copy. It would be somewhat easier if you had your own linker script, to link the update code at fixed locations so the kernel does not copy code from the SD card on top of it.

But if your goal is to be able to support updates, one method is to use the CircuitPython that is available for Teensy 4.0 and 4.1. Instead of programming in Arduino, you would have to program in CircuitPython, but CP does act as a removable disk, and you can edit the source on the fly (or replace it). Of course the problem is, the python code is plain text.
 
Thanks Michael. My thought was to read the image from SD, copy it to RAM, and run out of RAM. This would save wear and tear on the FLASH, and SD cards are pretty cheap, especially when debugging requires so many code iterations. Has anyone been able to run code out of RAM rather than FLASH?
Another reason I'm curious is that I was reading this site: https://www.stupid-projects.com/tensorflow-2-1-0-for-microcontrollers-benchmarks-on-teensy-4-0/ where she was running a TensorFlow benchmark on the Teensy 4, but it showed the Teensy's IMXRT1062 @ 600 MHz doing worse than the overclocked STM32F746 @ 288 MHz. Her surmise was that the Teensy was worse because it was running out of SPI NOR flash rather than on-chip RAM, although she admits she is not sure of the reason. So it would be interesting if there were other benchmarks comparing execution from flash against execution from RAM, and especially PSRAM. Thoughts, anyone?
John
 
All the code on the Teensy 4/4.1 is running from RAM by default. There's a big thread about how exactly the RAM sections on the IMXRT1062 work here: https://forum.pjrc.com/threads/57326-T4-0-Memory-trying-to-make-sense-of-the-different-regions

I beg to differ. If you look at the hex file, the load address is in flash. Also, if you look at the linker script files imxrt1062.ld and imxrt1062_t41.ld, the .text section is linked into flash; only the .data sections, .bss sections, heap and stack are in RAM. So unless the bootloader relocates the code and copies it to RAM, it will run from flash. The flash, even though it is SPI NOR, has execute in place (XIP) capability. Can anyone at PJRC clarify this issue, i.e. does the bootloader relocate the code and copy it to RAM for execution?
John
 
According to the docs:

https://www.pjrc.com/store/teensy40.html

it is indeed copied to RAM.

-Michael

Thanks for the link. But actually, according to the diagram, only code labeled FASTRUN, i.e. __attribute__ ((section(".fastrun") )), is copied to RAM; everything labeled FLASHMEM, i.e. __attribute__((section(".flashmem"))), stays in flash. Also, there's a lot more flash than FASTRUN RAM.
Also I'd still be interested in how much faster code in RAM is compared to FLASH? Nevertheless, this info is helpful, thanks.
John
 
Can anyone at PJRC clarify this issue

Yes, here is a definitive answer....

i.e. does the bootloader relocate the code and copy it to RAM for execution?

No, the bootloader does not do this. But the startup code (which executes from flash) does indeed copy data from flash to ITCM RAM.

By default, or when FASTRUN is used on functions, they are compiled to run from ITCM and loaded into that section of the flash which gets copied to ITCM at startup.

But not all code runs from ITCM. Functions with FLASHMEM are located only in the flash memory and "execute in place" from the flash. Normally only startup code has FLASHMEM on its functions, since the flash is so much slower in the case of a cache miss. But the M7 processor does have 32K of L1 cache just for instructions, and also 32K of L1 cache for data, so executing from flash isn't always as bad as you might imagine.

So the answer is more complicated than all code running from one type of memory. Most functions run from ITCM, but those declared with FLASHMEM do indeed execute in place directly from flash memory.
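
To make the three cases above concrete, here is a minimal sketch of how they might look in source. The function names are hypothetical. FASTRUN and FLASHMEM are the Teensy core macros; the `#else` branch only stubs them out so the same file also builds with a desktop compiler, and the `__IMXRT1062__` check assumes the usual Teensyduino build flags.

```cpp
#if defined(__IMXRT1062__)
#include <Arduino.h>      // Teensy 4.x core: defines FASTRUN and FLASHMEM
#else
#define FASTRUN           // stubbed out so the sketch builds off-target
#define FLASHMEM
#endif

// No attribute: placed with the default code, which runs from ITCM.
int scale_sample(int x) { return x * 3 + 1; }

// Explicit FASTRUN: same destination (ITCM), just stated explicitly.
FASTRUN int clamp_sample(int x) { return x > 255 ? 255 : (x < 0 ? 0 : x); }

// FLASHMEM: stays in flash and executes in place. A reasonable fit for
// code that runs rarely, such as one-time setup.
FLASHMEM int checksum_config(const unsigned char *data, int len) {
  int sum = 0;
  for (int i = 0; i < len; ++i) sum += data[i];
  return sum;
}
```

The functions behave identically wherever they live; only the fetch latency on a cache miss differs.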

One other tangentially related issue is the MPU. The startup code configures DTCM, OCRAM and other memory regions not normally used to execute code with the NOEXEC flag. This is done as a preemptive security measure, to make exploiting buffer overflows harder. While the M7 processor could theoretically execute code from anywhere in the 32-bit address space, we specifically configure the MPU to disallow fetching instructions from anywhere other than ITCM and Flash.

If you want to do something special with memory, don't forget you'll probably need to mess with the MPU config.
 
Also I'd still be interested in how much faster code in RAM is compared to FLASH?

For cached instructions, the speed is the same. The M7 core has a 32K instruction cache which is separate from the 32K data cache, so for most "normal" code the cache allows the flash memory to perform quite well.

But cache misses take hundreds of clock cycles. You definitely do not want latency-sensitive code, like interrupt functions, executing directly from the flash memory.
 
It's not as much of a problem as it sounds.
Normally, interrupts are most likely in the cache. The chance that a regularly called interrupt is not in the cache is very, very small.
So the best approach is, as ever, to just try it.
 
Code:
.text.itcm : {
	. = . + 32; /* MPU to trap NULL pointer deref */
	*(.fastrun)
	*(.text*)
	. = ALIGN(16);
} > ITCM  AT> FLASH
From https://github.com/PaulStoffregen/cores/blob/master/teensy4/imxrt1062.ld

I only glanced at the article you linked, but it sounded like they were using a custom link script and not the default Teensyduino one, in which case they could have been running from flash.

Yep, you are correct: she was using the NXP MCUXpresso IDE, which by default links everything to run in flash.
 
Thanks Paul for clearing this up, very helpful. I'm not currently trying to do anything special with memory, just trying to understand how all this works. It looks to me like the linker script puts .text.progmem in flash and .text.itcm in ITCM RAM. How does the compiler assign these attributes to the code without an explicit attribute in the source code itself?
John
 
Post #9 above has the general code placement answer for Teensy 4.x:
By default, or when FASTRUN is used on functions, they are compiled to run from ITCM and loaded into that section of the flash which gets copied to ITCM at startup.

But not all code runs from ITCM. Functions with FLASHMEM are located only in the flash memory and "execute in place" from the flash. Normally only startup code has FLASHMEM on its functions, since the flash is so much slower in the case of a cache miss. But the M7 processor does have 32K of L1 cache just for instructions, and also 32K of L1 cache for data, so executing from flash isn't always as bad as you might imagine.

So the answer is more complicated than all code running from one type of memory. Most functions run from ITCM, but those declared with FLASHMEM do indeed execute in place directly from flash memory.
 
No, you didn't answer my question. If I write some code in a .c file and do not use either FASTRUN or FLASHMEM, how does the linker know what segment to put it in, since I didn't specify anything special? How does the default work, since I thought the compiler would just assume it was a plain old .text section? Does the linker script just put anything that is plain .text in RAM? I'm probably not understanding the syntax of the linker script.
 
As suggested there is indeed an .ld file that controls the linker defaults and name options. It is set up as indicated in Paul's reply ... By default ...
 
I'm a compiler guy and haven't looked at .ld scripts in many years, but here is my take on some of what goes on. Note that I am not familiar with the ARM processors, nor have I looked into the Teensy startup code.

The script is at:
  • hardware/teensy/avr/cores/teensy4/imxrt1062_t41.ld

The compiler is told to put each function in a separate section with the '-ffunction-sections' option, and each global variable in a separate section with the '-fdata-sections' option. Unless you use the section attribute, the name of the section used is:
  • Functions: .text.<name>;
  • Read/write data: .data.<name>;
  • Read only (const) data: .rodata.<name>.

The linker is passed the '--gc-sections' option which says to eliminate unused sections, unless the KEEP option is used.

Here is the .ld file:

Set up the memory regions:
Code:
MEMORY
{
        ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
        DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
        RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
        FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 7936K
        ERAM (rwx):  ORIGIN = 0x70000000, LENGTH = 16384K
}

Set up the start address in the ELF image. This is presumably used by the loader.
Code:
ENTRY(ImageVectorTable)

Define the sections, starting with those that are put into the FLASH region. This is the flash memory chip that is soldered onto the Teensy. You can execute from it and keep data in it. It is slower than the ITCM that code is normally copied to, but it is much, much bigger in terms of code space. Notice the use of the KEEP keyword, which stops --gc-sections from discarding those sections. If you declare functions with the FLASHMEM macro, they will go into flash, but not be copied into ITCM. Similarly, if you declare global variables with the PROGMEM macro, they will go into flash, but not be copied into DTCM.

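The following labels mark the start and end of the pre-init and init arrays, the tables of initializer function pointers (C++ static constructors, for example) that get called during startup before the sketch runs. The labels used for copying from FLASH to ITCM and DTCM (_stextload, _sdataload, etc.) are defined near the end of the script.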
  • __preinit_array_start;
  • __preinit_array_end;
  • __init_array_start;
  • __init_array_end.

Code:
SECTIONS
{
        .text.progmem : {
                KEEP(*(.flashconfig))
                FILL(0xFF)
                . = ORIGIN(FLASH) + 0x1000;
                KEEP(*(.ivt))
                KEEP(*(.bootdata))
                KEEP(*(.vectors))
                KEEP(*(.startup))
                *(.flashmem*)
                *(.progmem*)
                . = ALIGN(4);
                KEEP(*(.init))
                __preinit_array_start = .;
                KEEP (*(.preinit_array))
                __preinit_array_end = .;
                __init_array_start = .;
                KEEP (*(.init_array))
                __init_array_end = .;
                . = ALIGN(16);
        } > FLASH

This sets up the ITCM region. The linker links everything as if it is in the ITCM region, but actually puts the data into FLASH and the initialization code then copies it to ITCM.
Code:
        .text.itcm : {
                . = . + 32; /* MPU to trap NULL pointer deref */
                *(.fastrun)
                *(.text*)
                . = ALIGN(16);
        } > ITCM  AT> FLASH
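
As a rough illustration of what "linked for ITCM, stored in FLASH" means at boot, here is a host-side sketch of the kind of word copy that startup code performs. It is not the actual Teensy startup code: the arrays `fake_flash_image` and `fake_itcm` and the function `simulate_startup_copy` are hypothetical stand-ins, and on the real chip the pointers would come from linker labels such as _stext, _stextload and _etext.

```cpp
#include <cstddef>
#include <cstdint>

// Word-by-word copy from the load address (flash) to the run address (ITCM).
void memory_copy(uint32_t *dest, const uint32_t *src, const uint32_t *dest_end) {
  if (dest == src) return;               // already in place, nothing to do
  while (dest < dest_end) *dest++ = *src++;
}

// Hypothetical stand-ins for the flash image and the ITCM region.
const uint32_t fake_flash_image[4] = {0x11111111u, 0x22222222u,
                                      0x33333333u, 0x44444444u};
uint32_t fake_itcm[4] = {0, 0, 0, 0};

// Returns true if the simulated "boot" left ITCM matching the flash image.
bool simulate_startup_copy() {
  memory_copy(fake_itcm, fake_flash_image, fake_itcm + 4);
  for (size_t i = 0; i < 4; ++i) {
    if (fake_itcm[i] != fake_flash_image[i]) return false;
  }
  return true;
}
```

The same pattern would apply to the .data section, which is also given a run address in RAM and a load address in FLASH.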

Align the ITCM code to the next 32k boundary, but don't fill up the FLASH memory with all of the 0's to do the alignment. The padding is needed to set the ITCM region to have execute permission.
Code:
        .text.itcm.padding (NOLOAD) : {
                . = ALIGN(32768);
        } > ITCM

Set up the initialized data into the DTCM region. On the Teensy, it appears there is not a separate read-only section for constant data.
Code:
        .data : {
                *(.rodata*)
                *(.data*)
                . = ALIGN(16);
        } > DTCM  AT> FLASH

Set up the uninitialized data (BSS) to be allocated space in DTCM region, but don't allocate space in FLASH to hold it.
Code:
        .bss ALIGN(4) : {
                *(.bss*)
                *(COMMON)
                . = ALIGN(32);
                . = . + 32; /* MPU to trap stack overflow */
        } > DTCM

Set up uninitialized data to the RAM memory region. You would declare these variables with the DMAMEM macro.
Code:
        .bss.dma (NOLOAD) : {
                *(.dmabuffers)
                . = ALIGN(32);
        } > RAM

Set up uninitialized data in the Teensy 4.1's extra PSRAM, for the 1-2 chips soldered underneath the Teensy. You would declare these variables with the EXTMEM macro:
Code:
        .bss.extram (NOLOAD) : {
                *(.externalram)
        } > ERAM

Set up the labels for various regions of memory:
Code:
        _stext = ADDR(.text.itcm);
        _etext = ADDR(.text.itcm) + SIZEOF(.text.itcm);
        _stextload = LOADADDR(.text.itcm);

        _sdata = ADDR(.data);
        _edata = ADDR(.data) + SIZEOF(.data);
        _sdataload = LOADADDR(.data);

        _sbss = ADDR(.bss);
        _ebss = ADDR(.bss) + SIZEOF(.bss);

        _heap_start = ADDR(.bss.dma) + SIZEOF(.bss.dma);
        _heap_end = ORIGIN(RAM) + LENGTH(RAM);

        _itcm_block_count = (SIZEOF(.text.itcm) + 0x7FFF) >> 15;
        _flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
        _estack = ORIGIN(DTCM) + ((16 - _itcm_block_count) << 15);

        _flashimagelen = SIZEOF(.text.progmem) + SIZEOF(.text.itcm) + SIZEOF(.data);
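
The three FlexRAM expressions above are easier to follow with numbers plugged in. The sketch below simply mirrors the linker arithmetic in C++; the function names are made up for illustration, and the comment about what the bit pairs mean assumes the usual i.MX RT1060 FlexRAM encoding (0b10 for a DTCM bank, 0b11 for an ITCM bank), which is worth checking against the reference manual.

```cpp
#include <cstdint>

constexpr uint32_t kDtcmOrigin = 0x20000000u;  // ORIGIN(DTCM) from the MEMORY block

// Number of 32K banks needed for ITCM: round the section size up to 32K.
constexpr uint32_t itcm_block_count(uint32_t text_itcm_size) {
  return (text_itcm_size + 0x7FFFu) >> 15;
}

// Start with all 16 banks as DTCM (0b10 pairs), then set the lowest
// 2 * blocks bits so the first banks become ITCM (0b11 pairs).
constexpr uint32_t flexram_bank_config(uint32_t blocks) {
  return 0xAAAAAAAAu |
         static_cast<uint32_t>((uint64_t{1} << (blocks * 2)) - 1);
}

// Initial stack pointer: top of whatever is left over for DTCM.
constexpr uint32_t estack(uint32_t blocks) {
  return kDtcmOrigin + ((16u - blocks) << 15);
}

// Worked example: 40000 bytes of ITCM code needs two 32K banks.
static_assert(itcm_block_count(40000) == 2, "two banks for 40000 bytes");
static_assert(flexram_bank_config(2) == 0xAAAAAAAFu, "low two banks are ITCM");
static_assert(estack(2) == 0x20070000u, "14 banks (448K) remain for DTCM");
```

So the more code ends up in ITCM, the less DTCM is left for data and stack, in 32K steps.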

Set the Teensy model number to identify this as a Teensy 4.1:
Code:
        _teensy_model_identifier = 0x25;

Don't load debug stuff into the flash memory, just put it as sections in the ELF file.
Code:
        .debug_info     0 : { *(.debug_info) }
        .debug_abbrev   0 : { *(.debug_abbrev) }
        .debug_line     0 : { *(.debug_line) }
        .debug_frame    0 : { *(.debug_frame) }
        .debug_str      0 : { *(.debug_str) }
        .debug_loc      0 : { *(.debug_loc) }

}

Here are the declaration macros defined in hardware/teensy/avr/cores/teensy4/avr/pgmspace.h:
Code:
#define DMAMEM __attribute__ ((section(".dmabuffers"), used))
#define FASTRUN __attribute__ ((section(".fastrun") ))
#define PROGMEM __attribute__((section(".progmem")))
#define FLASHMEM __attribute__((section(".flashmem")))
#define EXTMEM __attribute__((section(".externalram")))
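
For the data side, a small sketch of how these macros might be used on variables, with the section each one should land in according to the script above. All variable names are hypothetical, and the `#else` branch only stubs the macros so the file also builds with a desktop compiler (the `__IMXRT1062__` check assumes the usual Teensyduino build flags).

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__IMXRT1062__)
#include <Arduino.h>      // Teensy 4.x core: defines DMAMEM, PROGMEM, EXTMEM
#else
#define DMAMEM            // stubbed out so the sketch builds off-target
#define PROGMEM
#define EXTMEM
#endif

DMAMEM uint8_t dma_buffer[512];       // .dmabuffers  -> RAM region
EXTMEM uint8_t capture_buffer[1024];  // .externalram -> ERAM (Teensy 4.1 script only)
PROGMEM const uint16_t sine_quarter[4] = {0, 12539, 23170, 30273};  // .progmem -> FLASH
uint32_t boot_count = 1;              // .data -> DTCM, initial value stored in FLASH
uint32_t scratch[16];                 // .bss  -> DTCM, no space taken in FLASH

// Reads the table in place; no copy to RAM is involved for PROGMEM data.
uint32_t table_sum() {
  uint32_t sum = 0;
  for (size_t i = 0; i < 4; ++i) sum += sine_quarter[i];
  return sum;
}
```

Note that a plain `const` table without PROGMEM would go through `*(.rodata*)` and so be copied into DTCM.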
 
Thanks Michael, I think I understand now: the .text.itcm section includes everything with the .fastrun attribute and everything with the .text* attributes. So I assume the compiler assigns a .text attribute by default to any code that doesn't have another attribute assigned. I was confused by the names .text.progmem and .text.itcm, which are just the output section names and not what goes in them. Which parts of the code go in each section is specified in the body of the section declaration, between the {}. Sorry for my denseness. Perhaps you guys should write a book "Teensy for Dummies" or some such for such as I. Thanks again for your patient and detailed reply, much appreciated.
John
 

As I said, if you don't have a section attribute, the compiler will create a section name with a ".text.", ".data.", or ".rodata." prefix before the function or variable name. Note, in C++ (the Arduino language), function names are mangled to include a description of the argument types (the return type of an ordinary function is not encoded).

So if you have the function:
Code:
int zero (void) { return 0; }

Generally, C++ will call this "_Z4zerov". The section name used would be ".text._Z4zerov".
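
A tiny sketch of that naming rule, covering only the simplest case: a global-namespace function that takes no arguments, where the mangled name is "_Z", the length of the name, the name, and "v". The helper names are hypothetical, and real mangling for functions with parameters, namespaces or templates is considerably more involved.

```cpp
#include <string>

// Mangled name for a global function taking no arguments,
// e.g. "zero" -> "_Z4zerov". Simplest case only.
std::string mangle_void_function(const std::string &name) {
  return "_Z" + std::to_string(name.size()) + name + "v";
}

// Section name the compiler would use under -ffunction-sections
// for a function with the given (already mangled or C) symbol name.
std::string text_section_for(const std::string &symbol) {
  return ".text." + symbol;
}
```

A function compiled as C, or declared `extern "C"`, keeps its plain name, so `zero` would land in ".text.zero" instead. Either way the name matches the `*(.text*)` pattern in the linker script.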

For details of the script, Paul would have to chime in, since he wrote it. But generally, ".text.progmem" and ".text.itcm" are just ways to collect all of the PROGMEM and ITCM elements. It does mean that if you had a C function named either "progmem" or "itcm" it might cause errors. But since the majority of code is C++, it is likely not much of an issue.
 
Perhaps you guys should write a book "Teensy for Dummies" or some such for such as I.

We do indeed need much more beginner-oriented tutorial material.

But diving into the details of linker scripts isn't quite what I would have in mind for topics to cover in such tutorials...
 
We do indeed need much more beginner-oriented tutorial material.

I'd be glad to help if I can.

But diving into the details of linker scripts isn't quite what I would have in mind for topics to cover in such tutorials...

Well, I'm not exactly a novice, I've been doing embedded software/hardware for over 40 years, mostly for medical devices, and most recently using Xilinx Zynq parts. I'm retired now, but I love playing with these little boards and fooling around with Teensy4.1 has been great fun. I have several "home" projects I want to do with Teensy4.1 once I get up to speed. I very much appreciate all the help I've gotten on this forum, there's a wealth of information here (if you can find it).
John
 