Teensy 4.1 "dual boot" capability?

Hi all-
I have a project with two FW options and would prefer to be able to execute either one at startup, probably based on an "EEPROM" (upper flash) byte setting. They will both fit in the flash together, no problem there. In fact, I'm doing that now and have code to copy the alt flash into the main location so it will be booted instead of the main FW. That works, but is of course fairly slow and does a lot of flash writing. I'd much rather be able to go directly into either at startup and not juggle flash contents. Is there any way to do this? Guessing one would need to be compiled for a different address, and possibly a 3rd image to make the decision at boot.

I realize this is unorthodox, but I do think it's the best solution in my case. The project (known as TeensyROM) is a ROM emulator. Through exhaustive testing, I know that RAM (1 and 2) are the only sources with fast enough latency to serve up ROM emulation data. Direct from flash or external (serial) RAM/Flash are not fast enough to serve the host bus. Therefore, RAM availability is critical. The main FW has support for Ethernet, USB Hosting, MIDI, etc. The mere inclusion of these libs takes up valuable RAM space (both code and variables). So, when I need to emulate a particularly large ROM, I load a stripped down version of FW to clear out as much RAM as I can muster and drop all those other features. I don't see a way to do this within a single image as there's no way to "unload" an included lib.

I know this won't be easy, but appreciate any thoughts on how to make this happen.
Thank you,
Travis
 
So the IMXRT chip in the Teensy has a built-in ROM, which is the first thing that runs when it boots. It's capable of loading program images from various sources (Flash, SD, Serial etc.) based on how the fuses are set, so I would guess the default Teensy config restricts it only to a flash chip connected to FlexSPI. But, in addition to just booting a program image, the ROM also supports "plugins" - a special type of image whose sole purpose is to load another image from a different source (e.g. USB or ethernet) and then return control to the ROM, which then continues to load the new image. So what I'm thinking is in theory it would be possible to have multiple main program images stored on the flash, with a plugin image stored at the beginning (where the ROM loads from) that selects which program to load, based on either saved config data or even a pin held high/low.

It would be a fair bit of low-level work to get it all working (lots of referencing chapter 9: system boot in the IMXRT1060 pdf and messing with build/linker scripts), but that's the "easiest" way I could think of for loading different images without constant reflashing.
 
Interesting, thank you for the thoughts. That definitely sounds like a potential path, similar to what I meant by a 3rd image to make the decision at boot. It would just choose which image to load, very short.
I'm a bit surprised this hasn't come up for others before. It seems like it could be a handy capability.
 
If you store two images (hex files) on SD, each able to switch to the other, whichever one is running could load the other with these steps: (1) read the image from SD and write it to a flash buffer above the active image, (2) call a routine (in RAM) to copy the buffer to the flash base address, erasing the lower sectors as it goes, and (3) erase the buffer so it is ready for the next swap. Would that be acceptable? This is the update method used in FlasherX, which you can find here (https://github.com/joepasquariello/FlasherX/releases/tag/v2.3). I use FlasherX to update Teensy via UART, but others use it to update via SD.
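To make the three steps concrete, here is a host-side simulation of the swap. It is not FlasherX's code: `FlashModel`, `writeBuffer()` and `moveBufferToBase()` are hypothetical names, and the 4 KB erase-sector size is an assumption. It only models the NOR flash rules that make the process slow (erase a whole sector to 0xFF, then program by clearing bits).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t kSectorSize = 4096;  // assumed 4 KB erase sector

// Host-side stand-in for NOR flash: erasing sets a whole sector to 0xFF,
// programming can only clear bits (1 -> 0).
struct FlashModel {
  std::vector<uint8_t> mem;
  explicit FlashModel(size_t size) : mem(size, 0xFF) {}

  void eraseSector(size_t addr) {
    size_t base = addr - addr % kSectorSize;
    std::memset(&mem[base], 0xFF, kSectorSize);
  }
  void program(size_t addr, const uint8_t* src, size_t len) {
    for (size_t i = 0; i < len; ++i) mem[addr + i] &= src[i];
  }
};

// Step (1): write the new image into the blank buffer above the active one.
void writeBuffer(FlashModel& f, size_t bufAddr, const std::vector<uint8_t>& image) {
  f.program(bufAddr, image.data(), image.size());
}

// Steps (2) and (3): copy the buffer to the base address, erasing each lower
// sector just before programming it, then erase the buffer for the next swap.
void moveBufferToBase(FlashModel& f, size_t bufAddr, size_t len) {
  assert(bufAddr % kSectorSize == 0);
  assert(len <= bufAddr);  // the buffer must not overlap the region being rewritten
  for (size_t off = 0; off < len; off += kSectorSize) {
    size_t chunk = (len - off < kSectorSize) ? len - off : kSectorSize;
    f.eraseSector(off);
    f.program(off, &f.mem[bufAddr + off], chunk);
  }
  for (size_t off = 0; off < len; off += kSectorSize)
    f.eraseSector(bufAddr + off);
}
```

The model also shows the risky window: partway through step (2) the base sectors hold a mix of old and new code, so a power cut there leaves nothing bootable.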
 
That's basically what I'm currently doing, and have borrowed (with credit given) from this lib extensively for easy flash update capability. :)
What I primarily don't like about this method is that it's very slow writing to flash, so the user has to wait longer to switch. Also concerned about writing flash so often, but I guess it can take quite a few writes. Lastly, more flash writes means greater possibility of interruption/corruption.
Looking for a faster method with less possibility of corruption.
 
Oh, okay. My applications are pretty small, so I think of the flash write as fast. Or perhaps I should say that I think of the slow part as being the parsing of the hex file and writing to the flash buffer. Once that is done, I think of the copy from buffer to flash base address as being fast. If you put an 8MB PSRAM or flash on the underside of T4.1, you could store both programs there on bootup, and then you would not have to read SD or buffer the new image to flash on updates. You would still be doing an erase/write on every switch. I have tested FlasherX with the new image stored in PSRAM and it works fine.
 
Hmmm... My implementation is taking about 20 sec to parse/write the hex file from SD (a little over 2MB of code data after parsing), however, the copy/erase portion takes another 73 seconds(!) I've always thought that was a lot, but I'm using flash_move() exactly as it was written. Any thoughts on that? Doesn't seem to match what you are saying, and certainly does seem long...
 
I tested with an application a bit over 2MB, and I get 20 seconds to read from SD and write to the flash buffer, and 57 seconds for flash_move(). Less than your 73 seconds, and I don't know why, but still on the order of 1 minute. I tried commenting out the erase of the buffer at the end of flash_move(), and that reduced the time to 37 seconds, still pretty long. I've recently done some benchmarking on writes to the 8MB flash, and a typical time is ~25 us per 32-bit value. If that is correct, it implies about 13 seconds to write 2MB, so that plus erase time would be the lower limit if you have to move code.
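The ~13 second estimate is just word count times per-word time. Here is a small helper to redo it for other image sizes, assuming a fixed 25 µs per 32-bit word (the benchmark figure above). It ignores erase time entirely, so it is a lower bound only. `programSeconds()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on flash programming time if every 32-bit word costs a fixed
// time. Partial trailing words are rounded up. Erase time is NOT included.
constexpr double programSeconds(uint32_t imageBytes, double usPerWord) {
  return ((imageBytes + 3) / 4) * usPerWord / 1e6;
}
```

For 2 MB (2,097,152 bytes) this gives 524,288 words × 25 µs ≈ 13.1 s. For the 2,177,024-byte image mentioned in the thread it gives about 13.6 s, well below the measured move time. The difference is the erase and other overhead.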
 
Thank you much for the testing/confirmation. My app was 2,177,024 bytes, and I measured 43s for the move and 30 for the erase. We're definitely in the same ballpark. Would be interested to hear if you find a way to move more towards the 13 second number! :)

So, you can see why I started this thread. :) It would be great to make that time non-existent if switching between only two images, and potentially often...
 
For an application as large as yours, I think buffering in PSRAM would be the fastest. I did a test, and the time to read SD and write to PSRAM is 5 seconds (compared to 20 to write to a flash buffer). The reduction of 15 seconds corresponds pretty well to my estimate of ~13 seconds to write to flash. The time for flash_move() from PSRAM to the flash base address is ~37 seconds, the same as I measured for the move from flash with no erase of the buffer. So that step is dominated by the time to erase/write the lower flash, and the total is still over 40 seconds. For TeensyLoader, I measure 17 seconds to erase and load the application, which is probably close to the lower limit.
 
@TravisSmith
Not sure how helpful this would be for you but I've been working on a bootloader type application for T4.1
We have self-updating firmware (using the FlasherX library) but needed a way to recover if power failed halfway through an update without resorting to connecting the USB cable.

The application sets a flag using the EEPROM library to indicate it is starting an upgrade. Then after writing the code to flash and before rebooting sets a flag indicating that the upgrade is complete.

We now have a small app that is built and run as a normal teensy application. This checks the EEPROM values for any indication that an update failed. It also checks the flash to see if it appears to contain an application by looking for the correct magic numbers in the expected locations. If everything checks out OK it runs the application.

If things look wrong for any reason, it doesn't run the application and instead looks for a .hex file on the SD card. If it finds one, it uses FlasherX to load that into the flash. After loading the .hex and before flashing it, some added checks ensure that the application start address is consistent with an application built to run under the bootloader rather than a standard Teensy app.
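Here is the start-up decision described above, reduced to plain logic. The post doesn't say which flag values it writes, so `UpdateFlag`, its values, and `decideBoot()` are all hypothetical. The one deliberate choice is that any unrecognised flag value is treated as an interrupted update. That errs toward recovery.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical EEPROM flag values (not from the original code).
enum UpdateFlag : uint8_t {
  kUpdateIdle     = 0xFF,  // erased EEPROM byte: no update ever started
  kUpdateStarted  = 0xA5,  // written before the first flash erase
  kUpdateComplete = 0x5A,  // written after the last flash write, before reboot
};

enum class BootAction { RunApplication, RecoverFromSd };

// Run the application only if no update was left half-finished AND the
// flash looks like it holds a valid image (magic numbers present).
BootAction decideBoot(uint8_t updateFlag, bool imageMagicOk) {
  bool updateOk = (updateFlag == kUpdateIdle || updateFlag == kUpdateComplete);
  return (updateOk && imageMagicOk) ? BootAction::RunApplication
                                    : BootAction::RecoverFromSd;
}
```

On the Teensy, `updateFlag` would come from something like `EEPROM.read()` at whatever address you choose, and `imageMagicOk` from a magic-number check like `checkForValidImage()` in the code later in this thread.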

There were a couple of small changes needed to flasherX for this to work - setting the EEPROM flags to indicate upgrade starting and success before rebooting, changing the flash start address so that it thinks the flash starts after the end of the bootloader application, checking the image was built for the correct flash address.
Also since the bootloader has very little memory usage I changed the default flasherX buffer location. Rather than using flash (which makes for a slow upgrade process) it uses external memory if fitted, failing that DMAMEM. The DMAMEM fallback limits your application flash image size to around 400k but that's plenty for us.


To build the main application, you need to change the linker file and bootdata.c to move the flash start address to after the end of the bootloader and reduce the available size accordingly. After changing those numbers in the two files, the standard Arduino IDE will build a suitable .hex file for use with the bootloader. Remember to change them back afterwards. The resulting .hex file needs to be written to the Teensy flash by the bootloader application rather than downloaded over USB. Other than those changes, you don't need to do anything special in the application code.
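Picking the new numbers is simple arithmetic, but it is easy to get wrong by hand. This helper computes the FLASH `ORIGIN`/`LENGTH` pair for an image confined to a window of flash and checks two windows for overlap. The 1 MB bootloader window and the 0x100000 app offset are illustrative only (the same offset as the `runApp(0x100000)` example in this thread). The 4 KB sector alignment is an assumption so that rewriting one image never erases part of another. The same start address also has to go into bootdata.c, as the post says.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFlashBase  = 0x60000000;  // FLASH_BASEADDRESS in the runApp() code
constexpr uint32_t kSectorSize = 4096;        // assumed erase-sector size

struct FlashRegion {
  uint32_t origin;  // value for ORIGIN of FLASH in the modified .ld file
  uint32_t length;  // value for LENGTH of FLASH in the modified .ld file
};

// FLASH region for an image confined to [startOffset, endOffset) bytes from
// the start of flash. Giving each image its own window means the linker
// reports an overflow instead of one image silently growing into another.
FlashRegion regionFor(uint32_t startOffset, uint32_t endOffset) {
  assert(startOffset % kSectorSize == 0 && endOffset % kSectorSize == 0);
  assert(startOffset < endOffset);
  return FlashRegion{kFlashBase + startOffset, endOffset - startOffset};
}

// True if two regions share any bytes.
bool overlaps(const FlashRegion& a, const FlashRegion& b) {
  return a.origin < b.origin + b.length && b.origin < a.origin + a.length;
}
```

For example, `regionFor(0x100000, 0x400000)` gives ORIGIN 0x60100000 and LENGTH 0x300000. The upper bound should stay below anything else that lives in flash (EEPROM emulation, LittleFS, etc.).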

The magic numbers in the flash are 0x42464346 as a uint32_t at the start of the flash image and 0x432000D1 at a location 0x1000 after the start of the flash image.
The address to jump to in order to start the application is stored at offset 0x1004 into the image.
Before jumping to that location I disable the MPU and invalidate and disable both caches.
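The two magic numbers aren't arbitrary. 0x42464346, read as a little-endian `uint32_t`, is the ASCII bytes "FCFB" (the tag of the FlexSPI NOR configuration block). 0x432000D1 is the i.MX RT Image Vector Table header: tag 0xD1, a big-endian length of 0x0020 (32 bytes), and version 0x43. The word after the header is the IVT's entry field, which is why the start address sits at +0x1004. Here is a minimal host-testable sketch of that check against a byte buffer. On the Teensy the pointer would simply be `(const uint8_t*)(0x60000000 + offset)`. `entryPointIfValid()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t kFcfbTag   = 0x42464346;  // "FCFB" as a little-endian uint32_t
constexpr uint32_t kIvtHeader = 0x432000D1;  // tag 0xD1, length 0x0020, version 0x43
constexpr size_t   kIvtOffset = 0x1000;      // IVT sits 0x1000 into the image

static uint32_t readLE32(const uint8_t* p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}

// Returns the IVT entry field (the word at +0x1004), or 0 if either magic
// number is missing or the buffer is too short to hold them.
uint32_t entryPointIfValid(const uint8_t* image, size_t len) {
  if (len < kIvtOffset + 8) return 0;
  if (readLE32(image) != kFcfbTag) return 0;
  if (readLE32(image + kIvtOffset) != kIvtHeader) return 0;
  return readLE32(image + kIvtOffset + 4);
}
```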


For your use case (switching between two applications) you could probably use a modified version of this. Build the two applications you want to run for two different, non-overlapping flash locations. The bootloader application could then 1) look at the image and decide which location to flash it to, and 2) on bootup either prompt the user, check some IO pin, or check the EEPROM to decide which image to run.

I have no idea what impact any of this would have on any flash file system libraries.


Edit - Thinking about your specific application (picking application at run time but not so worried about recovery from bad updates) I'd be tempted to simplify the bootloader application even more, all it does is check the EEPROM and launch the correct application. No need to be able to update the flash. The two applications could then be built with different start addresses by modifying bootdata.c and the linker file, you'd need to check the size of each one to ensure you don't overlap.
This gives you 3 .hex files, each with a different start address in memory. I'm fairly certain you could find a tool that would combine these into a single .hex file. You could then use the standard USB teensy download tool to load that .hex into the device.
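If no ready-made tool is at hand, a naive Intel HEX merge is short enough to sketch. Here is a minimal version, assuming each file uses type-04 (extended linear address) records, which is what 32-bit addresses like 0x6000xxxx require. It does not check for overlapping data, and I haven't verified how Teensy Loader treats the start-address record it keeps. `mergeHex()` is a hypothetical helper, not part of any Teensy tool.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Record type: the two hex digits after ":LLAAAA", i.e. characters 7-8.
static std::string recordType(const std::string& line) {
  return (line.size() >= 11 && line[0] == ':') ? line.substr(7, 2) : std::string();
}

// Naive Intel HEX merge: concatenate the files, dropping every end-of-file
// (01) record, keeping only the first start-address (03/05) record, and
// resetting the upper address to 0 at the start of each file so one file's
// extended-address record cannot leak into the next.
std::string mergeHex(const std::vector<std::string>& files) {
  std::string out;
  bool haveStart = false;
  for (const std::string& file : files) {
    out += ":020000040000FA\n";  // extended linear address = 0x0000
    std::istringstream in(file);
    std::string line;
    while (std::getline(in, line)) {
      if (!line.empty() && line.back() == '\r') line.pop_back();
      const std::string type = recordType(line);
      if (type.empty() || type == "01") continue;  // blank lines, EOF records
      if (type == "03" || type == "05") {
        if (haveStart) continue;
        haveStart = true;
      }
      out += line + "\n";
    }
  }
  return out + ":00000001FF\n";  // single end-of-file record
}
```

List the selector/bootloader image first so that its start address, if any, is the one kept.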
 
@AndyA
Thank you so much, this sounds like the perfect solution! I wonder if it could be further simplified by combining the bootloader app with one of the main images (probably the stripped down one) and just jumping to the other immediately after start if needed? I'm not sure if any of the info from one image (ie RAM2 allocation) would carry into the second app or if it would be truly starting from scratch (as desired). Maybe that's a reason to keep the "bootloader" image separate and small as possible.
I haven't modified the linker script before. I should be able to figure that out, unless you have some specific pointers (no pun intended)? Everything else sounds quite straightforward.
Thank you again! -Travis
 
Yes, you could make it so that the code for mode 1 checks for the mode 2 flag and jumps to that code on startup. That would indeed cut it down to only 2 applications you need to build.
The nice thing about doing it this way is that once the application starts to run, it's as if it's the only application that has ever run. All the memory configuration and initialisation is done from scratch as if that were the only code installed (other than the flash addresses used), but without the need to move things around in the flash each time.
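Here is a sketch of that two-image variant. The image at the start of flash (the one the boot ROM starts) checks a mode byte first thing and hands over to the stripped-down image if asked. The offset 0x400000, the mode value, `kModeAddress` and `bootTarget()` are all hypothetical. Anything other than the "minimal" value, including an erased 0xFF, keeps running the current image, so a blank EEPROM still boots.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout and EEPROM encoding, for illustration only.
constexpr uint32_t kMinimalImageOffset = 0x400000;  // stripped-down firmware
constexpr uint8_t  kModeMinimal        = 0x02;      // EEPROM value selecting it

// Returns the flash offset to pass to runApp(), or -1 to keep running the
// image that is already executing.
int32_t bootTarget(uint8_t modeByte) {
  return (modeByte == kModeMinimal) ? static_cast<int32_t>(kMinimalImageOffset) : -1;
}

// On the Teensy, roughly (kModeAddress is a hypothetical EEPROM address):
//   void setup() {
//     int32_t target = bootTarget(EEPROM.read(kModeAddress));
//     if (target >= 0) runApp(target);  // only returns if no valid image there
//     ...normal setup...
//   }
```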

I can't really post the full code; it's all mixed in with decoding proprietary encrypted upgrade files etc. I can probably give you some code snippets for the relevant sections if you hit an issue, though. As with most things, the bit that took the longest was working out what needed to be done, not actually coding it up. The address of the first instruction was an odd number, and I spent ages looking for a mistake in my logic because it didn't seem right that a 32-bit system would have an odd start address for the application code.
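The odd address is expected. On ARMv7-M, bit 0 of a branch target passed to BX/BLX (which is how a call through a function pointer is made) selects Thumb state. Cortex-M7 only executes Thumb code, so a valid entry point always has bit 0 set. A target with bit 0 clear would fault (UsageFault, INVSTATE) rather than run. That is why the raw value from the IVT can be called without masking. Here are two helpers (hypothetical names) that make the distinction explicit:

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 of a branch target carries the Thumb state; Cortex-M requires it set.
constexpr bool isThumbEntry(uint32_t entry) { return (entry & 1u) != 0; }

// Address of the first instruction actually fetched: bit 0 cleared.
constexpr uint32_t instructionAddress(uint32_t entry) { return entry & ~1u; }
```

A sanity check on an entry point could reasonably reject even values as a sign of a corrupted or mis-built image.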

What I did was create a copy of the linker file with a .original extension and then save the modified version with a .bootloader extension. I then have two batch files, titled Build app and Build bootloader, which copy the appropriate version to the location the standard build tools use. That's far less error-prone than having to edit it manually. I'm sure there are ways to automate it further, but that's good enough for me for now.
 
Partly in case you need it and partly for anyone finding this thread later on, here's a somewhat stripped down block of code.

If your second application was built to start 0x100000 bytes into the flash (i.e. FLASH in the .ld file was modified to start at 0x60100000), then
runApp(0x100000);
will attempt to run that code. If it doesn't look like there is a valid image at that location it will return false.

Since this is assembled from various bits rather than copied directly from the compiler there may be some syntax errors I've missed.

Also https://developer.arm.com/documentation/ka003262/latest/ links to a tool that will merge the two .hex files for you.

Code:
#include <Arduino.h>

const uint32_t FLASH_BASEADDRESS = 0x60000000;  // FlexSPI flash is memory-mapped here

void disableCache();  // defined below; declared here so runApp() can call it

// True if the FlexSPI config block tag ("FCFB") is at the start of the image
// and the IVT header is 0x1000 bytes in.
bool checkForValidImage(uint32_t addressInFlash) {
  uint32_t SPIFlashConfigMagicWord = *((const uint32_t*)addressInFlash);
  uint32_t IVTHeaderMagicWord = *((const uint32_t*)(addressInFlash + 0x1000));
  if ((SPIFlashConfigMagicWord == 0x42464346) && (IVTHeaderMagicWord == 0x432000D1))
    return true;

  Serial.println("Invalid magic numbers, no flash image found");
  return false;
}

typedef void (*pFunction)(void);

FLASHMEM bool runApp(uint32_t offsetFromStart) {
  uint32_t imageStartAddress = FLASH_BASEADDRESS + offsetFromStart;
  if (!checkForValidImage(imageStartAddress))
    return false;

  // The IVT starts 0x1000 after the start of the image. The address of the
  // first instruction is its 2nd word (the IVT "entry" field).
  uint32_t firstInstructionPtr = imageStartAddress + 0x1000 + sizeof(uint32_t);
  Serial.printf("First instruction pointer is at address 0x%08X\r\n", (unsigned)firstInstructionPtr);
  uint32_t firstInstructionAddr = *(const uint32_t*)firstInstructionPtr;

  // Very basic sanity check: code should start after the IVT but not too far into the image.
  // (The address will be odd: bit 0 marks Thumb code.)
  if ((firstInstructionAddr < (imageStartAddress + 0x1000)) || (firstInstructionAddr > (imageStartAddress + 0x3000))) {
    Serial.printf("Address of first instruction 0x%08X isn't sensible for location in flash. Image was probably incorrectly built\r\n", (unsigned)firstInstructionAddr);
    return false;
  }
  Serial.printf("Jumping to code at 0x%08X\r\n", (unsigned)firstInstructionAddr);
  delay(10);  // give the serial port time to output so we see that message

  pFunction Target_Code_Address = (pFunction)firstInstructionAddr;
  disableCache();
  Target_Code_Address();
  while (true) {  // never reached if the jump succeeded
    Serial.println("Shouldn't be here");
    delay(1000);
  }
}

// Disable the MPU, SysTick and all NVIC interrupts, then clean/invalidate and
// disable both L1 caches. Adapted from SCB_DisableDCache()/SCB_DisableICache()
// in CMSIS core_cm7.h. The registers are defined locally from the ARMv7-M
// architectural addresses so core_cm7.h isn't needed (Teensy's imxrt.h may
// already provide equivalents).
#define CM7_CCR      (*(volatile uint32_t *)0xE000ED14)  // Configuration and Control
#define CM7_CCSIDR   (*(volatile uint32_t *)0xE000ED80)  // Cache Size ID
#define CM7_CSSELR   (*(volatile uint32_t *)0xE000ED84)  // Cache Size Selection
#define CM7_ICIALLU  (*(volatile uint32_t *)0xE000EF50)  // I-cache invalidate all
#define CM7_DCCISW   (*(volatile uint32_t *)0xE000EF74)  // D-cache clean+invalidate by set/way
#define CM7_CCR_DC   (1UL << 16)
#define CM7_CCR_IC   (1UL << 17)

FLASHMEM void disableCache() {
  SCB_MPU_CTRL = 0;  // turn off MPU
  SYST_CSR = 0;      // turn off system tick
  // Disable every interrupt individually in the NVIC (not via __disable_irq()).
  for (int i = 0; i < NVIC_NUM_INTERRUPTS; i++) {
    NVIC_DISABLE_IRQ(i);
  }

  CM7_CSSELR = 0;  // select level 1 data cache
  asm volatile("dsb");

  CM7_CCR &= ~CM7_CCR_DC;  // disable D-cache
  asm volatile("dsb");

  // Clean & invalidate the whole D-cache by set/way. CCSIDR holds
  // (number of sets - 1) in bits 27:13 and (number of ways - 1) in bits 12:3.
  uint32_t ccsidr = CM7_CCSIDR;
  uint32_t sets = (ccsidr >> 13) & 0x7FFF;
  do {
    uint32_t ways = (ccsidr >> 3) & 0x3FF;
    do {
      // Cortex-M7 L1 D-cache: set index at bit 5, way at bits 31:30.
      CM7_DCCISW = ((sets << 5) & (0x1FFUL << 5)) | ((ways << 30) & (3UL << 30));
    } while (ways-- != 0U);
  } while (sets-- != 0U);

  asm volatile("dsb");
  asm volatile("isb");
  CM7_CCR &= ~CM7_CCR_IC;  // disable I-cache
  CM7_ICIALLU = 0;         // invalidate I-cache
  asm volatile("dsb");
  asm volatile("isb");
}

edit- added disable systick line to the disable cache function.
edit2- added interrupt disable to the disable cache function.
 
Yeah, sounds entirely possible. On the Due I wrote some routines to boot into one of two zones. This worked well because the Due has a selector bit that chooses one of two flash banks at startup, so you can switch. You do then need your interrupt vectors to point to the correct sketch, which usually involves a custom linker script, as has already been said. It's all perfectly feasible and I think the ideas and solutions given sound good, so I guess this is just a message of encouragement. I've done it on a Due, which is likewise not really meant for this sort of thing (despite its bank-switching capability).
 
One additional detail.
Before jumping to the new application include the line

Code:
SYST_CSR = 0; // turn off system tick

For simple applications this isn't needed.
But if your application includes ethernet support then it'll crash on startup without that.
Only took most of a day to figure that one out :-(
 
One further addition, found because it didn't work on a different system:
If data is coming in constantly on external interfaces, you also need to disable all the peripheral interrupts in the NVIC before jumping to the new application, but not by using __disable_irq().
Post #15 updated to include new changes.
 