DMAMEM / RAM2 and detecting the first boot after power-on, or alternatively detecting a warm boot

Schaeg · Aug 1, 2024

According to official documentation the Teensy 4.1 comes with two RAM banks.

https://imgur.com/a/rzECetc

The RAM2 bank, also called DMAMEM, has the interesting property that it isn't reset by a warm reboot, which can be triggered by pressing the white button on the board or by a watchdog timer.

For a larger project I want to keep track of the number of sequential warm reboots between cold reboots. For this I need to be able to distinguish between the two types.

The Teensy will be used without a host computer connected to it. To test this functionality I adapted Teensy tutorial number 3 like this.

C:

void setup()   { 
  static int DMAMEM bootcount = 0; // this is non-sensical
      
  Serial.begin(38400);
  delay(1000); // give user time to restart serial console to not miss first message.
  Serial.printf("I've been rebooted %d times", bootcount);
  bootcount += 1;
  arm_dcache_flush(&bootcount, sizeof(bootcount)); // without this i've never observed the counter being incremented
}

void loop()                 
{
  Serial.println("Hello World");
  delay(1000);
}

Using this code one observes the following.
Flash the code using the Arduino IDE and connect to serial via the command

Bash:

cu -l /dev/ttyACM0 -s 38400

Connected.
I've been rebooted -1954828805 timesHello World
Hello World
Hello World
Hello World
...

After pressing the white button to warm-restart the Teensy, the serial connection dies

cu: Got hangup signal

Disconnected.

If one quickly reconnects to serial using the same command as before one sees

Connected.
I've been rebooted -1954828804 timesHello World
Hello World
Hello World
Hello World
....

Repeating the same cycle one observes

cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1954828803 timesHello World
Hello World
Hello World
Hello World
cu: Got hangup signal

Disconnected.

Thus the count-up mechanism works. However, the starting point depends on the value found in uninitialized memory, which can be arbitrary. If the Teensy 4.1 is power cycled, it starts counting from a different value.

anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971638790 timesHello World
Hello World
Hello World
cu: Got hangup signal
Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971638789 timesHello World
Hello World
Hello World
Hello World
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971638788 timesHello World
Hello World
Hello World
Hello World
Hello World
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971638787 timesHello World
Hello World
Hello World
Hello World
cu: Got hangup signal

Disconnected.

I am looking for a way to reliably detect a first boot or, alternatively, to detect a warm boot. Elsewhere in the forum I found this snippet of code.

C:

bool isWarmBoot()
{
    static DMAMEM unsigned bootCheck; // DMAMEM is not zeroed during bootup

    if (bootCheck != 0xAAAA'AAAA)
    {
        bootCheck = 0xAAAA'AAAA;                         // some number
        arm_dcache_flush(&bootCheck, sizeof(bootCheck)); // dmamem is cached, force a write to memory
        return false;
    }
    return true;
}

However, it doesn't actually detect warm boots. It just compares an arbitrary constant with uninitialized memory. This test will mistake a cold boot for a warm boot if the uninitialized memory happens to contain the expected value at power-up. While one could add more tests of the same kind to reduce the likelihood of such a mistake, one can't eliminate the chance this way, and I am fundamentally dubious of having the correctness of the program depend on the contents of uninitialized memory.
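To make the "add more tests" idea concrete: one common way to shrink (though never eliminate) the chance of accepting random power-up contents is to store the counter together with a magic word and a bitwise-inverted copy, and to accept the record only when everything agrees. The following is a host-side model of that check, not Teensy code; `BootRecord`, `kMagic`, `storeCount`, `recordLooksValid` and `updateOnBoot` are hypothetical names. On the Teensy the record would presumably live in DMAMEM and each write would be followed by `arm_dcache_flush()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for a boot counter kept in memory that is not
// cleared on reset. All names here are illustrative.
struct BootRecord {
    uint32_t magic;     // fixed marker
    uint32_t count;     // warm-boot counter
    uint32_t countInv;  // bitwise complement of count
};

constexpr uint32_t kMagic = 0xB007C0DEu;  // arbitrary marker value

// Store a counter value together with its redundancy fields.
inline void storeCount(BootRecord& r, uint32_t count) {
    r.magic = kMagic;
    r.count = count;
    r.countInv = ~count;
}

// True only if the marker and the complement both match. This still reads
// whatever happens to be in memory; it only makes a false match less likely.
inline bool recordLooksValid(const BootRecord& r) {
    return r.magic == kMagic && r.countInv == ~r.count;
}

// Returns the new counter: previous + 1 if the record validates, else 0.
inline uint32_t updateOnBoot(BootRecord& r) {
    const uint32_t next = recordLooksValid(r) ? r.count + 1u : 0u;
    storeCount(r, next);
    return next;
}
```

This does not answer the objection above. It still trusts memory that is uninitialized after a cold boot, so it only lowers the odds of a false positive.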

Is there a better approach to distinguish cold vs warm boot?
A better way would rely only on documented behavior, and not on undocumented aspects of proprietary software (such as undocumented behavior of the closed-source bootloader, which isn't guaranteed to stay the same) or of proprietary hardware (mechanisms whose behaviour differs between warm and cold reboot but which aren't documented in the official docs of the chip or the ISA).

EDIT: Replace with rewritten text used to create a duplicate because i was unaware that a partial question was submitted
EDIT2: Clarify i am interested in the number of soft reboots between power cycles not the total number of boots.

jmarsh · Aug 1, 2024

This really isn't a good way of doing it (relying on memory contents staying constant between reboots) since it invokes undefined behaviour - static variables are meant to have a defined initial value (0 in your example sketch) or the compiler can assume they will be initialized to zero.
Instead I would recommend checking the SRC_SRSR register which can give specific information about what caused the reboot. There is more information on its layout/purpose in the IMXRT1062 reference manual.
If you need to keep track of how many times a reboot has occurred, I would suggest using the EEPROM library.
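For reference, decoding the reset-cause flags in SRC_SRSR comes down to a handful of mask tests. The sketch below is a host-side model that takes the register value as a plain integer. The mask values for the user-reset, watchdog, JTAG and temperature-sensor flags are the ones that appear in the serial output later in this thread, and bit 0 is the flag treated as "power on" later in the thread. The constant names and `describeResetCause` are hypothetical, and the bit meanings should be checked against the IMXRT1062 reference manual before relying on them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit masks as printed later in this thread; names are illustrative only.
constexpr uint32_t kPowerOn   = 1u << 0;  // treated as the power-on flag in this thread
constexpr uint32_t kIppUser   = 1u << 3;  // 0x008
constexpr uint32_t kWdog      = 1u << 4;  // 0x010
constexpr uint32_t kJtag      = 1u << 5;  // 0x020
constexpr uint32_t kJtagSw    = 1u << 6;  // 0x040
constexpr uint32_t kWdog3     = 1u << 7;  // 0x080
constexpr uint32_t kTempSense = 1u << 8;  // 0x100

// Builds a '|'-separated list of the flags that are set in `srsr`.
std::string describeResetCause(uint32_t srsr) {
    struct Flag { uint32_t mask; const char* name; };
    static const Flag flags[] = {
        {kPowerOn, "power-on"}, {kIppUser, "ipp-user"}, {kWdog, "wdog"},
        {kJtag, "jtag"}, {kJtagSw, "jtag-sw"}, {kWdog3, "wdog3"},
        {kTempSense, "tempsense"},
    };
    std::string out;
    for (const Flag& f : flags) {
        if (srsr & f.mask) {
            if (!out.empty()) out += '|';
            out += f.name;
        }
    }
    if (out.empty()) return "none";
    return out;
}
```

On the Teensy one would pass the value read from `SRC_SRSR` once, early in `setup()`.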

sbfreddie · Aug 1, 2024

Be aware! The white button on the Teensy is not a restart or reset button, it merely puts the Teensy into program mode.
There is no reset button on any Teensy currently shipping.

Regards,
Ed

shawn · Aug 1, 2024

Also note that DMAMEM variables can’t be statically initialized.

Ref: https://www.pjrc.com/store/teensy41.html

Schaeg · Aug 1, 2024

Right ... I had the Arduino IDE running in automatic mode on the computer I connected to via serial.

sbfreddie said:
Be aware! The white button on the Teensy is not a restart or reset button, it merely puts the Teensy into program mode.
There is no reset button on any Teensy currently shipping.

Would you call it inaccurate to call pressing the white button a warm restart under those circumstances? If so why?

shawn said:
Also note that DMAMEM variables can’t be statically initialized.

Yes and my experiments confirmed that already.

After writing a program which inspects a bunch of registers and sends them via serial, I learned:

Some parts of (struct arm_fault_info_struct *)0x2027FF80 behave like RAM2 as far as I can tell. Others don't change.
I can't use any of these registers to distinguish pressing the white button (which preserves RAM2) from unplugging and replugging: SRC_SRSR, SRC_SBMR1, SRC_SBMR2, SRC_GPR5, SRC_SRSR_CSU_RESET_B, SRC_SRSR_WDOG_RST_B, SRC_SRSR_WDOG3_RST_B, SRC_SRSR_JTAG_RST_B, SRC_SRSR_JTAG_SW_RST, SRC_SRSR_TEMPSENSE_RST_B

C:

static volatile uint32_t DMAMEM bootcount;


static volatile uint32_t SRSR;
static volatile uint32_t SBMR1;
static volatile uint32_t SBMR2;
struct arm_fault_info_struct *info = (struct arm_fault_info_struct *)0x2027FF80;
struct crashreport_breadcrumbs_struct *bc = (struct crashreport_breadcrumbs_struct *)0x2027FFC0;


uint32_t gpr5;
uint32_t csu_reset;
uint32_t ipp_user;
uint32_t wdog_rst;
uint32_t wdog3_rst;
uint32_t jtag_b_rst;
uint32_t jtag_d_rst;
uint32_t temp_rst;


void setup()   {
     
  Serial.begin(38400);
  bootcount += 1;

  SRSR = SRC_SRSR;
  SBMR1 = SRC_SBMR1;
  SBMR2 = SRC_SBMR2;
  gpr5 = SRC_GPR5;
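  // Note: the SRC_SRSR_* names below are bit-mask constants, not register reads,
  // so these variables hold the same masks on every boot. Test (SRSR & mask) to
  // see whether a given reset flag is actually set.
  csu_reset = SRC_SRSR_CSU_RESET_B;
  ipp_user = SRC_SRSR_IPP_USER_RESET_B;
  wdog_rst = SRC_SRSR_WDOG_RST_B;
  wdog3_rst = SRC_SRSR_WDOG3_RST_B;
  jtag_b_rst = SRC_SRSR_JTAG_RST_B;
  jtag_d_rst = SRC_SRSR_JTAG_SW_RST;
  temp_rst = SRC_SRSR_TEMPSENSE_RST_B;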
 



  arm_dcache_flush((void*)&bootcount, sizeof(bootcount)); // without this i've never observed the counter being incremented
};

void loop()                
{
  Serial.printf("I've been rebooted %d times", bootcount);
  Serial.println("");
  Serial.printf("SRSR: %08x, SBMR1: %08x, SBMR2: %08x, gpr5: %08x, ipp: %08x, wdog: %08x, wdog3: %08x, jb: %08x, jd: %08x, temp: %08x ",
   SRSR, SBMR1, SBMR2, gpr5, ipp_user, wdog_rst, wdog3_rst, jtag_b_rst, jtag_d_rst, temp_rst);
  Serial.println("");
  Serial.printf("%08x %08x %08x %08x %08x  %08x %08x %08x %08x",
  info->len, info->ipsr, info->cfsr, info->hfsr, info->mmfar,
   info->bfar, info->ret, info->xpsr,  info->time);
  Serial.println("");
  delay(1000);
}

anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963233789 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
d87ff72e c38440a5 6cbfd2d8 22b2225 1f39f9de 89083eb8 18e7d5e9 49a00488 16fe43d1

cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963233788 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
d87ff72e c38440a5 6cbfd2d8 22b2225 1f39f9de 89083eb8 18e7d5e9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963233787 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
d87ff72e c38440a5 6cbfd2d8 22b2225 1f39f9de 89083eb8 18e7d5e9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1954828806 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
fc6ff72c c3a440a7 acbfd2d8 2232221 1f79f95e cc0836b8 1ce7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1954828805 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
fc6ff72c c3a440a7 acbfd2d8 2232221 1f79f95e cc0836b8 1ce7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1694765574 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
f8eff72e c38440a5 ecbfd2fa 223a221 1f79f9de c80836b0 18e7d5f1 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1694765573 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
f8eff72e c38440a5 ecbfd2fa 223a221 1f79f9de c80836b0 18e7d5f1 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963250180 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
f86ff73c c2a440b7 3cbfd2f8 22b2225 1f79f95e c90836b8 39e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963250179 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
f86ff73c c2a440b7 3cbfd2f8 22b2225 1f79f95e c90836b8 39e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963201062 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
fc6ff72c c3a440a7 2cbfd2d8 22b2221 1f79f85e c80836b8 1ce7d5f9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963201061 times
SRSR: 0001, SBMR1: 0000, SBMR2: 0019, gpr5: 0000, ipp: 0008, wdog: 0010, wdog3: 0080, jb: 0020, jd: 0040, temp: 0100
fc6ff72c c3a440a7 2cbfd2d8 22b2221 1f79f85e c80836b8 1ce7d5f9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963217413 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
fc6ff72e c3a440a7 2cbfd2d8 02332225 1f79f9de cc0836b8 18e7d5e9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1963217412 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
fc6ff72e c3a440a7 2cbfd2d8 02332225 1f79f9de cc0836b8 18e7d5e9 49a00488 16fe43d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1694781958 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
f86ff72e c3a450a7 2cbfd2d8 02232225 1f79f9df cc0836b8 18e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971606052 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
fc6ff72e c3a440a7 acbfd2d8 02232221 1f79f9de cd0836b8 38e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal
Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971606051 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
fc6ff72e c3a440a7 acbfd2d8 02232221 1f79f9de cd0836b8 38e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.
anabrid@think17:~$ cu -l /dev/ttyACM0 -s 38400
Connected.
I've been rebooted -1971606050 times
SRSR: 00000001, SBMR1: 00000000, SBMR2: 00000019, gpr5: 00000000, ipp: 00000008, wdog: 00000010, wdog3: 00000080, jb: 00000020, jd: 00000040, temp: 00000100
fc6ff72e c3a440a7 acbfd2d8 02232221 1f79f9de cd0836b8 38e7d5f9 49a00488 16fe47d1
...
cu: Got hangup signal

Disconnected.

My best candidate solution so far is to treat a reboot whose reason is a specific watchdog as a sign that a certain value has been initialized. Here is some pseudo code using watchdog 3 in the boot process:

Code:

static uint32_t DMAMEM warm_bootcount;
static bool DMAMEM maybe_booted_before;
static bool DMAMEM can_trust_RAM2_to_be_good_warm_boot;

void setup()   {
  can_trust_RAM2_to_be_good_warm_boot = false;
  if(wdog_3_was_restart_reason() || app_was_reason() ){
    if(maybe_booted_before) {
      warm_bootcount +=1;
      arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));    
    } else {
      warm_bootcount = 0;
      arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));
      init_all_other_ram2_values_if_they_are_assumed_to_have_a_value_set_after_a_non_warm_boot_also_flush();
      can_trust_RAM2_to_be_good_warm_boot = true;
      arm_dcache_flush(&can_trust_RAM2_to_be_good_warm_boot, sizeof(can_trust_RAM2_to_be_good_warm_boot));
      maybe_booted_before = true;
      arm_dcache_flush(&maybe_booted_before, sizeof(maybe_booted_before));
    }
  } else {
    maybe_booted_before = false;
    arm_dcache_flush(&maybe_booted_before, sizeof(maybe_booted_before));
    set_up_wdog_3();
    while(true) delay(irrelevant_amount); // wdog 3 will trigger.
  }
  // unreachable unless can_trust_RAM2_to_be_good_warm_boot is true

  initProgram();

};

I am not very happy with that solution, as additional complexity is required in a few places to make this work. This solution will never count a cold start as a warm start, but any restart caused by the application other than through watchdog 3 will be seen as a cold start. One needs to be able to anticipate all of those in app_was_reason() without relying on uninitialized memory there. I hope there is a better way which matches the criteria outlined in my first post.

jmarsh · Aug 1, 2024

Schaeg said:
Would you call it inaccurate to call pressing the white button a warm restart under those circumstances? If so why?

Yes it is inaccurate because pressing the button doesn't trigger a reboot. It puts the Teensy into programming mode. If there is no USB cable connected, or the teensy loader program isn't currently running on the PC to automatically upload the last compiled sketch, the Teensy will just sit in programming mode indefinitely.

Pressing the button should not be considered a warm reboot because it generally reprograms the Teensy, which is basically a cold boot - it is now running a completely different program than before, even if it's identical code.

Again: it is not safe to assume DMAMEM will not be modified by a reboot. There are definitely parts of it that will be overwritten and there is no guarantee your static variables won't be allocated in those areas.

Schaeg · Aug 1, 2024

jmarsh said:
Yes it is inaccurate because pressing the button doesn't trigger a reboot. It puts the Teensy into programming mode.

You are answering a question distinct from the one I asked. The "under those circumstances" is load-bearing. "Under those circumstances" the first hypothetical is irrelevant.

jmarsh said:
If there is no USB cable connected, or the teensy loader program isn't currently running on the PC to automatically upload the last compiled sketch, the Teensy will just sit in programming mode indefinitely.

Pressing the button should not be considered a warm reboot because it generally reprograms the Teensy, which is basically a cold boot - it is now running a completely different program than before, even if it's identical code.

If I add to "under those circumstances" that the program on the PC is unchanged, the second hypothetical becomes irrelevant too. This situation seems to be (to my limited knowledge, which is why I am asking) indistinguishable from a warm boot while distinct from a cold boot: pressing the white button keeps the RAM2 content active and unchanged, so it is not really a cold boot, where RAM2 is uninitialized.

I am asking whether I am missing anything else which would make that distinct from a warm boot. This is a tangent I raised to further and test my understanding.

jmarsh · Aug 1, 2024

Schaeg said:
as pressing the white button keeps the RAM2 content active and unchanged

It doesn't.
Write the entire contents of RAM2, perform a "warm" reboot and then check the entire contents. You will see it has not all been preserved. Then realize that your static variables can be allocated anywhere in DMAMEM (by the linker) depending on how other code in the libraries you use make use of DMAMEM.

Schaeg · Aug 1, 2024

jmarsh said:
Then realize that your static variables can be allocated anywhere in DMAMEM (by the linker) depending on how other code in the libraries you use make use of DMAMEM.

This isn't relevant as the program was assumed identical but yeah if the program changes (or the dependency) all bets are off.

jmarsh said:
It doesn't.
Write the entire contents of RAM2, perform a "warm" reboot and then check the entire contents. You will see it has not all been preserved.

You are right. It's not the entire content which is preserved. In all my past experiments all variables i allocated in SRAM were preserved always perfectly when pressing the white button. However an entire RAM dump made using this program finds differences. Thank you for making me aware of this.

C:

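void setup() {
  // RAM2 is 512 KiB, i.e. 512 * 1024 = 524288 bytes, starting at 0x20200000.
  const uint32_t RAM2_SIZE = 512 * 1024;
  const uint8_t *RAM2 = (const uint8_t *)0x20200000;
  Serial.begin(38400);
  while (!Serial) {
    ; // Wait for Serial connection.
  }
  for (uint32_t i = 0; i < RAM2_SIZE; i++) {
    if (i % 64 == 0) {
      // Start a new row with the address of its first byte.
      Serial.println("");
      Serial.printf("[%08lx] :", (unsigned long)(uintptr_t)(RAM2 + i));
      Serial.println("");
    } else if (i % 8 == 0) {
      Serial.print(" ");
    }
    // Two hex digits per byte, so that e.g. 0x01 0x23 and 0x12 0x03 print differently.
    Serial.printf("%02x", RAM2[i]);
  }
  Serial.println("");
  Serial.println("DONE.");
  Serial.flush();
  Serial.end();
}

void loop() {
  delay(1000);
}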

dump1.txt, dump2.txt and dump3.txt were created after pressing the white button, while the much more different dump1-1.txt was created after disconnecting USB and rerunning the program. The first three files are much more similar to each other, but not identical. I have no reason to believe my allocated variables are special somehow. I was just lucky that I never noticed them differing between white button presses.
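To see which regions change rather than just that the dumps differ, the dumps can be compared byte by byte and the differing stretches collected as address ranges. Below is a small host-side sketch of that comparison. It assumes the dumps have already been parsed back into byte vectors of equal length (the parsing is not shown), and `differingRanges` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns half-open [begin, end) index ranges in which the two dumps differ.
// Adding the RAM2 base address (0x20200000) to an index gives the address.
std::vector<std::pair<std::size_t, std::size_t>>
differingRanges(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    assert(a.size() == b.size());  // both dumps must cover the same region
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i < a.size()) {
        if (a[i] == b[i]) {
            ++i;
            continue;
        }
        const std::size_t begin = i;
        while (i < a.size() && a[i] != b[i]) ++i;  // extend over the mismatch run
        ranges.emplace_back(begin, i);
    }
    return ranges;
}
```

Running this on pairs of dumps taken across the same kind of restart should show whether the changed bytes cluster in a few regions or are scattered.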

Since pressing the white button obviously isn't doing a warm restart is there a complete list of options which can cause a warm restart (and how to trigger them)?

defragster · Aug 1, 2024

For sure the Button on All Teensy units takes the processor offline to Program mode by the separate PJRC bootloader chip. This is part of the design to make a Teensy 'unbrickable'. It is not a 'reset' or restart in any other way.

The RAM2/DMAMEM does get changed in the lower portion some 32 or 64 KB as it passes through the boot startup.

If the RAM2 addresses are fixed and KNOWN and unchanging ( static allocs ) and outside the changed area it can be counted on to survive a 'warm restart' - but only data that has been flushed from the CACHE. LittleFS RAM drives used a fixed allocation in this area and with a FLUSH on writes - the drive survived warm restart ... until that 'feature' was determined to be 'for testing' and was replaced with 'Format' on each restart.
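As an illustration of the "fixed and known address" idea, the arithmetic for placing a small record directly below the CrashReport area can be written down as compile-time constants. This is only a sketch. The RAM2 base (0x20200000), the 512 KB size and the CrashReport address (0x2027FF80) are taken from this thread, the 32-byte figure is the Cortex-M7 data cache line size, and `recordAddress` and the constant names are hypothetical. Nothing here reserves the space, so whether the heap or other DMAMEM users can reach that area in a given program is an assumption to verify.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kRam2Base        = 0x20200000u;            // start of RAM2
constexpr uint32_t kRam2Size        = 512u * 1024u;           // 524288 bytes
constexpr uint32_t kRam2End         = kRam2Base + kRam2Size;  // one past the last byte
constexpr uint32_t kCrashReportBase = 0x2027FF80u;            // top 128 bytes of RAM2
constexpr uint32_t kCacheLine       = 32u;                    // Cortex-M7 D-cache line size

// The CrashReport area is exactly the last 128 bytes of RAM2.
static_assert(kRam2End - kCrashReportBase == 128u, "unexpected CrashReport size");

// Address for a record of `size` bytes placed just below the CrashReport
// area, rounded down to a cache-line boundary so a flush of the record
// does not share a line with the CrashReport data.
constexpr uint32_t recordAddress(uint32_t size) {
    return (kCrashReportBase - size) & ~(kCacheLine - 1u);
}

static_assert(recordAddress(12u) >= kRam2Base, "record must lie inside RAM2");
```

On the Teensy the result would be cast to a pointer to the record type, and every write would be followed by `arm_dcache_flush()` on that address.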

Schaeg · Aug 2, 2024

So now that the partially ephemeral nature of RAM2/DMAMEM has been thoroughly explored i want to refocus the question on the original issue.

1. My original question:

Schaeg said:
Is there a better approach to distinguish cold vs warm boot?
A better way would rely only on documented behavior and not undocumented aspects of proprietary software (such as undocumented behavior of closed-source boot loader which isn't asserted to remain the same) or proprietary hardware (mechanisms whose behaviour differs between warm reboot and cold reboot which aren't documented in official docs of the chip or the ISA).

2. My horrible but best solution so far and whether people have comments on it.

Schaeg said:

Code:

static uint32_t DMAMEM warm_bootcount;
static bool DMAMEM maybe_booted_before;
static bool DMAMEM can_trust_RAM2_to_be_good_warm_boot;

void setup()   {
  can_trust_RAM2_to_be_good_warm_boot = false;
  if(wdog_3_was_restart_reason() || app_was_reason() ){
    if(maybe_booted_before) {
      warm_bootcount +=1;
      arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));   
    } else {
      warm_bootcount = 0;
      arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));
      init_all_other_ram2_values_if_they_are_assumed_to_have_a_value_set_after_a_non_warm_boot_also_flush();
      can_trust_RAM2_to_be_good_warm_boot = true;
      arm_dcache_flush(&can_trust_RAM2_to_be_good_warm_boot, sizeof(can_trust_RAM2_to_be_good_warm_boot));
      maybe_booted_before = true;
      arm_dcache_flush(&maybe_booted_before, sizeof(maybe_booted_before));
    }
  } else {
    maybe_booted_before = false;
    arm_dcache_flush(&maybe_booted_before, sizeof(maybe_booted_before));
    set_up_wdog_3();
    while(true) delay(irrelevant_amount); // wdog 3 will trigger.
  }
  // unreachable unless can_trust_RAM2_to_be_good_warm_boot is true

  initProgram();

};

defragster · Aug 2, 2024

Schaeg said:
So now that the partially ephemeral nature of RAM2/DMAMEM has been thoroughly explored i want to refocus the question on the original issue.

Given the above, storing known values in RAM2 and then finding them present or missing would indicate that.

See the CrashReport code; that is how it works.

Schaeg · Aug 2, 2024

defragster said:
Given the above storing known values in RAM2 and finding them there or missing would indicate that.

I think you are suggesting the solution i already discussed here.

Schaeg said:
I am looking for a way to reliably detect a first boot or alternative detect a warm boot. Elsewhere in the forum I found this this snippet of code.

C:

bool isWarmBoot()
{
    static DMAMEM unsigned bootCheck; // DMAMEM is not zeroed during bootup

    if (bootCheck != 0xAAAA'AAAA)
    {
        bootCheck = 0xAAAA'AAAA;                         // some number
        arm_dcache_flush(&bootCheck, sizeof(bootCheck)); // dmamem is cached, force a write to memory
        return false;
    }
    return true;
}

This solution is flawed. While RAM2 can be seen as more or less initialized after a warm reset, the same locations are uninitialized after a cold start. Thus the code would read uninitialized data, which might look just like the data one expects to find there to detect a warm reboot. To make sure one doesn't mistake a cold start for a warm reset, one first needs to know that a warm reset was performed before one can trust any RAM2 content. I am looking for a solution to establish that information.

Schaeg · Aug 2, 2024

I might have found a much more elegant solution. Can anyone confirm that this approach will accurately count the number of soft restarts in RAM2?

C:

static uint32_t DMAMEM warm_bootcount;
static bool DMAMEM can_trust_RAM2_to_be_good_warm_boot;

void setup()   {
  uint32_t SRSR = SRC_SRSR;
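  warm_bootcount += 1;
  arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount)); // write the incremented count back to RAM2 so it survives the next reset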
  can_trust_RAM2_to_be_good_warm_boot = false;
 
  if((SRSR & 1) == 1){ // check power on flag // used bitwise and here
    warm_bootcount = 0;
    arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));
    init_all_other_ram2_values_if_they_are_assumed_to_have_a_value_set_after_a_non_warm_boot_also_flush();
    can_trust_RAM2_to_be_good_warm_boot = true;
    arm_dcache_flush(&can_trust_RAM2_to_be_good_warm_boot, sizeof(can_trust_RAM2_to_be_good_warm_boot));
    SRC_SRSR = 1; // clear power on flag
  } else if (has_crashed()){  // assuming there are no reason for the app to choose to warm reset
    crash_report();
    repair();
  }
 
  initProgram();
};
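The decision logic of this approach can be tried out off-target by passing the status register value and the retained counter into a plain function. The following is a host-side model under two assumptions: that bit 0 of SRC_SRSR is set on every cold boot and only then, and that the firmware clears it on every boot. Note that the logs earlier in this thread show SRSR reading 0x1 after white-button presses as well, so the first assumption needs checking on real hardware. `RetainedState`, `kPowerOnFlag` and `handleBoot` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Stands in for the variables kept in RAM2 (DMAMEM) on the Teensy.
struct RetainedState {
    uint32_t warmBootCount;
};

constexpr uint32_t kPowerOnFlag = 1u << 0;  // assumed power-on flag in SRC_SRSR

// Updates the retained counter for one boot. Returns the value that should
// be written back to the status register to clear the handled flag
// (0 if nothing needs clearing).
inline uint32_t handleBoot(uint32_t srsr, RetainedState& st) {
    if (srsr & kPowerOnFlag) {
        // Cold boot: retained memory is garbage, start counting from zero.
        st.warmBootCount = 0;
        return kPowerOnFlag;
    }
    // Warm boot: the previous value was written by an earlier boot.
    st.warmBootCount += 1;
    return 0;
}
```

In this model the counter is only ever read on a path where an earlier boot has already written it, which is the property the approach is after.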

kd5rxt-mark · Aug 2, 2024

Here is a suggested approach (off the top of my head, so take it with due consideration !!):

Define two locations in EEPROM (call them LOC1 & LOC2). Each time your sketch starts up, read LOC1, increment it, & store the new value into LOC1. Now read LOC2 & compare it to the updated value of LOC1. If they are the same, then a WARMSTART has just occurred. If they are different, then a COLDSTART has just occurred. The value in LOC1 keeps track of the total number of restarts, regardless of the type.

Use the following pseudocode to execute a WARMSTART:

Code:

read LOC1, increment it, then store the new value into LOC2
SCB_AIRCR = 0x05FA0004;  // cause the Teensy to reset
asm volatile ("dsb");  // data synchronization barrier: make sure the write above completes

This approach depends upon the following assumptions:
- EEPROM is initialized to all 0xFF
- restarts are not occurring very quickly and/or very often, so not *excessively* writing to EEPROM
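The LOC1/LOC2 bookkeeping described above can be modelled on a host with a struct standing in for the two EEPROM locations. This is a sketch of the scheme as described, not tested Teensy code. `FakeEeprom`, `startupWasPlannedWarmStart` and `prepareWarmStart` are hypothetical names, the locations are modelled as 32-bit values, and the all-ones initial state mirrors the "EEPROM is initialized to all 0xFF" assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for two EEPROM locations, erased to all ones.
struct FakeEeprom {
    uint32_t loc1 = 0xFFFFFFFFu;  // total number of starts
    uint32_t loc2 = 0xFFFFFFFFu;  // value LOC1 is expected to reach after a planned restart
};

// Call once at startup. Increments LOC1 and reports whether this start
// was announced beforehand via prepareWarmStart().
inline bool startupWasPlannedWarmStart(FakeEeprom& e) {
    e.loc1 += 1u;  // wraps from 0xFFFFFFFF to 0 on the very first start
    return e.loc2 == e.loc1;
}

// Call right before deliberately resetting the processor.
inline void prepareWarmStart(FakeEeprom& e) {
    e.loc2 = e.loc1 + 1u;
}
```

A start that was not announced, such as a watchdog reset, looks like a cold start in this model.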

General questions:
- when/how/why is a WARMSTART occurring ??
- are you using a generic pin to allow WARMSTART by the operator ??
- are you using a WDT ?? If so, then the WDT handler should include the same LOC1/LOC2 management

Food for thought . . .

Mark J Culross
KD5RXT

Schaeg · Aug 2, 2024

kd5rxt-mark said:
General questions:
- when/how/why is a WARMSTART occurring ??

watchdog timeout, apply software update, recovering from crashes/faults

kd5rxt-mark said:
- are you using a generic pin to allow WARMSTART by the operator ??

No but i allow triggering warm starts via the network (after authentication).

kd5rxt-mark said:
- are you using a WDT ?? If so, then the WDT handler should include the same LOC1/LOC2 management

This solution would preclude using watchdogs 1 and 2, as they would cause this scheme to get out of sync. Watchdogs 1 and 2 can't run custom code. Only watchdog 3 can run custom code, and only for 256 cycles. I am not sure the solution works timing-wise even in that case.
Crashes/faults which only cause a warm restart but don't write those numbers to EEPROM would also cause that system to go out of sync.

I would generally want to avoid the assumption that all warm restarts happen in a controlled manner. However, counting the total number of starts as you described is interesting. Your solution allows for detecting planned restarts. I have not evaluated how to log this info across power loss. My main concern was detection.

jmarsh · Aug 2, 2024

Schaeg said:

I might have found a lot more elegant solution. Can anyone confirm that that approach will accurately count the number of soft restarts in RAM2?

C:

static uint32_t DMAMEM warm_bootcount;
static bool DMAMEM can_trust_RAM2_to_be_good_warm_boot;

void setup()   {
  uint32_t SRSR = SRC_SRSR;
  warm_bootcount += 1;
  can_trust_RAM2_to_be_good_warm_boot = false;
 
  if((SRSR && 1) == 1){ // check power on flag
    warm_bootcount = 0;
    arm_dcache_flush(&warm_bootcount, sizeof(warm_bootcount));
    init_all_other_ram2_values_if_they_are_assumed_to_have_a_value_set_after_a_non_warm_boot_also_flush();
    can_trust_RAM2_to_be_good_warm_boot = true;
    arm_dcache_flush(&can_trust_RAM2_to_be_good_warm_boot, sizeof(can_trust_RAM2_to_be_good_warm_boot));
    SRC_SRSR = 1; // clear power on flag
  } else if (has_crashed()){  // assuming there are no reason for the app to choose to warm reset
    crash_report();
    repair();
  }
 
  initProgram();
};

This still relies on undependable RAM2. If you have a reliable method for distinguishing a cold boot, why not use it to set a counter in the eeprom to 0?
Also this use of logical AND (instead of bitwise AND) looks incorrect:

if((SRSR && 1) == 1)
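The difference is easy to show in isolation. With the logical operator the test is true for any non-zero register value, for example when only a watchdog flag (0x10) is set. The two helper functions below are hypothetical and exist only to put the two expressions side by side.

```cpp
#include <cassert>
#include <cstdint>

// Logical AND: (srsr && 1) is true whenever srsr is non-zero,
// so this does not look at bit 0 at all.
constexpr bool powerOnLogical(uint32_t srsr) {
    return (srsr && 1) == 1;
}

// Bitwise AND: isolates bit 0 and compares it.
constexpr bool powerOnBitwise(uint32_t srsr) {
    return (srsr & 1u) == 1u;
}

// Only a watchdog flag set: the logical version wrongly reports power-on.
static_assert(powerOnLogical(0x10u), "logical AND is true for any non-zero value");
static_assert(!powerOnBitwise(0x10u), "bitwise AND correctly sees bit 0 clear");
```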

Schaeg · Aug 2, 2024

jmarsh said:
This still relies on undependable RAM2.

Can you explain how that still relies on undependable RAM2?
In my limited understanding it relies on the behavior of the SRC Reset Status Register (SRC_SRSR) described on page 1276. https://www.pjrc.com/teensy/IMXRT1060RM_rev3.pdf#d40e546a1310 which is memory mapped and not RAM2. Setting a bit in that register clears that bit. So SRC_SRSR = 1; performs &= ~1; on the content of the register.
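The write-one-to-clear behaviour described here can be modelled with a small struct to make the consequences explicit. This is a host-side model of the semantics as stated above, not of the actual peripheral, and `W1cRegister` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Model of a write-1-to-clear status register: writing a 1 to a bit
// position clears that bit, writing a 0 leaves it unchanged.
struct W1cRegister {
    uint32_t bits;

    void write(uint32_t value) { bits &= ~value; }
    uint32_t read() const { return bits; }
};
```

One consequence of these semantics: a read-modify-write such as `reg |= 1` writes back every bit that is currently set, and so clears all of the flags rather than just bit 0. A plain assignment of the mask, as in `SRC_SRSR = 1;`, clears only the intended flag.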

jmarsh said:
Also this use of logical AND (instead of bitwise AND) looks incorrect:

You are correct, I meant bitwise AND there.

jmarsh said:
If you have a reliable method for distinguishing a cold boot, why not use it to set a counter in the eeprom to 0?

I do not know that i have such a method yet. I am trying to find one in this thread. I have two candidates. The one you just spotted an error in and the one using watchdog to establish a chain of trust.

jmarsh · Aug 2, 2024

Schaeg said:
Can you explain how that still relies on undependable RAM2?

The flag and count variables are in RAM2, any minor change to your program may end up placing them in an unsafe area that isn't preserved across reboots. Using the eeprom instead would solve this problem once and for all.

Schaeg · Aug 2, 2024

jmarsh said:
any minor change to your program may end up placing them in an unsafe area that isn't preserved across reboots.

What do you mean by that exactly? What are those unsafe areas?
Are there parts of (flushed) DMAMEM which aren't preserved across warm restarts (watchdog,
#define REQUEST_EXTERNAL_RESET (AIRCR=(AIRCR&VECTKEY_MASK)|VECTKEY|SYSRESETREQ), faults, ... ) while the power is coming in?

jmarsh said:
The flag and count variables are in RAM2

Yes they are. They demonstrate how my mechanism is supposed to be used as an example. The actual cold boot detection logic doesn't depend on their value for correctness. So i do not see how this is relevant.

jmarsh said:
. Using the eeprom instead would solve this problem once and for all.

I am still thinking about, and trying to gain confidence in, the correctness of my approach to detecting cold vs warm boot. The number will be written to EEPROM at some point, or extracted over the network, for persistence.

Schaeg · Aug 2, 2024

Schaeg said:
Are there parts of (flushed) DMAMEM which aren't preserved across warm restarts (watchdog,
#define REQUEST_EXTERNAL_RESET (AIRCR=(AIRCR&VECTKEY_MASK)|VECTKEY|SYSRESETREQ), faults, ... ) while the power is coming in?

I guess i am going to adapt my dumper code from yesterday after lunch to check the assumption that DMAMEM is actually preserved on CPU resets like i think it is.

defragster · Aug 2, 2024

Schaeg said:
I think you are suggesting the solution i already discussed here.

Have not read that code - was pointing to teensy4\CrashReport.cpp code as that has been studied and observed to work reliably for years tracking when Crashes are detected, and the same process would be usable to also separate COLD starts from WARM starts without any Crash.
It presents a full example of reliably using 128 bytes of RAM2 to track crashes and user breadcrumb() values:
/* Crash report info stored in the top 128 bytes of OCRAM (at 0x2027FF80)

There are also 4 DWORDS of NVRAM that persist across warm starts:


#if defined(__IMXRT1062__)
uint32_t *NVRAM_UINT32 ((uint32_t *)0x400D4100);
#endif

Those are enabled in setup() with:


#if defined(__IMXRT1062__)
    SNVS_LPCR |= (1 << 24); //Enable NVRAM - documented in SDK
#endif

CollinK · Aug 5, 2024

Honestly, I think this subject is being over complicated. Conceptually, it needn't be this difficult. RAM2 is mostly uninitialized and persists across warm boots. As mentioned, the CrashReport interface uses RAM2 for storage of the breadcrumbs and other structures. These are not touched during a warm boot and so they're safe. I've experimented and found that the RAM just under where CrashReport uses is also uninitialized and safe.

So, all you need to do is define a variable in that space (which is at the end of RAM2) and then check whether that variable has the magic value or not. A 32 bit magic value has only a 1 in 4 billion chance of accidentally being your magic value. That's such a low chance that you may as well call it zero. If it means that much, just use a 64 bit variable which for all practical purposes DOES have zero chance of ever accidentally or coincidentally being your magic value without you setting it.

And, that's it. Initialize a magic value into a variable that is at the end of RAM2 and unless you are filling all of RAM2 you will never overwrite it and everything will be fine. Anything else just seems needlessly complicated to me. Trust Einstein, everything should be as simple as possible but no simpler.
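As a rough illustration of the magic-value idea, here is a minimal host-portable sketch. The constant `kBootMagic` and the function `is_warm_boot` are hypothetical names introduced for this example, and the slot is passed in as a pointer so the logic can be tried anywhere. On a Teensy 4.x the slot would be a variable placed near the end of RAM2, and the write would presumably need the same `arm_dcache_flush` call used in the first post.

```cpp
#include <cstdint>

// Hypothetical magic constant: any fixed, unlikely bit pattern would do.
// This one spells "WARMBOOT" in ASCII.
constexpr uint64_t kBootMagic = 0x5741524D424F4F54ULL;

// Returns true when the slot already held the magic value, which this
// sketch treats as "warm boot". The magic is always written back so the
// next boot can see it.
bool is_warm_boot(volatile uint64_t *slot) {
    const bool warm = (*slot == kBootMagic);
    *slot = kBootMagic;
    // Assumption: on real hardware the slot lives in RAM2, so the data
    // cache would have to be flushed here before the next reset.
    return warm;
}
```

Note that this only tells the two cases apart if the slot really does lose its contents on a cold boot, which is exactly the assumption being questioned in this thread.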

defragster · Aug 6, 2024

CollinK said:
CrashReport interface uses RAM2 for storage of the breadcrumbs and other structures

Correct - see post #12 - parallel code using 128 bytes (one cache line) just under CrashReport that could be used in the same way. It demonstrates the needed cache flush (or the value won't be pushed out) and even a checksum that would assure less than a 1 in 2^32 chance of misreading and allow storage of things like warm restart counts or other desired info in the other 30 DWORDS between a magic value and checksum.

At most the lower 64KB of the 512KB is disturbed on a powered warm restart (IIRC 64KB for locked and 32KB for unlocked) - and ~12KB? for USB is allocated from that low end as well IIRC.

So unless the code goes overboard with dynamic allocations or other memory that region should be safe. Notes above confirmed that LittleFS RAMDRIVE testing did this extensively during 'beta' and it was reliable for data integrity there.
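The layout described above (a magic value, 30 DWORDS of payload, then a checksum, 32 DWORDS in total) could look roughly like the following sketch. All names here (`PersistBlock`, `kBlockMagic`, `block_sum`, `note_boot`) are hypothetical, the additive checksum is a simple stand-in and not necessarily what CrashReport.cpp uses, and the placement in RAM2 plus the cache flush are left as comments so the logic stays testable on a host.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kWords = 32;                  // 32 DWORDS = 128 bytes
constexpr uint32_t kBlockMagic = 0xB007C0DEu;  // hypothetical magic value

struct PersistBlock {
    // words[0] = magic, words[1..30] = payload, words[31] = checksum
    uint32_t words[kWords];
};

// Inverted additive checksum over the magic and the payload words.
uint32_t block_sum(const PersistBlock &b) {
    uint32_t sum = 0;
    for (size_t i = 0; i < kWords - 1; ++i) sum += b.words[i];
    return ~sum;
}

bool block_valid(const PersistBlock &b) {
    return b.words[0] == kBlockMagic && b.words[kWords - 1] == block_sum(b);
}

// Call once per boot. Payload word 1 holds the warm restart count.
// An invalid block is treated as a cold boot and reinitialized to 0.
uint32_t note_boot(PersistBlock &b) {
    if (block_valid(b)) {
        b.words[1] += 1;
    } else {
        for (size_t i = 0; i < kWords; ++i) b.words[i] = 0;
        b.words[0] = kBlockMagic;
    }
    b.words[kWords - 1] = block_sum(b);
    // Assumption: on a Teensy 4.x the block would sit in RAM2 and the
    // data cache would be flushed here so the update survives a reset.
    return b.words[1];
}
```

The checksum guards against partially scrambled contents being accepted, but it cannot help if the whole block survives a short power interruption intact.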

Schaeg · Aug 8, 2024

defragster said:
There are also 4 DWORDS of NVRAM that persist across warm starts
#if defined(__IMXRT1062__)
uint32_t *NVRAM_UINT32 ((uint32_t *)0x400D4100);
#endif
Those are enabled in setup() with:
#if defined(__IMXRT1062__)
    SNVS_LPCR |= (1 << 24); //Enable NVRAM - documented in SDK
#endif

Can you point me to where it is documented in the SDK, and which SDK do you mean?
(I tried to dig around and only found this https://github.com/nxp-mcuxpresso/m...4ac817/devices/RW610/drivers/fsl_power.c#L843 which doesn't match this info and might be for a totally different chip, searching the address on GitHub exclusively finds #define IOC_UARTRXD_UART0 0x400D4100)
Is the value of the NVRAM defined on first boot?

CollinK said:
A 32 bit magic value has only a 1 in 4 billion chance of accidentally being your magic value. That's such a low chance that you may as well call it zero. If it means that much, just use a 64 bit variable which for all practical purposes DOES have zero chance of ever accidentally or coincidentally being your magic value without you setting it. And, that's it.

I find the assumption that all values are equally likely dubious. Values read shortly after a reset might well be a lot closer to the value stored before the reset, or they might not. Given variation between chips and so on, it might turn out that those particular RAM cells are a lot more stable.

CollinK said:
Anything else just seems needlessly complicated to me.

Documenting the assumption that certain memory is scrambled between reboots seems needlessly complicated to me.
