Re-enable bootloader interrupts after a hard fault [Teensy 4.1]

Frank B had a long possible wait after instant restart - only to make sure USB was back online before printing.

I think it was to give the user the chance to read the fault messages before the sketch prints thousands of lines and scrolls the message away, or crashes again.


For temperature, (was it planned? or was the code added already? - don't remember) I thought it was good to print "Temperature alarm" after reboot and to just switch off.
 
A wait after startup is good for reading - but the user can control that now when a fault happens.

I was thinking of the up to 10 second wait that was just to allow Serial to come online. Unless it is a temp thing, I don't see a reason to not just record the info and restart to let the user know what is going on, instead of that pregnant pause of panic time. Especially if it is going to be always enabled - but not always included in setup()?

When it is a temp thing - indeed better to at least throttle down for sure to prevent melting the insides.

The 1062's lowest voltage is 0.95 volts when F_CPU_ACTUAL <= 24000000. Just did a quick edit to the Fault Sketch to run at 24 MHz. CrashReport is still fully functional, and the temp, rather than 57 at 600 MHz, ended up at 44 pretty quickly and then 42.3. Repeating that test at 198 MHz, it only cools to 49 and then 47. So that is 3+ degrees C more 'headroom' for cooling if the fault was for Over Temp. Ambient temp is relatively warm at 25 C - and the T_4.1 is suspended unpinned in open air, just running the Fault test sketch.

If waiting for MCU to cool - better at 24 MHz than 198. And if not waiting to cool - half a second should clear the Serial data and not make the user wait to know the problem.
 
That's why the question about extending this to T_3.x's (not T_LC) - only static area is the 7 DWORDS of NVRAM. Maybe Paul is planning on that - or has other insight.

At least for the upcoming 1.54 release, I'm planning on CrashReport being a Teensy 4.x feature.

Maybe in the future it'll get implemented on Teensy 3.x. Maybe. But to be realistic, you could read "maybe" as "probably not". Or perhaps a minimal version might be made someday, so programs with Serial.print(CrashReport) at least compile and give something. But the reality to keep in mind is Teensy 3.2 is the lion's share of all Teensy 3.x boards. We don't have a MPU on Teensy 3.2. The MPU we do have on Teensy 3.5 & 3.6 isn't the highly configurable ARM one we get with Teensy 4.x. CrashReport probably can't ever be as useful on Teensy 3.x boards.
 
And if not waiting to cool - half a second should clear the Serial data and not make the user wait to know the problem.

I'm a little concerned about an infinite rebooting going too quickly. Many PCs take over 1 second just to complete USB enumeration. Windows 7 takes about 5 seconds with some combinations of HID and other interfaces.

Obviously we can't protect or even anticipate all crash scenarios. But we can think about likely ones. I especially want a NULL pointer deref in C++ constructors or early in setup() to result in USB enumeration completing and at least a few seconds for user-level programs on the PC to be able to notice before the hardware disconnects USB and reboots. That was the reasoning behind 8 seconds.
 
I had a long wait. I think it was 10 secs or more before running the user code. It needs to be long because the user needs to be informed about the crash. As said, if something prints rapidly, or crashes soon in setup without the wait all the printed info is pretty useless. The user has no chance to read it and sees a reboot (if ever) only.

Let me repeat my very old and often mentioned wish for a dedicated debug LED here (for future models). Sorry :)
 
Have you considered a user defined "panik" message? Like DIE("Unrecoverable state. Stopping.")?

I did consider this. I saw you proposed it earlier. I decided not to include it, at least not in this first version (to soon become part of a stable 1.54 release).

As I recall, your proposal was storing the pointer to the string in the persistent memory, then using it to print the message after rebooting. That works in a design where the fault handler doesn't allow the USB to remain active for Arduino to upload new code. But if an upload is started within the 8 seconds before automatically rebooting, everything in flash memory may have changed.

Any pointer kept in persistent memory is unsafe, or at least unreliable. Any proposal needs to avoid pointers in the persistent storage.

We could probably implement this by using 32 bit constants rather than pointers to strings, where the constant is written to SRC_GPR5 and then you write to SCB_AIRCR to cause a reboot. Then we'd need a mechanism to extend CrashReport to show the messages corresponding to each constant that might be stored in SRC_GPR5. Or maybe non-pointer fields could be added to the persistent arm_fault_info_struct. Or perhaps 32 or 64 or 80 bytes in the persistent memory could hold a copy of the string. Lots of ways it might be done....

I see this as the sort of thing which could go into 1.55. But for the upcoming 1.54 release, I want to keep CrashReport relatively simple.
 
Everything I have seen so far is looking good. It should come in very handy for tracking down faults in our code. Probably a great time to ship it and get 1.54 out, to work on newer releases of Arduino.

I need to re-browse through this thread to see whether how this might work in some other situations has already been discussed. Like maybe I have my T4.x plugged into an RPI as a USB type of Mouse or Keyboard...
How could I capture the data? Probably cannot open up a terminal window fast enough...

Wondering, if the Teensy has some FS available (SD, Flash, ...), whether it would make sense for the CrashReport code to have some method where you could set which FS to try to save to. Obviously more code and risk involved. The data saved might be raw crash data or could be textual stuff... Then maybe later the user could grab the data and analyze it... Or print it using something like dmesg...
But again probably way beyond what one might do for this release.
 
Fault info is always stored only to a tiny dedicated area in RAM, currently the last 128 bytes of DMAMEM, which is address 0x2027FF80.

The design is to minimize the fault handler code. By the time it runs, "bad things" may have happened. Complex software like filesystems and storage drivers may or may not be usable. Especially for anything that writes to non-volatile media, using software with unreliable state could make a bad situation even worse.

That's why we automatically reboot after 8 seconds.

Then when your board restarts, you can use Serial.print(CrashReport) to get a report of whatever is known about how things went wrong on the previous run.
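The address quoted above can be checked with a little arithmetic. This is only a sketch: the RAM2 (DMAMEM) base of 0x20200000 and size of 512 KB are my reading of the Teensy 4.x memory map and are not stated in this thread, and the constant names are made up for illustration.

```cpp
#include <cstdint>

// Assumed memory map values (not from this thread): RAM2 / DMAMEM region.
constexpr uint32_t kRam2Base = 0x20200000u;
constexpr uint32_t kRam2Size = 512u * 1024u;   // 512 KB = 0x80000

// From the post above: fault info occupies the last 128 bytes of DMAMEM.
constexpr uint32_t kFaultInfoBytes = 128u;     // 0x80

// End of RAM2 minus the reserved area gives the start of the fault info.
constexpr uint32_t kFaultInfoAddr = kRam2Base + kRam2Size - kFaultInfoBytes;

static_assert(kFaultInfoAddr == 0x2027FF80u,
              "last 128 bytes of RAM2 should start at 0x2027FF80");
```

So 0x20200000 + 0x80000 = 0x20280000, and subtracting 0x80 lands on 0x2027FF80.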
 
Thanks Paul, that is all that I assumed.

But I was thinking out loud about the case where I know I am not typically going to have USB Serial hooked up to some form of serial monitor when a crash happens.
If it would make sense to have a way in sketch to say output to some place else, that I can later do a post mortem on...
 
As noted - after a Fault - the degree of system utility is compromised at best.

As long as power stays on the DMAMEM will hold the CrashReport data. The CrashReport can stream anywhere on restart.

Is there a 44-50 byte hole after EEPROM storage area? Assuming write to that works? Maybe allow USER to point to EEPROM where they want it stored.
> though that would be a non trivial read/transfer/clear from EEPROM after Crash when offline, before power out and read on request and manage.
 
I'm a little concerned about an infinite rebooting going too quickly. Many PCs take over 1 second just to complete USB enumeration. Windows 7 takes about 5 seconds with some combinations of HID and other interfaces.

Obviously we can't protect or even anticipate all crash scenarios. But we can think about likely ones. I especially want a NULL pointer deref in C++ constructors or early in setup() to result in USB enumeration completing and at least a few seconds for user-level programs on the PC to be able to notice before the hardware disconnects USB and reboots. That was the reasoning behind 8 seconds.

No perfect answer. This is a new/unknown feature, and only valid with SerMon online - or a Serial# port ready to receive. And given how rare Faults are, getting it 'known' won't likely happen fast, and it may not even be noticed or understood.

Currently a fault/Crash just takes the Teensy stopped into 'wait for usb'.
Assuming the CrashReport scheme will be 'always on' with TD1.54?
So on Crash it will sit 8 seconds ( longer if over temp until cooled? As noted it will cool faster at 24 MHz when a Temp Fault and still work to 'wait for USB' ).
> maybe if temp not 'safe' after some time (30 secs?) it should just shutdown?
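One way to picture the wait / cool / shutdown idea is as a small decision function. This is a rough sketch only, not how the core behaves: FaultPolicy, FaultAction and decide() are hypothetical names, the 8 second and 30 second values come from this discussion, and the 70 C "safe" threshold is an arbitrary placeholder.

```cpp
#include <cstdint>

enum class FaultAction { KeepWaiting, Reboot, Shutdown };

// Hypothetical tuning values. Only the 8 s and 30 s figures come from the thread.
struct FaultPolicy {
  uint32_t minWaitMs     = 8000;   // always hold this long so USB can enumerate
  uint32_t maxCoolWaitMs = 30000;  // give up on cooling after this long
  float    safeTempC     = 70.0f;  // placeholder threshold, not a real spec value
};

// Decide what to do at a given moment after the fault was recorded.
FaultAction decide(const FaultPolicy &policy, bool tempFault,
                   float tempC, uint32_t elapsedMs) {
  if (elapsedMs < policy.minWaitMs) return FaultAction::KeepWaiting;
  if (!tempFault) return FaultAction::Reboot;            // ordinary fault: just restart
  if (tempC <= policy.safeTempC) return FaultAction::Reboot;  // cooled down enough
  if (elapsedMs >= policy.maxCoolWaitMs) return FaultAction::Shutdown;
  return FaultAction::KeepWaiting;                       // still hot, still within the limit
}
```

Dropping to 24 MHz while waiting, as measured earlier in the thread, would simply make the "cooled down enough" branch more likely to be reached before the limit.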

Granted, this seems this would be an effective emulation/replacement of current behavior where the user would now see '8 sec reboot' instead, and as noted the Forum could hopefully catch and inform those users about the CrashReport feature.

If it is going to stall 8 secs - and not temp related - it seems equivalent to have the long delay (8 seconds) just before setup() entry while Serial if available comes online to respond to CrashReport ASAP - and then even hardcode the rest of the 'long' wait into return from CrashReport.

Though of course without the call to CrashReport - then there would be 'fast' repeated reboot. Unless there was a 'weak' call the user could add to sketch to replace the default wait for reaction of their choosing.
> That "weak void userCrashReport() { // on prior Crash wait 8 secs }" then enter setup()
> Sketch "void userCrashReport()" - could wait for Serial - or other - and then stream the output to a place of their choosing - that might solve KurtE's issue where it could go to elsewhere.
Not as simple as "if ( CrashReport ) {" - but that would still work, and pasting a template copy of userCrashReport() could give more control?
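For anyone unfamiliar with 'weak' functions, a rough sketch of how such a hook could look with GCC/Clang. Nothing here exists in the core: userCrashReport() is only the name proposed above, and having it return a delay in milliseconds (instead of void) is my own change so the behaviour is easy to check on a PC.

```cpp
#include <cstdint>

// Hypothetical core-side default. Because it is marked weak, a sketch could
// define its own userCrashReport() in another file and the linker would
// pick the sketch's version instead of this one.
extern "C" __attribute__((weak)) uint32_t userCrashReport(bool crashedLastRun) {
  // Default reaction: hold for 8 seconds after a crash, otherwise no delay.
  return crashedLastRun ? 8000u : 0u;
}

// Hypothetical startup step that would run just before setup().
uint32_t preSetupDelayMs(bool crashedLastRun) {
  return userCrashReport(crashedLastRun);
}
```

A sketch-side override could then wait for Serial, or stream CrashReport somewhere else, before returning.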
 
How about moving the Error report into a protected (from power loss) area when the Teensy re-boots and everything works ok. That way if not logged on via usb it can still be examined later.
 
That's the issue - the closest thing is the DMAMEM/RAM2 maintained while powered and not reset on MCU startup.

Only other common and simple place to write would be EEPROM - and assume that may not always work either. T_4.0 has no QSPI ports or SD - and using those is suspect as well even on T_4.1 ... iff they are even populated with FLASH or SD card.
 
If it would make sense to have a way in sketch to say output to some place else, that I can later do a post mortem on...

Sure, you can do this. CrashReport is printable from any Arduino Print/Stream derived class.

So you could put something like this in setup():

Code:
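#include <SD.h>

void setup() {
  if (SD.begin(10)) { // 10 = SD card chip select pin
    File dataFile = SD.open("crashlog.txt", FILE_WRITE); // FILE_WRITE appends if the file already exists
    if (dataFile) {
      dataFile.print(CrashReport); // CrashReport prints to any Print-derived object
      dataFile.close();
    }
  }
}

void loop() {
}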
 
That's what I am getting at. Send it to DMAMEM/RAM2 when crash occurs then on Auto re-boot, when everything is working, send it to EEPROM (or wherever). That way if the Teensy is working away from usb, remote or whatever, the error report is still available. Care will have to be taken that the system does not repeat the hard error loop again and again and.....
The user should have some way of detecting that an error has occurred and be able to programmatically power down.
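A small sketch of one way to avoid the "again and again" loop: count consecutive crashed boots and stop once a limit is reached. CrashLoopGuard is a made-up name, the limit of 3 is arbitrary, and where the counter would be kept (EEPROM or elsewhere) is left open; this only shows the counting logic and runs on a PC.

```cpp
#include <cstdint>

// Hypothetical record that would be kept somewhere that survives a reboot.
struct CrashLoopGuard {
  uint8_t consecutiveCrashes = 0;
  uint8_t limit = 3;   // arbitrary example value

  // Call once per boot. Pass true if the previous run ended in a crash.
  // Returns true when the sketch should stop instead of running again.
  bool onBoot(bool previousRunCrashed) {
    if (previousRunCrashed) {
      if (consecutiveCrashes < 255) ++consecutiveCrashes;  // avoid wrap-around
    } else {
      consecutiveCrashes = 0;   // a clean run resets the count
    }
    return consecutiveCrashes >= limit;
  }
};
```

On a real board the argument could come from something like "if ( CrashReport )", and a true result would be the point to log the report and power down.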
 
Indeed - that is what that post #86 was about ... would have to look into it. And it involves complexity, as the read and decipher code would need to be extended, or hacked if not done internally.
 
BTW: I added this case to the "it's your fault" code - to allow crash safe monitoring of Temp.
Code:
    else if ( cc == 't' ) {
      Serial.printf( "\tdeg  C=%f\t F_CPU=%u\n" , tempmonGetTemp(), F_CPU_ACTUAL );
      while ( Serial.available()) Serial.read();
    }

Also @mjs513: if the Crash register values are still set on calling : unused_interrupt_vector() - it should parse that out and record a Fault.
Just here for a sec - and not seeing code there ... that's speculation ...
 
Good point will give it a try.

PS will add the t case to my copy

Hope it is that simple - if flags not set may take some test inside the unused _isr to call it out.

IIRC Frank B's code set some vector to his Crash code - but that may have been for the div_by_zero.

Left the T_4.1 here running at 24 MHz showing a cool 43C - touched it and went up quickly to : deg C=43.666668
> for panic testing could add into loop() a tempmonGetTemp() check and print on each ~0.5 degree rise over MAX then print new MAX so the test heat added wouldn't go overboard. Maybe just enable with 't' flag to keep the spew down - though that should be minimal.
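The "print on each ~0.5 degree rise over MAX" check could look roughly like this. TempRiseTracker is a hypothetical helper, written so it runs on a PC; in a sketch the reading would come from tempmonGetTemp() inside loop().

```cpp
// Hypothetical helper: remembers the highest temperature seen and reports
// when a new reading is at least stepC above it.
struct TempRiseTracker {
  float maxC;
  float stepC;

  explicit TempRiseTracker(float startC, float step = 0.5f)
    : maxC(startC), stepC(step) {}

  // Returns true (and records the new max) only when the reading has risen
  // by at least stepC, so the output stays quiet for small fluctuations.
  bool update(float readingC) {
    if (readingC >= maxC + stepC) {
      maxC = readingC;
      return true;
    }
    return false;
  }
};
```

In loop() that would be something like "if (tracker.update(tempmonGetTemp())) print the new MAX", possibly enabled only after the 't' command.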

I'd do more - but being here is just ignoring what I need to be getting done ...
 
Yep was that simple. :) now back to the logic
 
@PaulStoffregen - @defragster and all

Finally cracked the code after having a full meal and a break from staring at the screen, and basically have it shut down after the CrashReport is printed. This is for now, pending discussion.

Even if we wait for the chip to cool down below a certain threshold and then reboot, as we previously discussed, we really aren't resolving the cause of the panic temp, as it could be caused just by the overclocking or by other issues with the wiring - so I'm not sure we shouldn't just shut down.

Code:
D:\Users\Merli\Documents\Arduino\CrashRepot_example\CrashRepot_example.ino Jun 22 2021 21:08:50
 millis() now 711
CrashReport ... Hello World
  Fault occurred at: 21:10:06
  Temperature at time of fault: 55.7 degC
  length: 11
  IPSR: 50
  MMFAR: 0
  BFAR: 0
  return address: 2A6
  XPSR: 21010000
  crc: 9A0D3C8
Reboot was caused by temperature sensor
Note chip is shut down at this point - didn't change the Reboot to shut down waiting on further direction.
 
Shutdown seems right.

Only comments:
> IDE SerMon sometimes fails to connect on Teensy restart
> TyCommander dumps the device display after some short number of minutes when the device goes 'missing'

So restarting right away and displaying the text and having the Crash cleared may leave an unattended device with no record of the event if SerMon fails to connect or isn't discovered before the missing goes away.

Again no perfect answer? If the device is unpowered/restarted the Crash info is lost ... if the user returned puzzled and does some restart process involving power down/up.

Also if device restarted and no Serial.print(CrashReport); is issued - how does the Teensy know to print and shutdown and not just restart running back into the same issue? ... see last lines of post #87 ...
 