Re-enable bootloader interrupts after a hard fault [Teensy 4.1]

MeBoop

Member
Hi! I'm not sure if this is the right place to ask. I've implemented a hard fault handler that does a (WIP) crash trace dump, and after the dump I'd like to re-enable USB interrupts so I can re-flash the Teensy while it sits in my unending while loop.

Another interesting, but less important question if anyone knows: my handler isn't called for every exception. Sometimes the Teensy crashes (as indicated by teensyduino no longer being able to upload/re-flash) and the handler isn't called. I'm not sure if there's another interrupt I need to handle (the documentation is a little hard to decipher) or if my handler is crashing (less likely; it succeeds on some crashes).

I'm currently overriding the HardFault_HandlerC function (when TeensyDebug isn't compiled in, anyway) and have my screen show an appropriate blue screen with useful-ish info (I'd like to eventually try and generate a trace/dump file to the SD card, would be neat!)

Thanks!
 
Very interesting thing - I'd like to see your code.
I made something similar, but don't get much useful info.

My code just resets after a hardfault.
 
@MeBoop - indeed - would be great to see your 'crash trace dump' code.

FrankB added a great T_4.x solution using the RAM2 area for Fault data storage. RAM2 is not zeroed on restart and if power is maintained that data area then survives for detection on the next 'restart' for display when the USB and system is healthy and not in a faulted state.
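A minimal sketch of the "survives a restart" idea, assuming the fault record carries a magic value and a checksum so the startup code can tell a real record from leftover garbage. This is not FrankB's code: `FaultRecord`, `kMagic`, `saveFault()` and `loadFault()` are hypothetical names, and a plain static array stands in for RAM2 so the example compiles on any machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical record layout; not taken from FrankB's code.
struct FaultRecord {
  uint32_t magic;       // marks the record as intentionally written
  uint32_t returnAddr;  // PC stacked by the exception
  uint32_t cfsr;        // fault status register snapshot
  uint32_t checksum;    // XOR of the three fields above
};

constexpr uint32_t kMagic = 0xFA017EDu;  // arbitrary marker value

// Stand-in for a region that is not zeroed on restart.
// On a Teensy 4.x this would live in RAM2 (for example via DMAMEM).
static uint8_t persistentArea[sizeof(FaultRecord)];

static uint32_t recordChecksum(const FaultRecord &r) {
  return r.magic ^ r.returnAddr ^ r.cfsr;
}

// Would be called from the fault handler, just before resetting.
void saveFault(uint32_t returnAddr, uint32_t cfsr) {
  FaultRecord r{kMagic, returnAddr, cfsr, 0};
  r.checksum = recordChecksum(r);
  std::memcpy(persistentArea, &r, sizeof r);
}

// Would be called early after restart. Returns true only for a valid
// record, and clears the area so the same fault is not reported twice.
bool loadFault(FaultRecord &out) {
  std::memcpy(&out, persistentArea, sizeof out);
  bool valid = out.magic == kMagic && out.checksum == recordChecksum(out);
  std::memset(persistentArea, 0, sizeof persistentArea);
  return valid;
}
```

On real hardware the write may also need a data cache flush before the reset, since the reset can happen before cached data reaches RAM2; that step is left out here.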

I did something using the included T_3.x Reg dump code and found some updated code for the T_4.x that is in the cores commented out - but nothing beyond that.

That captured the faults, and many would still function for USB printing. I even added a call from the fault handler to a weak userDebugDump() that could be placed in the sketch ( @FrankB - you might try this before restart? - note below ). That could print out known sketch-specific stuff as desired that might lead to a fault solution or show the sketch state at that time.

That was not always reliable for USB printing and the sketch could not return to normal function as seen because once in the faulted state, it won't exit properly to resume after the fault.

@FrankB: re userDebugDump()
->> On Fault: userDebugDump( void *yourRam2, bool bFaultedNow ) with bFaultedNow == true
> before restarting you might call this sketch function with a pointer and a flag saying that a FAULT just happened.
> The user could put a structure or other data there for display on restart

->> On Restart after Fault call: userDebugDump( void *yourRam2, bool bFaultedNow ) with bFaultedNow == false
> the user could then use that *yourRam2 to extract the saved data for display
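A rough sketch of how a sketch-side userDebugDump() with that pointer-plus-flag shape could look. The signature follows the proposal above; everything else (`SketchState`, `fakeRam2`, `onFault()`, `onRestartAfterFault()`) is hypothetical and only simulates the core side on a PC, with an ordinary array standing in for RAM2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch state worth keeping across a fault.
struct SketchState {
  uint32_t loopCount;
  char note[24];
};

static uint8_t fakeRam2[sizeof(SketchState)];  // stand-in for RAM2
static SketchState live = {0, ""};             // state while running
static SketchState recovered = {0, ""};        // state read back after restart

// Sketch-side hook: one function, two phases selected by the flag.
void userDebugDump(void *yourRam2, bool bFaultedNow) {
  if (bFaultedNow) {
    // Fault path: copy whatever might explain the fault.
    std::memcpy(yourRam2, &live, sizeof live);
  } else {
    // Restart path: pull the saved data back out for display.
    std::memcpy(&recovered, yourRam2, sizeof recovered);
  }
}

// Simulated core side: what the fault handler would do before resetting.
void onFault() { userDebugDump(fakeRam2, true); }

// Simulated core side: what startup would do before calling setup().
void onRestartAfterFault() { userDebugDump(fakeRam2, false); }
```

In a real core the default hook would be an empty weak function, so sketches that don't define one pay nothing for it.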
 
It's still a WIP -- I'm actively working on hammering away at it (my ARM asm knowledge is very slim, but I'm working on it!) but will definitely share once it gets in a more usable state. If I do want to use SD card to shove dump files, I'll have to make sure it's not in an inconsistent state.

I'm actually having success with USB printing during faults. I remember having issues with it awhile ago, but it seems to be working well now (I only print some basic info ATM.)

For stack traces, the current plan is to generate a .h based on the general memory layout of everything else around it -- plan B would be, if the SD card is in a consistent state, reading the ELF directly from the SD card and doing some black magics to coerce it into user readable data by diving into the symbol/dwarf sections.
The SD card may never be in an inconsistent state if the fault never falls within the SD library (I'm using SDFat for SDIO), in theory; and I think it's friendly enough to the stack to not overflow the handler, but definitely more experimentation required.

A big part of this is being able to re-image the teensy after a fault (so I don't have to reset it when the issue could be somewhere early on -- an infinite loop! not to mention that you could put the CPU in a locked state accidentally if you fault in your fault handler). Anyone have any ideas about this? I'm not really sure what to re-enable that the bootloader hooks (and given the bootloader is proprietary, it's not exactly easy to reverse engineer -- nor would I want to out of respect for Paul's excellent hard work).

The RAM2 is a good idea, but my project is very allocate-heavy (the bulk of it is a fancy GUI). Using a SPI FLASH/SRAM chip would be interesting, though; not sure if the SPI RAM would be cleared on startup (I don't think it is) -- but I've gotta order a new teensy 4/some sram chips from PJRC as I accidentally killed some of the pins needed (oops, avoid my thread history lol)

(a side note, I like to use visualmicro because I'm lazy, but its gdbstub debugger is very buggy and doesn't seem to work, which is kinda what inspired this)
----
In the teensy 4 core, the HardFault_HandlerC is declared weak, so you can easily override it and do whatever voodoo you'd like in it -- https://github.com/PaulStoffregen/c...e4ca0458ea17655d1c4da8/teensy4/startup.c#L532
 
@MeBoop: Writing to the SD in faulted state is suspect? Lots of overhead and moving parts have to be working for that? As noted - some faults allow USB Serial.print() others do not - but returning from fault handler leaves the Teensy locked/hung IIRC.

The Bootloader is an external processor that only exerts real control during programming request, though on T_4.x it does help with startup timing AFAIK. But it only talks to the 1062 when in Program mode, and once in program mode the Teensy can only restart with or without reprogramming done.

The SPI PSRAM is NOT cleared on startup - but again it may or may not function in a faulted state.

Frank_B and I have both been working in startup.c with the fault handler to some degree.

Frank_B has an outstanding pull request for his Fault >> RAM2 : Restart >> READ RAM2 to present fault info. before setup().

@Frank - I got this working here in your hardfaults.cpp code - I'll email for review and finishing touches if you desire.
Code:
Hardfault.
Return Address: 0x8FE
	(DACCVIOL) Data Access Violation
	(MMARVALID) Accessed Address: 0x0 (nullptr)

FAULT RECOVERY :: userHFDebugDump() in hardfaults.cpp ___
Hardfault.
Return Address: 0x8FE
	(DACCVIOL) Data Access Violation
	(MMARVALID) Accessed Address: 0x0 (nullptr)

FAULT RECOVERY :: userHFDebugDump() in hardfaults.cpp ___
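For readers wondering where the DACCVIOL and MMARVALID lines come from: they are bits in the MemManage part of the Cortex-M CFSR register. Below is a minimal decoder sketch, assuming the standard ARMv7-M bit positions; `describeMemFault()` is a hypothetical helper, not the function used in hardfaults.cpp, and it takes the register values as arguments so it can run on a PC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// MemManage bits in the low byte of the Cortex-M CFSR register.
constexpr uint32_t kIACCVIOL  = 1u << 0;  // instruction access violation
constexpr uint32_t kDACCVIOL  = 1u << 1;  // data access violation
constexpr uint32_t kMMARVALID = 1u << 7;  // MMFAR holds the faulting address

// Hypothetical helper: turn CFSR/MMFAR values into report lines.
std::string describeMemFault(uint32_t cfsr, uint32_t mmfar) {
  std::string out;
  if (cfsr & kIACCVIOL) out += "(IACCVIOL) Instruction Access Violation\n";
  if (cfsr & kDACCVIOL) out += "(DACCVIOL) Data Access Violation\n";
  if (cfsr & kMMARVALID) {
    char buf[48];
    std::snprintf(buf, sizeof buf, "(MMARVALID) Accessed Address: 0x%lX",
                  static_cast<unsigned long>(mmfar));
    out += buf;
    if (mmfar == 0) out += " (nullptr)";
    out += "\n";
  }
  return out;
}
```

A CFSR value of 0x82 with MMFAR at 0 gives the two lines seen in the log above, which is what a write through a null pointer would be expected to produce.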
 
There are a lot of moving parts, which is why I agree it could be suspect. However: I've had quite a bit of success using SPI with several of the fault handlers (and even USB serial, even though I'm still unable to engage the USB program mode -- weird!) -- the display I .. display .. the error data on works over SPI (RA8875).
Unless the fault is related to the hardware SPI (in which case, a potential idea is to use software SPI for this? hmmmm) I don't think there will be any issues using it, but it just needs a lot of testing to confirm.

I never intend to return from the fault handler. The reason returning from the fault handler leaves it hung is because the PC never changes, so it goes right back to whatever caused the fault, causing it to fault again.
You could technically skip that instruction, but... that's probably way more dangerous than it's worth.
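To make the "PC never changes" point concrete, here is a sketch of the basic frame a Cortex-M core pushes on exception entry, plus the instruction-skipping trick mentioned above. The frame layout and the Thumb 16/32-bit size rule are standard ARMv7-M; `skipFaultingInstruction()` is a hypothetical helper that works on a copy of the frame, so it can be tried on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Basic frame pushed on exception entry (no FPU state).
struct ExceptionFrame {
  uint32_t r0, r1, r2, r3, r12;
  uint32_t lr;    // link register of the interrupted code
  uint32_t pc;    // return address; for a precise fault, the faulting instruction
  uint32_t xpsr;
};
static_assert(sizeof(ExceptionFrame) == 32, "eight stacked words");

// Thumb-2 instructions are 2 or 4 bytes long; the top five bits of the
// first halfword tell which.
inline unsigned thumbInstructionSize(uint16_t firstHalfword) {
  unsigned top5 = firstHalfword >> 11;
  return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4u : 2u;
}

// Hypothetical helper: move the stacked PC past the faulting instruction,
// so an exception return would resume after it instead of re-faulting.
inline void skipFaultingInstruction(ExceptionFrame &frame,
                                    uint16_t firstHalfword) {
  frame.pc += thumbInstructionSize(firstHalfword);
}
```

Skipping leaves whatever the instruction was supposed to do undone, so the code that follows runs on bad state; that is the "more dangerous than it's worth" part.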

Dedicating a block of memory to holding fault information is a great idea as a more "provable" fault handling but I have some concerns that there will be data omitted, such as a stack trace.
I wonder if the ETM (embedded trace macrocell) is usable with the Teensy, as it is available on the MCU. This is only about 32 bytes or so, so it'd be relatively cheap to store.

My issue w/r/t the bootloader, is that when I fault, it no longer responds to programming requests. I'm not really sure why but I could just be blind to something obvious.
 
The Bootloader responds to the button by a hardware signal on the board.

The USB Wire signal requires a functional USB Stack. Having output is one thing that somehow often works.

But the USB jump to Bootloader requires CODE in the USB stack running to detect the PC's request ( via a baud rate change ) and then have the Teensy tell the Bootloader chip to take over.


I've modified the FrankB HardFaults.cpp code to write on FAULT and READ on restart from both DMAMEM ( RAM2 ) and a PSRAM.
This latest version running puts a string into PSRAM on the FAULT - then the Teensy RESETS, detects it was restarted because of a Fault, Enters the indicated Code and prints out the saved string.
Code:
Hardfault.
Return Address: 0x942
	(DACCVIOL) Data Access Violation
	(MMARVALID) Accessed Address: 0x0 (nullptr)

FAULT RECOVERY :: userHFDebugDump() in hardfaults.cpp ___
You Are Here - via PSRAM EXTMEM! :(

Frank - going to email this .cpp to you now for your consideration ...
 
I have other priorities ATM but if I get some time this weekend I'll see about hammering away at generating dump files. A shame about the USB issue, though. Would be a massive help when rapid fire debugging (but I ended up just moving my teensy closer so pushing the button isn't that bad haha)
 
FrankB's code isn't in a PJRC release yet - just a Pull Request on github.

I made some edits for his review that allow a call to the user sketch on a FAULT to save data to DMAMEM or EXTMEM for viewing after restart. This happens before entry to setup() { but after waiting for USB to be online } - at this point that data could be used to alter how the program runs this next time based on the data saved.
 
Good luck with whatever is working to trace back.

If you have code that can walk back and uncover the stack in any way please post.

The code edit just forwarded to FrankB could perhaps capture that onto PSRAM or RAM2. Then, following your idea, once the system is stable on the next restart it could open files on the SD card to parse the ELF or other data. Right now, as you know, the fault type, address and register values are captured and that is all.
 
You have the stack pointer and you can use the compiler option -funwind-tables -- this was the working plan. More than that requires a disassembler and dwarf info on elf.
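A sketch of what a table-driven walk looks like in code, assuming the toolchain's unwinder (`_Unwind_Backtrace` from `<unwind.h>`, shipped with GCC and Clang) is linked in and the code is built with unwind tables. `TraceBuffer`, `recordFrame()` and `captureBacktrace()` are hypothetical names. This walks the stack of the caller; starting from the registers stacked by a fault is a further step that isn't shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unwind.h>

// Fixed-size buffer so no allocation is needed while capturing.
struct TraceBuffer {
  uintptr_t pcs[16];
  size_t count;
};

// Called by the unwinder once per stack frame.
static _Unwind_Reason_Code recordFrame(struct _Unwind_Context *ctx, void *arg) {
  TraceBuffer *buf = static_cast<TraceBuffer *>(arg);
  if (buf->count >= sizeof buf->pcs / sizeof buf->pcs[0]) {
    return _URC_END_OF_STACK;  // buffer full, stop walking
  }
  buf->pcs[buf->count++] = static_cast<uintptr_t>(_Unwind_GetIP(ctx));
  return _URC_NO_REASON;  // keep going to the next frame
}

// Collect return addresses for the current call stack.
size_t captureBacktrace(TraceBuffer &buf) {
  buf.count = 0;
  _Unwind_Backtrace(recordFrame, &buf);
  return buf.count;
}
```

The result is only a list of addresses; turning them into function names still needs the ELF symbols (for example with addr2line on the PC).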
 
Okay, thought maybe you had the math and logic to find and walk back the call stack.

Frank has the PR to PJRC CORES for his code, and I put a PR on his PowerButton with an updated copy of the hardfaults.cpp that does the work, with an edit to call a user func as shown that works, but I've not heard back whether he thinks it is a good idea.

It would work for your case to push data to PSRAM for instance to recover/display or act on with restart. That would allow capturing variable values/state when the fault happened in case it explains why it went wrong.

Paul may be ready to take the PR or have his own idea that he hasn't presented yet. But having general interest in needing it and being ready to test it might help carry the issue.
 
Without DWARF/unwind tables, you're going to have a very hard time creating a backtrace, I'm afraid. You could back up the entirety of the stack and send that over on next reboot for analysis on the PC, but other than that, it's mostly wishful thinking
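A minimal sketch of the "back up the stack and analyse it on the PC" fallback, assuming a descending stack and a fixed-size snapshot buffer. `StackSnapshot` and `captureStack()` are hypothetical; on a Teensy the snapshot would have to live somewhere that survives the reset, and the stack top would come from the linker script.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical snapshot layout for post-mortem analysis on a PC.
struct StackSnapshot {
  uint32_t sp;          // stack pointer at the time of the fault
  uint32_t wordCount;   // number of valid entries in words[]
  uint32_t words[256];  // raw stack contents, lowest address first
};

// Copy the live part of a descending stack, [sp, stackTop), into snap.
// Copies at most the capacity of snap.words; returns false if truncated.
bool captureStack(const uint32_t *sp, const uint32_t *stackTop,
                  StackSnapshot &snap) {
  size_t live = (stackTop > sp) ? static_cast<size_t>(stackTop - sp) : 0;
  size_t capacity = sizeof snap.words / sizeof snap.words[0];
  size_t n = live < capacity ? live : capacity;
  // Addresses are 32 bits on the target; the cast only truncates on a PC.
  snap.sp = static_cast<uint32_t>(reinterpret_cast<uintptr_t>(sp));
  snap.wordCount = static_cast<uint32_t>(n);
  std::memcpy(snap.words, sp, n * sizeof(uint32_t));
  return n == live;
}
```

With the snapshot and the matching ELF on the PC, a debugger or a script can then do the unwinding with the full DWARF info available.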

Going to keep investigating though, I might find something useful.
 
@Defragster, wasn't that thing I did with a DIE() macro?
@MeBoop - Yes, I played with unwind-tables too, without much success.
I hope you have more luck!
 
DIE() is the other way around: the code is running along, sees a problem and causes a reset.

The userHFDebugDump() is:
> Fault happens - as it does log return address
> Call empty weak userHFDebugDump() : unless USER code has one in the sketch
->> userHFDebugDump(, true) can do nothing or push any desired sketch state variables or info into PSRAM/DMAMEM
> Teensy reset()

> Teensy restarts
> Fault info is displayed - as it does
> Call empty weak userHFDebugDump() : unless USER code has one in the sketch
->> userHFDebugDump(, false) can do nothing or display any sketch state variables or info from PSRAM/DMAMEM
> Teensy setup()

Just like trapping Faults - it doesn't happen very often. But some sketches run for some hours or days and then just die. If the death is by Fault this would give a chance to track anything the sketch can record from global variables or other system state.

Question: Do watchdogs exit through a Fault mechanism - or just reset?
 
With the watchdog you can decide what to do in its own callback. I think only 1 of the 4 watchdogs doesn't reset unless you tell it to.
 
Thanks TonTon81 - I did go look at your code and saw that watchdogs have their own exit path not through Fault. Of course by default no watchdogs are enabled.

@FrankB and @MeBoop : bummer there isn't a clear way to unwind for a stack trace. The searches I did didn't turn up an easy way either.
 
@Tim, can you make a normal pullrequest to the T4_Powerbutton?

Probably not :)

It seemed a new folder was needed - not sure if I can do that from the web?

If I had write permission - or you created a new git for that?

Will go try ...
 
I made a forked copy and put that folder and file on this system.

Then made that PR - not sure how it isn't normal.

When I go to the git website, clicking Pull Request wants to know from where. If the file existed I could edit and offer that - but since it is a new folder and file ???
 
Sorry this has taken me so very long. I'm trying to bring together Frank's approach of storing info in memory to be printed on a warm reboot, and the Teensy 3.x approach of keeping USB responsive so anything you printed actually shows up in the serial monitor and you can click Upload in Arduino and not need to press the button on Teensy.

Committed code on github today.

https://github.com/PaulStoffregen/cores/commit/d8ca453f1208d62264921a25712916a1fd8eb464


I'm also debating whether to have Teensy default to automatically rebooting some "safe" period of time after a fault. I can imagine reasons why this should and should not be done. Really looking for some feedback on that question.


Here's roughly how to use it...

Code:
#include <Wire.h>
#include <CrashReport.h>

void setup() {
  Serial.begin(9600);
  Serial1.begin(9600);
  Wire.begin();
  Serial.print(CrashReport);  // print info saved from a previous fault
}

void loop() {
  static int count = 0;
  Serial.print("test ");
  Serial.println(count);
  if (count == 4) *(uint32_t *)0 = 0;  // crash here: deliberate write to address 0
  delay(10);
  Serial1.print("test ");
  Serial1.println(count);
  count = count + 1;
  delay(250);
}
 
Thanks for the explanation, I'm learning a lot in this topic.
 