Forum Rule: Always post complete source code & details to reproduce any issue!

Thread: Re-enable bootloader interrupts after a hard fault [Teensy 4.1]

  1. #51
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    24,470
    Quote Originally Posted by PaulStoffregen View Post
    I'm going to take a break, until at least late tonight. Now's the perfect time to pull the latest from github and jump in if you'd like to make changes.
    Features still on my wish list...

    1: Much more user friendly message printed for the common cases like memory access violation. For example, rather than merely "Data Access Violation", kindly but firmly explain to the user their program almost certainly has a bug which accessed an address which isn't supposed to be used by any program. Ideally we'd look at the target address and print different messages to help people understand what's wrong.

    2: Show friendly message first, followed by cryptic register info

    3: Add temperature to arm_fault_info_struct and have CrashReport show the chip's temperature at the time of the problem. Maybe we should also log the RTC and show time/date?

    4: Test over-temperature panic. Is 8 seconds long enough? Maybe the fault handler should poll temperature and keep waiting until it's safe? (assuming the same code will again reheat the chip up to a temperature panic after rebooting)
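    For wish #1, one possible shape for the address lookup is a small table that maps address ranges to friendlier text. This is a minimal sketch, not the core's code: `describe_fault_address()`, `kRegions` and the messages are hypothetical, and the boundaries are assumptions (a 32-byte NULL-pointer trap at address 0, and 8 MB of read-only program flash at 0x60000000 on Teensy 4.1) that should be checked against the reference manual and startup.c.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One entry per address range we can explain to the user.
struct AddressRegion {
  uint32_t start;       // first address in the range
  uint32_t end;         // one past the last address in the range
  const char *message;  // friendly explanation printed before the register dump
};

// Hypothetical table; the boundaries are assumptions for a Teensy 4.1.
static const AddressRegion kRegions[] = {
    {0x00000000u, 0x00000020u,
     "NULL pointer: the program used a pointer that was never set to valid memory"},
    {0x60000000u, 0x60800000u,
     "Program flash (read-only): a write to code or const data is the most likely cause"},
};

// Returns a friendly explanation for the address reported in MMFAR or BFAR.
const char *describe_fault_address(uint32_t addr) {
  for (const AddressRegion &r : kRegions) {
    if (addr >= r.start && addr < r.end) return r.message;
  }
  // Anything outside the known ranges is not meant to be used by any program.
  return "Address is not used by any program: almost certainly a bug, "
         "such as a corrupted or uninitialized pointer";
}
```

    In the report, something like this could be printed right after the "Accessed Address" line, passing whichever of MMFAR or BFAR was flagged as valid.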

  2. #52
    Senior Member+ Frank B
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    8,612
    Quote Originally Posted by defragster View Post

    Frank had to enable and handle the divide by zero case? So #4 is ignored.
    Yup, by default "div by zero" exceptions are disabled, and that is good as it is, to maintain Arduino compatibility.
    But it would be nice to have a simple call that enables them, instead of having to set the flag manually.

    enableDivByZeroException(bool) or something like that...
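    A minimal sketch of the suggested call. The helper name follows the suggestion above; taking the register as a pointer is an assumption made here so the bit logic can be exercised off-target. On a Teensy 4 it would presumably be passed `&SCB_CCR` (register name assumed from the core's headers). The bit itself is standard ARMv7-M: DIV_0_TRP is bit 4 of the Configuration and Control Register at 0xE000ED14.

```cpp
#include <cassert>
#include <cstdint>

// ARMv7-M CCR bit 4 (DIV_0_TRP): trap SDIV/UDIV when the divisor is zero.
constexpr uint32_t kCcrDiv0Trp = 1u << 4;

// Hypothetical helper in the spirit of enableDivByZeroException(bool).
// Read-modify-write so the other CCR bits are left untouched.
inline void enableDivByZeroException(volatile uint32_t *ccr, bool enable) {
  if (enable) {
    *ccr = *ccr | kCcrDiv0Trp;   // set DIV_0_TRP
  } else {
    *ccr = *ccr & ~kCcrDiv0Trp;  // clear DIV_0_TRP (the Arduino-compatible default)
  }
}
```

    Once enabled, an integer divide by zero raises a UsageFault (escalated to HardFault unless UsageFault is enabled separately), and CFSR shows the DIVBYZERO bit (bit 25).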

  3. #53
    Senior Member+ Frank B
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    8,612
    Quote Originally Posted by PaulStoffregen View Post
    Maybe we should also log the RTC and show time/date?
    Good Idea.
    Quote Originally Posted by PaulStoffregen View Post
    assuming the same code will again reheat the chip up to a temperature panic after rebooting
    Perhaps with an additional test for whether overclocking is enabled.

  4. #54
    Senior Member+ KurtE
    Join Date
    Jan 2014
    Posts
    9,215
    Playing a little bit... I'm sort of wondering whether, for example, my testing with a different PS4 controller is crashing or not...

    So I thought I would add it and see if I go boom...

    The question is the best way to write a sketch that may be built with this core, or maybe with a previous one that lacks it...
    So should I use:
    Code:
    #if __has_include (<CrashReport.h>)
    #  include <CrashReport.h>
    #endif
    In the past, for code that was conditional on whether something was included, let's say the ILI9341_t3 library,
    I might have in my code:
    Code:
    #ifdef _ILI9341_t3H_
    ...
    #endif
    But CrashReport.h does not have any such define in it... it uses #pragma once instead.

    So the question is: in the setup code, do I again test for this with __has_include?
    Sorry, I know this is a pretty generic question.
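    One common pattern, sketched here under the assumption that the compiler supports `__has_include` (GCC 5 and later do): record the result of the check in a macro of your own, then test that macro wherever you would otherwise have tested a header's include-guard define. `HAVE_CRASHREPORT` and `crashReportCompiledIn()` are hypothetical names.

```cpp
#include <cassert>

// Include CrashReport.h only when the installed core provides it, and remember
// the result, since the header uses #pragma once rather than a guard macro.
#if defined(__has_include)
#  if __has_include(<CrashReport.h>)
#    include <CrashReport.h>
#    define HAVE_CRASHREPORT 1
#  endif
#endif
#ifndef HAVE_CRASHREPORT
#  define HAVE_CRASHREPORT 0
#endif

// True when CrashReport support was compiled in.
constexpr bool crashReportCompiledIn() { return HAVE_CRASHREPORT != 0; }

// In setup(), the same macro can guard the report code, e.g.:
//   #if HAVE_CRASHREPORT
//     if (CrashReport) Serial.print(CrashReport);
//   #endif
```

    With an older core (or any build without the header), the macro is simply 0 and the guarded code drops out at compile time.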

  5. #55
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by PaulStoffregen View Post
    Features still on my wish list...

    1: Much more user friendly message printed for the common cases like memory access violation. For example, rather than merely "Data Access Violation", kindly but firmly explain to the user their program almost certainly has a bug which accessed an address which isn't supposed to be used by any program. Ideally we'd look at the target address and print different messages to help people understand what's wrong.

    2: Show friendly message first, followed by cryptic register info

    3: Add temperature to arm_fault_info_struct and have CrashReport show the chip's temperature at the time of the problem. Maybe we should also log the RTC and show time/date?

    4: Test over-temperature panic. Is 8 seconds long enough? Maybe the fault handler should poll temperature and keep waiting until it's safe? (assuming the same code will again reheat the chip up to a temperature panic after rebooting)
    Wish list features added to my fork (https://github.com/mjs513/cores) for testing:
    Added current temperature
    Added RTC time

    Temperature: the panic ISR is already set up, so that should trip the reboot due to temperature. I have to double-check which register gets updated. As for the 8-second delay and temperature: if a panic-temperature interrupt occurred, we should probably wait until the chip cools off to, say, the high-temperature level? I put in a kludge for now for testing purposes.

    @PaulStoffregen - having a problem with isvalid(info). It's always returning 0 (false). I don't see the issue, but I'm working on other stuff first.

    @Frank B - will figure out how to get the clock frequency.

  6. #56
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    All
    I just added some code for panic temp resets. Need a sanity check.

    In crashreport.cpp:
    Code:
      if (SRSR & SRC_SRSR_TEMPSENSE_RST_B) {
        p.println("Reboot was caused by temperature sensor");
        // IRQ_TEMPPANIC is a bit mask, so test it against the MISC1 register
        if (CCM_ANALOG_MISC1 & CCM_ANALOG_MISC1_IRQ_TEMPPANIC) {
          p.println("Panic Temp Exceeded");
        }
      }
    A temperature reset should only occur at panic temperatures.

    In startup.c
    Code:
    	// reboot
    	if (CCM_ANALOG_MISC1 & CCM_ANALOG_MISC1_IRQ_TEMPPANIC) {  // IRQ_TEMPPANIC is a bit mask
    		while(tempmonGetTemp() > 80)  { delay(100); }  // 5degs below High temp alarm.
    	}
    	SRC_GPR5 = 0x0BAD00F1;
    	SCB_AIRCR = 0x05FA0004;
    	while (1) ;
    If the panic-temperature ISR has tripped, I wait for the temperature to drop below the high-temperature alarm, but should we also throttle back the clock?

  7. #57
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    The simpler .clear() is good; as noted, we just needed a way for the sketch to know about the fault, like Kurt said.

  8. #58
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    Here is the test sketch updated to the current PJRC CORES: setup() tests for, prints, and clears the fault.

    I have other tasks to complete today...

    With options to trigger multiple faults:
    Code:
    // Found a couple of Faults to force here :: https://interrupt.memfault.com/blog/cortex-m-fault-debug
    int illegal_instruction_execution(void) {
      int (*bad_instruction)(void) = (int (*)(void))0xE0000000;  // 0xE0000000 is in the execute-never System region
      return bad_instruction();
    }
    
    void setup() {
      pinMode( LED_BUILTIN, OUTPUT );
      digitalWrite( LED_BUILTIN, 1 );
      Serial.begin(115200);
      while (!Serial && millis() < 4000 );
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.printf(" millis() now %u\n", millis() );
      if ( CrashReport ) {
        Serial.print(CrashReport);
        CrashReport.clear();
        Serial.print("\n\tRECOVERED from Crash !");
      }
      delay ( 100 );
      digitalWrite( LED_BUILTIN, 0 );
      delay ( 1000 );
      digitalWrite( LED_BUILTIN, 1 );
    }
    
    void loop() {
      if ( Serial.available()) {
        char cc = Serial.read();
        digitalWrite( LED_BUILTIN, 0 );
        Serial.println("\nIt's your fault!\n");
        delay ( 50 );
        if ( cc == '0' ) {
          illegal_instruction_execution();
        }
        else if ( cc == '1' ) { // Bad Address Read
          uint32_t ff = *(volatile uint32_t *)0xbadcafe;
          Serial.print(ff);
        }
        else if ( cc == '2' ) { // Coprocessor Fault - turn off FPU
          __asm volatile(
            "ldr r0, =0xE000ED88 \n"
            "mov r1, #0 \n"
            "str r1, [r0] \n"
            "dsb \n"
            "vmov r0, s0 \n"
          );
        }
        else if ( cc == '3' ) { // bad_addr_double_word_write
          volatile uint64_t *buf = (volatile uint64_t *)0x30000000;
          *buf = 0x1122334455667788;
        }
        else if ( cc == '4' ) { // Divide by Zero (not enabled?)
          cc = cc - '4';
          cc /= cc;
        }
        else if ( cc == '5' ) {
        }
        else if ( cc == '6' ) {
        }
        else { // default fault on Serial with no other fault ... from 'Enter'
          uint32_t *y = 0; y[0] = 5;
        }
      }
      delay ( 3 );
      digitalWrite( LED_BUILTIN, 0 );
      delay ( 40 );
      digitalWrite( LED_BUILTIN, 1 );
    }

    Code:
    (IACCVIOL) Instruction Access Violation
    
    (NOCP) No Coprocessor
    
    	(DACCVIOL) Data Access Violation
    	(MMARVALID) Accessed Address: 0x0 (nullptr)

  9. #59
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    24,470
    I'm wondering whether printing the crash report should also clear it automatically?

    That would block some cases like printing the same info to 2 different devices. But it would also simplify the common case. Then when people have mysterious problems in a post-1.54 future, we can just tell them to add 1 line to their program.

  10. #60
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by PaulStoffregen View Post
    I'm wondering whether printing the crash report should also clear it automatically?

    That would block some cases like printing the same info to 2 different devices. But it would also simplify the common case. Then when people have mysterious problems in a post-1.54 future, we can just tell them to add 1 line to their program.
    I just did a PR (https://github.com/PaulStoffregen/cores/pull/571) for the changes I made. I don't think the fault handler is going to capture the temperature fault on the first go-around, until the chip cools off, since it auto-restarts when the panic temperature is hit; maybe test on the high temperature instead? I have to reread the RM again; it's been a while. As I mentioned in the PR, isvalid is always returning a 0.

  11. #61
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    Quote Originally Posted by PaulStoffregen View Post
    I'm wondering whether printing the crash report should also clear it automatically?

    That would block some cases like printing the same info to 2 different devices. But it would also simplify the common case. Then when people have mysterious problems in a post-1.54 future, we can just tell them to add 1 line to their program.
    That is how I actually did it at first last night.

    But with no way to let the sketch know about the fault, that seemed wrong.

    With the added bool test, having:
    Code:
      if ( CrashReport ) {
        Serial.print(CrashReport);
    Automatically .clear() at the same time makes sense.

    It would indeed preclude sending to Serial and Serial1.
    > It is a rabbit hole, but the crash code could return a copy of the arm_fault_info_struct, and then a function could parse that:
    Code:
      struct arm_fault_info_struct myCopy_afis;
      myCopy_afis = CrashReport; //   if ( (myCopy_afis = CrashReport) ) {
      if ( myCopy_afis ) {
        Serial.print( CrashReport(myCopy_afis) );
        Serial4.print( CrashReport(myCopy_afis) );
      }
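    A minimal sketch of that copy-then-format idea, using a simplified stand-in struct because the real arm_fault_info_struct layout isn't reproduced here. `FaultSnapshot`, `takeSnapshot()` and `formatSnapshot()` are hypothetical, and std::string stands in for Print/Serial so it runs anywhere. Taking the copy clears the persistent record once; the copy can then be formatted for as many outputs as needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Simplified stand-in for a few fields of the persistent fault record.
struct FaultSnapshot {
  bool valid;
  uint32_t cfsr;           // configurable fault status
  uint32_t hfsr;           // hard fault status
  uint32_t returnAddress;  // stacked PC
};

// Stand-in for the record that survives the reboot (values from a CrashReport log in this thread).
static FaultSnapshot g_persistent = {true, 0x82u, 0x40000000u, 0xE2u};

// Copy the record and clear the persistent one in a single step.
FaultSnapshot takeSnapshot() {
  FaultSnapshot copy = g_persistent;
  g_persistent.valid = false;
  return copy;
}

// Format a snapshot; safe to call repeatedly (e.g. once for Serial, once for Serial4).
std::string formatSnapshot(const FaultSnapshot &s) {
  char buf[96];
  std::snprintf(buf, sizeof buf, "CFSR: %lX\nHFSR: %lX\nreturn address: %lX\n",
                (unsigned long)s.cfsr, (unsigned long)s.hfsr,
                (unsigned long)s.returnAddress);
  return std::string(buf);
}
```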

  12. #62
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    About the time/date stamp: that could be fun, though it may just be 9 seconds before the setup() print is shown on restart.

    What might be useful for some debugging purposes is recording millis() (or micros()) at the time of the crash, e.g. when the crash occurs every time after 14.4 minutes of runtime.

    The 1062 has its own tick count that is low resolution and, IIRC, not used for millis(); I'm wondering whether that still has a valid uptime count. millis() gets skewed when interrupts are off, which is really bad on LittleFS with one of the media during formatting. Would that be good to present, if available?

    Not sure how this will be adapted to T_3.x. It doesn't have support for Frank B's scheme using the static nature of RAM2. The T_3.x family only has 7 DWORDS of NVRAM (excluding the RTC valid-time DWORD). IIRC the T_3.x family doesn't have as many words for error checking, and if they were pre-parsed at the time of the fault, a smaller amount of space would hold the info needed to reflect the fault that occurred.
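    To make the pre-parsing idea concrete, here is a sketch of a hypothetical 7-word layout that keeps only pre-decoded fields rather than a full register dump. The field choice, `CompactFault`, the magic value and both helpers are assumptions; `nv` stands in for whatever 7 battery-backed words are actually available. The last field records millis() at the fault, per the uptime idea above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: exactly 7 32-bit words, pre-parsed at fault time.
struct CompactFault {
  uint32_t magic;          // marks the record as present
  uint32_t cfsr;           // configurable fault status
  uint32_t hfsr;           // hard fault status
  uint32_t faultAddress;   // MMFAR or BFAR, whichever was valid
  uint32_t returnAddress;  // stacked PC
  uint32_t xpsr;           // stacked xPSR
  uint32_t uptimeMillis;   // millis() when the fault happened
};
static_assert(sizeof(CompactFault) == 7 * sizeof(uint32_t), "must fit in 7 words");

constexpr uint32_t kCompactFaultMagic = 0xFA017ED0u;  // arbitrary marker value

// Fault handler side: write the record word by word (nv may be volatile hardware).
void storeCompactFault(volatile uint32_t *nv, const CompactFault &f) {
  const uint32_t words[7] = {kCompactFaultMagic, f.cfsr, f.hfsr, f.faultAddress,
                             f.returnAddress, f.xpsr, f.uptimeMillis};
  for (int i = 0; i < 7; i++) nv[i] = words[i];
}

// Startup side: returns false when no record is present.
bool loadCompactFault(const volatile uint32_t *nv, CompactFault &out) {
  if (nv[0] != kCompactFaultMagic) return false;
  out = {nv[0], nv[1], nv[2], nv[3], nv[4], nv[5], nv[6]};
  return true;
}
```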

  13. #63
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    24,470
    Quote Originally Posted by mjs513 View Post
    As I mentioned in the PR isvalid is always returning a 0
    Fixed.
    https://github.com/PaulStoffregen/co...d45b04c1d5ca15

  14. #64
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    The post #58 code works with the latest pulled CORES to show this.
    The fault is cleared after display on a clean reset/restart.
    Is it really running at that temperature? I haven't been watching that lately:
    Code:
    T:\tCode\FAULT_HANDLER\CrashReport\CrashReport.ino Jun 21 2021 13:51:58
     millis() now 1078
    
    It's your fault!
    
    
    T:\tCode\FAULT_HANDLER\CrashReport\CrashReport.ino Jun 21 2021 13:51:58
     millis() now 1112
    CrashReport ... Hello World
      Fault occurred at: 13:00:50
      Temperature at time of fault: 57.0 degC
      length: 11
      IPSR: 3
      CFSR: 1
    	(IACCVIOL) Instruction Access Violation
      HTSR: 40000000
    	(FORCED) Forced Hard Fault
      MMFAR: 0
      BFAR: 0
      return address: E0000000
      XPSR: 60010000
      crc: A6921706
    Reboot was caused by 8 second auto-reboot after fault or bad interrupt detected
    
    	RECOVERED from Crash !
    T:\tCode\FAULT_HANDLER\CrashReport\CrashReport.ino Jun 21 2021 13:51:58
     millis() now 1104
    
    It's your fault!
    
    
    T:\tCode\FAULT_HANDLER\CrashReport\CrashReport.ino Jun 21 2021 13:51:58
     millis() now 1149
    CrashReport ... Hello World
      Fault occurred at: 13:01:10
      Temperature at time of fault: 56.3 degC
      length: 11
      IPSR: 3
      CFSR: 82
    	(DACCVIOL) Data Access Violation
    	(MMARVALID) Accessed Address: 0x0 (nullptr)
      HTSR: 40000000
    	(FORCED) Forced Hard Fault
      MMFAR: 0
      BFAR: 0
      return address: E2
      XPSR: 61010000
      crc: 3D25AC22
    Reboot was caused by 8 second auto-reboot after fault or bad interrupt detected
    
    	RECOVERED from Crash !
    T:\tCode\FAULT_HANDLER\CrashReport\CrashReport.ino Jun 21 2021 13:51:58
     millis() now 1134
    The lines from "CrashReport ... Hello World" through "Reboot was caused by ..." are the CrashReport text; the rest is the sketch's own output on startup.
    Last edited by defragster; 06-21-2021 at 09:03 PM. Reason: update two unique faults

  15. #65
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    24,470
    Quote Originally Posted by defragster View Post
    Automatically .clear() at the same time makes sense.
    I've committed auto-clear after printing CrashReport.

    https://github.com/PaulStoffregen/co...77e529f2f03d29


    Please pull the latest from github, then feel free to play. Unless any other issues needing my attention come up, I'm going to focus on a couple other things and probably return to this tomorrow or Wednesday.

    Don't be shy about rewriting how the various errors are explained. After 1.54 releases, when people have strange issues, one of the first things we'll probably advise them to try is adding Serial.print(CrashReport); to their code. Hopefully we can ship 1.54 with well-written messages that help people understand why their programs crash.

  16. #66
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    github just updated for AutoClear on Print.

    @Paul: Perhaps add an optional param to : Serial.print(CrashReport);
    > If passed with NO_CLEAR value it could allow Fault data to persist for second output?

    The prior code with a one-line change (the CrashReport.clear() call is removed):
    Code:
    // Found a couple of Faults to force here :: https://interrupt.memfault.com/blog/cortex-m-fault-debug
    int illegal_instruction_execution(void) {
      int (*bad_instruction)(void) = (int (*)(void))0xE0000000;  // 0xE0000000 is in the execute-never System region
      return bad_instruction();
    }
    
    void setup() {
      pinMode( LED_BUILTIN, OUTPUT );
      digitalWrite( LED_BUILTIN, 1 );
      Serial.begin(115200);
      while (!Serial && millis() < 4000 );
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.printf(" millis() now %u\n", millis() );
      if ( CrashReport ) {
        Serial.print(CrashReport);
        Serial.print("\n\tRECOVERED from Crash !");
      }
      delay ( 100 );
      digitalWrite( LED_BUILTIN, 0 );
      delay ( 1000 );
      digitalWrite( LED_BUILTIN, 1 );
    }
    
    void loop() {
      if ( Serial.available()) {
        char cc = Serial.read();
        digitalWrite( LED_BUILTIN, 0 );
        Serial.println("\nIt's your fault!\n");
        delay ( 50 );
        if ( cc == '0' ) {
          illegal_instruction_execution();
        }
        else if ( cc == '1' ) { // Bad Address Read
          uint32_t ff = *(volatile uint32_t *)0xbadcafe;
          Serial.print(ff);
        }
        else if ( cc == '2' ) { // Coprocessor Fault - turn off FPU
          __asm volatile(
            "ldr r0, =0xE000ED88 \n"
            "mov r1, #0 \n"
            "str r1, [r0] \n"
            "dsb \n"
            "vmov r0, s0 \n"
          );
        }
        else if ( cc == '3' ) { // bad_addr_double_word_write
          volatile uint64_t *buf = (volatile uint64_t *)0x30000000;
          *buf = 0x1122334455667788;
        }
        else if ( cc == '4' ) { // Divide by Zero (not enabled?)
          cc = cc - '4';
          cc /= cc;
        }
        else if ( cc == '5' ) {
        }
        else if ( cc == '6' ) {
        }
        else { // default fault on Serial with no other fault ... from 'Enter'
          uint32_t *y = 0; y[0] = 5;
        }
      }
      delay ( 3 );
      digitalWrite( LED_BUILTIN, 0 );
      delay ( 40 );
      digitalWrite( LED_BUILTIN, 1 );
    }

  17. #67
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    24,470
    Please look at the temperature wait in unused_interrupt_vector(). I have 2 concerns...

    1: Probably more than just testing CCM_ANALOG_MISC1_IRQ_TEMPPANIC is needed.

    2: The temperature test should be done inside the timer polling code, like this.

    Code:
            // keep USB running, so any unsent Serial.print() actually arrives in
            // the Arduino Serial Monitor, and we remain responsive to Upload
            // without requiring manual press of Teensy's pushbutton
            count = 0;
            while (1) {
                    if (PIT_TFLG0) {
                            //GPIO7_DR_TOGGLE = (1 << 3); // blink LED
                            PIT_TFLG0 = 1; 
                            if (temperature_is_safe()) {
                                    if (++count >= 80) break;  // reboot after 8 seconds
                            } else {
                                    count = 0;
                            }
                    }
    The idea is to do this check inside the loop which polls USB, so during the delay we remain responsive to any USB auto-reboot request. This will also start the 8 second delay once the temperature has cooled to below panic level, so we don't immediately reboot the moment the chip cools to non-panic temperature.
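    The counting behavior described above can be modeled off-target like this: a minimal sketch assuming one call per 100 ms PIT tick (80 ticks = 8 seconds). `RebootCountdown` is a hypothetical name, and the real loop also polls USB between ticks.

```cpp
#include <cassert>

// Models the reboot countdown: the count only advances while the temperature is
// safe and restarts from zero whenever it is not, so the full 8 seconds are
// counted only after the chip has cooled below the threshold.
struct RebootCountdown {
  int count = 0;
  static constexpr int kTicksBeforeReboot = 80;  // 80 x 100 ms = 8 s

  // Call once per PIT tick; returns true when it is time to reboot.
  bool tick(bool temperatureSafe) {
    if (!temperatureSafe) {
      count = 0;  // still too hot: start the 8 seconds over
      return false;
    }
    return ++count >= kTicksBeforeReboot;
  }
};
```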

  18. #68
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by PaulStoffregen View Post
    Please look at the temperature wait in unused_interrupt_vector(). I have 2 concerns...

    1: Probably more than just testing CCM_ANALOG_MISC1_IRQ_TEMPPANIC is needed.

    2: The temperature test should be done inside the timer polling code, like this.

    Code:
            // keep USB running, so any unsent Serial.print() actually arrives in
            // the Arduino Serial Monitor, and we remain responsive to Upload
            // without requiring manual press of Teensy's pushbutton
            count = 0;
            while (1) {
                    if (PIT_TFLG0) {
                            //GPIO7_DR_TOGGLE = (1 << 3); // blink LED
                            PIT_TFLG0 = 1; 
                            if (temperature_is_safe()) {
                                    if (++count >= 80) break;  // reboot after 8 seconds
                            } else {
                                    count = 0;
                            }
                    }
    The idea is to do this check inside the loop which polls USB, so during the delay we remain responsive to any USB auto-reboot request. This will also start the 8 second delay once the temperature has cooled to below panic level, so we don't immediately reboot the moment the chip cools to non-panic temperature.
    Will download the latest changes and take a look. I also want to take another look at how I did it. When I did a test, the T4 just went into an infinite restart loop. I'd like to avoid the whole issue of an auto-reboot when the panic temperature is reached. The high-temperature alarm is currently set 5 degrees below panic, so we have some room.

  19. #69
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    Quote Originally Posted by PaulStoffregen View Post
    Please look at the temperature wait in unused_interrupt_vector(). I have 2 concerns...

    1: Probably more than just testing CCM_ANALOG_MISC1_IRQ_TEMPPANIC is needed.

    2: The temperature test should be done inside the timer polling code, like this.

    ...
    The idea is to do this check inside the loop which polls USB, so during the delay we remain responsive to any USB auto-reboot request. This will also start the 8 second delay once the temperature has cooled to below panic level, so we don't immediately reboot the moment the chip cools to non-panic temperature.
    Also, the use of delay(100) is suspect: while(tempmonGetTemp() > 80) { delay(100); } // 5degs below High temp alarm.
    > Without interrupts, delay() won't function. Perhaps use smaller steps of delayMicroseconds(), which is based on ARM_CYCCNT, to allow feeding USB.

    Unless the temperature was the cause of the restart, the 8-second delay seems long: long enough for a user to miss the pending crash feedback and power off the device, losing the crash info. If the device is just reset, the info will show on restart (if the code is present), but it is still a long wait for something that happens once, the first time, and it will be an ODD wait.

    Frank B's version had a possibly long wait after an instant restart, only to make sure USB was back online before printing. Over-temperature from overclocking isn't something typically encountered. If USB doesn't complete within 1-2 seconds before the restart... will it ever? What if something was left "ON" when the fault happened? It will be forced to stay in that state for 8 seconds before the restart, when setup() can take control.
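    A minimal sketch of an interrupt-free wait along those lines. Assumptions: on Teensy 4 the counter would be ARM_DWT_CYCCNT and `idle` would poll USB; here both are passed in as callables (`busyWaitCycles` is a hypothetical name) so the logic can run anywhere. Unsigned subtraction keeps the elapsed count correct even when the 32-bit counter wraps.

```cpp
#include <cassert>
#include <cstdint>

// Busy-waits for `cycles` counts of a free-running 32-bit counter, calling
// idle() between reads. Needs no interrupts, unlike delay().
template <typename ReadCounter, typename Idle>
void busyWaitCycles(uint32_t cycles, ReadCounter readCounter, Idle idle) {
  const uint32_t start = readCounter();
  // (now - start) is computed modulo 2^32, so wraparound is harmless.
  while ((uint32_t)(readCounter() - start) < cycles) {
    idle();  // e.g. keep USB serviced so the host still sees the device
  }
}
```

    At the default 600 MHz, a 100 microsecond step would be 60,000 cycles.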

  20. #70
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by defragster View Post
    github just updated for AutoClear on Print.

    @Paul: Perhaps add an optional param to : Serial.print(CrashReport);
    > If passed with NO_CLEAR value it could allow Fault data to persist for second output?

    One line change to prior code is:
    .........
    @PaulStoffregen @defragster

    I changed to using the test sketch posted by @defragster and noticed a few things. If I enter '0', for instance, I receive the following message in the serial monitor:
    Code:
    D:\Users\Merli\Documents\Arduino\CrashRepot_example\CrashRepot_example.ino Jun 21 2021 18:07:41
     millis() now 715
    
    It's your fault!
    but no crash report. After reboot I will see the crash report:
    Code:
    D:\Users\Merli\Documents\Arduino\CrashRepot_example\CrashRepot_example.ino Jun 21 2021 18:07:41
     millis() now 706
    CrashReport ... Hello World
      Fault occurred at: 00:00:08
      Temperature at time of fault: 46.6 degC
      length: 11
      IPSR: 3
      CFSR: 1
    	(IACCVIOL) Instruction Access Violation
      HTSR: 40000000
    	(FORCED) Forced Hard Fault
      MMFAR: 0
      BFAR: 0
      return address: E0000000
      XPSR: 60010000
      crc: 936F22E5
    Reboot was caused by 8 second auto-reboot after fault or bad interrupt detected
    
    	RECOVERED from Crash
    Same behavior for options 1, 3, 4, and just a return. For option 2 (coprocessor) I do not get any fault report, just the report. I would have thought the crash report would be printed when the crash occurred, not on reboot? Or am I making a wrong assumption?

  21. #71
    Senior Member+ defragster
    Join Date
    Feb 2015
    Posts
    14,446
    Quote Originally Posted by mjs513 View Post
    @PaulStoffregen @defragster

    I changed to using the test sketch posted by @defragster and noticed a few things. If I enter '0', for instance, I receive the following message in the serial monitor:
    Code:
    D:\Users\Merli\Documents\Arduino\CrashRepot_example\CrashRepot_example.ino Jun 21 2021 18:07:41
     millis() now 715
    
    It's your fault!
    but no crash report. After reboot I will see the crash report:
    ...
    Same behavior for options 1, 3, 4, and just a return. For option 2 (coprocessor) I do not get any fault report, just the report. I would have thought the crash report would be printed when the crash occurred, not on reboot? Or am I making a wrong assumption?
    Not sure I follow?

    Here, when a fault happens, "It's your fault!" is followed by an 8-second pause, then an auto restart, and the report shows in setup().

    When the key entered doesn't trigger a crash, the code will loop(), and the "CR" will trigger the primary 'NULL pointer write fault' in the else case.

    That is what happens when "It's your fault!" fails to fault and is then followed by another "It's your fault!" for that NULL pointer write.

    >> I have not seen a failure to reboot/restart...

    That's how I see it working with the latest CORES dumped over my install with that sketch.

    There may be something odd I didn't test (as I'm not supposed to be here), or there is a problem with CORES being out of sync, or the sketch in post #66 is not correct for the current CORES?

  22. #72
    Senior Member+ mjs513's Avatar
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by defragster View Post
    Not sure I follow?

    Here, when a fault happens, "It's your fault!" is followed by an 8-second pause, then an auto restart, and the report shows in setup().

    When the key entered doesn't trigger a crash, the code will loop(), and the "CR" will trigger the primary 'NULL pointer write fault' in the else case.

    That is what happens when "It's your fault!" fails to fault and is then followed by another "It's your fault!" for that NULL pointer write.

    >> I have not seen a failure to reboot/restart...

    That's how I see it working with the latest CORES dumped over my install with that sketch.

    There may be something odd I didn't test (as I'm not supposed to be here), or there is a problem with CORES being out of sync, or the sketch in post #66 is not correct for the current CORES?
    I am seeing the same thing - thought the crash report should be printed before the reboot. Probably wrong.

  23. #73
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    14,446
    Quote Originally Posted by mjs513 View Post
    I am seeing the same thing - thought the crash report should be printed before the reboot. Probably wrong.
    Good! - Nope. Yep ... Can't print before fresh start.

    After a crash, system stability is in question and USB printing is intermittent at best. I found that when trying a debug handler back in the T_3.6 beta days, and since.

    Frank B's insightful contribution was using the static nature of RAM2/DMAMEM to keep all the needed crash-decoding data for reference and reporting on restart, when the system is stable and out of danger.

    That's why the arm_dcache_flush_delete(info, sizeof(*info)); is needed after the struct is filled - otherwise the MCU will be holding it in the cache when the restart happens.

    That's also why I asked about extending this to T_3.x (not T_LC): the only static area there is the 7 DWORDS of NVRAM. Maybe Paul is planning on that, or has other insight. While powered, those NVRAM DWORDS will hold their values across a restart.
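    A sketch of the write-then-validate pattern behind the crc field in the report, assuming a standard CRC-32 (the core's actual checksum may differ). `FaultRecord`, `crc32_ieee()`, `sealRecord()` and `isValidRecord()` are hypothetical. The writer computes the CRC last, then (on Teensy 4) flushes the cache as described above; after the reboot, the reader trusts the record only if the CRC still matches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
uint32_t crc32_ieee(const void *data, size_t len) {
  const uint8_t *p = static_cast<const uint8_t *>(data);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical record: payload words followed by a CRC over them.
struct FaultRecord {
  uint32_t words[10];
  uint32_t crc;
};

// Fault handler side: fill words first, then seal (then flush the cache on target).
void sealRecord(FaultRecord &r) { r.crc = crc32_ieee(r.words, sizeof r.words); }

// Startup side: stale or partially written data fails the check.
bool isValidRecord(const FaultRecord &r) {
  return r.crc == crc32_ieee(r.words, sizeof r.words);
}
```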

  24. #74
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by defragster View Post
    Also, the use of delay(100) is suspect: while(tempmonGetTemp() > 80) { delay(100); } // 5degs below High temp alarm.
    > Without interrupts, delay() won't function. Perhaps use smaller steps of delayMicroseconds(), which is based on ARM_CYCCNT, to allow feeding USB.

    Unless the temperature was the cause of the restart, the 8-second delay seems long: long enough for a user to miss the pending crash feedback and power off the device, losing the crash info. If the device is just reset, the info will show on restart (if the code is present), but it is still a long wait for something that happens once, the first time, and it will be an ODD wait.

    Frank B's version had a possibly long wait after an instant restart, only to make sure USB was back online before printing. Over-temperature from overclocking isn't something typically encountered. If USB doesn't complete within 1-2 seconds before the restart... will it ever? What if something was left "ON" when the fault happened? It will be forced to stay in that state for 8 seconds before the restart, when setup() can take control.
    Decided to take the simplest route for delaying reboot:
    Code:
    	while (1) {
    		if (PIT_TFLG0) {
    			//GPIO7_DR_TOGGLE = (1 << 3); // blink LED
    			PIT_TFLG0 = 1;
    			if (tempmonGetTemp() < 85) {
    					if (++count >= 80) break;  // reboot after 8 seconds
    			} else {
    					count = 0;
    			}
    That way we don't get into conflict with the auto-restart due to temperature. The 85 is the high-temperature alarm setting, by the way. I'm also going to try the panic level, but I have my doubts.

  25. #75
    Senior Member+ mjs513
    Join Date
    Jul 2014
    Location
    New York
    Posts
    7,210
    Quote Originally Posted by PaulStoffregen View Post
    Please look at the temperature wait in unused_interrupt_vector(). I have 2 concerns...

    1: Probably more than just testing CCM_ANALOG_MISC1_IRQ_TEMPPANIC is needed.

    2: The temperature test should be done inside the timer polling code, like this.

    Code:
            // keep USB running, so any unsent Serial.print() actually arrives in
            // the Arduino Serial Monitor, and we remain responsive to Upload
            // without requiring manual press of Teensy's pushbutton
            count = 0;
            while (1) {
                    if (PIT_TFLG0) {
                            //GPIO7_DR_TOGGLE = (1 << 3); // blink LED
                            PIT_TFLG0 = 1; 
                            if (temperature_is_safe()) {
                                    if (++count >= 80) break;  // reboot after 8 seconds
                            } else {
                                    count = 0;
                            }
                    }
    The idea is to do this check inside the loop which polls USB, so during the delay we remain responsive to any USB auto-reboot request. This will also start the 8 second delay once the temperature has cooled to below panic level, so we don't immediately reboot the moment the chip cools to non-panic temperature.
    Ok here we go - not having much luck with using Panic Temperature at this point. Let me explain why.

    1. The panic ISR in tempmon.c is set up to shut down power:
    Code:
    void Panic_Temp_isr(void) {
      __disable_irq();
      IOMUXC_GPR_GPR16 = 0x00000007;
      SNVS_LPCR |= SNVS_LPCR_TOP; //Switch off now
      asm volatile ("dsb":::"memory");
      while (1) asm ("wfi");
    }
    Not a big deal can just comment that out for now.

    2. for testing I set
    Code:
    static uint32_t highAlarmTemp   = 50U;
    static uint32_t lowAlarmTemp    = 25U;
    static uint32_t panicAlarmTemp  = 55U;
    and held the T4.1 near a light to get it to trip; again, not a big deal.

    3. I have temperature_is_safe() set up like this:
    Code:
    FLASHMEM
    bool temperature_is_safe() {
    	if (CCM_ANALOG_MISC1 & CCM_ANALOG_MISC1_IRQ_TEMPPANIC) {  // IRQ_TEMPPANIC is a bit mask
    		TEMPMON_TEMPSENSE0 &= ~0x2U;   // stops temp monitoring
    	}
    	if (tempmonGetTemp() > 80) {
    		delayMicroseconds(100);
    		return false;
    	} else {
    		TEMPMON_TEMPSENSE0 |= 0x2U;   // starts temp monitoring
    		return true;
    	}
    }
    and
    4. in the isr:
    Code:
    	while (1) {
    		if (PIT_TFLG0) {
    			//GPIO7_DR_TOGGLE = (1 << 3); // blink LED
    			PIT_TFLG0 = 1;
    			if (temperature_is_safe()) {
    					if (++count >= 80) break;  // reboot after 8 seconds
    			} else {
    					count = 0;
    			}
    		}
    Now what is happening is that the panic temperature keeps triggering a restart, but the crash report / unused_interrupt_vector() code never gets executed, so temperature_is_safe() never gets executed either.
