Weird program restart, and then Serial doesn't output until button or power cycle

Status
Not open for further replies.

shawn

I'd love some help diagnosing this weird issue. It's one of those cases where commenting out code that should have nothing to do with the error makes the code work, but uncommenting it makes the program restart somehow, after which `Serial` can no longer send data. Maybe it's something dumb I'm doing? Maybe it's in my interrupt code in the library? I don't know, but it's strange, and I want to see if others (who are willing) can duplicate the issue.

This is another case where `Serial` output stops working (input still works) while the program keeps running. It may or may not be related to this earlier thread, where `Serial` could also no longer output but the program kept working:
https://forum.pjrc.com/threads/58908-Freezing-using-Serial-during-a-UART-ISR-on-a-Teensy-LC

Here, `Serial` is only being used outside any ISR code. It's only accessed from `setup()` and `loop()`.

Summary

This test program is the "main" example for the TeensyDMX library (attached as an INO file). It reads bytes from `Serial` to choose a sub-program that either listens to DMX (Flasher), sends DMX (Chaser), or does nothing (Null). When Flasher is running, it flashes the LED according to the DMX input. Sending 'f' to the program starts this mode and, if there's no DMX input, the LED will turn off. Sending 'c' starts the Chaser mode and the LED will flash once a second. When the program starts, the LED is on because the first mode is the Null program.

Thus, the LED shows which of the three modes is running: on for Null, off for Flasher, and brief flashes at 1 Hz for Chaser.

My system:
Catalina version of Teensyduino, v1.49-beta5
OS: macOS Mojave, v10.14.6

Steps to reproduce

  1. Install the TeensyDMX library (latest as of this writing is v4.0.0-beta).
  2. Run the attached program on a Teensy 3.2.
  3. The LED should be on because it starts in the Null mode.
  4. Send 'f' in the serial monitor and note that it switches to the Flasher mode. (The restart also happens when sending a 'c' for Chaser.)
  5. The LED should turn off after a moment to indicate that it's not receiving data. (Or flash at 1Hz if Chaser is running.)
  6. After a short while, the LED will turn back on again, indicating that it returned to the Null mode.
  7. Notice that an additional random character sometimes appears in the serial monitor just before the program restarts. It doesn't always appear, however. Does its presence mean that the Teensy is sending some serial data? Perhaps it's one of the characters that `setup()` sends when it restarts?
  8. Also notice that sending additional characters to the program still changes the mode (the LED changes accordingly), but there's no `Serial` output. The missing output only happens about half the time for me; it seems random (racy?).

The kicker

The program doesn't seem to crash/restart if the `switch` statement on lines 170-187 is commented out. (Or if just the `case '?'` part, lines 80-86, is commented out.) I can't figure out why. Is there something weird about the produced program size? Is this one of those things where it looks like one cause but is really another? Did I miss something dumb that might actually be obvious in hindsight? Is it some weird interaction with the UART ISR functions (in the TeensyDMX library; there are no `Serial` calls in there)? I'm stumped. There's nothing in that `switch` statement that should cause a problem.

Maybe it's time to go through my UART ISR code again (in the TeensyDMX library)... but I just can't find an issue. I don't expect anyone to debug all that for me; I'd just love it if someone could reproduce the error and perhaps propose something I haven't thought of. That would prove it's not just my setup. Collective wisdom and all.
 

Attachments

  • TestRestartingMain.ino
    8 KB
I played a bit with it and saw the following (T-LC):

  • Switching between sketches always works if you have a switch to the null sketch in between. I.e. n -> f -> n -> c -> n -> ....
  • Switching f->c or c->f always crashes.
I added a few debug messages and found that, whenever it crashes, the destructor of the old sketch is not called in `changeSketch()`. So something seems to be wrong with your `unique_ptr<Sketch>`. To test this, I replaced the `unique_ptr<Sketch>` with a traditional pointer and adjusted the code in `changeSketch()` accordingly, which fixes the issue (or at least the symptom).

It could be that the implementation of `unique_ptr<>` is somehow broken (I doubt that), but it could also be a subtle memory issue, buffer overrun, etc. in your code or the TeensyDMX code that leads to the strange behavior.


Code:
  // Create a new sketch. currSketch is now a plain `Sketch *`
  // instead of a std::unique_ptr<Sketch>.
  switch (sketchType) {
    case 'c':
      delete currSketch;  // destroy the old sketch first
      currSketch = new Chaser();
      break;
    case 'f':
      delete currSketch;
      currSketch = new Flasher();
      break;
    case 'n':
      delete currSketch;
      currSketch = new NullSketch();
      break;
  }

Hope that helps with chasing it down.
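A side observation on the two versions, offered as a hedged sketch rather than a diagnosis: with `std::unique_ptr`, an assignment like `curr = std::make_unique<Chaser>()` constructs the new object before the old one is destroyed, whereas the raw-pointer version above deletes the old sketch first. The desktop-only example below (plain C++, no Teensy APIs; the `events` log and the stripped-down classes are illustrative) records the order in each case.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records construction/destruction events so their order can be inspected.
std::vector<std::string> events;

struct Sketch {
  virtual ~Sketch() = default;
};

struct Chaser : Sketch {
  Chaser() { events.push_back("Chaser()"); }
  ~Chaser() override { events.push_back("~Chaser()"); }
};

struct Flasher : Sketch {
  Flasher() { events.push_back("Flasher()"); }
  ~Flasher() override { events.push_back("~Flasher()"); }
};

// unique_ptr version: make_unique builds the new Chaser first, then the
// move-assignment destroys the old Flasher.
void switchWithUniquePtr() {
  std::unique_ptr<Sketch> curr = std::make_unique<Flasher>();
  curr = std::make_unique<Chaser>();
}  // curr (the Chaser) is destroyed here

// Raw-pointer version, as in the reply: the old object is deleted
// before the new one is created.
void switchWithRawPointer() {
  Sketch *curr = new Flasher();
  delete curr;
  curr = new Chaser();
  delete curr;  // clean up at the end of the example
}
```

If a sketch's constructor and the previous sketch's destructor touch the same resource (say, the same UART), this ordering difference alone could make the two versions behave differently.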
 
Thanks for having a look.

I see the same crash even if I don't use `unique_ptr`. Also, once it crashes, `Serial` output can no longer be trusted until a power cycle or a program-button reset, even though the program itself can keep working (`Serial` input still works). So if you're using `Serial` to check, the destructor may in fact still be getting called. When it doesn't crash and I just switch between modes, I always see the destructors called in the correct order.

I also see the same crash when I pre-allocate all the `Sketch` objects instead of using `new`/`delete`.

I'm thinking the issue is very likely in my library. :(
But why does the program run normally once `Serial` output is "out of the way"? Something's very weird in the interaction between my library and `Serial` output (specifically output). I wonder if I'm doing anything incorrectly with setting up interrupts or something?
Note that there's no UART input so interrupts shouldn't be generated.
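If it would help to confirm that no UART interrupts are actually firing, one cheap approach is to count ISR entries and check the count from `loop()`. A minimal host-side sketch, assuming a hypothetical `onUartIsr()` hook called at the top of the library's real ISR; on the Teensy the counter would more typically be a `volatile uint32_t`, and `std::atomic` is used here only so the example is well-defined on a desktop compiler:

```cpp
#include <atomic>
#include <cstdint>

// Incremented from the (hypothetical) ISR hook and read from loop().
std::atomic<uint32_t> uartIsrCount{0};

// Hypothetical hook: call this at the very top of the real UART ISR.
void onUartIsr() {
  uartIsrCount.fetch_add(1, std::memory_order_relaxed);
}

// Returns how many ISR entries happened since the previous call,
// so loop() only needs to report when the number is nonzero.
uint32_t isrEntriesSinceLastCheck() {
  static uint32_t lastSeen = 0;
  uint32_t now = uartIsrCount.load(std::memory_order_relaxed);
  uint32_t delta = now - lastSeen;  // unsigned math handles wrap-around
  lastSeen = now;
  return delta;
}
```

Since `Serial` output is itself suspect here, a nonzero count could be shown on the LED instead of printed.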

So I'm basically seeing three issues:
  1. The program does a spontaneous reset. The reset does not occur if a seemingly unrelated `switch` statement is commented out. Race condition?
  2. After the reset, `Serial` can't be trusted without a power cycle or re-program. What sort of state is `Serial` getting into, and how am I putting it there??
  3. After the "crash", the program never crashes again (or at least I haven't seen it) if I switch back to the Flasher mode without power cycling or reprogramming the board. This leads me to believe that once `Serial` output is somehow out of the way, things run normally. I've seen this before in the thread I referenced in the first post, but this time I'm not accessing `Serial` from an ISR.

Le-sigh.
 
Update: I adjusted the program so that it only does `Serial` input; there are no `Serial` output calls. I still see the spontaneous crash/restart.
 
My gut feeling is that it has nothing to do with `Serial`. To me this looks like a buffer overrun or something similar: depending on which parts of the code are added or removed, the problem shows up or it doesn't. But again, this is only my gut feeling.
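One way to test the buffer-overrun theory is to bracket a suspect buffer with guard words holding a known pattern and check them periodically, for example from `loop()`. A minimal sketch; `GuardedBuffer`, its size, and the pattern are arbitrary illustrative choices, and this only catches writes that run directly off either end of this particular buffer:

```cpp
#include <cstddef>
#include <cstdint>

// A fixed-size buffer surrounded by guard words with a known pattern.
// Members are laid out in declaration order, so a write that runs past
// either end of `data` lands in a guard and changes it.
struct GuardedBuffer {
  static constexpr uint32_t kPattern = 0xDEADBEEF;
  static constexpr size_t kSize = 32;

  uint32_t frontGuard = kPattern;
  uint8_t data[kSize] = {};
  uint32_t backGuard = kPattern;

  // True while neither guard has been overwritten.
  bool guardsIntact() const {
    return frontGuard == kPattern && backGuard == kPattern;
  }
};
```

On the Teensy, one could check `guardsIntact()` in `loop()` and, on failure, turn the LED on and stop, which avoids relying on `Serial`.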
 
My process today is going to be bisecting through all the git commits to see where it fails. Thank heavens for git. You’re probably right that it’s not Serial, but I’d like to think it’s a clue. Maybe you're right about the buffer overrun.
 
Well, this issue exists all the way back to v2.0.0 of the library. Humbug. At least the code base is simpler for trying to debug this or find a cause. :)

Note that I've tried this on Teensyduino v1.48 and v1.49-beta5.
 
So here's what I've found so far. When I read from or check availability on `Serial`, the program crashes/restarts. When I use an `elapsedMillis` timer to change the mode instead, it does not crash. Second, after it has crashed/restarted once, the program never does it again (until a power cycle or upload), even if I provide the exact same input. I'm wondering what this means.

(A side note on my use of the word "crash": I'm not certain what's happening, but the program is definitely restarting, so I'm taking liberty to use the word "crash".)

I did two things. First, I replaced `Serial.available()` with a call to a function, `serialAvailable()`, that in turn calls `Serial.available()`, and did the same with `Serial.read()`. This program still crashes/restarts. Next, I changed those functions to use an `elapsedMillis` timer that returns 'f' after 1 second instead of reading `Serial`; this version does not crash. Here's the updated code:

Code:
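// Stand-in for Serial.available(): reports one available byte, once,
// about 1 second after the first call.
int serialAvailable() {
  // return Serial.available();
  static elapsedMillis timer{0};
  static bool latch = false;
  if (!latch) {
    if (timer >= 1000) {
      latch = true;
      return 1;
    }
  }
  return 0;
}

// Stand-in for Serial.read(): always returns 'f' (switch to Flasher).
int serialRead() {
  // return Serial.read();
  return 'f';
}

// Main program loop.
void loop() {
  // Potentially choose a new sketch
  int avail = serialAvailable();
  if (avail > 0) {
    int b = serialRead();
    // ... rest of loop() as before ...
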
So either what I'm doing in my library (or main program) is interacting badly somehow with `Serial`, or there's some other race condition happening somewhere that just makes it look like some interaction with `Serial` is messing up. Hrm. Double-humbug.
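The same substitution can also be written with the time source passed in as a parameter, which keeps the behavior the same on the Teensy (pass `millis()`) while making it easy to check off-target. A hedged sketch; `ScriptedInput` is a hypothetical name, and unlike the version above it latches when the byte is read rather than when availability is reported:

```cpp
#include <cstdint>

// Scripted stand-in for Serial input: after `delayMs` has elapsed it
// reports one available byte, returns `cmd` once, then stays empty.
class ScriptedInput {
 public:
  ScriptedInput(uint32_t startMs, uint32_t delayMs, int cmd)
      : startMs_(startMs), delayMs_(delayMs), cmd_(cmd) {}

  // Mirrors Serial.available(): 1 if the scripted byte is ready, else 0.
  int available(uint32_t nowMs) const {
    // Unsigned subtraction stays correct across a millis() wrap-around.
    return (!consumed_ && nowMs - startMs_ >= delayMs_) ? 1 : 0;
  }

  // Mirrors Serial.read(): the scripted byte once, otherwise -1.
  int read(uint32_t nowMs) {
    if (available(nowMs) == 0) {
      return -1;
    }
    consumed_ = true;
    return cmd_;
  }

 private:
  uint32_t startMs_;
  uint32_t delayMs_;
  int cmd_;
  bool consumed_ = false;
};
```

On the Teensy, `loop()` would call `input.available(millis())` and `input.read(millis())` in place of `Serial.available()` and `Serial.read()`.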

I know the library and code here are long and complex; I'm working on reducing this to a smaller test case. Thanks for your patience; I don't expect anyone to slog through all the code. I just hope it illustrates the problem. I'll get there. :)
 
Okay, I've found a much smaller program that reproduces the issue. There's no DMX, no external libraries, no buffers to overflow, and no dynamic memory allocation.

Run the program, notice the LED is on, then send it an 'f' character, and notice the LED turns off. After a little bit of time, the LED turns on again; this shows the problem. Why does the LED turn back on? It appears the program crashes/spontaneously restarts.

Also note that commenting out the `switch` statement in lines 92-99 (the one marked with a comment in `loop()`) makes the program not crash, or at least I haven't seen it crash when this is commented out.
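One way to probe whether the crash follows the `switch` statement itself or just the change in code size and layout is to keep the alias mapping but express it without a `switch`. A small sketch; `normalizeCommand()` is a hypothetical helper, and the compiler may well generate very similar machine code for both forms, so any difference (or lack of one) is only a hint:

```cpp
// Maps the uppercase aliases to their lowercase commands, like the
// `switch` in loop(), but written as plain comparisons.
int normalizeCommand(int b) {
  if (b == 'F') {
    return 'f';
  }
  if (b == 'N') {
    return 'n';
  }
  return b;  // everything else, including -1, passes through unchanged
}
```

In `loop()`, the `switch` would then become `b = normalizeCommand(b);`.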

Am I missing something obvious?

The code:
Code:
// Basic main program, for testing a crash/restart.
//
// This uses the concept of sub-sketches, where serial input selects which
// sub-sketch to run. The initial running sketch is the Null sketch; it does
// nothing. The other sketches are:
// 1. Flasher: Only turns the LED off in this program.

// Basic sub-sketch.
class Sketch {
 public:
  Sketch() = default;
  virtual ~Sketch() = default;

  virtual void setup() {}
  virtual void tearDown() {}
  virtual void loop() {}
  virtual String name() = 0;
};

// NullSketch supports the Null Pattern.
class NullSketch final : public Sketch {
 public:
  NullSketch() = default;
  ~NullSketch() override = default;

  String name() override {
    return "Null";
  }

  void setup() override {
    digitalWriteFast(LED_BUILTIN, HIGH);
  }
};

// Flasher: in the full program this flashes the LED according to the DMX
// input; in this reduced version it only turns the LED off.
class Flasher final : public Sketch {
 public:
  Flasher() = default;
  ~Flasher() override = default;

  String name() override {
    return "Flasher";
  }

  void setup() override {
    digitalWriteFast(LED_BUILTIN, LOW);
  }
};

// The current sketch
NullSketch nullSketch{};
Flasher flasher{};
Sketch *currSketch = &nullSketch;
int currSketchType = -1;

// ---------------------------------------------------------------------------
//  Main program
// ---------------------------------------------------------------------------

// Changes the current sketch, if different. This ignores any unknown
// sketch type.
void changeSketch(int sketchType);

// Main program setup.
void setup() {
  // Initialize the serial port
  Serial.begin(115200);
  while (!Serial && millis() < 4000) {
    // Wait for initialization to complete or a time limit
  }
  Serial.println("Starting main program.");

  // Set up any pins
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWriteFast(LED_BUILTIN, HIGH);

  // Initialize with the Null sketch
  changeSketch('n');

  Serial.println("Hello, DMX World!");
}

// Main program loop.
void loop() {
  // Potentially choose a new sketch
  int avail = Serial.available();
  if (avail > 0) {
    int b = Serial.read();

    // First, transform aliases
    // and perform any commands
    switch (b) {  // -->COMMENT THIS OUT AND IT DOESN'T CRASH<--
      case 'F':
        b = 'f';
        break;
      case 'N':
        b = 'n';
        break;
    }

    if (b >= 0) {
      changeSketch(b);
    }
  }

  currSketch->loop();
}

// ---------------------------------------------------------------------------
//  Support functions
// ---------------------------------------------------------------------------

void changeSketch(int sketchType) {
  if (sketchType == currSketchType) {
    return;
  }

  // Check for a valid sketch type
  switch (sketchType) {
    case 'f':
    case 'n':
      break;
    default:
      return;
  }

  // Destroy any current sketch
  currSketch->tearDown();

  // Create a new sketch
  switch (sketchType) {
    case 'f':
      currSketch = &flasher;
      break;
    case 'n':
      currSketch = &nullSketch;
      break;
  }
  currSketchType = sketchType;

  Serial.printf("Changing sketch to: %s\n", currSketch->name().c_str());
  currSketch->setup();
}
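Because this version has no hardware dependencies apart from the LED and `Serial`, its sketch-switching logic can be checked on a desktop. A hedged port for that purpose: Arduino's `String` is swapped for `std::string`, the LED and `Serial` calls are replaced by an illustrative `eventLog`, and `changeSketch()` picks the new sketch before tearing down the old one (same observable behavior as the original). It only exercises the state machine, so passing says nothing about the Teensy-specific behavior.

```cpp
#include <string>
#include <vector>

// Stands in for the LED writes and Serial output of the original.
std::vector<std::string> eventLog;

class Sketch {
 public:
  virtual ~Sketch() = default;
  virtual void setup() {}
  virtual void tearDown() {}
  virtual std::string name() = 0;
};

class NullSketch final : public Sketch {
 public:
  std::string name() override { return "Null"; }
  void setup() override { eventLog.push_back("LED on"); }
};

class Flasher final : public Sketch {
 public:
  std::string name() override { return "Flasher"; }
  void setup() override { eventLog.push_back("LED off"); }
};

NullSketch nullSketch;
Flasher flasher;
Sketch *currSketch = &nullSketch;
int currSketchType = -1;

// Changes the current sketch, if different; ignores unknown types.
void changeSketch(int sketchType) {
  if (sketchType == currSketchType) {
    return;
  }
  Sketch *next = nullptr;
  switch (sketchType) {
    case 'f': next = &flasher; break;
    case 'n': next = &nullSketch; break;
    default: return;
  }
  currSketch->tearDown();
  currSketch = next;
  currSketchType = sketchType;
  eventLog.push_back("Changing sketch to: " + currSketch->name());
  currSketch->setup();
}
```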
 
Found some time to try your sketch with a T3.2 and a T-LC. It works in both cases without problems: I don't observe any restarts, and it stays responsive 'forever'.
 
Thanks, @luni, for trying this out. I had to step away for a little while because it was frustrating. You're right that the small program doesn't restart on its own. However, if I first load the original `.ino` and let it restart, and then load the small program, the small program restarts too. From then on, it consistently restarts each time it's loaded from the IDE.

I've been trying different combinations of rebooting my computer and loading the two programs in different orders. When I look at this again, I'll try to come up with a specific sequence of steps where the problem occurs, but only after I test it on more than one machine; I can't yet rule out any role my computer plays here.

To be continued...
 