Freezing using Serial during a UART ISR on a Teensy LC

Status
Not open for further replies.

shawn

Well-known member
I'm doing my own UART processing and noticed that when doing Serial output during the ISR, the Teensy LC freezes. I'm trying to figure out why this is happening. I've created a contrived example. To use it, put serial data having 250kbaud, 8N2 onto RX3 (pin 7) (this is DMX data).

The code:
Code:
// Tests Serial output in the UART handling ISR.
// Trying to diagnose a freeze on a Teensy LC.

// Data timeout, in milliseconds.
constexpr int kTimeout = 1000;

// UART ISR declaration.
void uart_rx_isr();

// UART-specific variables
auto irq = IRQ_UART2_STATUS;
auto port = &KINETISK_UART2;
auto &hwSerial = Serial3;

volatile bool latch = false;     // Indicates that there's data
int count = 0;                   // The byte count
elapsedMillis lastDataTimer{0};  // For detecting data timeouts

void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 4000) {
    // Wait for Serial initialization
  }
  Serial.println("Starting.");

  // Configure the LED and turn it off
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWriteFast(LED_BUILTIN, LOW);

  // Start the serial monitoring
  hwSerial.begin(250000, SERIAL_8N2);
  port->C2 = UART_C2_RE | UART_C2_RIE;  // Receive only
  attachInterruptVector(irq, &uart_rx_isr);
}

void loop() {
  bool flag;

  // Read and set `latch` atomically
  NVIC_DISABLE_IRQ(irq);
  flag = latch;
  latch = false;
  NVIC_ENABLE_IRQ(irq);

  if (flag) {
    digitalWriteFast(LED_BUILTIN, HIGH);
    lastDataTimer = 0;
  }

  if (lastDataTimer >= kTimeout) {
    digitalWriteFast(LED_BUILTIN, LOW);
  }
}

// Receives and processes one byte.
void receiveByte(uint8_t b) {
  latch = true;
  Serial.printf("%d: hello\n", ++count);
}

// The UART ISR.
void uart_rx_isr() {
  if ((port->S1 & UART_S1_RDRF) != 0) {
    receiveByte(port->D);
  }
}

After running for a while, the program freezes after about 1500-2500 bytes. I'm trying to figure out the technical reason for this. What am I doing wrong or assuming? Do I need to flush the USB, for example, every so often? Etc.
 
Last edited:
Here's a program that will output the correct data onto TX1. It's a good idea to use a different Teensy than the Teensy LC under test because that better tests the real-world conditions.

Code:
// Basic program for sending DMX over Serial1.

#include <TeensyDMX.h>

namespace teensydmx = ::qindesign::teensydmx;

// Create the DMX sender on Serial1.
teensydmx::Sender dmxTx{Serial1};

void setup() {
  // Turn on the LED, for indicating activity
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWriteFast(LED_BUILTIN, HIGH);

  dmxTx.begin();
}

void loop() {
  // Do something, maybe alter channel values.
}
 
USB Serial should take care of itself as far as flush etc when not put in an odd way.

Not best to have this called during the _isr():
Code:
Serial.printf("%d: hello\n", count++);

Make this change :: volatile int count = 0; // The byte count

receiveByte() just does :: count++;

And in loop copy Count to local, compare local to last Count, and print when different.

Code:
volatile int count = 0;                   // The byte count
// Receives and processes one byte.
void receiveByte(uint8_t b) {
  latch = true;
  count++;
}
//...
loop(){
//...
  static int lCount=0;
  int thisCount = count;
  if ( thisCount != lCount ) {
    lCount=thisCount;
    Serial.printf("%d: hello\n", thisCount);
  }
//…
}

loop() is possibly running some 10,000's per second - not sure of the effect from this :: NVIC_DISABLE_IRQ()
 
Thank you for the response. However, I'm not following why you think the Teensy LC would freeze with this program?

To address the points in your response:
  • I don't believe `count` needs to be volatile because it's only being accessed from the interrupt, and only the UART2 interrupt at that, so any compiler optimizations shouldn't be a problem. `count` is not being accessed across the ISR-main program boundary.
  • The point of this program is to show that performing USB Serial output during the ISR freezes the program, and I'm not sure why. Yes, the printing could be done in loop(), but that's not the purpose of this question. It's a contrived example meant to show that the Teensy LC freezes, and I'm trying to figure out the reasons for this. I'm not trying to optimize or improve the program, but if I were, I would agree with your suggestion. The purpose isn't to print the count, it's to diagnose the freeze; the count is merely there for seeing where the freeze happens.
  • Disabling and re-enabling the IRQ via those `NVIC` calls ensures that the latch is read and reset atomically with respect to the ISR's use of the same variable.

Update: Running this on a Teensy 3.2 doesn't freeze. And to re-emphasize: I'm not trying to improve the example program; doing so would hide the issue. Why does printing using the USB Serial from the ISR freeze on a Teensy LC but not on a Teensy 3.2?
 
Last edited:
Thank you for the response. However, I'm not following why you think the Teensy LC would freeze with this program?

To address the points in your response:
  • I don't believe `count` needs to be volatile because it's only being accessed from the interrupt, and only the UART2 interrupt at that, so any compiler optimizations shouldn't be a problem. `count` is not being accessed across the ISR-main program boundary.
  • The point of this program is to show that performing USB Serial output during the ISR freezes the program, and I'm not sure why. Yes, the printing could be done in loop(), but that's not the purpose of this question. It's a contrived example meant to show that the Teensy LC freezes, and I'm trying to figure out the reasons for this. I'm not trying to optimize or improve the program, but if I were, I would agree with your suggestion. The purpose isn't to print the count, it's to diagnose the freeze; the count is merely there for seeing where the freeze happens.
  • Disabling and re-enabling the IRQ via those `NVIC` calls ensures that the latch is read and reset atomically with respect to the ISR's use of the same variable.

given notes in prior post #3 - the count is made volatile to allow safer use in loop()

and it is known that things like Serial.print() or other complex calls from an _isr() are bad form at best - never a good idea - and expected to cause trouble in some situations - that was the reason for the added code block - as the solution is not to put complex calls in _isr() routines.

And the note about the "Disabling and re-enabling the IRQ via those `NVIC`" was just to say it is being called very often - which may interfere with normal operation depending on the side effects of that when occurring at that high polling rate.
 
I’m still unclear what’s causing it to freeze on an LC but not on a 3.2. Your points do make sense, but they don’t address the main question.

The "real program" doesn’t actually Serial print this often. It does not print per byte, only like every 500 or something, and only once in a while. And even then, just a string with no count. I was just trying to create a representative program that illustrates the issue. (The advantage of the test program is that it freezes much earlier so it’s easy to demonstrate the issue. :))

I’d still like to Serial print from the UART ISR, but would like to know what to lock/avoid/do to guarantee that the call is safe and won’t freeze the program.
 
Last edited:
While the T_LC M0+ at 48 MHz is an impressive processor - it is certainly a lesser sibling of the T_3.2 M4 at 96 MHz. Catching some boundary condition or conflict for something known as 'not to be done' isn't surprising.

If it works when no Serial.print() in the _isr() code path - then all is well. And with loop() speed the net output should be the same with no misses at the given baud rate - though 250Kb maybe be pushing it - and any misses will be apparent. I haven't counted loops() /second on the T_LC in some time, but anything over 25K/sec should be counted.

Printing in an _isr() is expected to be problematic and to be avoided for general use.
 
First, I'm not using `loop()` in my main program. It's not relevant here and unrelated to 250kbaud, or any baud rate, for that matter. loop()/second isn't related to my question. In addition, loop() processing is not handling the incoming bytes. You may be referring to your suggested "fix"? If so, that's not related to my question either. I'm not trying to "fix" my contrived example program.

Second, I do understand that calling complex things in an ISR is not recommended.

I'm simply asking, "What is freezing?" and "How to avoid this freeze, if possible?"
Now, if someone knows 100% that "Calling Serial print from the UART ISR" will freeze the program, my question still stands: "What is freezing?"

For example, if you try accessing one of the UART registers before enabling the associated peripheral, then the system will freeze because that's how these ARM chips behave (I learned this from another question I asked a while ago). Enabling the peripheral is the solution here, and not enabling it is the cause.
 
Last edited:
Sometimes it is hard to know exactly what you are hitting...

Sometimes it is something like:
The Serial.print will not return until the last byte of the output has been at a minimum put into an output buffer that will be output to the device, in your case the USB. Now Suppose your outputs have exhausted all of the USB buffers that can be allocated to it, and as such spins waiting for a USB buffer to be available. Well suppose that this can only be accomplished by some low level ISR being processed for a USB event, and suppose that the priority of THAT interrupt is not high enough (low enough value) to interrupt your ISR that you are currently processing. Well you are now hung...

Again I don't know if that is your case or not. Some of the subsystems have code in them that try to detect this and either try to change the priority of something or handle the event itself. Example if you look at how the Serial1.write()... stuff is handled it funnels down to serial_putchar, which tries to detect that the ISR for the UART will not be called and does the code within putchar to take characters off of the output queue and output them to the serial port. But again I have not looked in awhile to see if USB does this.

Again as @defragster mentioned it is best to simply avoid the issue. By either not doing very much Serial output or check first before you do output, by calling something like Serial.availableForWrite() to see about how much room you have left.
 
I was thinking that something like priority inversion might be happening. Good point about what `serial_putchar` does. I'll go explore that...
Also great point about checking `Serial.availableForWrite()`. I'll explore that too.

I'm not disagreeing with either of you that avoiding Serial printing inside the UART ISR is preferred. Per your comment that it's best to "not do very much Serial output", though, that's exactly what I'm not doing: I'm really not doing much serial output at all. 20 bytes here and there, much smaller than a full USB packet.

To emphasize: I'm not necessarily trying to fix everything here. I'm trying to get away with minimal serial output, but also understand the issue so that if there is a fixable cause, I can get away with sending some bytes from time to time, less than a full USB packet. And if it turns out that some race or lockup conditions will happen, then I'd like to know that that exists too.

At the very least, I want to understand why lockup is happening at all. I'll consider my question answered for myself if I can learn the why, even if I can't fix it. I'm an academic at heart and want to understand things even if I can't fix them. Thank you both for the responses.
 
Also in cases like this I try to look to see if maybe there is some other issue that might happen.

For example suppose your ISR is called for some state that you are not handling. Example not sure what happens if you get an overflow... or framing error or... And if you don't process it, and return, which then causes your ISR to fire again...

Been there... Often times I personally try to detect these and use unused IO pins that get set/clear or toggle in different situations and hook up logic Analyzer...

For example is it hanging inside of your ISR or is it looping on your ISR? or ...
 
Experimenting with `Serial.availableForWrite()`, the program can run much longer and the output looks like it appears much faster. The program still freezes, but after many more bytes (most of the time). Here's the modification:
Code:
// Receives and processes one byte.
void receiveByte(uint8_t b) {
  latch = true;
  if (Serial.availableForWrite() >= 20) {
    Serial.printf("%d: hello\n", ++count);
  }
}

Still experimenting...
 
Also in cases like this I try to look to see if maybe there is some other issue that might happen.

For example suppose your ISR is called for some state that you are not handling. Example not sure what happens if you get an overflow... or framing error or... And if you don't process it, and return, which then causes your ISR to fire again...

Been there... Often times I personally try to detect these and use unused IO pins that get set/clear or toggle in different situations and hook up logic Analyzer...

For example is it hanging inside of your ISR or is it looping on your ISR? or ...

I'm verifying that the ISR no longer gets called when the freeze happens by toggling the LED in that function instead of in `loop()`:
Code:
const bool isUART0 = (&hwSerial == &Serial1);

// The UART ISR.
void uart_rx_isr() {
  digitalWriteFast(LED_BUILTIN, HIGH);
  if ((port->S1 & UART_S1_RDRF) != 0) {
    receiveByte(port->D);
  } else {
    if (isUART0) {
      port->S1 = 0;
    } else {
      port->D;
    }
  }
  digitalWriteFast(LED_BUILTIN, LOW);
}

Also note that all other flags are cleared in that `else`.

Here's what happens when it appears to freeze:
  1. The program starts and eventually, the Serial output stops.
  2. The ISR light is still flashing, so I know that's getting continuously called.
  3. I unplug and plug in the Serial3 signal a few times and note that when I unplug, the LED turns off, and when I plug it back in, the LED starts flashing again.
  4. I then leave the signal plugged in, and eventually, the LED turns off, indicating that the ISR is no longer being called.
  5. I unplug and re-plug the signal to verify that that doesn't trigger the ISR again. The LED stays off.
  6. I verify with a logic analyzer that the signal actually still has signal on it.

I've also changed `loop()` to use pin 2 instead of the LED pin for monitoring "data timeout". It stays off when I see the "freeze". This verifies that the ISR is no longer getting called.

So I guess my follow-on question is: Why does the ISR stop getting called?

Update: I added toggling the LED in `loop()` and verified that it still toggles after the "freeze". So, from my "big" program's perspective, it appears to freeze because I'm doing strictly event-based stuff without using `loop()`. It looks like "Why does the ISR stop getting called?" is the real question here. Thanks for engaging.

Maybe @defragster is right and something is fishy when I use the NVIC enable/disable IRQ feature. I'll experiment more with just `__disable_irq()` and `__enable_irq()` instead.
 
Last edited:
Experimentation update:
  1. I've tried other ways to atomically check and reset `latch`; ISR still stops receiving: __disable_irq()/__enable_irq(), ATOMIC_BLOCK, and nothing (non-atomic). This tells me that the atomicity (i.e. disabling and re-enabling any interrupts) has no effect on the result.
  2. I'm now using an additional pin that goes high if `errno` is ever non-zero (in `loop()`). The pin never goes high.
  3. Upping the check for `Serial.availableForWrite()` to be >= 64 before printing. No luck. Well, I've seen the ISR stop receiving once, but haven't seen it stop since. USB output has definitely stopped no matter what value I compare with (a value <= 64 because > 64 is never true).
 
Last edited:
just FYI:
> USB will hold partial buffers until filled, or a time elapses - after timeout a partial buffer will be sent on a timer. Actual mechanics in code - "TRANSMIT_FLUSH_TIMEOUT 5 /* in milliseconds */" ?? - just saw it noted over time.
> loop() is called used or not - and the (weak) function yield() is called with each iteration of loop() - and yield pings ALL serial's on the device for :: if (Serial[usb,1,2,3,...].available()) serialEvent();
- not sure if a local sketch void yield(){} could change anything if related.

Interesting loop() still runs - maybe record a time in the _isr() and when not called in some time watched in loop() - do some manual checks for the error condition or whatever might stall the UART or whatever leads to the _isr() at hand being triggered.
 
Interesting loop() still runs - maybe record a time in the _isr() and when not called in some time watched in loop() - do some manual checks for the error condition or whatever might stall the UART or whatever leads to the _isr() at hand being triggered.

This is exactly my question. I have no idea what is stalling the UART. I know when it stalls, but not why. So: What is stalling the UART? Your statement that I "check for the error condition or whatever might stall the UART" implies that I know this reason, that's why I'm still confused. :)
Edit: Thinking about it some more, you've given me the idea to look at the UART registers in `loop()` ("whatever leads to the _isr() at hand being triggered"), unless that's what you meant in the first place by this statement. Time to measure and toggle even more pins...

`loop()` is still running, which, if my assumption is correct, means that the system is not inside an ISR. (Is this true?)
If that's true, then the UART ISR isn't getting hung. I know it's not getting called, however, because the LED never turns on. (Note that I toggle the LED at the start and at the end of the UART ISR.)

But on the other hand, if the UART ISR isn't getting called anymore, then it feels like it's halting inside that ISR, perhaps inside `Serial.print` somewhere. This would mean that `loop()` shouldn't be getting called. (Unless my assumption that being inside an ISR means that `loop()` doesn't get called is incorrect.)

I also tried overriding `yield()`, as you suggested, but same effect: the UART ISR stops getting called.

My system:
Arduino: Teenyduino 1.49-beta4 build for Catalina
OS: macOS Mojave 10.14.6
USB: Amazon Basics powered USB 3.0 hub
Teensy: Teensy LC
 
Last edited:
Some more information:
In `loop()`, I toggle a monitor pin depending on the state of the UART's RDRF flag. I see pulses in normal operation, but when the ISR is no longer getting called, I don't see the pin toggle at all. This means to me that RDRF is no longer triggering the ISR.

The state of affairs so far:
  1. The Serial output stops eventually no matter what.
  2. Doing a check for `Serial.availableForWrite() >= 64` before Serial print keeps the UART ISR running, but Serial output eventually stops. (The UART monitor pin watching RDRF is still toggling.)
  3. Doing a check for anything less than 64 available-for-write bytes causes the UART ISR to eventually stop receiving events. (The UART monitor pin watching RDRF stops toggling.)

Update:
I'm toggling a pin to HIGH if the following bits ever go low (from `loop()`). The pin always stays LOW, so I know these stay on (and I've verified that `loop()` is still running):
  1. SIM_SCGC4_UART2 (peripheral being on)
  2. UART_C2_RE (receive enable)
  3. UART_C2_RIE (receive data interrupt enable)

Any more I'm not thinking of? These were tested both with and without a redefinition of `yield()`.
 
Last edited:
Have you tried a different compiler optimization? Smallest is default for LC, what happens when you try Fastest? Compilers sometimes may optimize things out, if there are code issues
 
Thank you for the suggestion. Here's the optimization results:
  1. Smallest: ISR stops, Serial output stops
  2. Debug: ISR stops, Serial output stops
  3. Fastest: Serial output stops, ISR does not stop (didn't wait longer than 2 minutes after Serial output stopped)
  4. Faster: Serial output stops, ISR does not stop (didn't wait longer than 2 minutes after Serial output stopped)
  5. Fast: Serial output stops, ISR does not stop (didn't wait longer than 2 minutes after Serial output stopped)

Another interesting thing to note is that Serial output stops at wildly different times, not just between optimization options, but between runs. Maybe that's due to the intricacies of USB and its connections to the computer or something.

I'd love to understand what's going on here, even just on a technical/academic level. Why does the Serial output stop, and why do ISR events eventually stop under certain build options? Thanks, all, for exploring this with me.
 
Last edited:
Calling USB Serial print functions from interrupts often works, but sometimes leads to really tough race conditions, especially if you also use Serial print stuff from your main program or any interrupt running at a different priority level.
 
Ah, I see. Do you think priority levels are the main issue causing any races? Maybe I can pursue changing the UART priority (or change the USB priority, whichever is lower to whichever is higher). Is it even a feasible question to ask which state values may be racing?

Looks like UART is 64 and USB is 112? What if I temporarily change USB to 64? Thinking aloud... Is this even worth playing around with?

Update:
I tried this with no luck, as a quickie test:
Code:
NVIC_SET_PRIORITY(IRQ_USBOTG, 64);
if (Serial.availableForWrite() >= 20) {
  Serial.printf("%d: hello\n", count);
}
NVIC_SET_PRIORITY(IRQ_USBOTG, 112);

I may just go ahead and implement some sort of mailbox scheme communicating the intent to print something, to another "process" (eg. some check in `loop()`, for now).
 
Unless there is loop(){while(1);} the loop() will cycle, may as well use it, especially since it keeps running.

Even if this conflict was found and resolved there may be another down the road - given the known propensity for this to cause trouble so best not to invite it.

tonton81 made a very nice and flexible CircularBuffer addition to queue messages at high rates with low overhead. Using that or any other way to queue data safely for output would be better for reliable operation.
 
Looks like UART is 64 and USB is 112? What if I temporarily change USB to 64?

Teensy LC implements only 4 distinct interrupt priority levels. So levels 0 to 63 are all the same (highest priority), 64 to 127 are all identical, as are 128-191 and 192-255.

I'm not necessarily saying I believe all your troubles are related to interrupt priorities. Generally my advice is to avoid use of interrupts, and when you must use them, keep the code simple and small and transfer data between interrupts and main program context using well established patterns like a circular buffer. I'm also generally opposed to the Arduino API offering callbacks or other ways to run user code from interrupts, because using them without tough race conditions is so difficult. That's why we don't have a serialEvent callback from ISR context. It's a seductive idea that seems like you'll get good performance, but it almost always turns out to be a troubleshooting nightmare! That's why I generally recommend not running this sort of code from interrupts.

But if you do experiment with interrupt priorities, keep this hardware limitation in mind. Cortex M0+ implements only 4 distinct priority levels. Hopefully that can save you from wasting time trying different combinations which aren't actually any different.

Also keep in mind the Cortex M0+ implementation is *NOT* designed to allow changing interrupt priorities while interrupts are enabled and in use. M3 and higher support that (even though it's probably also a support nightmare), but M0 does not. ARM specifically says M0 can't do this. What exactly goes wrong, I can't say, since I never do that (even on the chips which can support it). On Teensy LC, you must set the priority only once before you use the interrupt and then do not change it at runtime!
 
Thanks for the reminders about interrupt priority divisions.

Experimentation aside, I'd still like to understand what must happen for an ISR to stop getting called just because there's a race condition somewhere. The following are still unanswered questions for me:
  1. What state is getting corrupt?
  2. What would cause the UART ISR to stop getting called? Why does RDRF stop getting set?
  3. What's getting corrupted such that Serial.print no longer can send USB data?

I'd like people to hear these questions as "I'm trying to understand the system better" and from an academic level for deeper understanding. I don't want people to hear these questions as something that needs a fix or to propose alternatives; that's not why I'm exploring these questions. I'm already fixing my software to approach sending text output differently, so that's besides the point. I just want to understand the system more deeply, and if a fix or approach comes out of the answers, then great. If not, that's fine too.

And yes I know that "Serial printing from an ISR is a bad idea". "Don't do this" doesn't answer my questions. I still would like to know why it's a bad idea. So far, the only relevant and best answer I've seen has been "race condition". That sheds a little light on where I could look for understanding.

My original question tried to demonstrate the problem and ask why it happens. That's it. I'd like to stick with that.
 
There was a recent (some months back) condition resolved where UART serial would get in a state it didn't recover from when an error was detected. That was a true end of data error or something that could be triggered with pulses to the UART pins - that was hard enough to discover. Don't know if there was any notes left there that may be relevant for UART state verify, reset or clearing.

In this case the race is possibly something needing done gets put off and then the interrupt or indication is lost for follow through and needs the skipped ping to get back into service. Though it seems if that was the case some pending/indication flag would be showing.

If the repro was extended to the T_3.2 Serial3 - perhaps at slower speed of 24 or 48 MHz it would indicate a true solvable problem tied to software. Or related to the included TeensyDuino software implementation of UART Serial. If not then it may be something unique to the T_LC MCU handling and UART setup.
 
Status
Not open for further replies.
Back
Top