Serial1.end() crash in TD 1.46 but not in 1.42

PaulN82

Member
I just upgraded straight from TD 1.42 to 1.46 on Arduino IDE 1.8.9. Uploading code that was unchanged since 08/18 using this new version caused my Teensy 3.2 to hang. After considerable commenting-out and re-uploading, I found the hang was caused entirely by Serial1.end(). I use this to allow the baud rate to be changed programmatically. To summarise some simple tests:

Calling Serial1.end() before Serial1.begin(xxx) has ever been called causes the Teensy 3.2 to hang.
Calling Serial1.end() after Serial1.begin(xxx) has been called does not cause a hang BUT causes the first packet of data read out of Serial1 to be garbled.

These issues never occurred with TD 1.42. Question is, what additional checks do I need to put in my code to prevent these new problems? Presumably a bug in Serial1 was fixed between versions, and I was just getting away with doing this.

Do I even need to call Serial1.end() if I am only changing the baud rate on-the-fly? Can I just call Serial1.begin(xxx) again?

Thanks!
 
Given: Any change in baud rate on an active line risks losing data at the old rate - or not finding data until the new right rate is set. It can also trigger UART Hardware error detection based on the data stream.

If you have a reasonably short sketch that shows the problem, posting that would be helpful, though it would take something on the other end sending the data.

On a Teensy, if you set up your code to use Serial1 as desired, perhaps cross-wire its Tx and Rx to the Serial2 pins and put the other end of the test code there. Uploading that once it shows the failure state on a single Teensy minimizes the overhead and assures it reproduces.

As far as .begin() - trying it would be the simple answer.

There was a similar situation some months back where differing baud rates caused the Serial UART hardware to detect an error. It was resolved, and the change for that fix may be what is triggering in this situation. I can't say for sure now if that came in 1.45 or 1.46. Scanning the release notes for those releases might help get back to that situation:

It may be this from github : "serial1.c Fix serial end() when errors are received but final interrupt never runs 7 months ago" :: PaulStoffregen committed on Oct 18, 2018

That will give the timeframe for the version of TeensyDuino to test before and after to confirm if that change to .end() behavior is behind this.

However in any case getting it resolved will take a working sketch that exhibits the issue at hand.
 
As @defragster mentioned, it is hard to know exactly what is going on without seeing some example...

a) calling Serial1.end() - without ever calling any other Serial1.begin() function should just return without doing anything as the begin function enables the system clock (memory access) to the registers. And Serial1.end() checks for this and returns if that has not been set.

b) calling Serial1.end() after Serial1.begin()... The change to the end function was the addition of reads of the status register AND the data register, to clear some error conditions:
Code:
	UART0_S1;
	UART0_D; // clear leftover error status
So it would eat a character if one is found in the data register... But if you are changing baud rates, then not sure what to expect if new data came in before you actually did the change...
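To make the "eat a character" point concrete, here is a small host-side toy model. It is only a sketch under stated assumptions, not the Teensy core code and not a faithful description of the UART hardware: `ToyUart`, `readStatus`, `readData` and `clearLeftoverError` are hypothetical names, and the model assumes simply that a status read followed by a data read clears a sticky error flag and removes one waiting byte.

```cpp
#include <cstdint>
#include <deque>

// Toy stand-in for a UART receiver. Hypothetical model for illustration only;
// it is NOT the Teensy core and does not claim to match real hardware exactly.
struct ToyUart {
  std::deque<uint8_t> rxFifo;   // bytes received but not yet read
  bool framingError = false;    // sticky error flag
  bool statusWasRead = false;   // set by readStatus(), consumed by readData()

  // Stands in for reading the status register (UART0_S1 in the thread).
  bool readStatus() {
    statusWasRead = true;
    return framingError;
  }

  // Stands in for reading the data register (UART0_D in the thread).
  // Returns the oldest waiting byte, or -1 if nothing is waiting.
  int readData() {
    if (statusWasRead) {        // status-then-data clears the sticky flag
      framingError = false;
      statusWasRead = false;
    }
    if (rxFifo.empty()) return -1;
    uint8_t b = rxFifo.front();
    rxFifo.pop_front();         // the side effect: one byte is consumed
    return b;
  }
};

// The two-step read quoted above, applied to the toy model.
inline void clearLeftoverError(ToyUart &u) {
  u.readStatus();
  u.readData();
}
```

In this model, if two bytes are waiting when `clearLeftoverError()` runs, the error flag is cleared and only the second byte remains afterwards, which is the kind of loss described above.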
 

@KurtE - I of course sampled and tested that thread's problem, though it was so long ago now :) … it was in fact the case that the user changing baud rates triggered the error. IIRC it was a framing error.

Not the thread yet but Paul posted this : "Yes, a very rare bug was discovered months ago. If a byte was received into the UART's FIFO with a framing error, but the UART had not yet generated an interrupt before end() was called, the error state would remain in the hardware. Then later, calling begin() would lock up."

Here is that thread. Not that it helps other than confirming the TeensyDuino version to pinpoint the change, but it was with a GPS connected, finding what the baud rate is … KurtE, you were there too ...

Serial-Problem
 
BTW - end of that 'Serial-Problem' thread with the fix Paul noted: The fix will be in Teensyduino 1.45 later this year. So 1.44 should work like 1.42 does and 1.45 will give you the problem you are suggesting - if not then it is something else.

Hi Frank_B :)
 
Minimal code to reproduce the bug. It's SPI + Serial1 combo that causes it!

Okay, I have found out specifically what causes it! It is only when SPI.begin() is called in combination with Serial1.end(). Does not matter what order. Here is the minimal code that reproduces the Teensy 3.2 hang, regardless of optimisation choice or CPU speed. It never gets as far as the main loop to flash the LED. Commenting out SPI.begin() makes it work again, as of course does commenting out Serial1.end(). So it wasn't Serial1 on its own, but when used with SPI (which my project in question does)! Code below:

Code:
#include <SPI.h>


void setup()
{
  pinMode(13, OUTPUT);


  SPI.begin();


  Serial1.end();
}


void loop()
{
  digitalWriteFast(13, HIGH);
  delay(500);
  digitalWriteFast(13, LOW);
  delay(500);
}

Thanks for pointing out the changes in Serial1 since 1.42. Now, the question is how did that end up being affected by SPI?
 
Good work with the sample @PaulN82 - hopefully somebody who can debug it will look soon.

If you get bored waiting doing an unZIP install of 1.44 to see it work and then 1.45 to see it fail would confirm if it is related to the Serial change - if it doesn't work like that then maybe SPI or something else changed.
 
With this code the problem has nothing to do with Serial1...

The simple issue with this code is:
SPI.begin - By default pins: 11, 12, 13 are initialized to be in SPI mode

I.e. pin 13 is not in digital output mode. So the digitalWrites to 13 won't work!
 
Ok, this is rather bizarre. Now I cannot reproduce the problem anymore, after being stuck with it for an entire day. KurtE is correct, but why on Earth did, at that time, commenting out Serial1.end() make it work? Now that has no effect and the problem is indeed writing to pin 13 in the loop. I just went back to my original project, uncommented Serial1.end(), and hey now it all works fine as before! Just to be sure, I have fully quit and restarted the IDE and Teensy loader, recompiled the code and re-uploaded and it continues to work fine. Unbelievable! A little concerning, I wonder what could have caused this? Something must have entered a bad state and affected the compiler output, and it just happened to involve Serial1. Any thoughts on what this could have been, so it doesn't happen again....

Fortunately I preserved the HEX file from the "bad" compile episode and have just now compared the two. There are 3 lines that are different out of 3776. Remember this is the same text code, same compile options! The lines that are different are 2812, 2817, 2818 (starting from 1):

"erroneous" HEX on left, good on right:

2812 :10AFB0004FF45041C4F2030148F2E822C5F6EF42D3 :10AFB0004FF45041C4F2030143F29862C5F6F042E7
2817 :10B00000C4F203040021196021604FF4504448F257 :10B00000C4F203040021196021604FF4504443F25C
2818 :10B01000E820C4F20304C5F6EF4020601020186059 :10B010009860C4F20304C5F6F04020601020186068

I don't know if you can get anything useful out of this but, there it is :D
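For anyone wanting to narrow down such a comparison, the records above are standard Intel HEX lines (length, address, type, data, checksum), so the differing data bytes can be located mechanically. The following is a minimal host-side sketch, not a tool from this thread; `hexVal`, `decodeRecord` and `diffData` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexVal(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decode one Intel HEX record (":LLAAAATT<data>CC") into its raw bytes.
// Returns an empty vector if the text is malformed or the checksum fails.
std::vector<uint8_t> decodeRecord(const std::string &line) {
  std::vector<uint8_t> bytes;
  if (line.size() < 11 || line[0] != ':' || line.size() % 2 == 0) return {};
  unsigned sum = 0;
  for (std::size_t i = 1; i + 1 < line.size(); i += 2) {
    int hi = hexVal(line[i]), lo = hexVal(line[i + 1]);
    if (hi < 0 || lo < 0) return {};
    bytes.push_back(static_cast<uint8_t>(hi * 16 + lo));
    sum += bytes.back();
  }
  if ((sum & 0xFF) != 0) return {};  // all bytes, checksum included, sum to 0 mod 256
  if (static_cast<std::size_t>(bytes[0]) != bytes.size() - 5) return {};  // length field
  return bytes;
}

// Offsets within the data field at which two valid, equal-length records differ.
std::vector<std::size_t> diffData(const std::string &a, const std::string &b) {
  std::vector<std::size_t> offsets;
  std::vector<uint8_t> ra = decodeRecord(a), rb = decodeRecord(b);
  if (ra.empty() || ra.size() != rb.size()) return offsets;
  // Skip length, two address bytes and type; stop before the checksum.
  for (std::size_t i = 4; i + 1 < ra.size(); ++i)
    if (ra[i] != rb[i]) offsets.push_back(i - 4);
  return offsets;
}
```

Calling `diffData()` on the three pairs posted above reports data offsets 8, 10, 11 and 14 for line 2812, offset 14 for line 2817, and offsets 0, 1 and 8 for line 2818; every checksum validates. The differing bytes are few and clustered, which would be consistent with a constant fixed at build time (a build timestamp is one possibility), though that is only a guess and nothing in the thread confirms it.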
 
It's not quite over yet. Although the Teensy no longer hangs at Serial1.end(), I do not receive any data over Serial1 upon request. If I comment out Serial1.end(), communications are back again. Defragster, going back to your first comment, I will try to minimally emulate this on one Teensy using a crossed link between Serial1 and Serial2.
 
Update: Cannot reproduce problem on minimal equivalent code. Suggests a more complex interaction with the rest of my code.

I can however demonstrate that the cause after updating from 1.42 to 1.46 is indeed those two new lines of code, the reads of UART0_S1 and UART0_D added in Oct '18 in serial1.c. If I comment those two lines out, restart the IDE, re-compile and upload, all is good as before. If I revert and repeat, I get no serial data back on request. Going one step further, I can confirm that only the UART0_D line is causing the problem, not UART0_S1. Any thoughts on why this might be? How could reading that data register potentially cause a problem there? I do wonder if anyone else is being affected by this, particularly those starting new on 1.44+ and not realising what's changed? I could seamlessly change baud rate before, but now with this update I can't (I get an erroneous transmission the first time, then it is ok). For now I will just not call Serial1.end() (and looking more closely at it, I didn't really need it to change baud rate anyway!).
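The workaround described above (changing the baud rate without calling end()) could look roughly like the sketch below. It is an illustration under assumptions, not code from the project in question: `changeBaud` is a hypothetical helper, written as a template so it accepts any port offering the usual Arduino `flush()`, `begin()`, `available()` and `read()` calls, of which Serial1 is one.

```cpp
#include <stdint.h>

// Hypothetical helper: switch a serial port to a new baud rate in place,
// without calling end(). Returns how many stale bytes were discarded.
template <typename Port>
int changeBaud(Port &port, uint32_t newBaud) {
  port.flush();                   // let pending transmit data leave at the old rate
  port.begin(newBaud);            // reconfigure the port; no end() call
  int discarded = 0;
  while (port.available() > 0) {  // drop anything still buffered, since it may
    port.read();                  // have been received at the old rate
    ++discarded;
  }
  return discarded;
}
```

On a Teensy this would be called as `changeBaud(Serial1, 115200);`. Whether any bytes are left to discard after `begin()` depends on the core version, so the drain loop is a precaution rather than a requirement.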

Another possible clue: Frank B noted that serial_end should return if Serial1.begin has not been called before, so what else in my code could be altering the values in "if (!(SIM_SCGC4 & SIM_SCGC4_UART0)) return"? Do any other standard libraries change either of those? I'm only using EEPROM and SPI, and one of my classes includes Arduino.h.
 