Serial Problem

Status
Not open for further replies.
setup() code needs to init the baud or the _FE test faults the processor because Serial1 is 'offline':
Yes, I've been using a flag to get round that problem:
Code:
void setBaud(int rate) {
  curr_baud = rate;
  Serial.println("*** Setting " + String(curr_baud) + " baud ***");
  Serial1.begin(curr_baud);
  [COLOR="#B22222"]serial1 = true;[/COLOR] // Set flag to enable reporting //
}

void checkUART() {
  if ([COLOR="#B22222"]serial1[/COLOR]) {
    if (UART0_S1 & UART_S1_FE) {
      Serial.println("UART0_S1: " + String(UART0_S1));
    }
  }
}

  ...
  Seial1.end();
  [COLOR="#B22222"]serial1 = false;[/COLOR] // Clear flag to prevent crash! //
  ...
We might want to end the serial connection to fix it before starting it up again (while leaving periodic reporting on).
 
Seemed odd/wrong not to have the Serial1 baud set in setup() - so I put it there. Code as posted must somehow set it up in the non-testing case - but I didn't try to follow.

I didn't put counters in but my updated loop() version above still running - going into and out of _FE state in batches - short sample below.

@microderm - for your use case - when a _FE is detected it suggests the need to change the baud rate. I didn't put that in my code but suspect since I'm holding it at 9600 baud for my 115K GPS that anytime a _FE appears I should just change the baud and it would work without error.

I'm not sure what the goal of a fix from PJRC would be? Hardware by design detects _FE when the incoming bit stream indicates it happened. As long as it recovers - as it seems to - then the solution is to properly match the baud rates to the attached device. The only reason the problem persists - is because it hasn't been corrected. <edit> In this case from mismatched baud rate - not a wiring or hardware problem.

Here is one short segment of the output after it was set 'testing-false;'. Where Loop prints alone there is no _FE, when there is a _FE it shows the baud rate and the state of the incoming bytes:
Code:
*** Loop *** 

*** Loop *** 

*** Loop *** 
*** CURRENT 9600 baud ***

serialCheck(avail: 0, count: 5033) = ok


*** Loop *** 

*** Loop *** 

*** Loop *** 

*** Loop *** 
*** CURRENT 9600 baud ***

serialCheck(avail: 0, count: 56) = ok


*** Loop *** 
*** CURRENT 9600 baud ***

serialCheck(avail: 0, count: 12) = ok


*** Loop *** 
*** CURRENT 9600 baud ***

serialCheck(avail: 0, count: 11) = ok


*** Loop *** 

*** Loop *** 
*** CURRENT 9600 baud ***

serialCheck(avail: 0, count: 40) = ok
 
Put in some counters - in this case the _FE showed up on Loop #2 in 'Sequence #1', then it failed 68 times in a row, before recovering twice and then failing again.

The counts and recurrence of _FE coming and going seems to be determined by the bad baud data transmission - not a Teensy software problem AFAIK.

*** Loop *** # 68
*** Loop *** # 2
FRAMING ERROR !! AT>>61 _FE # 1

serialCheck(avail: 0, count: 606) = ok

*** Loop *** # 3
*** CURRENT 9600 baud *** _FE # 2 [_FE recover # 0

// continued failure here

*** Loop *** # 69
*** CURRENT 9600 baud *** _FE # 68 [_FE recover # 0

serialCheck(avail: 0, count: 18) = ok

*** Loop *** # 70
*** Loop *** # 71
*** CURRENT 9600 baud *** _FE # 69 [_FE recover # 1

serialCheck(avail: 0, count: 51) = ok

// continued cycling here

*** Loop *** # 3594
*** CURRENT 9600 baud *** _FE # 808 [_FE recover # 2785

serialCheck(avail: 0, count: 13) = ok

*** Loop *** # 3595
*** Loop *** # 3596
*** Loop *** # 3597
*** Loop *** # 3598
*** Loop *** # 3599
*** CURRENT 9600 baud *** _FE # 809 [_FE recover # 2789

serialCheck(avail: 0, count: 99) = ok

*** Loop *** # 3600
 
@microderm - for your use case - when a _FE is detected it suggests the need to change the baud rate.
Good idea. I used that concept when I wrote my baud rate verification function:
Code:
/*
 * Verifies whether the connection produces the expected data.
 * @return whether the connection produces the expected data.
 */
bool verifyConnection() {
    char ch = 0;
    bool success = false;
    elapsedMillis elapsed;
    while (elapsed < 2500) {
        // Serial baud test //
        if (UART0_S1 & UART_S1_FE) {
            serial->end();
            success = false; // Wrong speed //
            break;
        } else if (serial->available()) {
            ch = serial->peek();
            if (ch == '$') {
                success = true; // Right speed //
                break;
            } else if (ch == '\0') {
                success = false; // Wrong speed //
                break;
            }
            ch = serial->read();
        }
    }
    return success;
}
This determines the validity or error as quickly as possible without going to hardware level (bit timing etc.). When the rate is wrong I set a flag in EEPROM and restart Teensy so that it starts up with the correct rate straight away. Problem-free solution.

I'm not sure what the goal of a fix from PJRC would be?
Good question. What is the purpose of all this? Not many people have been affected, maybe I just write bad code and deserve to have problems?

Let me just ask this one question: How long does Serial need to be exposed to the wrong baud rate to cause it to be irreversibly broken (without restarting Teensy)?

In the example code that breaks it, we get the impression that it takes a bit of work to break it. We are happy with the knowledge that no-one will ever actually write code like that, so the problem is purely academic. That's a strong argument. I'm almost convinced ... until I remember why I'm here on the forum - I have seen the problem appear instantly with no time for code to adapt. I have seen it go wrong from a cold start, where I could guarantee what the starting conditions were, and where the code was right for the situation (starting at 9600, request a rate change, switch to 115200 etc.). Yet it still failed.

I wrote a new test to illustrate. This one uses the ideas above to test what happens if we just briefly dip in to a data stream at the wrong rate. As soon as we detect that things are not going well, we change back to the correct rate. It should work, it's quite passive, providing the minimum stress to the system. Yet, it doesn't work! See code in post below:
 
New test build notes:

- I'm using the board set up from the image of post #23, except I removed the buttons, so auto stopping the test cycle when gotcha found.
- Briefly dips into data stream at wrong rate. Switches rate back to correct speed as soon as problems detected.
- Must prevent repeated recursion when toggling rate by using a recurred flag, otherwise we get an infinite recursive rate toggling loop. This is because after errors, each rate change continues to reveal errors.

See output below:
Code:
*** Setup init *** 
*** Setup complete *** 

[900] *** Loop *** 
************************************************************
******************** Restart Sequence 1 ********************
************************************************************
[3400] *** Setting 9600 baud ***
[time] (baud) packet-data
[4261] (9600) $PGACK,103*40
[4274] (9600) $PGACK,105*46
[4290] (9600) $PMTK011,MTKGPS*08
[4311] (9600) $PMTK010,001*2E
[4328] (9600) $PMTK010,002*2D
[5164] (9600) $GPGGA,235942.800,,,,,0,00,,,M,,M,,*7B
[5206] (9600) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[5235] (9600) $GPRMC,235942.800,V,,,,,0.00,0.00,050180,,,N*42
[5286] (9600) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[6173] (9600) $GPGGA,235943.800,,,,,0,00,,,M,,M,,*7A
[6215] (9600) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[6244] (9600) $GPRMC,235943.800,V,,,,,0.00,0.00,050180,,,N*43
[6295] (9600) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[6782] *** Setting 115200 baud ***
[time] (baud) packet-data
[7179] (115200) $GPGGA,235944.799,,,,,0,00,,,M,,M,,*72
[7183] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[7185] (115200) $GPRMC,235944.799,V,,,,,0.00,0.00,050180,,,N*4B
[7189] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[7193] (115200) $PMTK001,220,3*30

serialCheck(avail: 0, count: 571) = ok

**********************************************************
******************** Toggle Baud Rate ********************
**********************************************************
[7679] *** Setting 9600 baud ***
[time] (baud) packet-data
*** Rate Problem Detected: Changing back to 115200 baud ***
[8084] *** Setting 115200 baud ***
[time] (baud) packet-data

******************** Gotcha! ********************
*******************************************************
******************** Stop Testing! ********************
*******************************************************


[10084] *** Loop *** 
[11085] *** Setting 9600 baud ***

[11085] *** Loop *** 
[12085] *** Setting 115200 baud ***

[12085] *** Loop *** 
[13085] *** Setting 9600 baud ***
First case of wrong rate, we correct rate immediately, but it is too late. It never recovers! That's why I restart Teensy when wrong rate detected.

And if you're not convinced by that, ask yourself this question: Shouldn't restarting a broken serial connection sufficiently fix it to allow it to function properly once the conditions are right?
I'm not sure what the goal of a fix from PJRC would be?
I think the answer to that question is simply to have calling serial.end(), serial.begin() sufficiently reset the internal flags etc. so that they no longer remember that there were once problems. We should teach them to forgive. A forgiving serial port leads to happy users.
 

Attachments

  • Teensy_Test_Cycle.zip
    2.4 KB · Views: 56
Based only on the other thread test by Paul the answer to
Let me just ask this one question: How long does Serial need to be exposed to the wrong baud rate to cause it to be irreversibly broken (without restarting Teensy)?
seems to be just one set of 'triggering' bits presented to quilify as a _FE. Some how the GPS triggers that with some passing regularity.

What happens if " serial_clear(void) " is called before and/or after changing baud rate? [or before .end() and after .begin()] That dumps the FIFO. Not sure what the CPU does with FIFO bytes on .end? Or with the byte in progress? Software waits for transmit to end before closing - but doesn't call serial_clear() . And .begin() doesn't seem to take note of prior .begin leaving the hardware in use before setting up during .begin?
 
I'm working with this again today. Running a trimmed down version of the code from msg #23.

file.png

After watching this waveform over and over, the problem appears to happen when the enable signal goes low during a time of incoming data. Still working on figuring out why.

The green trace is generated by a couple digitalWriteFast lines I added in interrupt function in serial1.c.

I added "delayinc" which really helps prevent the scenario where many runs don't reproduce the error.

Code:
// https://forum.pjrc.com/threads/53941-Serial-Problem

#define LED          13
#define LMP          28
#define GPS_EN       24
#define PMTK_SET_BAUD_115200      "$PMTK251,115200*1F"
#define PMTK_SET_DATA_100MS       "$PMTK220,100*2F"
#define PMTK_SET_FIX_100MS        "$PMTK300,100,0,0,0,0*2C"

int type_time = 0;
int curr_baud = 0;
int sim_count = 0;
int serial1_count = 0;
bool testing = true;
String data_type1;
int delayinc = 0;

void setup() {
  pinMode(22, OUTPUT);  // Blue trace - when loop delays
  pinMode(23, OUTPUT);  // Green trace - when Serial1 interrupt runs
  pinMode(LED, OUTPUT);
  pinMode(LMP, OUTPUT);
  pinMode(GPS_EN, OUTPUT);
  Serial.begin(19200);
  digitalWrite(LMP, LOW);
  Serial.println("*** Setup complete *** ");
}

void loop() {
  Serial.println("\n*** Loop *** ");
  sim_count++;
  Serial.println("********* Restart Sequence " + String(sim_count) + " ************");
  // (1) Reset //
  serial1_count = 0;                      // Counter zero
  Serial1.end();                          // Serial1 initially off
  digitalWrite(GPS_EN, LOW);              // Stop GPS
  delay(250);                       // Off period (problem free when set to 2500 ms)
  // (2) Start //
  digitalWrite(GPS_EN, HIGH);             // Start GPS
  setBaud(9600);                          // Read at 9600
  delay(2000);                      // Allow time to receive data
  // (3) Rate-change //
  Serial1.println(PMTK_SET_BAUD_115200);  // Set GPS to 115200 baud
  Serial1.flush();                        // Ensure message sent
  delay(1);                         // Allow to complete
  setBaud(115200);                        // Align data rates
  Serial.println("*** PMTK_SET_DATA_100MS ***");
  // (4) Settings //
  Serial1.println(PMTK_SET_DATA_100MS);   // Set GPS output to 10Hz
  // (5) Run //
  delay(500);                      // Allow time to receive data
  serialCheck(); // Check Health: should have received data by now!
digitalWriteFast(22, HIGH);
  delay(250); 
digitalWriteFast(22, LOW);
  setBaud(9600);  // Restore Serial1 to 9600 baud - will fail to read incoming 115200 baud data //
  delay(250 + delayinc);
  delayinc += 7;
}

void setBaud(int rate) {
  curr_baud = rate;
  Serial.println("*** Setting " + String(curr_baud) + " baud ***");
  Serial.println("[time] (baud) packet-data");
  Serial1.begin(curr_baud);
}

/*
 * Reads incoming data.
 */
void serialEvent1() {
  char ch;
  while (Serial1.available()) {
    serial1_count++;
    ch = Serial1.read();
    if (ch == '$') {
      data_type1 = "$";
      type_time = millis();
    } else if (ch == '\r') {
      // nothing //
    } else if (ch == '\n') {
      Serial.println("[" + String(type_time) + "] (" + String(curr_baud) + ") " + data_type1);
    } else {
      data_type1 += ch;
    }
  }
}

/*
 * Checks Serial1 data status and buffer contents. Lights external LED when no data.
 */
void serialCheck() {
  Serial.println();
  if (!serial1_count && !Serial1.available()) {
    Serial.println("******************** Gotcha! ********************");
    digitalWrite(LMP, HIGH); // ON //
        while (1) ; // stop here
  } else {
    Serial.println("serialCheck(avail: " + String(Serial1.available()) + ", count: " + String(serial1_count) + ") = ok"); 
  }
  Serial.println();
  serial1_count = 0; // Reset counter //
}
 
After much fiddling, I believe I've managed to make this highly reproducible.

Defragster, Microderm, can you run this code on your boards and check it reliably gives this result?

sc.png

It should always stop with "Gotcha!" on Loop 2, and those 10 lines should always print on Loop 1.

Code:
// https://forum.pjrc.com/threads/53941-Serial-Problem

#define LMP          28
#define GPS_EN       24
#define PMTK_SET_BAUD_115200      "$PMTK251,115200*1F"
#define PMTK_SET_DATA_100MS       "$PMTK220,100*2F"
#define PMTK_SET_FIX_100MS        "$PMTK300,100,0,0,0,0*2C"

int type_time = 0;
int curr_baud = 0;
int loopcount = 0;
String data;
elapsedMillis msec;
volatile boolean GPGGA=false, GPRMC=false, GPVTG=false;

void setup() {
  pinMode(23, OUTPUT);  // Green trace - when Serial1 interrupt runs
  pinMode(22, OUTPUT);  // Red trace - when we do bad stuff
  pinMode(LMP, OUTPUT);
  digitalWrite(LMP, LOW);
  while (!Serial) ; // wait
  Serial.println("Begin Test");

  // configure the GPS just once for 115200 baud and 10 Hz
  pinMode(GPS_EN, OUTPUT);
  digitalWrite(GPS_EN, LOW);
  delay(1000);  // ~900 needed to get GPS to forget prior settings
  digitalWrite(GPS_EN, HIGH);
  msec = 0; 
  Serial1.begin(9600);
  curr_baud = 9600;
  delay(1000);
  Serial1.println(PMTK_SET_BAUD_115200);
  delay(100);
  Serial1.begin(115200);
  curr_baud = 115200;
  Serial1.println(PMTK_SET_DATA_100MS);
  delay(750);
  digitalWrite(GPS_EN, LOW);
  delay(5);
  digitalWrite(GPS_EN, HIGH);
}

void loop() {
  Serial.printf("*** Loop %d ***\n", ++loopcount);
  Serial1.begin(115200);
  msec = 0;
  
  // wait for one full set of data
  GPGGA = GPRMC = GPVTG = false;
  while (!GPGGA && !GPRMC && !GPVTG) {
    yield(); // wait 
    if (msec > 2000) gotcha();
  }
  
  // wait for first line of data
  GPGGA = GPRMC = GPVTG = false;
  while (!GPGGA) yield(); // wait
  
  // do bad things while the 2nd line is arriving
  digitalWriteFast(22, HIGH);
  Serial1.begin(9600);
  delayMicroseconds(6050);  // 498 to 11620
  Serial1.end();
  digitalWriteFast(22, LOW);
  
  digitalWrite(GPS_EN, LOW);
  delay(5);
  digitalWrite(GPS_EN, HIGH);
}

/*
 * Reads incoming data.
 */
void serialEvent1() {
  char ch;
  while (Serial1.available()) {
    ch = Serial1.read();
    if (ch == '$') {
      data = "$";
      type_time = msec;
    } else if (ch == '\r') {
      // nothing //
    } else if (ch == '\n') {
      Serial.println("[" + String(type_time) + "] (" + String(curr_baud) + ") " + data);
      if (data.startsWith("$GPGGA")) GPGGA = true; 
      if (data.startsWith("$GPRMC")) GPRMC = true; 
      if (data.startsWith("$GPVTG")) GPVTG = true; 
    } else {
      data += ch;
    }
  }
}

void gotcha() {
  Serial.println("******************** Gotcha! ********************");
  digitalWrite(LMP, HIGH); 
  while (1) ; // stop here
}
 
Stops as suggested - 10 lines - have some alt info.
*** Loop 1 ***
[579] (115200) $PGACK,103*40
[581] (115200) $PGACK,105*46
[582] (115200) $PMTK011,MTKGPS*08
[584] (115200) $PMTK010,001*2E
[722] (115200) $PMTK010,002*2D
[1447] (115200) $GPGGA,170313.256,4812.8669,N,12227.0136,W,1,04,1.88,41.6,M,-17.0,M,,*6B
[1453] (115200) $GPGSA,A,3,29,05,16,13,,,,,,,,,2.11,1.88,0.95*06
[1458] (115200) $GPRMC,170313.256,A,4812.8669,N,12227.0136,W,0.40,20.40,181018,,,A*47
[1464] (115200) $GPVTG,20.40,T,,M,0.40,N,0.74,K,A*0C
[1534] (115200) $GPGGA,170313.356,4812.8742,N,12227.0184,W,1,04,1.88,41.6,M,-17.0,M,,*6B
*** Loop 2 ***
******************** Gotcha! ********************
 
@Paul - It happens on my machine!

Code:
Begin Test
[596] (9600) $PGACK,103*40
[609] (9600) $PGACK,105*46
[625] (9600) $PMTK011,MTKGPS*08
[646] (9600) $PMTK010,001*2E
[663] (9600) $PMTK010,002*2D
[1114] (115200) $PMTK001,220,3*30
[1415] (115200) $GPGGA,235942.800,,,,,0,00,,,M,,M,,*7B
[1419] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1421] (115200) $GPRMC,235942.800,V,,,,,0.00,0.00,050180,,,N*42
[1425] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[1515] (115200) $GPGGA,235942.900,,,,,0,00,,,M,,M,,*7A
[1519] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1521] (115200) $GPRMC,235942.900,V,,,,,0.00,0.00,050180,,,N*43
[1525] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[1615] (115200) $GPGGA,235943.000,,,,,0,00,,,M,,M,,*72
[1619] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1621] (115200) $GPRMC,235943.000,V,,,,,0.00,0.00,050180,,,N*4B
[1626] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[1715] (115200) $GPGGA,235943.100,,,,,0,00,,,M,,M,,*73
[1719] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1721] (115200) $GPRMC,235943.100,V,,,,,0.00,0.00,050180,,,N*4A
[1726] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[1816] (115200) $GPGGA,235943.199,,,,,0,00,,,M,,M,,*73
[1819] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1822] (115200) $GPGSV,1,1,00*79
[1823] (115200) $GPRMC,235943.199,V,,,,,0.00,0.00,050180,,,N*4A
[1828] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
*** Loop 1 ***
[575] (115200) $PGACK,103*40
[576] (115200) $PGACK,105*46
[577] (115200) $PMTK011,MTKGPS*08
[579] (115200) $PMTK010,001*2E
[636] (115200) $PMTK010,002*2D
[1401] (115200) $GPGGA,235942.800,,,,,0,00,,,M,,M,,*7B
[1404] (115200) $GPGSA,A,1,,,,,,,,,,,,,,,*1E
[1407] (115200) $GPRMC,235942.800,V,,,,,0.00,0.00,050180,,,N*42
[1411] (115200) $GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32
[1501] (115200) $GPGGA,235942.900,,,,,0,00,,,M,,M,,*7A
*** Loop 2 ***
******************** Gotcha! ********************
 
Ok, here's a fix. Copy this to hardware/teensy/avr/cores/teensy3.

Indeed the problem is Serial1.end() isn't clearing error status bits that can result from additional bytes (or garbage - like bytes at the wrong baud rate) that have arrived into the FIFO but haven't been serviced by any interrupt. Serial1.end() shuts off the interrupt and shuts down the port. Then later when you call Serial1.begin(), it can't start up properly due to those status error bits.
 

Attachments

  • serial1.c
    18.9 KB · Views: 56
Up and running … Loop 150 and counting.

Cool - I made note of that.

Question - I didn't look at changed code - does a .begin after a .begin do any clearing with no .end in between when the port was already setup on entry?

<edit> Opened code - perhaps?
Code:
void serial_begin(uint32_t divisor)
{
[B]	if (SIM_SCGC4 & SIM_SCGC4_UART0)
	  serial_end();
[/B]	SIM_SCGC4 |= SIM_SCGC4_UART0;	// turn on clock, TODO: use bitband
 
Paul: I get the same output on the unfixed system. The new code runs nicely without problems.

I'll now test my more recent build to see whether .end() or just .begin() are sufficient etc.
 
Good.
The .end with timing then .begin is good and right in your sketch. I hacked the .begin calling .end and that wasn't right as posted :(

<edit>: And with that change going back to the :: Teensy_BareBones_Cycle.ino
With the testing as it was [without my changes to note _FE] is appears to be running fine so far.
 
Yes, my dipping (in and out at wrong speed) code (#55) is running well too. That code does not use the .end() command, so it appears not to need an explicit call to .end().

Loop 350 and counting...

Perfection! I'll remove the restart code from my main build and test it in the field...

Thanks guys!
 
The test of prior code I ran is up to 'Restart Sequence 1201' with testing and no GOTCHA as it was written.

@microderm: Without the .end() - there won't be any change in the prior behavior should it occur. Only the new code in .end() will prevent any FIFO or error carry over to the following .begin().
 
Status
Not open for further replies.
Back
Top