Random byte loss on USB serial port

Status
Not open for further replies.
Hi everyone,
We recently started using Teensy boards in several parts of our product. Everything works well, but I see random byte loss during USB communication.

The system I am working on is this:

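[Diagram (Untitled Diagram.png): Teensy 3.6 <-> Odroid <-> Teensy 4.0 over USB; Teensy 4.0 <-> temperature controllers on Serial1-Serial7. Image not available.]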

(NB: Serial7 is of course connected to Temperature Controller 7; the diagram has a typo.)

I program both boards with PlatformIO and the Teensy framework. On the Odroid, a Python program interfaces with the USB ports and relays messages back and forth between the boards.

Basically, the Teensy 3.6 controls the temperature controllers, so
  • it sends a message to the Odroid
  • the Odroid forwards the message to the Teensy 4.0 over USB
  • the Teensy 4.0 forwards it to the appropriate temperature controller over a hardware serial port
  • the temperature controller's reply travels the same path in reverse

Odroid

On the Odroid, the Python program continuously listens on the USB port and reads each message up to the terminating "\n".
This is the relevant code:
Code:
import serial  # pyserial
import weakref
from queue import Queue
from threading import Lock, Thread


class FastSerial(serial.Serial):

    def __init__(self, port, name, run=False, baudrate=4000000, timeout=2, dtr=False):
        self.buf = bytearray()
        super().__init__(port=port, baudrate=baudrate, timeout=timeout)
        self.name = name

        if dtr:
            self.dtr = 0
        
        if run:
            self.recv_qs = weakref.WeakSet([])
            self.send_q = Queue()
            self.q_lock = Lock()

            listener = Thread(target=self.listen, args=())
            listener.start()


    def readline(self, newline=b'\n'):
        i = self.buf.find(newline)

        if i >= 0:
            r = self.buf[:i+1]
            self.buf = self.buf[i+1:]
            return r

        while True:
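            # Read whatever is already waiting (at least 1 byte, at most 2048);
            # read() blocks for up to `timeout` seconds if nothing has arrived.
            i = max(1, min(2048, self.in_waiting))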
            data = self.read(i)
            i = data.find(newline)
            if i >= 0:
                r = self.buf + data[:i+1]
                self.buf[0:] = data[i+1:]
                return r
            else:
                self.buf.extend(data)


    def listen(self):
        name = self.name
        q_lock = self.q_lock
        recv_qs = self.recv_qs
        readline = self.readline

        while True:
            line = readline().decode()
            if line:
                for q in recv_qs:
                    with q.lock:
                        # Each private q has its private lock created
                        # by the serial interface.
                        q.put_nowait(name + '.' + line)

The same code, in a different program, reads messages from an FTDI chip connected to an FPGA's serial interface at 4,000,000 baud without losing data.
(Still, I am not ruling out that it could be the source of the loss.)
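One way to rule the host side in or out is to test the `readline` splitting logic offline. The sketch below copies that logic into a standalone `LineSplitter` class (a hypothetical helper, not part of pyserial) that takes its input from a list of chunks instead of `self.read()`. The test stream and the random chunk sizes of 1 to 64 bytes are assumptions chosen to imitate data arriving in arbitrary pieces.

```python
import random


class LineSplitter:
    """Same buffering logic as FastSerial.readline, fed from a list of chunks."""

    def __init__(self, chunks, newline=b"\n"):
        self.buf = bytearray()
        self.chunks = list(chunks)
        self.newline = newline

    def readline(self):
        i = self.buf.find(self.newline)
        if i >= 0:                       # a full line is already buffered
            r = self.buf[:i + 1]
            self.buf = self.buf[i + 1:]
            return bytes(r)
        while self.chunks:               # stands in for self.read(...)
            data = self.chunks.pop(0)
            i = data.find(self.newline)
            if i >= 0:
                r = self.buf + data[:i + 1]
                self.buf[0:] = data[i + 1:]
                return bytes(r)
            self.buf.extend(data)
        return None                      # input exhausted, no complete line


def random_chunks(stream, rng):
    """Cut a byte string at random points, like packets of varying size."""
    chunks, pos = [], 0
    while pos < len(stream):
        n = rng.randint(1, 64)
        chunks.append(stream[pos:pos + n])
        pos += n
    return chunks


def split_all(chunks):
    splitter = LineSplitter(chunks)
    lines = []
    while (line := splitter.readline()) is not None:
        lines.append(line)
    return lines


stream = b"".join(b"ooo.M.message %d\n" % k for k in range(1000))
expected = stream.splitlines(keepends=True)
rng = random.Random(0)
for _ in range(50):
    assert split_all(random_chunks(stream, rng)) == expected
print("all chunkings reassembled correctly")
```

If the same lines come out for every chunking, the buffering itself is not dropping bytes, and the loss more likely happens before the data reaches Python. In the real `readline`, a timeout only produces an empty read and the loop continues, so timeouts alone should not split a line either.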

Teensy 3.6

On the Teensy 3.6 side, I write serial messages in a strict format, each ending with "\n". It would be difficult to isolate all the functions that build a message, so for now I will post the main ones:


Code:
/// One of the functions that is sending a message
void TEC::read_setpoint() {
    size = TEC::encode_msg(&tecCmds.setpoint, 1, false, 0);
    Serial.write(msg, size);
}

// Encode the message for the temperature controller
uint8_t TEC::encode_msg(const uint8_t *cmd, uint8_t cmd_size, bool set, uint32_t val) {
    memcpy(msg, this->msg_address, MSG_ADDRESS_SIZE);
    // The fixed offsets below assume MSG_ADDRESS_SIZE == 3
    *(msg + 3) = '-';
    *(msg + 4) = 'T';
    *(msg + 5) = '.';
    memcpy(msg + 6, cmd, cmd_size);
    cmd_size += 6;

    if (set) {
        cmd_size = TEC::to_digits(msg, cmd_size, val);
    } else {
        *(msg + cmd_size) = '?';
        cmd_size++;
    }

    *(msg + cmd_size) = '\n';
    cmd_size++;

    return cmd_size;
}
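For host-side checks it can help to build the exact bytes `encode_msg` should produce. The sketch below mirrors only the query path (`set == false`), assuming `MSG_ADDRESS_SIZE` is 3 (implied by the `'-'` written at offset 3). The address `b"001"` and command `b"S"` are made-up placeholders, and the `to_digits` path is left out because its code is not shown.

```python
MSG_ADDRESS_SIZE = 3  # assumption: '-' is written right after a 3-byte address


def encode_query(address: bytes, cmd: bytes) -> bytes:
    """Mirror of TEC::encode_msg with set == false: <addr>-T.<cmd>?\\n"""
    if len(address) != MSG_ADDRESS_SIZE:
        raise ValueError("address must be exactly %d bytes" % MSG_ADDRESS_SIZE)
    msg = bytearray(address)
    msg += b"-T."   # fixed separator at offsets 3..5
    msg += cmd      # command bytes
    msg += b"?"     # query marker
    msg += b"\n"    # the only frame delimiter
    return bytes(msg)


frame = encode_query(b"001", b"S")  # hypothetical address and command
print(frame)  # b'001-T.S?\n'
```

Because "\n" is the only delimiter, losing the tail of one frame (including its "\n") fuses it with the next frame, which is exactly the kind of corruption described below.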

This Teensy also does other work: DMA drives DAC1, it uses DAC2, and two interrupts read ADCs and change some GPIOs, but as far as I can tell none of this interferes with the USB port.

Teensy 4.0

On the Teensy 4.0 side, I receive the messages in the serialEvent function and put them into a per-controller circular buffer, from which they are sent once the temperature controller is ready.

Code:
void serialEvent() {
    while (Serial.available()) {
        uint8_t msgLen = Serial.readBytesUntil('\n', usb_buffer, 20);
        if (msgLen == 0) {
            continue;  // nothing stored (e.g. a bare '\n'); avoids msgLen - 1 below
        }
        err = 0;

        switch (usb_buffer[0]) {
            case '1':
                err = circular_buf_put2(cmd_buffer_1, usb_buffer + 1, msgLen - 1);
                if (err) {
                    circular_buf_reset(cmd_buffer_1);
                    Serial.write("ooo.E.TEC-0-0: buffer 1 full\n");
                }
                break;
            case '2':
                // ...
                // all cases are equal
            default:
                Serial.write("ooo.E.TEC: error: ");
                Serial.write(usb_buffer, msgLen);
                Serial.write("\n");
                break;
        }
    }
}
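`readBytesUntil()` stops at the terminator (which it consumes but does not store), when the length limit is reached, or on timeout. The plain-Python model below follows the generic Arduino `Stream::readBytesUntil` semantics and ignores timeouts. It shows two side effects of the limit of 20 used above: a longer line is split into two "messages", and a line of exactly 20 bytes leaves its "\n" behind, so the next call returns 0 bytes. The exact limit may differ by one byte depending on the core's `Stream` implementation, and the frame contents are made up.

```python
def read_bytes_until(queue: bytearray, terminator: bytes, length: int) -> bytes:
    """Model of Stream::readBytesUntil on bytes that have already arrived.

    Timeouts are not modelled: an empty queue simply ends the read.
    """
    out = bytearray()
    while len(out) < length and queue:
        c = queue.pop(0)
        if c == terminator[0]:
            break                # terminator is consumed but not stored
        out.append(c)
    return bytes(out)


# Hypothetical frames: channel digit + TEC-style command, 24 and 9 bytes long.
q = bytearray(b"1001-T.ABCDEFGHIJKLMNOP?\n2001-T.S?\n")
frames = []
while q:
    frames.append(read_bytes_until(q, b"\n", 20))
print(frames)
# [b'1001-T.ABCDEFGHIJKLM', b'NOP?', b'2001-T.S?']

# A frame of exactly 20 bytes leaves its '\n' for the next call.
q = bytearray(b"1" * 20 + b"\n")
print(read_bytes_until(q, b"\n", 20), read_bytes_until(q, b"\n", 20))
# b'11111111111111111111' b''
```

In the sketch on the Teensy 4.0, the second fragment would start with 'N' and land in the `default:` branch, and a zero-length read makes `msgLen - 1` equal to -1, which wraps to a large value if `circular_buf_put2` takes an unsigned length.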

Questions

The problem, which occurs almost at random, is this: when I expect two messages such as "ooo.M.message 1\n" and "ooo.M.message 2\n", I instead read "ooo.M.message 1ooo.M.message 2\n" or even "ooo.M.messooo.M.message 2\n". When this happens, communication fails.
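To measure how often this happens and what the fragments look like, the host can flag lines that contain more than one message start. The sketch assumes, as in the examples above, that every message from the Teensy 4.0 begins with `ooo.`; `split_merged` is a hypothetical helper name.

```python
def split_merged(line: bytes, prefix: bytes = b"ooo.") -> list:
    """Split a received line at every message start after the first.

    Returns [line] for a well-formed line; more than one element means a
    message was cut short or its newline was lost upstream.
    """
    starts = []
    pos = line.find(prefix, 1)      # ignore the expected prefix at index 0
    while pos >= 0:
        starts.append(pos)
        pos = line.find(prefix, pos + 1)
    bounds = [0] + starts + [len(line)]
    return [line[a:b] for a, b in zip(bounds, bounds[1:])]


for received in (b"ooo.M.message 1\n",
                 b"ooo.M.message 1ooo.M.message 2\n",
                 b"ooo.M.messooo.M.message 2\n"):
    parts = split_merged(received)
    if len(parts) > 1:
        print("merged:", parts)
# merged: [b'ooo.M.message 1', b'ooo.M.message 2\n']
# merged: [b'ooo.M.mess', b'ooo.M.message 2\n']
```

If the first fragment is usually cut at a similar length, that would hint at a fixed-size buffer or a write that gave up part-way, rather than random line noise.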

I read some posts on this forum about Serial, and in some of them Paul mentioned that "print" and "println" may wait for further calls before sending, to improve communication efficiency. Here, however, I am using "write", because I am sending a composite message stored in an array.

So I am asking you some questions:
  • Have you experienced any byte drops when reading a USB port from Python programs, or from other programs?
  • What is the best way to interface with USB on a Linux PC? (The Odroid runs Ubuntu 18.04 for ARM.) Maybe C++ or C?
  • Is there any performance difference between "Serial.print" and "Serial.write"?
  • Is there a more efficient or better way to read incoming USB messages?
  • Do you have any idea how to isolate the different parts of the system to test them and find where the bytes are dropped?

Regarding this last question, I was thinking of attaching the Teensy to a serial monitor to see whether bytes are dropped. However, this would not test the entire program: to advance its internal state machine, the Teensy 3.6 expects answers, and a serial monitor does not answer. Maybe you have a better idea.
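One way to isolate each hop without the full state machine is to temporarily replace the real traffic with numbered test frames and check them at the receiving end of a single link (for example, the Teensy 3.6 sending to the Odroid with nothing else running). The frame text `ooo.M.seq <n>` and the `check_sequence` helper are hypothetical. The first hop where gaps or garbled lines appear is where bytes are being lost.

```python
import re

# Hypothetical test frame: the sender emits "ooo.M.seq <n>\n" with n = 0, 1, 2, ...
SEQ_RE = re.compile(rb"ooo\.M\.seq (\d+)\n")


def check_sequence(lines):
    """Return (missing sequence numbers, garbled lines) for a run of test frames."""
    missing, garbled = [], []
    expected = None
    for line in lines:
        m = SEQ_RE.fullmatch(line)
        if m is None:
            garbled.append(line)    # merged, truncated or otherwise corrupted
            continue
        n = int(m.group(1))
        if expected is not None and n > expected:
            missing.extend(range(expected, n))
        expected = n + 1
    return missing, garbled


received = [b"ooo.M.seq 0\n", b"ooo.M.seq 1\n",
            b"ooo.M.seqooo.M.seq 3\n", b"ooo.M.seq 4\n"]
print(check_sequence(received))
# ([2, 3], [b'ooo.M.seqooo.M.seq 3\n'])
```

On the real port, `lines` could come from repeated calls to pyserial's `Serial.readline()` or to the `FastSerial.readline` above.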

If some code or details are missing, feel free to ask.
Thank you in advance for your help!
 
Yes, I think you should determine where in the system the error occurs. Your description of the problem leads me to believe it is in the return data (Teensy 4 --> Teensy 3), but the code examples you posted show the data being sent from the Teensy 3 to the Teensy 4.

Your serialEvent interrupt handler could perhaps be causing the issue. In the code you show, it calls Serial.write. If the interrupt fires while the functions in the main loop are also calling Serial.write, that could lead to data corruption (unless the underlying Arduino library protects the user from such an error). If that were the case, I would expect the corrupted data to look like "ooo.M.messooo.M.message 2\nage 1\n": message 1 was being written from loop(), the interrupt wrote message 2, and then message 1 finished.
 
Thank you rcarr for your answer.

I see your point, but from what I read on the forum, "serialEvent" is not an interrupt but just a check that happens at the start of every loop. In my code I define both "serialEvent" and "serialEvent1" through "serialEvent7", the latter to react to messages from the temperature controllers.

So, as I understand it, the program on the Teensy 4.0 does the following:

  • check whether to call "serialEvent"
  • check whether to call "serialEvent1"
  • check whether to call "serialEvent..."
  • check whether to call "serialEvent7"
  • do other things in the "loop" function
  • repeat

In theory, no interrupt is present in the Teensy 4.0 code.

Nonetheless, my interpretation could be wrong, so I will look into it!
 
Yes, you seem to be correct, although a quick look at the core shows there are different ways to attach these handlers, with different levels of functionality.
 
Indeed, the serialEvent() and serialEvent#() functions are not interrupt driven; they are checked on every yield() call.

yield() is called after each loop() and also with every delay() in the code.
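The call order described above can be modelled in plain Python (this is not Teensy code). Function names mirror the Arduino ones, with `yield_` spelled with an underscore because `yield` is a Python keyword. The model assumes data is always pending and that `delay()` calls `yield()` once. It shows why a `delay()` inside `loop()` matters when `loop()` and `serialEvent()` share one global buffer: `serialEvent()` can run between building a reply and sending it.

```python
trace = []


def serialEvent():
    trace.append("serialEvent: parse into shared buffer")


def yield_():
    # Model of the core's yield(): runs serialEvent() when data is pending.
    # Simplification: data is assumed to be always pending here.
    serialEvent()


def delay(ms):
    # Model of delay(): keeps calling yield() while waiting (once here).
    yield_()


def loop():
    trace.append("loop: build reply in shared buffer")
    delay(1)
    trace.append("loop: send reply from shared buffer")


for _ in range(2):      # what the core's main() does, simplified
    loop()
    yield_()

print("\n".join(trace))
```

The second line of the printed trace is the point: the handler has already touched the shared buffer before `loop()` sends its reply.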
 
Thank you rcarr and defragster for your answers.

yield() is called after each loop() and also with every delay() in the code

I didn't know that; I assumed the function was called only at the end of each loop. This may be relevant, since I am using a single global buffer to store both incoming and outgoing messages.

I will look into it and I will report back if I find any improvement.

Strangely enough, VSCode only found the extern declaration of serialEvent and not the related code you mention. I will explore the code in the core directory, then.
 
A relatively easy test would be to define this in the sketch:
Code:
void yield() {}  // replaces the core's weak yield(), so serialEvent() is no longer called automatically

yield() is a weak function, so this overrides it.

Then, at the end of loop(), manually insert calls to the desired serialEvent() functions.
 
I am writing again after some time because I have probably found the solution.

During my tests, the Odroid repeatedly opened and closed the serial ports, but the Teensy boards were not rebooted. I suspect that when a port stays closed for a long time, the serial buffer fills up and messages become corrupted.
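The suspected mechanism can be reproduced with a toy model, under the assumption (not taken from the Teensy core) that while nobody reads the port, bytes that do not fit in the transmit buffer are discarded rather than blocking forever. `BoundedTx` and the 26-byte capacity are arbitrary choices for illustration.

```python
class BoundedTx:
    """Toy transmit buffer: keeps what fits, silently drops the overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def write(self, msg: bytes) -> int:
        room = self.capacity - len(self.data)
        accepted = msg[:max(room, 0)]
        self.data += accepted
        return len(accepted)             # bytes actually queued

    def drain(self) -> bytes:
        out, self.data = bytes(self.data), bytearray()
        return out


tx = BoundedTx(capacity=26)
for k in (1, 2, 3):                      # host not reading: buffer fills up
    tx.write(b"ooo.M.message %d\n" % k)
received = tx.drain()                    # host opens the port and reads
tx.write(b"ooo.M.message 4\n")           # traffic resumes
received += tx.drain()
print(received.splitlines(keepends=True))
# [b'ooo.M.message 1\n', b'ooo.M.messooo.M.message 4\n']
```

The half-written second message and the next complete one end up on the same line, matching the shape of the corrupted lines in the first post.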

To avoid this problem, I added three pieces of code:

(1) In setup(), wait until the serial port is opened:
Code:
while (!Serial) {}

(2) Before every serial write, check that the port is open:
Code:
if (Serial) {
    Serial.print(...);
}

(3) Periodically, inside loop():
Code:
if (!Serial) {
    Serial.clear();
}
The last piece handles the case where the Odroid closes the port while communication is in progress, so that any characters still in the buffer are discarded.
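A complementary step on the Odroid side is to discard stale input whenever a port is (re)opened and then skip to the next "\n", so parsing starts on a message boundary. A minimal sketch, assuming `ser` offers pyserial's `reset_input_buffer()` plus a `readline()` that returns one complete line (as `serial.Serial` does when data arrives before the timeout, or the `FastSerial` class above). `resync` and `FakePort` are hypothetical names; the fake port only exists to make the example runnable without hardware. The cost is that one complete message may be thrown away if the buffer happened to be clean.

```python
def resync(ser) -> bytes:
    """Discard stale input and the first, possibly truncated, line.

    Returns the discarded line so it can be logged.
    """
    ser.reset_input_buffer()                   # drop what the OS has queued
    if isinstance(getattr(ser, "buf", None), bytearray):
        ser.buf.clear()                        # FastSerial keeps its own buffer too
    return ser.readline()                      # skip to the next frame boundary


class FakePort:
    """Stand-in for a port whose device still holds a half-sent line."""

    def __init__(self, queued, device):
        self.queued = bytearray(queued)        # already in the OS buffer
        self.device = bytearray(device)        # still to arrive from the Teensy
        self.buf = bytearray(b"leftover")

    def reset_input_buffer(self):
        self.queued.clear()

    def readline(self):
        data = self.queued + self.device
        i = data.index(b"\n")
        line, rest = bytes(data[:i + 1]), data[i + 1:]
        self.queued, self.device = rest, bytearray()
        return line


port = FakePort(b"old junk\n", b"age 1\nooo.M.message 2\n")
print(resync(port))      # b'age 1\n'
print(port.readline())   # b'ooo.M.message 2\n'
```

With `FastSerial`, `readline()` loops until a newline arrives, so `resync` blocks until the Teensy sends something.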

Hope it helps!
 