Very weird network issue / bug

AndyA

No complete code / example because the system is way too big and complex :-( So not looking for a solution, more a pointer in the right direction if anyone has any ideas.

I have a Teensy 4.1 application that receives data streamed over the network from a desktop application.
It is using the QNEthernet and Teensy41_AsyncTCP libraries.

There are two network connections allowed, each using a different TCP port. Each runs a server, waits for a client connection and then receives data on that network port and writes it into a buffer.
Flow control packets are sent back from the Teensy to the data source over the same TCP connection. These indicate the current state of the buffer and whether more data is needed. The data rate is around 16 Mbit/s per port.

99% of the time everything works fine.
However, every now and then I get a weird error. I open both data ports but only send data on one of the two. I then log all the network traffic in Wireshark.
On the second (idle) port I get flow control messages indicating the buffer is empty and needs data.
On the first port I get some flow control messages indicating the buffer is full and some indicating it is empty.

I've added a debug printout to the code that sends the flow control packets. It prints the AsyncClient TCP port number whenever the bytes used value is being set to 0, and it only ever prints the correct (idle) port. In other words, as far as I can tell it's not a bug in my code: I'm transmitting the data on the correct ports, but somehow the network code is sending it on the wrong port.

This looks to be related to network packet timing. I have two possible applications to source the data, and it is more likely to happen with one than the other.

Has anyone seen anything like this before, where these libraries (or the underlying lwIP) with multiple open connections send data on the wrong connection? Or any ideas on how to go about figuring out what's going on here?
 
OK problem solved - posting here just in case someone else hits a similar issue in the future.
The transmit code was along the lines of:
Code:
bool sendMessage() {
  // Local (stack) buffer: its memory is free for reuse once sendMessage() returns.
  uint8_t dataBuffer[dataLength];
  // [populate data buffer]
  myAsyncClient->add(dataBuffer, dataLength);
  return myAsyncClient->send();
}

When that function returns, the underlying network code has not yet made a copy of the data to send. If the memory that held dataBuffer is then reused for something else, the transmitted data is corrupted.
Since I was calling this for two different ports in rapid succession, the first port's connection often ended up carrying the data intended for the second port.

I changed the transmit buffer from being a local variable to being a private member of my port class and the problem went away.
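
For anyone who wants to see the effect in isolation, here is a rough desktop sketch of the same lifetime problem and the member-buffer fix. Everything in it is made up for illustration: DeferredClient is a hypothetical stand-in that keeps a pointer and reads the bytes later (it is not the Teensy41_AsyncTCP API), sharedScratch stands in for the stack memory that back-to-back calls reuse, and Port, kMsgLen and the fill values are just placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t kMsgLen = 4;  // hypothetical message length

// Hypothetical stand-in for a client whose add() only remembers where the
// caller's bytes are and reads them later. Models the lifetime issue only.
class DeferredClient {
public:
  void add(const uint8_t* data, size_t len) {
    assert(data != nullptr && len > 0);
    pending_ = data;
    len_ = len;
  }
  // Models the later moment at which the network stack reads the bytes.
  std::vector<uint8_t> transmit() const {
    return std::vector<uint8_t>(pending_, pending_ + len_);
  }
private:
  const uint8_t* pending_ = nullptr;
  size_t len_ = 0;
};

// Stands in for the stack memory that successive sendMessage() calls reuse.
static uint8_t sharedScratch[kMsgLen];

// Broken pattern: every port builds its message in the same memory.
void sendViaSharedScratch(DeferredClient& client, uint8_t portId) {
  std::memset(sharedScratch, portId, kMsgLen);
  client.add(sharedScratch, kMsgLen);
}

// Fixed pattern: each port owns its transmit buffer, so the bytes stay
// valid after sendMessage() returns.
class Port {
public:
  explicit Port(uint8_t id) : id_(id) {}
  Port(const Port&) = delete;             // client_ points into txBuffer_
  Port& operator=(const Port&) = delete;

  void sendMessage() {
    std::memset(txBuffer_, id_, kMsgLen);
    client_.add(txBuffer_, kMsgLen);
  }
  std::vector<uint8_t> transmit() const { return client_.transmit(); }
private:
  uint8_t id_;
  uint8_t txBuffer_[kMsgLen] = {};
  DeferredClient client_;
};
```

Calling sendViaSharedScratch() for port 1 and then port 2 before either transmit() leaves port 1's client holding port 2's bytes, which matches what I saw in Wireshark. With two Port objects each connection keeps its own data. One caveat with the member buffer: it still must not be overwritten by the next message before the previous one has actually gone out.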
 
Note that, other than the PHY initialization code, the Teensy41_AsyncTCP library doesn’t use QNEthernet. It only uses the included underlying lwIP stack.
 