Sporadic multi-second freezes when calling client.stop() with NativeEthernet

Status
Not open for further replies.

bvanommen

New member
Hello everyone!

I am using a Teensy 4.1 as a server that receives commands (separated by \n) over Ethernet via TCP. The issue is that the server sometimes takes a long time to respond to a command, or never responds at all.
I have been able to reduce the server to the following code, which is simply an echo server:
Code:
#include <NativeEthernet.h>

// mac address 
byte mac[] = { 0xBE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
// Ip address 
byte ip[] = { 192, 168, 1, 177 };
int tcp_port = 5025;
char packetBuffer[256];
EthernetServer server = EthernetServer(tcp_port);

void setup()
{
  // initialize the ethernet device
  Ethernet.begin(mac, ip);

  // start the server listening for clients
  server.begin();
}

void loop()
{
  // if an incoming client connects, there will be bytes available to read:
  EthernetClient client = server.available();
  while (client.available()) {
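    // read bytes from the incoming client up to (not including) '\n';
    // leave one byte free for the '\0' terminator that strcmp() needs
    size_t read_length = client.readBytesUntil('\n', packetBuffer, sizeof(packetBuffer) - 1);
    packetBuffer[read_length] = '\0';

    // write them back to the same client connected to the server
    client.write(packetBuffer, read_length);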
    //--------------problematic part BEGIN//
    if (strcmp("CLOSE",packetBuffer) == 0){
        client.write("CLOSE OK\n",9);
        client.stop(); // problematic line
    }
    //--------------problematic part END//
  }
}
The problem is in the client.stop() call. I should say up front that I don't know much about TCP, or about network control flow in general.
If I run the server without the problematic part, then the echoing works fine, but for every transaction the Teensy throws a TCP RST packet, as shown in the Wireshark screenshot below:
TCPRST.png
This feels wrong to me, and also, the environment I plan to run the device in throws an exception if this RST packet is received. My solution was the code in the problematic part. The client sends "CLOSE\n" when it is done with the socket, and will send no more commands. Then, the server responds with "CLOSE OK\n", and both parties can close the socket on their end. This works well most of the time, as is shown in the following Wireshark screenshot:
TCPGOOD.png
No RST packets, and both sides seem happy. There is one problem however: sporadically, the Teensy waits multiple seconds before replying and closing the socket. Sometimes no reply is sent at all. If I comment out the client.stop() part this does not occur. Two examples of this in Wireshark: (packets 26643 and 2664 have ~1.7s between them)
TCPBAD1.png
and here another instance (~2.9s):
TCPBAD2.png
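
For reference, the CLOSE / CLOSE OK exchange described above can be sketched on a single machine. This is only a sketch under assumptions: a small asyncio server on 127.0.0.1 stands in for the Teensy, and unlike the Teensy sketch it echoes each line including the trailing newline. The names handle_client, run_transaction and demo are made up for this illustration.

```python
import asyncio

async def handle_client(reader, writer):
    # Stand-in for the Teensy: echo every line; on "CLOSE" reply and hang up.
    while True:
        line = await reader.readline()
        if not line:                      # peer closed without sending CLOSE
            break
        writer.write(line)                # echo (newline included, unlike the Teensy code)
        if line.rstrip(b"\n") == b"CLOSE":
            writer.write(b"CLOSE OK\n")   # tell the client it may close too
            break
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def run_transaction(host, port, message):
    # Client side: send one command plus CLOSE, read until CLOSE OK arrives.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(message + b"CLOSE\n")
    await writer.drain()
    replies = []
    while True:
        line = await asyncio.wait_for(reader.readline(), timeout=5)
        if not line:                      # server closed before CLOSE OK
            break
        replies.append(line)
        if line == b"CLOSE OK\n":
            break
    writer.close()
    await writer.wait_closed()
    return replies

async def demo():
    # Port 0 lets the OS pick a free port for the local stand-in server.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await run_transaction("127.0.0.1", port, b"TESTDATA\n")

replies = asyncio.run(demo())
print(replies)
```

Reading line by line on the client avoids guessing how many bytes each read() call returns, since the echo and the CLOSE OK reply may arrive in the same TCP segment.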

Here is the python code I use to send the TCP packets, for easy reproduction.
Code:
import asyncio
import time
from timeit import default_timer as timer
async def tcp_echo_client(message):
    reader, writer = await asyncio.open_connection(
        '192.168.1.177', 5025,limit = 10000000)
    writer.write(message + b"CLOSE\n")
    await writer.drain()
    try:
        data = await asyncio.wait_for(reader.read(100),timeout = 5)
    except asyncio.TimeoutError:
        data = "0"
    try:
        await asyncio.wait_for(reader.read(9),timeout = 5) # CLOSE OK\n
    except asyncio.TimeoutError:
        pass
    writer.close()
    return data

def main():
    manual = """Client example."""
    print(manual)
    while(1):
        print("""Type a duration, and press enter to start sending messages.
        Type 0 to exit.""")
        cmd = int(input())
        delaytime = 0.02
        if cmd == 0:
            exit()
        i = 0
        times = []
        maxdt = 0
        start = timer()
        while i < cmd:
            i += 1
            reply = asyncio.run(tcp_echo_client(b"TESTDATA\n"))
            current_time = timer()
            time_diff_since_start = current_time - start
            times.append(time_diff_since_start)
            if i > 1:
                dt = abs(times[i-2] - time_diff_since_start)
                if dt>maxdt:
                    maxdt = dt
            print(time_diff_since_start)
            print(reply)
            time.sleep(delaytime)
        print(f"Maximum time delta was:{maxdt}")
main()
My test setup is a USB-ethernet dongle connected to my PC, with the ethernet cable going to the Teensy. I highly doubt the connection can be the problem, both because without the problematic code everything works fine, and a pingtest with powershell Test-Connection reveals a ping time of 1 ms or 0 ms (great job on the rounding microsoft, I highly doubt it really takes 0ms...).

I suppose my question is two-fold:
  1. What is the proper way of resolving the closing of the socket on both ends? Am I on the right track with my "CLOSE OK\n" message indicating it's OK to close the socket, or should I think of something else/more robust?
  2. Why does the Teensy freeze up in this case?
Question 2 especially seems interesting for this forum, which is why I decided to post here. Of course, help with question 1 is also greatly appreciated :).

Thanks for taking the time to read my post!
 
Thanks a lot, the slow response is now fixed! I had seen this startup.c fix before, but it didn't seem to work for me; I probably applied it the wrong way.
Bad news: I have now identified another error. Without client.stop() everything again seems to work fine. With it, however, the Teensy seems to randomly panic and send a TCP RST packet:
TCPRSTEARLY.png
Sometimes this already happens after a few seconds, other times it might take a minute. Also, sometimes I just get no response at all:
10sBAD.png
The 10 seconds is the timeout of my python programme running out. This error seems more severe, as the socket stays open forever on the Teensy side; it never sends a TCP RST, nor a FIN message.

Any ideas? I am still running a test in the background to see if the 10s no reply bug also occurs without client.stop().

EDIT: The error also occurs without client.stop(). The way it happened this time was: first the TCP RST error as shown above. Then after that suddenly every few seconds the timeout error occurs, where Teensy still acknowledges packets through TCP, but somehow the packets don't seem to reach the code I'm running. Reprogramming the Teensy "clears" the second error.
A shot in the dark: could there be a memory leak in NativeEthernet that causes the device to run low on memory? This would explain why the errors persist after they have occurred once.
EDIT2: I can confirm the above again. First the TCP RST packet as a response to some data being pushed, and then the non-responses every few seconds after that.
 

Additional info:
I ran some tests where I open only one socket and then send a few thousand messages while keeping it open. So far this has not produced any of the previous errors. Maybe the errors I experienced were related to opening and closing sockets too often?

Anyway, I'll just work with one socket that's open all the time, as in my situation there should only ever be one device connected anyway, and only on one port.
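
A rough sketch of that single-socket test, for anyone who wants to reproduce it. The assumptions here: a local asyncio echo server on 127.0.0.1 stands in for the Teensy and echoes whole lines including the newline, and the names echo_line, send_many and demo are made up for this example.

```python
import asyncio
from timeit import default_timer as timer

async def echo_line(reader, writer):
    # Local stand-in for the Teensy echo server (assumption: echoes whole lines).
    while True:
        line = await reader.readline()
        if not line:                      # client closed the connection
            break
        writer.write(line)
        await writer.drain()
    writer.close()

async def send_many(host, port, count, message=b"TESTDATA\n"):
    # Send `count` messages over ONE connection.
    # Returns (number of correct replies, slowest round trip in seconds).
    reader, writer = await asyncio.open_connection(host, port)
    replies = 0
    max_rtt = 0.0
    for _ in range(count):
        start = timer()
        writer.write(message)
        await writer.drain()
        reply = await asyncio.wait_for(reader.readline(), timeout=5)
        if reply != message:              # wrong or missing echo: stop counting
            break
        replies += 1
        max_rtt = max(max_rtt, timer() - start)
    writer.close()
    await writer.wait_closed()
    return replies, max_rtt

async def demo():
    # Port 0 lets the OS pick a free port for the local stand-in server.
    server = await asyncio.start_server(echo_line, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await send_many("127.0.0.1", port, 1000)

replies, max_rtt = asyncio.run(demo())
print(f"{replies} replies, slowest round trip {max_rtt:.4f} s")
```

To run it against the real device, send_many would be called with '192.168.1.177' and 5025 instead of the local server, keeping in mind that the Teensy sketch echoes without the trailing newline, so readline() would need to be swapped for a fixed-length read.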
 