So here is what I observe: after some hours (always less than 24), the server stops responding to requests. I can trace the request up to the router, which passes it to the WIZnet node, and I can see SPI traffic. Both MOSI and MISO are still alive, but all HTTP requests are ignored. I have a Total Phase Beagle SPI sniffer; I just wish it could decode the meaning of the SPI data.
... so my interest in helping on this is keen and urgent.
I have no idea whether this is related, but I experienced similar behavior.
Luckily, I had included a Telnet connection that let me examine the status of all the WIZnet sockets. I found that, gradually, all of the available sockets would become "stuck". They were usually stuck in "Close Wait" status, and the library would not re-use a socket in "Close Wait". The library I was using had no timeout to forcibly close a socket stuck in "Close Wait" after some suitable period.
I could watch my free sockets get stuck one by one until finally the system became unresponsive to any request.
It required a reboot to free the "stuck" sockets.
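For reference, here is a minimal sketch of the kind of timeout that library lacked. It assumes you poll each socket's Sn_SR value periodically (for example with socketStatus(s)) and pass in a millis() timestamp. The CloseWaitWatchdog type, the 4-socket count, and the timeout policy are hypothetical, and actually closing the socket is left to the caller. It is plain C++ so the logic can be checked off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the WIZnet or Teensy Ethernet library.
constexpr uint8_t kCloseWait = 0x1C;  // W5100 SOCK_CLOSE_WAIT status code
constexpr size_t kNumSockets = 4;     // assumption: MAX_SOCK_NUM == 4

struct CloseWaitWatchdog {
  uint32_t firstSeen[kNumSockets] = {};  // ms timestamp, valid while tracking[s]
  bool tracking[kNumSockets] = {};

  // Call periodically with socket s's current Sn_SR value and the time in ms.
  // Returns true once the socket has stayed in CLOSE_WAIT for at least
  // timeoutMs, meaning the caller may choose to force-close it.
  bool shouldForceClose(size_t s, uint8_t status, uint32_t nowMs, uint32_t timeoutMs) {
    assert(s < kNumSockets);
    if (status != kCloseWait) {
      tracking[s] = false;  // socket left CLOSE_WAIT: reset its timer
      return false;
    }
    if (!tracking[s]) {
      tracking[s] = true;   // first sighting in CLOSE_WAIT: start timing
      firstSeen[s] = nowMs;
      return false;
    }
    // Unsigned subtraction stays correct across the 32-bit millis() wrap.
    return (uint32_t)(nowMs - firstSeen[s]) >= timeoutMs;
  }
};
```

The timeout should be long enough that a normal close sequence has time to finish before the socket is reclaimed.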
Note that there is (or was) some provision for this case in the Teensy 1.6.12 Ethernet library. See socket.cpp, and note especially the #if 0 block that would release a socket in "Close Wait". Since this is a last-ditch attempt to allocate a socket when no others are free, it may be worth the risk of not having all data flushed. In my case, it certainly would have been OK.
A simple patch to socket.cpp in the Ethernet library that enables this option could serve as a test tool to discover whether a similar condition is causing your issue.
Code:
uint8_t socketBegin(uint8_t protocol, uint16_t port)
{
  uint8_t s, status[MAX_SOCK_NUM];
  //Serial.printf("W5000socket begin, protocol=%d, port=%d\n", protocol, port);
  SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
  // look at all the hardware sockets, use any that are closed (unused)
  for (s=0; s < MAX_SOCK_NUM; s++) {
    status[s] = W5100.readSnSR(s);
    if (status[s] == SnSR::CLOSED) goto makesocket;
  }
  //Serial.printf("W5000socket step2\n");
  // as a last resort, forcibly close any already closing
  for (s=0; s < MAX_SOCK_NUM; s++) {
    uint8_t stat = status[s];
    if (stat == SnSR::LAST_ACK) goto closemakesocket;
    if (stat == SnSR::TIME_WAIT) goto closemakesocket;
    if (stat == SnSR::FIN_WAIT) goto closemakesocket;
    if (stat == SnSR::CLOSING) goto closemakesocket;
  }
  // Disabled by default. Changing "#if 0" to "#if 1" is the test patch
  // described above: a socket in CLOSE_WAIT is reclaimed, at the risk of
  // discarding data that has not been read or flushed yet.
#if 0
  Serial.printf("W5000socket step3\n");
  // next, use any that are effectively closed
  for (s=0; s < MAX_SOCK_NUM; s++) {
    uint8_t stat = status[s];
    // TODO: this also needs to check if no more data
    if (stat == SnSR::CLOSE_WAIT) goto closemakesocket;
  }
#endif
  SPI.endTransaction();
  return MAX_SOCK_NUM; // all sockets are in use
  // ... (excerpt ends here; the makesocket: and closemakesocket: labels
  // and the function's closing brace are not shown)
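To see what the #if 0 switch changes, the selection order in the excerpt above can be modeled on a PC. This sketch reproduces only which socket gets chosen, not the SPI traffic or the actual close. pickSocket() and its allowCloseWait flag are hypothetical names, with allowCloseWait standing in for changing #if 0 to #if 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sn_SR status codes from the W5100 datasheet (same values as the SnSR class).
constexpr uint8_t kClosed    = 0x00;
constexpr uint8_t kFinWait   = 0x18;
constexpr uint8_t kClosing   = 0x1A;
constexpr uint8_t kTimeWait  = 0x1B;
constexpr uint8_t kCloseWait = 0x1C;
constexpr uint8_t kLastAck   = 0x1D;

// Returns the index of the socket to (re)use, or n if none qualifies,
// mirroring "return MAX_SOCK_NUM; // all sockets are in use".
size_t pickSocket(const uint8_t *status, size_t n, bool allowCloseWait) {
  assert(status != nullptr || n == 0);
  // Step 1: prefer a socket that is already closed.
  for (size_t s = 0; s < n; s++)
    if (status[s] == kClosed) return s;
  // Step 2: last resort, one that is already on its way to closing.
  for (size_t s = 0; s < n; s++) {
    uint8_t st = status[s];
    if (st == kLastAck || st == kTimeWait || st == kFinWait || st == kClosing)
      return s;
  }
  // Step 3 (the #if 0 block): reclaim a CLOSE_WAIT socket, possibly losing data.
  if (allowCloseWait)
    for (size_t s = 0; s < n; s++)
      if (status[s] == kCloseWait) return s;
  return n;
}
```

With every socket in ESTABLISHED, LISTEN or CLOSE_WAIT, the stock order reports "all sockets in use", which is consistent with the lock-up described above; allowing CLOSE_WAIT reclaims the first such socket instead.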
In any event, it may be enlightening to add a socket status monitor so you can watch your free-socket count rise and fall as the system responds to various requests.
The code snippet I used to watch this happen is:
Code:
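static const char *SnMr[] = {"Close", "TCP", "UDP", "IPRAW", "MACRAW"};
char socStatus[7];                             // 6 characters + terminating NUL
for (uint8_t i = 0; i < MAX_SOCK_NUM; i++) {  // only the sockets the library uses
  uint8_t sr = socketStatus(i);                // read Sn_SR once per socket
  switch (sr) {
    case 0x00:  // SnSR::CLOSED
      sprintf(socStatus, "Closed");
      break;
    case 0x14:  // SnSR::LISTEN
      sprintf(socStatus, "Listen");
      break;
    case 0x17:  // SnSR::ESTABLISHED
      sprintf(socStatus, "Establ");
      break;
    case 0x1c:  // SnSR::CLOSE_WAIT
      sprintf(socStatus, "ClWait");
      break;
    default:    // any other state: show the raw status code
      snprintf(socStatus, sizeof(socStatus), "0x%02x ", sr);
      break;
  }
  // The protocol is the low nibble of Sn_MR; the upper bits are option flags,
  // so mask them off and bounds-check before indexing the name table.
  uint8_t mr = W5100.readSnMR(i) & 0x0F;
  const char *proto = (mr < sizeof(SnMr) / sizeof(SnMr[0])) ? SnMr[mr] : "?";
  TelnetServer.printf("Socket(%d) SnSr = %s SnMR = %s\r\n", i, socStatus, proto);
}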
Using this simple monitoring tool, I also quickly discovered that under some conditions I could run out of available sockets with the default MAX_SOCK_NUM of four. This was a separate problem from the "stuck in Close Wait" issue. It was just nice to watch and verify that the system had the resources to "go the distance" and do what I needed it to do.
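To catch the run-out-of-sockets case without watching the table continuously, the monitor could also keep a low-water mark of free sockets. This is a sketch that assumes a socket counts as free only when its Sn_SR is CLOSED (0x00); SocketHeadroom is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts free sockets per sample and remembers the lowest count seen.
struct SocketHeadroom {
  size_t lowWater = SIZE_MAX;  // SIZE_MAX means "no sample taken yet"

  // status: one Sn_SR value per socket, n entries (e.g. MAX_SOCK_NUM).
  size_t sample(const uint8_t *status, size_t n) {
    assert(status != nullptr || n == 0);
    size_t freeCount = 0;
    for (size_t s = 0; s < n; s++)
      if (status[s] == 0x00) freeCount++;  // 0x00 = SOCK_CLOSED
    if (freeCount < lowWater) lowWater = freeCount;
    return freeCount;
  }
};
```

If lowWater reaches 0 during normal traffic, MAX_SOCK_NUM is likely too small for the workload, or sockets are leaking.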
As you can see, the above is only a snippet, since your setup is no doubt different. Perhaps you have a local serial port that can periodically print the socket status.
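For the serial-port variant, a common pattern is to print the socket table from loop() at a fixed interval based on millis(). A minimal, wraparound-safe interval check is sketched below; intervalElapsed() is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Returns true (and advances lastMs) once every periodMs, given a
// free-running 32-bit millisecond counter such as Arduino's millis().
// Unsigned subtraction keeps this correct when the counter wraps (~49.7 days).
bool intervalElapsed(uint32_t nowMs, uint32_t &lastMs, uint32_t periodMs) {
  assert(periodMs > 0);
  if ((uint32_t)(nowMs - lastMs) < periodMs) return false;
  lastMs = nowMs;
  return true;
}
```

On the Teensy this would be called as intervalElapsed(millis(), lastReport, 10000) inside loop(), with lastReport a static uint32_t, running the socket-status loop (printing to Serial) whenever it returns true.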
In my case, I was snowed out of the remote site and could only Telnet in over the network. Then I foolishly created a "test case" that used up the last socket, and the system stayed unresponsive until the snow melted this spring.