Optimization Fast/Faster/Fastest with/without LTO?

Status
Not open for further replies.
Back on the trail of vanishing sockets

I luckily had included a Telnet connection
<snip>
Code:
      TelnetServer.printf("Socket(%d) SnSr = %s SnMR = %s\r\n", i, socStatus, SnMr[W5100.readSnMR(i)]);
Do you have the rest of your telnet server setup code?

I'm back on the trail of this. Had to set aside the project Ethernet code to work on other things, like mfg test fixture hardware, but now I need to make Ethernet start working. Thanks!
 
Do you have the rest of your telnet server setup code?

Yes.. I do have the rest of my Telnet Server code but it would be confusing to post as it is intertwined with the rest of my specific application code.

I believe you would be better served by starting with a simple, bare-bones Telnet Server example, such as the included example (examples - ethernet - chatserver), a slightly more complex version with command decode capability (https://gist.github.com/atomsfat/1813823), or any other simple, focused Telnet Server example that Google turns up.

HOWEVER, the original point of my reply to your situation was not to suggest that a Telnet Server was needed to investigate your periodic loss of Ethernet functionality but was to share that my difficulties were easily understood by simply watching the Ethernet socket status(es) as time progressed.

This socket status monitor could and probably should be used in a manner that does not depend on Ethernet resources from your application. After all, if your Ethernet subsystem locks up like you report then a Telnet based monitor tool would be locked up also......

Since your hardware is local, forget about a Telnet Server and simply use my debug monitor print statement somewhere in your main loop to periodically print your Ethernet socket status to a local serial port, like this:

Serial.printf("Socket(%d) SnSr = %s SnMR = %s\r\n", i, socStatus, SnMr[W5100.readSnMR(i)]);

Of course, you will need to add an appropriate delay between status prints and a for loop to examine all eight of the Ethernet sockets.

(I know the stock library is limited to four sockets but the hardware supports eight and I needed all eight in my application. Printing the status of unused sockets is not a fatal flaw of the technique.)
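As a rough sketch of the timing part, the helper below decides when the next status print is due without blocking the main loop. The name reportDue and its parameters are made up for illustration and are not part of any library; on the Teensy the time value would come from millis().

```cpp
#include <cstdint>

// Hypothetical helper: returns true once every `periodMs` milliseconds and
// records the time of that report in `lastMs`. The unsigned subtraction keeps
// the comparison correct when the 32-bit millisecond counter wraps around.
bool reportDue(uint32_t nowMs, uint32_t &lastMs, uint32_t periodMs) {
    if (static_cast<uint32_t>(nowMs - lastMs) < periodMs) {
        return false;  // not enough time has passed yet
    }
    lastMs = nowMs;    // remember when this report happened
    return true;
}
```

In a sketch, something like `if (reportDue(millis(), lastReport, 5000))` around the for loop over the eight sockets would print the status every five seconds while the rest of loop() keeps running.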

I'd bet you a doughnut that you will soon see that your Ethernet library, SPI, and related code are all working pretty much as expected and that the root problem is that you simply run out of free sockets because things happen and all of the available sockets get "stuck".

My solution started with identifying this as the problem: in my case, all the sockets were stuck in 'CLOSE_WAIT' status and therefore not available for reuse, even though their users had abandoned them.
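To spot this condition without reading through a wall of status lines, the raw status bytes can be tallied. This is only a sketch: the struct and function names are invented for illustration, and the two status values (0x00 for CLOSED, 0x1C for CLOSE_WAIT) are taken from the WIZnet W5100/W5500 datasheets.

```cpp
#include <cstddef>
#include <cstdint>

// Status codes as listed in the WIZnet W5100/W5500 datasheets.
constexpr uint8_t SNSR_CLOSED     = 0x00;
constexpr uint8_t SNSR_CLOSE_WAIT = 0x1C;

// Hypothetical tally of socket states, for illustration only.
struct SocketSummary {
    int closed;     // free for reuse
    int closeWait;  // peer has closed, local side has not
    int other;      // listening, established, closing, ...
};

// Count how many of the `count` status bytes fall into each bucket.
SocketSummary summarize(const uint8_t *status, size_t count) {
    SocketSummary s = {0, 0, 0};
    for (size_t i = 0; i < count; i++) {
        if (status[i] == SNSR_CLOSED) {
            s.closed++;
        } else if (status[i] == SNSR_CLOSE_WAIT) {
            s.closeWait++;
        } else {
            s.other++;
        }
    }
    return s;
}

// True when every socket is sitting in CLOSE_WAIT, the situation described above.
bool allStuckInCloseWait(const SocketSummary &s) {
    return s.closed == 0 && s.other == 0 && s.closeWait > 0;
}
```

On the Teensy, the status array would be filled from W5100.readSnSR(s) for each socket before calling summarize().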

See the test for CLOSE_WAIT in Ethernet.cpp, noting especially the //TODO inside the #if 0 block, as isolated below:

Code:
uint8_t socketBegin(uint8_t protocol, uint16_t port)
{
	uint8_t s, status[MAX_SOCK_NUM];

	//Serial.printf("W5000socket begin, protocol=%d, port=%d\n", protocol, port);
	SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
	// look at all the hardware sockets, use any that are closed (unused)
	for (s=0; s < MAX_SOCK_NUM; s++) {
		status[s] = W5100.readSnSR(s);
		if (status[s] == SnSR::CLOSED) goto makesocket;
	}
	//Serial.printf("W5000socket step2\n");
	// as a last resort, forcibly close any already closing
	for (s=0; s < MAX_SOCK_NUM; s++) {
		uint8_t stat = status[s];
		if (stat == SnSR::LAST_ACK) goto closemakesocket;
		if (stat == SnSR::TIME_WAIT) goto closemakesocket;
		if (stat == SnSR::FIN_WAIT) goto closemakesocket;
		if (stat == SnSR::CLOSING) goto closemakesocket;
	}
#if 0
	Serial.printf("W5000socket step3\n");
	// next, use any that are effectively closed
	for (s=0; s < MAX_SOCK_NUM; s++) {
		uint8_t stat = status[s];
		// TODO: this also needs to check if no more data
		if (stat == SnSR::CLOSE_WAIT) goto closemakesocket;
	}
#endif

After you identify the problem and if it is similar to what I encountered, you could try a quick solution by activating this disabled code (change #if 0 to #if 1) and see if you can reclaim the "stuck" sockets.
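To make the effect of that switch concrete, here is a simplified model of the selection order in the excerpt above. It is not the library code: pickSocket and reclaimCloseWait are invented names, the function only chooses an index, and the status values are the ones listed in the WIZnet datasheets. The real function also forcibly closes a reclaimed socket, and the TODO in the excerpt notes that the check for remaining data is still missing, so reclaiming a CLOSE_WAIT socket may discard data that was never read.

```cpp
#include <cstddef>
#include <cstdint>

namespace snsr {  // socket status values from the WIZnet W5100/W5500 datasheets
constexpr uint8_t CLOSED     = 0x00;
constexpr uint8_t FIN_WAIT   = 0x18;
constexpr uint8_t CLOSING    = 0x1A;
constexpr uint8_t TIME_WAIT  = 0x1B;
constexpr uint8_t CLOSE_WAIT = 0x1C;
constexpr uint8_t LAST_ACK   = 0x1D;
}

// Hypothetical model: returns the index of the socket that would be used,
// or -1 if none qualifies. `reclaimCloseWait` plays the role of #if 1.
int pickSocket(const uint8_t *status, size_t count, bool reclaimCloseWait) {
    // Step 1: prefer any socket that is already closed.
    for (size_t s = 0; s < count; s++) {
        if (status[s] == snsr::CLOSED) return static_cast<int>(s);
    }
    // Step 2: otherwise take one that is already on its way to closing.
    for (size_t s = 0; s < count; s++) {
        uint8_t stat = status[s];
        if (stat == snsr::LAST_ACK || stat == snsr::TIME_WAIT ||
            stat == snsr::FIN_WAIT || stat == snsr::CLOSING) {
            return static_cast<int>(s);
        }
    }
    // Step 3: only when enabled, reclaim a socket stuck in CLOSE_WAIT.
    if (reclaimCloseWait) {
        for (size_t s = 0; s < count; s++) {
            if (status[s] == snsr::CLOSE_WAIT) return static_cast<int>(s);
        }
    }
    return -1;  // nothing available
}
```

With every socket in CLOSE_WAIT, the model finds nothing while step 3 is disabled and returns socket 0 once it is enabled, which matches the "stuck" behavior described above.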

-------------------

A more complete snippet of the monitor print code that I used is included below.

This would need to be periodically called from somewhere in your main loop..

Code:
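static const char *SnMr[] = {"Close", "TCP", "UDP", "IPRAW", "MACRAW"};
    char socStatus[7];

    for (uint8_t i = 0; i < 8; i++) {
      uint8_t status = socketStatus(i);   // read the status once per socket
      switch (status) {
        case 0x00:
          snprintf(socStatus, sizeof(socStatus), "Closed");
          break;
        case 0x14:
          snprintf(socStatus, sizeof(socStatus), "Listen");
          break;
        case 0x17:
          snprintf(socStatus, sizeof(socStatus), "Establ");
          break;
        default:
          // any other state (e.g. 0x1C = CLOSE_WAIT) is printed as raw hex
          snprintf(socStatus, sizeof(socStatus), "0x%02x  ", status);
          break;
      }

      // The low four bits of SnMR select the protocol; the upper bits are
      // option flags, so mask them off before indexing the name table.
      uint8_t mode = W5100.readSnMR(i) & 0x0F;
      Serial.printf("Socket(%d) SnSr = %s SnMR = %s\r\n", i, socStatus, mode < 5 ? SnMr[mode] : "?");
    }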
 
One other very odd thing I saw again yesterday was the WIZ850io getting stuck in a mode with both LEDs apparently on solid. I can reset the Teensy, which has code to drive the WIZ850io reset line low for 1 msec (the W5500 datasheet asks for 500 usec min Reset(L)). During this startup period after the HW reset, the LEDs go out, but then when my TempServer app starts both LEDs come on solid again and the app can't get any data out of the WIZ850io.

The only cure is to pull the power. Then TempServer app starts up again, only now it can init the WIZ850io and carry on, until it fails with socket issues. Tempserver is a minimal change to the provided demo Ethernet>Webserver which I should go back to. Tempserver/Webserver makes a very different use of sockets than say NTP_clock_set which runs for days.

So somehow it's possible to get the WIZ850io into a state where even a HW reset input won't recover it, it needs a POR. I'm baffled and concerned what this could be. I've seen this "both LEDs on, can't recover without power cycle" twice now, the other time running DhcpStressTest-T3 which normally runs for days, but does have a fair amount of exception recovery built in (and apparently I can't even be consistent about program naming, sorry about that). When it failed this time the IP address it reportedly got back was 0.0.0.0 but not according to my router, so it seems to be a WIZ850io communication failure, not a DHCP server failure.

In EthernetClass socket.cpp, line 42 and following, I normally (e.g. with NTP_clock_set) see only socket begin (step 1) execute. I have Step 3 uncommented. When things fail (TempServer), there is a blast of repeated attempts at Step 3, and the response to a client seems to time out because client.println() is unable to get any data out in a response from Teensy. Teensy doesn't hang; it still outputs temperature to an attached ILI9341 display.

If the Arduino Keepers want things to be "good for beginners", a starting point would be to have superb documentation of libraries so that you could learn a lot by reading the code. There's not a single function comment explaining socketBegin(). This file has 12 "gotos" in it... goto generally not a Good Thing. There are lines such as #80:
delayMicroseconds(250); // TODO: is this needed??
with no explanation of why it is there or what the issues may be: nothing to help anyone else pursue this question to a useful end.
OK, end of mini-rant. For now.

I have a lot of the socket.cpp debug printf uncommented (if only we had a real debugger!) so I can see where the trouble may lie. I'll add output of socket use. My goal is to get the code I need (interface to another machine) working reliably and figure out what is causing socket issues under specific circumstances, and fix or patch around it.

If anyone wants to see the mangled version of Ethernet I've forked, it's here.

This probably is not the best thread for discussion of Ethernet/WIZ850io socket issues...
 
So somehow it's possible to get the WIZ850io into a state where even a HW reset input won't recover it, it needs a POR. I'm baffled and concerned what this could be.

I'm concerned too. Would you be willing to help me set up the test here?

I looked at your TempServer code. It seems to include 7 files. 5 look like libraries provided by Teensyduino. 2 look familiar, but aren't files from Teensyduino. Can you tell me exactly which libs you're really using?

On the first 5, are they actually the libs from Teensyduino? If so, which version? I see you mentioned "the mangled version of Ethernet I've forked", so I also need you to be crystal clear about whether this lockup problem is happening with the Ethernet lib and other code from Teensyduino, or with modified versions.

For hardware, this looks like I need the WIZ850io, the 2.8 inch TFT, and a TMP102 sensor. Would this one be the right sensor to buy? Is anything else needed?

Which browser are you using to access TempServer? Is testing through a local ethernet LAN ok, or does this problem only occur if the communication goes through higher latency links?
 
Duplicating Tempserver: libraries

Yes, more than willing. Was out of town several days so just now able to reply. So that I can add debugging to the libs I am using, and also hold them in a known state across our development team even while trying multiple Arduino or Teensyduino releases, I fork them from their source github repos. Here are the libs in use:
#include <ILI9341_t3.h> // stock/current from your repo
#include <font_Arial.h> // from ILI9341_t3, stock/current
#include <XPT2046_Touchscreen.h> // stock/current from your repo
#include <SPI.h> // stock from your repo but my fork is behind so I will update that today.
#include <Ethernet.h> // fork from your repo but with added socket status tracking and printfs added mostly to socket.cpp
#include <TeensyID.h> // from https://github.com/systronix/TeensyID, ahead of sstaub but just examples, nothing substantive
#include <Systronix_TMP102.h> // from https://github.com/systronix/Systronix_TMP102, ours supports extended range

Yes, the SparkFun TMP102 breakout is fine; I have used several. We have put the TMP102 sensor on our own custom board. The extended range (to +125C) was unique for a direct-to-digital sensor at its price point ($.50 @1000), but accuracy is only +/- 2 deg C. Now we like the newer but slightly more expensive TMP275: $1 @1000, same range, much better +/- 0.5 deg C. That accuracy difference may not sound like much, but +/- 2C is a total uncertainty band of 4 deg C, or 7.2 deg F! Not so good. The TMP275's band is 1 deg C, or 1.8 deg F. There are more subtleties in the data sheet but that's the big picture. For this test, accuracy is irrelevant of course.

To access tempserver, I mainly use Chrome and FF on a PC and Chrome on Android. I access both on our local LAN and via DDNS over the Internet.

Today I'm going to add the output of the requesting IP address so I can see who that is and block them if they are bots.

I also plan to run Wireshark and see why the sockets get stuck in "Close Wait" mode.

As noted in my other comments I've seen this hang also in my DhcpStressTest, which does not use any temp sensor or the ILI9341.

Also using Arduino 1.8.2 and TD 1.36, with TyQt 0.8.0

Continuing my pursuit of the sockets issue here: https://forum.pjrc.com/threads/43761-Ethernet-library-socket-issues?p=149296#post149296
 
Thanks. I've ordered the Sparkfun TMP102, and I've put this on my list of stuff to test. Realistically, it's likely to be at least a few weeks until I can look at this one... but it's now on the list so I won't forget.
 