Possible bug with Wiz860io

This read sequence seems to work for me
Code:
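    // Keep reading until the whole command (len bytes) has been consumed.
    while (len > 0) {
      int read_bytes;
      if ((read_bytes = client.available()) > 0) {
        // Never read more than the buffer can hold in one call.
        if (read_bytes > buf_size) read_bytes = buf_size;
        client.read(buf, read_bytes);
        len -= read_bytes;
        command_read += read_bytes;
      }
    }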

After testing this a bit last night it looks like this isn't a complete solution.

When I let the test run indefinitely it ends up failing in the same way after 30-45 minutes. So, this is significantly better but still not quite there. I'm going to try to get another teensy/wiz850io thingy set up so I can run tests faster.

I suspect the change in socket.cpp still needs to be made to make this completely reliable.
 
Dang. I confirmed my while() hung after 30 minutes. I just did a test with your original sketch and the socket.cpp hack. It eventually hung too, although the monitor screen had gone blank (??). The client was definitely stopped, but not aborted. I checked tcpdump and there was no streaming traffic, but there was a packet (a timeout resend? I failed to save the output :-( ). Maybe I disturbed the monitor output somehow. I'll retest ...

EDIT: the sketch is still running (absence of proof is not proof of absence)

In socket.cpp Paul has added state to cache the recv data count from the SPI fetch that determines available(), so that may explain why we see bogus (> 16K) values?
state.RX_RSR -= ret;
This unsigned calculation could flip the available value into the 50K range. A race condition somewhere???
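
To illustrate the wraparound theory, here is a minimal standalone sketch (plain C++, not the actual socket.cpp code). It assumes the cached count is a uint16_t, since the Wiznet RX_RSR register is 16 bits wide; the struct and function names are made up for the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for the cached receive count discussed above.
// The real field and its type in socket.cpp may differ.
struct CachedState {
    std::uint16_t RX_RSR;
};

// Unchecked update, same shape as `state.RX_RSR -= ret;`.
inline void consume_unchecked(CachedState &state, std::uint16_t ret) {
    state.RX_RSR -= ret;  // wraps modulo 65536 when ret > RX_RSR
}

// Defensive variant: clamp at zero instead of wrapping.
inline void consume_clamped(CachedState &state, std::uint16_t ret) {
    state.RX_RSR = (ret > state.RX_RSR)
                       ? 0
                       : static_cast<std::uint16_t>(state.RX_RSR - ret);
}
```

With a cached value of 1024 and ret of 16384, the unchecked version leaves 50176 in the cache, which is in the same "50K" ballpark as the bogus values. Clamping only hides the symptom; if ret can ever exceed the cached count, the cache is already out of sync with the chip.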
 
My server code isn't really safe when dealing with a read error. It closes the connection if it detects a problem, but there are still cases where it can get into a bad state and lock up. That's a real problem with my code that I plan on fixing in the future, but for now I want the code as simple as possible to debug the ethernet stuff.

Once you hit a problem it's probably best to restart the teensy and the client, otherwise you could be seeing remnants of bad state from the previous failure.

With very few exceptions, every time I've seen the teensy's streaming reads get out of sync it successfully closes the connection and the client disconnects quickly.
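
For what it's worth, one way a read loop could avoid locking up is to give up on a read error, a dropped connection, or a timeout. The sketch below is only an assumption about how that might look, not the server code from this thread. It is written as a template over any client type with available(), read(buf, n) and connected() (the shape of the Arduino Client interface), and it uses std::chrono so it can be tried on a desktop; on the teensy you would probably use millis() instead. The name read_exact and the timeout parameter are made up.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Reads `len` bytes from `client` in chunks of at most `buf_size`, or gives up.
// Returns false on read error, on disconnect, or when no data arrives
// within `timeout`.
template <typename Client>
bool read_exact(Client &client, std::uint8_t *buf, std::size_t buf_size,
                std::size_t len, std::chrono::milliseconds timeout) {
    using clock = std::chrono::steady_clock;
    auto deadline = clock::now() + timeout;
    while (len > 0) {
        int avail = client.available();
        if (avail > 0) {
            std::size_t want = static_cast<std::size_t>(avail);
            if (want > buf_size) want = buf_size;
            if (want > len) want = len;          // never read past this command
            int got = client.read(buf, want);
            if (got <= 0) return false;          // read error
            len -= static_cast<std::size_t>(got);
            deadline = clock::now() + timeout;   // progress resets the timer
        } else if (!client.connected() || clock::now() >= deadline) {
            return false;                        // peer gone or stalled
        }
    }
    return true;
}
```

The caller can then close the connection when this returns false instead of spinning forever inside the loop.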
 
Oh, you said other things!

Hmm, I've never seen the teensy's monitor screen go blank. Perhaps a cat was sitting on the keyboard? Also, if the sketch stops reading new bytes from the TCP stream the sender will stall out from lack of acks.

Your theory about caching values sounds plausible. You know the old joke right? "There are 2 hard problems in computer science; Naming things, cache invalidation, and off-by-one errors". A race condition between the firmware and the caching behavior in the ethernet module could be very hard to find and might only show up in specific situations (like the bursty traffic I want to send/receive).
 
With Paul's patch to socket.cpp, the sketch ran all night long. :)

Observation:
Without the patch, and with the original sketch (but without client.stop()), here's another strange tcpdump snippet:

Code:
...
07:05:59.898447 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 23557, win 1028, length 0
07:05:59.898463 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 23557:24581, ack 1, win 29200, length 1024
07:05:59.898469 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 24581:24585, ack 1, win 29200, length 4
07:05:59.898835 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 24585, win 1024, length 0
07:05:59.899231 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 24585, win 2048, length 0
07:05:59.910970 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [.], seq 24585:25611, ack 1, win 29200, length 1026
07:06:00.111467 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [P.], ack 25611, win 60134, length 0
07:06:00.111513 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 25611:26637, ack 1, win 29200, length 1026
...
07:28:24.684367 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [.], ack 1, win 29200, length 0
07:28:24.684459 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 1, win 0, length 0
...
The teensy (192.168.1.148) is advertising its window of 1024 or 2048 (as expected), but then it advertises a window of 60134! So it's like the teensy has managed to coerce the wiznet window to something outrageous. The final state is the linux host probing every 2 minutes, hoping to get a non-zero window from the teensy.
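
If anyone wants to scan a long capture for this, a small filter along these lines could flag the odd lines. This is just a sketch: it assumes the tcpdump text format shown above, and the 2048 limit is an assumption taken from the "1024 or 2048 (as expected)" observation. The function names are made up.

```cpp
#include <cstdlib>
#include <string>

// Extracts the advertised window from one tcpdump text line, or -1 if absent.
inline long parse_window(const std::string &line) {
    const std::string key = " win ";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    return std::strtol(line.c_str() + pos + key.size(), nullptr, 10);
}

// True when `line` was sent by host `src` and advertises more than `max_window`.
// tcpdump prints "IP <src>.<port> > <dst>.<port>:", so only the sender
// directly follows "IP ".
inline bool is_suspicious(const std::string &line, const std::string &src,
                          long max_window) {
    if (line.find("IP " + src + ".") == std::string::npos) return false;
    return parse_window(line) > max_window;
}
```

Calling is_suspicious(line, "192.168.1.148", 2048) on each line of the capture would pick out the 60134 line while ignoring the Linux host's normal 29200 window.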
 
That's super-strange, but probably just a side-effect of the ethernet support code getting out of sync with what's happening in the wiznet.

I started 2 tests (I got a 2nd module soldered together) at around 7pm PST last night. Both had Paul's suggested change to socket.cpp. One is running my original teensy code, and the other is running manitou's suggested change.

As of a few minutes ago they are both still running without problem (14 hours without error), which suggests that Paul is on the right track and that manitou's suggested change isn't necessary (although I do like it).

That means I can move on to implementing the rest of the OPC server / led controller, which brings me one step closer to building a bigger cube of LEDs :)

[Attached image: 20232016_10108638344864578_7280565540077088275_o.jpg]

EDIT:
Thank you two so much. I've been using the teensy for a few years and it's (imho) the best arduino around. I was a little worried that this forum post wouldn't be received well (inadvertently violate a rule, be a dumb question or the wrong topic), but both of you have been wonderful :) .
 
I'm working on this again today.

"but then it advertises a window of 60134! So it's like the teensy has managed to coerce the wiznet window to something outrageous."

Yes, but it's really looking like a bug in my code mismanaging the RX_RSR cached state. The problem later manifests as the Wiznet chip window getting messed up, but only because it was given an incorrect Sock_RECV based on the wrong state.

I hope to have a fix soon, which allows this to work with the cached receive state. The caching really does eliminate a *lot* of redundant SPI overhead talking to the Wiznet chip. The workaround throws most of that away, so I really want to find a proper fix.
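
For readers following along, here is a toy model of why the cached receive count saves SPI traffic. It is not the Ethernet library's code, only a sketch with made-up names (FakeChip, CachedSocket) and a simulated register read in place of real SPI. It also ignores new data arriving while the cache is non-zero.

```cpp
#include <cstdint>

// Stand-in for the chip: counts how often the (slow) register read happens.
struct FakeChip {
    std::uint16_t rx_rsr = 0;   // bytes waiting in the chip's receive buffer
    unsigned spi_reads = 0;     // number of simulated SPI register reads
    std::uint16_t read_rx_rsr() { ++spi_reads; return rx_rsr; }
};

struct CachedSocket {
    FakeChip &chip;
    std::uint16_t cached = 0;   // last known count minus what was consumed

    explicit CachedSocket(FakeChip &c) : chip(c) {}

    // Only go to the chip when the cache says nothing is left.
    std::uint16_t available() {
        if (cached == 0) cached = chip.read_rx_rsr();
        return cached;
    }

    // Consume up to n bytes; never subtract more than the cache holds.
    std::uint16_t recv(std::uint16_t n) {
        std::uint16_t avail = available();
        if (n > avail) n = avail;
        cached = static_cast<std::uint16_t>(cached - n);
        chip.rx_rsr = static_cast<std::uint16_t>(chip.rx_rsr - n);
        return n;
    }
};
```

In this model, draining 2048 buffered bytes in 256-byte chunks costs one register read instead of one per available() call. The hard part, as discussed above, is keeping the cached value consistent with what the chip actually holds.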
 
I woke up this morning to find 1 of the 2 tests failed after 300+ minutes, and the other still going strong. I restarted the failed test before heading into work. It could have been a fluke, and if there is a glitch roughly once a day and the disconnect/reconnect is quick I think it'll be fine.

Both of them are running an updated version of the server code that includes some more connection safety stuff, zeroconf, and FastLED to send the color values out to LEDs. To compile the latest you'll need the EthernetBonjour ( https://github.com/TrippyLighting/EthernetBonjour ) and FastLED libraries. If you want to run it you'll have to update the client too (it's a very minor change).
 