
Thread: Possible bug with Wiz860io

  1. #26
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    Quote Originally Posted by manitou View Post
    This read sequence seems to work for me
    Code:
        while (len > 0 ) {
          int read_bytes;
          if ((read_bytes = client.available()) > 0) {
    
            if (read_bytes > buf_size)  read_bytes = buf_size;
            client.read(buf, read_bytes);
            len -= read_bytes;
            command_read += read_bytes;
          }
        }
    After testing this a bit last night it looks like this isn't a complete solution.

    When I let the test run indefinitely, it ends up failing in the same way after 30-45 minutes. So this is significantly better, but still not quite there. I'm going to try to get another teensy/wiz850io thingy set up so I can run tests faster.

    I suspect the change in socket.cpp still needs to be made to make this completely reliable.

  2. #27
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,588
    Dang. I confirmed my while() loop hung after 30 minutes. I just did a test with your original sketch and the socket.cpp hack ... it eventually hung, albeit the monitor screen had gone blank (??). The client was definitely stopped, but not aborted. I checked tcpdump and there was no streaming traffic, but there was a packet (a timeout resend? I failed to save the output :-( ). Maybe I disturbed the monitor output somehow. I'll retest ...

    EDIT: the sketch is still running (absence of proof is not proof of absence)

    In socket.cpp Paul has added state to cache the recv data count from the SPI fetch that determines available(), so that may explain why we see bogus (> 16K) values?
    state[s].RX_RSR -= ret;
    This unsigned calculation could flip the available value into the 50K range. A race condition somewhere???
    Last edited by manitou; 10-25-2017 at 07:13 PM.

  3. #28
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    My server code isn't really safe when dealing with a read error. It closes the connection if it detects a problem, but there are still cases where it can get into a bad state and lock up. That's a real problem with my code that I plan on fixing in the future, but for now I want the code as simple as possible to debug the ethernet stuff.

    Once you hit a problem it's probably best to restart the teensy and the client, otherwise you could be seeing remnants of bad state from the previous failure.

    With very few exceptions, every time I've seen the teensy's streaming reads get out of sync it successfully closes the connection and the client disconnects quickly.

  4. #29
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    Oh, you said other things!

    Hmm, I've never seen the teensy's monitor screen go blank. Perhaps a cat was sitting on the keyboard? Also, if the sketch stops reading new bytes from the TCP stream the sender will stall out from lack of acks.

    Your theory about caching values sounds plausible. You know the old joke, right? "There are 2 hard problems in computer science: naming things, cache invalidation, and off-by-one errors". A race condition between the firmware and the caching behavior in the ethernet module is something that could be very hard to find and only show up in specific situations (like the bursting traffic I want to send/receive).

  5. #30
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,588
    With Paul's patch to socket.cpp, the sketch ran all night long.

    Observation:
    Without the patch and the original sketch (but without client.stop()), here's another strange tcpdump snippet

    Code:
    ...
    07:05:59.898447 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 23557, win 1028, length 0
    07:05:59.898463 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 23557:24581, ack 1, win 29200, length 1024
    07:05:59.898469 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 24581:24585, ack 1, win 29200, length 4
    07:05:59.898835 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 24585, win 1024, length 0
    07:05:59.899231 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 24585, win 2048, length 0
    07:05:59.910970 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [.], seq 24585:25611, ack 1, win 29200, length 1026
    07:06:00.111467 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [P.], ack 25611, win 60134, length 0
    07:06:00.111513 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [P.], seq 25611:26637, ack 1, win 29200, length 1026
    ...
    07:28:24.684367 IP 192.168.1.4.58509 > 192.168.1.148.7890: Flags [.], ack 1, win 29200, length 0
    07:28:24.684459 IP 192.168.1.148.7890 > 192.168.1.4.58509: Flags [.], ack 1, win 0, length 0
    ...
    The teensy (192.168.1.148) is advertising its window of 1024 or 2048 (as expected), but then it advertises a window of 60134! So it's like the teensy has managed to coerce the wiznet window to something outrageous. The final state is the linux host probing every 2 minutes, hoping to get a non-zero window from the teensy.
    Last edited by manitou; 10-25-2017 at 11:30 AM.

  6. #31
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    That's super-strange, but probably just a side-effect of the ethernet support code getting out of sync with what's happening in the wiznet.

    I started 2 tests (I got a 2nd module soldered together) at around 7pm PST last night. Both had Paul's suggested change to socket.cpp. One is running my original teensy code, and the other is running manitou's suggested change.

    As of a few minutes ago they are both still running without problems (14 hours without error), which suggests that Paul is on the right track and that manitou's suggested change isn't necessary (although I do like it).

    That means I can move on to implementing the rest of the OPC server / led controller, which brings me one step closer to building a bigger cube of LEDs

    [Attached image: 20232016_10108638344864578_7280565540077088275_o.jpg]

    EDIT:
    Thank you two so much. I've been using the teensy for a few years and it's (imho) the best arduino around. I was a little worried that this forum post wouldn't be received well (that I'd inadvertently violate a rule, or that it'd be a dumb question or the wrong topic), but both of you have been wonderful.
    Last edited by cconstantine; 10-25-2017 at 05:00 PM.

  7. #32
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,356
    I'm working on this again today.

    Quote Originally Posted by manitou View Post
    but then it advertises a window of 60134! So it's like the teensy has managed to coerce the wiznet window to something outrageous.
    Yes, but it's really looking like a bug in my code mismanaging the RX_RSR cached state. The problem later manifests as the Wiznet chip window getting messed up, but only because it was given an incorrect Sock_RECV based on the wrong state.

    I hope to have a fix soon, which allows this to work with the cached receive state. The caching really does eliminate a *lot* of redundant SPI overhead talking to the Wiznet chip. The workaround throws most of that away, so I really want to find a proper fix.

  8. #33
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,356
    Please give this fix a try.
    Attached Files

  9. #34
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,588
    Quote Originally Posted by PaulStoffregen View Post
    Please give this fix a try.
    OK so far. I've run some UDP and TCP benchmarks at various SPI clock rates on wiz820io and T3.2, all seem good. The original sketch in this thread is still running ....

  10. #35
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,356

  11. #36
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    Thanks! I'm testing it out now. So far so good.

  12. #37
    Junior Member
    Join Date
    Oct 2017
    Location
    Oakland, CA
    Posts
    18
    I woke up this morning to find 1 of the 2 tests had failed after 300+ minutes, with the other still going strong. I restarted the failed test before heading into work. It could have been a fluke. If there is a glitch roughly once a day and the disconnect/reconnect is quick, I think it'll be fine.

    Both of them are running an updated version of the server code that includes some more connection safety stuff, zeroconf, and FastLED to send the color values out to LEDs. To compile the latest you'll need the EthernetBonjour ( https://github.com/TrippyLighting/EthernetBonjour ) and FastLED libraries. If you want to run it you'll have to update the client too (it's a very minor change).

  13. #38
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,356
    I did some more testing here and discovered another long-standing bug where connections closed by the remote host aren't always properly closed and reused.

    Committed a fix.

    https://github.com/PaulStoffregen/Et...f40d86a5a973f9
