Possible bug with Wiz860io

Hello,

I'm trying to write an LED controller that receives OPC (http://openpixelcontrol.org/) over ethernet and I've run into a weird problem that I don't think is in my code (but certainty could be!).

My setup:
Desktop PC running ubunt 16.06, arduino 1.8.5, and teensyduino 1.40
Teensy 3.1 with WIZ820io & Micro SD Card Adaptor from the pjrc store and a WIZ850io.


Client code: https://github.com/cconstantine/wiz850io_poc
Server code: https://github.com/cconstantine/teensy_opc

When I run those together I get the following output very quickly:
My IP Address: 192.168.1.120
header (4):
0
0
30
0
channel: 0
command: 0
length : 12288
first byte : 0
command_read : 12292
available : 0
receive time :9

header (4):
0
0
30
0
channel: 0
command: 0
length : 12288
first byte : 1
command_read : 12292
available : 54268
receive time :9

header (4):
1
1
1
1
channel: 1
command: 1
length : 257

In the above, the first 2 message frames come in correctly, but it looks like the 3rd one starts in the middle of the last message frame. The previous frame's color bytes are all ones, and it appears to be leaking into the next frame's header.

If I set the desired fps on the client to something big like 120 (optional 2nd argument) OR if I set the 'fixup_delay' in the server to something like 8 the problem goes away. This makes me think it's a timing issue in the ethernet code.

Help?
 
I could be reading it wrong, but in your teensy client code, the while-read loop isn't advancing buf in
int read_bytes = client.read(buf, to_read);
It seems you would possibly over-write older data in buf ....
 
Thanks for the reply!

I think you're kind of right. By not advancing 'buf' I'm overriding the data I receive, but I did that intentionally for this code. Right now I'm just trying to reliably read the OPC message frames and that read-loop will eventually fill a much bigger buffer for the LED color values.

This code works if I saturate the network from the client which suggests the logic in it is correct, it's only when there's a pause between OPC message frames that it starts to break down.
 
You don't quite understand. You have at least two bugs in the while-read loop. The while-read loop correctly tries to accumulate len bytes of data, since TCP may split the original message into different size chunks. (TCP provides a "stream" of bytes, not records.) But your client.read(buf,to_read) will continue to read into the start of buf. That logic may avoid the 2nd bug, since if you properly read all 12,288 bytes into buf, your sketch would overflow the size of buf. The cumulative read would look something like

Code:
uint8_t * bufptr = buf;
while(len > 0) {
      unsigned int to_read = buf_size < len ? buf_size : len;
      
      /************** Inner loop delay *********************/
      delay(fixup_delay);
      /*****************************************************/

      int read_bytes = client.read(bufptr, to_read);
      if (read_bytes > 0) {
        len -= read_bytes;
        command_read += read_bytes;
        bufptr += read_bytes;
      }
}
but the logic above would overflow buf, so you need to add additional code to check that buf_size hasn't been exceeded AND to finish consuming (client.read()) the remaining bytes of len, so you are ready to receive the next header.

Your original while-read loop will consume the len bytes, but the the Serial.print of buf[0] will not necessarily be the first byte of the record/frame.

EDIT: The other curious thing is available : 54268 The wiznet chips typically have only a 2K TCP receive buffer (max could be configured to 16K), so I don't see how client.available() reports 54K ???
Maybe use wireshark or tcpdump to observe your TCP stream.

Are you powering the wiznet chip from separate 3v3 power supply and GND? I don't think the T3.1 has enough 3v3 power (wiz850io can draw 132 ma)
 
Last edited:
Hmm, I think I'm not explaining myself well enough, thanks for sticking with me on this. I understand that TCP provides a stream of bytes. The OPC records / messages are in that stream, and I'm trying to read them one message at a time. It's slightly tricky because the protocol doesn't have a delimiter, and the message size is dynamically encoded at the beginning of each message. Once I get this to work reliably I'll add the code to store the led color values in a meaningful way and send them off to the strips (probably using the FastLED library). For now, I'm happy to throw the color values away as long as I can follow the stream of OPC messages.

Your analysis of the code is correct, but bug 1 is there intentionally and does block bug 2. Because I'm not doing anything with the 'data' portion of the OPC message, I'm just allowing it to be overwritten. You are correct again with the 'first byte' not really being the first byte of the 'data' portion of the OPC message; it's really more like the first byte of the last chunk of it. After work I should be able to flesh that bit out to store the full thing in a single buffer like you said, but I doubt it'll fix the problem I'm experiencing. I've spent a full day trying to find a problem in that loop, and if it wasn't for the fact that saturating the NIC with data fixes the problem I'd be inclined to say that is where the problem exists.

I thought the available bytes at the end of the last successful read was interesting too. I didn't notice that it was so far out from what the ethernet chip could do. I wonder if the driver is getting into a bad state and thinking there is a lot more data available to read, and when I do the next header read I'm getting residual data from the last read from some internal buffer.

Oh wow, I hadn't considered that power could be the problem, but of course it could cause weird intermittent issues! I haven't added a 3.3v power supply outside adaptor board between the teensy and the NIC. If this is the problem then the WIZ820io & Micro SD Card Adaptor from the pjrc store (https://www.pjrc.com/store/wiz820_sd_adaptor.html) is no good for using ethernet. It looks like the adaptor board has an MCP1825 on it to provide the WIZ850 with up to 500ma of 3.3v power, so it's probably fine? I suppose my usb port could be under-volting... I'll measure that and maybe even cut the usb power supply for the teensy and use a bigger external 5v supply.

I haven't used wire-shark to inspect the network. Because the only thing that's changing between working and not-working is the timing I hadn't thought to check that. I'm not very good at wire-shark, but I'll give it a try too.
 
Any chance you could give the client code a conventional makefile?

Apparently my copy of cmake is too old.

Code:
CMake Error at CMakeLists.txt:2 (cmake_minimum_required):
  CMake 3.0 or higher is required.  You are running version 2.8.12.2

Sad reality is I just don't have the time to fiddle, just to investigate this issue.
 
Hmm, yeah. I really like cmake, and thought the client code might get more complicated.

In the meantime I've bumped the version requirement down (I'm not using any 3.0 features) so you should be good to go.
 
Makefile added, and it's 100% reasonable for you to ask for the ability to reproduce without putting any effort in. Let me know if there's more I can do.
 
Last edited:
Oh wow, I hadn't considered that power could be the problem, but of course it could cause weird intermittent issues! I haven't added a 3.3v power supply outside adaptor board between the teensy and the NIC. If this is the problem then the WIZ820io & Micro SD Card Adaptor from the pjrc store (https://www.pjrc.com/store/wiz820_sd_adaptor.html) is no good for using ethernet. It looks like the adaptor board has an MCP1825 on it to provide the WIZ850 with up to 500ma of 3.3v power, so it's probably fine? I suppose my usb port could be under-volting... I'll measure that and maybe even cut the usb power supply for the teensy and use a bigger external 5v supply.
Ahhh, I did not look at the PJRC wiz adaptor schematic. As you note, it should work fine on T3.1. I have standalone wiz chips and am "sensitive" to the power requirements.
 
I too have it running on wiz820io and am getting what you were getting .... works with delay(10)

when it stops after 3 pkts, tcpdump is showing a window of 0 then a Reset

Code:
15:39:21.248290 IP 192.168.1.148.7890 > 192.168.1.4.53947: Flags [.], ack 25611, win 1022, length 0
15:39:21.476370 IP 192.168.1.4.53947 > 192.168.1.148.7890: Flags [P.], seq 25611:26633, ack 2, win 29200, length 1022
15:39:21.676810 IP 192.168.1.148.7890 > 192.168.1.4.53947: Flags [.], ack 26633, win 0, length 0
15:39:22.052365 IP 192.168.1.4.53947 > 192.168.1.148.7890: Flags [.], ack 2, win 29200, length 0
15:39:22.052459 IP 192.168.1.148.7890 > 192.168.1.4.53947: Flags [R.], seq 2, ack 26633, win 0, length 0
this is on T3.2 1.8.3/1.37
 
I'm not familiar enough with tcpdump to know what all that means, but the reset could be from the server code. If the header doesn't look right (non-zero channel and command), it terminates the connection with a `client.stop()` which I think sends a Reset.
 
well, i suspect a problem with wizio firmware. 0 window says, T3.x isn't servicing the buffer fast enough, and transmitter has to wait, til non-zero window is advertised (flow control) -- not really abnormal behavior, but wiznet seems to get confused and Resets the connection.
FWIW, delayMicroseconds(300) or greater allows sketch to work.

(i also get same results on wiz850io with T3.2)
 
I'm not sure the delay (either `delay()` or `delayMicroseconds()`) is a workable answer. I can't get it to work reliably with a delay as high as 4ms (which is way too big). It works for awhile, but if I leave it for something like an hour I still end up getting a corrupted read.

I'm not sure I follow your explanation, mostly because flooding the wiz850io by running the client at 120+fps appears to fix the problem. If there is a bug in the flow-control logic in the firmware I would expect the problem to get worse when attempting to push more packets.
 
The fact that client.available() is reporting a bogus number suggests a wiznet firmware problem, because that is fairly low-level request to the wizio firmware.

I have blasting TCP benchmarks that I run, and I have not seen that problem (yet). My TCP receiver logic looks like

Code:
  while (sender.connected()) {
    if ((n = sender.available()) > 0) {

      if (n > RECLTH)  n = RECLTH;
      sender.read(packetBuffer, n);
      bytes += n;
    }
  }

and if i run your client .cpp against my TCP reciever, it seems to run forever (til i got tired of waiting).
 
Last edited:
I don't have a complete answer yet, but I believe I have tracked this down enough for a quick workaround.

Look in socket.cpp, line 301: (in hardware/teensy/avr/libraries/Ethernet)

Code:
	if (inc >= 250 || state[s].RX_RSR == 0) {

Change this so it always evaluates true:

Code:
	if (true) {

This code is meant to optimize TCP throughput for programs which do a large number of small reads. I'm not sure why it's failing. There must be a subtle bug.

But this quick hack should at least let you give it a try without this issue.
 
It could be a Wiznet firmware bug, or it could be a bug in my code, or a subtle Wiznet issue that's only triggered by my optimization attempt. Very difficult to say....

I'll look at this in more depth tonight or tomorrow. It's going to take some serious effort. For now, hopefully the workaround helps?
 
The fact that client.available() is reporting a bogus number suggests a wiznet firmware problem, because that is fairly low-level request to the wizio firmware.

I have blasting TCP benchmarks that I run, and I have not seen that problem (yet). My TCP receiver logic looks like

Code:
  while (sender.connected()) {
    if ((n = sender.available()) > 0) {

      if (n > RECLTH)  n = RECLTH;
      sender.read(packetBuffer, n);
      bytes += n;
    }
  }

and if i run your client .cpp against my TCP reciever, it seems to run forever (til i got tired of waiting).

I'd expect that code to run forever also. The problem is that I'm double-getting some values from the TCP stream. I'll bet that if you inspect the PacketBuffer you'll find that sometimes you get duplicated values.
 
This read sequence seems to work for me
Code:
    while (len > 0 ) {
      int read_bytes;
      if ((read_bytes = client.available()) > 0) {

        if (read_bytes > buf_size)  read_bytes = buf_size;
        client.read(buf, read_bytes);
        len -= read_bytes;
        command_read += read_bytes;
      }
    }
 
It could be a Wiznet firmware bug, or it could be a bug in my code, or a subtle Wiznet issue that's only triggered by my optimization attempt. Very difficult to say....

I'll look at this in more depth tonight or tomorrow. It's going to take some serious effort. For now, hopefully the workaround helps?

I'll give that a try when I get home from work, thanks!

I'm still kind of in disbelief that it looks like I found a bug in either the wiznet firmware or your code. I got this hardware less than a week ago, and it doesn't look like the code you wrote has been updated in about a year without complaint. I feel like a freshmen CS student who claims they found a bug in gcc!
 
This read sequence seems to work for me
Code:
    while (len > 0 ) {
      int read_bytes;
      if ((read_bytes = client.available()) > 0) {

        if (read_bytes > buf_size)  read_bytes = buf_size;
        client.read(buf, read_bytes);
        len -= read_bytes;
        command_read += read_bytes;
      }
    }

Do you mean that with this as the only change the bug goes away? I haven't tried that exact thing, but I have tried reading client.available() to ensure there are bytes to read and still had problem.s
 
Yep, i used that while() in your sketch. ( i don't think i had any other changes laying around ...) Worked on both wiz820io and wiz850io. Also worked on 1.8.4/1.40, and worked with T3.2@120mhz and 30mhz SPI clock

Paul's socket.cpp patch seemed to work as well.
 
Last edited:
Yep, i used that while() in your sketch. ( i don't think i had any other changes laying around ...)

Paul's socket.cpp patch seemed to work as well.

Super-strange. The nice thing about git is you should be able to see if there are other changes (like a non-zero fixup_delay). I suppose I have *another* thing to check when I get home ;)
 
This read sequence seems to work for me
Code:
    while (len > 0 ) {
      int read_bytes;
      if ((read_bytes = client.available()) > 0) {

        if (read_bytes > buf_size)  read_bytes = buf_size;
        client.read(buf, read_bytes);
        len -= read_bytes;
        command_read += read_bytes;
      }
    }

Just tried this out and it totally works! That suggests that the problem might be related to doing a read() that requests more bytes than are available(). I'm going to leave the test running overnight and see if it's really-really fixed. I'd be willing to say that trying to do a read() of more bytes than are available() is undocumented behavior, so maybe this isn't a bug in the ethernet support code? Or maybe the update should be to the read(void*, int) method to cap the size to the available() bytes?

Interesting, and thanks so much for the help!
 
Back
Top