Ethernet for Teensy3

Status
Not open for further replies.
Thanks for your feedback. I'd have to say I was pretty impressed when it all started up and gave such good initial results.

I am sending Midi over UDP, typically with 4-byte payloads. The performance objective is bursts of 16,000 pkts per second (which I agree is ambitious). I am currently achieving 4170 pkts per second with this payload size, and am hoping to find some further improvements. I had seen the benchmarking at the link below (I think it was done using some of Paul's code), and from it I got the impression that I can still get quite a bit more speed. The exact setup of the benchmarking is not clear to me, but it seems to imply that receiving UDP with Teensy3 & Wiz820 @ 24MHz SPI Clk could achieve up to 11.6 Mbps, which is much more than the 400 kbps I am measuring. Are you (or any other member) familiar with this 11.6 Mbps measurement, exactly what it was measuring, and with what code configuration?

https://github.com/manitou48/DUEZoo/blob/master/wizperf.txt

Thanks,
Craig
 
There's quite a bit of overhead per packet, not just in the IP and UDP protocols, but also in how the Ethernet library has to talk with the W5x00 chip.

If you pack more than one MIDI message per UDP packet, you'll probably get much better overall bandwidth. Of course, that requires more complex code to check whether more than one MIDI message is ready to send and to pack them into variable-length packets. But if you can do that, I imagine it will be well worth the effort.
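
As a rough illustration of the packing idea, here is a minimal sketch. It assumes fixed 4-byte MIDI records and a batch limit of 8 messages; the names `MidiBatch`, `batchAdd`, `batchFull` and `batchFlush` are made up for this example and are not part of any library. The `send` callback is where a sketch would call `Udp.beginPacket()`, `Udp.write()` and `Udp.endPacket()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: every MIDI message is carried as a fixed 4-byte record.
constexpr std::size_t kMidiMsgSize = 4;
// Assumption: at most 8 messages per packet, i.e. a 32-byte UDP payload.
constexpr std::size_t kMaxMsgsPerPacket = 8;

// Hypothetical batch buffer, not part of the Ethernet library.
struct MidiBatch {
    std::uint8_t data[kMidiMsgSize * kMaxMsgsPerPacket];
    std::size_t  used = 0;  // bytes currently queued
};

// True when no further message fits in the batch.
inline bool batchFull(const MidiBatch& b) {
    return b.used + kMidiMsgSize > sizeof(b.data);
}

// Queue one 4-byte message. Returns false (and queues nothing) if the batch is full.
inline bool batchAdd(MidiBatch& b, const std::uint8_t* msg) {
    if (batchFull(b)) return false;
    std::memcpy(b.data + b.used, msg, kMidiMsgSize);
    b.used += kMidiMsgSize;
    return true;
}

// Hand the queued bytes to `send(ptr, len)` as one payload, then empty the batch.
// Nothing is sent if the batch is empty.
template <typename SendFn>
inline void batchFlush(MidiBatch& b, SendFn send) {
    if (b.used == 0) return;
    send(b.data, b.used);
    b.used = 0;
}
```

A sender would call `batchAdd()` for each message that is ready, and call `batchFlush()` when `batchFull()` is true or when no more messages are pending, so a lone message is not held back waiting for company.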
 
The thread in which some of the speed measurements are briefly explained is
HERE
The link in the post leads directly to the sketch that user manitou used to arrive at these measurement results.
 
Thanks everyone for the comments. Reading between the lines, I've concluded that I already have the optimal software configuration for best performance, and further gains can only come from packing more data into each UDP packet. I therefore did some more measurements with different packet sizes, which I share here for anyone who is interested.

UDP Payload Size (bytes)   Pkts/Sec   UDP Payload Bytes/Sec
4                          4167       16667
16                         3704       59259
32                         2273       72727
64                         775        49612
100                        444        44444
128                        328        41967

Throughput seems to peak at a payload size of around 32 bytes. This allows me to pack 8 Midi messages into each 32-byte UDP payload, achieving the target of 16,000 Midi msgs / second. Yes, the coding will be a bit of a pain, but worth the result.
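
For anyone who wants to check the arithmetic, the message rate for each row of the table can be worked out as below. This is only a sketch: `midiMsgsPerSec` is a name invented here, and it assumes every payload is completely filled with 4-byte messages.

```cpp
#include <cstdint>

// Messages per second when each UDP payload is filled with fixed-size
// messages (4 bytes each by default, as assumed in this thread).
constexpr std::uint32_t midiMsgsPerSec(std::uint32_t payloadBytes,
                                       std::uint32_t pktsPerSec,
                                       std::uint32_t msgBytes = 4) {
    return (payloadBytes / msgBytes) * pktsPerSec;
}

// 32-byte payloads at the measured 2273 pkts/sec: 8 * 2273 = 18184 msgs/sec.
static_assert(midiMsgsPerSec(32, 2273) == 18184, "32-byte row");
// 4-byte payloads carry one message each, so the rate equals the packet rate.
static_assert(midiMsgsPerSec(4, 4167) == 4167, "4-byte row");
```

With the measured packet rates, only the 32-byte row clears 16,000 msgs/sec; the 16-byte row gives 14,816 and the 64-byte row 12,400.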

Thanks everyone,
Craig
 
Thanks for the testing. I've had similar questions with OSC bundles/messages but had not gotten around to testing.
This will be of great use!
 
Glad to be able to contribute something useful.

I've spent a few hours today modifying both the Linux side, and the Teensy side of the UDP connection, to send multiple Midi messages in larger UDP packets. I was having a few problems, which I eventually traced to a "#define UDP_TX_PACKET_MAX_SIZE 24" statement, in EthernetUDP.h. I guess this small value was chosen to save memory, but I do wonder if this isn't a little too low for the default value. Perhaps 128, or even 256 might be better. Would it be sensible to increase this number in a subsequent build?

Also, is there any reason why the code to set the W5200 SPI clock to 24MHz is commented out by default? It would be nice if we didn't have to go into the libraries and edit these things on each release. (I know everyone's wish list is different, but these seem like generally useful changes.)

Thanks, Craig.
 
I couldn't find code in the Ethernet library that uses that constant:
UDP_TX_PACKET_MAX_SIZE

I can find it in really old implementations of the ethernet arduino library.


Our OSC library uses the constant, which is unfortunate given how many people expect to be able to send much larger messages with OSC.
 
Hi. I've just searched the libraries I am using, and I agree, and was surprised, that this constant is not used anywhere within the Arduino environment. I therefore went and retested my code, and I definitely am losing the characters after the 24th. Then it dawned on me that I had followed the example on the Arduino Ethernet Libraries page where this constant is used to dimension the Rx and Tx buffers. I had not checked before to see if this constant was also used in the libraries, I just assumed it was. So this is easily fixed for me, but likely to lead to others becoming confused.

char packetBuffer[UDP_TX_PACKET_MAX_SIZE]; //buffer to hold incoming packet,

http://arduino.cc/en/Tutorial/UDPSendReceiveString
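
To show why the characters after the 24th went missing, here is a small sketch of the effect. `FakeUdpPacket` is a stand-in written for this example and is NOT the real EthernetUDP class; it only mimics a bounded read into a caller-supplied buffer. The 128-byte size is an assumed application choice, not a library default.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for one received datagram (hypothetical, for illustration only).
// read() copies at most `len` bytes into the caller's buffer.
struct FakeUdpPacket {
    const std::uint8_t* payload;
    std::size_t size;

    std::size_t read(std::uint8_t* buf, std::size_t len) const {
        const std::size_t n = std::min(len, size);
        std::memcpy(buf, payload, n);
        return n;
    }
};

constexpr std::size_t kTutorialBufSize = 24;   // UDP_TX_PACKET_MAX_SIZE as quoted above
constexpr std::size_t kAppBufSize      = 128;  // assumed size chosen by the application

// How many bytes of the packet survive a read into a buffer of `cap` bytes.
inline std::size_t bytesKept(const FakeUdpPacket& p, std::size_t cap) {
    std::uint8_t buf[256];
    return p.read(buf, std::min(cap, sizeof(buf)));
}
```

With a 32-byte payload, a buffer of `kTutorialBufSize` keeps 24 bytes and one of `kAppBufSize` keeps all 32, so sizing the buffer in the sketch from the application's own constant avoids the truncation.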

Whilst I am in this post, I can confirm that my Midi Decoder, using the Teensy3.1 with WIZ820, is now receiving 16,000 midi messages per second, when packed into UDP packets with payload size around 32 bytes. So thumbs up to both Teensy and Wiznet!
 
That's what I suspected: UDP_TX_PACKET_MAX_SIZE is a red herring, perhaps from earlier hardware limitations.
I looked up the wiznet 5200 errata and found this:

"User should limit the use of buffer in order to avoid receiving or transmitting the
maximum packet size in TCP/UDP. In the case of TCP, the Windows size (RX buffer) should
be limited to the size of 1459 bytes or under. In the case of UDP, the packet size that can
be sent from the Application Layer should be limited to the size of 1452 bytes or under."

So the question is: what is the sweet spot in terms of total throughput? 32 bytes seems small to me.
Just to clarify: your tests were with the Wiznet 820, a Teensy 3.1 and Teensyduino 1.19?
 
The W5200 has a lot of overhead to access pretty much any data, but once a transfer is under way, you get the SPI clock speed. The key to performance involves reading the whole UDP packet, or at least several hundred bytes, all at once.

When I looked at OSC some time ago, it was reading small chunks of 4 or 8 bytes at a time, based on the needs of the OSC header, bundle and message parsing. That's much better than reading only 1 byte, but not nearly as good as reading 100 or more bytes into a modestly sized buffer, from which the parser could read small chunks without the terrible overhead of going back to the W5200 chip each time.
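
A minimal sketch of that buffering approach follows. `PacketBuffer` is a name invented for this example, the 256-byte size is an assumption, and `readFn` is a hypothetical callback standing in for a single bulk read from the chip (in a sketch, something like one `Udp.read(buf, len)` call).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local copy of one UDP payload. The parser pulls small chunks from here
// instead of going back to the network chip for each one.
class PacketBuffer {
public:
    // `readFn(dst, max)` performs ONE bulk read and returns the bytes copied.
    template <typename ReadFn>
    void fill(ReadFn readFn) {
        len_ = readFn(buf_, sizeof(buf_));
        if (len_ > sizeof(buf_)) len_ = sizeof(buf_);  // defensive clamp
        pos_ = 0;
    }

    // Copy the next `n` bytes to `dst`. Returns false if fewer than n remain.
    bool take(std::uint8_t* dst, std::size_t n) {
        if (n > len_ - pos_) return false;
        std::memcpy(dst, buf_ + pos_, n);
        pos_ += n;
        return true;
    }

    std::size_t remaining() const { return len_ - pos_; }

private:
    std::uint8_t buf_[256];  // assumed size, comfortably above 100 bytes
    std::size_t  len_ = 0;
    std::size_t  pos_ = 0;
};
```

The parser can then call `take()` with 4 or 8 bytes at a time as often as it likes, while the chip is accessed only once per packet.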
 
Yes, we are definitely looking at tuning OSC processing. The first go-round of the OSC library aimed for feature completeness, similarity to the Processing OSC API, and portability, so we had to make it work on the low-memory-footprint Arduinos.
It sounds like there would be some interest in tuning focussed on Teensy 3.1 and W5200. I will try to put some of that into this release.
 
Yes. Tests were with Arduino 1.0.5, Teensyduino 1.19, Wiznet 820, and Teensy 3.1. I had previously used Teensy2++ and Wiznet 810, but with only a fraction of the throughput.
 
Did you consider using a streaming model for this data? On one Wiznet based project, I struggled with packet boundaries, etc., until I realized that due to the on-board buffering in the Wiznet modules, and the API to read how many bytes are in the receive buffer, etc., you can just read the byte stream flow and forget about packet boundaries. On the sending end, same deal. So if the source data is fast enough to fill an MTU-sized packet, so be it. Doing this helped me a lot, esp. on a project where the server was a Win 2000 server and that software, heavily layered, just streamed all data. Much like a web server does.

I started that product using UDP, seemingly for simplicity in being connectionless. But due to WAN/LAN firewall traversal policy issues, I went to TCP. Reconnection wasn't that bad, except that you have to be sure not to reuse the same port number when reconnecting, because some hosts take 30 sec. or more to time out and close the old socket.
 
Indeed: streaming can be good for large continuous sources of data and TCP has the negotiations in place to fine tune the buffer sizes (in theory).

If you want to use OSC over TCP you can SLIP wrap the data to create the packet boundaries.

We do this when high reliability of OSC is required or when streaming images or multiple channel audio streams.
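
For readers unfamiliar with SLIP, here is a minimal sketch of the framing as described in RFC 1055. It is not taken from the OSC library; `slipEncode` and `SlipDecoder` are names made up for this example, and the use of `std::vector` is a convenience that a memory-constrained sketch might replace with fixed buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Special byte values defined by RFC 1055.
constexpr std::uint8_t SLIP_END     = 0xC0;
constexpr std::uint8_t SLIP_ESC     = 0xDB;
constexpr std::uint8_t SLIP_ESC_END = 0xDC;
constexpr std::uint8_t SLIP_ESC_ESC = 0xDD;

// Wrap one message in END bytes, escaping END and ESC inside the data.
inline std::vector<std::uint8_t> slipEncode(const std::uint8_t* data, std::size_t len) {
    std::vector<std::uint8_t> out;
    out.reserve(len + 2);
    out.push_back(SLIP_END);  // leading END (optional in the RFC)
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == SLIP_END) {
            out.push_back(SLIP_ESC);
            out.push_back(SLIP_ESC_END);
        } else if (data[i] == SLIP_ESC) {
            out.push_back(SLIP_ESC);
            out.push_back(SLIP_ESC_ESC);
        } else {
            out.push_back(data[i]);
        }
    }
    out.push_back(SLIP_END);  // trailing END marks the packet boundary
    return out;
}

// Incremental decoder: feed bytes from the TCP stream one at a time.
class SlipDecoder {
public:
    // Returns true when a complete, non-empty frame is available via frame().
    bool feed(std::uint8_t b) {
        if (ready_) { frame_.clear(); ready_ = false; }  // start a new frame
        if (b == SLIP_END) {
            escaped_ = false;
            if (frame_.empty()) return false;  // ignore back-to-back ENDs
            ready_ = true;
            return true;
        }
        if (escaped_) {
            escaped_ = false;
            if (b == SLIP_ESC_END) b = SLIP_END;
            else if (b == SLIP_ESC_ESC) b = SLIP_ESC;
            // Any other byte after ESC violates the protocol; kept as-is here.
        } else if (b == SLIP_ESC) {
            escaped_ = true;
            return false;
        }
        frame_.push_back(b);
        return false;
    }

    const std::vector<std::uint8_t>& frame() const { return frame_; }

private:
    std::vector<std::uint8_t> frame_;
    bool escaped_ = false;
    bool ready_ = false;
};
```

The sender writes the output of `slipEncode()` to the TCP connection; the receiver passes every incoming byte to `feed()` and handles `frame()` whenever it returns true, regardless of how TCP happened to split the stream.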
 
Hmmm...that possibility escaped my attention. Would you have some sample code on how to achieve this ?
Even better would be if you could include that into the next release!
 
Wiznet does not have dynamic TCP window sizing as of the 812MJ's chip (5100), and I don't think they added it to the newest one either. This was an issue for me when using a 3G/4G cellular router/modem (Cradlepoint). The round trip on cellular is often (in busy hours) 300-600 ms for the 2nd through nth packet. The first packet after a period of dormancy can take seconds, and will time out the TCP packet and sometimes the connection.
To deal with this in the heat of battle (field big system integration), I disabled their fixed window size so every packet had to get a TCP ACK to cope. Since the data volume was low, this did slow throughput but eliminated the latency timeout issue.

I began doing this with UDP hoping for simplicity. But port forwarding for Ethernet nodes with static LAN addresses proved to be politically impossible due to firewall policies. Even if it had been permitted, sustaining the forwarding rules at scale was not viable; hence the move to TCP and passive mode FTP (security policy). A better idea was to use port triggering, but (a) I didn't know how to make my app do that and (b) the IT gods could not say whether the NAT they run (big fat Cisco edge routers) would support port triggering. (Actually, they were clueless, having drunk the CCNA kool-aid).

Another issue was that many enterprise systems have a proxy at the edge. It's essentially a layer 3 bridge. It hosed up often, dropping packets, etc. They didn't care because there is virtually no UDP traffic, and TCP thrashes about correcting these.

In the end, it worked well...
Streaming had to be done due to the way the Win 2000 server apps pay no attention to packet boundaries. They just "send". Even though our messages were never larger than an IP packet's non-jumbo MTU, and even though the Wiznet's buffers per socket per direction (tx, rx) are larger than an MTU, there was no way to prevent a data frame from spanning a packet boundary, UDP or TCP. So streaming was a simple solution. The embedded system just read all the available bytes in the Wiznet's ring buffer. If there was an incomplete application data frame, it just held on to the data and waited for more to arrive. It does take an app buffer to retain that. I suppose a streaming library would do that.
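
The hold-and-wait logic can be sketched roughly as below. This is not the code from that project: `FrameAssembler` is a name invented here, the 512-byte buffer is an assumption, and the framing (a 1-byte length followed by that many body bytes) is a hypothetical format chosen only to keep the example short. In a sketch, the bytes passed to `append()` would come from something like `client.available()` and `client.read(buf, n)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Collects bytes from a stream and hands back complete frames, keeping any
// partial frame until the rest arrives.
// Assumed framing: [1-byte length N][N body bytes].
class FrameAssembler {
public:
    // Append newly received bytes. Returns false if they would not fit.
    bool append(const std::uint8_t* data, std::size_t n) {
        if (n > sizeof(buf_) - len_) return false;
        std::memcpy(buf_ + len_, data, n);
        len_ += n;
        return true;
    }

    // If a complete frame is buffered, copy its body into `out` (which must
    // hold at least 255 bytes) and return the body length. Otherwise return
    // -1 and leave the partial data in place.
    int nextFrame(std::uint8_t* out) {
        if (len_ < 1) return -1;
        const std::size_t bodyLen = buf_[0];
        if (len_ < 1 + bodyLen) return -1;  // frame spans into data not yet received
        std::memcpy(out, buf_ + 1, bodyLen);
        const std::size_t consumed = 1 + bodyLen;
        std::memmove(buf_, buf_ + consumed, len_ - consumed);
        len_ -= consumed;
        return static_cast<int>(bodyLen);
    }

private:
    std::uint8_t buf_[512];  // assumed application buffer size
    std::size_t  len_ = 0;
};
```

The main loop appends whatever the socket has available and then calls `nextFrame()` until it returns -1, so it makes no difference where the network split the data.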
 