USB Host Ethernet Driver

Maybe my router caught on and is protecting me?

I can't find an available FLOOD tool.

I set up Win Ubuntu and it gave this:
~$ping 192.168.0.3 -i 0
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
ping: cannot flood; minimal interval allowed for user is 200ms

Says USER - I forget my SUDO pwd :(
 
I think I just need a better router. It’s only the WiFi that goes out; the wired ports still work, it just decides not to communicate properly.
 
Got back to the latest from github, also running IDE 1.8.9 with TD 1.48 installed there - much nicer build!

Running T4 at 960 MHz through powered hub.

fBench doing 1024 messages of 4096 Bytes Udp then Tcp:
Code:
Udp	192.168.0.3	0.143	4,194,304	234,904.773
Udp	192.168.0.3	0.150	4,194,304	223,284.328
Udp	192.168.0.3	0.165	4,194,304	203,976.202
Udp	192.168.0.3	0.161	4,194,304	208,362.148
Udp	192.168.0.3	0.192	4,194,304	174,325.310
Udp	192.168.0.3	0.159	4,194,304	211,400.058
Udp	192.168.0.3	0.161	4,194,304	208,144.877
Udp	192.168.0.3	0.159	4,194,304	211,311.527
Udp	192.168.0.3	0.183	4,194,304	183,509.072
Udp	192.168.0.3	0.192	4,194,304	174,978.291
Tcp	192.168.0.3	0.772	4,194,304	43,469.065
Tcp	192.168.0.3	0.772	4,194,304	43,490.706
Tcp	192.168.0.3	0.774	4,194,304	43,372.629
Tcp	192.168.0.3	0.773	4,194,304	43,394.999
Tcp	192.168.0.3	0.769	4,194,304	43,622.172
Tcp	192.168.0.3	0.776	4,194,304	43,258.443
Tcp	192.168.0.3	0.770	4,194,304	43,553.078
Tcp	192.168.0.3	0.771	4,194,304	43,522.199
Tcp	192.168.0.3	0.774	4,194,304	43,336.951
Tcp	192.168.0.3	0.769	4,194,304	43,621.248

And the SerMon report for the Tcp portion:
Code:
Megabytes: 4.194304  Seconds: 0.7800  KBits/Sec: 43018.5026
Megabytes: 4.194304  Seconds: 0.9000  KBits/Sec: 37282.7022
Megabytes: 4.194304  Seconds: 0.9060  KBits/Sec: 37035.7969
Megabytes: 4.194304  Seconds: 0.9180  KBits/Sec: 36551.6688
Megabytes: 4.194304  Seconds: 0.9180  KBits/Sec: 36551.6688
Megabytes: 4.194304  Seconds: 0.9120  KBits/Sec: 36792.1404
Megabytes: 4.194304  Seconds: 0.9360  KBits/Sec: 35848.7521
Megabytes: 4.194304  Seconds: 0.9420  KBits/Sec: 35620.4161
Megabytes: 4.194304  Seconds: 0.9360  KBits/Sec: 35848.7521
Megabytes: 4.194304  Seconds: 0.9420  KBits/Sec: 35620.4161

fBench count of 1280 sets of 4096 Tcp report just under a second for 5.242 MB:
0.966 5,242,880 43,424.776
And sketch in SerMon thinks it was just over a second :
Megabytes: 5.242880 Seconds: 1.0560 KBits/Sec: 39718.7879
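
For anyone checking the arithmetic: the SerMon KBits/Sec column appears to be bytes × 8 / seconds / 1000, which reproduces the SerMon lines above to four decimals. A minimal Python sketch (the function name is mine, not something from the sketch or fBench). The fBench figures differ slightly, presumably because its printed seconds are rounded to three decimals.

```python
def kbits_per_sec(num_bytes: int, seconds: float) -> float:
    """Throughput in kilobits per second (1 kbit = 1000 bits)."""
    return num_bytes * 8 / seconds / 1000.0


# 1024 messages of 4096 bytes in 0.7800 s (first SerMon Tcp line)
print(round(kbits_per_sec(1024 * 4096, 0.7800), 4))
# 1280 messages of 4096 bytes in 1.0560 s
print(round(kbits_per_sec(1280 * 4096, 1.0560), 4))
```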

Playing with the numbers for Udp 3000 messages of 32000 bytes in 1 fBench second?
0.990 96,000,000 775,465.875
1.038 96,000,000 739,825.806
1.019 96,000,000 753,759.301
1.011 96,000,000 760,004.148
0.999 96,000,000 769,068.005
1.000 96,000,000 768,217.790
 
Edited again to show temps: it was holding at 78°C, blowing over the heat sink dropped it to 72°C, and once that was cool enough to touch, pulling out more heat by swapping fingers got it down to 66°C.

Same counts as above Tcp takes 18 times longer:
Code:
Udp	192.168.0.3	0.917	96,000,000	837,919.067
Tcp	192.168.0.3	17.959	96,000,000	42,764.607

Using : benchtx -a 192.168.0.22 -m 20480 -mn 10 now showing:
Starting benchtx...
Benchmark client started.
Protocol: TCP
Remote IP Addr: 192.168.0.22
Remote Port: 7007
Message Size: 20480
Num. of messages: 10

Megabytes: 0.204800 Seconds: 0.0780 KBits/Sec: 21005.1282

and fBench:
0:43:8.130 Rx Tcp 192.168.0.22 0.033 204,800 49,664.287

And temp back to 77 and 78 again.

BACK TO 600 MHz ::
Temp dropped to 57 already and speeds match the above in this post:
Tcp 192.168.0.3 17.952 96,000,000 42,780.673
Tcp 192.168.0.3 17.933 96,000,000 42,826.881
Udp 192.168.0.3 1.032 96,000,000 743,937.707
Udp 192.168.0.3 0.906 96,000,000 847,596.798
 
Yeah, that’s pretty much what I would expect the numbers to be with the current code. UDP will always be faster than TCP; there’s nothing I can do about that. I may be able to squeeze a little more speed out of TCP transmit once I pack the buffer full of several messages instead of individual ones, but I don’t think it’ll make much of a difference. The easiest way to really increase TCP throughput for transmit and receive is to increase the window size, but that comes at the cost of increased RAM use, and it’s already kind of up there. I don’t know who’s really going to need these high speeds in most applications, but at least the transfer medium isn’t bottlenecking the whole thing or adding extra overhead like an SPI chip would. It would be nice to offer the same buffer sizes that a desktop computer offers, but without some kind of external RAM chip it won’t be possible, so I think for what we have it’s performing well beyond my expectations.
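
To put a rough number on the window-size point: if a sender can have at most one window of data in flight per round trip, throughput is bounded by window / RTT. A small Python sketch of that bound. The window sizes and round-trip times here are illustrative assumptions, not measurements from this setup, and the function name is made up.

```python
def window_limited_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput (Mbit/s) with one window per round trip."""
    return window_bytes * 8 / (rtt_ms * 1000.0)  # bits per microsecond == Mbit/s


for rtt_ms in (1.0, 2.0, 6.0):                # hypothetical round-trip times
    for window in (8 * 1024, 32 * 1024):      # hypothetical window sizes
        bound = window_limited_mbit(window, rtt_ms)
        print(f"window={window:6d} B  rtt={rtt_ms:.1f} ms  <= {bound:7.2f} Mbit/s")
```

The bound scales linearly with the window, which is why a bigger window helps and why it costs RAM.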
 
Indeed no complaints here - looks impressive. Just looking to let you see it tested somewhere else and confirm " pretty much what I would expect the numbers to be "

Same results even at 396 MHz on T4 - so it is not CPU bound!
fBench 3000msg of 32000::
0:0:53.92 Tx Udp 192.168.0.3 0.885 96,000,000 867,429.251
0:0:55.80 Tx Udp 192.168.0.3 0.875 96,000,000 877,397.921
0:1:20.257 Tx Tcp 192.168.0.3 17.924 96,000,000 42,848.598
0:1:38.345 Tx Tcp 192.168.0.3 17.940 96,000,000 42,809.628

Even at less than half the CPU speed, at F_CPU=151 MHz, it is doing well and still managing over half the TCP rate: roughly 30 versus 18 seconds transfer time:
Code:
Looped: 1405745  LoopedUSB: 3192  FNETMemFree: 129856  LinkSpeed: 100BASE	 F_CPU=151200000	deg  C=45
Megabytes: 96.000000  Seconds: 32.6300  KBits/Sec: 23536.6227
…
Looped: 1342345  LoopedUSB: 3129  FNETMemFree: 112784  LinkSpeed: 100BASE	 F_CPU=151200000	deg  C=44
Megabytes: 96.000000  Seconds: 28.3000  KBits/Sec: 27137.8092

When not doing a transfer, the loop counts show: Looped: 1547614 LoopedUSB: 3402

And from fBench with Udp logged too showing it is still near ONE SECOND as before:
Code:
Tcp	192.168.0.3	32.607	96,000,000	23,553.396
Tcp	192.168.0.3	28.252	96,000,000	27,183.614
Udp	192.168.0.3	0.975	96,000,000	787,940.085
Udp	192.168.0.3	1.014	96,000,000	757,536.303
 
Oddly enough, packing the bytes didn't actually make it any faster. The reason is that it has to use the callback timer to decide when to send a message if the buffer's not full, and that timer has to be as short as possible so the latency stays down, but also long enough that the buffers can fill. By default the timer function doesn't even go as low as I would like without editing the minimum value in the host code, and after testing a few different timer values I've concluded that it's not worth pursuing and I'll just stick with the existing way that it sends the buffers.
 
Oddly enough packing the bytes didn't actually make it any faster ...

Good you put that off until it was stable to test against. I suppose there may be some subset of 'right size' and 'right regularity' that could be the perfect solution. But easy to see cases where partial buffer could also get in the way or be the wrong size and the added overhead/complexity.
 
just starting to experiment. cloned latest from github. example compiles OK for T4, but fails for T3.6 (1.8.10 1.48)
Code:
/u1/home/linux/arduino-1.8.10/hardware/tools/arm/bin/../lib/gcc/arm-none-eabi/5.4.1/../../../../arm-none-eabi/bin/ld: /tmp/arduino_build_419500/ASIXEthernet_Test.ino.elf section `.bss' will not fit in region `RAM'
/u1/home/linux/arduino-1.8.10/hardware/tools/arm/bin/../lib/gcc/arm-none-eabi/5.4.1/../../../../arm-none-eabi/bin/ld: region `RAM' overflowed by 28768 bytes
 
I figured it would probably fail for the T3.6 with the current buffer sizes. The buffers can be lowered, you just won’t get the same speeds. Not to mention that in my tests the T3.6 never seemed to reach the same speeds as the T4, even when both were using the same buffer size.
 

Good to see you here @manitou. If I remember right, your K66 testing was here: teensy3/blob/master/etherraw.ino. Apparently this library set has direct MCU Ethernet support, and it was suggested that the PJRC K66 Ethernet shield could be made to work? That wouldn't help T4 … but T_3.6 maybe, and it sounds like the 1170 maybe as well by then?

Some tests here were with T_3.6 and USB/LAN, but mostly T4. T_3.6 seemed to be what vjmuzik was mostly running before getting carried away with RAM :) p#256 above shows speed for T4 at 151 MHz; I wonder how that compares to T_3.6 with the same RAM. I'm looking at Frank B's T_3.6 Teensy64, but not testing now … it is so late it is early again. The T4 does seem to have some useful optimizations.
 
The original library does have direct support, but that support would have to be rewritten to work with Teensyduino. We don’t have to use the original library’s code to use the hardware, though. You can take existing code, such as your etherraw code that initializes the hardware, and just direct the raw Ethernet frames to the FNET stack, then direct the stack output to the hardware. There are some other functions that would have to be defined to read and write to the PHY, to properly support some of the library functions such as multicast. I made sure to make the library as easy to interface with other hardware as possible, so it can be used with other transfer mediums such as the built-in Ethernet port or even a W5500 if you so desired. The only requirement is that the transfer medium has to be set up somewhere else instead of the FNET library doing it like it was originally designed to do; that’s where my USB driver comes in. So if someone gets a USB WiFi driver going, their driver would have to handle the scanning and connection to a network, and then FNET processes the Ethernet frames from that network.
 
I definitely want to see how well the gigabit Ethernet on the 1170 performs if we have access to it. I would like it to be available directly onboard so I don’t lose I/O pins just to support Ethernet, but I understand Ethernet is more of a niche market, so it probably doesn’t make sense to have a dedicated port for it if not many people are going to use it.
 
Here are some results of tests I did on T4@600MHz and T3.6@180MHz with the Amazon USB Ethernet dongle. Tests are with Linux/Mac boxes and Windows (fbench and cygwin programs) on a 100T local area network. I'm running Arduino 1.8.10 with Teensyduino 1.48 using the USBHost_t36 lib.

Here are some TCP and UDP numbers using packet sizes that I have used in other Wiznet and Ethernet tests.

Code:
  T4/T3.6 with USB ethernet  1.8.10 1.48  100T ethernet
                         T4  T4nt     T3.6   T3.6nt
UDP latency (us)       5694  1773    10854     2648  8-byte UDP RTT
UDP send (Mbit/s)        71    95       64       59  20 1000-byte packets
UDP recv (Mbit/s)        27    40       67       51
UDP pps                7205 32331     3325    19361  blast 1000 8-byte packets

TCP send (Mbit/s)        39    78       19       44  100 1000-byte packets
TCP recv (Mbit/s)        40    30       17       20

ping (avg ms)             2              2
   32KB buffers, nt is "no thread"

Observations:
  • USB dongle MAC address is 00:50:b6:be:aa:e2 (from ARP table)
  • To get the example ASIXEthernet_Tests to measure in-bound UDP I had to change SOCK_STREAM to SOCK_DGRAM in bench_srv_init() in Functions.ino. The UDP receiver consumes the first UDP packet as "connect" and terminates when a packet of length 1 arrives. FWIW, the FNET UDP buffer is 2048 bytes. I don't know why the T4 UDP receive rate is slower than T3.6??
  • I tested the Windows fbench.exe against some Linux network programs. I couldn't get fbench UDP send rates to exceed 10 Mbit/s, even though fbench TCP was able to transmit at wire speed (98 Mbit/s). A cygwin UDP transmitter on the Windows box could send at wire speed (?)
  • TCP data rates are sensitive to TX and RX buffer sizes as noted in the table above. Here is a plot of T4 TCP transmit:
    tcptrace.gif
    The white tick marks are segment transmissions/ACKs. The slope of those transmissions is about 86 Mbit/s (32768 bytes over 2.6 ms), and then the T4 pauses for about 3.3 ms, so the effective rate is only 44 Mbit/s. For the T3.6 (not shown) the graph showed 8192 bytes over 1.1 ms (60 Mbit/s), but the delay between bursts was 10 ms (!), resulting in a 6 Mbit/s effective data rate. (? the TeensyThreads time slice is 10 ms for T3.6, 5 ms for T4?).
    T3.6 update: I reduced heap to 64KB and increased T3.6 TX/RX buffers to 32KB, and I was able to achieve 19 Mbit/s. The TCP plot still shows T3.6 idle for 10 ms, but the slope of the data transfer portion is 37 Mbit/s (7 ms for 32KB), so effective is 15 Mbit/s.
  • Power tests with hacked USB cable and meter: T4 idle 102 mA, total with USB Ethernet active 276 mA. T3.6 power 68 mA and 255 mA.
  • I need to figure out how to make some simple socket-based FNET examples that don't require all of the service abstractions. And configure a static IP address.
  • I hacked SNTP example (no threads) to measure 8-byte UDP RTT: T3.6 2648 us, T4 1773 us.
  • No threads. I disabled using threads in the example. The T4nt column shows the improvement in network performance. Only TCP receive had a lower performance because T4 is dropping TCP packets (32 retransmits). The T3.6 UDP latency dropped to 1066 us.
  • I added an echo service (UDP or TCP), T3.6 UDP latency (RTT) for 8-byte packet is 1651 us, no thread, (TCP 2000 us).
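
The burst-then-pause arithmetic from the TCP plot observation can be sketched in a few lines of Python. This is only a back-of-envelope model (one burst of bytes, then an idle gap, repeated) and the function name is my own. Run on the figures quoted above, it gives effective rates of roughly 44, 6 and 15 Mbit/s, matching the post. The burst slope for the T4 case comes out nearer 101 Mbit/s from 32768 bytes over 2.6 ms, so the quoted 86 presumably reflects a slightly different reading of the plot.

```python
def rates_mbit(burst_bytes: int, burst_ms: float, idle_ms: float):
    """Return (burst rate, effective rate) in Mbit/s for a send-then-pause cycle."""
    bits = burst_bytes * 8
    burst = bits / (burst_ms * 1000.0)                    # bits per microsecond
    effective = bits / ((burst_ms + idle_ms) * 1000.0)    # includes the idle gap
    return burst, effective


cases = {
    "T4, 32KB buffers": (32768, 2.6, 3.3),
    "T3.6, 8KB buffers": (8192, 1.1, 10.0),
    "T3.6, 32KB buffers": (32768, 7.0, 10.0),
}
for name, args in cases.items():
    burst, effective = rates_mbit(*args)
    print(f"{name}: burst {burst:.1f} Mbit/s, effective {effective:.1f} Mbit/s")
```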

References:
usbether.gif
 
There is definitely a learning curve to FNET, since there really isn’t any documentation available on how to do certain things like create your own sockets. The author of FNET just tells people to use one of the existing services as an example of how to create your own socket service; I’ve been looking through the benchmark service myself to try and figure it out for my own application needs. The socket API can be found on this page. Since it’s BSD-like, it’s probably easier to find a tutorial on how to create BSD sockets in general and adapt it to FNET’s commands, but I haven’t looked into that myself.
 
Thanks. I'm familiar with the BSD socket API (all of my Linux/macOS/cygwin network tools are based on BSD sockets). The FNET src/service/bench/ code has examples of most of the TCP/UDP socket calls. The FNET details on polling and callbacks are what I need to figure out. The T3.6 native Ethernet and lwIP beta testing we did was based on callbacks and polling.
 
There is a function that adds services to a polling list that gets called from the usbthread; all of that is handled by the init function for the service you are trying to add. The specific command is fnet_service_register: the first parameter specifies a function to be called for polling your service, and the second parameter can be whatever you want it to be. Most services use the second parameter for an interface descriptor, so that you can have multiple instances of the same service just by having different interfaces instead of creating a new object class. Once the service is registered, all of the polling is handled for you. As for callbacks, those aren't specific to FNET and you don't even need callbacks to run a service. That said, all of the callbacks that the existing services use are defined in their interface descriptor, so each instance can have its own user-defined callbacks, or they can be left as null if the user doesn't want to use them.
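
As a rough picture of that pattern (a poll function plus an opaque per-instance parameter, all driven from one loop), here is a toy model in Python. It is not FNET code and does not use FNET's API. `ServiceRegistry`, `echo_poll` and the dict-based "interface descriptor" are made-up names for illustration only.

```python
class ServiceRegistry:
    """Toy stand-in for a service polling list (not FNET's implementation)."""

    def __init__(self):
        self._services = []

    def register(self, poll_fn, param):
        """Remember a poll function and the parameter to hand back to it."""
        self._services.append((poll_fn, param))

    def poll_all(self):
        """Call every registered poll function once with its own parameter."""
        for poll_fn, param in list(self._services):
            poll_fn(param)


def echo_poll(iface):
    # Each instance keeps its state in its own descriptor (a dict here).
    iface["polls"] += 1
    if iface["on_poll"] is not None:      # callback may be left unset
        iface["on_poll"](iface["name"])


seen = []
a = {"name": "echo-a", "polls": 0, "on_poll": seen.append}
b = {"name": "echo-b", "polls": 0, "on_poll": None}

registry = ServiceRegistry()
registry.register(echo_poll, a)           # same poll function, two instances
registry.register(echo_poll, b)
for _ in range(3):
    registry.poll_all()                   # stands in for the polling thread's loop
print(a["polls"], b["polls"], seen)
```

Two instances share one poll function and differ only in the descriptor passed as the second argument, which is the point of leaving that parameter free-form.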
 
Here are some results of tests I did on T4@600mhz and T3.6@180mhz with the amazon USB Ethernet dongle. Tests are with linux/mac boxes and windows (fbench and cygwin programs) on a 100T local area network. I'm running arduino 1.8.10 with teensyduino 1.48.

...
Observations:
  • power tests with hacked USB cable and meter: T4 idle 102 ma, total with USB Ethernet active 276 ma. T3.6 power 68 ma and 255 ma.

Re POWER tests - with copper heat sink this was at 912 (?) MHz - posted on T4_Beta thread:
Code:
>> Starting two T4's hit 208 or 215 mA as it comes up before USB LAN adapter is connected.
- nothing else connected PJRC beta has a green LED - otherwise current is just higher MCU MHz and USBHost device active.
- Did this measure because if the USB LAN adapter is connected at startup it fails to ever connect when OC'd to 912 MHz

>> To the running T4 connect the USB Adapter and current goes to 368 to 384 mA and connect to LAN
- This is where running temp with heat sink hits 73-78 degrees C

Throughput didn't improve IIRC over 600 MHz or dramatically over 396 MHz { Tcp did drop down w/151 MHz - but Udp about the same } - but there would be a couple more spare cycles for other stuff - and another 100 mA burned off to warm the heat sink.
 
I figured it would probably fail for the T3.6 with the current buffer sizes. The buffers can be lowered, you just won’t get the same speeds. Not to mention that in my tests the T3.6 never seemed to reach the same speeds as the T4, even when both were using the same buffer size.

For the T3.6, I reduced the heap to 64KB and then I could use 32KB buffers for TCP. With 8KB buffers TCP was only getting 5 Mbit/s; with 32KB buffers TCP now transfers at 19 Mbit/s on T3.6, still not as fast as T4. I updated my table in post #264.
 
I got a second AMAZON LAN USB dongle today. What can I do with 2 Teensys trading messages to test throughput?

Is the fBench Send/Receive symmetric WRT how the Teensy responds send and receive to count messages?

@manitou - how about the p#264 type tests?

Wondering if these adapters recognize when they are plugged straight through and swap the send receive without a crossed cable? Would need Teensy to come online with static IP and then they could PING each other without router doing attack detection - if Teensy had a ping ...
 
I haven't tried static IP yet. The two dongles should talk to each other -- I don't know if you could wire them to each other. Try it first with them hooked to a switch/router.

The benchmark example should work between the two dongles, though fbench UDP transmit sends a dummy 2-byte datagram before sending the requested stream, and then terminates with 1-byte datagrams. The Teensy UDP transmit does not start with a 2-byte datagram, so the receiver will consume one of your "good" datagrams as a "connect" packet. The Teensy UDP transmit does finish with 1-byte datagrams to signal end of test.
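
The effect of that framing difference can be modelled without any network at all. A minimal Python sketch, assuming the receiver logic is exactly as described (first datagram swallowed as "connect", any 1-byte datagram ends the test). The byte values of the dummy and end datagrams and the number of terminators are assumptions, and all names are made up.

```python
CONNECT = b"\x00\x00"   # dummy 2-byte datagram (contents assumed)
END = b"\x00"           # 1-byte datagram signalling end of test (contents assumed)


def fbench_style(messages, n_end=3):
    """Datagram sequence as described for fbench: connect, data, terminators."""
    return [CONNECT] + list(messages) + [END] * n_end


def teensy_style(messages, n_end=3):
    """Datagram sequence as described for the Teensy sender: no leading connect."""
    return list(messages) + [END] * n_end


def receive(datagrams):
    """Model receiver: returns (messages counted, payload bytes counted)."""
    it = iter(datagrams)
    next(it, None)                  # first datagram is consumed as "connect"
    count = total = 0
    for d in it:
        if len(d) == 1:             # 1-byte datagram terminates the test
            break
        count += 1
        total += len(d)
    return count, total


msgs = [bytes(1000)] * 20           # 20 datagrams of 1000 bytes
print(receive(fbench_style(msgs)))  # (20, 20000)
print(receive(teensy_style(msgs)))  # (19, 19000)
```

So a Teensy-to-Teensy UDP run should be expected to report one datagram fewer than was sent, unless the sender is changed to lead with a dummy packet.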

I got the FNET SNTP demo working by hacking a copy of ASIXEthernet_Test example. I haven't had luck with a raw socket lib example -- sendto() freezes up Teensy. I suspect there are underlying callbacks that need to be supported.
 
Yes, the adapter has auto crossover detection. There is a parameter for a static IP address for the DHCP service, as well as an autoip service, which is the one that should be used for a direct connection between devices. I don’t think both services can be running at the same time, so you can either compile for one or the other, add a command on the serial line to switch between the two, or program an external button to switch it. TCP tests should be able to be sent between devices without changing anything, although for UDP you would have to recompile, changing the parameter for the fbench client.
 
No threads. I disabled TCP threads in the T4 network bench example (added a T4nt column in table in post #264). Performance numbers improved for everything except TCP receive (further study needed). Although the preemptive TeensyThreads (T3.6 10 ms timeslice, T4 5 ms) would be useful where sketch had lots of other work to perform, the thread model reduces network performance.
 
I’m not sure if there would be any improvement in speed if multithreading was disabled in the FNET library. It didn’t seem to make a difference when I first activated it, but at the time I was still using a T3.6 and the transfer speed didn’t drop. I figured there would be some latency added by running multiple threads, but I mainly wanted that for debugging purposes, so I could see what was causing lockups without the whole Teensy freezing, and it came in handy in a couple of situations. For some other projects I’m going to be doing I do plan on making them multithreaded, so I also wanted to be able to support that from the beginning so I didn’t run into problems later.
 
not to mention that from test the T3.6 seemed to never get the same speeds even when they were both using the same buffer size.

One of the reasons T4 was faster than T3.6 is that T4 TeensyThreads was using a 5 ms timeslice -- Teensy 3* boards run with a 10 ms threads timeslice. The T4 5 ms timeslice was a bug, see https://forum.pjrc.com/threads/4150...-first-release?p=219088&viewfull=1#post219088

The latest TeensyThreads from github fixes the T4 timeslice bug.
 