ethernet shield testing on K66/K64 sticky post (i'll continue to update this post with new data ...)
running 1.6.9 with 1.29beta4 on ubuntu (32-bit), lwIP 1.4.0
K66 beta PROTO6 board (and K64 beta, 8/31/16)
ether shield with PHY LAN8720A, RJ45, two LEDs, 3.3v/gnd, and 12 K66 pins (8 required): 3,4,24-28,39, (16-19)
Raw ethernet tests:
Configuration summary:
Code:
F_CPU 120000000
192.168.1.17
enetbufferdesc_t size = 32
rx_ring size = 384
buffer size 1520
RX buffers 12
TX buffers 10
MDIO PHY ID2 (LAN8720A should be 0007): 7
MDIO PHY ID3 (LAN8720A should be C0F?): C0F1
PHY control reg 0x3100 100mbs, auto negotiate, full duplex
PHY status reg 0x7829
PHY reg 17 0x2
MPU_RGDAAC0 0x37DF7DF
SIM_SCGC2 0x1
SIM_SOPT2 0x3D10C0
ENET_PALR 0x4E9E500
ENET_PAUR 0x18808
ENET_EIR 0x0
ENET_EIMR 0x0
ENET_ECR 0xF0000112
ENET_MSCR 0x1E
ENET_MRBR 0x5F0
ENET_RCR 0x45F2D104
ENET_TCR 0x104
ENET_TACC 0x1
ENET_RACC 0x80
ENET_MMFR 0x60023100
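The ENET_MMFR value in the dump above can be checked by hand. Below is a minimal sketch, assuming the MMFR field layout from the Kinetis reference manual (ST, OP, PA, RA, TA, DATA) and the standard IEEE 802.3 bit positions for PHY register 0. The macro and helper names are hypothetical, not taken from the sketch, and the argument order (phy, reg, data) is an assumption about how mdio_write() is called.

```c
#include <stdint.h>

/* Field positions of the Kinetis ENET_MMFR register (MII management frame). */
#define MMFR_ST(x)   (((uint32_t)(x) & 0x3u)  << 30)  /* start of frame, always 01 */
#define MMFR_OP(x)   (((uint32_t)(x) & 0x3u)  << 28)  /* 10 = read, 01 = write */
#define MMFR_PA(x)   (((uint32_t)(x) & 0x1Fu) << 23)  /* PHY address */
#define MMFR_RA(x)   (((uint32_t)(x) & 0x1Fu) << 18)  /* PHY register number */
#define MMFR_TA(x)   (((uint32_t)(x) & 0x3u)  << 16)  /* turnaround, 10 per the reference */
#define MMFR_DATA(x) ((uint32_t)(x) & 0xFFFFu)        /* 16-bit register data */

/* IEEE 802.3 basic control register (PHY register 0) bits. */
#define BMCR_SPEED_100   0x2000u
#define BMCR_AUTONEG_EN  0x1000u
#define BMCR_POWER_DOWN  0x0800u
#define BMCR_FULL_DUPLEX 0x0100u

/* Hypothetical helper: word to write to ENET_MMFR to start a register read.
   The DATA field is filled in by the hardware when the read completes. */
static uint32_t mmfr_read_frame(uint8_t phy, uint8_t reg) {
    return MMFR_ST(1) | MMFR_OP(2) | MMFR_PA(phy) | MMFR_RA(reg) | MMFR_TA(2);
}

/* Hypothetical helper: word to write to ENET_MMFR to start a register write. */
static uint32_t mmfr_write_frame(uint8_t phy, uint8_t reg, uint16_t data) {
    return MMFR_ST(1) | MMFR_OP(1) | MMFR_PA(phy) | MMFR_RA(reg) | MMFR_TA(2)
         | MMFR_DATA(data);
}
```

Read this way, 0x60023100 is a completed read of PHY 0, register 0, with TA(2), returning 0x3100, which is the control register value in the dump (100 Mbit, auto-negotiate, full duplex). Under the same assumptions, the power-down write of 0x0800 to register 0 would be the frame 0x50020800.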
Running extensions to Paul's original raw ethernet sketch: my etherraw.ino sketch, a monolithic menagerie of various low-level tests with hand-crafted packets.
- ARP request(broadcast), reply, respond OK, (192.168.1.17) at 04:e9:e5:00:00:01. request/response time 115 us. could build ARP table
- ICMP/ping reply OK. ping time from linux host: rtt min/avg/max/mdev = 0.125/0.129/0.132/0.011 ms (print's disabled in sketch)
- UDP receive test: linux box sends 20 1000-byte UDP packets as fast as it can. k66 receives all in 1620us (about 98 mbs); receiver clock is started when first packet arrives.
- UDP blast : k66 sends 20 1000-byte packets (with sequence numbers) to UDP sink program on linux, linux measured 96 mbs.
- UDP echo reply, RTT avrg 0.000142 min 0.000090 max 0.000483 seconds for 8-byte pkt, initiated from linux host. and with simple K66 sendto/recvfrom echo'd back from linux: min 102 max 230 avrg 119 us
- UDP NTP query using sendto/recvfrom
- total run power: no shield, just K66@120MHz beta board 50 ma; shield no LEDs, not running? 71ma; shield running (2 LEDs) 155 ma. LAN8720A PHY spec is 100 ma, with power-down command option (4+ ma).
power measured through hacked USB cable. (also see K64 ether power)
Turn off power to PHY with
mdio_write(0,0,0); // auto negotiate off
mdio_write(0,0,0x0800); // power down
total power drops to 56 ma. (shield LEDs off)
- PHY access seems to work with TA(2) or TA(0), ref says TA(2)
- PROMiscuous mode is set? doesn't seem to work without it ? FIX: sketch was setting PAUR incorrectly, should be ENET_PAUR = ((MACADDR2 << 16) & 0xFFFF0000) | 0x8808;, then you can disable PROM in RCR
- enable hardware checksum insertion in TACC and TX descriptors. you MUST zero outgoing packet's IP and UDP/TCP/ICMP checksum fields. ? first IP header checksum bad for UDP blast. pkts > 198 bytes have bad checksums (0) ?? TODO

- since hardware checksum is flakey, added software checksums to sketch, setting UDP checksum field to 0 skips calculation
- enabled RX and TX interrupts, ENET_EIMR 0xA00000, just counting for now
- modified output() to keep trying til ring output buffer is available
- etherraw sketch works with female headers on K64 beta (teensy 3.5) 8/31/16
Code:
void udp_ntp(int reps, int ms) {
  int i, sport, t;
  uint32_t secs;
  IPAddress sender;
  uint8_t buff[48] __attribute__ ((aligned(4)));
  UDP_lth=0;
  for (i=0;i<reps; i++) {
    memset(buff, 0, sizeof(buff)); // clear stale reply/stack data before each query
    buff[0] = 0x1b; // ntp query: LI=0, VN=3, mode=3 (client)
    sendto(buff,sizeof(buff),4444,manitou,123);
    while(UDP_lth==0) check_rx(); // poll ether ring
    recvfrom(buff,sizeof(buff), 4444, &sender, &sport);
    secs = *(uint32_t *) (buff+40); // transmit timestamp seconds, network byte order
    Serial.print("ntp "); Serial.println(swap4(secs));
    t=millis();
    while(millis() -t < ms) check_rx(); // active delay
  }
}
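The software checksum fallback mentioned in the list above is the standard Internet checksum (RFC 1071). The sketch's own routine is not shown in this post, so the version below is only an illustration with a hypothetical function name: a 16-bit one's-complement sum over the header bytes, computed with the checksum field zeroed first.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes.
   Bytes are summed as big-endian 16-bit words, so the result is the value
   to store big-endian in the header checksum field. */
static uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                        /* odd length: pad last byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running the same function over a header that already contains a correct checksum returns 0, which is a cheap way to validate received packets.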
- To simulate TCP performance, a UDP transmit function was configured like lwIP (MSS=1460, max window 2*MSS) and uses a "slow start" of a full window of data, sending 1460-byte packets. The window size and RTT latency will determine TCP bulk transfer rates. With a simple linux UDP "ack" program (no delayed ACK), our K66 TCP-like transfer rate was 58 million bits/second (mbs) on the home wired-Ethernet. Since the K66 hardware checksums are flakey, we are calculating IP header checksum in software, and using 0 for UDP checksum.
We include a micros() timestamp in each packet to measure RTT.
RTT stats: 1000 pkts min 349.000000 max 6142.000000 avrg 408.313000 (microseconds)
So without the slow-start blast, the data rate would have been 8*1460/349=33 mbs. Increasing the window to 4*MSS increased the throughput from 58 to 85 mbs. The RTT jitter is caused by other traffic (broadcasts) on the home net, or the linux host having something better to do.
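The 8*1460/349 arithmetic above generalizes to a window-limited bound: at most one window of data can be in flight per round trip. A minimal sketch (the function name is hypothetical):

```c
/* Upper bound on throughput when at most window_bytes are in flight per
   round trip. Bits per microsecond is numerically equal to Mbit/s. */
static double window_limited_mbps(unsigned window_bytes, double rtt_us) {
    return 8.0 * window_bytes / rtt_us;
}
```

With one MSS and the 349 us minimum RTT this gives about 33.5 mbs. With a 2*MSS window and the 408 us average RTT it gives about 57 mbs, close to the measured 58. For 4*MSS the bound is about 114 mbs, which exceeds the 100 mbs line rate, so the wire and host overhead rather than the window would be the limit there (measured 85).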
Benchmarks: the table below compares lwIP on mbed with K66 to date.
Code:
                  K66@120mhz  K66    mbed K64@120mhz  mbed LPC1768@96mhz
                  raw ether   lwIP   lwIP+RTOS        lwIP+RTOS
UDP latency (us)  142         183    288              292
UDP send (mbs)    96          85     52               40
UDP recv (mbs)    98          67     4                2
TCP send (mbs)    58*         59     26               25
TCP recv (mbs)    -           51     21               19
- UDP latency RTT for 8-byte payload
- UDP send: blast 20 1000-byte packets, rate measured at receiver
- UDP recv: rate-limit the linux sender until the MCU receives all 20 1000-byte pkts with no losses
- the poor lwIP-RTOS UDP recv rate is caused by buffer management and thread management, the lwIP-RTOS UDP can receive 7 1000-byte packets at wire speed
- UDP blast 1000 8-byte packets: 66534 pps
- mbed lwIP uses MSS 1460, and TCP window of 2*MSS
* the TCP send for raw ether uses TCP-over-UDP described above
(using lwIP UDP on mbed K64F, faux TCP-over-UDP gets 41mbs, min RTT 549us)
The etherraw sketch is only a proof-of-concept, providing insights for integrating the K66 ethernet with lwIP. The sketch uses 43K of flash and 39K RAM. One could develop a raw Ethernet API to do UDP by adding proper ARP management, transmit packet construction, receive buffer management, multiple streams, and handling gateway forwarding.
The proto beta K66 does not have unique MAC address in ROM, so MAC address is hardwired in sketch. Like other teensy 3's, the production K66 should provide unique MAC address from ROM (beta3 and later).
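For reference, here is how a hardwired MAC maps onto the ENET_PALR and ENET_PAUR values in the configuration dump. This is a sketch with hypothetical helper names. It follows the PAUR fix quoted earlier (upper 16 bits hold the last two MAC bytes, low 16 bits are 0x8808) and assumes PALR holds the first four MAC bytes with the first byte most significant.

```c
#include <stdint.h>

/* First four MAC bytes, first byte in the most significant position. */
static uint32_t enet_palr(const uint8_t mac[6]) {
    return ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
           ((uint32_t)mac[2] << 8)  |  (uint32_t)mac[3];
}

/* Last two MAC bytes in the upper half, 0x8808 in the lower half. */
static uint32_t enet_paur(const uint8_t mac[6]) {
    uint32_t macaddr2 = ((uint32_t)mac[4] << 8) | mac[5];
    return ((macaddr2 << 16) & 0xFFFF0000u) | 0x8808u;
}
```

For 04:e9:e5:00:00:01 this yields PALR 0x04E9E500 and PAUR 0x00018808, matching the dump.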
lwIP tests: (no RTOS NO_SYS=1)
Working with lwIP 1.4.0 and using Makefile with teensy3 core, I have developed some TCP/UDP examples, see
https://github.com/manitou48/teensy3/tree/master/k66lwip
The raw API (no RTOS) requires polling the Ethernet hardware and callbacks. I am not sure how to integrate the library into the IDE. There are lots of lwIP tuning options. Particularly for TCP, a lot of work is required to manage timers, buffers, and packet arrivals. One can appreciate the advantages of network co-processors like wizNet, WINC1500, and ESP8266.
The memory usage for the test sketch is 52KB of Flash and 40KB of RAM. Included in the RAM is 34KB for the Ethernet ring DMA buffers. (For comparison, memory usage for a similar mbed K64F program lwIP+RTOS is 58KB Flash and 55KB RAM.) Packet buffers are allocated/freed from the heap (malloc()), and additional stack RAM is consumed for automatic variables. memcpy() is used to move between ring buffers and packets. The mbed K64F RTOS lwIP uses zero-copy, but that requires additional house-keeping to reclaim transmit buffers.
- k66 lwIP recognized ARP request and replied. K66 issued ARP request and handled reply. Using static IP address.
- ICMP reply (ping) working, rtt min/avg/max/mdev = 0.127/0.135/0.231/0.021 ms
ICMP port unreachable OK
- telnet to k66 is properly rejected with TCP reset packet
- k66 will forward traffic through gateway
- UDP echo 8-byte RTT = 183 us, 20x1000 recv = 67 mbs, UDP send blast 20x1000 = 85 mbs BUT had to do a UDP echo first to establish ARP for target ?
(see table above). UDP NTP query ok, code snippet below
Code:
ether_init("192.168.1.23","255.255.255.0","192.168.1.1");
...
// called by lwIP for each UDP packet arriving on the bound port
void ntp_callback(void * arg, struct udp_pcb * upcb, struct pbuf * p, struct ip_addr * addr, u16_t port)
{
  if (p == NULL) return;
  if (p->tot_len == 48) {
    uint32_t secs = ((uint32_t *) p->payload)[10]; // NTP secs (transmit timestamp, byte offset 40)
    Serial.println(swap4(secs));
  }
  pbuf_free(p); // the receive callback owns the pbuf
}
void udp_ntp(int pkts) {
  int i;
  struct udp_pcb *pcb;
  struct pbuf *p;
  uint32_t ms;
  ip_addr_t server;
  inet_aton("192.168.1.4", &server);
  pcb = udp_new();
  udp_bind(pcb, IP_ADDR_ANY, 4444); // local port
  udp_recv(pcb,ntp_callback,NULL /* *arg */); // registering once per pcb is enough
  for(i=0; i<pkts; i++) {
    p = pbuf_alloc(PBUF_TRANSPORT, 48, PBUF_RAM); // need each time?
    if (p == NULL) break; // out of pbuf memory
    memset(p->payload, 0, 48); // pbuf_alloc does not zero the payload
    *(uint8_t *)p->payload = 0x1b; // NTP query
    udp_sendto(pcb,p,&server,123);
    pbuf_free(p);
    ms=millis(); // ether delay
    while(millis()-ms < 5000) ether_poll();
  }
  // p is freed inside the loop; a second pbuf_free(p) here would be a double free
  udp_remove(pcb);
}
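The callback above pulls the NTP seconds out with a word index and swap4(). As an alternative sketch (helper names are hypothetical, not from the repo), the same field can be read byte by byte, which avoids any alignment or host byte-order assumption, and then converted to Unix time by subtracting the 1900-to-1970 offset.

```c
#include <stddef.h>
#include <stdint.h>

#define NTP_PACKET_LEN  48
#define NTP_TX_SECS_OFF 40            /* transmit timestamp, seconds field */
#define NTP_UNIX_OFFSET 2208988800UL  /* seconds from 1900-01-01 to 1970-01-01 */

/* Read a 32-bit big-endian (network order) value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns Unix seconds from a 48-byte NTP reply, or 0 if the length is wrong. */
static uint32_t ntp_reply_to_unix(const uint8_t *pkt, size_t len) {
    if (len != NTP_PACKET_LEN)
        return 0;
    return (uint32_t)(read_be32(pkt + NTP_TX_SECS_OFF) - NTP_UNIX_OFFSET);
}
```

In the lwIP callback this would be called as ntp_reply_to_unix((const uint8_t *)p->payload, p->tot_len), assuming the reply sits in a single pbuf.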
- TCP client and server working. TCP recv rate 51 mbs, but the K66 lwIP TCP send rate is less than 1 mbs, so some tuning required, buffer management? The sending packet instantaneous data rate has some pauses (500ms) with some peak rates of 24 mbs. See graphs in post #875. Tuning fix: Leaving the TCP fast-timer at 250 ms, increasing the TCP window to 4*MSS, and adding tcp_output(), the TCP send rate is 59 mbs. With the larger window, TCP receive rate increases to 81 mbs. See table above.
- DHCP enabled and tested OK
- tested OK with 1.6.11 and 1.30beta3 8/24/16
- tried breadboard/jumpers of shield to beta3 board, failed. breadboard+jumpers not suitable for 50MHz RMII?
6/29/16. defragster reports beta3+female-headers+shield OK
- lwIP works with female headers on K64 beta, change Makefile to build for K64 8/31/16
- echosrv.ino, a TCP and UDP echo server works. UDP works, but TCP hangs when connect is not within 20 s ?
- websrv.ino works and turns LED on/off 9/15/16
- tcpecho_raw.ino derived from someone else's sketch, works (hack SYN_RCVD timeout to avoid 20s connect problem)
- testing lwIP 1.4.1 9/16/16
- make also works on MACOS (change teensy3 and tools symbolic links), make failed on windows/cygwin
- lwIP multicast tests with sketch src/mtalk.ino showed could transmit and receive multicast. Use mbed driver's code to set GAUR/GALR registers for CRC/hash of multicast MAC/group.
- Using stepl's ether_lwip.zip in the IDE and modifying boards.txt
teensy35.build.flags.common=-g -Wall -ffunction-sections -fdata-sections -nostdlib -I/myhome/sketchbook/libraries/lwip/src/include
I was able to build lwIP 2.0.2 sketch with IDE and tested httpd, ftpd, tftpd with SdFat on uSD (8/10/17). binary-mode fetch from uSD of SDTEST4.WAV (17173152 bytes) took 184.6 s with ftp (TCP), 8 secs with tftp (UDP), but only 3.7s with the browser or wget http://192.168.1.19/SDTEST4.WAV (37mbs). FIX: in lwip_ftp.cpp, add tcp_nagle_disable(pcb); to ftpd_dataconnected(); ftp then takes 2.38 s (58 mbs). Also tested apps/sntp, apps/iperf, DHCP, and DNS. Below is current consumption of T3.5 with ethernet shield from power up to doing 6.5s TCP transmit (6.6 seconds), sample rate is 100 ms. With no shield, an idle T3.5@120mhz consumes about 58 ma.
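The mbs figures quoted for the file fetch follow directly from file size and elapsed time. A small sketch of the arithmetic (the function name is hypothetical):

```c
/* Average transfer rate in Mbit/s for a given byte count and elapsed time. */
static double transfer_mbps(unsigned long bytes, double seconds) {
    return 8.0 * bytes / seconds / 1e6;
}
```

For the 17173152-byte file this gives about 37 mbs at 3.7 s (http), about 58 mbs at 2.38 s (ftp with Nagle disabled), about 17 mbs at 8 s (tftp), and about 0.74 mbs at 184.6 s (ftp before the fix).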

lwIP TODO:
- lwIP tuning options: memory/pbuf's, LITE options, memcpy, checksum (integrate ether hardware checksum, if working)
- if needed, enable/test: IP frag
- DEBUG and stats_display() want to use printf
- integrate yield() with ethernet polling
- mbed uses zero-copy (need to reclaim TX buffers)
- integrate multicast CRC/hash into lwIP API
- integrate lwIP into IDE and add a little "class"
Maybe someone more capable can figure out how to build lwIP in the IDE. My current thought is to use make to create liblwip.a, copy that into hardware/tools/arm/arm-none-eabi/lib/, and add -llwip to k66 build options in boards.txt. Then the IDE library would just need the lwIP include files.
Update: see IDE with lwIP 2.0.2
Paul would like to have the shield API be a drop-in replacement for existing Arduino Ethernet interface (maybe based on UIPEthernet?).
Unresolved: ?
- ethernet hardware checksums -- OK with stepl's lwIP 2.0.2
- lwIP TCP server will hang if not connected to in 20s? -- OK with lwIP 2.0.2
- shouldn't call ether_poll in a callback, serialization violation (ooops, i do it in some sketches)
- develop appropriate API
github sources:
References:
- shield PCB and parts info
- PHY LAN8720A data sheet
- lwIP info and downloads
- lwIP http://lwip.wikia.com/wiki/Porting_for_an_OS
- lwIP http://lwip.wikia.com/wiki/Porting_For_Bare_Metal
- lwIP http://lwip.wikia.com/wiki/LwIP_with_or_without_an_operating_system
- lwIP Raw/native API polling and callbacks, TCP raw echo example or here
- lwIP 2.0 API
- lwIP for STM no RTOS
- mbed K64 lwip-eth or here
- my mbed K64 testing TCP/IP with RTOS
- FNET TCP/IP stack for native Ethernet on K66, K64, 1062
- earlier anecdotal wiznet testing SPI-limited, and mbed K64 ethernet/lwIP and WiFi/Ethernet power graphs
- xxxajk SLIP IP stack
- IP stacks for ENC28J60: UIPEthernet (Arduino-like), primitive ethercard (poll/callback)
- other posts in this thread: #585 #593 #680
- 8/8/17 stepl's https://forum.pjrc.com/threads/45647-k6x-LAN8720(A)-amp-lwip lwIP 2.0.2, SdFat, ftpd, tftpd, httpd, VisualStudio
K64 with ethershield
