@manitou Thanks for that! Well heck, looks like everything's done then. Is there anything useful I can add at this point or should I move on to something else? Maybe work on getting the existing Arduino Ethernet library integrated with the T4.1 hardware, or is the plan to just move to lwip for everything in the future? Thanks.
Ever since T3.6 beta testing, Paul has hoped to provide an Arduino-like library for the T3.6/T4.1 native Ethernet. If you can do that, you will receive great adulation. Implementing a TCP/IP suite with timers, interrupts, and memory management is no small task, and it makes one appreciate off-loading all of that to a co-processor (e.g. Wiznet SPI Ethernet). Here are some notes from the "private" thread used during pre-beta T4.1 testing:
-----------
Ethernet and lwip testing and performance
I'll summarize my lwIP testing as I add to this sticky post.
The beta Teensy 4.1 came with an RJ45 jack/PCB kit with a 6-pin ribbon cable. The cable is attached to the 6 pins on the T4.1 as shown here.
Initial testing was done in February 2020 with Arduino 1.8.11 and TD 1.51-beta1
low-level Ethernet tests:
Paul provided a low-level sketch to configure the T41 ethernet interface. The interface is configured with 12 RX and 10 TX ring descriptors using 512-byte packet buffers. The sketch provides ARP reply and ICMP echo (ping) reply using a fixed IP and MAC address. The sketch listens in promiscuous mode (ENET_RCR_PROM) and prints incoming packets (broadcasts and multicasts). ICMP ping round-trip times were 396 us, slowed somewhat by the packet printing.
Code:
PLL6 = 80202001 (should be 80202001)
GPR1 = 80020018
RCSR:0061, LEDCR:0480, PHYCR 8000
RCSR:00A1, LEDCR:0280, PHYCR 8000
enetbufferdesc_t size = 32
rx_ring size = 384
MIBC=40000000
ECR=F0000000
ECR=F0000112
MDIO PHY ID2 (LAN8720A is 0007, DP83825I is 2000): 2000
MDIO PHY ID3 (LAN8720A is C0F?, DP83825I is A140): A140
BMCR: 3100
BMSR: 786D
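The BMCR and BMSR values at the end of this dump can be read with the standard IEEE 802.3 clause 22 bit layout. The following is only a sketch for interpreting the two numbers; the struct and function names are made up for illustration and are not part of Paul's sketch.

```cpp
#include <cstdint>

// IEEE 802.3 clause 22 Basic Mode Control Register (register 0) bits.
constexpr uint16_t BMCR_SPEED_100   = 0x2000;  // bit 13
constexpr uint16_t BMCR_AUTONEG_EN  = 0x1000;  // bit 12
constexpr uint16_t BMCR_POWER_DOWN  = 0x0800;  // bit 11
constexpr uint16_t BMCR_FULL_DUPLEX = 0x0100;  // bit 8

// Basic Mode Status Register (register 1) bits.
constexpr uint16_t BMSR_AUTONEG_DONE = 0x0020;  // bit 5
constexpr uint16_t BMSR_LINK_UP      = 0x0004;  // bit 2

// Hypothetical helper type: the decoded state of the two registers.
struct PhyState {
  bool speed100;
  bool autoneg;
  bool powerDown;
  bool fullDuplex;
  bool autonegDone;
  bool linkUp;
};

inline PhyState decode_phy(uint16_t bmcr, uint16_t bmsr) {
  PhyState s;
  s.speed100    = (bmcr & BMCR_SPEED_100) != 0;
  s.autoneg     = (bmcr & BMCR_AUTONEG_EN) != 0;
  s.powerDown   = (bmcr & BMCR_POWER_DOWN) != 0;
  s.fullDuplex  = (bmcr & BMCR_FULL_DUPLEX) != 0;
  s.autonegDone = (bmsr & BMSR_AUTONEG_DONE) != 0;
  s.linkUp      = (bmsr & BMSR_LINK_UP) != 0;
  return s;
}
```

With the values printed above, decode_phy(0x3100, 0x786D) reports 100 Mbit full duplex, auto-negotiation enabled and complete, link up, and the PHY not powered down.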
I extended Paul's sketch, adding hand-crafted UDP packets, to create etherraw.ino, based on 2016 K66 beta ethernet testing. I disabled promiscuous mode, increased the packet buffers to 1536 bytes, and updated the sketch to use a single ISR to count packets. The 1062 has an option to coalesce packet interrupts, but I'm still using legacy mode.
- ARP request (broadcast), reply, respond OK, (192.168.1.17) at 04:e9:e5:00:00:01. Request/response time 93 us. Could build ARP table.
- ICMP/ping reply OK. Ping RTT from linux host: 120 us (prints disabled in sketch).
- UDP receive test: linux box sends 20 1000-byte UDP packets as fast as it can. T41 receives all in 1660 us (about 96 mbs); the receiver clock is started when the first packet arrives.
- UDP blast: T41 sends 20 1000-byte packets (with sequence numbers) to UDP sink program on linux; linux measured 96 mbs.
- UDP echo reply: RTT for 8-byte pkt with simple T41 sendto/recvfrom echoed back from linux: 93 us; with T41 echoing, 98 us.
- UDP NTP query using sendto/recvfrom, RTT 182 us.
- Modified output() to keep trying until a ring output buffer is available.
- To simulate TCP performance, a UDP transmit function was configured like TCP (MSS=1460, max window 2*MSS) and uses a "slow start" of a full window of data, sending 1460-byte packets. The window size and RTT latency will determine TCP bulk transfer rates. With a simple linux UDP "ack" program (no delayed ACK), our T41 TCP-like transfer rate was 70 million bits/second (mbs) on the home wired Ethernet. Increasing the window to 4*MSS increased the throughput to 95 mbs. Since the T41 hardware checksums may be flaky, we calculate the IP header checksum in software and use 0 for the UDP checksum.
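For readers who have not written one, a software IP header checksum is the standard one's-complement sum over the header's 16-bit words (RFC 1071). The function below is a minimal sketch of that technique, not the code from etherraw.ino; the function name is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// One's-complement checksum over a header whose checksum field is zeroed.
// Returns the value in host order; store it big-endian at header bytes 10..11.
uint16_t ip_header_checksum(const uint8_t *hdr, size_t len) {
  uint32_t sum = 0;
  // Add the header as big-endian 16-bit words.
  for (size_t i = 0; i + 1 < len; i += 2) {
    sum += (static_cast<uint32_t>(hdr[i]) << 8) | hdr[i + 1];
  }
  // An odd trailing byte is padded with zero on the right.
  if (len & 1) {
    sum += static_cast<uint32_t>(hdr[len - 1]) << 8;
  }
  // Fold carries back into the low 16 bits.
  while (sum >> 16) {
    sum = (sum & 0xFFFF) + (sum >> 16);
  }
  return static_cast<uint16_t>(~sum);
}
```

Running the same function over a header that already contains a correct checksum returns 0, which is a convenient receive-side check. A UDP checksum of 0 is legal over IPv4 and means "no checksum computed".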
We include a micros() timestamp in each packet to measure RTT.
RTT stats: 100 pkts min 330 max 463 avrg 335 (microseconds)
The RTT jitter is caused by other traffic (broadcasts) on the home net, or the linux host having something better to do.
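The relationship between window size, RTT, and bulk rate mentioned above can be checked with a little arithmetic: a sender limited to one window per round trip cannot exceed window/RTT. The helpers below are a sketch with made-up names; pairing the 335 us average RTT with the 2*MSS window is my assumption about how the numbers relate, and only payload bytes are counted.

```cpp
#include <algorithm>
#include <cstdint>

// Upper bound on throughput (Mbit/s) when at most window_bytes may be in
// flight per round trip of rtt_us microseconds, capped at the link rate.
double window_limited_mbs(uint32_t window_bytes, double rtt_us, double link_mbs) {
  double mbs = window_bytes * 8.0 / rtt_us;  // bits per microsecond == Mbit/s
  return std::min(mbs, link_mbs);
}

// Rate in Mbit/s for payload_bytes moved in elapsed_us microseconds.
double measured_mbs(uint32_t payload_bytes, double elapsed_us) {
  return payload_bytes * 8.0 / elapsed_us;
}
```

With a 2*1460-byte window and a 335 us RTT the bound is just under 70 mbs, in line with the measured 70 mbs. With 4*1460 bytes the window no longer limits and the 100 mbs link does, consistent with the measured 95 mbs. measured_mbs(20 * 1000, 1660) gives a little over 96 mbs, matching the UDP receive test.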
T41 ethernet registers:
Code:
PLL6 = 80202001 (should be 80202001)
GPR1 = 80020018
RCSR:0061, LEDCR:0480, PHYCR 8000
RCSR:00A1, LEDCR:0280, PHYCR 0000
enetbufferdesc_t size = 32
rx_ring size = 384
MIBC=40000000
ECR=F0000000
ECR=F0000112
MDIO PHY ID2 (LAN8720A is 0007, DP83825I is 2000): 2000
MDIO PHY ID3 (LAN8720A is C0F?, DP83825I is A140): A140
BMCR: 3100
BMSR: 786D
ENET_PALR 0x4e9e500
ENET_PAUR 0x18808
ENET_EIR 0x0
ENET_EIMR 0xa000000
ENET_ECR 0xf0000112
ENET_MSCR 0x12
ENET_MRBR 0x600
ENET_RCR 0x45f25104
ENET_TCR 0x104
ENET_TACC 0x1
ENET_RACC 0x80
ENET_MMFR 0x6004786d
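Two of these registers can be cross-checked against the text. ENET_MRBR 0x600 is 1536 decimal, the packet buffer size. ENET_PALR and ENET_PAUR together hold the MAC address: PALR carries bytes 0 to 3 and the upper half of PAUR carries bytes 4 and 5 (the lower half, 0x8808, is the PAUSE frame type field). The helper below is a sketch with a made-up name, based on my reading of the i.MX RT ENET register layout.

```cpp
#include <array>
#include <cstdint>

// Rebuild the 6-byte MAC address from the ENET_PALR / ENET_PAUR register values.
std::array<uint8_t, 6> mac_from_regs(uint32_t palr, uint32_t paur) {
  return {{
      static_cast<uint8_t>(palr >> 24), static_cast<uint8_t>(palr >> 16),
      static_cast<uint8_t>(palr >> 8),  static_cast<uint8_t>(palr),
      static_cast<uint8_t>(paur >> 24), static_cast<uint8_t>(paur >> 16),
  }};
}
```

For the values printed above, mac_from_regs(0x04e9e500, 0x00018808) yields 04:e9:e5:00:00:01, the address seen in the ARP test.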
lwIP tests: (v2.0.2, no RTOS NO_SYS=1)
I ported lwIP (v2.0.2) from 2016 T3.5/T3.6 beta ethernet testing. I updated the low-level interface to support the T41 ethernet and configured 5 Tx and 5 Rx ring descriptors with 1536-byte packet buffers. lwIP uses polling and callbacks (no RTOS) and provides ARP, IP, ICMP, UDP, TCP, DHCP, DNS, and multicast. There are numerous tuning and configuration options (lwipopts.h). Hardware checksums appear to be working with lwIP.
I developed test sketches for NTP, DNS, multicast, web server, web client, httpd with SD or SdFat-beta lib, tftp server (SD, SdFat-beta, or SPIFFS), ftpd (get/put) with SD lib, and TCP/UDP client/server benchmarking.
TCP performance improves from 59 mbs to 81 mbs by increasing TCP window to 4*MSS in lwipopts.h. Some of the benchmark results are reported in the table below (column T41e is the Teensy 4.1 ethernet).
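As a rough illustration of that window change, a lwipopts.h fragment might look like the following. TCP_MSS, TCP_WND and TCP_SND_BUF are standard lwIP options, but only the 4*MSS window comes from the notes above; the MSS value and the matching send buffer size are assumptions, and other options (pbuf pool, segment counts) may need to grow with them.

```cpp
/* lwipopts.h fragment: a sketch, not the author's actual configuration */
#define TCP_MSS      1460             /* 1500-byte Ethernet MTU minus 40 bytes of IP+TCP headers */
#define TCP_WND      (4 * TCP_MSS)    /* receive window advertised to the peer */
#define TCP_SND_BUF  (4 * TCP_MSS)    /* assumed: send buffer sized to match the window */
```

TCP_WND bounds how much the peer may send before waiting for an ACK, while TCP_SND_BUF bounds how much the Teensy may queue for transmission, so which one matters depends on the direction being benchmarked.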
Code:
Ethernet performance
                 T41e    1062SDK  T41fnet  T41QNE   T41USBe  T35e     T4+W5500  1170     info
TCP xmit (mbs)   73      87       92       86       78       59       9         84
TCP recv (mbs)   93      71       78       78       30       81       11        91
UDP xmit (mbs)   97      97       97       89       95       85       11        98       blast 20 1000-byte pkts
UDP xmit (pps)   149476  137453   152052   152146   32331    66534    21514     149146   blast 1000 8-byte pkts
UDP recv (mbs)   91      95       96       94       40       67       9         99       no-loss recv of 20 1000-byte pkts
UDP RTT (us)     94      104      104      162      1651     183      150       250      RTT latency of 8-byte pkts
ping RTT (us)    120     108      103      162      2000     127      82        315
ePower (ma)      59      100      59       59       174      100      132                ethernet module current
tests on 100mbs full-duplex Ether with linux box on switch
T41fnet and T41USBe FNET TCP/IP native (arduino wrapper) and USB host, no threads
T41 QNEthernet, arduino API wrapper for lwIP 12/22/21
W5500 SPI @37.5MHz, 2KB buffers
1170 lwIP 100T SDK -O3 @996MHz 9/15/21
The memory usage for a sketch is 66 KB flash and 148 KB RAM. RAM usage includes 15 KB of Ethernet ring DMA buffers. Teensy also copies flash into RAM for faster execution. Packet buffers are allocated/freed from the heap (malloc()).
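The 15 KB figure follows directly from the ring configuration given earlier (5 Tx and 5 Rx descriptors with 1536-byte buffers). A small compile-time check, with a made-up helper name, shows the arithmetic:

```cpp
#include <cstddef>

// Bytes of packet buffer memory for rx + tx ring entries of buf_bytes each.
constexpr std::size_t ring_buffer_bytes(std::size_t rx, std::size_t tx,
                                        std::size_t buf_bytes) {
  return (rx + tx) * buf_bytes;
}

// lwIP port: 5 Rx + 5 Tx descriptors, 1536-byte buffers -> exactly 15 KB.
static_assert(ring_buffer_bytes(5, 5, 1536) == 15 * 1024, "expected 15 KB");
// Low-level sketch: 12 Rx + 10 Tx descriptors, 512-byte buffers -> 11 KB.
static_assert(ring_buffer_bytes(12, 10, 512) == 11 * 1024, "expected 11 KB");
```

The descriptors themselves add little: at 32 bytes each, ten of them take 320 bytes.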
Below is the power consumption of the T4.1 with Ethernet. After powering up, the sketch delays a few seconds before transmitting TCP packets for about 8 seconds. Power stays high, even after the transfer is complete. A temperature probe on the PHY chip measures 40 °C.
Turn off power to the PHY with:
mdio_write(0,0,0); // auto negotiate off
mdio_write(0,0,0x0800); // power down
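The 0x0800 written above is the power-down bit (bit 11) of the standard clause 22 BMCR register. If you would rather keep the other BMCR settings, or wake the PHY again later, the values to write can be computed as in this sketch. The helper names are made up, the sketch's mdio_write() is not reproduced here, and whether the DP83825I needs anything beyond the standard bits to resume is not covered by these notes.

```cpp
#include <cstdint>

constexpr uint16_t BMCR_AUTONEG_EN      = 0x1000;  // bit 12
constexpr uint16_t BMCR_POWER_DOWN      = 0x0800;  // bit 11
constexpr uint16_t BMCR_RESTART_AUTONEG = 0x0200;  // bit 9

// BMCR value that powers the PHY down while preserving the other bits.
constexpr uint16_t bmcr_sleep(uint16_t bmcr) {
  return static_cast<uint16_t>(bmcr | BMCR_POWER_DOWN);
}

// BMCR value that clears power-down and asks the PHY to renegotiate the link.
constexpr uint16_t bmcr_wake(uint16_t bmcr) {
  return static_cast<uint16_t>((bmcr & ~BMCR_POWER_DOWN) |
                               BMCR_AUTONEG_EN | BMCR_RESTART_AUTONEG);
}
```

Starting from the BMCR value 3100 shown in the register dump, bmcr_sleep() gives 0x3900, and bmcr_wake() applied to the 0x0800 state gives 0x1200.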
builtin microSD and native ether performance
Code:
microSD read times (seconds) (4.2MB file, 2KB reads)
T4 SD lib T3.5 SdFatv1 T4.1 SdFatv2 T4.1 SPIFFS EFLASH
tftp 4.76 2.2 2.2 s 2.3 s
http 3.6 0.9 0.9
ftp 2.8 0.6
read() 2.76 0.26 0.19 s 0.2 s
lwIP TODO:
- upgrade to latest lwIP
- lwIP tuning options: memory/pbuf's, LITE options, memcpy
- IP frag is enabled, but not tested
- lwip_servers (httpd, ftpd, tftpd) with SdFat fails to build (worked on T3.5 SdFatv1) ? needs T4/VFS support
- DEBUG and stats_display() want to use printf
- integrate yield() with ethernet polling
- coalesce interrupts and use DMA, cache management (NXP SDK fsl_enet.c)
- mbed uses zero-copy (need to reclaim TX buffers), DMA?
- integrate multicast CRC/hash into lwIP API
- better integrate lwIP into IDE
- develop Ethernet/TCP/IP stack? lwIP variations? FNET? Paul would like to have the shield API be a drop-in replacement for existing Arduino Ethernet interface (maybe based on STM32duino lwIP or UIPEthernet?). Done? see T4.1 FNET and Arduino API wrapper 2020
lwIP updates:
6/23/20 D Drown refactors lwIP include files so no modifications to boards.txt are required to build lwIP sketches with the IDE. Lib now supports 1588 time stamps. Paul's repository updated from https://github.com/ddrown/teensy41_ethernet. Also see NTP server with GPS PPS and 1588.
8/30/21 QNEthernet, shawn's lwIP-based Ethernet lib https://forum.pjrc.com/threads/68066-New-lwIP-based-Ethernet-library-for-Teensy-4-1 platformio
FNET and NativeEthernet tests: May, 2020
An Arduino-like API has been provided by @vjmuzik atop FNET TCP/IP. The API includes the Arduino TCP/IP examples. I have tested ARP, ICMP, DHCP, DNS, multicast, and various TCP/UDP client/servers. The memory usage for a sketch is 88 KB flash and 118 KB RAM. RAM usage includes 6 KB of Ethernet ring DMA buffers, and Teensy copies flash into RAM. Packet buffers are allocated/freed from the heap (malloc()). Performance results are presented in the table above (T41fnet column). Increasing FNET_SOCKET_DEFAULT_SIZE from 2048 to 4*1460 in NativeEthernet.h improved T4.1 TCP recv from 60.5 mbs to 78.4 mbs and TCP xmit from 45.2 mbs to 92 mbs! Edit: there are now function calls to set buffer sizes.
fnet_perf.ino and tftp servers fnet_tftpd (with SD lib) or fnet_tftpd_SPIFFS
See discussions on NativeEthernet thread
References: