Teensy 4.1 - W5500 using SPI DMA

Hi everyone,

I am working on a project using the Teensy 4.1 where I need to implement communication across two Ethernet interfaces.

The Teensy's native Ethernet interface is working perfectly; I have tested it with iperf and it consistently reaches ~90 Mbps. However, for the second interface, I have attached a W5500 chip via SPI, and I am struggling to get the throughput up. Currently, the W5500 is capping out at roughly 16 Mbps.

My Setup & What I've Tried:

  • Library: I am using the standard Wiznet ioLibrary_Driver to initialize the chip.
  • Optimization: I have already increased the SPI frequency and I am using burst read/write operations (registered via reg_wizchip_spiburst_cbfunc).
  • Hardware: Teensy 4.1 connected to a W5500 module.
The Comparison: I came across this article, in which a Nucleo F412 achieved ~43 Mbps with the same W5500 chip: A Comparative Analysis of TCP iperf Server Performance

Given that the Teensy 4.1 is significantly more powerful than the Nucleo, I expected to at least match that speed.

My Questions:

  1. Is it realistically possible to reach that 40+ Mbps speed on the Teensy 4.1 with the W5500, or is there a hardware architecture difference I am overlooking?
  2. I recently discovered the TsyDMASPI library. Would moving to a DMA-based SPI implementation be the key to bridging the gap between 16 Mbps and 43 Mbps, or is the bottleneck likely elsewhere?
Any insights or experience with high-speed W5500 implementation on Teensy would be greatly appreciated.

Thanks!
 
Which Ethernet library are you using? Because you’ve done some iperf measurements, I’m guessing QNEthernet? (IPerfServer example.) Assuming that’s true, the reason may be due to how the W5500 driver is written. It uses raw frames and the lwIP stack handles everything. Other W5500 drivers generally use the chip itself for the IP stack. That might explain some of the speed difference.
 
Actually, for the native Ethernet interface I use FNET, but that is not the problem. I have tested multiple Ethernet libraries with the native interface and the throughput stays at about 90 Mbps. The interesting part is that, in the Nucleo article mentioned above, the authors reached 43 Mbps on the Nucleo board (which is undoubtedly less powerful than the Teensy 4.1) with the same ioLibrary and the same W5500 chip.
I was curious whether anyone else has managed to optimize transfers to the W5500 chip using ioLibrary.
So far, burst read/write SPI operations and a higher SPI frequency have only raised the throughput from 12 to 16 Mbps.
 
Have you tried the regular Ethernet.h library?

From the tone of your messages so far, I'm guessing you may have immediately discounted it without trying, under the assumption Arduino Ethernet.h couldn't possibly be as good as the code Wiznet publishes. If that's the case, I'm writing this message to let you know I personally poured a lot of work into optimizing Teensy's support for Wiznet W5000 chips several years ago, before we had Teensy 4.1 with native ethernet. I optimized the low-level SPI code, but also I made mid-level optimizations that as far as I know nobody else attempted, like caching certain socket level state on the microcontroller side to eliminate redundant communication normally done by all the other Wiznet libraries. My optimization not only made SPI run faster, I made it able to perform less SPI communication for the same result in many common usage scenarios.
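
To illustrate the kind of mid-level caching described above, here is a minimal sketch. It is not the code from the Teensy Ethernet library; `FakeChip` and `CachedSocket` are hypothetical names, and the fake chip only counts how many register reads would have crossed the SPI bus. The idea shown is that the driver remembers how many received bytes it already knows about and only asks the chip again once that count is used up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Wiznet chip. It only counts register
// reads, each of which would be a separate SPI transaction on real
// hardware.
struct FakeChip {
  uint16_t rxReceivedSize = 0;  // bytes waiting in the socket's RX buffer
  unsigned registerReads = 0;   // number of simulated SPI register reads

  uint16_t readRxReceivedSize() {
    ++registerReads;
    return rxReceivedSize;
  }

  void consume(uint16_t n) {
    assert(n <= rxReceivedSize);
    rxReceivedSize = static_cast<uint16_t>(rxReceivedSize - n);
  }
};

// Hypothetical socket wrapper that caches the "bytes available" count on
// the microcontroller side and refreshes it only when it reaches zero.
class CachedSocket {
 public:
  explicit CachedSocket(FakeChip& chip) : chip_(chip) {}

  uint16_t available() {
    if (cachedAvailable_ == 0) {
      cachedAvailable_ = chip_.readRxReceivedSize();
    }
    return cachedAvailable_;
  }

  // Consumes up to n bytes and returns how many were consumed.
  // The actual data transfer is omitted from this sketch.
  uint16_t read(uint16_t n) {
    const uint16_t avail = available();
    const uint16_t take = (n < avail) ? n : avail;
    chip_.consume(take);
    cachedAvailable_ = static_cast<uint16_t>(cachedAvailable_ - take);
    return take;
  }

 private:
  FakeChip& chip_;
  uint16_t cachedAvailable_ = 0;
};
```

In this sketch, three 100-byte reads from a socket holding 300 bytes cost one register read instead of three. The cached count can lag behind newly arrived data, which is harmless here because it is refreshed as soon as it reaches zero.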

If using Arduino IDE, I'd recommend turning on Verbose Output in File > Preferences. Keep an eye on the messages Arduino IDE prints about which libraries it actually used. There are many copies of the normal Ethernet.h library. You want the one that comes with Teensy software. Look to make sure it's in the same location as the other Teensy libraries like SPI. If you have manually installed any other Ethernet.h library, it probably overrides the Teensy-provided one.

To get faster than 14 MHz SPI, you will need to edit w5100.h at line 28. When editing this file, I'd recommend first adding a syntax error and clicking Verify, to make sure you really are editing the actual copy. Again, it's easy to end up with many similar-looking copies of this library, so this quick check can save a lot of frustration wondering why edits seem to have no effect.
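
As a rough sanity check on why the SPI clock matters: single-lane SPI moves one bit per clock, so the clock frequency is a hard ceiling on throughput. The sketch below estimates a best-case payload rate for one buffer transfer. The function name is hypothetical, the 3-byte header is the W5500's address and control phase, and `deadTimeSeconds` is an assumed per-transaction cost that you would have to measure on your own setup.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of any library: estimates the best-case
// payload throughput of a W5500 buffer transfer over SPI.
//   spiClockHz       SPI clock in Hz (one bit moves per clock)
//   payloadBytes     data bytes moved per SPI transaction
//   deadTimeSeconds  assumed fixed cost per transaction (chip-select
//                    handling, software overhead); measure, don't guess
double spiPayloadCeilingMbps(double spiClockHz, std::size_t payloadBytes,
                             double deadTimeSeconds) {
  assert(spiClockHz > 0.0 && payloadBytes > 0 && deadTimeSeconds >= 0.0);
  constexpr std::size_t kHeaderBytes = 3;  // 16-bit address + 8-bit control
  const double wireSeconds =
      static_cast<double>(payloadBytes + kHeaderBytes) * 8.0 / spiClockHz;
  const double payloadBits = static_cast<double>(payloadBytes) * 8.0;
  return payloadBits / (wireSeconds + deadTimeSeconds) / 1e6;
}
```

For example, `spiPayloadCeilingMbps(14e6, 1460, 0.0)` comes out just under 14 Mbps, so no driver optimization can push a 14 MHz link past that figure. Any non-zero dead time and every extra register access lowers the result further.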

It's been years since I made all those optimizations. I don't recall what the performance really was. But I remember that testing on my LAN with that faster setting did give much better speed on W5200 and W5500, which is the reason I left it (commented out) in the code. If you give it a try, I hope you'll show us the actual performance it's able to give on your LAN.

Of course with TCP on any network with latency, buffer sizes become important. You can only expect high throughput with small buffer Wiznet chips on a LAN where packet latency is very low. If you have a router or "layer 3" anything that adds packet latency between your devices, you probably won't be able to achieve (SPI) wire speed performance due to Wiznet's small buffers.
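
The buffer-size point can be made concrete with the usual window/round-trip bound: a TCP sender can have at most one window of data in flight per round trip, so throughput cannot exceed window / RTT. The sketch below is only that arithmetic. The function name is hypothetical, and the 2 KB window (the W5500's default per-socket buffer size) and the round-trip times in the note are illustrative assumptions, not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on TCP throughput when at most `windowBytes` can be in
// flight per round trip: throughput <= window / RTT.
double tcpWindowCeilingMbps(std::size_t windowBytes, double rttSeconds) {
  assert(windowBytes > 0 && rttSeconds > 0.0);
  return static_cast<double>(windowBytes) * 8.0 / rttSeconds / 1e6;
}
```

With a 2048-byte window, a 0.2 ms round trip allows roughly 82 Mbps, while a 5 ms round trip allows only about 3.3 Mbps. That is why small-buffer chips can only approach SPI wire speed on a low-latency LAN.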
 

Thank you for your reply. I had seen examples that use ioLibrary for Wiznet chips, so that is what I used at first.
I will try the Teensy Ethernet library and check the throughput.

I may have misunderstood your response... Is the code in the Teensy Ethernet library optimized so that it uses DMA for SPI transfers?
I have seen in related posts that DMA integration may increase the speed by a factor of 2x to 3x.
 
No, DMA is not used.

I am curious where you found the suggestion that DMA would give a 2X to 3X performance improvement? Is it specific to Wiznet over SPI? Is it based on an actual implementation with measured performance on specific Wiznet chips, or only theoretical speculation? If it is only theory, I'm curious whether the person offering it explains why in detail related to how Wiznet's chips work.

I'm asking because, at least in my opinion, DMA is probably a poor fit with the Wiznet SPI interface. Bus master DMA is an excellent fit for traditional memory mapped ethernet ring buffer interfaces, which is exactly how the native ethernet port on Teensy 4.1 works. But that's not how Wiznet chips work.
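
One way to see the difference (this is an illustration, not necessarily the reasoning behind the post above): every access to the W5500 is its own SPI transaction that starts with a 16-bit address and an 8-bit control byte, and the driver has to read and write pointer registers around each bulk transfer. The sketch below builds that 3-byte header for variable-length data mode, following the control-byte layout in the W5500 datasheet (block select in bits 7..3, read/write in bit 2, operation mode in bits 1..0). The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Block-select values for socket n, per the W5500 datasheet layout:
// socket registers = 4n+1, TX buffer = 4n+2, RX buffer = 4n+3.
constexpr uint8_t txBufferBlock(uint8_t socket) {
  return static_cast<uint8_t>(4 * socket + 2);
}
constexpr uint8_t rxBufferBlock(uint8_t socket) {
  return static_cast<uint8_t>(4 * socket + 3);
}

// Builds the address + control header that precedes the data phase of a
// W5500 SPI transaction in variable-length data mode (OM bits = 00).
std::array<uint8_t, 3> w5500Header(uint16_t address, uint8_t block,
                                   bool write) {
  assert(block < 32);  // the block-select field is 5 bits wide
  const uint8_t control =
      static_cast<uint8_t>((block << 3) | (write ? 0x04 : 0x00));
  return {static_cast<uint8_t>(address >> 8),
          static_cast<uint8_t>(address & 0xFF), control};
}
```

For instance, a write to socket 0's TX buffer at offset 0x0010 starts with the bytes 0x00, 0x10, 0x14. Because each transfer is framed like this and bracketed by short register accesses, the bus carries many small transactions rather than one long stream that a DMA engine could run unattended.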
 
Thanks for the insight. I definitely see your point regarding the W5500’s architecture; it’s certainly not a native memory-mapped ring buffer like the Teensy 4.1’s internal Ethernet interface.

The '2X to 3X' figure I mentioned was based on benchmarks like those found in this implementation using an STM32F4 (here the author also used Wiznet ioLibrary to configure the chip), where moving to SPI+DMA boosted throughput from 1.75 Mbps to 4.43 Mbps.

That is a different MCU, and I assumed the Teensy 4.1's high clock speed might make this less of an issue than on a Nucleo. Still, I'm curious: on the Teensy 4.1, is the SPI overhead already so low that DMA wouldn't meaningfully reduce 'dead time', or is there a specific conflict with how the Teensy SPI library handles the W5500's variable-length data phases?
 
Thanks! Yes, it works quite well, but over the last year or two, something changed in the iperf (v2) protocol, so a few features that used to work when I wrote it have changed somehow. Also, the documentation I had to work with was the source code, which kind of sucked.

The unidirectional tests work, and that’s how I’ve been testing the QNEthernet library speeds. But the bidirectional one, for example, doesn’t work anymore (‘-r’ option).

It’s on my to-do list to fix.

In any case, I don’t expect any major changes in order to be used with the regular Ethernet library.
 
May I suggest trying to duplicate their results? That may provide insights on what they’re doing.
Well, for now I am not able to duplicate their result, as I cannot replicate the hardware setup...
That's why I started this topic: I wonder how they managed to reach that throughput (I was unsure whether the result is real and, if so, whether there are any differences in hardware behavior between the Teensy 4.1 and the Nucleo board).
@istrateandrei26 I'm curious which code you used to handle the iperf protocol when you were testing?
Actually, I set up a TCP server by opening a socket in setup() and then poll inside loop(). This is the same classic method I used when testing with the native Ethernet interface.

The only difference is the routine that initializes the W5500-specific configuration.
It is not something custom that I wrote myself; it follows the public examples :)
 
Sorry, what I meant was what protocol did you use? Iperf has its own protocol that needs to be parsed. I was wondering where you got details on parsing the iperf stuff.
 
I started trying to port IPerfServer to the regular Ethernet library for Wiznet. I'm currently stuck on a struct-as-typedef issue, after working around a number of QNEthernet features that the regular Ethernet library lacks.

Here's where it's at so far. Maybe someone better versed in the finer points of C++ knows how to solve the compile errors with the ConnectionState type?

Code:
// SPDX-FileCopyrightText: (c) 2022-2025 Shawn Silverman <shawn@pobox.com>
// SPDX-License-Identifier: AGPL-3.0-or-later

// IPerfServer implements an iPerf server for TCP traffic.
// Useful command: iperf -c <IP address> -i 1 -l 1460
// Other supported options:
// * -C (compatibility)
// * -r (tradeoff)
// * -d (dualtest)
//
// Specifying the -l (len) option with a value of 1460 appears to give
// better results when the server (the Teensy) is sending traffic back
// to the iperf client using the "tradeoff" option.
//
// With this command: `iperf -c <IP address> -i 1 -l 1460 -r`
// it appears that the QNEthernet stack can achieve about 94.9 Mbps in
// both directions. (Note: The `iperf3` command won't work.)
//
// Multiple connections are supported.
//
// This code was inspired by "lwiperf" by Simon Goldschmidt:
// https://git.savannah.nongnu.org/cgit/lwip.git/tree/src?h=STABLE-2_1_3_RELEASE
//
// Other references:
// * Dan Drown's iPerf experiments (June 25, 2020):
//   [Teensy 4.1 ethernet](https://blog.dan.drown.org/teensy-4-1-ethernet/)
//
// This file is part of the QNEthernet library.

// C++ includes
#include <algorithm>
#include <cinttypes>  // PRIu32, PRId32, etc. used in the printf calls
#include <cstdint>
#include <cstdio>     // printf
#include <cstring>
#include <iterator>
#include <utility>
#include <vector>

#include <Ethernet.h>

// -------------------------------------------------------------------
//  Configuration
// -------------------------------------------------------------------

constexpr uint32_t kDHCPTimeout = 15000;  // 15 seconds

constexpr uint16_t kServerPort = 5001;

// The settings are sent after every set of bytes of this size.
constexpr size_t kDefaultRepeatSize = 128 * 1024;  // 128 KiB

// -------------------------------------------------------------------
//  Types
// -------------------------------------------------------------------

enum Flags : uint32_t {
  kVersion1 = 0x80000000,
  kExtend   = 0x40000000,
  kRunNow   = 0x00000001,
};

uint32_t htonl(uint32_t n) { return __builtin_bswap32(n); }
uint32_t ntohl(uint32_t n) { return __builtin_bswap32(n); }
uint16_t htons(uint16_t n) { return __builtin_bswap16(n); }
uint16_t ntohs(uint16_t n) { return __builtin_bswap16(n); }

// v1 header.
struct SettingsV1 {
  uint32_t flags;
  uint32_t numThreads;
  uint32_t port;
  uint32_t bufLen;
  uint32_t winBand;
  int32_t amount;  // Non-negative: bytes
                   // Negative: time (in centiseconds)

  // Fixes the endianness.
  void reorder() {
    flags = ntohl(flags);
    numThreads = ntohl(numThreads);
    port = ntohl(port);
    bufLen = ntohl(bufLen);
    winBand = ntohl(winBand);
    amount = ntohl(amount);
  }
};

// "Extended" header.
struct ExtSettings {
  // Extended fields
  int32_t type;
  int32_t length;
  uint16_t upperFlags;
  uint16_t lowerFlags;
  uint32_t versionUpper;
  uint32_t versionLower;
  uint16_t reserved;
  uint16_t tos;
  uint32_t rateLower;
  uint32_t rateUpper;
  uint32_t tcpWritePrefetch;

  // Fixes the endianness.
  void reorder() {
    type = ntohl(type);
    length = ntohl(length);
    upperFlags = ntohs(upperFlags);
    lowerFlags = ntohs(lowerFlags);
    versionUpper = ntohl(versionUpper);
    versionLower = ntohl(versionLower);
    reserved = ntohs(reserved);
    tos = ntohs(tos);
    rateLower = ntohl(rateLower);
    rateUpper = ntohl(rateUpper);
    tcpWritePrefetch = ntohl(tcpWritePrefetch);
  }
};

// This is the iPerf settings struct sent from the client.
struct Settings {
  SettingsV1 settingsV1;
  ExtSettings extSettings;
};

enum class IOStates {
  kReadSettingsV1,     // First settings
  kReadExtSettings,    // First settings
  kReadBlockSettings,  // Settings in front of a block
  kRead,
  kWrite,  // Clients use this state
};

// Keeps track of state for a single connection.
class ConnectionState {
  public:
  ConnectionState(EthernetClient client, bool isClient)
      : remoteIP{client.remoteIP()},
        remotePort(client.remotePort()),
        client{std::move(client)},
        ioState(isClient ? IOStates::kWrite : IOStates::kReadSettingsV1) {}

  // Put these before the moved client
  IPAddress remoteIP;
  uint16_t remotePort;

  EthernetClient client;
  bool closed = false;

  IOStates ioState;

  Settings settings;
  uint8_t settingsRaw[sizeof(Settings)];  // For raw comparisons
                                          // without having to
                                          // consider ordering

  size_t settingsSize = 0;
  size_t repeatSize = kDefaultRepeatSize;
  size_t byteCount = 0;
  uint32_t startTime = millis();
};
//typedef struct ConnectionState ConnectionState;

// -------------------------------------------------------------------
//  Main Program
// -------------------------------------------------------------------

// Digits buffer.
#define TCP_SND_BUF 2048
uint8_t kDigitsBuf[TCP_SND_BUF + 10];

// Keeps track of what and where belong to whom.
std::vector<ConnectionState> conns;

// The server.
EthernetServer server{kServerPort};

// Other buffers
uint8_t settingsBuf[sizeof(SettingsV1) + sizeof(ExtSettings)];

// Forward declarations
void networkChanged(bool hasIP, bool linkState);
bool connectToClient(ConnectionState& state,
                     std::vector<ConnectionState>& list);
void processConnection(ConnectionState& state,
                       std::vector<ConnectionState>& list);

// Main program setup.
void setup() {
  Serial.begin(115200);
  while (!Serial && (millis() < 4000)) {
    // Wait for Serial
  }
  printf("Starting IPerfServer...\r\n");

  //uint8_t mac[6];
  //Ethernet.macAddress(mac);  // This is informative; it retrieves, not sets
  //printf("MAC = %02x:%02x:%02x:%02x:%02x:%02x\r\n",
  //       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

  // Listen for link changes
  /*Ethernet.onLinkState([](bool state) {
    if (state) {
      printf("[Ethernet] Link ON, %d Mbps, %s duplex\r\n",
             Ethernet.linkSpeed(),
             Ethernet.linkIsFullDuplex() ? "Full" : "Half");
    } else {
      printf("[Ethernet] Link OFF\r\n");
    }
    networkChanged(Ethernet.localIP() != INADDR_NONE, state);
  });*/

  // Listen for address changes
  /*Ethernet.onAddressChanged([]() {
    IPAddress ip = Ethernet.localIP();
    bool hasIP = (ip != INADDR_NONE);
    if (hasIP) {
      IPAddress subnet = Ethernet.subnetMask();
      IPAddress gw = Ethernet.gatewayIP();
      IPAddress dns = Ethernet.dnsServerIP();

      printf("[Ethernet] Address changed:\r\n"
             "    Local IP = %u.%u.%u.%u\r\n"
             "    Subnet   = %u.%u.%u.%u\r\n"
             "    Gateway  = %u.%u.%u.%u\r\n"
             "    DNS      = %u.%u.%u.%u\r\n",
             ip[0], ip[1], ip[2], ip[3],
             subnet[0], subnet[1], subnet[2], subnet[3],
             gw[0], gw[1], gw[2], gw[3],
             dns[0], dns[1], dns[2], dns[3]);
    } else {
      printf("[Ethernet] Address changed: No IP address\r\n");
    }

    // Tell interested parties the network state, for example, servers,
    // SNTP clients, and other sub-programs that need to know whether
    // to stop/start/restart/etc
    networkChanged(hasIP, Ethernet.linkState());
  });*/

  printf("Starting Ethernet with DHCP...\r\n");
  byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
  if (!Ethernet.begin(mac)) {
    printf("Failed to start Ethernet\r\n");
    return;
  }

  // We don't really need to do the following because the
  // address-changed listener will notify us
  // printf("Waiting for local IP...\r\n");
  // if (!Ethernet.waitForLocalIP(kDHCPTimeout)) {
  //   printf("Failed to get IP address from DHCP\r\n");
  //   return;
  // }

  // Initialize the digits buffer
  for (size_t i = 0; i < sizeof(kDigitsBuf); ++i) {
    kDigitsBuf[i] = (i % 10) + '0';
  }
}

// The address or link has changed. For example, a DHCP address arrived.
void networkChanged(bool hasIP, bool linkState) {
  if (!hasIP || !linkState) {
    return;
  }

  // Start the server and keep it up
  if (!server) {
    printf("Starting server on port %u...", kServerPort);
    fflush(stdout);  // Print what we have so far if line buffered
    server.begin();
    printf("%s\r\n", server ? "done." : "FAILED!");
  }
}


static inline bool isExtended(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kExtend)) != 0));
}

static inline bool isV1(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kVersion1)) != 0));
}

static inline bool isRunNow(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kRunNow)) != 0));
}

static inline bool isClient(const ConnectionState& s) {
  return s.ioState == IOStates::kWrite;
}


// Main program loop.
void loop() {
  EthernetClient client = server.accept();
  if (client) {
    // We got a connection!
    IPAddress ip = client.remoteIP();
    uint16_t port = client.remotePort();
    conns.emplace_back(std::move(client), false);
    printf("Connected: %u.%u.%u.%u:%u\r\n", ip[0], ip[1], ip[2], ip[3], port);
    printf("Connection count: %zu\r\n", conns.size());
  }

  std::vector<ConnectionState> list;  // Add new connections to here

  // Process data from each client
  for (ConnectionState& state : conns) {  // Use a reference so we don't copy
    if (!state.client.connected()) {
      printf("Disconnected: %u.%u.%u.%u:%u\r\n",
             state.remoteIP[0],
             state.remoteIP[1],
             state.remoteIP[2],
             state.remoteIP[3],
             state.remotePort);
      // First check to see if we need to open a connection back to
      // the client
      if (isV1(state) && !isRunNow(state)) {
        if (!isClient(state)) {
          connectToClient(state, list);
        }
      }
      state.closed = true;
      continue;
    }

    processConnection(state, list);
  }

  if (!list.empty()) {
    conns.insert(conns.end(),
                 std::make_move_iterator(list.begin()),
                 std::make_move_iterator(list.end()));
    list.clear();
  }

  // Clean up all the closed clients
  size_t size = conns.size();
  conns.erase(
      std::remove_if(conns.begin(), conns.end(),
                     [](const ConnectionState& state) { return state.closed; }),
      conns.end());
  if (conns.size() != size) {
    printf("Connection count: %zu\r\n", conns.size());
  }
}

// Connects back to the client and returns whether the connection was
// successful. This adds any new connection to the given list.
bool connectToClient(ConnectionState& state,
                     std::vector<ConnectionState>& list) {
  printf("Connecting back to client: %u.%u.%u.%u:%" PRIu32 "...",
         state.remoteIP[0],
         state.remoteIP[1],
         state.remoteIP[2],
         state.remoteIP[3],
         state.settings.settingsV1.port);

  EthernetClient client;
  if (!client.connect(state.remoteIP, state.settings.settingsV1.port)) {
    printf("FAILED.\r\n");
    return false;
  }
  printf("done.\r\n");

  list.emplace_back(std::move(client), true);
  ConnectionState& newState = list[list.size() - 1];
  newState.settings = state.settings;
  std::memcpy(newState.settingsRaw, state.settingsRaw, state.settingsSize);
  newState.settingsSize = state.settingsSize;
  newState.repeatSize = state.repeatSize;

  return true;
}

// Sends data until it can't fill the buffer.
void send(ConnectionState& state) {
  while (true) {
    if (state.settings.settingsV1.amount < 0) {
      // The session is time-limited
      uint32_t diff = millis() - state.startTime;
      uint32_t time = -state.settings.settingsV1.amount * 10;
          // Convert to milliseconds (from centiseconds)
      if (diff >= time) {
        printf("Closing client (time): %u.%u.%u.%u:%u\r\n",
               state.remoteIP[0],
               state.remoteIP[1],
               state.remoteIP[2],
               state.remoteIP[3],
               state.remotePort);
        state.client.close();
        return;
      }
    } else {
      // The session is byte-limited
      if (state.byteCount >=
          static_cast<size_t>(state.settings.settingsV1.amount)) {
        printf("Closing client (bytes): %u.%u.%u.%u:%u\r\n",
               state.remoteIP[0],
               state.remoteIP[1],
               state.remoteIP[2],
               state.remoteIP[3],
               state.remotePort);
        state.client.close();
        return;
      }
    }

    const uint8_t* buf;
    size_t len;

    int avail = state.client.availableForWrite();
    if (avail <= 0) {
      return;
    }
    size_t already;
    size_t size = std::min(state.settingsSize, state.repeatSize);
    if (state.byteCount < state.settingsSize) {
      already = state.byteCount;
    } else {
      already = (state.byteCount - state.settingsSize)%state.repeatSize;
    }
    if (already < size) {
      buf = &state.settingsRaw[already];
      len = size - already;
    } else {
      buf = &kDigitsBuf[already % 10];
      len = state.repeatSize - already;
    }
    len = std::min(len, static_cast<size_t>(avail));

    state.client.write(buf, len);
    state.byteCount += len;
  }
}

// Compares a signed available value with an unsigned size. This returns -1 if
// avail < size, zero if avail == size, or 1 if avail > size.
static int compareAvail(int avail, size_t size) {
  size_t a = static_cast<size_t>(avail);
  if ((avail < 0) || (a < size)) {
    return -1;
  }
  if (a == size) {
    return 0;
  }
  return 1;
}

// Processes data from a single connection. This adds any new
// connections to the given list.
void processConnection(ConnectionState& state,
                       std::vector<ConnectionState>& list) {
  while (true) {
    switch (state.ioState) {
      case IOStates::kReadSettingsV1: {
        if (compareAvail(state.client.available(), sizeof(SettingsV1)) < 0) {
          return;
        }

        // Read a SettingsV1
        SettingsV1 s;
        state.client.read(state.settingsRaw, sizeof(SettingsV1));
        state.byteCount += sizeof(SettingsV1);
        std::memcpy(&s, state.settingsRaw, sizeof(SettingsV1));
        s.reorder();

        // Set up the state
        if (s.flags == 0x30313233) {
          state.settingsSize = 0;
          state.ioState = IOStates::kRead;
          printf("%u.%u.%u.%u:%u: Older version\r\n",
                 state.remoteIP[0],
                 state.remoteIP[1],
                 state.remoteIP[2],
                 state.remoteIP[3],
                 state.remotePort);
        } else {
          state.settings.settingsV1 = s;
          state.settingsSize = sizeof(SettingsV1);
          if (isExtended(state)) {
            state.settingsSize += sizeof(ExtSettings);
            state.ioState = IOStates::kReadExtSettings;
          } else {
            state.ioState = IOStates::kReadBlockSettings;
          }
          state.repeatSize = state.settings.settingsV1.bufLen;
          if (state.repeatSize == 0) {
            state.repeatSize = kDefaultRepeatSize;
          }
          if (isV1(state) && isRunNow(state)) {
            connectToClient(state, list);
          }

          printf("%u.%u.%u.%u:%u: Settings:\r\n"
                "    flags=0x%08" PRIx32 "\r\n"
                "    numThreads=%" PRIu32 "\r\n"
                "    port=%" PRIu32 "\r\n"
                "    bufLen=%" PRIu32 "\r\n"
                "    winBand=%" PRIu32 "\r\n"
                "    amount=%" PRId32 "\r\n",
                state.remoteIP[0],
                state.remoteIP[1],
                state.remoteIP[2],
                state.remoteIP[3],
                state.remotePort,
                s.flags, s.numThreads, s.port, s.bufLen, s.winBand, s.amount);
        }

        break;
      }

      case IOStates::kReadExtSettings: {
        if (compareAvail(state.client.available(), sizeof(ExtSettings)) < 0) {
          return;
        }

        // Read an ExtSettings
        ExtSettings s;
        state.client.read(state.settingsRaw + sizeof(SettingsV1),
                          sizeof(ExtSettings));
        state.byteCount += sizeof(ExtSettings);
        std::memcpy(&s, state.settingsRaw + sizeof(SettingsV1),
                    sizeof(ExtSettings));
        s.reorder();

        // Do more setup
        state.settings.extSettings = s;

        printf("%u.%u.%u.%u:%u: ExtSettings:\r\n"
              "    type=%" PRId32 "\r\n"
              "    length=%" PRId32 "\r\n"
              "    flags=0x%04" PRIx16 "%04" PRIx16 "\r\n"
              "    version=%u.%u.%u.%u\r\n"
              "    rate=%" PRIu64 "\r\n"
              "    tcpWritePrefetch=%" PRIu32 "\r\n",
              state.remoteIP[0],
              state.remoteIP[1],
              state.remoteIP[2],
              state.remoteIP[3],
              state.remotePort,
              s.type, s.length, s.upperFlags, s.lowerFlags,
              static_cast<uint16_t>(s.versionUpper >> 16),
              static_cast<uint16_t>(s.versionUpper),
              static_cast<uint16_t>(s.versionLower >> 16),
              static_cast<uint16_t>(s.versionLower),
              (static_cast<uint64_t>(s.rateUpper >> 8) << 32) |
                  static_cast<uint64_t>(s.rateLower),
              s.tcpWritePrefetch);

        state.ioState = IOStates::kReadBlockSettings;

        break;
      }

      case IOStates::kReadBlockSettings: {
        size_t size = std::min(state.settingsSize, state.repeatSize);
        if (compareAvail(state.client.available(), size) < 0) {
          return;
        }

        // Read settings
        state.client.read(settingsBuf, size);
        state.byteCount += size;

        // The iperf code is hard to understand, so I'm unclear how to
        // handle settings mismatch; comment the following out for now:
        // // Compare with the existing settings
        // if (std::memcmp(settingsBuf, state.settingsRaw, size) != 0) {
        //   printf("%u.%u.%u.%u:%u: Settings error: bytes=%zu\r\n",
        //          state.remoteIP[0],
        //          state.remoteIP[1],
        //          state.remoteIP[2],
        //          state.remoteIP[3],
        //          state.remotePort,
        //          state.byteCount);
        //   state.client.close();
        //   return;
        // }
        // What we see instead: 4-byte zero flags followed by ASCII digits

        if (size != state.repeatSize) {  // Stay here otherwise
          state.ioState = IOStates::kRead;
        }
        break;
      }

      case IOStates::kRead: {
        // Assume: byteCount >= settingsSize

        int avail = state.client.available();
        if (avail <= 0) {
          return;
        }
        if (state.settingsSize > 0) {
          size_t rem = state.repeatSize -
                       (state.byteCount - state.settingsSize)%state.repeatSize;
          if (rem == state.repeatSize) {
            // Edge case where settingsSize >= repeatSize
            state.ioState = IOStates::kReadBlockSettings;
            break;
          }
          if (static_cast<size_t>(avail) >= rem) {
            avail = rem;
            state.ioState = IOStates::kReadBlockSettings;
          }
        }

        state.client.read(nullptr, avail);  // Skip
        state.byteCount += avail;
        break;
      }

      case IOStates::kWrite:
        send(state);
        return;
    }
  }
}
 
I got it compiling. There were two issues:
1. ntohl() et al. may already be defined somewhere, so I wrapped those in #ifndef/#endif pairs.
2. I don't quite understand why yet. It is probably related to the occasional weirdness of Arduino.h's internal #defines, includes, reordering, or forward declarations affecting things further down (e.g., when you include <random>, you need "#undef round" and "#undef abs" before the build succeeds). In any case, I moved the static utility functions into an unnamed namespace, and that seemed to fix it, perhaps because it adds another block? With much sarcasm: perhaps Arduino.h redefines 'static' somewhere or does something weird with it. (I wouldn't be surprised.) (Or maybe it's because of how Arduino automatically adds headers and/or forward declarations or reorders things… maybe?) Note: unnamed namespaces provide internal linkage, much like static; it's an alternative form.

Here's code that compiles but is not tested:
C++:
// SPDX-FileCopyrightText: (c) 2022-2026 Shawn Silverman <shawn@pobox.com>
// SPDX-License-Identifier: AGPL-3.0-or-later

// IPerfServerEthernet implements an iPerf server for TCP traffic using the
// standard Ethernet library.
//
// Useful command: iperf -c <IP address> -i 1 -l 1460
// Other supported options:
// * -C (compatibility)
// * -r (tradeoff)
// * -d (dualtest)
//
// Specifying the -l (len) option with a value of 1460 appears to give
// better results when the server (the Teensy) is sending traffic back
// to the iperf client using the "tradeoff" option.
//
// With this command: `iperf -c <IP address> -i 1 -l 1460 -r`
// it appears that the QNEthernet stack can achieve about 94.9 Mbps in
// both directions. (Note: The `iperf3` command won't work.)
//
// Multiple connections are supported.
//
// This code was inspired by "lwiperf" by Simon Goldschmidt:
// https://git.savannah.nongnu.org/cgit/lwip.git/tree/src?h=STABLE-2_1_3_RELEASE
//
// Other references:
// * Dan Drown's iPerf experiments (June 25, 2020):
//   [Teensy 4.1 ethernet](https://blog.dan.drown.org/teensy-4-1-ethernet/)
//
// This file is part of the QNEthernet library.

// C++ includes
#include <algorithm>
#include <cinttypes>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <utility>
#include <vector>

#include <Ethernet.h>

// -------------------------------------------------------------------
//  Configuration
// -------------------------------------------------------------------

constexpr uint32_t kDHCPTimeout = 15000;  // 15 seconds

constexpr uint16_t kServerPort = 5001;

// The settings are sent after every set of bytes of this size.
constexpr size_t kDefaultRepeatSize = 128 * 1024;  // 128 KiB

// -------------------------------------------------------------------
//  Utilities
// -------------------------------------------------------------------

namespace {

#ifndef ntohl
// Possibly reverses 4 bytes.
uint32_t ntohl(const uint32_t v) {
  if constexpr (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) {
    return __builtin_bswap32(v);
  } else {
    return v;
  }
}
#endif  // !ntohl

#ifndef ntohs
// Possibly reverses 2 bytes.
uint16_t ntohs(const uint16_t v) {
  if constexpr (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) {
    return __builtin_bswap16(v);
  } else {
    return v;
  }
}
#endif  // !ntohs

// Gets the Teensy 4's MAC address and fills in the given array.
void getMACAddress(uint8_t* const mac) {
  const uint32_t m1 = HW_OCOTP_MAC1;
  const uint32_t m2 = HW_OCOTP_MAC0;
  mac[0] = static_cast<uint8_t>(m1 >>  8);
  mac[1] = static_cast<uint8_t>(m1 >>  0);
  mac[2] = static_cast<uint8_t>(m2 >> 24);
  mac[3] = static_cast<uint8_t>(m2 >> 16);
  mac[4] = static_cast<uint8_t>(m2 >>  8);
  mac[5] = static_cast<uint8_t>(m2 >>  0);
}

}  // namespace

// -------------------------------------------------------------------
//  Types
// -------------------------------------------------------------------

enum Flags : uint32_t {
  kVersion1 = 0x80000000,
  kExtend   = 0x40000000,
  kRunNow   = 0x00000001,
};

// v1 header.
struct [[gnu::packed]] SettingsV1 {
  uint32_t flags;
  uint32_t numThreads;
  uint32_t port;
  uint32_t bufLen;
  uint32_t winBand;
  int32_t amount;  // Non-negative: bytes
                   // Negative: time (in centiseconds)

  // Fixes the endianness.
  void reorder() {
    flags = ntohl(flags);
    numThreads = ntohl(numThreads);
    port = ntohl(port);
    bufLen = ntohl(bufLen);
    winBand = ntohl(winBand);
    amount = ntohl(amount);
  }
};

// "Extended" header.
struct [[gnu::packed]] ExtSettings {
  // Extended fields
  int32_t type;
  int32_t length;
  uint16_t upperFlags;
  uint16_t lowerFlags;
  uint32_t versionUpper;
  uint32_t versionLower;
  uint16_t reserved;
  uint16_t tos;
  uint32_t rateLower;
  uint32_t rateUpper;
  uint32_t tcpWritePrefetch;

  // Fixes the endianness.
  void reorder() {
    type = ntohl(type);
    length = ntohl(length);
    upperFlags = ntohs(upperFlags);
    lowerFlags = ntohs(lowerFlags);
    versionUpper = ntohl(versionUpper);
    versionLower = ntohl(versionLower);
    reserved = ntohs(reserved);
    tos = ntohs(tos);
    rateLower = ntohl(rateLower);
    rateUpper = ntohl(rateUpper);
    tcpWritePrefetch = ntohl(tcpWritePrefetch);
  }
};

// This is the iPerf settings struct sent from the client.
struct [[gnu::packed]] Settings {
  SettingsV1 settingsV1;
  ExtSettings extSettings;
};

enum class IOStates {
  kReadSettingsV1,     // First settings
  kReadExtSettings,    // First settings
  kReadBlockSettings,  // Settings in front of a block
  kRead,
  kWrite,  // Clients use this state
};

// Keeps track of state for a single connection.
struct ConnectionState {
  ConnectionState(EthernetClient client, bool isClient)
      : remoteIP{client.remoteIP()},
        remotePort(client.remotePort()),
        client{client},
        ioState(isClient ? IOStates::kWrite : IOStates::kReadSettingsV1) {}

  // Put these before the moved client
  IPAddress remoteIP;
  uint16_t remotePort;

  EthernetClient client;
  bool closed = false;

  IOStates ioState;

  Settings settings;
  uint8_t settingsRaw[sizeof(Settings)];  // For raw comparisons
                                          // without having to
                                          // consider ordering

  size_t settingsSize = 0;
  size_t repeatSize = kDefaultRepeatSize;
  size_t byteCount = 0;
  uint32_t startTime = millis();
};

// -------------------------------------------------------------------
//  Main Program
// -------------------------------------------------------------------

// Digits buffer.
// Note: TCP sender buffer size is chosen as 4*MSS, where MSS is the
//       maximum segment size, and the MTU is assumed to be 1500
uint8_t kDigitsBuf[4*(1500 - 40) + 10];

// Keeps track of what and where belong to whom.
std::vector<ConnectionState> conns;

// The server.
EthernetServer server{kServerPort};

// Other buffers
uint8_t settingsBuf[sizeof(SettingsV1) + sizeof(ExtSettings)];

// Forward declarations
bool connectToClient(ConnectionState& state,
                     std::vector<ConnectionState>& list);
void processConnection(ConnectionState& state,
                       std::vector<ConnectionState>& list);

// Main program setup.
void setup() {
  Serial.begin(115200);
  while (!Serial && (millis() < 4000)) {
    // Wait for Serial
  }
  printf("Starting IPerfServer...\r\n");

  uint8_t mac[6];
  getMACAddress(mac);
  printf("MAC = %02x:%02x:%02x:%02x:%02x:%02x\r\n",
         mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

  printf("Starting Ethernet with DHCP...\r\n");
  if (!Ethernet.begin(mac)) {
    printf("Failed to start Ethernet\r\n");
    return;
  }

  printf("Waiting for local IP...\r\n");
  elapsedMillis timer;
  while ((Ethernet.localIP() == INADDR_NONE) && (timer < kDHCPTimeout)) {
    // Wait
  }
  if (Ethernet.localIP() == INADDR_NONE) {
    printf("Failed to get IP address from DHCP\r\n");
    return;
  }

  // Initialize the digits buffer
  for (size_t i = 0; i < sizeof(kDigitsBuf); ++i) {
    kDigitsBuf[i] = (i % 10) + '0';
  }

  // Start the server and keep it up
  printf("Starting server on port %u...", kServerPort);
  fflush(stdout);  // Print what we have so far if line buffered
  server.begin();
  printf("%s\r\n", server ? "done." : "FAILED!");
}

namespace {

bool isExtended(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kExtend)) != 0));
}

bool isV1(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kVersion1)) != 0));
}

bool isRunNow(const ConnectionState& s) {
  return (s.settingsSize > 0) &&
         (((s.settings.settingsV1.flags &
            static_cast<uint32_t>(Flags::kRunNow)) != 0));
}

bool isClient(const ConnectionState& s) {
  return s.ioState == IOStates::kWrite;
}

}  // namespace

// Main program loop.
void loop() {
  EthernetClient client = server.accept();
  if (client) {
    // We got a connection!
    IPAddress ip = client.remoteIP();
    uint16_t port = client.remotePort();
    conns.emplace_back(std::move(client), false);
    printf("Connected: %u.%u.%u.%u:%u\r\n", ip[0], ip[1], ip[2], ip[3], port);
    printf("Connection count: %zu\r\n", conns.size());
  }

  std::vector<ConnectionState> list;  // Add new connections to here

  // Process data from each client
  for (ConnectionState& state : conns) {  // Use a reference so we don't copy
    if (!state.client.connected()) {
      printf("Disconnected: %u.%u.%u.%u:%u\r\n",
             state.remoteIP[0],
             state.remoteIP[1],
             state.remoteIP[2],
             state.remoteIP[3],
             state.remotePort);
      // First check to see if we need to open a connection back to
      // the client
      if (isV1(state) && !isRunNow(state)) {
        if (!isClient(state)) {
          connectToClient(state, list);
        }
      }
      state.closed = true;
      continue;
    }

    processConnection(state, list);
  }

  if (!list.empty()) {
    conns.insert(conns.cend(),
                 std::make_move_iterator(list.begin()),
                 std::make_move_iterator(list.end()));
    list.clear();
  }

  // Clean up all the closed clients
  size_t size = conns.size();
  conns.erase(
      std::remove_if(conns.begin(), conns.end(),
                     [](const ConnectionState& state) { return state.closed; }),
      conns.cend());
  if (conns.size() != size) {
    printf("Connection count: %zu\r\n", conns.size());
  }
}

// Connects back to the client and returns whether the connection was
// successful. This adds any new connection to the given list.
bool connectToClient(ConnectionState& state,
                     std::vector<ConnectionState>& list) {
  printf("Connecting back to client: %u.%u.%u.%u:%" PRIu32 "...",
         state.remoteIP[0],
         state.remoteIP[1],
         state.remoteIP[2],
         state.remoteIP[3],
         state.settings.settingsV1.port);

  EthernetClient client;
  if (!client.connect(state.remoteIP, state.settings.settingsV1.port)) {
    printf("FAILED.\r\n");
    return false;
  }
  printf("done.\r\n");

  list.emplace_back(std::move(client), true);
  ConnectionState& newState = list[list.size() - 1];
  newState.settings = state.settings;
  std::memcpy(newState.settingsRaw, state.settingsRaw, state.settingsSize);
  newState.settingsSize = state.settingsSize;
  newState.repeatSize = state.repeatSize;

  return true;
}

// Sends data until it can't fill the buffer.
void send(ConnectionState& state) {
  while (true) {
    if (state.settings.settingsV1.amount < 0) {
      // The session is time-limited
      uint32_t diff = millis() - state.startTime;
      uint32_t time = -state.settings.settingsV1.amount * 10;
          // Convert to milliseconds (from centiseconds)
      if (diff >= time) {
        printf("Closing client (time): %u.%u.%u.%u:%u\r\n",
               state.remoteIP[0],
               state.remoteIP[1],
               state.remoteIP[2],
               state.remoteIP[3],
               state.remotePort);
        state.client.stop();
        return;
      }
    } else {
      // The session is byte-limited
      if (state.byteCount >=
          static_cast<size_t>(state.settings.settingsV1.amount)) {
        printf("Closing client (bytes): %u.%u.%u.%u:%u\r\n",
               state.remoteIP[0],
               state.remoteIP[1],
               state.remoteIP[2],
               state.remoteIP[3],
               state.remotePort);
        state.client.stop();
        return;
      }
    }

    const uint8_t* buf;
    size_t len;

    int avail = state.client.availableForWrite();
    if (avail <= 0) {
      return;
    }
    size_t already;
    size_t size = std::min(state.settingsSize, state.repeatSize);
    if (state.byteCount < state.settingsSize) {
      already = state.byteCount;
    } else {
      already = (state.byteCount - state.settingsSize)%state.repeatSize;
    }
    if (already < size) {
      buf = &state.settingsRaw[already];
      len = size - already;
    } else {
      buf = &kDigitsBuf[already % 10];
      len = state.repeatSize - already;
    }
    len = std::min(len, static_cast<size_t>(avail));

    state.client.write(buf, len);
    state.byteCount += len;
  }
}

// Compares a signed available value with an unsigned size. This returns -1 if
// avail < size, zero if avail == size, or 1 if avail > size.
static int compareAvail(int avail, size_t size) {
  size_t a = static_cast<size_t>(avail);
  if ((avail < 0) || (a < size)) {
    return -1;
  }
  if (a == size) {
    return 0;
  }
  return 1;
}

// Processes data from a single connection. This adds any new
// connections to the given list.
void processConnection(ConnectionState& state,
                       std::vector<ConnectionState>& list) {
  while (true) {
    switch (state.ioState) {
      case IOStates::kReadSettingsV1: {
        if (compareAvail(state.client.available(), sizeof(SettingsV1)) < 0) {
          return;
        }

        // Read a SettingsV1
        SettingsV1 s;
        state.client.read(state.settingsRaw, sizeof(SettingsV1));
        state.byteCount += sizeof(SettingsV1);
        std::memcpy(&s, state.settingsRaw, sizeof(SettingsV1));
        s.reorder();

        // Set up the state
        if (s.flags == 0x30313233) {
          state.settingsSize = 0;
          state.ioState = IOStates::kRead;
          printf("%u.%u.%u.%u:%u: Older version\r\n",
                 state.remoteIP[0],
                 state.remoteIP[1],
                 state.remoteIP[2],
                 state.remoteIP[3],
                 state.remotePort);
        } else {
          state.settings.settingsV1 = s;
          state.settingsSize = sizeof(SettingsV1);
          if (isExtended(state)) {
            state.settingsSize += sizeof(ExtSettings);
            state.ioState = IOStates::kReadExtSettings;
          } else {
            state.ioState = IOStates::kReadBlockSettings;
          }
          state.repeatSize = state.settings.settingsV1.bufLen;
          if (state.repeatSize == 0) {
            state.repeatSize = kDefaultRepeatSize;
          }
          if (isV1(state) && isRunNow(state)) {
            connectToClient(state, list);
          }

          printf("%u.%u.%u.%u:%u: Settings:\r\n"
                "    flags=0x%08" PRIx32 "\r\n"
                "    numThreads=%" PRIu32 "\r\n"
                "    port=%" PRIu32 "\r\n"
                "    bufLen=%" PRIu32 "\r\n"
                "    winBand=%" PRIu32 "\r\n"
                "    amount=%" PRId32 "\r\n",
                state.remoteIP[0],
                state.remoteIP[1],
                state.remoteIP[2],
                state.remoteIP[3],
                state.remotePort,
                s.flags, s.numThreads, s.port, s.bufLen, s.winBand, s.amount);
        }

        break;
      }

      case IOStates::kReadExtSettings: {
        if (compareAvail(state.client.available(), sizeof(ExtSettings)) < 0) {
          return;
        }

        // Read an ExtSettings
        ExtSettings s;
        state.client.read(state.settingsRaw + sizeof(SettingsV1),
                          sizeof(ExtSettings));
        state.byteCount += sizeof(ExtSettings);
        std::memcpy(&s, state.settingsRaw + sizeof(SettingsV1),
                    sizeof(ExtSettings));
        s.reorder();

        // Do more setup
        state.settings.extSettings = s;

        printf("%u.%u.%u.%u:%u: ExtSettings:\r\n"
              "    type=%" PRId32 "\r\n"
              "    length=%" PRId32 "\r\n"
              "    flags=0x%04" PRIx16 "%04" PRIx16 "\r\n"
              "    version=%u.%u.%u.%u\r\n"
              "    rate=%" PRIu64 "\r\n"
              "    tcpWritePrefetch=%" PRIu32 "\r\n",
              state.remoteIP[0],
              state.remoteIP[1],
              state.remoteIP[2],
              state.remoteIP[3],
              state.remotePort,
              s.type, s.length, s.upperFlags, s.lowerFlags,
              static_cast<uint16_t>(s.versionUpper >> 16),
              static_cast<uint16_t>(s.versionUpper),
              static_cast<uint16_t>(s.versionLower >> 16),
              static_cast<uint16_t>(s.versionLower),
              (static_cast<uint64_t>(s.rateUpper >> 8) << 32) |
                  static_cast<uint64_t>(s.rateLower),
              s.tcpWritePrefetch);

        state.ioState = IOStates::kReadBlockSettings;

        break;
      }

      case IOStates::kReadBlockSettings: {
        size_t size = std::min(state.settingsSize, state.repeatSize);
        if (compareAvail(state.client.available(), size) < 0) {
          return;
        }

        // Read settings
        state.client.read(settingsBuf, size);
        state.byteCount += size;

        // The iperf code is hard to understand, so I'm unclear how to
        // handle settings mismatch; comment the following out for now:
        // // Compare with the existing settings
        // if (std::memcmp(settingsBuf, state.settingsRaw, size) != 0) {
        //   printf("%u.%u.%u.%u:%u: Settings error: bytes=%zu\r\n",
        //          state.remoteIP[0],
        //          state.remoteIP[1],
        //          state.remoteIP[2],
        //          state.remoteIP[3],
        //          state.remotePort,
        //          state.byteCount);
        //   state.client.stop();
        //   return;
        // }
        // What we see instead: 4-byte zero flags followed by ASCII digits

        if (size != state.repeatSize) {  // Stay here otherwise
          state.ioState = IOStates::kRead;
        }
        break;
      }

      case IOStates::kRead: {
        // Assume: byteCount >= settingsSize

        int avail = state.client.available();
        if (avail <= 0) {
          return;
        }
        if (state.settingsSize > 0) {
          size_t rem = state.repeatSize -
                       (state.byteCount - state.settingsSize)%state.repeatSize;
          if (rem == state.repeatSize) {
            // Edge case where settingsSize >= repeatSize
            state.ioState = IOStates::kReadBlockSettings;
            break;
          }
          if (static_cast<size_t>(avail) >= rem) {
            avail = rem;
            state.ioState = IOStates::kReadBlockSettings;
          }
        }

        state.client.read(nullptr, avail);  // Skip
        state.byteCount += avail;
        break;
      }

      case IOStates::kWrite:
        send(state);
        return;
    }
  }
}
 
Just another reminder that something changed with the iperf 2 protocol so, for example, the “-r” feature doesn’t work anymore. It’s on my to-do list to figure out.
 
Thanks! I'm not so familiar with those finer points of C++. This really helps.

I added just a couple lines to print the IP number, and then I ran it here with a Teensy 4.1 and W5500. Something isn't working quite right, but it seems like we're close. The test starts but something stalls and no more data flows. Will investigate soon.

Here's what I see on the Arduino Serial Monitor:

Code:
Starting IPerfServer...
MAC = 04:e9:e5:0c:12:3c
Starting Ethernet with DHCP...
Waiting for local IP...
My IP number is 192.168.195.248
Starting server on port 5001...done.
Connected: 192.168.194.2:40542
Connection count: 1
192.168.194.2:40542: Settings:
    flags=0x40010080
    numThreads=1
    port=5001
    bufLen=1460
    winBand=0
    amount=-1000
192.168.194.2:40542: ExtSettings:
    type=0
    length=0
    flags=0x00000000
    version=2.2.1.1
    rate=0
    tcpWritePrefetch=0

And here's what iperf gives:

Code:
> iperf -c 192.168.195.248 -i 1 -l 1460
------------------------------------------------------------
Client connecting to 192.168.195.248, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.194.2 port 40542 connected with 192.168.195.248 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-1.00 sec  47.5 KBytes   389 Kbits/sec
[  1] 1.00-2.00 sec   588 Bytes  4.70 Kbits/sec
[  1] 2.00-3.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 3.00-4.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 4.00-5.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 5.00-6.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 6.00-7.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 7.00-8.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 8.00-9.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 9.00-10.00 sec  0.000 Bytes  0.000 bits/sec
[  1] 0.00-20.51 sec  48.1 KBytes  19.2 Kbits/sec
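
For reference, the settings in the serial log can be decoded with the sketch's own Flags values and its rule that a negative amount is a time in centiseconds. This is only a minimal sketch: `Decoded` and `describe` are hypothetical names, and the remaining bits of flags (0x00010080) are not defined in the sketch, so they are left uninterpreted here.

```cpp
#include <cstdint>

// Flag bits as defined in the sketch's Flags enum
constexpr uint32_t kVersion1 = 0x80000000;
constexpr uint32_t kExtend   = 0x40000000;
constexpr uint32_t kRunNow   = 0x00000001;

struct Decoded {
  bool version1;
  bool extended;
  bool runNow;
  bool timeLimited;     // true: limit is a duration; false: a byte count
  uint32_t durationMs;  // Only meaningful if timeLimited is true
};

// Decodes the already byte-swapped `flags` and `amount` fields.
Decoded describe(const uint32_t flags, const int32_t amount) {
  Decoded d{};
  d.version1 = (flags & kVersion1) != 0;
  d.extended = (flags & kExtend) != 0;
  d.runNow = (flags & kRunNow) != 0;
  d.timeLimited = (amount < 0);
  if (d.timeLimited) {
    // Negative amounts are centiseconds; multiply by 10 for milliseconds
    d.durationMs = static_cast<uint32_t>(-static_cast<int64_t>(amount)) * 10;
  }
  return d;
}
```

For the log above, `describe(0x40010080, -1000)` reports that extended settings are present, that neither the version-1 nor the run-now bit is set, and that the session is time-limited to 10000 ms, which matches the ten one-second intervals iperf printed.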
 
I can have a look more tomorrow. Do you feel like trying out the original version with QNEthernet plus its W5500 driver? You should only need to uncomment “QNETHERNET_DRIVER_W5500” in the qnethernet_opts.h file.
 
I looked briefly at the Nucleo-F412 benchmark code. Looks like they're just receiving all incoming data without any parsing of iperf's info.

So I tried to create a similar very simple benchmark. I also added a Mbit/sec bandwidth print every second from the Teensy side.

Here's the code:

Code:
#include <Ethernet.h>

EthernetServer server(5001);
uint8_t rxbuf[16384];
static void getMACAddress(uint8_t* const mac);

void setup() {
  Serial.begin(115200);
  while (!Serial && (millis() < 4000)) ; // wait for serial monitor

  Serial.println("Simple iperf receive");
  uint8_t mac[6];
  getMACAddress(mac);
  if (!Ethernet.begin(mac)) {
    printf("Failed to start Ethernet\r\n");
    while (1) ; // stop here
  }
  Serial.println("Ready to receive, use command:");
  Serial.print("  iperf -i 1 -c ");
  Serial.println(Ethernet.localIP());
}

void loop() {
  EthernetClient client = server.available();
  if (client.connected()) {
    Serial.print("iperf test begin, from ");
    Serial.println(client.remoteIP());
    int sum = 0;
    elapsedMillis msec = 0;
    while (client.connected()) {
      int avail = client.available();
      if (avail < 0) break;
      if (avail == 0) continue;
      if (avail > (int)sizeof(rxbuf)) avail = sizeof(rxbuf);
      int num = client.read(rxbuf, avail);
      if (num > 0) {
        sum = sum + num;
        uint32_t ms = msec;
        if (ms > 1000) {
          Serial.print((float)sum * (float)(8.0 / 1000000.0));
          Serial.println(" Mbits/sec");
          msec -= 1000;
          sum = 0;
        }
      }
    }
    Serial.println("host disconnected");
    client.stop();
    Serial.println();
  }
}

// Gets the Teensy 4's MAC address and fills in the given array.
static void getMACAddress(uint8_t* const mac) {
  const uint32_t m1 = HW_OCOTP_MAC1;
  const uint32_t m2 = HW_OCOTP_MAC0;
  mac[0] = static_cast<uint8_t>(m1 >>  8);
  mac[1] = static_cast<uint8_t>(m1 >>  0);
  mac[2] = static_cast<uint8_t>(m2 >> 24);
  mac[3] = static_cast<uint8_t>(m2 >> 16);
  mac[4] = static_cast<uint8_t>(m2 >>  8);
  mac[5] = static_cast<uint8_t>(m2 >>  0);
}
 
I get 11.96 Mbit/sec with the Ethernet.h default of 14 MHz SPI.

I get 26.26 Mbit/sec after editing libraries/Ethernet/src/utility/w5100.h for 30 MHz SPI.

These tests were done from my Linux desktop communicating over my LAN, which uses two Ethernet switches. I used a Teensy 4.1 and a W5500 adaptor. This uses only the Ethernet.h library that shipped with the Teensy software. Just copy the code from msg #23 into the Arduino IDE and run iperf on your PC to reproduce this test.
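
As a rough sanity check on these numbers: plain SPI moves one bit per clock, so the clock frequency is an upper bound on throughput, and the ratio of measured throughput to clock shows how much of the bus time ends up carrying payload. Here is a small sketch of that arithmetic. The function names are made up, and the extrapolation to 43 Mbit/sec assumes the efficiency stays the same at higher clocks, which has not been measured here.

```cpp
// Fraction of the raw SPI bit rate that ends up as measured TCP payload.
// One SPI clock moves one bit, so MHz and Mbit/sec are directly comparable.
constexpr double spiEfficiency(const double measuredMbps, const double spiMHz) {
  return measuredMbps / spiMHz;
}

// SPI clock (in MHz) needed for a target throughput at a given efficiency.
constexpr double requiredSpiMHz(const double targetMbps,
                                const double efficiency) {
  return targetMbps / efficiency;
}
```

With the figures above, 11.96/14 is about 0.85 and 26.26/30 is about 0.88. At the latter efficiency, reaching 43 Mbit/sec would need an SPI clock of roughly 49 MHz.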
 
This benchmark can run slightly faster if client.connected() is called only when no data is available to read. Code below.

Results with default 14 MHz SPI:

Teensy 4.1: 12.02 Mbit/sec
Teensy 3.6: 11.33 Mbit/sec
Teensy 3.2: 10.79 Mbit/sec
Teensy LC: 6.76 Mbit/sec


Code:
#include <Ethernet.h>

EthernetServer server(5001);
uint8_t rxbuf[4096];
static void getMACAddress(uint8_t* const mac);

void setup() {
  Serial.begin(115200);
  while (!Serial && (millis() < 4000)) ; // wait for serial monitor

  Serial.println("Simple iperf receive");
  uint8_t mac[6];
  getMACAddress(mac);
  if (!Ethernet.begin(mac)) {
    printf("Failed to start Ethernet\r\n");
    while (1) ; // stop here
  }
  Serial.println("Ready to receive, run this command on your computer:");
  Serial.print("  iperf -i 1 -c ");
  Serial.println(Ethernet.localIP());
}

void loop() {
  EthernetClient client = server.available();
  if (client.connected()) {
    Serial.print("iperf test begin, from ");
    Serial.println(client.remoteIP());
    int sum = 0;
    elapsedMillis msec = 0;
    while (1) {
      int avail = client.available();
      if (avail < 0) break;
      if (avail == 0) {
        if (client.connected()) continue;
        Serial.println("host disconnected");
        break;
      }
      if (avail > (int)sizeof(rxbuf)) avail = sizeof(rxbuf);
      int num = client.read(rxbuf, avail);
      if (num > 0) {
        sum = sum + num;
        uint32_t ms = msec;
        if (ms > 1000) {
          Serial.print((float)sum * (float)(8.0 / 1000000.0));
          Serial.println(" Mbits/sec");
          msec -= 1000;
          sum = 0;
        }
      }
    }
    client.stop();
    Serial.println();
  }
}

// Gets the Teensy 4's MAC address and fills in the given array.
static void getMACAddress(uint8_t* const mac) {
#ifdef __IMXRT1062__
  const uint32_t m1 = HW_OCOTP_MAC1;
  const uint32_t m2 = HW_OCOTP_MAC0;
  mac[0] = static_cast<uint8_t>(m1 >>  8);
  mac[1] = static_cast<uint8_t>(m1 >>  0);
  mac[2] = static_cast<uint8_t>(m2 >> 24);
  mac[3] = static_cast<uint8_t>(m2 >> 16);
  mac[4] = static_cast<uint8_t>(m2 >>  8);
  mac[5] = static_cast<uint8_t>(m2 >>  0);
#else
  static uint8_t dummymac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
  memcpy(mac, dummymac, 6);
#endif
}
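
To illustrate why calling client.connected() less often can help, here is a host-side sketch that counts connected() calls for both loop shapes. `FakeClient` is a made-up stand-in for EthernetClient (not the real class) that always has a chunk ready until the data runs out, so it shows the best case. On real hardware, each connected() call is assumed to cost an extra status query to the W5500 over SPI.

```cpp
// Made-up stand-in for EthernetClient; delivers a fixed number of chunks.
struct FakeClient {
  int chunksLeft;
  int chunkSize;
  int connectedCalls;

  FakeClient(int chunks, int size)
      : chunksLeft(chunks), chunkSize(size), connectedCalls(0) {}

  bool connected() { ++connectedCalls; return chunksLeft > 0; }
  int available() const { return (chunksLeft > 0) ? chunkSize : 0; }
  int read(int n) { --chunksLeft; return n; }  // Pretend to consume n bytes
};

// First version: checks connected() on every pass through the loop.
int receiveChecked(FakeClient& c) {
  int sum = 0;
  while (c.connected()) {
    const int avail = c.available();
    if (avail == 0) continue;
    sum += c.read(avail);
  }
  return sum;
}

// Second version: checks connected() only when nothing is available.
int receiveLazy(FakeClient& c) {
  int sum = 0;
  while (true) {
    const int avail = c.available();
    if (avail == 0) {
      if (c.connected()) continue;
      break;
    }
    sum += c.read(avail);
  }
  return sum;
}
```

With four 1460-byte chunks, both loops receive 5840 bytes, but the first calls connected() five times and the second only once.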
 