Seeking some assistance with QNEthernet restart

shawn

Hello, all. I'm seeking some assistance, from anyone who might be so inclined, with an issue I just can't seem to solve when restarting QNEthernet.

I've noticed it myself before. A bunch of discussion about it can be found here:
https://github.com/ssilverman/QNEthernet/issues/31

In summary, if you call `Ethernet.end()`, possibly wait a bit, and then restart the Ethernet subsystem with `Ethernet.begin()`, incoming packets show up far less often, producing what appears to be a drastic slowdown. Restarting several more times in a row (the number varies) seems to fix it. Based on diving into this for a while, I suspect it has something to do with reinitializing the processor's Ethernet hardware (really the "MAC", but I'm hesitant to use that word because I don't want people confusing it with "MAC address"). If it can be shown to be something else, I'm open to that; this is just where my suspicions lie.

It appears that the problem happens somewhere in the lower layer and not in the lwIP stack. I've used `printf` calls to monitor when packets arrive and, judging by what `enet_isr()` (in lwip_t41.c) sees, packets just don't show up. At first I thought it was due to the PHY not being reset or restarted properly, so I played with various ways to do that, including in the `master` branch, where you'll see some commits that affect this. Then I realized that it might also be the processor's Ethernet MAC.
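
To put numbers on "packets just don't show up," one option is to count arrivals and track the longest gap between them instead of printing per packet. The following is a minimal sketch, not part of QNEthernet: `ArrivalStats` and its members are hypothetical names, and it assumes the caller passes in a millisecond timestamp (for example, from `millis()`) wherever a received frame is observed.

```cpp
#include <cstdint>

// Hypothetical helper (not part of QNEthernet): tracks how many frames
// were observed and the longest gap between consecutive arrivals.
struct ArrivalStats {
  uint32_t count = 0;     // Number of arrivals recorded
  uint32_t lastMs = 0;    // Timestamp of the most recent arrival
  uint32_t maxGapMs = 0;  // Longest gap seen between two arrivals

  // Records one arrival at the given millisecond timestamp.
  void record(uint32_t nowMs) {
    if (count > 0) {
      // Unsigned subtraction stays correct across timestamp wraparound
      uint32_t gap = nowMs - lastMs;
      if (gap > maxGapMs) {
        maxGapMs = gap;
      }
    }
    lastMs = nowMs;
    count++;
  }

  // Clears the statistics, e.g. before each restart attempt.
  void reset() {
    *this = ArrivalStats{};
  }
};
```

Comparing `count` and `maxGapMs` over the same interval before and after a restart should make it easier to see whether fewer frames are arriving. If `record()` were called from interrupt context, reads from the main program would need to be guarded against concurrent updates.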

In the sample code below, `kDoSNTP` is set to false. When you run this code, you'll note that DHCP requests (probably) take longer and longer. Setting `kDoSNTP` to true causes it to do an SNTP transaction after it gets an IP address. I've seen this version (`kDoSNTP = true`) fail; however, I haven't seen it fail today. So there's something about sending traffic that makes it fail much, much less often.

Note: Perhaps my DHCP server doesn't like quickly-recurring requests; however, I'm just using DHCP as part of the tests. I've also seen reduced incoming traffic when using a static IP, without DHCP. You might also see failures when running the unit tests. (See the contents of the test/ directory in the repo; these are runnable from PlatformIO.)

I've taken great pains to bring up and shut down the PHY and processor's Ethernet MAC properly. I've spent lots and lots of time going through example code, SDK code, the processor manual, and PHY manual (DP83825I). I've done oodles of experimenting. It means I've got lots of experience with how this all works, but it also means I likely have some tunnel vision. Maybe someone will see something I've missed.

In short, my suspicions are thus:
1. Probably the Ethernet MAC needs something else in addition to the reset process I already have.
2. Less likely: the PHY needs something else during its reset process.
3. Sending traffic at Ethernet start seems to improve things but doesn't fix it. In the issue linked above, you'll see a comment that adding a `dhcp_inform()` call after starting Ethernet improves the situation; removing it worsens the situation.

Links:
1. QNEthernet source
2. Processor reference
3. PHY reference
4. https://github.com/ssilverman/QNEthernet/issues/31

Here's the demo code:

Code:
// Demo code showing restart slowdown.
// https://forum.pjrc.com/threads/72608-Seeking-some-assistance-with-QNEthernet-restart

#include <QNEthernet.h>
#include <TimeLib.h>

using namespace qindesign::network;

constexpr uint32_t kDHCPTimeout = 30000;

// Constants for SNTP
constexpr uint16_t kNTPPort = 123;
constexpr uint32_t kSNTPTimeout = 20000;
constexpr uint32_t kEpochDiff = 2'208'988'800;  // 01-Jan-1900 00:00:00 -> 01-Jan-1970 00:00:00
constexpr uint32_t kBreakTime = 2'085'978'496;  // Epoch -> 07-Feb-2036 06:28:16

constexpr bool kDoSNTP = false;  // Whether to do an SNTP transaction

EthernetUDP udp;
uint8_t buf[48];

uint32_t dhcpFailCount = 0;
uint32_t dhcpTotalCount = 0;
uint32_t sntpFailCount = 0;
uint32_t sntpTotalCount = 0;

// Forward declarations, needed if this is built as a .cpp file instead of an .ino sketch
void sendSNTP();
bool checkSNTP();

void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 4000) {
    // Wait for Serial
  }
  delay(2000);
  printf("Starting...\n");

  // Ethernet listeners

  Ethernet.onLinkState([](bool state) {
    printf("[Ethernet] Link %s\r\n", state ? "ON" : "OFF");
  });

  Ethernet.onAddressChanged([]() {
    printf("[Ethernet] Address changed\r\n");

    IPAddress ip = Ethernet.localIP();
    printf("    Local IP    = %d.%d.%d.%d\n", ip[0], ip[1], ip[2], ip[3]);
    ip = Ethernet.subnetMask();
    printf("    Subnet mask = %u.%u.%u.%u\n", ip[0], ip[1], ip[2], ip[3]);
    ip = Ethernet.gatewayIP();
    printf("    Gateway     = %u.%u.%u.%u\n", ip[0], ip[1], ip[2], ip[3]);
    ip = Ethernet.dnsServerIP();
    printf("    DNS         = %u.%u.%u.%u\n", ip[0], ip[1], ip[2], ip[3]);
  });

  uint8_t mac[6];
  Ethernet.macAddress(mac);
  printf("MAC = %02x:%02x:%02x:%02x:%02x:%02x\n",
         mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

  // Loop forever, bringing up and shutting down Ethernet

  int counter = 0;
  while (true) {
    printf("\r\n[Main] Attempt %d\r\n", ++counter);
    printf("[Main] Starting Ethernet with DHCP...");
    if (!Ethernet.begin()) {
      printf("Failed\r\n");
      continue;
    }
    printf("done.\r\n");

    printf("[Main] Waiting for IP...\r\n");
    elapsedMillis dhcpTimer;
    dhcpTotalCount++;
    if (!Ethernet.waitForLocalIP(kDHCPTimeout)) {
      printf("[Main] Failed to get IP address from DHCP\r\n");
      dhcpFailCount++;
    }
    printf("[Main] DHCP wait time: %" PRIu32 "ms\r\n",
           static_cast<uint32_t>(dhcpTimer));

    // Do an SNTP communiqué
    if (kDoSNTP && Ethernet.localIP() != INADDR_NONE) {
      sntpTotalCount++;
      udp.begin(kNTPPort);
      sendSNTP();
      printf("[Main] Sent SNTP request\r\n");

      printf("[Main] Waiting for SNTP reply...\r\n");
      elapsedMillis sntpTimer;
      bool failFlag = true;
      while (sntpTimer < kSNTPTimeout) {
        if (checkSNTP()) {
          failFlag = false;
          break;
        }
      }
      if (failFlag) {
        sntpFailCount++;
      }
      printf("[Main] SNTP wait time: %" PRIu32 "ms\r\n",
             static_cast<uint32_t>(sntpTimer));
    }

    printf("[Main] Ending Ethernet...\r\n");
    Ethernet.end();
    printf("[Main] Done ending Ethernet.\r\n");
    printf("[Main] Fail counts:\r\n"
           "    DHCP: %" PRIu32 "/%" PRIu32 "\r\n",
           dhcpFailCount, dhcpTotalCount);
    if (kDoSNTP) {
      printf("    SNTP: %" PRIu32 "/%" PRIu32 "\r\n",
             sntpFailCount, sntpTotalCount);
    }

    delay(1000);  // For posterity
  }
}

// Sends an SNTP request.
void sendSNTP() {
  memset(buf, 0, 48);
  buf[0] = 0b00'100'011;  // LI=0, VN=4, Mode=3 (Client)

  // Set the Transmit Timestamp
  uint32_t t = Teensy3Clock.get();
  if (t >= kBreakTime) {
    t -= kBreakTime;
  } else {
    t += kEpochDiff;
  }
  buf[40] = t >> 24;
  buf[41] = t >> 16;
  buf[42] = t >> 8;
  buf[43] = t;

  // Send the packet
  printf("[SNTP] Sending request to the gateway...");
  if (!udp.send(Ethernet.gatewayIP(), kNTPPort, buf, 48)) {
    printf("Failed\r\n");
  } else {
    printf("done.\r\n");
  }
}

// Checks for a valid SNTP reply.
bool checkSNTP() {
  int size = udp.parsePacket();
  if (size != 48 && size != 68) {
    return false;
  }

  const uint8_t *buf = udp.data();

  // See: Section 5, "SNTP Client Operations"
  int mode = buf[0] & 0x07;
  if (((buf[0] & 0xc0) == 0xc0) ||  // LI == 3 (Alarm condition)
      (buf[1] == 0) ||              // Stratum == 0 (Kiss-o'-Death)
      !(mode == 4 || mode == 5)) {  // Must be Server or Broadcast mode
    printf("[SNTP] Discarding reply\r\n");
    return false;
  }

  uint32_t t = (uint32_t{buf[40]} << 24) |
               (uint32_t{buf[41]} << 16) |
               (uint32_t{buf[42]} << 8) |
               uint32_t{buf[43]};
  if (t == 0) {
    printf("[SNTP] Discarding reply\r\n");
    return false;  // Also discard when the Transmit Timestamp is zero
  }
  if ((t & 0x80000000U) == 0) {
    // See: Section 3, "NTP Timestamp Format"
    t += kBreakTime;
  } else {
    t -= kEpochDiff;
  }

  // Print the time
  tmElements_t tm;
  breakTime(t, tm);
  printf("[SNTP] Reply: %04u-%02u-%02u %02u:%02u:%02u\r\n",
         tm.Year + 1970, tm.Month, tm.Day,
         tm.Hour, tm.Minute, tm.Second);

  return true;
}

void loop() {
  // Nothing
}
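
The demo's `sendSNTP()` and `checkSNTP()` convert between Unix time and the 32-bit NTP seconds field using `kEpochDiff` and `kBreakTime`. As a sanity check of that arithmetic, here is a small standalone sketch. The helper names `unixToNtp()` and `ntpToUnix()` are made up for illustration and don't exist in QNEthernet; the constants are copied from the demo.

```cpp
#include <cstdint>

// Same constants as in the demo code
constexpr uint32_t kEpochDiff = 2'208'988'800;  // Seconds from 01-Jan-1900 to 01-Jan-1970
constexpr uint32_t kBreakTime = 2'085'978'496;  // Unix time at which the 32-bit NTP seconds wrap

// Hypothetical helper: Unix seconds -> 32-bit NTP seconds field,
// mirroring the branch in sendSNTP().
constexpr uint32_t unixToNtp(uint32_t unixSecs) {
  return (unixSecs >= kBreakTime) ? (unixSecs - kBreakTime)   // After the 2036 wrap
                                  : (unixSecs + kEpochDiff);  // Before the 2036 wrap
}

// Hypothetical helper: 32-bit NTP seconds field -> Unix seconds,
// mirroring the branch in checkSNTP(). A clear high bit is taken
// to mean the time is after the 2036 wrap.
constexpr uint32_t ntpToUnix(uint32_t ntpSecs) {
  return ((ntpSecs & 0x80000000U) == 0) ? (ntpSecs + kBreakTime)
                                        : (ntpSecs - kEpochDiff);
}

// Compile-time checks of the arithmetic
static_assert(unixToNtp(0) == kEpochDiff,
              "The Unix epoch maps to the epoch difference");
static_assert(unixToNtp(kBreakTime) == 0,
              "NTP seconds wrap to zero at the break time");
static_assert(ntpToUnix(unixToNtp(1'700'000'000U)) == 1'700'000'000U,
              "A present-day timestamp survives a round trip");
```

Because the checks are `static_assert`s, a successful compile is enough to confirm that the two branches are inverses of each other for these values.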
 
You can also sometimes see the failure in the unit tests: sometimes one of the tests fails to acquire a DHCP address, and sometimes the UDP test fails because it doesn't receive an SNTP response within 20 seconds. To run the unit tests, go to the QNEthernet repo directory (assuming you've cloned it) and run the PlatformIO binary like so (assuming it's in your path; otherwise, invoke it by its full path):

Code:
pio test -v -f test_ethernet

Notes:
1. `platformio` is also a valid command.
2. The `-v` is "verbose" and prints all the test output messages.
3. The `-f` option "filters" which tests we want to run.
 