Brainstorming possible causes of Ethernet link loss

shawn

I’m looking for some thoughts because I’m stuck on how to even debug this. I can’t reproduce it, but a customer sees this from time to time.

So I have a project where everything runs fine over a network. It’s a lighting controller that receives lots of sACN UDP packets. Once in a while, the Ethernet activity just stops. That is, until the ribbon cable attached to the Teensy 4.1 is unplugged and re-plugged. Unplugging and re-plugging the Ethernet cable doesn’t seem to help, just unplugging and re-plugging the ribbon cable.

Based on some minimal debugging, I’ve confirmed that the system thinks that there’s no link. When the ribbon cable is reseated, the system regains the link. The strange part is reseating the Ethernet cable does not restore the link.

I’m trying to brainstorm possible reasons, and I would love others’ thoughts. Here’s what I have so far:
1. EMF
2. The switch/router drops the link (I’ve asked for logs, but I haven’t gotten them yet)
3. My Teensy 4.1 driver (inside QNEthernet)
 
@PaulStoffregen might know what the ribbon cable re-plug might do to the PHY chip?

Isn't there an enable pin on the chip? Can that be toggled off and on to maybe trigger the same behavior/reset?

Is this happening on more than one device?
 
It’s happening on many devices. I’m going to check my link-related driver code and investigate the schematics.

Is it possible that EMF can be a cause of something like this, say with power supplies that are known to radiate a lot of noise? With the particular power supply I'm thinking of (a Lumipro P-200S-24-ETL), just bringing it near 3-wire LEDs makes them fail, and it appears to cause data to be skipped to the next pixel.
 
Looking at this circuit, the Teensy Ethernet kit connects the Teensy ground to the Ethernet cable. That seems to me to be vulnerable to EMC … not familiar enough with good Ethernet practice to know if that’s standard or not. I always thought it was isolated by the magnetics up to however many kV they were specced to…
 
I'll add some more info about the hardware: Teensy 4.1, Ethernet Kit (PJRC version), double-high pins into an OctoWS2811, likely unshielded Ethernet cable for both the LED outputs and the network connection.
 
If I were to build up a couple alternate adapters with different capacitors, would you be able to give them a try in these problematic applications?
 
The magjacks used with the Ethernet module have a 1000pF capacitor as well as a 75 ohm resistor in series between the cable ground and the Teensy ground. This includes both the Cetus J1B1211 and the Link-PP JPJ4012AHNL. Paul's example schematic doesn't show that cap for some reason.

Here is the schematic from the Cetus spec sheet.
1769209611166.png

EMF/EMI could definitely cause the link to drop, according to AI, and your LED test seems to show clearly that your power supply is radiating pretty badly, which makes it a likely culprit in my opinion.

Reseating the cable will cause the link to be renegotiated. On the Ethernet module, there is a 0.1uF cap from the center taps on pins 2 & 5 to Teensy ground. That cap, along with the magjack magnetics, is disconnected from the Teensy circuit when the jumper cable is removed, but stays in circuit when the Ethernet cable is removed. I have no clue why that would make a difference from an electrical standpoint with the PHY chip.

If you do use a shielded cable just be aware that the PJRC module does not ground the shield, so it would need to be grounded on the other end. A shielded cable that isn't grounded at either end can cause other issues.
 
@KenHahn - Thanks for these notes. Maybe my driver needs to be more aware of a need for link renegotiation. Maybe this can, at least partially, be mitigated in software.

@PaulStoffregen - If you have the parts, may I request 3 or 4 of them? (Or whatever you can supply, if not. I'll put them in more than one place.)
 
@KenHahn Do you know the reason why reseating the Ethernet cable won't cause auto-negotiation but reseating the ribbon cable will? I'm going through the DP83825i docs but nothing's jumping out at me.
 
I just ordered fresh PCBs from OSH Park, both the PJRC version and something very similar to SparkFun's version. I got only 3 pieces of each.
 
Update: I just added a driver_restart_auto_negotiation() function to the drivers (the Teensy 4.1 driver is the only one that does something), and I'm now calling this on link-down in the project (not in the driver). This is in the hopes that I actually get a link-down notification. Note that I haven't tried it out yet with the project. More updates forthcoming.

The gory details: That function sets bit 9 of the BMCR register to '1'. (Page 44 of the spec.)
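
For anyone following along, here's a rough sketch of what that bit manipulation looks like. The register address and bit positions are the standard IEEE 802.3 Clause 22 BMCR layout. The helper names and the MDIO read/write callbacks are hypothetical stand-ins for illustration only; they are not QNEthernet's actual internals.

```cpp
#include <cstdint>
#include <functional>

// Standard Clause 22 register address and BMCR bit
constexpr uint8_t kRegBMCR = 0x00;
constexpr uint16_t kBMCRRestartAutoNeg = 1u << 9;  // Self-clearing

// Returns the BMCR value with the restart-auto-negotiation bit set.
// All other bits are preserved.
constexpr uint16_t withRestartAutoNeg(uint16_t bmcr) {
  return static_cast<uint16_t>(bmcr | kBMCRRestartAutoNeg);
}

// Hypothetical MDIO accessors; a real driver supplies its own
using MdioRead = std::function<uint16_t(uint8_t reg)>;
using MdioWrite = std::function<void(uint8_t reg, uint16_t value)>;

// Read-modify-write of BMCR so that only bit 9 changes
void restartAutoNegotiation(const MdioRead &read, const MdioWrite &write) {
  write(kRegBMCR, withRestartAutoNeg(read(kRegBMCR)));
}
```

Doing a read-modify-write, rather than writing a fixed value, leaves the existing speed, duplex, and auto-negotiation-enable settings alone.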

The QNEthernet library branch is: restart-auto-neg.
 
Update: after implementing a call to driver_restart_auto_negotiation() when link-down is detected, the Teensy managed to stay on the network. So far. It’s a positive sign, but a few more days will tell us more.
 
Update: It's been a few days, and I haven't seen the Ethernet freeze since implementing the renegotiate-on-link-down fix. It seems to be working.

Here's some sample code (with a bonus abortAll() call):
C++:
#include <QNEthernet.h>

using namespace qindesign::network;

// Forward declare this
extern "C" void driver_restart_auto_negotiation();

void setup() {
  // ...some code...

  Ethernet.onLinkState([](bool state) {
    if (state) {
      printf("[Ethernet] Link ON\r\n");
    } else {
      printf("[Ethernet] Link OFF\r\n");

      // I also do this because Windows systems forget TCP connections.
      // This causes the connections to take forever to close.
      internal::ConnectionManager::instance().abortAll();

      // Help mitigate link drops caused by EMF
      driver_restart_auto_negotiation();
    }
  });
 
  // ...more code here...
}
 
I just realized that all the driver functions are included in <QNEthernet.h>. There’s no need to do that forward declaration.
 
I've also got this problem now, and the solution mentioned above does not work for me. I can't get the link back up without reseating the ribbon cable to the network jack, or power cycling the Teensy.
I can't remember ever having this problem before now, but I've also gotten it to happen with the NativeEthernet library, so it's certainly not a QNEthernet regression.
 
A Friday update: I'm still in the process of acquiring parts to duplicate the setup. (The setup that was experiencing problems was already shipped to the show.) I have the same 24V power supply, same WS2814 24V LEDs, and same 12/24V-to-5V voltage converter. Now I'm waiting on some other parts that should arrive today. (USB micro cables with bare wires — I didn't feel like cutting up a cable.) Also, I have the Ethernet adapters with the different capacitor configurations.

Sometime in the next few days, I'll try to duplicate the issue and then, once I've done that, I'll see if the other Ethernet kits mitigate the problem.
 
I wonder if restarting the PHY is the solution, rather than just restarting auto-negotiation. Once I can duplicate the problem, I'll experiment with that too.
 
@thomj I just added driver_reset_phy() to the latest in GitHub. Could you try that instead of driver_restart_auto_negotiation() and see if it solves the problem?
 
I just had one instance of PHY reset reestablishing the link, where I'm pretty sure it would have died before. I will continue testing during the week.
 
Hi all,

I've been dealing with intermittent Ethernet link loss on a lighting installation and wanted to share my findings in case it helps others, and to ask if anyone has further ideas.

Hardware: Teensy 4.1 + PJRC Ethernet Kit + OctoWS2811 driving ~2744 WS2818 12V LEDs, receiving Art-Net unicast. A similar setup worked reliably on a project a couple of years ago. But I don't think it's electrical noise, since I still see the problem with no LEDs plugged in: just the Teensy, the Octo (with no LED connections), and the Teensy Ethernet Kit.

What I've ruled out:
  • Tested 3 Teensies, 3 Ethernet kits, multiple cables — same behaviour across all
  • USB-only heartbeat runs stable 30+ min, so MCU/USB path is fine
  • Repros on both NativeEthernet and QNEthernet, so not a library regression
  • Power/ground cleanup helped but didn't eliminate it
  • Direct Mac ↔ Teensy connection (USB-C dongle → Ethernet Kit, no switch) seems to drop more frequently — still testing whether a switch in the middle makes a meaningful difference

Failure pattern: The link drops either immediately or anywhere between ~1–7 min. It doesn't just drop once cleanly; it bounces: UP → DOWN → UP → DOWN repeatedly before eventually getting stuck DOWN permanently. Unplugging/replugging the Ethernet cable rarely resolves the issue. Sometimes the only things that help are reseating the ribbon cable, power cycling, or giving everything a break: unplugging everything, letting it sit for a mysterious amount of time, and trying again.
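
In case it helps anyone compare notes on the bounce pattern, here's a small sketch of a tally that could be fed from the link-state callback. The class and its method names are hypothetical (not part of QNEthernet), and the timestamp is passed in so it can be exercised without hardware.

```cpp
#include <cstdint>

// Hypothetical helper: counts link drops and records how long each UP
// period lasted, so bounce patterns can be printed or logged.
class LinkFlapStats {
 public:
  // Call from the link-state callback with the current millis() value
  void onLinkState(bool up, uint32_t nowMs) {
    if (up && !isUp_) {
      upSinceMs_ = nowMs;  // Start of a new UP period
    } else if (!up && isUp_) {
      drops_++;
      lastUpMs_ = nowMs - upSinceMs_;  // Unsigned math survives rollover
      if (lastUpMs_ > longestUpMs_) {
        longestUpMs_ = lastUpMs_;
      }
    }
    isUp_ = up;
  }

  uint32_t drops() const { return drops_; }
  uint32_t lastUpMs() const { return lastUpMs_; }
  uint32_t longestUpMs() const { return longestUpMs_; }

 private:
  bool isUp_ = false;
  uint32_t upSinceMs_ = 0;
  uint32_t drops_ = 0;
  uint32_t lastUpMs_ = 0;
  uint32_t longestUpMs_ = 0;
};
```

Printing drops() and lastUpMs() on each DOWN event would show whether the UP periods shrink before the link gets stuck.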

One observation: when the link light on the Ethernet Kit goes off, the software sometimes still shows packets arriving and the count still climbing.

Updated findings (not just EMF): I originally suspected EMF from the LED strips and power supply as the main trigger, since LED load makes it worse and faster. But I've now reproduced the drops with no LEDs connected and no LED power supply at all, so EMF may be a contributing factor but not the sole cause.

The most interesting part: if I set it aside and come back later, it works again... sometimes on the same Teensy that was stuck. It recovered on its own after ~5 minutes with nothing changed, no LEDs, just sitting idle. This suggests the PHY is entering some kind of stuck state and eventually re-negotiates on its own.

I've also noticed the connection light drop on the Teensy Ethernet Kit side, but the switch or Ethernet dongle continues to flash with packets sending/arriving.
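
One way to cross-check the link light against what the PHY itself reports is to read the Basic Mode Status Register (BMSR) directly. The sketch below uses the standard IEEE 802.3 Clause 22 bit positions. The link-status bit latches low, so it has to be read twice to get the present state. The read callback is a hypothetical MDIO accessor; how to reach the PHY registers from a given library is an assumption left to the reader.

```cpp
#include <cstdint>

// Standard Clause 22 register address and BMSR bits
constexpr uint8_t kRegBMSR = 0x01;
constexpr uint16_t kBMSRAutoNegComplete = 1u << 5;
constexpr uint16_t kBMSRRemoteFault = 1u << 4;
constexpr uint16_t kBMSRLinkStatus = 1u << 2;  // Latches low until read

struct LinkInfo {
  bool linkUp;
  bool autoNegComplete;
  bool remoteFault;
};

// Decodes a raw BMSR value into its link-related flags
inline LinkInfo decodeBMSR(uint16_t bmsr) {
  return LinkInfo{(bmsr & kBMSRLinkStatus) != 0,
                  (bmsr & kBMSRAutoNegComplete) != 0,
                  (bmsr & kBMSRRemoteFault) != 0};
}

// Reads BMSR twice. 'read' is a hypothetical callable taking a register
// address and returning the 16-bit register value.
template <typename ReadFn>
uint16_t readCurrentBMSR(ReadFn &&read) {
  (void)read(kRegBMSR);   // First read returns (and clears) any latched drop
  return read(kRegBMSR);  // Second read reflects the present state
}
```

If the first read shows link-down and the second shows link-up, the link dropped at some point since the last read but has since come back, which would fit the bouncing described above.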

Software fix so far: Using QNEthernet 0.35.0-snapshot with driver_reset_phy() in onLinkState plus a retry loop every 10s if the link stays down:

C++:
Ethernet.onLinkState([](bool state) {
  if (state) {
    Serial.println("[Ethernet] Link: UP");
    linkDown = false;
  } else {
    Serial.println("[Ethernet] Link: DOWN - resetting PHY");
    linkDown   = true;
    linkDownMs = millis();
    driver_reset_phy();
  }
});

// In loop():
if (linkDown && (millis() - linkDownMs > 10000)) {
  Serial.println("[Ethernet] Still down - retrying PHY reset");
  linkDownMs = millis();
  driver_reset_phy();
}

This recovers some drops that previously required physical intervention, but doesn't prevent the bouncing or the eventual stuck-DOWN state.
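
A possible variation on the retry loop above, sketched under my own assumptions: the same link-down bookkeeping wrapped in a small class, with the retry interval doubling up to a cap so that a PHY that is slow to come back isn't reset every 10 seconds indefinitely. The backoff is my addition, not something tested in this thread, and the class name is hypothetical. Time is passed in so the logic can be checked off-target.

```cpp
#include <cstdint>

// Hypothetical helper: decides when another PHY reset is due while the
// link stays down. The interval doubles after each retry, up to maxMs.
class LinkWatchdog {
 public:
  explicit LinkWatchdog(uint32_t initialMs = 10000, uint32_t maxMs = 80000)
      : initialMs_(initialMs), maxMs_(maxMs), intervalMs_(initialMs) {}

  // Call from the link-state callback with the current millis() value
  void onLinkState(bool up, uint32_t nowMs) {
    down_ = !up;
    lastMs_ = nowMs;
    intervalMs_ = initialMs_;  // Any state change restarts the backoff
  }

  // Call from loop(); returns true when the caller should reset the PHY
  bool resetDue(uint32_t nowMs) {
    // Unsigned subtraction stays correct across millis() rollover
    if (!down_ || (nowMs - lastMs_) < intervalMs_) {
      return false;
    }
    lastMs_ = nowMs;
    if (intervalMs_ < maxMs_) {
      intervalMs_ = (intervalMs_ > maxMs_ / 2) ? maxMs_ : intervalMs_ * 2;
    }
    return true;
  }

 private:
  uint32_t initialMs_;
  uint32_t maxMs_;
  uint32_t intervalMs_;
  uint32_t lastMs_ = 0;
  bool down_ = false;
};
```

Usage would mirror the code above: call onLinkState(state, millis()) inside the Ethernet.onLinkState() callback (resetting the PHY there on link-down as before), and call driver_reset_phy() from loop() whenever resetDue(millis()) returns true.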

Has anyone seen or noticed more nuanced repeated patterns yet? Particularly the natural recovery after sitting idle? And would additional filtering/decoupling on the Ethernet kit help, or is there a deeper PHY reinit possible from software? I'm currently testing and letting my setup run with no lights at the moment, keeping an eye on how long the UP state persists.

Thanks to all for the help and the updated QNEthernet patch!

🙏🙏🙏
 
Thanks, @jshaw3. Welcome to the thread! :)

Here's an update: I got all the equipment I thought would cause the issue, but I couldn't reproduce it. So I reached out to my customer, and he said that the "Ethernet drop" happened when the sACN packet transmission was stopped. I tried this by sending sACN for a while and observing the LED outputs. Then I stopped the transmission. I still couldn't reproduce it. Up until now, I've been testing directly to my laptop and also via an eero node. Next up is to connect the Teensy to one of those mini TP-Link doodads in client mode, using my own mini TP-Link. That's how the customer had his set up.
 
I've had a similar problem for quite some time. This is on a custom PCB with a MicroMod Teensy and the same TI PHY on ENET2. It (very) rarely happens that when I plug the Ethernet cable in, the link won't come up; reseating the RJ-45 cable has always worked (so far). I did various PCB revisions and never had problems on my older revision. On the older working designs, I used the full pi filter between +3.3V and +3.3VDAC (VDDA of the PHY); that is, the 10nF was populated. In my current revision (with the issue), the 10nF is not populated because I feared ringing between the 3 MLCC X7R caps. I was probably panicking too much.

My magnetics are discrete (it's a PoE project). Another difference from the working board is that the current board does not populate a 10nF between/across the CTs of the cable-side transformer and between RJ45 pin 4/5 and pin 7/8. The current board also has one 1000pF/2kV shared by both cable-side CTs (through 22nF and 75 Ohm each); the working board has the same filter but with a dedicated 1000pF/2kV for each cable-side CT to ground. The 0.1uF caps to ground are present on the PHY side for both CTs.
Apart from that the schematics and used parts are identical.

1773102171110.png


As for routing, there is also a change:
My current board (with the link issue) looks like this. The diff pair is routed on an inner layer, with ground plane on the same layer outside the trace pair (at 3*w distance) and continuous ground planes on the next upper and next lower layers. The older good board had much uglier routing, but it simply works.
1773102507596.png


The older good board had the transformer rotated 90 degrees with respect to the nearby PoE power transformer (which I know is quite radiating and dirty).

The software is the same and uses my driver port to the 2nd ENET on the MicroMod Teensy.

I'm posting this hoping to be helpful, as I suspect the pi filter on 3V3 at the PHY is the main difference, along with the cap differences (however, AI says that should still work).

In my case, it always occurred after a power-on, not in the midst of runtime, and it always worked after a cable replug (even while kept powered). Not sure if this is the same problem, but it may be wise to check the 3V3 filter, which is different on the T4.1 board and not in line with TI's reference; maybe it needs the full filter for exactly such corner cases.
 