A few observations.
One, Ethernet is not the right way to do this. Ethernet, and TCP/IP on top of it, is not designed for timing this precise. TCP/IP carries a lot of overhead: packet fragmentation, checksumming, acknowledgments and retransmission for reliability, etc. Its use case is more asynchronous communication, where a packet "eventually" gets through and it's "good enough" whether that takes two milliseconds or two hundred. For industrial control there are other options with lower and more predictable latency, like CAN and SPI. If the boards share a common ground, you could even just run a single signal line from the master to each slave. (BTW, the industry is trying to move away from "master/slave" nomenclature. I'm not here to judge the merits of that, merely mentioning it.)
My understanding of PTP is that it answers the question "what time is it, really?" but is not intended to directly drive these kinds of communications. Doing this over TCP is even slower than over UDP, because TCP is designed for reliability (acknowledgments, retransmission, in-order delivery), whereas UDP gives up those guarantees in exchange for lower latency.
Two, I'm not sure you need this topology at all. I don't know how many I/O lines each radar uses, or how much heavy lifting each Teensy is doing. If one Teensy can run all three radars (enough I/O lines, CPU, and RAM), I would do away with the setup in the diagram entirely. A Teensy can run several periodic tasks concurrently using hardware timer interrupts (e.g. IntervalTimer), which gives you a form of multitasking without an OS. If your accuracy requirement is ±0.5 µs, timers are probably accurate enough; you can verify this with an oscilloscope.
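To make the timer idea concrete, here is a minimal sketch of what one periodic timer callback could do to drive three trigger pulses from a shared tick. It assumes a timer firing at a fixed interval (on a Teensy, IntervalTimer is the usual way to get one) and hypothetical period/width/phase values for each radar; the names PulseChannel, makeChannel and onTimerTick are illustrative, and the actual pin writes are left as a comment so the logic can be tested off-board.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One radar's trigger pulse, counted in timer ticks (1 tick = 1 us if the
// timer fires every microsecond). Field names are hypothetical.
struct PulseChannel {
    uint32_t period;   // ticks from one rising edge to the next
    uint32_t width;    // ticks the pin stays high in each period
    uint32_t counter;  // current position within the period
};

// Set-up helper, run once outside the interrupt.
PulseChannel makeChannel(uint32_t period, uint32_t width, uint32_t phase) {
    assert(period > 0 && width <= period && phase < period);
    // Starting the counter at (period - phase) puts the first rising edge
    // at tick `phase`.
    return PulseChannel{period, width, (period - phase) % period};
}

// Body of the periodic timer callback: advance every channel by one tick and
// return the new pin levels as a bitmask (bit i = channel i). On a Teensy,
// each level would be written to its pin here, e.g. with digitalWriteFast().
template <std::size_t N>
uint32_t onTimerTick(std::array<PulseChannel, N>& channels) {
    static_assert(N <= 32, "one bit per channel");
    uint32_t levels = 0;
    for (std::size_t i = 0; i < N; ++i) {
        PulseChannel& ch = channels[i];
        if (ch.counter < ch.width) levels |= (1u << i);  // high at start of period
        if (++ch.counter == ch.period) ch.counter = 0;    // wrap to next period
    }
    return levels;
}
```

With a 1 µs tick this is a 1 MHz interrupt, which a fast Teensy can likely sustain, but it is exactly the kind of CPU cost that the DMA approach avoids.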
If the timers are not accurate enough, or if they take too much CPU, you can instead set up three waveforms in RAM (a binary image with 0 for low and 1 for high, each bit representing 1 microsecond, or whatever resolution you need) and use the DMA controller to send them to the right pins continuously. While the DMA controller does this, the activity consumes no CPU time, and it keeps transferring the waveforms that drive the pins until you tell it to stop. Each DMA channel's transfer rate is set by whatever triggers it, typically a hardware timer whose clock and divisor you configure, so the timing is as accurate as that timer, which is extremely accurate. Because each channel can have its own trigger, each can run at an independent speed.
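One possible layout for the RAM image is sketched below, assuming all three radar pins sit on the same GPIO port. Then one 32-bit word per microsecond carries all three levels, and a single DMA channel paced at 1 MHz could copy the words to the port, as an alternative to one channel per waveform. The bit positions and waveform parameters are hypothetical, and whether to target the port's full data register or its set/clear registers depends on what else is on that port; check the pinout and the chip's reference manual. The DMA configuration itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One waveform to generate, in 1 us steps. Field names are hypothetical.
struct Waveform {
    uint32_t bit;      // which bit of the GPIO port this pin occupies
    uint32_t period;   // length of one cycle, in microseconds
    uint32_t highFor;  // microseconds the pin is high in each cycle
    uint32_t phase;    // time of the first rising edge, in microseconds
};

// Builds `lengthUs` 32-bit words, one per microsecond. Bit w.bit of word t is
// 1 when waveform w should be high at time t. A DMA channel paced at 1 MHz
// could copy these words, in a loop, to the GPIO port.
std::vector<uint32_t> buildPortImage(const std::vector<Waveform>& waves,
                                     uint32_t lengthUs) {
    std::vector<uint32_t> image(lengthUs, 0);
    for (const Waveform& w : waves) {
        assert(w.bit < 32 && w.period > 0 && w.highFor <= w.period);
        for (uint32_t t = 0; t < lengthUs; ++t) {
            // Position within the cycle, i.e. (t - phase) mod period,
            // computed without unsigned underflow.
            uint32_t pos = (t + w.period - w.phase % w.period) % w.period;
            if (pos < w.highFor) image[t] |= (1u << w.bit);
        }
    }
    return image;
}
```

Make the image length a common multiple of all the periods (for example, periods of 100 µs and 250 µs need a length that is a multiple of 500) so that when the DMA wraps back to the start, every waveform continues without a glitch. At one word per microsecond, a 10 ms image is 10,000 words, about 40 KB.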
Depending on how the radar is set up, DMA may be the right answer for reading data from the radars as well. If it needs the data transferred at exactly 11.5 kHz, or exactly 11.5 MHz, or whatever, DMA will do that. On the other hand, if it has clock and data lines, you have a few options.
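How close "exactly" gets depends on the divisor: a timer can only divide its source clock by a whole number N, so the achievable rate is clock / N. The sketch below picks the nearest N and reports the resulting error. The 150 MHz timer clock used in the usage note is a hypothetical value; check what your board's timer actually runs from.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct RateChoice {
    uint32_t divisor;  // integer divisor for the (hypothetical) pacing timer
    double actualHz;   // rate you actually get: clockHz / divisor
    double errorPpm;   // (actual - target) / target, in parts per million
};

// Choose the integer divisor closest to clockHz / targetHz and report how far
// the resulting rate is from the target.
RateChoice chooseDivisor(double clockHz, double targetHz) {
    assert(clockHz > 0 && targetHz > 0 && targetHz <= clockHz);
    uint32_t divisor = static_cast<uint32_t>(std::lround(clockHz / targetHz));
    double actual = clockHz / divisor;
    return {divisor, actual, (actual - targetHz) / targetHz * 1e6};
}
```

With a 150 MHz clock, 11.5 kHz needs N = 13043, giving about 11500.42 Hz (roughly 37 ppm high), while 11.5 MHz needs N = 13, giving about 11.538 MHz (about 0.33% high). So low rates come out very close, but at high rates pick a clock that divides evenly if the exact figure matters. Some timers count from N - 1 down to 0, so the register value may be N - 1.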
The first option is to write a "forever loop" that polls the active radar's clock pin and, when the clock changes to the desired state (high or low, depending on the radar's specification), samples that radar's data pin. Sampling on the transition, rather than whenever the pin is at that level, ensures each clock pulse is read exactly once. You can do any post-processing after this, but the more sophisticated it gets, the longer it takes. That puts you at risk of missing input pulses, having to worry about instruction-cycle timing, etc.
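A minimal sketch of the polling logic for one clock/data pair is below. On a Teensy the forever loop would call poll() with digitalReadFast() on the clock and data pins (CLK_PIN and DATA_PIN in the comment are placeholders); here the pin levels are passed in, and a hypothetical replay() helper feeds recorded traces through it so the logic can be checked off-board.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Edge-triggered sampler for one radar. On hardware, roughly:
//   for (;;) sampler.poll(digitalReadFast(CLK_PIN), digitalReadFast(DATA_PIN));
class EdgeSampler {
public:
    EdgeSampler(bool sampleOnRising, bool initialClock)
        : rising_(sampleOnRising), prevClock_(initialClock) {}

    // Samples `data` only on the clock's transition into the active state,
    // not while it merely stays there, so a slow clock pulse is read once.
    // Returns true if a bit was captured on this call.
    bool poll(bool clock, bool data) {
        bool edge = rising_ ? (clock && !prevClock_) : (!clock && prevClock_);
        prevClock_ = clock;
        if (edge) bits_.push_back(data);
        return edge;
    }

    const std::vector<bool>& bits() const { return bits_; }

private:
    bool rising_;
    bool prevClock_;
    std::vector<bool> bits_;
};

// Off-board helper: runs recorded clock/data traces through the sampler.
std::vector<bool> replay(const std::vector<bool>& clock,
                         const std::vector<bool>& data, bool sampleOnRising) {
    assert(!clock.empty() && clock.size() == data.size());
    EdgeSampler s(sampleOnRising, clock[0]);
    for (std::size_t i = 0; i < clock.size(); ++i) s.poll(clock[i], data[i]);
    return s.bits();
}
```

Any post-processing added inside the loop delays the next poll(), which is where the risk of missing a short clock pulse comes from.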
The second option is to connect each radar's clock signal to an interrupt-capable pin. When the clock goes to the state meaning "read data now", the interrupt preempts the main program and jumps to an interrupt service routine (ISR). The ISR reads the data line, appends the value to an input buffer, and returns. This way you can experiment with post-processing in the main loop while remaining confident that you will not miss an input pulse, as long as the ISR stays short and the buffer does not fill up. You still have to do everything fast enough, of course, but this gives you much more protection against accidentally missing a pulse. A little logic in your main loop can check whether it is starting to fall behind the input data and raise an error, so that you know your post-processing routines have to run faster.
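Here is a minimal sketch of the ISR-plus-buffer arrangement: a single-producer/single-consumer ring buffer where the ISR only stores the bit and the main loop drains it, plus the "falling behind" check. On a Teensy, push() would be called from the function registered with attachInterrupt(), reading the data pin with digitalReadFast(). The class and function names, and the 3/4-full warning threshold, are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Single-producer (ISR) / single-consumer (main loop) ring buffer.
// Capacity must be a power of two so the free-running 32-bit indices wrap
// correctly.
template <uint32_t Capacity>
class SampleRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // ISR side: store one sample. If the buffer is full, the sample is
    // dropped and the overrun flag is set.
    bool push(uint8_t bit) {
        uint32_t head = head_.load(std::memory_order_relaxed);
        uint32_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) {
            overrun_.store(true, std::memory_order_relaxed);
            return false;
        }
        buf_[head & (Capacity - 1)] = bit;
        head_.store(head + 1, std::memory_order_release);  // publish sample
        return true;
    }

    // Main-loop side: take the oldest sample, if any.
    bool pop(uint8_t& bit) {
        uint32_t tail = tail_.load(std::memory_order_relaxed);
        uint32_t head = head_.load(std::memory_order_acquire);
        assert(head - tail <= Capacity);  // producer never overwrites unread data
        if (head == tail) return false;
        bit = buf_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

    uint32_t backlog() const {
        return head_.load(std::memory_order_acquire) -
               tail_.load(std::memory_order_acquire);
    }
    bool overrun() const { return overrun_.load(std::memory_order_relaxed); }

private:
    uint8_t buf_[Capacity] = {};
    std::atomic<uint32_t> head_{0};  // written only by push()
    std::atomic<uint32_t> tail_{0};  // written only by pop()
    std::atomic<bool> overrun_{false};
};

// Main-loop check: true if samples were already lost, or if the backlog has
// reached 3/4 of capacity (post-processing is not keeping up).
template <uint32_t Capacity>
bool fallingBehind(const SampleRing<Capacity>& ring) {
    return ring.overrun() || ring.backlog() * 4 >= Capacity * 3;
}
```

Keeping all processing out of the ISR is the design point: the interrupt cost per pulse stays small and constant, and any slowness shows up as a growing backlog that fallingBehind() can report before data is lost.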
The third option, and I'm not sure whether the chip supports it (probably?), is still DMA: it transfers from the data pin to RAM whenever the clock line reaches the desired state. This is very similar to the ISR approach, except that it consumes no CPU.
If you are doing this on a Teensy 4.1, my assumption is that you have "all the time in the world" for post-processing, so you can pick whichever method suits you best. Polling and filtering should be easily possible from a main loop, but with DMA, whether or not you have a clock line, you know the data will ALWAYS be transferred, no matter what, with zero CPU consumption. It is also potentially more energy-efficient, because you are not spending CPU time polling the clock signal in a forever loop.