USB MIDI is incredibly fast compared to regular serial MIDI. But the underlying USB protocol is quite complex, so the overall timing depends on many factors which are challenging to measure.
With both serial and USB, when you send a MIDI message from Teensy, it goes into a buffer until Teensy's hardware can actually transmit the data. With serial, the buffer works in a simple way. Each byte is moved to the UART which transmits 31250 bits per second, including a start and stop bit for each byte. Some serial ports have a small FIFO, so if the FIFO wasn't already filled when your code sent the message, its bytes might get copied into the FIFO right away. Otherwise it happens as the hardware is ready. But the (slow) speed is easy to estimate, pretty much determined only by the baud rate.
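As a rough sketch of that estimate: each byte costs 10 bit times on the wire (start bit, 8 data bits, stop bit), so the transmit time follows directly from the byte count. The function name serialMidiMicros is made up for illustration, and the result ignores any time the message spends queued behind earlier bytes.

```cpp
#include <stdint.h>

// Hypothetical helper: wire time in microseconds for `bytes` MIDI bytes
// sent at 31250 baud. Each byte is 10 bits: 1 start + 8 data + 1 stop.
constexpr uint32_t serialMidiMicros(uint32_t bytes) {
    // 64-bit intermediate so large byte counts cannot overflow
    return static_cast<uint32_t>(
        static_cast<uint64_t>(bytes) * 10u * 1000000u / 31250u);
}
```

So a single byte takes 320 us, and a typical 3 byte message such as Note On takes about 960 us once it starts transmitting.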
USB is far more complicated. Like with serial, your message goes into a buffer, which lets your program keep running. At 12 Mbit/sec (Teensy 3.x, LC, 2.0), up to 16 MIDI messages can fit into a single packet, and at 480 Mbit/sec (Teensy 4.x) up to 128 messages can fit. If you call usbMIDI.send_now(), whatever messages you've written are turned into a USB packet and given to the USB hardware to send to your PC. If your message fills up the buffer, it is also immediately turned into a USB packet and given to the hardware. But if you don't send enough messages to fill up a USB packet, and you don't call send_now(), the USB MIDI code on Teensy waits until the next USB 1ms frame (12 Mbit) or 125us micro-frame (480 Mbit) to create a USB packet. This adds some latency, but it allows for much better USB bandwidth utilization, because packing more messages into the same packet is far more efficient.
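The 16 and 128 figures fall out of the packet sizes. This sketch assumes each MIDI message occupies one 4 byte USB MIDI event, and that the packet size is 64 bytes at 12 Mbit and 512 bytes at 480 Mbit. The constant and function names are only illustrative, not taken from the Teensy core.

```cpp
#include <stdint.h>

// Assumed sizes (see the note above): one USB MIDI event per message.
constexpr uint32_t kUsbMidiEventBytes    = 4;
constexpr uint32_t kFullSpeedPacketBytes = 64;   // 12 Mbit/sec
constexpr uint32_t kHighSpeedPacketBytes = 512;  // 480 Mbit/sec

// Hypothetical helper: how many MIDI messages fit in one packet.
constexpr uint32_t messagesPerPacket(uint32_t packetBytes) {
    return packetBytes / kUsbMidiEventBytes;
}
```

messagesPerPacket(kFullSpeedPacketBytes) gives 16 and messagesPerPacket(kHighSpeedPacketBytes) gives 128, matching the numbers above.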
So if you want the lowest possible latency, call usbMIDI.send_now() after you've written your MIDI message(s). But know that you are using the USB bandwidth less efficiently by doing this. If your code will send several messages, only call send_now() after the last one.
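Something along these lines shows the pattern of writing several messages and flushing once. The sendChord helper and the note numbers are made up for the example. The small stub at the top is not part of Teensy's code, it only stands in for usbMIDI so the snippet can be compiled on a PC. On a Teensy built with a MIDI USB type, the real usbMIDI object is used instead.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-side stand-in (hypothetical), only so this compiles off-target.
struct HostStubUsbMidi {
    int queued = 0;   // messages written but not yet handed to USB
    int packets = 0;  // packets handed to the pretend USB hardware
    void sendNoteOn(uint8_t, uint8_t, uint8_t) { ++queued; }
    void send_now() { if (queued > 0) { ++packets; queued = 0; } }
};
static HostStubUsbMidi usbMIDI;
#endif

// Hypothetical helper: send a C major triad as one low-latency burst.
void sendChord(uint8_t velocity, uint8_t channel) {
    const uint8_t notes[] = {60, 64, 67};
    for (uint8_t note : notes) {
        usbMIDI.sendNoteOn(note, velocity, channel);  // goes into the buffer
    }
    usbMIDI.send_now();  // flush once, after the last message
}
```

Call sendChord() from your own loop() when the event happens. All three Note On messages end up in one USB packet rather than three.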
When Teensy turns your MIDI message(s) into a USB packet, that's still not the end of the story. All USB bandwidth is managed by the USB host (your PC). Teensy can't decide when to transmit the packet. That packet sits waiting until the USB host controller chip in your PC sends an IN token. Teensy's USB device controller automatically responds to the IN token by transmitting the packet (or it responds with a NAK if no packet is waiting to transmit on the Teensy side - during normal idle times your PC is rapidly sending INs and getting NAKs from Teensy). So there is an unavoidable delay that depends on how quickly your PC's host controller sends those IN tokens. That delay can vary. At 12 Mbit speed, times in the 40 to 250us range are typical. Things are much faster at 480 Mbit, not only because of the higher bitrate but also because timing margins are much tighter in the USB spec for high-speed mode. If other bandwidth hungry devices like high res USB webcams are active on the same USB host controller, they can hog as much as 80% of the 1ms frame (12 Mbit) or 125us micro-frame (480 Mbit), during which time your PC's host controller can't send IN tokens to Teensy. However, if a 12 Mbit/sec Teensy is plugged into a hub, it actually communicates with a transaction translator (TT) inside the hub, which then communicates with your PC at 480 Mbit, which does allow the hub to send IN tokens while the upstream 480 Mbit link is busy. Some hubs have only a single TT, so if 2 or more slower USB devices are plugged into the same hub, only 1 can use 12 Mbit at a time. Other hubs have a TT on every downstream port. So there are a *lot* of complicated factors that go into how quickly Teensy will get an IN token that prompts it to actually transmit the USB packet.
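To put a rough number on the bandwidth hog case, the arithmetic below takes the 80% figure at face value and works out how much of each frame is unavailable for IN tokens. The function name is made up, and real behavior depends on the host controller and any hubs in between, so treat it as a back-of-envelope upper bound only.

```cpp
#include <stdint.h>

// Hypothetical helper: microseconds of each frame (or micro-frame) that
// other devices occupy, given the percentage of the frame they use.
constexpr uint32_t blockedMicros(uint32_t frameMicros, uint32_t busyPercent) {
    return frameMicros * busyPercent / 100;
}
```

With 80% of the frame taken, that is up to 800 us out of each 1 ms frame at 12 Mbit, and up to 100 us out of each 125 us micro-frame at 480 Mbit.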
But the really bad news is the delays that can happen on the PC side once the packet is received. Windows often has little delay before it schedules whatever software is waiting for the data to actually run. But the operating system scheduling delay can be as high as 16 ms if your system is busy with other work, or running in a lower power mode. Linux and macOS are better, but they can also have delays running the waiting program if other software is consuming a lot of resources.
Typically USB MIDI latency works out to be much lower than serial MIDI. Of course the bandwidth is much higher and Teensy does utilize it very efficiently if you don't call send_now(). But if you want lowest latency, you definitely should call usbMIDI.send_now(). Just know you're giving up efficient transmit to get lower latency.
With serial there's nothing you can do. The baud rate is fixed at 31250. But at least it's simple to estimate how much time is needed until your message is sent.