On Teensy 3.1, when you use Serial.print(), your data is written into USB packet buffers. Unless no buffers are available, which shouldn't really happen under normal circumstances, Serial.print() should always return quickly.
By default, there are 12 of these USB packet buffers, each capable of holding 64 bytes. Two are normally queued to receive incoming data, and if the PC has transmitted data that hasn't been read with Serial.read(), more of the buffers might be used up holding that data. Normally, about 8 to 10 buffers ought to be available for Serial.print().
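As a rough illustration of that buffer budget, this little calculation works out how many bytes Serial.print() could queue before it would have to wait. The constant and function names are made up for this example and are not taken from the Teensy core; only the figures (12 buffers, 64 bytes each, 2 normally queued for receive) come from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Figures from the description above; the names are illustrative only.
constexpr std::size_t kTotalBuffers = 12;  // default USB packet buffer pool
constexpr std::size_t kPacketBytes  = 64;  // bytes held by each buffer
constexpr std::size_t kRxQueued     = 2;   // normally queued for incoming data

// Bytes that could be queued for transmit, given how many additional
// buffers are holding received data that hasn't been read yet.
std::size_t tx_bytes_available(std::size_t rx_unread_buffers) {
    assert(kRxQueued + rx_unread_buffers <= kTotalBuffers);
    return (kTotalBuffers - kRxQueued - rx_unread_buffers) * kPacketBytes;
}
```

With nothing unread, tx_bytes_available(0) gives 640 bytes (10 buffers). With 2 buffers tied up holding unread incoming data, tx_bytes_available(2) gives 512 bytes (8 buffers).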
All USB endpoints automatically have end-to-end flow control. If the PC isn't ready to receive, the data simply waits in those buffers until the PC is able to receive it. But on Windows, Linux and Mac, the USB serial driver almost always receives data as rapidly as possible and holds it in big buffers on the PC side until the PC-side program (e.g., the Arduino Serial Monitor) actually reads it.
Exactly when the PC reads the packets depends on the USB host driver and host controller chip, and how they interact with the application software running on the computer. There are lots of combinations. As you might expect, the drivers on Windows aren't nearly as good as those on Linux and Macintosh. But on all 3, the tendency is to read as many packets as Teensy is capable of sending without any delay, then stop as soon as Teensy can't reply immediately with another packet. Some versions of Windows won't retry until the next 1 ms USB frame. Windows also won't try again quickly if the application software issues small-size read requests. Macintosh is particularly good about dealing with small-size requests efficiently, and Linux is somewhere in the middle. All 3 work really well if the PC-side application reads in large blocks. But even on Macs, the way the application software is written has a substantial effect on the USB bandwidth utilization. The effect is particularly severe on Windows.
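To make the "large blocks" point concrete, here is a minimal sketch of a PC-side read loop for Linux or Mac. It assumes the serial port has already been opened with open() and that you have its file descriptor. The function name and the 4096 byte block size are arbitrary choices for this example, not anything the drivers require.

```cpp
#include <cstddef>
#include <vector>
#include <unistd.h>   // read()

// Read from an already-open file descriptor until end-of-file or an error,
// asking for one large block per read() call instead of a few bytes at a
// time. Returns the total number of bytes read.
std::size_t drain_in_large_blocks(int fd, std::size_t block_size = 4096) {
    std::vector<char> buf(block_size);
    std::size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf.data(), buf.size());
        if (n <= 0) break;   // 0 = end of file, negative = error
        total += static_cast<std::size_t>(n);
        // ... process buf[0] through buf[n-1] here ...
    }
    return total;
}
```

On a real serial port, read() usually blocks waiting for more data rather than returning 0, so a real program would need its own exit condition. The point is only that each request is big enough for the driver to hand over many 64 byte packets at once.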
On Teensy 3.1, the packets that get queued up by Serial.print() and Serial.write() are given to the USB hardware, which can have at most 2 packets ready to transfer. So after each packet is transmitted to the PC, an interrupt is needed to reconfigure the USB hardware with another packet from the queue. On USB, each 64 byte packet takes roughly 50 us, factoring in USB protocol overhead (at the 12 Mbit/sec full speed rate Teensy 3.1 uses, the 512 data bits alone take about 43 us). If the USB interrupt is delayed by 2 of these packet times, roughly 100 us, another packet can't get ready to be sent, so the USB hardware will automatically answer the USB host controller chip with a NAK response. That's not the end of the world, but it does mean those queued packets will tend to sit in memory until the host controller decides to start asking for data again. With some computers and Microsoft Windows, that might be an entire millisecond.
Normally, you don't need to worry about all this stuff. The USB stack takes care of everything. Normally on Teensy 3.1, interrupts run very quickly, so the USB interrupt doesn't tend to get delayed by anything close to 100 us.
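If you want to sanity-check the per-packet timing, here is a back-of-the-envelope calculation. It assumes full speed USB at 12 Mbit/sec and counts only the token, data and handshake packets of one bulk IN transaction. Bit stuffing and the idle gaps between packets are ignored, so the real figure is a bit higher. The names are made up for this example.

```cpp
#include <cassert>

constexpr double kBitsPerMicrosecond = 12.0;  // full speed USB, 12 Mbit/sec

// Fields common to every packet, in bit times.
constexpr int kSyncBits = 8;
constexpr int kPidBits  = 8;
constexpr int kEopBits  = 3;   // end-of-packet signaling, approximate

// IN token: sync + PID + 7 bit address + 4 bit endpoint + 5 bit CRC + EOP.
constexpr int token_bits() { return kSyncBits + kPidBits + 7 + 4 + 5 + kEopBits; }

// Data packet: sync + PID + payload + 16 bit CRC + EOP.
constexpr int data_bits(int payload_bytes) {
    return kSyncBits + kPidBits + payload_bytes * 8 + 16 + kEopBits;
}

// Handshake (ACK or NAK): sync + PID + EOP.
constexpr int handshake_bits() { return kSyncBits + kPidBits + kEopBits; }

// Approximate time on the wire for one bulk IN transaction, in microseconds.
double transaction_us(int payload_bytes) {
    assert(payload_bytes >= 0 && payload_bytes <= 64);
    return (token_bits() + data_bits(payload_bytes) + handshake_bits())
           / kBitsPerMicrosecond;
}
```

For a full 64 byte packet this adds up to 601 bit times, so transaction_us(64) comes out just over 50 us. Two packet times is therefore about 100 us.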