But, what about the Teensy 3.0/3.1? How long can an ISR run at most before making the counter "drift"?
Teensy 3.x and LC have nested priority interrupts, so the answer depends on the priority level of the other ISR.
Priority numbers are 0 to 255, where lower numbers mean higher priority. The Systick interrupt that updates millis is assigned priority 32. Teensyduino assigns most interrupts priority 128 by default. USB is at 112. Hardware serial is 64. Audio library processing of data is done at 208. Non-interrupt code effectively runs at priority 256. If you use an interrupt, you can set it to any priority you like. If you don't set one, you'll get the 128 default.
As an example, assume the audio library is taking a LOT of CPU time at priority 208. Systick is able to interrupt the audio interrupt, due to the hardware support for prioritized nesting. Nesting allows higher priority interrupts to function as if interrupts aren't blocked. Each interrupt only blocks other interrupts of equal or lower (higher numerical value) priority.
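To make the rule concrete, here is a minimal host-side sketch of it in C++. It is only a model, not Teensyduino code: the constant names and the canPreempt helper are made up for illustration, the numbers are the defaults quoted above, and it ignores the grouping of priority numbers into hardware levels.

```cpp
// Default priorities quoted above (lower number = higher priority).
// The names are hypothetical, chosen only for this illustration.
constexpr int kPrioritySystick  = 32;
constexpr int kPrioritySerial   = 64;
constexpr int kPriorityUsb      = 112;
constexpr int kPriorityDefault  = 128;
constexpr int kPriorityAudio    = 208;
constexpr int kPriorityMainCode = 256;  // non-interrupt code

// Hypothetical helper: true if an interrupt with priority `pending`
// may preempt code currently running at priority `running`.
// Equal priority does not preempt; only a strictly lower number does.
constexpr bool canPreempt(int pending, int running) {
    return pending < running;
}

// Systick (32) can interrupt a busy audio interrupt (208), not the reverse.
static_assert(canPreempt(kPrioritySystick, kPriorityAudio), "Systick nests inside audio");
static_assert(!canPreempt(kPriorityAudio, kPrioritySystick), "audio waits for Systick");
```

Under this model, two interrupts left at the 128 default never preempt each other, and every interrupt can preempt the main program.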
Of course, the main program or interrupt code can completely disable all interrupts. This is fairly common, but it's usually done for very short times. There are some exceptions, like Adafruit_Neopixel and DmxSimple, but as long as you're not running code which disables interrupts for lengthy times, or hogs CPU in a high priority interrupt, Systick can update the millis count.
One minor caveat is the number of nesting levels available. Even though the priority numbers are 0 to 255, they fall into groups, and all numbers within a group are treated as equal. Cortex-M4 on Teensy 3.1 can support up to 16 levels of nesting, so priority levels 0 to 15 are all the same, 16 to 31 are the same, and so on. Cortex-M0+ on Teensy-LC supports 4 levels. If you configure an IntervalTimer for priority 48, on Teensy 3.1/3.2 it will not block Systick at 32, but on Teensy-LC level 48 is the same as 0 to 63, so it blocks Systick at 32 until it's done.
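The grouping can be modeled as keeping only the top bits of the 8-bit priority number: 4 bits for 16 levels, 2 bits for 4 levels. The following C++ is a host-side sketch under that assumption, not code from Teensyduino; effectivePriority, blocks and the two constants are hypothetical names used only here.

```cpp
#include <cstdint>

// Number of priority bits assumed for each board (illustrative names).
constexpr unsigned kBitsTeensy3  = 4;  // 16 levels: 0-15, 16-31, ...
constexpr unsigned kBitsTeensyLC = 2;  // 4 levels: 0-63, 64-127, ...

// Hypothetical helper: keep only the top `implementedBits` bits of an
// 8-bit priority, so every number in a group maps to the same value.
constexpr std::uint8_t effectivePriority(std::uint8_t priority, unsigned implementedBits) {
    return static_cast<std::uint8_t>(priority & (0xFFu << (8u - implementedBits)));
}

// Hypothetical helper: true if a handler running at `running` keeps an
// interrupt at `pending` waiting (pending is equal or lower priority
// once both numbers are reduced to their groups).
constexpr bool blocks(std::uint8_t running, std::uint8_t pending, unsigned implementedBits) {
    return effectivePriority(running, implementedBits)
        <= effectivePriority(pending, implementedBits);
}

// IntervalTimer at 48 versus Systick at 32, as in the example above.
static_assert(!blocks(48, 32, kBitsTeensy3), "Teensy 3.x: Systick still preempts");
static_assert(blocks(48, 32, kBitsTeensyLC), "Teensy-LC: Systick has to wait");
```

With 2 bits, both 48 and 32 reduce to 0, which is why the timer at 48 holds off Systick on Teensy-LC. With 4 bits they stay 48 and 32, so Systick keeps its higher priority.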
The priority interrupt nesting is one of the truly awesome features of ARM Cortex-M chips. It's all done automatically by hardware, so there's no extra overhead other than just configuring the priority levels. Quite a bit of thought and work has gone into establishing good defaults, so all the commonly used functions and libraries can work together with great compatibility, even when you push several of them to their performance limits.