As you scale up to an extremely large LED array, a few technical problems will eventually come into play, beyond the obvious issues of material cost, huge power requirements, and the sheer number of hours needed to build the system.
USB bandwidth is the main technical issue. With 30 Hz video, each Teensy driving 1000 LEDs at 3 bytes per LED gets 90 kbytes/sec, and if you play a 60 Hz progressive video, it’s 180 kbytes/sec. Teensy 3.0 can easily handle these speeds, because each board runs completely independently, other than listening for the frame sync pulse. The main reason for the 1000 LEDs per Teensy guideline is so each board can easily accomplish its job with plenty of timing margin, even if there is some extra USB latency. If there is scheduling latency or other temporary software lag on the computer (e.g., Java's garbage collection), and the computer then transmits one or more frames too rapidly, the timing margin greatly improves the whole system's robustness. In theory, you might be able to connect more than 1000 and maybe still be ok, but at $19 each compared to at least a couple hundred dollars for the LEDs, you really will not save much by connecting more LEDs per Teensy. I personally test OctoWS2811 with 960 LEDs connected to each Teensy, so I highly recommend you follow the 1000 LEDs per Teensy guideline for best performance and compatibility.
Somewhere around 20 to 40 Teensys, the total data flow is going to become a substantial part of USB 2.0 bandwidth. You probably want to keep the total under half of USB’s capacity. You might need USB 3.0, or add extra USB 2.0 cards. Typically all the USB 2.0 ports on a computer share the same 480 Mbit/sec bandwidth.
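The bandwidth arithmetic can be sketched in a few lines of Python. This is only a rough estimate: it assumes 3 bytes per LED with no protocol overhead, and it compares against the raw 480 Mbit/sec signaling rate, which overstates what USB 2.0 can actually deliver, so the real fraction used will be higher than the number it reports. The function names are made up for illustration.

```python
# Rough USB load estimate for a multi-Teensy video wall.
# Assumptions: 3 bytes per LED (24-bit color) and no protocol overhead.
BYTES_PER_LED = 3
USB2_RAW_BYTES_PER_SEC = 480_000_000 / 8  # raw signaling rate, an upper bound


def bytes_per_sec_per_board(leds_per_board, frames_per_sec):
    """Pixel data each board must receive every second."""
    return leds_per_board * BYTES_PER_LED * frames_per_sec


def usb2_raw_fraction(num_boards, leds_per_board, frames_per_sec):
    """Total pixel data as a fraction of the raw USB 2.0 signaling rate."""
    total = num_boards * bytes_per_sec_per_board(leds_per_board, frames_per_sec)
    return total / USB2_RAW_BYTES_PER_SEC


if __name__ == "__main__":
    for boards in (10, 20, 40):
        for fps in (30, 60):
            frac = usb2_raw_fraction(boards, 1000, fps)
            print(f"{boards} boards at {fps} Hz: {frac:.1%} of raw USB 2.0")
```

Because usable throughput is well below the raw rate, treat the printed percentages as a lower bound on how much of the bus you are consuming.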
You must use good quality USB hubs.
Beware of single-TT USB hubs. They will cause problems. Only multi-TT hubs properly share bandwidth. A USB "TT" is the Transaction Translator (basically a buffer memory and some logic) which converts between USB speeds. Sadly, very few hubs actually say which type they are... the marketing is all based on size, shape & color. But there are indeed 2 types of USB hubs and it does make a dramatic difference for a system like this. You can check which type a hub actually is with Linux's "lsusb -v" or Windows 7's device manager.
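As a sketch of the Linux check, the Python below scans the text printed by "lsusb -v" and reports the TT type of each hub. It relies on the hub device descriptor's bDeviceProtocol field, where 1 means a single TT and 2 means one TT per port. The exact layout of the lsusb output is assumed from typical versions of the tool, and the sample text and function name are made up for illustration.

```python
import re

# bDeviceProtocol values for high-speed hubs.
TT_TYPES = {0: "full-speed hub (no TT)", 1: "single-TT", 2: "multi-TT"}

# Hypothetical sample, shaped like typical `lsusb -v` output.
SAMPLE = """\
Bus 001 Device 002: ID 0000:0001 Example Hub A
Device Descriptor:
  bDeviceClass            9 Hub
  bDeviceSubClass         0
  bDeviceProtocol         1 Single TT
Bus 001 Device 003: ID 0000:0002 Example Hub B
Device Descriptor:
  bDeviceClass            9 Hub
  bDeviceSubClass         0
  bDeviceProtocol         2 TT per port
Bus 001 Device 004: ID 0000:0003 Example Serial Device
Device Descriptor:
  bDeviceClass            2 Communications
  bDeviceSubClass         0
  bDeviceProtocol         0
"""


def classify_hubs(lsusb_text):
    """Map each hub's 'Bus ... Device ...' header line to its TT type."""
    hubs = {}
    header = None
    dev_class = None
    for line in lsusb_text.splitlines():
        if line.startswith("Bus "):
            # A new device section begins; forget the previous class.
            header, dev_class = line.strip(), None
            continue
        m = re.match(r"\s*(bDeviceClass|bDeviceProtocol)\s+(\d+)", line)
        if m is None or header is None:
            continue
        field, value = m.group(1), int(m.group(2))
        if field == "bDeviceClass":
            dev_class = value
        elif dev_class == 9:  # device class 9 is a USB hub
            hubs[header] = TT_TYPES.get(value, "other")
    return hubs


if __name__ == "__main__":
    for header, tt in classify_hubs(SAMPLE).items():
        print(f"{tt:>10}  {header}")
```

To run it against real hardware, pass in the output of "lsusb -v" instead of SAMPLE, for example by capturing it with subprocess.run(["lsusb", "-v"], capture_output=True, text=True).stdout.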
If the display is incredibly large (more than 8 meters wide) and you have Teensy 3.0s located on both the left and right sides, USB cable lengths might become an issue. You might need to locate the PC in the center behind the display and run 2 long cables to a hub on each side, or perhaps use a hierarchy of USB hubs on each side. Especially if you use USB 3.0 to the first tier, and perhaps USB 2.0 in the 2nd tier, with those hubs driving the Teensy boards, be aware of the USB cable lengths and use good quality cables.
Another issue is transmitting the frame sync signal. At some point, perhaps even as few as 8 boards, you’ll probably need to use a 50 to 100 ohm resistor on the frame sync pin. This is more a function of the physical wire you use. Using a good quality cable like CAT5 twisted pair with the ground and frame sync on a pair will help. If the wire runs a different path than the ground (as I did in this 1920 LED demo), the signal can get ringing and other transmission line effects.
At huge sizes, you’ll probably need to add buffer chips to transmit the frame sync pulse to each group of other boards. This too is pretty easily solved. However, with only a wire, it doesn’t matter which board is the master sending the pulse and which others receive it. That makes things very easy, since you can just plug the boards in any order and not worry about which ones get assigned to which device names or com port numbers. If you add unidirectional buffer chips like a 74HC245, then it’ll become important to make sure the intended board is the master. The first board in the list in movie2serial’s setup() is the master. You’ll just need to make sure the port corresponding to the board with those buffers is listed first.
A really good, but perhaps overly conservative, approach to the frame sync would involve using RS-485 on a twisted pair cable. Still, if you're connecting over $10000 in LEDs, this might be worth considering. The digital pins from Teensy 3.0, and even buffer chips like a 74HC245, aren't meant to drive really long cables. RS-485 signals are designed for fast performance over thousands of feet of cable. You would need to make 1 driver board and a receiver board for each Teensy, so this adds quite a bit of work, but it automatically handles this next issue...
With the incredible amount of current consumed by tens of thousands of LEDs, another possible issue you may face is differences in the ground potential. For example, a system using 40 Teensy 3.0s would be driving approx 40000 LEDs, which could consume about 2000 amps of current at 5 volts when all are driving fully white. Obviously you'll locate many power supplies as close as possible to the strips they power. You'll use #22 to #18 wire between the power supplies and the LEDs (a 1 meter strip of 60 draws up to 3 amps). But it's important to keep in mind the whole system's grounding. When we draw schematics, we put ground symbols all over the place and assume they're all at exactly the same potential. But in reality, electrons flow through metal that has non-zero resistance. Certainly you should measure both AC and DC voltage between "ground" at distant locations as the LEDs change power usage. You may need to add extra ground wires to keep the whole system at the same ground potential, or within 100 to 200 mV, so the USB can work.
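The current figure above follows from the per-strip number, and the ground concern is plain Ohm's law. A minimal sketch, assuming the 3 amps per 60 LEDs figure holds for every LED at full white; the 10 amp and 0.02 ohm values in the example are made up purely to illustrate the scale of the problem.

```python
# Worst-case LED current and ground voltage drop.
STRIP_AMPS = 3.0   # from the text: a 1 meter strip draws up to 3 amps
STRIP_LEDS = 60    # LEDs on that 1 meter strip


def worst_case_amps(num_leds):
    """Total 5 volt current if every LED is fully white."""
    return num_leds * STRIP_AMPS / STRIP_LEDS


def ground_drop_volts(amps, ohms):
    """Voltage developed across a ground path (V = I * R)."""
    return amps * ohms


if __name__ == "__main__":
    print(f"40000 LEDs: {worst_case_amps(40000):.0f} amps")
    # Hypothetical: 10 amps returning through 0.02 ohms of ground wiring.
    drop = ground_drop_volts(10.0, 0.02)
    print(f"ground drop: {drop * 1000:.0f} mV")
```

Even a small stray return current through a few hundredths of an ohm lands in the 100 to 200 mV range, which is why measuring between distant grounds is worthwhile.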
The huge power requirements should be obvious, but just to mention it again, 40000 LEDs using 80% efficient power supplies (which hopefully have inrush limiting and active power factor correction) will draw about 52 amps at 240 volts AC. This much power would normally be supplied by a 3-phase service, where you'd probably run the many power supplies in 3 groups at 208 or 277 volts, depending on the type of service.
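The AC figure can be checked the same way. This sketch assumes 50 mA per LED at full white (3 amps per 60 LED meter), ignores power factor, and treats the supply as a single 240 volt circuit; the function names are made up for illustration.

```python
# AC input current for a large 5 volt LED installation.
AMPS_PER_LED = 3.0 / 60  # assumed worst case, from 3 amps per 60 LED meter


def dc_watts(num_leds, dc_volts=5.0):
    """Worst-case DC power delivered to the LEDs."""
    return num_leds * AMPS_PER_LED * dc_volts


def ac_amps(num_leds, efficiency=0.80, ac_volts=240.0):
    """AC line current needed to deliver that DC power."""
    return dc_watts(num_leds) / efficiency / ac_volts


if __name__ == "__main__":
    print(f"DC load: {dc_watts(40000) / 1000:.1f} kW")
    print(f"AC draw: {ac_amps(40000):.1f} amps at 240 volts")
```

Changing the efficiency or ac_volts arguments gives a quick feel for how supply choice and service voltage affect the wiring you need.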
The final problem, which I doubt would ever be an issue on any modern computer, might be lack of CPU time to process the original video material into the many streams for each board. The code might need to be rewritten in C. However, running on a MacBook Air (certainly not the fastest machine), the 1920 LED demo was using 8% CPU time. I didn’t investigate the CPU usage in detail, but it was approximately the same just running without sending to any boards, so most of that 8% is probably the video decode, screen updates, or other stuff Java and Processing are doing to play the video. Still, each Teensy needs some CPU time on the machine sending the video, so as you scale up, keep an eye on how much of your computer's resources the software is using.
These limits come into play for extremely large displays, probably involving over $10000 in LEDs. If you use OctoWS2811 for such an awesome project, you really must send me photos!!!