Here's a quick attempt to answer all the questions.
- Is this doable with only one Teensy 4.1 board or should I have three Teensy boards, 1 for each LED pole?
From a purely data processing point of view, just one Teensy 4.1 ought to be capable of controlling 9000 LEDs.
But due to the electrical challenges of so much power, already mentioned quite well, and just from an ease of construction and maintenance point of view, using 3 would be much better. More on that in a moment...
- How to mitigate the delays controlling such a big system? This needs to run in real time, synchronized to live music, so the delay must be under 50 ms.
As already mentioned, the 800 kbit/sec speed of addressable LEDs is the main bottleneck. You really must use parallel output. I'd recommend using OctoWS2811 with the Teensy4 pinlist feature to drive 12 parallel outputs. That should give you about 7.7 ms LED update time.
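A rough way to check that kind of figure: each LED takes 24 bits, so at 800 kbit/sec that's 30 us per LED on each output. Here's a small sketch of the arithmetic. The function name and the reset gap parameter are just for illustration, and the 3000 LEDs per Teensy in the usage note is an assumption (9000 LEDs split over 3 boards), not a measured number.

```cpp
#include <cassert>

constexpr double kBitRateHz = 800000.0;  // 800 kbit/sec LED data rate
constexpr int kBitsPerLed = 24;          // 8 bits each for 3 color channels

// Estimated refresh time in microseconds when totalLeds are spread evenly
// over `outputs` parallel pins. resetUs is the latch/reset gap after the
// data, supplied by the caller because it depends on the LED type.
double updateTimeUs(int totalLeds, int outputs, double resetUs) {
    assert(totalLeds >= 0 && outputs > 0);
    // The longest strip sets the refresh time, so round up.
    int ledsPerOutput = (totalLeds + outputs - 1) / outputs;
    double dataUs = ledsPerOutput * kBitsPerLed * 1e6 / kBitRateHz;
    return dataUs + resetUs;
}
```

With 3000 LEDs on 12 outputs, `updateTimeUs(3000, 12, 0.0)` comes to 7500 us of pure data time, and the latch gap adds a little on top of that.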
Everything else, at least on Teensy, is so much faster than the LED data rate that you (probably) don't need to worry. That was usually true even with the much slower Teensy 3.2 board so many people used. Over the ~10 years when Teensy 3.2 was the main board used, I can recall only a couple of times people had speed problems from anything other than the LED comms speed.
Inefficient reading from an SD card was one. If you read patterns from an SD card, read data in large chunks, ideally multiples of the 512 byte SD sector size.
The other speed issues were complicated patterns that used floating point trig functions sin(), cos(), etc on every pixel. Those could run fast enough on Teensy 3.2 with software floating point when the phase input was between 0 and 2*PI (360 degrees), but slowed quite a lot if the phase variable kept growing without bound. Of course Teensy 4.1 has a hardware FPU and runs about 11X faster for normal integer-based code... but if you create fancy flowing effects by using many trig functions on each pixel, constrain your phase input to 360 degrees.
Point is, the LED 800 kbit/sec rate is almost always the bottleneck by such a huge margin that you usually don't have to worry about other stuff on the Teensy side.
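On the trig point above, one simple way to keep the phase bounded is to wrap it every time you advance it. This is only a sketch in plain C++ (the function name is made up for illustration), but the same code should drop into a Teensy sketch since it only uses the standard math library.

```cpp
#include <cassert>
#include <cmath>

constexpr float kTwoPi = 6.28318530718f;  // 2*PI, i.e. 360 degrees

// Advance a phase accumulator by `step` radians and keep the result
// inside [0, 2*PI) so sin()/cos() never see a huge argument.
float advancePhase(float phase, float step) {
    phase = std::fmod(phase + step, kTwoPi);
    if (phase < 0.0f) phase += kTwoPi;    // fmod keeps the sign of its input
    if (phase >= kTwoPi) phase = 0.0f;    // guard against float rounding
    return phase;
}
```

Call it once per frame, something like `phase = advancePhase(phase, speed);`, then feed `phase` plus any per-pixel offset to your trig functions.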
The 2nd likely bottleneck I would anticipate, if the controlling PC runs Microsoft Windows, is the user process scheduling latency. Sometimes Windows will try to save power by reducing the rate it schedules things to run to as slow as 16ms. There are programs that keep Windows in a "multimedia" mode where it won't schedule programs slower than 1ms. If you use Windows, make sure you take care of this detail and test how things really perform when the PC runs for a long time without keyboard or mouse input.
- Basically there are sections that correspond to a certain MIDI note. For example MIDI note 127 from channel 1 lights up pole 1, panel 1, section 1.
- Is Teensy a MIDI device via USB or do I need some extension? Basically just plug in a USB cable from the computer and it can be seen as a MIDI device to send MIDI notes to from a DAW?
In Arduino IDE, click Tools > USB Type and choose MIDI.
Detailed documentation:
https://www.pjrc.com/teensy/td_midi.html
- What do you think about the power management?
Delivering 3 kW is a huge challenge. If possible, you should mount the power supplies behind the LEDs and run many pairs of wires, kept as short as reasonably possible, from the LEDs to the power supplies.
You might do better to use several smaller power supplies (with GND connected together) each rated for 150W or 200W rather than 1 big one rated for 1000W. Most power supplies really struggle if run at their max for a sustained time. If your animations might do that, best to over-provision the power supplies by 20%.
- How to mitigate the possible problem of controlling WS2815 panels from Teensy, since the length from the Teensy pin to the panels might be around 1-2 meters? If I used 3 Teensy boards then the length from Teensy to panel would be very short, since the board can be attached behind the panels. Of course then the USB cables need to be 2-3 meters long from the computer.
Ground shift caused by the high power draw might be a major problem with USB. Doubly so if trying to run a Teensy pin, or even a buffer chip, driving a line that long.
At least with USB you can buy isolators. The cheap ones will also have an advantage of limiting USB to only 12 Mbit/sec speed, which gives you less dependency on high quality cables.
The high-end alternative would be to use Ethernet rather than USB. It really is an excellent choice for large scale LED projects. Ethernet signals always pass through a transformer which eliminates the ground difference problems. The signals are rated to travel up to 100 meters. The market for Ethernet is extremely mature with low prices for high quality cables, switches and other gear.
- Is it smart to use the WS2815 panels or should I go with some kind of custom solution or something else? I'm seeking very bright LEDs that are suitable for a music stage show in a mid size club environment.
Whatever type of diffuse material you put in front of the LEDs (if any) will play a large role. This many LEDs at max brightness viewed directly indoors will be painfully bright. You'll lose some of that brightness in exchange for a much more artistic look by placing material between the LEDs and people's eyes. I can't give you a conclusive answer. It really is a matter of experimentation to find the material and construction that will give you the artistic appearance you want.
- For code I am thinking of using the FastLED library.
If possible, best to use OctoWS2811 directly.
FastLED can use OctoWS2811 as its output driver, but it does not (yet) support the pinlist feature. So if you use FastLED, plan on 8 parallel outputs. Maybe connect 6 and leave the last 2 unused. That'll give you about 15.4 ms update, where again the 800 kbit/sec LED communication speed is your main bottleneck. Not as fast as 12 outputs, but still above 60 Hz so I wouldn't worry about it too much.
Or alternatively you could use WS2812Serial, which FastLED can also use as a driver.
DO NOT use the built-in FastLED driver. It is a simple blocking approach. You will experience problems as you scale up to many LEDs, because it blocks interrupts you need for USB or Ethernet or even regular serial MIDI communication. For this many LEDs where you're depending on communication that could arrive at any moment (won't be 100% guaranteed to only arrive when you're not updating the LEDs) you really must use a non-blocking driver like OctoWS2811 or WS2812Serial.
Both of those libraries come with examples showing how to use them from FastLED, so in Arduino IDE just click File > Examples > OctoWS2811 and File > Examples > WS2812Serial and look for the example with a name indicating it's the one demonstrating FastLED usage.
- Any other pointers and advice? Is this doable at all?
Very doable. My main extra advice is to plan for problems. That's the main reason to use 3 Teensy boards. If you make each unit as self-contained as possible, if (when) something goes wrong with 1, hopefully the other 2 continuing to work will be good enough for a satisfying show. Again, Ethernet is pretty much the ideal way to communicate (at least electrically), as the magnetic signal coupling gives each part the best chance of continuing to work properly even when there's an electrical problem with the others.
But to use Ethernet you'll need to create software which sends packets (probably UDP) when the MIDI messages are seen. If you can do that software work, it's probably worth the huge electrical benefits of Ethernet. You could still use USB MIDI and have a 4th Teensy which hears the MIDI output from your PC and in turn transmits UDP packets on its Ethernet port. Then you'd connect it and the other Teensy Ethernet ports to an Ethernet switch. It's extra hardware, but you can keep each piece well isolated from the others. If (when) something goes wrong, you'll probably be glad to have it built as separate parts that are easy to swap (if you build spares) and get the show back up and running.
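To give a feel for how small that software job could be, here's a sketch of one possible UDP payload for a note message. The 3 byte layout simply mirrors the standard MIDI note on/off bytes (status, note, velocity). The struct and function names are hypothetical, and the actual sending and receiving of the packet is left out because it depends on which Ethernet library you pick.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One note message as the bridge Teensy would see it from USB MIDI.
struct NoteEvent {
    uint8_t channel;   // 1-16
    uint8_t note;      // 0-127
    uint8_t velocity;  // 0-127
    bool on;           // true = note on, false = note off
};

// Pack a note event into a 3 byte payload (hypothetical packet format).
std::array<uint8_t, 3> packNote(const NoteEvent& e) {
    assert(e.channel >= 1 && e.channel <= 16);
    assert(e.note <= 127 && e.velocity <= 127);
    uint8_t status = static_cast<uint8_t>((e.on ? 0x90 : 0x80) | (e.channel - 1));
    return {status, e.note, e.velocity};
}

// Unpack on the receiving Teensy. Returns false for anything that is not
// a well-formed note on/off message, so corrupt packets can be ignored.
bool unpackNote(const std::array<uint8_t, 3>& p, NoteEvent& out) {
    uint8_t kind = p[0] & 0xF0;
    if (kind != 0x90 && kind != 0x80) return false;
    if ((p[1] | p[2]) & 0x80) return false;  // data bytes must be 0-127
    out.channel = static_cast<uint8_t>((p[0] & 0x0F) + 1);
    out.note = p[1];
    out.velocity = p[2];
    out.on = (kind == 0x90);
    return true;
}
```

Each LED pole's Teensy would call something like `unpackNote()` on every packet it receives and light the matching section, ignoring anything that fails the check.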