It depends on how optimized the drivers are and whether the devices do things like tri-state MISO properly.
Assuming you have done everything in the quoted document (pull-up resistors on the CS pins, verified that the device tri-states MISO, and verified that the driver uses SPI.beginTransaction and SPI.endTransaction), if the drivers are dumb, then it may not matter whether you have one SPI device or two. I suspect it will do one device and then the other.
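For what it's worth, here is roughly what I mean by a driver that uses the transaction calls on a shared bus. Treat it as a sketch only, not code from any particular driver: the CS pins (10 and 9), the 8 MHz and 2 MHz clock settings, and the helper name transferToDevice are placeholders I made up. The #else branch is just a tiny stand-in for the Arduino calls so the logic can be compiled on a PC; on a Teensy only the #if branch is used.

```cpp
#include <stdint.h>
#include <stddef.h>

#if defined(ARDUINO)
#include <Arduino.h>
#include <SPI.h>
#else
// Host-only stand-ins (NOT the real library): they only record pin state
// and count bytes so the sketch can be compiled and checked off-target.
#include <cassert>
enum { LOW = 0, HIGH = 1, OUTPUT = 1, MSBFIRST = 1, SPI_MODE0 = 0 };
static uint8_t pinState[64];
inline void pinMode(uint8_t, uint8_t) {}
inline void digitalWrite(uint8_t pin, uint8_t value) { pinState[pin] = value; }
struct SPISettings { SPISettings(uint32_t, uint8_t, uint8_t) {} };
struct HostSPI {
  int openTransactions = 0;
  unsigned bytesSent = 0;
  void begin() {}
  void beginTransaction(const SPISettings &) { ++openTransactions; }
  void endTransaction() { assert(openTransactions > 0); --openTransactions; }
  uint8_t transfer(uint8_t) { ++bytesSent; return 0; }
};
static HostSPI SPI;
#endif

// Placeholder chip-select pins for two devices sharing one SPI bus.
const uint8_t CS_A = 10;
const uint8_t CS_B = 9;

// Each device gets its own settings; the clock rates are made-up examples.
SPISettings settingsA(8000000, MSBFIRST, SPI_MODE0);
SPISettings settingsB(2000000, MSBFIRST, SPI_MODE0);

// Hypothetical helper: one complete transaction to one device.
void transferToDevice(uint8_t csPin, const SPISettings &settings,
                      const uint8_t *data, size_t len) {
  SPI.beginTransaction(settings);  // claim the bus with this device's settings
  digitalWrite(csPin, LOW);        // select only this device
  for (size_t i = 0; i < len; i++) {
    SPI.transfer(data[i]);
  }
  digitalWrite(csPin, HIGH);       // deselect before releasing the bus
  SPI.endTransaction();
}

void setup() {
  // Deselect every device before talking to any of them.
  pinMode(CS_A, OUTPUT);
  digitalWrite(CS_A, HIGH);
  pinMode(CS_B, OUTPUT);
  digitalWrite(CS_B, HIGH);
  SPI.begin();
}

void loop() {
  static const uint8_t msg[] = {0x01, 0x02, 0x03};
  transferToDevice(CS_A, settingsA, msg, sizeof msg);  // one device...
  transferToDevice(CS_B, settingsB, msg, sizeof msg);  // ...then the other
}
```

Note this is the "dumb" case: each transfer blocks until it is finished, so the second device waits for the first. It does not show the DMA queuing that a smarter driver would do.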
However, if you have the proper driver with the appropriate magic, then it will speed things up, because the driver queues things up to do transfers in the background (using DMA) while it is free to do other things.
Note, while I am familiar with things at a high level, for details you want the experts (KurtE, mjs513, defragster, and of course Paul S.).
The 3.x world seems to be very different from the 4.0 world.
In the 3.x world you could have 5 high speed options, spread over 9 pins. Note, some pins internally used the same ports, so you could use one or the other pin with high speed SPI, but not both. The pins are:
Note, if your device uses a D/C pin, that must also come from the list of fast pins.
One of the canonical programs that does use dual SPI is uncannyEyes, which draws eyes on two 128x128 displays as fast as it can. In uncannyEyes, you need 3 fast pins (one CS pin for each display, and a third for the shared D/C pin). In my current configuration, I tend to use pins 23/22 as the two CS pins and 15 as the D/C pin, but I may change this to use the more standard CS pins 10/9.
The author of uncannyEyes has been working on newer versions that target other microprocessors (Raspberry Pi, and the Adafruit Monster M4SK which uses an ARM Cortex M4). IIRC, in the Teensy version, he does one display, waits until it is finished, and then does the other display. In the newer version that uses 240x240 displays instead of 128x128 displays (i.e. roughly 3.5 times the data), I believe he does both SPI streams in parallel (using 2 SPI ports). If that is ever ported back to the Teensy 3.5/3.6, it would likely then need separate D/C pins for each display.
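To put rough numbers on the 128x128 versus 240x240 comparison, here is a back-of-the-envelope calculation. It assumes 16 bits per pixel (RGB565, which is common for these small displays but is my assumption here) and a made-up 24 MHz SPI clock, and it ignores command and D/C overhead, so the times are best-case figures only. The function names are just for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Bytes needed for one full frame at the given bits per pixel.
constexpr uint32_t frameBytes(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
  return width * height * bitsPerPixel / 8;
}

// Best-case milliseconds to clock one frame out over SPI
// (one bit per SPI clock, no gaps between bytes).
double frameMillis(uint32_t bytes, double spiHz) {
  assert(spiHz > 0);
  return bytes * 8.0 * 1000.0 / spiHz;
}

// Prints the comparison for an assumed SPI clock.
void printFrameCosts(double spiHz) {
  const uint32_t small = frameBytes(128, 128, 16);  // 32768 bytes
  const uint32_t large = frameBytes(240, 240, 16);  // 115200 bytes
  std::printf("128x128: %lu bytes, %.2f ms per frame\n",
              (unsigned long)small, frameMillis(small, spiHz));
  std::printf("240x240: %lu bytes, %.2f ms per frame\n",
              (unsigned long)large, frameMillis(large, spiHz));
  std::printf("ratio: %.2f\n", (double)large / small);
}
```

Calling printFrameCosts(24e6) from main() (or from setup() with the output redirected to Serial) shows why doing two large displays one after the other on a single bus gets expensive: the per-frame time simply doubles, whereas two SPI ports running in parallel would keep it at the single-display figure.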
While the Teensy 3.5/3.6 have multiple SPI buses, I think the other SPI buses are limited to a single fast pin. This doesn't help if you need both CS and D/C pins to be fast pins.
Teensy 4.0 is a lot more complicated. The SPI internals changed, and the memory caching behavior is also causing some issues. My sense is that for ultimate speed, you will want the devices on separate SPI buses. In addition to the pins meant to be soldered with a ribbon cable to connect to a SD drive (SPI3), there is a second SPI bus on the Teensy 4.0. You would need to solder two wires (or use a breakout shield) to pads #26 and 27 and use pins 0/1 to use SPI2. These pads are at standard 0.1" (2.54mm) spacing and should be easier to solder than the SPI3 pins (1mm).
You might want to read this thread:
But the discussion has sprawled out to other high volume threads:
Do you have pull-up resistors for the CS and D/C pins? Somebody else had a similar issue and pull-up resistors helped. For me, I was having screen corruption, and adding 2.2K pull-up resistors allowed me to raise the SPI bus speed when I was talking to two different displays. A pull-up resistor would be a resistor that goes between the pin and 3.3v in parallel to the normal wire. I've used 2.2K pull-up resistors, which I also use for i2c buses.