4 bit SDIO is used automatically by the SD library when you use BUILTIN_SDCARD rather than a pin number for CS. That's the easy part.
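For example, a minimal sketch with the Arduino SD library might look something like this. It assumes a Teensy with the built-in SD socket (where BUILTIN_SDCARD is defined), and the file name test.txt is only a placeholder.

```cpp
#include <SD.h>

void setup() {
  Serial.begin(9600);
  while (!Serial) ;  // wait for the serial monitor to open

  // Passing BUILTIN_SDCARD instead of a CS pin number selects the
  // built-in socket, which the library accesses using 4 bit SDIO.
  if (!SD.begin(BUILTIN_SDCARD)) {
    Serial.println("SD card not found");
    return;
  }

  File f = SD.open("test.txt");  // placeholder file name
  if (!f) {
    Serial.println("Could not open test.txt");
    return;
  }
  Serial.print("File size in bytes: ");
  Serial.println(f.size());
  f.close();
}

void loop() {
}
```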
But the actual transfer rate depends on much more than just the raw communication speed between Teensy and the SD card. The card's command latency is also a big factor. How much it matters depends on the software, and also on the card. Genuine SanDisk Ultra/Extreme and Samsung EVO cards are among the fastest, but beware of counterfeits, which are usually the very worst performing hardware.
On the software side, you have 2 choices (other than writing everything from scratch). Both use 4 bit SDIO mode. The Arduino SD library is very simple, but it's not known for high performance. It reads 1 sector at a time using a single buffer, so every sector read pays the card's command latency and performance suffers greatly on cards with slow command response. Arduino SD is also currently limited to 32 GB.
The other choice is SdFat. The downside to SdFat is a more complex API. Many of its examples use C++ templates or other advanced syntax that is great if you're a C++ expert, not so wonderful if you're not. But SdFat gives you a lot more control over how things work. It supports multi-sector reads with large buffers, which spreads the card's command latency over many sectors and allows the card to achieve much better performance.
As with all Arduino libraries, once the lib is installed (Arduino SD is always present), click File > Examples > {library} to get started.
You don't need to worry about 4 bit SDIO mode. Both libraries use it automatically. What ultimately matters for performance is which library you use and, if you go with the more complex SdFat, which functions and what buffer sizes you use.