First, I should mention SPIFIFO is very new and might change. If you use it now, be prepared for possibly incompatible changes over the next few months.
SPIFIFO has a begin() function that takes the SS pin number and the desired bitrate. The main thing you need to know is SPIFIFO automatically handles the SS pin, so you do not use digitalWrite on any pin like you would with the normal SPI library.
SPIFIFO is faster if you use any of the natively supported SS pins (shown in green on the Teensy 3.0 pinout card). SPIFIFO will automatically work with any pin for SS, but the non-native pins are slower.
There are 2 functions to transmit, write() and write16(). Both take one input, either 8 or 16 bits, and an optional SPI_CONTINUE input. The first write causes SS to assert. Every write within a single transaction, except the last one, must have SPI_CONTINUE. That's how it knows to keep the SS pin asserted (low). This SPI_CONTINUE mimics the special SPI extensions for Arduino Due, except Due doesn't have a FIFO.
You use the read() function to receive data from previous writes. You MUST call read() for every write() or write16(). It will return either 8 or 16 bits, depending on whether you previously called write() or write16(). Never call read() without previously doing a write().
Never get more than 4 writes ahead, because the FIFO is only 4 words deep. Usually you'll start with 2 or 3 writes, then do interleaved read & write, until your last write without SPI_CONTINUE. Then after that last write, you'll do 2 or 3 reads to finish up.
The key difference between SPIFIFO.write() and normal SPI.transfer() is when it returns to your code. With SPI.transfer(), the entire byte is transmitted and whatever came in is returned, so anything you do is while the SPI is idle. With SPIFIFO.write(), control returns quickly to your code while the actual bits are probably still being pumped out. You must later call read() to get whatever came in. Both use inline functions for low overhead, but if you keep 2 or 3 writes ahead of reading, SPIFIFO really reduces the dead times between actual bits clocking in and out.
Even if you only need to transmit and you don't care what voltage was on the MISO pin while you were transmitting, you really do need to have a read() that corresponds to every write().
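The write-ahead pattern and the one-read-per-write rule described above can be sketched roughly like this. It is only an illustration, not code from the library: transferBuffer(), kContinue, kMaxAhead and LoopbackFifo are all names made up here. LoopbackFifo is a desktop stand-in that echoes back whatever was written, so the sketch can be tried without a Teensy, and kContinue stands in for SPI_CONTINUE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: stands in for the library's SPI_CONTINUE flag.
constexpr uint32_t kContinue = 1;
// Stay at most 3 writes ahead, safely below the 4-word FIFO depth.
constexpr size_t kMaxAhead = 3;

// Hypothetical host-side stand-in for SPIFIFO, as if MOSI were wired to MISO.
struct LoopbackFifo {
    uint32_t slots[4] = {0, 0, 0, 0};
    size_t head = 0;
    size_t pending = 0;      // writes not yet matched by a read
    size_t maxPending = 0;   // deepest the FIFO ever got
    size_t finalWrites = 0;  // writes issued without the continue flag

    void write(uint32_t b, uint32_t cont = 0) {
        assert(pending < 4);  // more than 4 ahead would overrun the FIFO
        slots[(head + pending) % 4] = b & 0xFF;
        ++pending;
        if (pending > maxPending) maxPending = pending;
        if (cont == 0) ++finalWrites;
    }

    uint32_t read() {
        assert(pending > 0);  // never read without a previous write
        uint32_t v = slots[head];
        head = (head + 1) % 4;
        --pending;
        return v;
    }
};

// Sends len bytes as one transaction. Pass rx = nullptr for transmit-only;
// the reads still happen, the incoming data is simply ignored.
template <typename Fifo>
void transferBuffer(Fifo &spi, const uint8_t *tx, uint8_t *rx, size_t len) {
    size_t written = 0;
    size_t readBack = 0;
    while (readBack < len) {
        // Top up the FIFO: a few writes at the start, one per pass afterwards.
        while (written < len && written - readBack < kMaxAhead) {
            bool last = (written + 1 == len);
            // Every write except the last one carries the continue flag.
            spi.write(tx[written], last ? 0 : kContinue);
            ++written;
        }
        // Exactly one read per write, even when the data is not wanted.
        uint8_t in = static_cast<uint8_t>(spi.read());
        if (rx) rx[readBack] = in;
        ++readBack;
    }
}
```

On a Teensy, the same helper would presumably be called as transferBuffer(SPIFIFO, tx, rx, len) after SPIFIFO.begin(), with kContinue replaced by the real SPI_CONTINUE. Once the last write has gone out, the outer loop simply drains the remaining 2 or 3 reads.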
The write16() function lets you send 2 bytes at once, and the read() that corresponds will return the 2 bytes that came in during that 2 byte write. At slower than 12 Mbit/sec, this makes very little difference, but at 24 Mbit/sec, it saves some overhead both in software and the SPI peripheral that really can speed things up.
When using write16() to transmit 2 bytes at once, the format is MSB first. If you combine two bytes, shift up by 8 bits the one you want transmitted first. If your data is already stored in a 16 bit variable, but the byte to transmit first is in the lower half, you can use __builtin_bswap16() to efficiently swap the 2 bytes within the 16 bit variable. Likewise, when read() returns the 2 bytes that were on the DIN pin during that write16, the MSB will have the byte that arrived first.
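To make the byte order concrete, here is a small sketch of the packing and unpacking just described. The helper names (packForWrite16, swapForWrite16, firstReceived, secondReceived) are made up for illustration and are not part of SPIFIFO; __builtin_bswap16() is the GCC/Clang builtin mentioned above.

```cpp
#include <cstdint>

// Build the 16 bit word for write16(): the byte to transmit first goes in
// the upper half, because the word is sent MSB first.
constexpr uint16_t packForWrite16(uint8_t first, uint8_t second) {
    return static_cast<uint16_t>((static_cast<uint16_t>(first) << 8) | second);
}

// If the byte to transmit first sits in the lower half of an existing
// 16 bit variable, swap the two halves before handing it to write16().
constexpr uint16_t swapForWrite16(uint16_t word) {
    return __builtin_bswap16(word);
}

// Split the word returned by the matching read(): the upper half holds the
// byte that arrived first, the lower half the byte that arrived second.
constexpr uint8_t firstReceived(uint16_t word) {
    return static_cast<uint8_t>(word >> 8);
}

constexpr uint8_t secondReceived(uint16_t word) {
    return static_cast<uint8_t>(word & 0xFF);
}
```

For example, packForWrite16(0x12, 0x34) gives 0x1234, so 0x12 goes out on the wire first, and swapForWrite16(0x3412) produces the same word.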
SPIFIFO also has a clear() function that will wipe any previous stuff from the FIFOs. In theory, you shouldn't need it. But in practice, if you've ever forgotten to do a read() on a previous transaction (or some other code using SPIFIFO did that), the clear() function starts you out with a clean slate. Only call it when you're pretty sure nothing is still happening on the SPI port, like right before you begin communicating. Don't use clear() as a lazy way to avoid read(), especially immediately after doing several write() calls. If the data is still being pumped out, clear() might truncate it. Always try to exactly balance every write() or write16() with a read(), even if you ignore the incoming data.
So far, the only example is in w5100.cpp in the Ethernet library. Of course, looking at SPIFIFO.h might help too.
If you use it, please let me know how it works for you.