96 MHz works great on Teensy 3.1. Teensyduino uses 96 MHz as the default speed. It's proven to be very stable.
Many people who've tried the higher overclocking options (by editing boards.txt) report 120 MHz is also very stable. 144 MHz has been reported to experience rare crashes, even though it seems to work much of the time. These results aren't terribly surprising when you consider Freescale makes many other Kinetis microcontrollers using the exact same 90 nm silicon process, which are specified at 100 and 120 MHz.
Normally we talk of the "system clock", which is the speed of the ARM Cortex-M4 processor, system bus, RAM and DMA engine.
When configuring the chip, there are 3 clocks you must choose, all of which must be integer divisions of the PLL frequency. The PLL itself is of course also something you configure, using a multiplier and divider from the 16 MHz crystal. Normally the system clock is the same as the PLL (divide by 1). The other 2 clocks are called the "Bus clock" and "Flash clock", and they normally run slower.
The Bus clock doesn't actually control the speed of the main memory bus. Why Freescale chose that name is a mystery to me. It really controls the speed of most peripherals. Freescale specifies the bus clock at 50 MHz maximum. Teensyduino sets it to 48 MHz when the processor runs at 96 MHz. I've done much less experimenting with overclocking the peripherals, partly because the integer division doesn't allow fine-grained control, and partly because 48 MHz gives very good peripheral performance. But every indication points to the bus clock having quite a lot of margin for overclocking.
The Flash clock is the 3rd clock you must configure. Freescale specifies it at 25 MHz max. Teensyduino configures it at 24 MHz. Experimentation has shown the Flash clock has very little extra margin for overclocking. Teensy 3.1 seems to run well with the flash at 28 MHz. 33 MHz almost always crashes! 36 MHz can't possibly work.
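To make the divider arithmetic concrete, here is a minimal sketch that picks the smallest integer divider keeping each clock within Freescale's specified maximum. The helper names (smallestDivider, planFor, ClockPlan) are made up for illustration and are not Teensyduino code. It also ignores any further restrictions the chip places on divider values, so treat the 120 and 144 MHz results as plain arithmetic, not as the settings Teensyduino actually uses.

```cpp
#include <cstdint>

// Spec maximums quoted above (Freescale's numbers, not overclocked).
constexpr uint32_t kBusMaxHz   = 50000000;  // bus clock limit
constexpr uint32_t kFlashMaxHz = 25000000;  // flash clock limit

// Smallest integer divider that brings pllHz down to limitHz or below.
constexpr uint32_t smallestDivider(uint32_t pllHz, uint32_t limitHz) {
    return (pllHz + limitHz - 1) / limitHz;  // ceiling division
}

// Hypothetical container for the resulting dividers and frequencies.
struct ClockPlan {
    uint32_t busDiv;
    uint32_t busHz;
    uint32_t flashDiv;
    uint32_t flashHz;
};

// Assumes the system clock equals the PLL (divide by 1), as described above.
constexpr ClockPlan planFor(uint32_t pllHz) {
    return ClockPlan{
        smallestDivider(pllHz, kBusMaxHz),
        pllHz / smallestDivider(pllHz, kBusMaxHz),
        smallestDivider(pllHz, kFlashMaxHz),
        pllHz / smallestDivider(pllHz, kFlashMaxHz)};
}

// At 96 MHz this reproduces the 48 MHz bus and 24 MHz flash mentioned above.
static_assert(planFor(96000000).busHz == 48000000, "bus clock at 96 MHz");
static_assert(planFor(96000000).flashHz == 24000000, "flash clock at 96 MHz");
```

With the same rule, planFor(120000000) gives bus /3 = 40 MHz and flash /5 = 24 MHz, and planFor(144000000) gives bus /3 = 48 MHz and flash /6 = 24 MHz. The coarse steps on the bus clock are what I mean by the integer division not allowing fine-grained control.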
24 MHz might seem very slow, but nearly all the flash-based microcontrollers on the market use similar speeds for their flash memory. Flash memory has a fundamental trade-off between speed and density. Atmel, NXP and ST all use similar clock divide ratios for their flash. Often this is called "wait states" or other terms, and some of them bury this info deep within the flash memory chapter of their datasheets, probably because they don't want to give the impression their chips are slow. But the inescapable truth is flash memory is always much slower than the processor in all these single-chip ARM microcontrollers.
To allow code to run fast, they all use wide buses that feed buffers and small cache memory. Inside Teensy 3.1, the flash memory uses a 64 bit bus. So each flash memory fetch at 24 MHz is reading 8 bytes into the caches or one of several buffers, and the flash cache is designed to speculatively read ahead (assuming no branches) during cache hits. Of course, cache misses do cause the processor to wait, so it's far from perfect. That's why we have the "FASTRUN" keyword, to put speed critical code into the single-cycle RAM.
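As a rough illustration of FASTRUN, a speed-critical function is simply tagged with the keyword in front of its definition. The function below (sumOfSquares) is a made-up example, not something from Teensyduino. The empty fallback definition is only an assumption to let the same file compile on a desktop compiler, where FASTRUN does nothing; in a normal Teensy sketch the core already defines it.

```cpp
#include <cstddef>
#include <cstdint>

// On Teensy, FASTRUN is provided by the Teensyduino core. This empty
// fallback only exists so the example also builds off-target.
#ifndef FASTRUN
#define FASTRUN
#endif

// Hypothetical hot loop. With FASTRUN, its instructions are fetched from
// RAM, so a flash cache miss can't stall it.
FASTRUN uint32_t sumOfSquares(const uint16_t *samples, size_t count) {
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += static_cast<uint32_t>(samples[i]) * samples[i];
    }
    return total;
}
```

The cost is that the function's code occupies RAM as well as flash, so it makes sense to reserve FASTRUN for the few routines where a cache miss really hurts.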
FWIW, the NAND flash memory chips inside extremely fast solid state disk drives use a similar scheme. An entire row (1 kbyte or more) is read into a RAM buffer inside the chip, with an access time of approx 1 us. You don't see Samsung or Micron calling their chips 1 MHz. Then accesses within the buffer can be made very quickly. They do love to quote those speeds! The net speed is only fast because the flash has a slow but extremely wide bus into a fast buffer. Fast SSDs use many chips in parallel, feeding a large RAM-based cache... all because the actual physical flash memory is fundamentally not very fast. Despite the marketing and specsmanship games, the fundamental nature of flash memory is that high cell density and large total memory size come at a cost of relatively slow access times.
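The "slow but wide" point is easy to put into numbers. This is only back-of-envelope arithmetic using the figures above, assuming every access delivers a full-width word or row and taking "1 kbyte" as 1024 bytes. The function and constant names are made up for illustration.

```cpp
#include <cstdint>

// Best-case bytes per second if every access returns a full-width transfer.
constexpr uint64_t peakBytesPerSecond(uint64_t bytesPerAccess,
                                      uint64_t accessesPerSecond) {
    return bytesPerAccess * accessesPerSecond;
}

// Teensy 3.1 flash: 8 bytes (64 bits) per fetch at 24 MHz.
constexpr uint64_t kTeensyFlashPeak = peakBytesPerSecond(8, 24000000);

// NAND row read: 1024 bytes (assumed) per roughly 1 us access, i.e. 1 MHz.
constexpr uint64_t kNandRowPeak = peakBytesPerSecond(1024, 1000000);

static_assert(kTeensyFlashPeak == 192000000, "192 million bytes/s");
static_assert(kNandRowPeak == 1024000000, "1024 million bytes/s");
```

In both cases the access rate is modest and the width does the work. These are best-case numbers for sequential reads; a cache miss or a jump to a different row still pays the full access time.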
During the nearly 2 years we've made Teensy 3.1, Freescale has shipped 3 different silicon revisions. The markings on the chip have been "1N36B", "2N36B" and "3N36B". I don't know when they released each one. I don't have any inside info from Freescale about the differences. I can tell you, even now, the chips we're receiving are a mix of 2N36B and 3N36B. I haven't rigorously tested each version, and I have only a small number of boards with the oldest 1N36B version. But so far, every indication is all 3 versions work about the same for overclocking. None of them allows the flash memory to run significantly faster.