I agree, Frank's memoryboard with six 23LC1024 chips is probably the best solution. Maybe 9 seconds is enough for "what did he say?"
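For anyone wondering where the 9 second figure comes from, here is a rough sketch of the arithmetic. It assumes 16-bit mono samples at a nominal 44.1 kHz sample rate, and the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Six 23LC1024 chips, each 1 Mbit = 128 KiB of SRAM.
constexpr std::uint32_t kChipCount = 6;
constexpr std::uint32_t kBytesPerChip = 131072;
// Assumptions: 16-bit mono samples at a nominal 44.1 kHz.
constexpr std::uint32_t kBytesPerSample = 2;
constexpr double kSampleRateHz = 44100.0;

// Longest delay the whole board can hold, in seconds.
constexpr double maxDelaySeconds() {
    return (kChipCount * kBytesPerChip / kBytesPerSample) / kSampleRateHz;
}
```

That works out to 393216 samples, or a little under 9 seconds.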
The memoryboard approach would be extremely easy too. Just run the audio into the delayExt object. Then use a mixer to route either the live or 9 second delayed signal to the output. Write a tiny program to change the mixer gains depending on the button. Very simple.
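The gain-switching part could look something like the sketch below. It is plain C++ so the logic can be checked on a PC. The channel numbers and the gainsForButton name are just assumptions for illustration, and the comment at the end shows roughly how the result would be handed to the mixer's gain(channel, level) function on the Teensy.

```cpp
// Hypothetical channel assignment: mixer input 0 carries the live signal,
// mixer input 1 carries the output of the delay object.
constexpr int kLiveChannel = 0;
constexpr int kDelayedChannel = 1;

struct MixerGains {
    float live;     // gain for the live input
    float delayed;  // gain for the delayed input
};

// Returns the gains for the current button state. Exactly one source is
// passed at unity gain and the other is muted.
constexpr MixerGains gainsForButton(bool buttonPressed) {
    return buttonPressed ? MixerGains{0.0f, 1.0f} : MixerGains{1.0f, 0.0f};
}

// On the Teensy, assuming a mixer object named "mixer" and a button that
// reads LOW when pressed, loop() would do roughly:
//   MixerGains g = gainsForButton(digitalRead(BUTTON_PIN) == LOW);
//   mixer.gain(kLiveChannel, g.live);
//   mixer.gain(kDelayedChannel, g.delayed);
```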
Doing this with an SD card might also be possible, but it'll be much harder. I believe it still should be possible.
The weak link will be the Arduino SD library. Several months ago I started an ambitious ground-up redesign of the SD library, to be usable from the main program and also interrupts, and to cache several sectors. However, that work is still at an early stage of development, with the main limitation being a lack of write support. Someday in a glorious but distant future, this work will lead to a far more capable SD library. But for now, you'll have to make do with the SD library we have. Or maybe bypass the library and FAT32 to directly access the SD card as raw sectors.
Because of the SD limitations, you won't be able to access the SD card from within the audio library. You'll need to use a pair of the queue objects to bring the streaming data in and out of your Arduino sketch. The good news is the queue objects automatically queue up arriving data if you (occasionally) need more time, and you certainly will on the occasions when the SD card takes longer to complete a write. The play queue also allows you to push in more than one packet of data quickly.
The SD library is faster if you always read 512 bytes at a time, of course aligned to the 512 byte sector boundaries. You'll want to structure your code to work with 512 byte chunks of data. Each of those corresponds nicely to 2 audio buffers, since one buffer is 128 samples of 16 bits, or 256 bytes.
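Here is a minimal sketch of collecting two audio buffers into one 512 byte chunk before writing. SectorAssembler is a hypothetical helper, not part of the audio or SD libraries. On the Teensy, the blocks would come from the record queue's readBuffer() and the finished chunk would go to a single 512 byte write.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSamplesPerBlock = 128;  // one audio buffer
constexpr std::size_t kBytesPerBlock = kSamplesPerBlock * sizeof(std::int16_t);
constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kBlocksPerSector = kSectorSize / kBytesPerBlock;

static_assert(kBlocksPerSector == 2, "two audio buffers fill one sector");

// Hypothetical helper that gathers audio buffers into sector-sized chunks.
class SectorAssembler {
public:
    // Copies one audio buffer into the chunk. Returns true when the chunk
    // now holds a full 512 bytes and is ready to be written. If the chunk
    // was already full, it is restarted with this buffer.
    bool addBlock(const std::int16_t* block) {
        if (blocksHeld_ == kBlocksPerSector) {
            blocksHeld_ = 0;
        }
        std::memcpy(buffer_ + blocksHeld_ * kBytesPerBlock, block, kBytesPerBlock);
        ++blocksHeld_;
        return blocksHeld_ == kBlocksPerSector;
    }

    // Pointer to the 512 byte chunk, valid to write once addBlock() returns true.
    const std::uint8_t* data() const { return buffer_; }

private:
    std::uint8_t buffer_[kSectorSize] = {};
    std::size_t blocksHeld_ = 0;
};
```

The same idea works in reverse for playback: read 512 bytes, then hand the two halves to the play queue as two separate buffers.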
You'll probably want to pre-allocate a huge 4 gigabyte file on the SD card. Perhaps you can arrange for the file to be contiguously allocated, which might let you completely bypass the filesystem to access it as raw sectors. But even if you go through the filesystem layer, creating the huge file in advance will save all the overhead of allocating space and writing changes to the FAT sectors.
How to both read and write from the file is a good question. Again, bypassing the filesystem (if the file isn't fragmented) and just reading and writing the raw sectors might be easiest. Using the SD library file operations, perhaps you can open the file twice, with one "File" for reading and another for writing. If the file's size changes, that'll probably get terribly messed up. But maybe it'll work if the file's size and allocation don't change?
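If you go the raw sector route, the bookkeeping could be a simple ring of sectors with the read position trailing the write position by the delay. This is only a sketch of the index math, assuming the file really is contiguous and you already know its first sector. SectorRing and its members are hypothetical names, and the actual sector reads and writes are left out.

```cpp
#include <cstdint>

// Hypothetical ring of raw sectors covering a contiguous, pre-allocated file.
class SectorRing {
public:
    // firstSector: first raw sector of the file (assumed known).
    // sectorCount: length of the file in 512 byte sectors.
    SectorRing(std::uint32_t firstSector, std::uint32_t sectorCount)
        : firstSector_(firstSector), sectorCount_(sectorCount), writeIndex_(0) {}

    // Returns the absolute sector for the next write, then advances the
    // write position, wrapping back to the start at the end of the file.
    std::uint32_t nextWriteSector() {
        std::uint32_t sector = firstSector_ + writeIndex_;
        writeIndex_ = (writeIndex_ + 1) % sectorCount_;
        return sector;
    }

    // Returns the absolute sector that was written delaySectors writes ago,
    // where 1 means the most recent write. Requires
    // 1 <= delaySectors <= sectorCount.
    std::uint32_t readSector(std::uint32_t delaySectors) const {
        std::uint32_t index = (writeIndex_ + sectorCount_ - delaySectors) % sectorCount_;
        return firstSector_ + index;
    }

private:
    std::uint32_t firstSector_;
    std::uint32_t sectorCount_;
    std::uint32_t writeIndex_;
};
```

Each sector holds 256 samples, so the delay in sectors is just the delay in samples divided by 256.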
Anyway, as far as the audio library is concerned, this would involve a pair of the queue objects. You'd need to move the data between those and the media, which might not be too difficult, but it does mean writing code to rapidly move a lot of bytes around. That's quite a bit more complicated than using delayExt with the memoryboard and a trivial amount of code to reconfigure mixer gains to switch between the live and 9 second delayed signal.