I found this last night but have not used it.
Using the flash memory means having about three orders of magnitude (roughly 1000x) less memory for sounds.
BTW, if you just want something to play fixed sounds right now, there is always the DFRobot DFPlayer (and the various clones):
- Mono DFPlayer with SD card: https://www.dfrobot.com/product-1121.html
- Mono DFPlayer with 8M flash: https://www.dfrobot.com/product-1741.html
- Stereo DFPlayer with 128M flash: https://www.dfrobot.com/product-2232.html (note: no busy pin)
Obviously, we still need to make the system robust enough to use the full power of the audio system and drive the eyes at the same time. But if you want something immediately, there are options.