I just spent some time reading about SPIFFS. This is exactly the type of complexity I do NOT want.
For the audio library, we need files to be stored as contiguous data. When an audio library object accesses a file on the flash, it will need only 2 numbers: the beginning address and the allocated length. As an object plays samples, reads a wave table for synthesis, or pulls in more of an MP3 stream, we NEVER want to incur the extra overhead of reading index sectors or other metadata, as SPIFFS requires.
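To make the two-number idea concrete, here is a minimal C++ sketch, assuming a desktop build where a RAM array stands in for the flash chip. The names FlashFile, fileRead, flashRead and fakeFlash are hypothetical, not an existing library API. The point is that each read is a single flash access covering only the requested bytes, with no index or metadata lookups.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the SPI flash chip: a RAM array, so the sketch
// runs on a desktop. On real hardware flashRead would be one SPI read command.
static uint8_t fakeFlash[65536];

// Hypothetical low-level read: copy len bytes starting at flash address addr.
static void flashRead(uint32_t addr, void *buf, uint32_t len) {
  assert(addr + len <= sizeof(fakeFlash));
  std::memcpy(buf, fakeFlash + addr, len);
}

// A file's storage is fully described by two numbers. The read position is
// per-reader state kept in RAM; nothing else is ever fetched from the flash.
struct FlashFile {
  uint32_t address;  // first byte of the file on the flash
  uint32_t length;   // total data length in bytes
  uint32_t offset;   // current read position within the file
};

// Read up to len bytes from the current position. One flash access that
// touches only the requested bytes; returns the number of bytes read.
static uint32_t fileRead(FlashFile &f, void *buf, uint32_t len) {
  uint32_t remaining = f.length - f.offset;
  if (len > remaining) len = remaining;
  if (len == 0) return 0;
  flashRead(f.address + f.offset, buf, len);
  f.offset += len;
  return len;
}
```

A sample player fetching one block of 128 16-bit samples would simply call fileRead(f, block, 256) on each audio update.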
When 128 samples need to be read from the flash, the most important goal is keeping our flash access to only those 256 bytes (128 16-bit samples), for consistent and predictable timing. Remember, this will be embedded within some object like a sample player. Users will drag any number of them onto the GUI and click Export to get a sketch that might try to read 12 simultaneous streams. In that case, 12 instances will pull a total of 3072 bytes from the flash every 2.9 ms (128 samples at 44.1 kHz), which happens to be about one third of the total SPI bandwidth, not including the slight protocol overhead (command and address bytes) required by the SPI flash chip.
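The arithmetic above, written out as a small C++ sketch. It assumes 16-bit mono samples, 128-sample blocks at 44.1 kHz, and a single-bit SPI bus; the SPI clock is left as a parameter because it is not stated here, and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumptions: 16-bit mono samples, 128-sample audio blocks at 44.1 kHz,
// single-bit SPI. Protocol overhead (command/address bytes) is ignored.
constexpr uint32_t kBlockSamples = 128;
constexpr uint32_t kBytesPerSample = 2;
constexpr double kSampleRate = 44100.0;

// Bytes pulled from the flash per audio update for a number of streams.
constexpr uint32_t bytesPerUpdate(uint32_t streams) {
  return streams * kBlockSamples * kBytesPerSample;
}

// Time between audio updates, in seconds (about 2.9 ms).
constexpr double updatePeriodSeconds() {
  return kBlockSamples / kSampleRate;
}

// Fraction of the raw SPI bit rate consumed by audio data alone.
constexpr double spiFraction(uint32_t streams, double spiClockHz) {
  return bytesPerUpdate(streams) * 8.0 / updatePeriodSeconds() / spiClockHz;
}

static_assert(bytesPerUpdate(1) == 256, "one stream: 256 bytes per update");
static_assert(bytesPerUpdate(12) == 3072, "twelve streams: 3072 bytes");
```

With an assumed 24 MHz SPI clock, spiFraction(12, 24e6) works out to roughly 0.35 (about 8.47 Mbit/s of audio data), consistent with the rough one-third estimate.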
Contiguous, non-fragmented file allocation is essential. Every file's storage must be described by only 2 numbers: the beginning address and the total data length.
Tools that format the entire flash and decide what space each file gets can afford to be complex. Most files can be allocated and packed byte-wise, while others meant for recording or writing can be aligned to 4K sector boundaries, since the flash can only be erased a whole sector at a time. The allocation table can hold plenty of metadata about each file, though a data type and a text name are probably all we need.
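A format-time allocator along those lines could be as simple as a bump pointer. This C++ sketch is hypothetical: DirEntry, Allocator, the 24-character name field and the type codes are illustrative choices, and the 4096-byte sector size assumes a typical SPI NOR chip. Read-only files are packed byte-wise; writable files start on a sector boundary and reserve whole sectors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t kSectorSize = 4096;  // assumed erase sector size

// Hypothetical directory entry: a text name and a data type, plus the two
// numbers that describe the file's storage. Field sizes are guesses.
struct DirEntry {
  char     name[24];
  uint8_t  type;     // hypothetical type code, e.g. 0 = raw samples
  uint32_t address;  // first byte of the file on the flash
  uint32_t length;   // data length in bytes (reserved space may be larger)
};

static uint32_t roundUp(uint32_t x, uint32_t align) {
  assert(align != 0);
  return (x + align - 1) / align * align;
}

// Hypothetical format-time allocator that hands out contiguous space in
// order. Writable files get sector-aligned, whole-sector reservations so
// they can be erased without disturbing neighbouring files.
struct Allocator {
  uint32_t next;      // first free byte
  uint32_t capacity;  // total flash size in bytes

  // Fills e and returns true, or returns false if the file does not fit.
  bool allocate(DirEntry &e, const char *name, uint8_t type,
                uint32_t length, bool writable) {
    uint32_t start = writable ? roundUp(next, kSectorSize) : next;
    uint32_t size  = writable ? roundUp(length, kSectorSize) : length;
    if (start > capacity || size > capacity - start) return false;
    std::strncpy(e.name, name, sizeof(e.name) - 1);
    e.name[sizeof(e.name) - 1] = '\0';
    e.type = type;
    e.address = start;
    e.length = length;
    next = start + size;
    return true;
  }
};
```

Because files never move or grow after formatting, the allocator needs no free list; that complexity stays in the formatting tool, never in the audio path.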
However, being able to locate a file quickly, with a minimum number of SPI transfer bytes, is valuable. An expected use case is the main program searching for a file by its text filename while the library is accessing many other files. Every SPI transaction the main program makes can delay the audio update interrupt, so we want to minimize the time we're preventing the audio library from updating.
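One possible way to keep name lookups cheap, sketched in C++ under hypothetical assumptions: the directory stores a compact array of 16-bit filename hashes (here 32-bit FNV-1a folded to 16 bits) ahead of the full entries. The main program scans the hashes in small chunks, each of which would be its own short SPI transaction on real hardware, and reads a full entry only when a hash matches. The layout, chunk size, and all names (Entry, findFile, buildImage, fakeFlash) are assumptions, not an existing format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical layout: [count:uint32][hash16 x count][Entry x count]
struct Entry {
  char     name[24];
  uint32_t address;
  uint32_t length;
};

static uint8_t fakeFlash[8192];  // RAM stand-in for the SPI flash chip

static void flashRead(uint32_t addr, void *buf, uint32_t len) {
  assert(addr + len <= sizeof(fakeFlash));
  std::memcpy(buf, fakeFlash + addr, len);
}

// 32-bit FNV-1a folded to 16 bits (the hash choice is an assumption).
static uint16_t nameHash(const char *s) {
  uint32_t h = 2166136261u;
  while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
  return (uint16_t)((h >> 16) ^ (h & 0xFFFF));
}

constexpr uint32_t kChunk = 16;  // hashes fetched per (short) flash read

// Returns true and fills out if name is found. Most of the scan reads only
// 2 bytes per file; a full entry is read only when its hash matches.
static bool findFile(const char *name, Entry &out) {
  uint32_t count;
  flashRead(0, &count, sizeof(count));
  const uint16_t want = nameHash(name);
  const uint32_t hashBase = sizeof(count);
  const uint32_t entryBase = hashBase + count * sizeof(uint16_t);
  uint16_t hashes[kChunk];
  for (uint32_t i = 0; i < count; i += kChunk) {
    uint32_t n = (count - i < kChunk) ? count - i : kChunk;
    flashRead(hashBase + i * sizeof(uint16_t), hashes, n * sizeof(uint16_t));
    for (uint32_t j = 0; j < n; j++) {
      if (hashes[j] != want) continue;
      flashRead(entryBase + (i + j) * sizeof(Entry), &out, sizeof(Entry));
      if (std::strncmp(out.name, name, sizeof(out.name)) == 0) return true;
    }
  }
  return false;
}

// Helper used only to build a test image in the fake flash.
static void buildImage(const Entry *entries, uint32_t count) {
  std::memcpy(fakeFlash, &count, sizeof(count));
  for (uint32_t i = 0; i < count; i++) {
    uint16_t h = nameHash(entries[i].name);
    std::memcpy(fakeFlash + sizeof(count) + i * sizeof(h), &h, sizeof(h));
  }
  uint32_t entryBase = sizeof(count) + count * sizeof(uint16_t);
  std::memcpy(fakeFlash + entryBase, entries, count * sizeof(Entry));
}
```

The chunk size trades lookup speed against how long each read holds the bus: 16 hashes is a 32-byte transfer, short enough that an audio update waiting behind it is barely delayed.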
I fear this conversation has wandered pretty far from the real-time requirements of the audio library. Indeed, there are lots of great general-purpose file systems, like SPIFFS, that would be wonderful if the goal were file storage for general-purpose computing.
Please remember, the goal here is CD-quality audio, possibly a dozen or more polyphonic streams, on a low-power microcontroller.