SPI Flash Allocation Table

PaulStoffregen

I would like to define a simple allocation table for the optional SPI Flash that can be used with the audio library. This thread is meant to discuss ideas for how this table could work.

My main goal is for the table to express at least 4 fields, for each section of the flash memory containing data:

  1. Beginning address
  2. Length of data
  3. Type of data
  4. File name (string)

The address and length should probably be 32 bit numbers.

Perhaps the type can be a 1 byte code? Or maybe 3 bytes corresponding to file name extensions would be more convenient?

How many bytes to allow in the filename is also a good question...

Maybe the 4096 bytes should be divided up into 16 or 32 byte sections, for quicker and easier access? Or maybe info should just be packed, requiring a linear parsing of all 4096 bytes to find anything at the end? Or maybe some format where a filename entry could span multiple sections would work?

As we develop more ways to read different types of data from the SPI flash (raw audio, mp3 encoded streams, special wavetable synthesis formats, etc), and later when GUI-based tools are developed to manage the memory, a well defined allocation table will make using the library much easier.
 
i'd vote for 16 byte sections / 8 character file names (for the sake of consistency) / 1 byte code for the data type (rather than 3)

at least, that should be enough to store those 4 items. in practice, i'd imagine, most projects will just parse that table during start up, putting the start addresses + corresponding lengths of applicable entries into an array; and they may read back the file- or wavetable name for displaying it on a character display or oled etc (if present).
 
Hi,
a SPI Flash Allocation Table is a good idea.

In my opinion, we need no filename - the data are written to the flash by some tool, so the user knows what is stored.
The filetype (3 bytes?) is sufficient (with only one byte we would need something like a translation table).
Then, one of the fields "Beginning" / "Length" is not needed.
- If we know the start address, we know the length with address_of_next_file - address
- If we know the length, we know where the next file begins

1. length (24 Bit (or 32?))
2. filetype (1 or 3 byte)
3. the filecontent

4. next header...

EDIT: If a user needs "random access" to the files, he can create his own table in a ram-array with a simple loop, in setup()
EDIT: Or, in "EEPROM"
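
A minimal sketch of the "simple loop in setup()" idea, for illustration only. It assumes a 32-bit length stored LSB first, a 1-byte type, and that an erased length field (0xFFFFFFFF) marks the end of the chain; none of these choices is settled in this thread. `PackedFile` and `scanPacked` are hypothetical names, and the code scans a RAM copy of the flash image rather than talking to a real chip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One row of the RAM index built at start-up (hypothetical type).
struct PackedFile {
    uint32_t dataOffset;  // where the file content starts
    uint32_t length;      // content length in bytes
    uint8_t  type;        // 1-byte type code (assumed width)
};

constexpr size_t kHeaderSize = 5;  // 4 bytes length + 1 byte type (assumed)

// Walk header -> content -> next header and record each file found.
// Returns the number of entries written to out.
inline size_t scanPacked(const uint8_t *flash, size_t flashSize,
                         PackedFile *out, size_t maxOut) {
    size_t pos = 0;
    size_t n = 0;
    while (n < maxOut && pos + kHeaderSize <= flashSize) {
        uint32_t len = (uint32_t)flash[pos]
                     | ((uint32_t)flash[pos + 1] << 8)
                     | ((uint32_t)flash[pos + 2] << 16)
                     | ((uint32_t)flash[pos + 3] << 24);
        if (len == 0xFFFFFFFFu) break;           // erased flash: assumed end marker
        size_t dataStart = pos + kHeaderSize;
        if (len > flashSize - dataStart) break;  // content would run past the chip
        out[n].dataOffset = (uint32_t)dataStart;
        out[n].length = len;
        out[n].type = flash[pos + 4];
        ++n;
        pos = dataStart + len;                   // the next header follows the content
    }
    return n;
}
```

The trade-off is visible in the loop: finding the last file means reading every header before it, which is why the index would be built once at start-up.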
 
good point about the start address/length thing. factoring in the W25Q256FV, length should probably be 32 bit.


we need no filename - the data are written to the flash by some tool, so the user knows what is stored.

true, but that is assuming (probably correctly) that in most use cases that tool would be run during set-up so the file names are indeed known to the user (or the user program, for that matter). generally speaking, i'd figure the main/only reason you'd want the filenames (by whatever means) is for displaying them in some UI (ie displaying "snare" or "chirps" or whatever rather than displaying something like "file nr. 4"), not for accessing the data. anyways, it's probably not mandatory in a/the allocation table, but then we can't always be sure how the data ended up on the flash?

The filetype (3 Bytes?) is sufficient (with only one byte we would need something like a translation table)

my reasoning in voting for the translation table was that there's probably going to be some arbitrariness involved anyways. for example, the hypothetical expanded wavetable format. assuming the API at one point will support more complex / band-limited tables (say 10 bit ones covering 10 octaves, ie 512/256/128 ... ), there's no official/obvious file type for this kind of thing. (one might opt for .wav, but that would render the "type of data" entry largely superfluous).
 
Ok..
How about an optional filename?
We need 1 bit: 1 = filename present, 0 = no filename.
Then the filename can go in front of the data, with a trailing #0, as long as it needs to be.

Edit:
Maybe we should reserve some bits for future extensions? Metadata (sample rate for sound, bitmap dimensions...)?
 
Ok..
How about an optional filename?
We need 1 bit: 1 = filename present, 0 = no filename.
Then the filename can go in front of the data, with a trailing #0, as long as it needs to be.

Edit:
Maybe we should reserve some bits for future extensions? Metadata (sample rate for sound, bitmap dimensions...)?

mmh, i know too little about what would/will be required for properly parsing the various formats but i'm still unclear about what purpose having the actual file extension would serve. ie wouldn't a/the arbitrary 1-byte mapping be mostly hidden from the user anyways? ie be dealt with by the play/update/consume functions of the respective classes? (eg. as in your Serial flash class? ie 0x01 = u-law 44100, 0x02 = u-law 22050, etc).

i guess i'm saying that for a function called (say) playFlashWav() it doesn't really matter whether it checks for 0x77/0x61/0x76 or some arbitrary byte. at least, it doesn't seem to make it easier for the average user. 256 options should cover some ground.

as an example, even now with the SD wav classes I think most use scenarios won't ever include a line such as
playWav.play("niceFile.wav"), but a pointer to the file name, and a list of names generated during start-up, because you don't always want to change the code when you swap the files on the SD card or stick to some naming convention to match the code. (as an aside, i actually find the way it's done in your mp3 lib quite useful, ie play(const size_t p, const size_t size), which allows for more control.)
 
Ok..
How about an optional filename?
We need 1 bit: 1 = filename present, 0 = no filename.
Then the filename can go in front of the data, with a trailing #0, as long as it needs to be.

Edit:
Maybe we should reserve some bits for future extensions? Metadata (sample rate for sound, bitmap dimensions...)?

or no flag bit. for no name, just use one null byte where the optional file name goes.
 
I like the idea to make filenames optional.

Starting address should be 32 bits. Today chips larger than 16 MByte exist. I wonder whether a 4 GByte limit is something we'll eventually regret?

Maybe we could omit any sort of length field? That would push the requirement for indicating the length to the stored data. Can we live with that? Or would that make storing formats like MP3 more difficult than necessary?
 
AAC and MP3 need length information, because they are streamable and there is no "End-Of-Stream" indication.
The other audio formats used by the codecs need no length; they have it stored internally, somewhere.
 
Well, it seems we need at least 9 bytes: 32 bits for address, 32 bits for length, and at least 1 byte for data type. Maybe 1 to 3 more reserved bytes might be good, for some future unseen need? That'd put the section size between 9 and 12 bytes. I'm really not concerned about making the section size a power of 2, but a multiple of 4 might make good sense.

For the data type byte, 0xFF and 0x00 should both mean "unused".

I can see at least 2 ways we might store the optional filenames. The filename could be packed into 1 or more sections preceding the section with the file type/address/length, where a specific type byte would indicate all the other bytes are part of the filename for the next file.

The other alternative would be writing the filename somewhere into the rest of the flash. 4 bytes could give the starting address for the string, which would be zero terminated. If no filename is assigned, the value 0 would be stored. With a few reserved bytes, this would probably push us to 16 bytes per file.
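
To make the second alternative concrete, here is a rough sketch of fetching a name through a 4-byte address, where 0 means "no name" and the string is zero terminated. `readName` is a hypothetical helper, not an existing library function, and it works on a RAM copy of the flash image for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy the zero-terminated name stored at nameAddr into buf.
// buf is always terminated; the name is truncated if buf is too small.
// Returns the number of characters copied (0 if there is no name).
inline size_t readName(const uint8_t *flash, size_t flashSize,
                       uint32_t nameAddr, char *buf, size_t bufSize) {
    if (bufSize == 0) return 0;
    buf[0] = '\0';
    if (nameAddr == 0 || nameAddr >= flashSize) return 0;  // 0 = no filename
    size_t i = 0;
    while (nameAddr + i < flashSize &&   // never read past the chip
           flash[nameAddr + i] != 0 &&   // stop at the terminator
           i + 1 < bufSize) {            // leave room for our own terminator
        buf[i] = (char)flash[nameAddr + i];
        ++i;
    }
    buf[i] = '\0';
    return i;
}
```

A program that never shows names can skip this call entirely, which is the "easy to ignore" property of storing only an address in the slot.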
 
I'm involved with a project using 512MB NAND flash chips. And one with two such dies stacked inside one SMD package to yield 1GByte per chip.
Surprisingly inexpensive.
The parallel interface ones use an 8 bit address/data interface with Address Latch bit, etc.
The SPI version costs more than double, but needs fewer pins.
Spansion is a good vendor for these. One example:
www.spansion.com/Support/Datasheets/S34ML01G1_04G1.pdf

Some ARM chips have a GPIO mode for the parallel interface, with the ALE, RW, data, etc., all handled by port hardware. And one I'm working with has in their HAL library a driver that implements the complexities above I/O such as wear-leveling, etc.
 
There's a huge difference between NAND and NOR flash. With NOR flash chips, every bit is dependable, and every location in the entire chip can be randomly accessed with basically the same timing.

NAND flash provides incredible capacity, but at the cost of requiring ECC and wear leveling, which add considerable overhead. NAND flash doesn't have random access either. Once a row is transferred to the register, each byte from that row is 25 ns access time. But the time to fetch a different row can be up to 25 us (the "tR" spec on page 41).

With hardware ECC and quite a bit of RAM to cache data and defer wear leveling and other overhead to non-critical timing, NAND can be pretty incredible. But the same can be said for SD cards, which are basically NAND flash with a controller. Someday I'm going to put a lot of work into really improving the Arduino SD library, with a good caching layer and delayed write completing and lots of other performance improving ideas.

NOR flash is so much simpler, with true fast random access. That's why I put the option for a NOR flash chip on the bottom side of the audio board. When we're playing several sound clips simultaneously, fast seeking to any arbitrary location within the entire memory is what matters most.
 
Yeah, these big Spansion chips have ECC (1 bit) and wear leveling and caches in the drivers for HAL.
There's also, in the HAL, a FAT32 file system that can use the big flash.

At some point, the 200KB of RAM, or so, in low end embedded chips might be inadequate.

I suppose the flash thumb drives, at their high capacity, use NAND flash. Not sure. I've not seen them used in small RAM MCUs, but perhaps there is a way.
 
Ok, here's what I'd like to see for the allocation table. This is still in the "request for comments" stage....

First 4096 bytes are reserved for the allocation table, with 16 bytes per file. This limits us to storing no more than 256 files in the SPI flash chip.

The 16 bytes are:

  • Byte 0: Data Type
  • Bytes 1-3: reserved (zeros)
  • Bytes 4-7: Address of data
  • Bytes 8-11: Length of data
  • Bytes 12-15: Address of filename, or zero if no name
The addresses are byte addresses within the flash, where 4096 is the first valid address, since 0 to 4095 are reserved for this table. Lengths are in bytes. All are stored LSB first.

If a filename is used, it must be stored in the normal flash memory (not the first 4096 bytes) as UTF8 text with a zero terminator. The length is determined by reading the bytes until zero is found.

For data types:

  • 0: unused
  • 1: audio clip in wav2raw or wav2sketch format
  • 2: raw 16 bit audio samples
  • 3: MP3 stream
  • 255: unused

Obviously, more data types will be added to this list as needed.
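
As a rough illustration of the layout proposed in this post, the sketch below decodes one 16-byte slot and collects the used slots into a RAM array. Only the byte offsets, the LSB-first order and the "0 and 255 mean unused" rule come from the proposal; `FlashEntry`, `decodeEntry` and `indexTable` are hypothetical names, and the table is assumed to have already been read into a 4096-byte RAM buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded form of one 16-byte slot (hypothetical type).
struct FlashEntry {
    uint8_t  type;      // byte 0: data type (0 or 255 = unused)
    uint32_t dataAddr;  // bytes 4-7: byte address of the data
    uint32_t dataLen;   // bytes 8-11: length of the data in bytes
    uint32_t nameAddr;  // bytes 12-15: address of the filename, 0 = no name
};

constexpr size_t kTableSize  = 4096;
constexpr size_t kEntrySize  = 16;
constexpr size_t kMaxEntries = kTableSize / kEntrySize;  // 256 files

// Read a 32-bit value stored LSB first.
inline uint32_t readLE32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

inline FlashEntry decodeEntry(const uint8_t *slot) {
    FlashEntry e;
    e.type     = slot[0];           // bytes 1-3 are reserved and skipped
    e.dataAddr = readLE32(slot + 4);
    e.dataLen  = readLE32(slot + 8);
    e.nameAddr = readLE32(slot + 12);
    return e;
}

inline bool isUsed(const FlashEntry &e) {
    return e.type != 0x00 && e.type != 0xFF;
}

// Collect every used slot from a RAM copy of the table. Returns the count.
inline size_t indexTable(const uint8_t *table, FlashEntry *out, size_t maxOut) {
    size_t n = 0;
    for (size_t i = 0; i < kMaxEntries && n < maxOut; ++i) {
        FlashEntry e = decodeEntry(table + i * kEntrySize);
        if (isUsed(e)) out[n++] = e;
    }
    return n;
}
```

Treating 0xFF as unused means a freshly erased chip reads as an empty table without any formatting step.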
 
  • Byte 0: Data Type
  • Bytes 1-3: reserved (zeros)
  • Bytes 4-7: Address of data
  • Bytes 8-11: Length of data
  • Bytes 12-15: Address of filename, or zero if no name

just a thought: assuming there's going to be restrictions regarding the length of the filename, might one not expand the reserved bytes to 6 bytes (for potential future uses) if the address of the file name (if present) always was address_of_data - max_name_length ? so there'd be just one byte: zero = w/o name; non-zero = w/ name. (ie your option #1 above)

edit: or, as per stevech's suggestion, why not just reserve the 8 or 16 bytes preceding the data-address for a name and leave out the filename-byte?
 
0: unused
1: audio clip in wav2raw or wav2sketch format
2: raw 16 bit audio samples
3: MP3 stream
4: AAC
5: FLAC
6: WAV
7: more Audio...
16 : Bitmap...
32 : Fonts..

255: unused


We don't need such a table (and the need of documentation and support) with two bytes more and the "official" extension...

The data: is it flash-sector-aligned? If yes: why? :)
Would make sense if it is possible to fill free flashsectors later. Do we need a "sector-allocation-table" ?
Wear-levelling ?
 
..... address of the file name (if present) always was address_of_data - max_name_length ? so there'd be just one byte: zero = w/o name; non-zero = w/ name. (ie your option #1 above)

edit: or, as per stevech's suggestion, why not just reserve the 8 or 16 bytes preceding the data-address for a name and leave out the filename-byte?

Yes, we could choose to store the filename in the preceding slot in the allocation table, or in the main flash appended or prepended to the actual file data. So many choices!

Being able to easily ignore the filename could be nice.


We don't need such a table (and the need of documentation and support) with two bytes more and the "official" extension...

I'm happy to use 3 bytes instead of 1.

While that leaves us only a single reserved byte for future use, we could define that byte now to mean non-zero means the entire 16 byte slot is ignored. Then all 15 other bytes become available for some future extension.
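
A small sketch of how that could look in code. The byte order is an assumption (extension in bytes 0-2, reserved byte in byte 3), the comparison is case sensitive, and `slotIsIgnored` and `slotHasType` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed layout: bytes 0-2 hold the 3-character extension,
// byte 3 is the single reserved byte.
inline bool slotIsIgnored(const uint8_t *slot) {
    return slot[3] != 0;  // non-zero: skip the whole 16-byte slot
}

// True if the slot is in use and its type matches a 3-character extension.
inline bool slotHasType(const uint8_t *slot, const char *ext) {
    return !slotIsIgnored(slot) &&
           std::strlen(ext) == 3 &&
           std::memcmp(slot, ext, 3) == 0;
}
```

A reader that only knows today's format checks `slotIsIgnored` first and skips anything it does not understand, which is what frees the other 15 bytes for a later extension.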

The data: is it flash-sector-aligned? If yes: why? :)

This scheme allows byte-level addressing. Whatever tool writes the flash could limit itself to placing things on sector boundaries. But if it were to encounter an already-written flash from some other tool, the existing addresses might not be sector aligned.

Would make sense if it is possible to fill free flashsectors later. Do we need a "sector-allocation-table" ?
Wear-levelling ?

Those would be some pretty awesome features. Obviously this simple allocation table is more like a partition map than an actual filesystem. When/if such a filesystem is written, perhaps it could create one or more special files in the allocation table, to reserve the space it needs to manage which sectors are free, and persistent storage for wear leveling, and whatever else it needs? Or maybe it could use the 16 byte slots, with some non-zero value in the reserved byte?

While I certainly want to keep the door open to creating such an advanced filesystem, my main goal for this simple table is merely describing the contents of the flash. I intend to create a GUI front-end (and supporting program to run on the Teensy) which lets people reflash the chip with a set of files, including "empty" files to use as placeholders for fixed-length recording. I want the tool to be able to read the current contents, so it can preserve files when re-writing the whole flash, and to allow bringing recorded data back to the PC.
 
Yes, we could choose to store the filename in the preceding slot in the allocation table, or in the main flash appended or prepended to the actual file data. So many choices!

Being able to easily ignore the filename could be nice.

oh, sorry for being unclear -- i did mean the name might go into 8 or 16 byte before the actual data/in the main flash; then either name or data address would be redundant. but that indeed wouldn't work out without at least one byte for the file name. e.g.

Byte 0: Data Type
Byte 1: File name: yes/no
Bytes 2-7: reserved (zeros)
Bytes 8-11: Address of data
Bytes 12-15: Length of data

We don't need such a table (and the need of documentation and support) with two bytes more and the "official" extension...

not a big deal, i guess, but i still don't see the point of extensions. "wav" at the end of the day is entirely arbitrary, too. and the mere existence of an allocation table implies documentation and support. to reiterate my point above, approaching this in terms of "official" extensions seems (to me at any rate) to prioritise certain applications (namely those that come with 'official' extensions): 'streaming' flac, ogg, mp3, etc . that is nice but at least in a music context, they're rarely encountered. wavetables on the other hand, while there are some proprietary formats (e.g. .bvc, .wt, .wtx), generally don't fit into this picture. (then again, they might be identified by some arbitrary 3 byte code just as well)
 
not a big deal, i guess, but i still don't see the point of extensions.

I think the main idea is avoiding the maintenance of a special list of bytes for various data types.

Speaking as the guy who'd ultimately be maintaining the official list, there is some appeal to this idea. ;)
 
I think the main idea is avoiding the maintenance of a special list of bytes for various data types.

Speaking as the guy who'd ultimately be maintaining the official list, there is some appeal to this idea. ;)

fair enough ...

(even though, realistically speaking, i don't think a lot of people are ever going to write their own mp3/wav/aac/etc parsers, so ultimately that list might not be so very important. most audio stuff at any rate will work with/on uncompressed data, no?

that said, i guess i might just be unclear as to the purpose of the allocation table, or how dumb or smart the libraries working with the SPI flash are supposed to be. taking ".wav" as an example: that's clearly ambiguous re sample and bit rate, mono or stereo etc. so i take the idea is to store the entire header (when most of the parsing could be done by the tool that writes the data to the flash)? or just the data and a single byte prefix (as Frank's current RAW/flash library)?)
 
Would it be useful to have the items stored in SPI flash use a linked list, and thus have the metadata stored at/with each list member? Each link might contain the next item offset and the size of the prepended metadata.
This might also reduce the challenges in wear-leveling vs. a single-place for all metadata
 
Hi,
a SPI Flash Allocation Table is a good idea.

In my opinion, we need no filename - the data are written to the flash by some tool, so the user knows what is stored.
The filetype (3 bytes?) is sufficient (with only one byte we would need something like a translation table).
Then, one of the fields "Beginning" / "Length" is not needed.
- If we know the start address, we know the length with address_of_next_file - address
- If we know the length, we know where the next file begins

1. length (24 Bit (or 32?))
2. filetype (1 or 3 byte)
3. the filecontent

4. next header...

EDIT: If a user needs "random access" to the files, he can create his own table in a ram-array with a simple loop, in setup()
EDIT: Or, in "EEPROM"

Would it be useful to have the items stored in SPI flash use a linked list, and thus have the metadata stored at/with each list member? Each link might contain the next item offset and the size of the prepended metadata.
This might also reduce the challenges in wear-leveling vs. a single-place for all metadata

Hello, this is my first post.

I think Frank and stevech have the right idea with interleaving the content of the Flash allocation table with the file content.

0: address of entry for second file (32 bits)
4: data type of first file (1 byte)
5: zero-terminated filename for first file, with or without a dot and extension (just a zero means no filename)
6+length of filename: data in first file

address of entry for second file+0: address of entry for third file (32 bits)
address of entry for second file+4: data type of second file (1 byte)
address of entry for second file+5: zero-terminated filename for second file, with or without a dot and extension (just a zero means no filename)
address of entry for second file+6+length of filename: data in second file

address of entry for third file+0: etc.

etc.

address of entry for last file+0: address of entry for endstop (32 bits)
address of entry for last file+4: data type of last file (1 byte)
address of entry for last file+5: zero-terminated filename for last file, with or without a dot and extension (just a zero means no filename)
address of entry for last file+6+length of filename: data in last file

address of entry for endstop+0: zero (32 bits)

  • Scanning the Flash allocation table to find a file would not be much more difficult than with the table all in one block.
  • The data length in bytes would be (address of entry for next file) - (address of entry for this file) - (length of filename for this file) - 6.
  • Overhead for Flash allocation table is only 6 bytes plus filename for each file plus 4 byte endstop.
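
Here is a rough sketch of walking such a chain. It assumes the 32-bit link is stored LSB first and that the data type sits at offset 4, directly after the link, with the name starting at offset 5. `ChainedFile` and `walkChain` are hypothetical names, and the code works on a RAM copy of the flash image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One file found while walking the chain (hypothetical type).
struct ChainedFile {
    uint8_t     type;
    const char *name;      // points into the image; "" means no filename
    uint32_t    dataAddr;  // first byte of the file content
    uint32_t    dataLen;   // next - this - name length - 6
};

inline uint32_t readLE32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Follow the links from address 0 until the endstop (a link of zero).
// Returns the number of files written to out.
inline size_t walkChain(const uint8_t *flash, size_t flashSize,
                        ChainedFile *out, size_t maxOut) {
    size_t n = 0;
    size_t addr = 0;
    while (n < maxOut && addr + 4 <= flashSize) {
        size_t next = readLE32(flash + addr);
        if (next == 0) break;                             // endstop reached
        if (next < addr + 6 || next > flashSize) break;   // corrupt link
        const uint8_t *name = flash + addr + 5;
        const void *term = std::memchr(name, 0, next - (addr + 5));
        if (term == nullptr) break;                       // unterminated name
        size_t nameLen = (size_t)(static_cast<const uint8_t *>(term) - name);
        out[n].type     = flash[addr + 4];
        out[n].name     = reinterpret_cast<const char *>(name);
        out[n].dataAddr = (uint32_t)(addr + 6 + nameLen);
        out[n].dataLen  = (uint32_t)(next - addr - nameLen - 6);
        ++n;
        addr = next;
    }
    return n;
}
```

The extra checks on `next` are not part of the proposal; they only keep a damaged link from sending the walk outside the chip or into an endless loop.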

Paul
 
I like the idea of this lookup table. I would advise against using a static-length allocation table. In my opinion it should be preceded by an integer at the front specifying the length of the allocation table, or use the linked list approach. This makes the system more flexible, both for larger and smaller storage devices.

We are now talking about storage of files, but I'm certain someone (or is it just me?) will use this unified system just to store some configuration parameters. Allowing a flexible number of entries (and overhead) would make this easier. Needless to say, it would be great if the software is independent of the hardware used.

Additionally, I would put a version number (for the future...) and magic value at the start to be able to identify that the medium is formatted as 'TeensyFS' ;)
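
To sketch what such a header could look like: everything below is an assumption made for illustration, including the 8-byte layout, the magic bytes "TnFS" and the names `TableHeader` and `readHeader`. Nothing in this thread fixes any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: bytes 0-3 magic, byte 4 format version, byte 5 reserved,
// bytes 6-7 number of table entries that follow, stored LSB first.
struct TableHeader {
    uint8_t  version;
    uint16_t entryCount;
};

constexpr uint8_t kMagic[4] = {'T', 'n', 'F', 'S'};  // hypothetical magic value

// Returns false if the medium does not start with the expected magic value.
inline bool readHeader(const uint8_t *p, size_t size, TableHeader *out) {
    if (size < 8 || std::memcmp(p, kMagic, 4) != 0) return false;
    out->version    = p[4];
    out->entryCount = (uint16_t)(p[6] | (p[7] << 8));
    return true;
}
```

A blank or foreign chip fails the magic check, so a tool can tell "unformatted" apart from "formatted but empty", and the entry count removes the fixed 256-file limit.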
 