SoundFont Decoder & File Size

jbjb

I've been using the Wavetable Synthesis library on a Teensy 4.1, and I'm trying to get public-domain soundfonts to fit into the Teensy's PROGMEM. I've been doing some research and trying to understand the library and the decoder, but I was wondering if anyone could offer any insight into the questions below.

Why does a decoded soundfont expand so drastically compared with its .sf2 version? For reference, I am looking at this Rhodes soundfont (https://www.polyphone-soundfonts.com/documents/10-pianos/116-j-rhodes), whose samples are 44.1 kHz @ 16 bits. Untouched, it's ~13.2 MB, and when run through the decoder, the .cpp output is about 4x as large, at ~65 MB.

Why are samples converted into arrays of uint32_t? Wouldn't it make more sense to use a 16-bit data type?

And finally, is it possible to use the Teensy 4.1's expanded flash memory to store the soundfont without having to modify the playback code?


Thanks!
 
Still no answer??
OK, I will try.
Using 32-bit is most likely because the CPU is 32-bit,
which should make reads from PROGMEM more efficient (better fit in the cache?).
Also, 16-bit accesses on a 32-bit CPU can slow things down.
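A minimal sketch of why a uint32_t array doesn't have to waste space, assuming (not confirmed in this thread) that the decoder packs two signed 16-bit samples into each 32-bit word, little-endian as on the Teensy's Cortex-M7. The helper name `pack_samples` is hypothetical.

```python
import struct

def pack_samples(samples):
    """Pack signed 16-bit samples two per 32-bit word (little-endian).

    An odd-length input gets one zero sample appended so the last word is full.
    """
    samples = list(samples)
    if len(samples) % 2:
        samples.append(0)
    raw = struct.pack("<%dh" % len(samples), *samples)           # 2 bytes per sample
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))    # 4 bytes per word

print([hex(w) for w in pack_samples([1, -1, 300])])  # ['0xffff0001', '0x12c']
```

Each word still carries exactly 4 bytes of audio, so the storage cost is the same as a uint16_t array; only the width of each fetch changes.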

About the size:
for every 4 raw bytes
you will instead get
Code:
0x00000000,
which takes 11 bytes,
which is 11/4 = 2.75 times bigger.

Also, if you were to use 16-bit:
Code:
0x0000,
That would be 7/2 = 3.5 times bigger.


I don't really know why it's 4 times bigger,
as in theory it should only be around 2.75x.
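The arithmetic above as a small Python check. It counts only the hex literal and its comma; the optional `sep` stands for any spaces, line breaks, or indentation the decoder writes between entries (an assumption about its output format), which would push the ratio above 2.75x and could account for part of the gap to the observed ~4x.

```python
def hex_literal_ratio(word_bytes, sep=""):
    """Characters of C source per byte of data for one '0x...,' array entry."""
    literal = "0x" + "0" * (2 * word_bytes) + "," + sep   # e.g. "0x00000000,"
    return len(literal) / word_bytes

print(hex_literal_ratio(4))           # 32-bit words: 2.75
print(hex_literal_ratio(2))           # 16-bit words: 3.5
print(hex_literal_ratio(4, sep=" "))  # 32-bit words plus one space: 3.0
```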

I don't really understand what you mean by the last question:
* Playback of an sf2 file from a flash filesystem?
 
I also forgot to say: don't get fooled by the .cpp size.
I think the converter tool does show flash usage, which tells you how much space that particular soundfont takes. Also note that the current soundfont decoder is outdated: it does not include the Teensy 4.1, and it does not use PROGMEM to make sure the samples stay in flash.
 
Thanks Manicksan! This is very helpful.

You are correct, the 4x size was strange, so I dug in and noticed that if a sample is reused for multiple key ranges inside the soundfont, the program decodes the sample once per key range rather than reusing it. If I find the time to solve this issue, I will share my updated code.

---

I'm actually using your updated soundfont decoder for the Teensy 4 / 4.1 (https://github.com/manicken/SoundFontDecoder), and I realized my copy was quite stale. I re-ran the decoder after syncing to your repo, and here are the resulting file sizes:

Soundfont file: 6.8 MB
Decoder ("Controller") size estimate: 5147 KB
Decoded .cpp: 14.4 MB
Compiled info:

FLASH: code:45376, data:5160840, headers:9012 free for files:2911236
RAM1: variables:25280, code:42184, padding:23352 free for local variables:433472
RAM2: variables:44096 free for malloc/new:480192



This is surprising! You are correct, that's an incredible difference once compiled! And the compiled size is much closer to the decoder's estimate than I anticipated.
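Putting the numbers from this post side by side, assuming "mb"/"kb" mean 10^6/10^3 bytes (if they mean MiB/KiB, the ratios shift slightly) and treating the "data:" field as mostly sample data (also an assumption):

```python
cpp_bytes = 14.4e6          # decoded .cpp size reported above
flash_data_bytes = 5160840  # "data:" field from the FLASH line of the compile output
estimate_bytes = 5147e3     # decoder ("Controller") size estimate

print(round(cpp_bytes / flash_data_bytes, 2))       # source text per compiled byte: 2.79
print(round(flash_data_bytes / estimate_bytes, 3))  # compiled data vs. estimate: 1.003
```

The .cpp comes out at roughly 2.8x the compiled data, close to the 2.75x text-expansion figure from earlier in the thread, and the compiled data lands within about 0.3% of the estimate.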

--
And as for my final question, I was wondering whether adding extra memory (PSRAM or flash: https://www.pjrc.com/store/psram.html // https://protosupplies.com/product/w25n02g-256mb-2g-bit-serial-flash-for-teensy/) could provide additional storage for a soundfont.
 
So I am encountering an issue that is driving me nuts!!

I'm working with some soundfonts, trying to reduce their size to fit on the Teensy, and the decoder consistently hangs and massively inflates the file size. Still using https://github.com/manicken/SoundFontDecoder



- Soundfont (sf2) file size: 8.2 MB
- Run controller.py; select soundfont
- One instrument, 30 samples, estimated at 133 MB!!!!

Any idea what could be going on here?

I tried this fork as well (https://github.com/netherwaves/Wavetable-Synthesis), but it estimates a ~8.2 MB file, creates a 14 MB .cpp, and I get a memory overflow error of a few bytes. I know .cpp file size doesn't necessarily equal compiled size, but I'm generally confused by the whole process.

If anyone has any further insight into the Wavetable decoder and the final size of a soundfont, it would be greatly appreciated!!
 
OK, quick follow-up:

In decoder.py, the length of each decoded sample can come from the sample's duration, its 'end', or the corresponding bag's 'cooked_loop_end'. For whatever reason, there are some big discrepancies between these in my soundfont, which causes the decoder to pad some samples with a huge number of zeros.

After modifying manicken's SoundFontDecoder to use the following calculation, the exported file size is in line with the fork above:
Code:
length_16 = bags[i].sample.duration
And that's it: make no reference to 'cooked_loop_end' or 'sample.end', and it seems to work fine.
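A hypothetical model of how padding to a loop end can inflate the output, assuming the old length calculation took whichever of `duration` and `cooked_loop_end` was larger and filled the gap with zeros (that is my reading of the symptom, not a quote of decoder.py). The `ZoneSample` class and both helper names are made up, and the numbers are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ZoneSample:
    """Hypothetical, simplified view of one zone's sample (not the sf2utils API)."""
    duration: int         # sample points actually stored in the sf2
    cooked_loop_end: int  # loop end reported for the zone, in sample points

def padded_length(z: ZoneSample) -> int:
    # Assumed old behaviour: decode up to the larger value, zero-filling past the real data.
    return max(z.duration, z.cooked_loop_end)

def fixed_length(z: ZoneSample) -> int:
    # The change described above: length_16 = sample.duration, nothing else.
    return z.duration

zones = [ZoneSample(duration=20000, cooked_loop_end=19990),
         ZoneSample(duration=15000, cooked_loop_end=2_000_000)]
extra = sum(padded_length(z) - fixed_length(z) for z in zones)
print(extra * 2, "bytes of zero padding")  # 2 bytes per 16-bit point: 3970000
```

One bad loop end on one zone is enough to add megabytes of zeros, which would fit the jump from an 8.2 MB sf2 to a 133 MB estimate.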

Given that the 4.1 has ~8 MB of flash, I guess it makes sense that my 8.2 MB soundfont just barely doesn't fit.
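The FLASH line from the compile output quoted earlier in the thread adds up to the total program flash available to a build. It comes to a bit less than the chip's 8 MiB (8,388,608 bytes); presumably the remainder is reserved by the Teensy core, which is an assumption, not something stated here. Sketch code and headers need room too, so the space left for samples is smaller still.

```python
# Fields of the "FLASH:" line from the compile output quoted earlier in the thread.
flash = {"code": 45376, "data": 5160840, "headers": 9012, "free_for_files": 2911236}

total = sum(flash.values())
print(total, "bytes =", total // 1024, "KiB")   # 8126464 bytes = 7936 KiB
print("8.2 MB soundfont fits:", 8.2e6 <= total) # False, even before code and headers
```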

So my question is:


What is the difference between "bag.sample", "instrument.sample", and "soundfont.sample"? When might a 'cooked_loop' end be greater than a sample's duration?
 
I will keep replying to myself as I learn more:

The SoundFont spec (http://www.synthfont.com/SFSPEC21.PDF) is helpful for understanding the root of the terminology in Wavetable Synthesis and sf2utils ("bag - A SoundFont data structure element containing a list of preset zones or instrument zones").
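A simplified model of the hierarchy the spec describes (presets left out): the file holds one flat list of sample headers, and each instrument zone ("bag") points at one of them. The class and attribute names are illustrative, not sf2utils' actual API. In this model, soundfont.samples is every sample in the file, while bag.sample is the one sample a given zone plays; I haven't checked whether sf2utils' instrument-level sample attribute means the same thing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SampleHeader:            # one entry in the file-wide sample list ("shdr")
    name: str

@dataclass
class InstrumentZone:          # one instrument "bag": a key range plus the sample it plays
    key_range: Tuple[int, int]
    sample: Optional[SampleHeader] = None   # a global zone has no sample

@dataclass
class Instrument:
    name: str
    zones: List[InstrumentZone] = field(default_factory=list)

@dataclass
class SoundFont:
    samples: List[SampleHeader] = field(default_factory=list)    # every sample in the file
    instruments: List[Instrument] = field(default_factory=list)

# One recording reused for two key ranges: one sample header, two zones pointing at it.
ep = SampleHeader("EP_C3")
inst = Instrument("Rhodes", [InstrumentZone((0, 59), ep), InstrumentZone((60, 71), ep)])
sf = SoundFont(samples=[ep], instruments=[inst])
print(len(sf.samples), "sample,", len(inst.zones), "zones")  # 1 sample, 2 zones
```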

I think the core issue with the decoder is that it loops through the 'bags' (aka the instrument zones) of each instrument without checking which samples have already been decoded. This can lead to duplicated samples and increased memory usage.
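A minimal sketch of decoding each distinct sample once, assuming zones identify their sample by its index in the file's sample list (as the SF2 sampleID generator does). The `decode` callback stands in for whatever the real decoder does per sample; the zone tuple layout and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Zone = Tuple[int, int, int]   # (sample index, low key, high key) -- hypothetical layout

def decode_instrument(zones: List[Zone], decode: Callable[[int], List[int]]):
    """Decode each distinct sample once; every zone keeps a reference to it."""
    decoded: Dict[int, List[int]] = {}
    table = []
    for sample_id, lo, hi in zones:
        if sample_id not in decoded:      # first zone using this sample: decode it
            decoded[sample_id] = decode(sample_id)
        table.append((lo, hi, sample_id)) # later zones only point at the cached copy
    return decoded, table

calls = []
def fake_decode(sample_id):
    calls.append(sample_id)
    return [0] * 4

decoded, table = decode_instrument([(0, 0, 59), (0, 60, 71), (1, 72, 127)], fake_decode)
print(len(calls), "decodes for", len(table), "zones")  # 2 decodes for 3 zones
```

Keying by index rather than sample name avoids trouble if two samples in a file happen to share a name.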

Another fix is to make sure to use loop_end/loop_start, or sample.duration, since sample.end/start appear to be offsets that keep increasing as you move through the bags?
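In the SoundFont 2.01 spec, a sample header's start, end, and loop points are absolute positions in the file's shared sample data, not positions within the sample itself, which would explain why sample.end appears to grow from bag to bag. A minimal sketch of converting them to sample-relative values; the class and field names are simplified stand-ins, not sf2utils' attributes, and I haven't confirmed what sf2utils' `cooked_loop_end` is measured from.

```python
from dataclasses import dataclass

@dataclass
class SampleHeader:
    """Subset of an sf2 'shdr' record; all four values are absolute indices
    (in sample points) from the start of the file's sample data."""
    start: int
    end: int
    loop_start: int
    loop_end: int

def relative(h: SampleHeader):
    """Length and loop points relative to this sample's own first point."""
    return h.end - h.start, h.loop_start - h.start, h.loop_end - h.start

# Two samples stored back to back (the spec puts at least 46 zero points after each).
# Numbers are made up for illustration.
a = SampleHeader(start=0, end=20000, loop_start=15000, loop_end=19990)
b = SampleHeader(start=20046, end=35046, loop_start=30046, loop_end=35000)
print(relative(a))  # (20000, 15000, 19990)
print(relative(b))  # (15000, 10000, 14954)
```

Treating b's raw `end` (35046) as a length would more than double its size, and the error grows with every later sample in the file.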
 
Good work!

Another question for the people in this forum:
Is it possible to map external (additional) flash memory into the address space, and in that way make bigger soundfonts possible,
just like it's possible with PSRAM?
 
Thanks!
With PSRAM, I didn't figure it out, but I thought it might be possible with the MemFile library (https://github.com/FrankBoesing/MemFile). Any other ideas are welcome.

I also have a few more notes regarding the decoder:

Key ranges: they seem to support only one sample per key range, so it is not possible to have overlapping ranges. Is this true? If so, it seems like a good feature to add.
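A minimal Python model of a key-range lookup that allows overlaps: return every zone whose range contains the note instead of stopping at the first match. This is not the Teensy playback code; the zone layout and names are hypothetical, and on the synth side each match would presumably need its own voice.

```python
from typing import List, Tuple

Zone = Tuple[int, int, str]   # (low key, high key, sample name) -- hypothetical layout

def zones_for_note(zones: List[Zone], note: int) -> List[str]:
    """Return every sample whose key range contains `note`.

    Returning a list rather than the first match is what lets ranges overlap,
    e.g. two layered samples sounding on the same key.
    """
    return [name for lo, hi, name in zones if lo <= note <= hi]

zones = [(0, 63, "tine_soft"), (48, 127, "tine_hard"), (0, 127, "noise")]
print(zones_for_note(zones, 40))  # ['tine_soft', 'noise']
print(zones_for_note(zones, 55))  # ['tine_soft', 'tine_hard', 'noise']
```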
 