beermat
Well-known member
Has anyone had any experience with using TLSF (Two-Level Segregated Fit) for memory allocations instead of smalloc? I should declare up front that I am not deeply knowledgeable on this topic, but I've spent a day hacking around, and in my experiments it is significantly faster, with no issues so far. I've only used it in a couple of use cases, though, and haven't tested more broadly (and don't know how to). So I thought I'd sanity-check with the more experienced people on here, as it might be of interest given the focus on SDRAM / PSRAM speed for some forum members. When I find numbers that show significant improvements, my first instinct is not to trust them and to ask the experts.
In a nutshell, I was trying out Lottie animations under LVGL, which uses a bundled version of the ThorVG library to do the animations. They are nice but heap-hungry, I don't have a lot of spare heap, and the code base is too extensive, too complex and too third-party to try to modify. I discovered I could declare signatures for new, delete, malloc, etc. and divert them to my own functions, so I decided to use my PSRAM for heap. I implemented the same smalloc calls that extmem_alloc, etc. make (bypassing the extmem_* calls themselves, as they are thin wrappers that don't handle realloc/free with NULL pointers). That worked, but was visibly slow compared to standard heap access, even after I removed the 'always zero the returned allocated memory for malloc' behaviour, as per a different forum post. I then grabbed the TLSF source and quickly implemented that instead, using an EXTMEM array (1024 * 1024) as the pool, and I found it to be significantly faster, dare I say on par with regular heap calls to DMAMEM?
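For anyone wondering what "handle realloc/free with NULL pointers" means in practice, here is a minimal sketch of the kind of NULL-safe layer I'm describing. The `pool_*` and `backend_*` names are hypothetical, and the backend here is just the standard C heap so the sketch builds on a desktop; on the Teensy the three `backend_*` functions would forward to whichever pool allocator is in use. It is not the actual code from my project.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-in backend (assumption): the standard heap, so this builds anywhere.
// On the target these would forward to the pool allocator instead.
static void *backend_malloc(std::size_t n) { return std::malloc(n); }
static void *backend_realloc(void *p, std::size_t n) { return std::realloc(p, n); }
static void backend_free(void *p) { std::free(p); }

// Hypothetical wrapper names, not part of any library.
void *pool_malloc(std::size_t n) { return backend_malloc(n); }

void *pool_calloc(std::size_t count, std::size_t size) {
    // Refuse requests where count * size would overflow size_t.
    if (size != 0 && count > SIZE_MAX / size) return nullptr;
    const std::size_t total = count * size;
    void *p = backend_malloc(total);
    if (p != nullptr) std::memset(p, 0, total);  // calloc must return zeroed memory
    return p;
}

void *pool_realloc(void *p, std::size_t n) {
    if (p == nullptr) return backend_malloc(n);  // realloc(NULL, n) acts like malloc(n)
    if (n == 0) {                                // treat size 0 as a free
        backend_free(p);
        return nullptr;
    }
    return backend_realloc(p, n);
}

void pool_free(void *p) {
    if (p == nullptr) return;                    // free(NULL) is a no-op
    backend_free(p);
}
```

The point of doing the NULL checks in the wrapper is that third-party code such as ThorVG can keep calling realloc and free the way standard C allows, whatever the backend does with a NULL pointer.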
Here are some timings I took. I run two Lottie animations, and ThorVG bangs away at the heap like it's going out of fashion, with lots of new/delete/malloc/calloc/free calls per second, so I wrapped each call with a timer and dumped the per-second totals to the console. For this test I ran EXTMEM at 132.9 MHz, although it seems I can push that to 198 MHz without issue. Anyway, here are representative single-second snapshots:
(Update: I ran the CPU at 816 MHz for these tests)
smalloc, EXTMEM at 133 MHz (times in µs):
malloc - count: 4063, time: 295937, avg time: 72.84 µs
calloc - count: 468, time: 63450, avg time: 135.58 µs
realloc - count: 2271, time: 324345, avg time: 142.82 µs
free - count: 7639, time: 8247, avg time: 1.08 µs
TLSF, EXTMEM at 133 MHz (times in µs):
malloc - count: 9538, time: 7523, avg time: 0.79 µs
calloc - count: 1044, time: 3988, avg time: 3.82 µs
realloc - count: 5328, time: 17876, avg time: 3.36 µs
free - count: 18388, time: 13006, avg time: 0.71 µs
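A rough sketch of how per-call counters like these can be collected is below. This is not my exact instrumentation: `OpStats`, `record`, `timed_malloc` and `report_and_reset` are hypothetical names, it wraps the standard `malloc` as a stand-in, and it uses `std::chrono::steady_clock` so it runs on a desktop (on the Teensy a microsecond or cycle counter would take its place).

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// One set of counters per allocator entry point (hypothetical type).
struct OpStats {
    const char *name;
    std::uint32_t count;     // calls seen in the current window
    std::uint64_t total_ns;  // time spent inside those calls

    double average_us() const {
        return count ? (total_ns / 1000.0) / count : 0.0;
    }
};

// Add one call of the given duration to the counters.
void record(OpStats &s, std::uint64_t elapsed_ns) {
    s.count += 1;
    s.total_ns += elapsed_ns;
}

static OpStats malloc_stats{"malloc", 0, 0};

// Time a single allocation and add it to the running totals.
void *timed_malloc(std::size_t n) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    void *p = std::malloc(n);  // stand-in for the allocator under test
    const auto t1 = clock::now();
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    record(malloc_stats, static_cast<std::uint64_t>(ns));
    return p;
}

// Print one line in the same shape as the snapshots, then start a new window.
void report_and_reset(OpStats &s) {
    std::printf("%s - count: %lu, time: %.0f, avg time: %.2f us\n", s.name,
                static_cast<unsigned long>(s.count), s.total_ns / 1000.0,
                s.average_us());
    s.count = 0;
    s.total_ns = 0;
}
```

Calling `report_and_reset` once a second for each of malloc, calloc, realloc and free gives one snapshot. With averages below a microsecond, the clock needs finer than microsecond resolution, otherwise the short calls mostly round to zero.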
As you can see, the cumulative time spent in the allocator during the snapshot second was ~692 milliseconds for smalloc vs ~42 milliseconds for TLSF! Other libraries I have running also funnel through this, such as NativeEthernet for TLS calls, without issue. So I am wondering: is this worth exploring further, by me or by the crowd? If it's implementable and I'm not missing something, it's a significant speed-up.
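For anyone who wants to check the arithmetic, here is a small sketch that adds up the figures from the two snapshots above. The numbers are copied from the post; the `Row` type and function names are just for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// One line of a snapshot: operation, number of calls, total time in microseconds.
struct Row {
    const char *op;
    std::uint32_t count;
    std::uint32_t time_us;
};

// Figures copied from the two one-second snapshots.
static const Row kSmalloc[] = {
    {"malloc", 4063, 295937},
    {"calloc", 468, 63450},
    {"realloc", 2271, 324345},
    {"free", 7639, 8247},
};
static const Row kTlsf[] = {
    {"malloc", 9538, 7523},
    {"calloc", 1044, 3988},
    {"realloc", 5328, 17876},
    {"free", 18388, 13006},
};

// Sum of the time column over all rows.
template <std::size_t N>
std::uint64_t total_time_us(const Row (&rows)[N]) {
    std::uint64_t sum = 0;
    for (const Row &r : rows) sum += r.time_us;
    return sum;
}

// Sum of the count column over all rows.
template <std::size_t N>
std::uint64_t total_calls(const Row (&rows)[N]) {
    std::uint64_t sum = 0;
    for (const Row &r : rows) sum += r.count;
    return sum;
}

// Print total time, total calls and the mean time per call for one snapshot.
template <std::size_t N>
void print_summary(const char *label, const Row (&rows)[N]) {
    const std::uint64_t t = total_time_us(rows);
    const std::uint64_t c = total_calls(rows);
    std::printf("%s: %llu us over %llu calls, %.2f us per call\n", label,
                static_cast<unsigned long long>(t),
                static_cast<unsigned long long>(c),
                static_cast<double>(t) / static_cast<double>(c));
}
```

`total_time_us(kSmalloc)` comes to 691979 µs and `total_time_us(kTlsf)` to 42393 µs, which is where the ~692 ms and ~42 ms come from. Note that the TLSF snapshot also contains more than twice as many calls (34298 vs 14441), so the per-call gap is wider than the per-second totals suggest.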