USBHost_t36: allocate MIDI tx/rx buffers non-statically?

doctea

Not sure I'm posting this in the right place, if there is somewhere better then please let me know!

I've been working on a project (on Teensy 4.1) for a while and am really starting to hit up against RAM/flash limitations, but I still want to squeeze some more code into the flash.

I've used a bunch of obvious tricks to gain back lots of memory so far, but the Inspect Memory feature in PlatformIO is now pointing me at the tx/rx buffers of the MIDIDevices as a good potential way to free up some flash. The buffers are currently allocated statically, each MIDIDevice_BigBuffer uses over 4KB for buffers, and I allow for up to 10 USB MIDI devices to be connected to my Teensy.

I've attempted to move the buffer allocation into the MIDIDeviceBase constructor. A patch showing what I've done is here: https://github.com/doctea/USBHost_t...SBHost_t36:maybe_fixed_4_stable_dynamic.patch

The change basically just removes the static declaration of the buffers and calls calloc() in the constructor instead. I tried first with malloc() but had the same problem.

Although this compiles and runs, it very quickly (within a few seconds of sending data) causes multiple USB MIDI devices to miss or receive garbled messages.

Is there something obvious I'm missing about this approach that might be causing this not to work?

(I've also had to make several modifications to the USBHost_t36 library to get MIDI clock to work, and also some other changes that I believe have improved stability running multiple USB MIDI devices. These changes are in the branch on my fork on github: https://github.com/doctea/USBHost_t36/tree/maybe_fixed_4_stable -- and I've left my app running and controlling multiple MIDI devices for over 12 hours with no problems using this branch.)

Thanks to anyone who may be able to cast some light on this or suggest another way to achieve it!
 
MIDI is not something I have done much with (read that as more or less none at all).
So Paul is better placed to answer most/all of this.

Some thoughts on this. In most cases we have avoided adding dynamic memory calls like malloc(), or ones that can use optional external memory like extmem_malloc(), as many developers want to have everything statically allocated, or at least to have the option for that.

So, in some cases, for some objects, there might be options for you to set your own buffer and if you do not, then the code could resort to calling the memory allocator. And there could be many ways to do that.
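As a rough illustration of that idea, here is a minimal sketch of an object that uses a caller-supplied buffer when one is given and only falls back to the allocator otherwise. The class name RxBuffer and its methods are made up for this example; nothing like it exists in USBHost_t36 as far as I know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical class for illustration only. It is NOT how MIDIDeviceBase
// manages its buffers; it only shows the "user buffer or allocator" pattern.
class RxBuffer {
public:
    // If user_storage is non-null it must point to at least `size` bytes
    // that outlive this object. Otherwise the memory comes from calloc().
    explicit RxBuffer(std::size_t size, uint8_t* user_storage = nullptr)
        : data_(user_storage), size_(size), owned_(false) {
        assert(size > 0);
        if (data_ == nullptr) {
            data_ = static_cast<uint8_t*>(std::calloc(size, 1));
            owned_ = (data_ != nullptr);
        }
    }

    // Only release memory this object allocated itself.
    ~RxBuffer() {
        if (owned_) std::free(data_);
    }

    // Copying would lead to a double free, so forbid it.
    RxBuffer(const RxBuffer&) = delete;
    RxBuffer& operator=(const RxBuffer&) = delete;

    uint8_t* data() const { return data_; }
    std::size_t size() const { return size_; }
    bool owns_memory() const { return owned_; }

private:
    uint8_t* data_;
    std::size_t size_;
    bool owned_;
};
```

Developers who want everything static pass in their own array, and everyone else gets the allocator. On a Teensy the fallback could just as well be a different allocator; that choice is what the rest of this post is about.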

But assuming you have the option to choose where the memory comes from, there are some other possible implications on how things work depending on where the memory is located.

That is, if the memory used is in the RAM1 area (https://www.pjrc.com/store/teensy41.html#memory), it is tightly coupled and works great with things like DMA without you having to worry about it.

If instead your allocations are in RAM2 or PSRAM, then there are other things that need to be taken into account.

DMA operations appear to work better if their buffers are 32 byte aligned.
That is why, in several other libraries that do DMA operations (for example ILI9488_t3), when we allocate the frame buffer we ask for 32 bytes more than we need, and then actually use the address returned rounded up to the next 32 byte boundary.
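The over-allocate-and-round-up trick can be sketched roughly like this. The names AlignedBuffer, alloc_aligned32 and free_aligned32 are made up for the example and are not taken from ILI9488_t3 or USBHost_t36; plain malloc() stands in for whichever allocator you actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper type: keeps both pointers, because free() must be
// given the pointer malloc() returned, not the rounded-up one.
struct AlignedBuffer {
    void*    raw;      // what malloc() returned
    uint8_t* aligned;  // first 32-byte-aligned address inside the block
};

constexpr uintptr_t kAlign = 32;

// Ask for 32 extra bytes, then round the address up to a 32-byte boundary.
inline AlignedBuffer alloc_aligned32(std::size_t size) {
    AlignedBuffer b{nullptr, nullptr};
    b.raw = std::malloc(size + kAlign);
    if (b.raw == nullptr) return b;  // allocation failed, both pointers null

    uintptr_t addr = reinterpret_cast<uintptr_t>(b.raw);
    uintptr_t up = (addr + (kAlign - 1)) & ~(kAlign - 1);
    assert(up - addr < kAlign);      // we never skip more than 31 bytes
    b.aligned = reinterpret_cast<uint8_t*>(up);
    return b;
}

inline void free_aligned32(AlignedBuffer& b) {
    std::free(b.raw);
    b.raw = nullptr;
    b.aligned = nullptr;
}
```

Use b.aligned for the tx/rx data and keep b.raw around only so the block can be freed later.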

Cache and DMA
- Memory that is allocated out of RAM2 or PSRAM is slower than RAM1 (TCM), so accesses to it go through the hardware memory cache of the board. The cache is not configured to always write through to physical RAM; instead it writes back lazily, when necessary.

Why is this important? Because DMA operations work directly from the physical memory and know nothing of the cache.
So in other code, when we are about to do a DMA operation out from the Teensy, we explicitly call a function to flush the cache out to memory, with code that looks something like:
Code:
if ((uint32_t)_pfbtft >= 0x20200000u)  arm_dcache_flush(_pfbtft, CBALLOC);

Likewise, if you do something like a DMA operation that comes in to the Teensy, then we also need to tell the Teensy to disregard what it has in its cache for those memory locations, so that the other code that reads the data will get the updated values.
There, code might call arm_dcache_delete(ptr, count).

Note: These operations work on 32 byte blocks of memory. So, for example, if you call the delete operation on a section of memory that overlaps other variables not read in from DMA, you may have just tossed away the actual new values for those memory locations.
In cases where this is possible, you may want to call arm_dcache_flush_delete() instead, which is a bit slower, but writes what is in the cache out to physical memory before it discards the cached values.
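To make the 32 byte block point concrete, here is a small sketch that works out which blocks a buffer actually touches, and whether a plain delete would spill onto neighbouring data. CacheSpan, cache_span and safe_to_delete are names invented for this example; only the arm_dcache_* functions mentioned in the comments are real Teensy core calls, and they are deliberately not called here so the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uintptr_t kCacheLine = 32;  // block size the cache calls work on

// Hypothetical helper type: the whole-block range covering a buffer.
struct CacheSpan {
    uintptr_t   start;   // rounded down to a 32-byte boundary
    std::size_t length;  // multiple of 32 covering the whole buffer
};

// Round the start down and the end up to 32-byte boundaries.
inline CacheSpan cache_span(uintptr_t addr, std::size_t size) {
    assert(size > 0);
    uintptr_t start = addr & ~(kCacheLine - 1);
    uintptr_t end = (addr + size + (kCacheLine - 1)) & ~(kCacheLine - 1);
    return CacheSpan{start, static_cast<std::size_t>(end - start)};
}

// True only if the buffer starts on a block boundary and fills whole
// blocks, i.e. no unrelated variable can share a block with it.
// On a Teensy 4.x: if this is true, arm_dcache_delete() is reasonable
// after an incoming DMA transfer; if it is false, the safer choice
// described above is arm_dcache_flush_delete().
inline bool safe_to_delete(uintptr_t addr, std::size_t size) {
    CacheSpan s = cache_span(addr, size);
    return s.start == addr && s.length == size;
}
```

For example, a 40 byte buffer at 0x20200010 touches the two blocks starting at 0x20200000, so a delete on it also discards the cached contents of whatever else lives in those 64 bytes.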

The reason I mention all of this is that there may be code in MIDI that uses DMA or similar, where this might be hitting you.

Hope that makes sense

Kurt
 
Hi Kurt, many thanks for your thorough and speedy reply! That certainly gives me something to think about, and makes clear why my naive approach to it may not be working! It seems likely to be quite a complicated problem to solve, so I'll take another look within my code and other libraries to see if I can make savings there, before I start messing with something so complex :).

Thanks again!
 
I have a related problem that you may have insight into:

I soldered 5 pins for the USB host cable onto my Teensy 4.1 and plugged in a MIDI adapter. It requires power, and it lights up when plugged into a computer.
Nothing lit up!

Then I plugged a usbC cable - host cable to a synth.... Nothing

I am not sure how to check if MIDI is working.

Help ?
 