Does memcpy() flush and delete caches?

mborgerson

I've been working with the PXP and CSI to collect and process camera images. Recently, I was writing a test pattern into a DMAMEM or EXTMEM buffer, then processing it with the PXP. It seems necessary to call arm_dcache_flush_delete() on the buffer before the PXP processes it. Otherwise, data written to the buffer may still be sitting in the cache and may not be read correctly by the PXP. Will I have the same issue if I transfer the test pattern to the buffer with memcpy()?

My question is this: Does memcpy test its source and destination addresses to see if they fall outside the DTCM on T4.X systems and manage the caches if necessary? Or is this cache management the responsibility of the sketch programmer?
 
No, memcpy() does not flush cache. That's your responsibility, which usually is only needed if using DMA or creating specific memory test programs.
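
As a minimal sketch of what that responsibility looks like in practice: copy with memcpy(), then flush the destination before handing it to a DMA-driven peripheral such as the PXP. The helper name copy_for_dma() is hypothetical, and the #else branch is only a host-side stand-in so the snippet compiles off-target. On a Teensy 4.x build the real arm_dcache_flush_delete() from the core is used instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // Teensy 4.x core declares arm_dcache_flush_delete()
#else
// Host-side stand-in (assumption, not the real function): it only counts calls.
static unsigned flush_calls = 0;
static inline void arm_dcache_flush_delete(void *addr, uint32_t size) {
    (void)addr;
    (void)size;
    ++flush_calls;
}
#endif

// Hypothetical helper: copy into a buffer that a DMA master will read,
// then write the cached lines back to RAM so the peripheral sees the new data.
static void copy_for_dma(void *dst, const void *src, size_t len) {
    memcpy(dst, src, len);  // plain copy, no cache maintenance happens here
    arm_dcache_flush_delete(dst, static_cast<uint32_t>(len));
}
```

In a sketch, dst would typically be a DMAMEM or EXTMEM buffer, and the call would come just before starting the PXP.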
 

I get this, and I understand that you might not want to add tests for memory source and destination regions to all memcpy() calls, as the function has been written in assembly for maximum performance.

However, there are times with some peripheral devices where it is not immediately evident that DMA is happening under the hood. The CSI and PXP come to mind.
I note that the usb_serial code in the Teensy4 cores library calls arm_dcache_flush_delete() quite often. I guess I'll have to do the same with CSI and PXP code.

I see that the usb_serial code doesn't bother to check the memory location before calling arm_dcache_flush_delete(). Does that mean that the overhead for calling the cache management function is lower than the time taken to test the memory address and skip the call if not needed? Or perhaps the call is always needed as the serial buffers are in DMAMEM. (If so, were cache management calls made when the data was transferred to those buffers?)
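
For reference, the kind of address test being discussed could be sketched as below. This is only an illustration under an assumed Teensy 4.x memory map (uncached DTCM starting at 0x20000000, cached OCRAM2/DMAMEM starting at 0x20200000, cached EXTMEM starting at 0x70000000). The names may_be_cached() and flush_if_cached() are hypothetical, and the #else stub exists only so the snippet compiles on a host. It says nothing about whether the test is cheaper than simply making the call.

```cpp
#include <cstdint>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // Teensy 4.x core declares arm_dcache_flush_delete()
#else
// Host-side stand-in (assumption): does nothing.
static inline void arm_dcache_flush_delete(void *addr, uint32_t size) {
    (void)addr;
    (void)size;
}
#endif

// Assumed boundary: everything from OCRAM2 (DMAMEM) upward may be cached,
// while DTCM below it is tightly coupled and bypasses the data cache.
constexpr uintptr_t kFirstCachedRamAddr = 0x20200000u;

static inline bool may_be_cached(uintptr_t addr) {
    return addr >= kFirstCachedRamAddr;
}

// Hypothetical wrapper: skip cache maintenance for buffers that live in DTCM.
static inline void flush_if_cached(void *addr, uint32_t size) {
    if (may_be_cached(reinterpret_cast<uintptr_t>(addr))) {
        arm_dcache_flush_delete(addr, size);
    }
}
```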
 
I get this, and I understand that you might not want to add tests for memory source and destination regions to all memcpy() calls, as the function has been written in assembly for maximum performance.

memcpy() is widely used by non-driver libraries and programs. Killing the huge benefits of caching for all normal uses would have a terrible performance hit.


I note that the usb_serial code in the Teensy4 cores library calls arm_dcache_flush_delete() quite often. I guess I'll have to do the same with CSI and PXP code.

Yes, exactly.
 
memcpy() is widely used by non-driver libraries and programs. Killing the huge benefits of caching for all normal uses would have a terrible performance hit.
AHA! If you trash the cache on every reference to EXTMEM, you lose the performance benefit you would get for every transfer from within the cache. This shows why cache management requires careful analysis. Restricting arm_dcache_flush_delete() to just those situations where you know there will be subsequent DMA transfers sounds like a good policy. I can work with that. Knowing WHY I am doing this lowers my stress level.
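
That policy can be sketched as two small helpers, one for each direction of a DMA transfer. The helper names are hypothetical and the #else stubs are host-side stand-ins only. arm_dcache_flush() and arm_dcache_delete() are the Teensy 4.x core functions. The 32-byte figure is the Cortex-M7 data cache line size. Keeping DMA buffers aligned to it, and sized in multiples of it, avoids arm_dcache_delete() discarding unrelated data that happens to share a cache line.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // Teensy 4.x core declares the arm_dcache_* functions
#else
// Host-side stand-ins (assumption): do nothing.
static inline void arm_dcache_flush(void *, uint32_t) {}
static inline void arm_dcache_delete(void *, uint32_t) {}
#endif

constexpr size_t kCacheLine = 32;  // Cortex-M7 data cache line size in bytes

// Round a buffer size up to a whole number of cache lines.
constexpr size_t round_up_to_cache_line(size_t n) {
    return (n + kCacheLine - 1) & ~(kCacheLine - 1);
}

// CPU wrote the buffer and a DMA master (e.g. the PXP) will read it next:
// write dirty cache lines back to RAM.
static inline void before_dma_reads(void *buf, size_t len) {
    arm_dcache_flush(buf, static_cast<uint32_t>(len));
}

// A DMA master (e.g. the CSI) wrote the buffer and the CPU will read it next:
// drop stale cache lines so the reads come from RAM.
static inline void before_cpu_reads_dma_output(void *buf, size_t len) {
    arm_dcache_delete(buf, static_cast<uint32_t>(len));
}
```

Everything else, including ordinary memcpy() traffic to EXTMEM, is left to the cache.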
 