Thanks @defragster,
I had not posted that test case yet, but it was pass one of some testing I have been doing.
It was also suggested that I go ahead and post some stuff from email as well... minus the email-type stuff:
Short version: It would be great if some of these cache functions were described in more detail somewhere. Maybe they are, but I did not
find anything using Google... Probably the best one was:
https://www.nxp.com/docs/en/application-note/AN12042.pdf
<Warning: start of rambling>
As I mentioned earlier in this thread, I really don’t understand enough about how these functions work:
Yes I can read the code like:
Code:
__attribute__((always_inline, unused))
static inline void arm_dcache_delete(void *addr, uint32_t size)
{
    uint32_t location = (uint32_t)addr & 0xFFFFFFE0;
    uint32_t end_addr = (uint32_t)addr + size;
    asm volatile("": : :"memory");
    asm("dsb");
    do {
        SCB_CACHE_DCIMVAC = location;
        location += 32;
    } while (location < end_addr);
    asm("dsb");
    asm("isb");
}
But I have no clue on what exactly SCB_CACHE_DCIMVAC = location does...
Yes I can read imxrt.h and see: #define SCB_CACHE_DCIMVAC (*(volatile uint32_t *)0xE000EF5C)
So I know it is writing the location out to some specific memory location, which somehow deletes that location and, by inference from how this function works, 32 bytes from the cache.
And the code implies that it needs to align the address passed in to 32 byte boundary…
What happens if you pass in an address that is not aligned to a 32 byte boundary? Nothing? Those bits have special meaning to this function? Still clears that same 32 bytes? Only to end of that 32 byte page? 32 bytes from that location? …
How much of a time hit is there to do the flush and delete versus just delete. My guess is … it depends on which memory… DMAMEM or EXTMEM…
Also probably depends on whether the memory actually is in the cache? Does it not also depend on how many bytes have been written to within that 32 bytes?
I would assume that doing a flush followed by a delete is slower than flush_delete:
(SCB_CACHE_DCCMVAC = location; SCB_CACHE_DCIMVAC = location;) versus (SCB_CACHE_DCCIMVAC = location;)
Again using Frank’s example like: arm_dcache_delete(p2, 100);
We know that this will clear 4 pages with one or two of them not full 32 bytes. Now my assumption is that if he is asking to delete 100 bytes from cache he must be doing this for some DMA operation, so you need all 100 bytes cleared from the cache.
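To make the page counting above concrete, here is a rough host-side sketch (plain C, nothing Teensy specific) that works out how many 32 byte lines a range touches and whether the first and last ones are only partly covered. The names cache_line_span and line_span_t and the addresses in the usage note are made up for illustration, and the 32 byte line size is just what the 0xFFFFFFE0 mask implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 32u /* line size implied by the 0xFFFFFFE0 mask */

/* Hypothetical helper type: describes the lines a byte range touches. */
typedef struct {
    uint32_t first_line;   /* address of the first line touched */
    uint32_t line_count;   /* number of 32 byte lines touched */
    bool     head_partial; /* first line also holds bytes before addr */
    bool     tail_partial; /* last line also holds bytes past the range */
} line_span_t;

static line_span_t cache_line_span(uint32_t addr, uint32_t size)
{
    assert(size > 0); /* an empty range touches no lines */

    uint32_t first = addr & ~(CACHE_LINE - 1u);
    uint32_t end   = addr + size;                     /* one past the last byte */
    uint32_t last  = (end - 1u) & ~(CACHE_LINE - 1u); /* line holding the last byte */

    line_span_t s;
    s.first_line   = first;
    s.line_count   = (last - first) / CACHE_LINE + 1u;
    s.head_partial = (addr != first);
    s.tail_partial = (end & (CACHE_LINE - 1u)) != 0u;
    return s;
}
```

With a made-up aligned address like 0x20200000 and size 100 this gives 4 lines with only the last one partial. Starting 16 bytes into a line still gives 4 lines, but now both ends are partial. If the start offset within its line is above 28, the same 100 bytes spill into a fifth line.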
So on the forum I posted one possible fix that simply says if either the address or the size is not a multiple of 32 bytes it calls off to flush_delete… Using the KISS principle…
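As a rough sketch of that KISS idea (not the exact code I posted): check the low five bits of both the address and the size, and fall back to flush plus delete if any are set. The two raw_ functions below are host-side stand-ins I made up so this compiles off target; on the board they would be the real delete and flush_delete routines. dcache_delete_safe is also just an illustrative name.

```c
#include <stdint.h>

/* Host-side stand-ins (hypothetical): they only count calls. */
static int delete_calls;
static int flush_delete_calls;

static void raw_dcache_delete(void *addr, uint32_t size)
{
    (void)addr; (void)size;
    delete_calls++;
}

static void raw_dcache_flush_delete(void *addr, uint32_t size)
{
    (void)addr; (void)size;
    flush_delete_calls++;
}

/* Delete only when the range covers whole 32 byte lines;
 * otherwise flush first so neighbouring data in a shared line is not lost. */
static inline void dcache_delete_safe(void *addr, uint32_t size)
{
    if ((((uintptr_t)addr | size) & 0x1Fu) != 0u) {
        raw_dcache_flush_delete(addr, size); /* unaligned start or length */
    } else {
        raw_dcache_delete(addr, size);       /* fully aligned: plain delete */
    }
}
```

The cost of this version is that an unaligned request flushes every line in the range, not just the partial ones at the ends, which is the trade-off the question below is about.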
Question is would it be better to do a more complete change? Something like:
Code:
__attribute__((always_inline, unused))
static inline void arm_dcache_delete(void *addr, uint32_t size)
{
    uint32_t location = (uint32_t)addr & 0xFFFFFFE0;
    uint32_t end_addr = (uint32_t)addr + size;
    asm volatile("": : :"memory");
    asm("dsb");
    if (location != (uint32_t)addr) SCB_CACHE_DCCMVAC = location; // make sure it is flushed first if unaligned
    if (end_addr & 0x1f) SCB_CACHE_DCCMVAC = (end_addr & 0xFFFFFFE0); // make sure flush if end unaligned
    do {
        SCB_CACHE_DCIMVAC = location;
        location += 32;
    } while (location < end_addr);
    asm("dsb");
    asm("isb");
}
But this can probably be improved on. For example, this does a possible flush of the first and last pages if they are unaligned, but what if they are on the same page?
Also maybe better to do flush/delete here, but then need to update start/end of loop code…
So the question is, does adding this complexity speed things up over simply calling flush_delete? It may very likely depend on how big… That is, if I want to read in the whole frame buffer for a 320x240*2 bytes display, then probably. But for his 100 byte case, it might be curious.
In a follow-up message, I extended the possible change to the function, to maybe something like:
Code:
__attribute__((always_inline, unused))
static inline void arm_dcache_delete(void *addr, uint32_t size)
{
    uint32_t location = (uint32_t)addr & 0xFFFFFFE0;
    uint32_t end_addr = (uint32_t)addr + size;
    asm volatile("": : :"memory");
    asm("dsb");
    if (location != (uint32_t)addr) {
        SCB_CACHE_DCCIMVAC = location; // unaligned start: make sure it is flushed and delete it
        location += 32; // don't process this one again.
    }
    if (end_addr & 0x1f) {
        end_addr &= 0xFFFFFFE0; // round down so the loop below does not do this page again
        // flush and delete the partial last page, unless the first write above already covered it
        if (end_addr >= location) SCB_CACHE_DCCIMVAC = end_addr;
    }
    while (location < end_addr) {
        SCB_CACHE_DCIMVAC = location;
        location += 32;
    }
    asm("dsb");
    asm("isb");
}
Note: All of these updates were typed in on the fly so not compiled yet nor tested...
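Since none of this has run on hardware yet, one cheap check is a host-side model of the same start/end logic that records which operation each 32 byte line would get instead of writing the SCB registers. This is only a sketch: plan_dcache_delete, record and cache_op_t are names I made up, 'F' stands for a write to SCB_CACHE_DCCIMVAC (flush and delete) and 'I' for a write to SCB_CACHE_DCIMVAC (delete only). The tail check is written as end_addr >= location, which is what lets a partial last line that sits directly after a partial first line still get handled.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OPS 64

/* Hypothetical record of one cache maintenance write. */
typedef struct {
    char     op;   /* 'F' = flush+delete (DCCIMVAC), 'I' = delete only (DCIMVAC) */
    uint32_t line; /* 32 byte aligned address written to the register */
} cache_op_t;

static cache_op_t ops[MAX_OPS];
static int op_count;

static void record(char op, uint32_t line)
{
    assert(op_count < MAX_OPS); /* model only handles small ranges */
    ops[op_count].op = op;
    ops[op_count].line = line;
    op_count++;
}

/* Models the edge handling only; barriers and register writes are left out. */
static void plan_dcache_delete(uint32_t addr, uint32_t size)
{
    uint32_t location = addr & 0xFFFFFFE0u;
    uint32_t end_addr = addr + size;

    op_count = 0;
    if (size == 0u) return;

    if (location != addr) {          /* partial first line */
        record('F', location);
        location += 32u;             /* do not touch it again below */
    }
    if (end_addr & 0x1Fu) {          /* partial last line */
        end_addr &= 0xFFFFFFE0u;
        if (end_addr >= location) record('F', end_addr); /* skip if same as first */
    }
    while (location < end_addr) {    /* whole lines in between */
        record('I', location);
        location += 32u;
    }
}
```

Walking a few small cases through this by hand: an 8 byte range starting at offset 0x10 gives a single 'F' on line 0, a 0x20 byte range starting at 0x10 gives 'F' on lines 0 and 0x20, and an aligned 100 byte range gives 'F' on line 0x60 plus 'I' on lines 0, 0x20 and 0x40.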
End rambling