BUG in arm_dcache_delete

Frank B

Senior Member
Code:
// Delete data from the cache, without touching memory
//
// Normally arm_dcache_delete() is used before receiving data via
// DMA or from bus-master peripherals which write to memory.  You
// want to delete anything the cache may have stored, so your next
// read is certain to access the physical memory.
__attribute__((always_inline, unused))
static inline void arm_dcache_delete(void *addr, uint32_t size)
{
    uint32_t location = [COLOR=#ff0000][B](uint32_t)addr & 0xFFFFFFE0;[/B][/COLOR]
    uint32_t end_addr =[B] (uint32_t)addr + size;[/B]
    asm volatile("": : :"memory");
    asm("dsb");
    do {
        SCB_CACHE_DCIMVAC = location;
        location += 32;
    } while (location < end_addr);
    asm("dsb");
    asm("isb");
}

Had a crash.. and had to debug some code...
The marked red line leads to using a start adress that may be lower than the adress requested.
So, data got deleted, unintentionally, and my program crashed.

+ you should use the "&" for the end_addr, where it is missing..

Please fix that.

It's a killer for any mem that got allocated with malloc(), and kills the heap management.
 
Last edited:
Attention, same bug is in arm_dcache_flush_delete() and arm_dcache_flush().
In arm_dcache_flush() it may not lead to crash, but it's wrong, too.
 
@frank/@Paul - I am curious if there is some documentation (RM or??) on the SCB_CACHE_DCIMVAC
I know from source code that it is: #define SCB_CACHE_DCIMVAC (*(volatile uint32_t *)0xE000EF5C)

My guess is says to delete that 32 byte page from the cache without first writing it's contents out to memory...

So again pardon this random rambling... But this time hopefully short ;)

But what I wonder is:
a) If I set this memory location to a value whose bottom 5 bits is not 0... What does it do.. Still 32 bytes starting at that location, the rest of that logical page starting at that offset? error?
b) If you use the flush/delete does your code work?
c) If you are running into this issue, than I assume one can run into similar issue at the end... That is if the addr+len does not end at the end of some 32 byte boundary, the rest of that boundary is deleted as well?

Again Interesting problem.
 
I believe that it needs 32 byte alignment. If so, the code is correct, but arm_dcache_delete() should only be called after a flush or with a 32 byte aligned address pointing to a buffer that is a multiple of 32 bytes long. Making the bug a lack of a warning comment.

Wouldn't it be better to use the equivalent CMSIS-Core routines (which do document the alignment requirement)?
 
Indeed with just the code as a reference seeing in he while();:
Code:
 location += 32;

Along with the start addr location manipulation suggests the code needs 32 byte alignment.

So starting up to 31 bytes early and ending to include the end addr from size requested, plus perhaps 31 bytes, would be enforcing that and perhaps preventing a failure to function, if not a fault.
 
Of course it works with aligned data.
The problem is unaligend data.
Malloc returns unaligend data.
Yes you can do workarounds in your code.
3rd party code does not have Teensy workarounds.

So why not just fix the bug?
 
Last edited:
If it only works on 32 bytes - as @luni said the fix is to document the behavior.

Caller must assure any data area of interest is 32 byte aligned. Allocate extra and self align as needed.
 
and yes, a workaround means, that if you want to malloc a single byte (malloc (1)) you have to malloc 32
Not your code only, 3rd party code, too.

It must be a joke.
"You can't use malloc() without manual alignment of the returned pointer afterwards, if you use the Teensy provided cache management".
I'm not seeing this anywhere ...
 
@all: I think you don't see the real problem. I thought it was easy....
Ok.
Invested more time and wrote a short demo for you:
Code:
void setup() {
 Serial.begin(8600);
 if (CrashReport) {
  Serial.println(CrashReport);
  CrashReport.clear();
 }
 void *p1 = malloc(100);
 void *p2 = malloc(100);
 arm_dcache_delete(p2, 100);

 int *p3 = malloc(100); //<- yes i know there is no nullptr-check. but trust me, teensy 4 has 300 bytes free. really!
 Serial.printf("%x %x %x", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
 *p3 = 7;
}

void loop() {}

Run this.


Hey, you even make digitalWrite(4700, 1) work in a way that it does not crash. But for malloc, its ok..´?

Code:
CrashReport:
  A problem occurred at (system time) 6:55:58
  Code was executing from address 0x100
  CFSR: 82
    (DACCVIOL) Data Access Violation
    (MMARVALID) Accessed Address: 0x0 (nullptr)
      Check code at 0x100 - [COLOR=#ff0000]very likely a bug![/COLOR]
      Run "addr2line -e mysketch.ino.elf 0x100" for filename & line number.
  Temperature inside the chip was 38.17 °C
  Startup CPU clock speed is 600MHz
  Reboot was caused by auto reboot after fault or bad interrupt detected

Perhaps we can just remove all the dumb checks that make Arduino slow. - i.e. from digitalWrite - it would be way faster without it!
We just have to document that digitalWrite(4700,1) crashes.Genious!
There are hundrets of such checks. A tremendous speedup.
We should tell this others. All the checks for buffer overflows and nullpointers are not needed too! Best would be, to check nothing. We don't need faster PCs.

And hey, it saves us 3 minutes not to edit the buggy lines in the Teensy core. A simple "+ 32" and a "&" in the next line. Think a bit longer about arm_dcache_flush_delete() and arm_dcache_flush() ;-)

Back to serious: There is no reason not to fix it.
 
Last edited:
@all: I think you don't see the real problem. I thought it was easy....

No Frank - until the posted sketch it was not clear what was happening. Even then added prints needed to show where it is Crash'ing for sure.

From reading it seemed what was wrong was the boundary extension was 'Destroying' memory contents unexpectedly - and that was causing the Cra.

Code:
ONE:: 20203068 202030d0 63
[B]TWO:: 20203068 202030d0 63[/B]
CrashReport:
  A problem occurred at (system time) 23:49:51
  Code was executing from address 0x26FC
  CFSR: 82
	(DACCVIOL) Data Access Violation
	(MMARVALID) Accessed Address: 0x1690AB14
  Temperature inside the chip was 30.58 °C
  Startup CPU clock speed is 600MHz
  Reboot was caused by auto reboot after fault or bad interrupt detected

ONE:: 20203068 202030d0 63
TWO:: 20203068 202030d0 63

From this version:
Code:
void setup() {
  Serial.begin(8600);
  if (CrashReport) {
    Serial.println(CrashReport);
    CrashReport.clear();
  }
  void *p1 = malloc(100);
  void *p2 = malloc(100);
  int *p3 = 99;
  Serial.printf("ONE:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);
  arm_dcache_delete(p2, 100);
[B]  Serial.printf("TWO:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);
[/B]
[B]  p3 = malloc(100); //<- yes i know there is no nullptr-check. but trust me, teensy 4 has 300 bytes free. really![/B]
  Serial.printf("THREE:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);

  Serial.printf("%x %x %x", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  *p3 = 7;
}

void loop() {}
 
From this version:
Code:
void setup() {
  Serial.begin(8600);
  if (CrashReport) {
    Serial.println(CrashReport);
    CrashReport.clear();
  }
  void *p1 = malloc(100);
  void *p2 = malloc(100);
[COLOR=#ff0000]  int *p3 = 99;[/COLOR]
[...]
}

void loop() {}

You don't want to do that.
 
You don't want to do that.

Just setting the pointer to a known value? I didn't write anything there?

Same result:
Code:
[CODE]
ONE:: 20203068 202030d0 0
TWO:: 20203068 202030d0 0
CrashReport:
  A problem occurred at (system time) 0:3:4
  Code was executing from address 0x26FC
  CFSR: 82
	(DACCVIOL) Data Access Violation
	(MMARVALID) Accessed Address: 0x1790AB14
  Temperature inside the chip was 29.34 °C
  Startup CPU clock speed is 600MHz
  Reboot was caused by auto reboot after fault or bad interrupt detected

ONE:: 20203068 202030d0 0
TWO:: 20203068 202030d0 0
[/CODE]
 
Hm. re-thinking about the other both, arm_dcache_flush_delete() and arm_dcache_flush(), i'd probably leave them as they are, as they do not destroy things and it's wanted that everything gets flushed down to RAM.
Edit: oh, well....wait.. they may flush a cache line too less. right? (or am I'm thinking wrong..)

@Tim: To exactly what address points p3 at this line? ;-)
Edit: It crashes due to an other reason.
 
Last edited:
Tim, the problem is, in my example, that p3 is not valid.
But there is no obvious reason for that.

The heap is corrupted.
Remove the cache delete and everything is fine.
A miracle for most users.
I'm pretty sure you can make it crash without ever writing anything to the adresses where the pointers point to.
 
Hm. re-thinking about the other both, arm_dcache_flush_delete() and arm_dcache_flush(), i'd probably leave them as they are, as they do not destroy things and it's wanted that everything gets flushed down to RAM.

Yes, was confused how the write before delete cases caused any problem.

@Tim: To exactly what address points p3 at this line? ;-)
Edit: It crashes due to an other reason.

I removed the assignment - that gave the pointer an initial value on declaration {int *p3 = 99;} - and the fault was the same. Faulted at the same code address. That code addr2line gives ??:?

Assuming the problem is that stepping backwards for the cache boundary start is messing up the linked list for malloc() mgmt for *p2 and allocating for *p3 causes a crash trying to follow the list?

No Crash with ::
Code:
  char *p2 = (char *)malloc(100);
// ...
  arm_dcache_delete((void *)((uint32_t)[B][U]&p2[0x1e][/U][/B]&0xfffffe0), 100); // bad delete size and location - though it shows saving the bytes before *p2 lets it work

If that is the case then malloc() just needs to give alloc *ptr's on 32 byte boundaries (for 1062's ??), or anyone smart enough to use arm_dcache_delete() needs to have a documentation note to enlighten them.

@all : I saw a note about better addr2line from a newer version?
 
@Frank - the crash is in Allocating for *p3 because the link list gets corrupted by the delete. It never gets past the printf("THREE::..." >> See p#14 output
>> Without that HACK EDIT of the start ADDRESS

Here is the full code for above:
Code:
void setup() {
  Serial.begin(8600);
  if (CrashReport) {
    Serial.println(CrashReport);
    CrashReport.clear();
  }
  void *p1 = malloc(100);
  char *p2 = (char *)malloc(100);
  int *p3;
  Serial.printf("ONE:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);
  [B]arm_dcache_delete((void *)((uint32_t)&p2[0x1e]&0xfffffe0), 100);[/B]
  Serial.printf("TWO:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);

  p3 = [B][COLOR="#FF0000"]malloc(100);[/COLOR][/B] //<- yes i know there is no nullptr-check. but trust me, teensy 4 has 300 bytes free. really!
 [B] Serial.printf("THREE:: %x %x %x\n", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);[/B]
  delay(100);

  Serial.printf("%x %x %x", (intptr_t) p1, (intptr_t) p2, (intptr_t) p3);
  delay(100);
  *p3 = 7;
}

void loop() {}
 
I'm pretty sure you can make it crash without ever writing anything to the adresses where the pointers point to.

Indeed :)
Code:
void setup() {
 Serial.begin(8600);
 if (CrashReport) {
  Serial.println(CrashReport);
  CrashReport.clear();
 }
 void *p1 = malloc(100);
 void *p2 = malloc(100);
 Serial.printf("%x %x\n", (intptr_t) p1, (intptr_t) p2);
 arm_dcache_delete(p1, 100);
 Serial.printf("%x %x\n", (intptr_t) p1, (intptr_t) p2); //p2 is still valid. of course.
 if (p2) free(p2); //<- [COLOR=#ff0000]crash[/COLOR] for no obvious reason. Note: This free() can be  be hundrets of lines later. Try to debug such things.
}

void loop() {}

I think this example is better than my first one. And p2 is checked, it's perfectly valid.
 
Last edited:
As written *p2 was trashed as soon as the dcache_delete started to write - because it had to back up from the pointer to the prior 32B boundary.

malloc() stores REF data just before the returned pointer. That ref data helps manage memory, and when trashed nothing works.

After delete, try viewing at address of *p1 through *p2 allocation with :: github.com/KurtE/MemoryHexDump

Zzzzz's
 
Exactly :) the heap is corrupted.
Try to debug that, without a debugger, when the free() happens 500 lines later. Probably in a library.

Its a serious problem.
It wasn't that easy to find the issue when I debugged my code with printf..
 
Last edited:
Now, sleep well, take a look again tomorrow, think of all kinds of checks.. null pointers.. digitalWrite..and tell me again that it is an documentation issue.
Ok. You can declare any bug as doc. Issue..
 
One last goodie:
Code:
void setup() {
 memcpy(0,0,0); // or memset, or...
 Serial.println("Bubu");
}

void loop() {}
What happens.. and why? :)
 
No doubt that arm_dcache_delete() is under-documented and dangerous when misused with non 32 byte aligned buffers. Do you have a better version?

Off the top of my head:

if (alignment or size isn't right)
arm_dcache_flush_delete()
else
...
 
Yes mentioned above.
Start: add +32
End: add the "and"
Use a while loop instead of do..while.
This may delete less than wanted - but that is OK.

For the other functions: just make sure they flush enough cache lines.

I could post new code for all three.
But.. no.. I've lost interest to make PRs for some time. I'm just a user. I can just say please "fix it" :)
Will use the time to play guitar.
 
Last edited:
Back
Top