Teensy 3, hard fault due to SRAM_L and SRAM_U boundary

Status
Not open for further replies.
On Teensy 3, there is 16 KB of RAM available in total. According to the reference manual, the SRAM is split into two regions, SRAM_L and SRAM_U.

The Teensy 3 linker script simply states that

Code:
RAM  (rwx) : ORIGIN = 0x1FFFE000, LENGTH = 16K
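For comparison, the reference manual's split could be spelled out as two separate regions. This is only a sketch (the names RAM_L/RAM_U and the 8K/8K split are mine, for a 16 KB part; it is not the stock Teensy 3 script):

```
MEMORY
{
    RAM_L (rwx) : ORIGIN = 0x1FFFE000, LENGTH = 8K  /* SRAM_L: ends at 0x1FFFFFFF */
    RAM_U (rwx) : ORIGIN = 0x20000000, LENGTH = 8K  /* SRAM_U: starts at 0x20000000 */
}
```

With two regions the linker would never place a single object across 0x20000000, though sections would then have to be assigned to one region or the other by hand.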

I have a bit of code that mallocs a piece of memory which, by sheer coincidence, happens to straddle both regions. The memory contains an array of structs, actually a list of files.

This is a sample of my debug output as I use memcpy to populate each file's information:

Code:
file 1960 memcpy 0 addr 0x1FFFFCC0...done
file 1970 memcpy 1 addr 0x1FFFFCE6...done
file 1980 memcpy 2 addr 0x1FFFFD0C...done
file misc2 memcpy 3 addr 0x1FFFFD32...done
file 1990 memcpy 4 addr 0x1FFFFD58...done
file ABC_YA~1 memcpy 5 addr 0x1FFFFD7E...done
file AMBSTR~1 memcpy 6 addr 0x1FFFFDA4...done
file amoeba memcpy 7 addr 0x1FFFFDCA...done
file commod64 memcpy 8 addr 0x1FFFFDF0...done
file CUBES_~1 memcpy 9 addr 0x1FFFFE16...done
file EXPRIBB memcpy 10 addr 0x1FFFFE3C...done
file finlshot memcpy 11 addr 0x1FFFFE62...done
file GRAF_4KS memcpy 12 addr 0x1FFFFE88...done
file haduken memcpy 13 addr 0x1FFFFEAE...done
file intro memcpy 14 addr 0x1FFFFED4...done
file kickstar memcpy 15 addr 0x1FFFFEFA...done
file KS_HAL~1 memcpy 16 addr 0x1FFFFF20...done
file KS_TOGET memcpy 17 addr 0x1FFFFF46...done
file KSFINA~1 memcpy 18 addr 0x1FFFFF6C...done
file KSHELLOS memcpy 19 addr 0x1FFFFF92...done
file KSNINT memcpy 20 addr 0x1FFFFFB8...done
file lavapixe memcpy 21 addr 0x1FFFFFDE...

 Exception Handler, source: 1
r0: 0x00000000, r1: 0x1FFFFA80, r2: 0x20000002, r3: 0x00000000, r12: 0x20001D8F
LR: 0x0000482F, PC: 0x0000484C, PSR: 0x61000000,

Notice how the memcpy failed for 0x1FFFFFDE, near the edge of the boundary.

I then replaced memcpy with my own version - a trick to force the compiler to emit 8 bit writes instead of 32 bit writes:

Code:
_PTR memcpy_safe(void* dest, const void* src, int cnt)
{
     uint32_t start = (uint32_t)dest;
     uint32_t end = start + cnt;
     if (start <= 0x1FFFFFFF && end >= 0x20000000) // copy spans the SRAM_L/SRAM_U boundary
     {
          volatile uint8_t useless = 0; // defeats the optimizer: forces an 8 bit copy instead of an optimized 32 bit copy
          volatile uint8_t c;
          volatile int i = 0;
          const volatile uint8_t* src_p = (const uint8_t*)src;
          volatile uint8_t* dest_p = (uint8_t*)dest;
          for (i = 0; i < cnt && useless == 0; i++) {
               c = src_p[i];
               dest_p[i] = c;
          }
          return dest; // the original version fell off the end here without returning
     }
     else
     {
          return memcpy(dest, src, cnt);
     }
}
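For what it's worth, the crossing test can be exercised by itself on a desktop machine. Here is a minimal sketch of the predicate (the 0x20000000 split comes from the reference manual; note that a copy whose last byte is exactly 0x1FFFFFFF never crosses, so the tight condition is `end > 0x20000000`):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a copy of cnt bytes starting at addr touches both
   SRAM_L (below 0x20000000) and SRAM_U (0x20000000 and up). */
static int crosses_sram_boundary(uint32_t addr, uint32_t cnt)
{
    if (cnt == 0) return 0;
    uint32_t last = addr + cnt - 1; /* address of the final byte copied */
    return addr < 0x20000000u && last >= 0x20000000u;
}
```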

It then works fine:

Code:
file KSFINA~1 memcpy 18 addr 0x1FFFFF6C...done
file KSHELLOS memcpy 19 addr 0x1FFFFF92...done
file KSNINT memcpy 20 addr 0x1FFFFFB8...done
file lavapixe memcpy 21 addr 0x1FFFFFDE...done
file leon memcpy 22 addr 0x20000004...done
file lilypad memcpy 23 addr 0x2000002A...done
file LOGOAN01 memcpy 24 addr 0x20000050...done
file LOGOAN02 memcpy 25 addr 0x20000076...done

So I then ran this bit of code:

Code:
volatile uint32_t* foo;
foo = (volatile uint32_t*)0x1FFFFFFF; // last byte of SRAM_L; a 32 bit write here spans into SRAM_U
*foo = 0x12345678; // perform an unaligned 32 bit write across the boundary

Bam! Instant hard fault. This all but confirms my suspicion that the crash is caused by the boundary between SRAM_L and SRAM_U.

So my question is: are cross-boundary operations like this disallowed on K10 and K20 hardware?
How can I make sure that the compiler is aware of this? I am using GCC 4.7.2.

(I have posted the exact same question to the Freescale forum as well)
 
In the latter code, a 32 bit write is attempted on an odd address. I'd think that would not be allowed anywhere in the address space.
The ARM targeting compiler would align a real variable, right? Or use byte operations for a packed struct.
 
In the latter code, a 32 bit write is attempted on an odd address. I'd think that would not be allowed anywhere in the address space.
The ARM targeting compiler would align a real variable, right? Or use byte operations for a packed struct.

I am not aware if that is a feature of GCC

I am getting these faults for other operations on various members of the structure.

What if my code makes GCC forget it's a struct member? There are many situations that could cause my scenario.
 
Interesting. I've never had this actually hit me. I'll try this on the teensy 3.1 and let you know if I get something similar.
By the way, you might be able to get better performance on your memcpy hack if you combine the regular memcpy with the 8-bit one:
e.g. check if you are on a 4-byte boundary; if not, do 8-bit copies until you are, then use regular memcpy for whatever is left.
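For the record, that hybrid could look something like this (a sketch only; `memcpy_hybrid` is a made-up name). Once dest is 4-byte aligned, the library memcpy only issues aligned 32-bit stores, which cannot straddle 0x20000000:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-copy until dest is 4-byte aligned, then hand the aligned
   remainder to the optimized library memcpy. */
static void *memcpy_hybrid(void *dest, const void *src, size_t cnt)
{
    uint8_t *d = (uint8_t *)dest;
    const uint8_t *s = (const uint8_t *)src;
    while (cnt > 0 && ((uintptr_t)d & 3u) != 0) { /* head: unaligned bytes */
        *d++ = *s++;
        cnt--;
    }
    if (cnt > 0)
        memcpy(d, s, cnt); /* aligned bulk copy */
    return dest;
}
```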
 
by the way

Code:
volatile uint32_t* foo;
foo = (volatile uint32_t*)(0x1FFFFFFF - 4); // 0x1FFFFFFB: still unaligned, but all 4 bytes stay inside SRAM_L
*foo = 0x12345678;

works without crashing

I plan to resolve the issue by making sure that the end of my struct meets the end of the lower SRAM region. It is a very dirty hack, but it's a temporary fix. A cleaner solution is to design a wrapper for malloc that accounts for the sizes of the individual data elements.
 
A better way, if you use malloc, is to allocate 3 bytes more than you need, then use an aligned offset as the pointer...
You can also specify that you want certain alignments. Normally GCC on arm will align at 16bit boundaries.
You can change this by using __attribute__ ((aligned (4)))

For malloc, Something like this:

Code:
uint8_t* bar = malloc(what_i_need + 3);
volatile uint32_t* foo = (volatile uint32_t*)(((uintptr_t)bar + 3) & ~(uintptr_t)3); // round up to a 4 byte boundary

Would give you 32bit alignment, always.

If you always use the same buffer, you can do this:

Code:
volatile uint32_t junk[2048] __attribute__ ((aligned (4))); // 8192 bytes, this will cross the boundary on teensy 3.0.
volatile uint32_t* foo = junk;
 
I just thought of another possible trick... fragment the heap at that point. Basically, start by mallocing until one of the mallocs crosses the offending area.
Then free all the ones that do not.
This has the effect of splitting the malloc arena.
You can actually do this very easily...

First, malloc a couple of bytes.
Second, using the address from the first malloc, calculate the size needed to almost reach the offending area, and malloc that.
Third, malloc enough bytes to cross the boundary.
Fourth, free the first two mallocs.

Code:
basically you will end up with this
[first malloc][second malloc][offending area malloced][free ram][stack]
and then
[----------free ram---------][offending area malloced][free ram][stack]
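The four steps above could be sketched like this. A rough sketch only: `reserve_straddler` is a made-up name, the 64-byte slack for malloc bookkeeping is a guess (the real overhead depends on the allocator), and the boundary is passed in (0x20000000 on Teensy 3):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reserve the block that straddles `boundary`, then free the leading
   fillers so normal allocations land around it. */
static void *reserve_straddler(uintptr_t boundary)
{
    void *first = malloc(4);                    /* 1: probe the current heap top */
    uintptr_t here = (uintptr_t)first;
    size_t gap = 0;
    if (here < boundary && boundary - here > 64u)
        gap = (size_t)(boundary - here) - 64u;  /* 2: fill almost up to the boundary */
    void *filler = gap ? malloc(gap) : NULL;
    void *straddler = malloc(128);              /* 3: this block crosses the boundary */
    free(first);                                /* 4: release the fillers */
    free(filler);
    return straddler;                           /* deliberately never freed */
}
```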
 
Thanks xxxajk

I have already started writing my own malloc that handles this. It is basically the same idea as yours: my code will ensure that the boundary of my struct meets the boundary between the two SRAM regions.

I will test it and report back

(edit: maybe a simpler solution is to just check whether the next allocation would come close to the boundary, and allocate starting at the boundary if a crossing is detected? This wastes some space, but not too much.)

Meanwhile, does anybody think this is a design flaw? It may not be classified as a "bug", but it sure is annoying. It feels like dealing with the PIC's "memory bank" architecture, which is why PICs have their own compilers.

I do not have any knowledge of silicon design, but just out of curiosity: would a solution be to extend each region by 3 bytes and map those 3 bytes together, so that write operations do not cause a hard fault?

This is a difficult bug to troubleshoot. I bet somebody could encounter a similar situation and never realize that the SRAM is divided, unless they look really deep into the reference manual and also have a good understanding of ARM assembly. I would hope that people who design silicon take this sort of situation into consideration.
 
This one hit me SO HARD!

I have allocated a buffer space for records of varying sizes. Each record has its own write routine and is optimized to use either 8, 16 or 32 bit writes.

Due to its size, the buffer crossed the memory boundary at 0x20000000. Depending on the type of record AND all of the PREVIOUS records written, a cross-boundary write might be 8, 16 or 32 bits. In practice, it SOMETIMES crashed and sometimes didn't. It took me a week to find out that it's the 32-bit cross-boundary writes.

And, yes, a 16 bit write to 0x1fffffff crashes as well. The funny thing is that READS DO WORK!

The code below illustrates the issue:
Code:
void dumpBorder() {
  uint8_t *ptr = (uint8_t*) 0x1ffffff8;
  Serial.print((uint32_t) ptr,HEX); Serial.print(' ');
  for (int i=0; i<12; i++) {
    uint8_t data=*ptr++;
    if (data<16) 
      Serial.print('0');
    Serial.print(data,HEX);
    Serial.print(' ');
  }
  Serial.println();
}

void setup() {
  Serial.begin(115200);
  while (!Serial) {} // wait for the USB serial connection

  // tell us that you're alive
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWriteFast(LED_BUILTIN,HIGH);
  delay(1000);
  digitalWriteFast(LED_BUILTIN,LOW);
  
  Serial.println("Start");
  
  volatile uint32_t* ptr = (volatile uint32_t*) 0x1ffffff8;

  Serial.print("Clearing mem:              ");
  *ptr++ = 0; *ptr++ = 0; *ptr = 0;
  dumpBorder();
  
  Serial.print("Writing 32bit @0x1ffffffc: ");
  *((volatile uint32_t*) 0x1ffffffc) = 0x01020304;
  dumpBorder();
  
  Serial.print("Reading 32bit @0x1ffffffb: ");
  ptr = (volatile uint32_t*) 0x1ffffffb;
  Serial.println(*ptr,HEX);

  Serial.print("Writing 32bit @0x1ffffffb: "); 
  *ptr = 0x11121314;
  dumpBorder();
  
  ptr = (volatile uint32_t*) 0x1ffffffd;
  Serial.print("Reading 32bit @0x1ffffffd: "); 
  Serial.println(*ptr,HEX);


  Serial.print("Writing 32bit @0x1ffffffd: "); 
  *ptr=0x21222324;  // <-- crash!
  dumpBorder();

  Serial.print("Writing 16bit @0x1fffffff: ");
  *((volatile uint16_t*) 0x1fffffff) = 0x3132;  // <-- crash!
  dumpBorder();     

  Serial.println("Done!"); 
}


void loop() {}

This isn't a theoretical issue; it can happen anytime and depends on conditions that are probably out of the average programmer's control. Depending on WHERE your variable is located in memory, it might give you a bus fault or not...

Code:
// This is for Teensy 3.1 only (requires 64K of RAM)
uint8_t buf[30000];  // comment this out and it works!
uint8_t buf2[10000];

void setup(void) {
  Serial.begin(115200);
  while (!Serial) {} // wait for the USB serial connection

  // tell us that you're alive
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWriteFast(LED_BUILTIN,HIGH);
  delay(1000);
  digitalWriteFast(LED_BUILTIN,LOW);

  Serial.print("Buf:  ");              // comment this out and it works!
  Serial.println((uint32_t)&buf,HEX);  // comment this out and it works!
  Serial.print("Buf2: ");
  Serial.println((uint32_t)&buf2,HEX);
  
  Serial.println("Filling Array!");
  uint8_t *bytePtr=buf2;
  *bytePtr++='x';
  uint32_t *longPtr=(uint32_t*)bytePtr;
  for (int i=0; i<2499; i++) {
    *longPtr++=0x12345678;
  }
  
  Serial.println("Done!");

}

void loop(void) {}

OK, so I hear you saying: "Just don't use any unaligned writes, they aren't very efficient anyway!" And I DON'T agree with that. I have my buffer. Depending on the previous records, my write pointer may be aligned or not. If it's aligned, 32-bit writes are the most efficient way to get my data written. If not, it might consume two bus cycles, depending on the memory architecture. And unaligned writes just work at almost any address - except for the RAM boundary. So why shouldn't I use them?

My suggestion is to somehow avoid having a single allocated object cross the RAM boundary at 0x20000000. From what I see in Paul's linker map, in order to give as much free memory as possible, he allocates all available memory as a single block, so the linker doesn't know about that boundary.

Is there a way to tell the linker that there are two different segments of RAM that should be used for the same type of data, just avoiding a cross-segment allocation for a single object? If not, I strongly suggest we CLEARLY DOCUMENT THE ISSUE - but where?

I understand that for well-aligned objects, such as a uint32_t[] or even a uint8_t[] that is guaranteed never to be read unaligned, there is no such need. But I believe my current use case is not the only one that runs into this problem. I can clearly imagine the sector buffer of an SD card, Ethernet frame buffers and other stuff being written unaligned.

For now, I just allocate a smaller buffer as DMAMEM (which puts it even before .data and .bss) and verify that its end address is below 0x20000000. As of now, that works.

Any other suggestions?
 
I must admit, this issue fell off my radar while working so much on the audio library. It is still buried on a long list of stuff, which I'll (maybe) get to someday...

My gut feeling is that (maybe) code could be added in the fault handler to check for this situation and emulate the access, then gracefully return. It'll be slow, but this problem is rare and only occurs at 1 place in memory, and only when accessing memory unaligned, which normally can't happen, unless you typecast pointers and ignore the compiler warnings (eg, breaks strict aliasing, etc).
 
My biggest fear would be the stack crossing the boundary with a frame that triggers it. That would totally suck. I'm not even sure if you could recover from that one... I should set the stack misaligned a bit above it and see if I can trigger it. Could be fun.
Another thing to consider here are structs that are packed... those could indeed hit this as well.

Seriously, this isn't Paul's fault. Don't blame him, don't flame him.
It is a chip bug or feature. It is probably documented in the errata, or in the datasheet for the chip.

This perhaps could even have a use as a feature, where you could watch your stack without using any CPU at all.
For example, if the stack creeps too low, or the heap climbs too high, possibly trip the fault? Hmmm.
 
Thanks - as always - for your quick reply, Paul. I believe that the audio library might be subject to this problem as well - as soon as you start to write audio file headers. Apart from that I agree that the problem only occurs in very specific situations. Whether they're frequent or not is another question.

Handling this in the bus fault handler is a challenge that I'll leave to you (or to Thomas Roell - the guy who wrote the software debugger stack). That would be a very elegant solution, permitting allocation of objects bigger than half of the available RAM.

I haven't tested it yet, but can this occur on DMA transfers as well or are those guaranteed to be aligned?

From what I see on the GNU linker page, it's not possible to define two memory regions with the same properties. So, for now, I would just like to suggest to document it somewhere.

Maybe in the forum? Done! ;)
 
Thanks xxxajk for your feedback as well. No, I would never blame Paul for this. This is an issue that Freescale needs to sort out.

I definitely consider Paul's work amazing, superb and exceptional. Just to make that point... Paul knows that, I've told him more than once. ;)
 
I haven't tested it yet, but can this occur on DMA transfers as well or are those guaranteed to be aligned?

Like so many things, it depends on how you write your code.

By default, the compiler aligns variables to their size. So if you create an array of int32_t, it'll be aligned to a 32 bit boundary. If you create an array of uint8_t, it can have any starting address.

There's an attribute you can use to force non-default alignment.

The dangerous case is where you do something like this:

Code:
char myBigArray[5000];
unsigned int *ptr;

ptr = (unsigned int *)(myBigArray + 12);

This is legal code, but if you have warnings turned on, it'll give you a strict aliasing warning. When you access memory through that pointer, it may be unaligned: even though the offset is 12 bytes, the array itself could be allocated at any address, because it's an 8 bit data type.

Cortex-M4 has hardware which detects unaligned access. When you access memory unaligned anywhere other than at the boundary, the chip actually generates 2 accesses on the 32 bit bus. It's slower, but normally you never notice you're causing the hardware to work twice as hard.

It's generally not a good practice to do this, even if it works. Some other ARM chips, like Cortex-M0, do not support ANY unaligned access. If your code is ever ported to those other ARM chips, it'll crash.
 
My biggest fear would be stack crossing the boundary with a frame that triggers it. That would totally suck.

Yes, that would suck. But the compiler always aligns all local variables (and all global / static ones too), to prevent unaligned access.

I could be wrong, but I'm pretty sure the only way to hit this issue involves casting pointers from smaller to larger types, without taking precautions for assuring proper memory alignment.
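One way to take that precaution, for what it's worth: funnel the unaligned accesses through memcpy, so the compiler is free to emit byte stores when it cannot prove alignment (`store_u32`/`load_u32` are illustrative names, not from any library):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store/load a 32-bit value at an arbitrary byte address without ever
   dereferencing a misaligned uint32_t*. GCC typically inlines these
   fixed-size memcpys. */
static void store_u32(uint8_t *p, uint32_t v) { memcpy(p, &v, sizeof v); }
static uint32_t load_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
```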
 
I haven't tested it yet, but can this occur on DMA transfers as well or are those guaranteed to be aligned?

I'm not sure what happens if you configure the DMA controller for 16 or 32 bit transfers but give it an address with the lowest bit set. I've never tried that.
 
xxxajk mentioned packed structs, so I tried it:

Code:
// I'm out of T3.1s, so this one is for a 3.0

struct __attribute__ ((__packed__)) ps {
  uint8_t m_byte;
  uint16_t m_word;
  uint32_t m_long;
};

#define ARRAY_SIZE (1500)  // 1500 * 7 bytes = 10500 bytes
struct ps test[ARRAY_SIZE];

void setup() {
 
  Serial.begin(115200);
  while (!Serial) {} // wait for the USB serial connection
  
  Serial.println("Start");

  for (int i=0; i<ARRAY_SIZE; i++) {
    Serial.println((uint32_t)(&test[i].m_long),HEX); delay(10);
    test[i].m_long=0x12345678;
  }  
  Serial.println("Done");
  
}

void loop() {
}

Surprisingly, it works, which made me look at the assembler code. Here's the interesting section:

Code:
000004fc <setup>:
     4fc:       e92d 41f0       stmdb   sp!, {r4, r5, r6, r7, r8, lr}
     500:       4a16            ldr     r2, [pc, #88]   ; (55c <setup+0x60>)
     502:       7813            ldrb    r3, [r2, #0]
     504:       2b00            cmp     r3, #0
     506:       d0fc            beq.n   502 <setup+0x6>
     508:       4815            ldr     r0, [pc, #84]   ; (560 <setup+0x64>)
     50a:       4916            ldr     r1, [pc, #88]   ; (564 <setup+0x68>)
     50c:       4e16            ldr     r6, [pc, #88]   ; (568 <setup+0x6c>)
     50e:       f7ff ffe3       bl      4d8 <_ZN5Print7printlnEPKc>
     512:       2500            movs    r5, #0
     514:       f04f 0807       mov.w   r8, #7
     518:       f240 57dc       movw    r7, #1500       ; 0x5dc
     51c:       fb08 6405       mla     r4, r8, r5, r6
     520:       2300            movs    r3, #0
     522:       1ce1            adds    r1, r4, #3
     524:       2210            movs    r2, #16
     526:       480e            ldr     r0, [pc, #56]   ; (560 <setup+0x64>)
     528:       f001 fea6       bl      2278 <_ZN5Print11printNumberEmhh>
     52c:       480c            ldr     r0, [pc, #48]   ; (560 <setup+0x64>)
     52e:       f001 fe95       bl      225c <_ZN5Print7printlnEv>
     532:       200a            movs    r0, #10
     534:       f001 fde4       bl      2100 <delay>
//////////////////////////////
     538:       2378            movs    r3, #120        ; 0x78
     53a:       70e3            strb    r3, [r4, #3]
     53c:       2356            movs    r3, #86 ; 0x56
     53e:       7123            strb    r3, [r4, #4]
     540:       3501            adds    r5, #1
     542:       2334            movs    r3, #52 ; 0x34
     544:       7163            strb    r3, [r4, #5]
     546:       2312            movs    r3, #18
     548:       42bd            cmp     r5, r7
     54a:       71a3            strb    r3, [r4, #6]
//////////////////////////////////
     54c:       d1e6            bne.n   51c <setup+0x20>
     54e:       4907            ldr     r1, [pc, #28]   ; (56c <setup+0x70>)
     550:       4803            ldr     r0, [pc, #12]   ; (560 <setup+0x64>)
     552:       f7ff ffc1       bl      4d8 <_ZN5Print7printlnEPKc>
     556:       e8bd 81f0       ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
     55a:       bf00            nop
     55c:       20001467        .word   0x20001467
     560:       20001680        .word   0x20001680
     564:       00002f44        .word   0x00002f44
     568:       1fffea60        .word   0x1fffea60
     56c:       00002f4a        .word   0x00002f4a

Check this out! Instead of using a single 32 bit store instruction, the compiler produces code that stores byte by byte. Quite inefficient, but it prevents the boundary overrun.

Conclusion: packed structs are NOT a problem. Cast pointers are.
 
That reminds me of an old quote from when I read USENET, from Henry Spencer, who said: "If you lie to the compiler, it will get its revenge".
 
That is good to know, because I have a lot of data structures that actually demand packing... like USB protocol stuff ;-) Thanks for looking into it.
 
That reminds me of an old quote from when I read USENET, from Henry Spencer, who said: "If you lie to the compiler, it will get its revenge".

C'mon, Michael, I'm not lying to the compiler. I just allocate a block of memory and use the most efficient way to write my data to it. Using unaligned 32-bit writes is still faster than treating every possible misalignment in code. I have mixed 8, 16 and 32 bit data structures that I want to store as fast as possible:

Code:
class LogRecord {

public:
  LogRecord(uint16_t recType) {
    length=sizeof length+sizeof timeStamp+sizeof recordType;
    timeStamp = micros();
    recordType = recType | 0x8000;
  }

  void setTimeStamp(uint32_t time, bool isValidTimeStamp) {
    timeStamp = time;
    recordType = isValidTimeStamp ? recordType & 0x7FFF : recordType | 0x8000;
  }

  virtual uint8_t* put(uint8_t* p) {
    uint16_t* p16 = (uint16_t*)p;
    *p16++=length;
    uint32_t* p32 = (uint32_t*)p16;
    *p32++=timeStamp;
    p16 = (uint16_t*)p32;
    *p16++=recordType;
    return (uint8_t*)p16;
  }
  inline uint16_t getLength() { return length; } // length is a uint16_t; returning uint8_t would truncate

protected:
  void setLength(uint16_t len) { length = len; }
  virtual ~LogRecord(){};

  uint16_t length;
  uint32_t timeStamp;
  uint16_t recordType;

};

class EventRecord : public LogRecord {
public:
  EventRecord(uint32_t time, bool isValidTimeStamp, uint32_t evt) : LogRecord(LOG_EVENT) {
    setTimeStamp(time, isValidTimeStamp); // LogRecord has no 3-argument constructor
    event = evt;
    setLength(length+sizeof(evt));
  }

  EventRecord() : LogRecord(LOG_EVENT) { setLength(length+sizeof(event)); event=0; }

  EventRecord(uint32_t evt) : LogRecord(LOG_EVENT) {
    event = evt;
    setLength(length+sizeof(evt));
  }

  void setEvent(uint32_t evt) { this->event = evt; }

  virtual uint8_t* put(uint8_t* p) {
    uint32_t* p32=(uint32_t*)LogRecord::put(p);
    *p32++=event;
    return (uint8_t*)p32; // was (uint8_t*)p, which returned the unadvanced pointer
  }


protected:
  uint32_t event;

};

... more derived classes ...

I can't see that writing LogRecords with
Code:
ptr = rec->put(ptr)
means cheating the compiler. I agree that this code isn't very portable (endianness), but treating a block of memory as a bunch of bytes is nothing unusual or illegal. Even Java has a ByteBuffer object for this.

If the block of memory were bigger, I might consider memcpy. But for the very few bytes I have (12-16 per record), direct pointer writes seem far more efficient. But Freescale tells us that two adjacent blocks of memory are something different from one contiguous block. At least when you want to write to them.
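For comparison, a memcpy-based variant of put() would keep every store alignment-safe (a sketch only: `put_safe` is my name, and the field layout is borrowed from the LogRecord class above; GCC usually inlines these small fixed-size memcpys, so the cost is modest):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Serialize the LogRecord header fields through memcpy so the compiler
   may emit byte stores wherever it cannot prove alignment. */
static uint8_t *put_safe(uint8_t *p, uint16_t length, uint32_t timeStamp, uint16_t recordType)
{
    memcpy(p, &length, sizeof length);         p += sizeof length;
    memcpy(p, &timeStamp, sizeof timeStamp);   p += sizeof timeStamp;
    memcpy(p, &recordType, sizeof recordType); p += sizeof recordType;
    return p; /* advanced past the 8-byte header */
}
```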
 
I'm still convinced this problem is pretty rare. But obviously it does happen, since at least 3 people have reported it.

I'm also hopeful I might be able to come up with some code in the fault handler to automatically (but slowly) do the required operation and return to the program.
 
C'mon, Michael I'm not lying to the compiler. I just allocate a block of memory and use the most efficient way to write my data to it. Using unaligned 32-bit-writes still is faster than treating every possible misalignment in code. I have mixed 8, 16 and 32 bit data structures that I want to store as fast as possible.

I can't see that writing LogRecords with
Code:
ptr = rec->put(ptr)
means cheating the compiler. I agree that this code isn't very portable (endianness), but treating a block of memory as a bunch of bytes is nothing unusual or illegal. Even Java has a ByteBuffer object for this.

If the block of memory were bigger, I might consider memcpy. But for the very few bytes I have (12-16 per record), direct pointer writes seem far more efficient. But Freescale tells us that two adjacent blocks of memory are something different from one contiguous block. At least when you want to write to them.

I believe the underlying architecture does not allow word accesses that span segment boundaries, so the compiler is correct to always generate the slow code when dealing with unaligned accesses for packed structures. Granted, in the Teensy environment this only occurs in one location, but the compiler still needs to be safe. I've seen the same thing occur at the ends of cache lines and at page boundaries over the years on various machines.
 
If we assume that the compiler allocates with proper alignment and that all code accesses variables naturally, can boundary crossing problems be avoided when:

1) memcpy-armv7m.S is changed not to define __ARM_FEATURE_UNALIGNED
2) memmove() is also verified or replaced not to use unaligned access
3) -mno-unaligned-access is added to the compiler flags (this affects generated inlines of memcpy, etc.)

or are there still cases where data allocated across the boundary will cause a hang?
 