Hi,
I've had a similar problem when compiling with -Os and LTO. I think the root cause of these startup issues is the way the stack pointer is set up in ResetHandler(). If the compiler inserts stack or frame handling code before __asm__ volatile("mov sp, %0" :: "r" ((uint32_t)&_estack));, that code runs against the old stack pointer, and the stack or frame setup gets messed up later on. Whether the compiler actually inserts stack-related code depends on the optimization settings, function inlining during LTO, whether startup_early_hook() is used, etc. In most cases the startup code just works fine.
But to resolve this, IMHO you have to set up the stack pointer in assembler before calling any C function. What works for me with several compiler optimization settings is the following reset handler:
Code:
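__attribute__((section(".startup"), naked, noreturn))
void ResetHandler(void) {
    // Initialize DTCM via the IOMUXC_GPR registers.
    // The address of the linker symbol _flexram_bank_config is the value written to GPR17.
    __asm__ volatile("str %1, [%0] \n\t" :: "r" (&IOMUXC_GPR_GPR17), "r" (&_flexram_bank_config) : "memory");
    __asm__ volatile("str %1, [%0] \n\t" :: "r" (&IOMUXC_GPR_GPR16), "r" (0x00200007) : "memory");
    __asm__ volatile("str %1, [%0] \n\t" :: "r" (&IOMUXC_GPR_GPR14), "r" (0x00AA0000) : "memory");
    // Let the register writes complete before the stack is touched.
    __asm__ volatile("dsb" ::: "memory");
    __asm__ volatile("isb" ::: "memory");
    // Set the main stack pointer before any C code runs.
    __asm__ volatile("msr msp, %0" :: "r" (&_estack) : "memory");
    __asm__ volatile("dsb" ::: "memory");
    __asm__ volatile("isb" ::: "memory");
    // Continue with the original reset handler code.
    ResetHandlerC();
}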
It initializes DTCM, sets the stack pointer and then calls the original reset handler code (renamed to ResetHandlerC() here).
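For anyone who wants to review the ordering without hardware, here is a rough host-side sketch that replays the same sequence against a mock register block. It only models the order of the steps (DTCM register writes, then stack pointer, then the first C call) and says nothing about what the compiler emits on the target. Everything named mock_* is hypothetical and made up for this illustration, and the bank config and stack values passed in are arbitrary placeholders, not the real linker symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins. On the target these are the real
   IOMUXC_GPR registers and the MSP, not struct fields. */
typedef struct {
    uint32_t gpr14, gpr16, gpr17;
    uint32_t msp;
    int step;          /* number of startup steps completed so far */
    int sp_step;       /* step at which the stack pointer was set (0 = never) */
    int c_entry_step;  /* step at which C code was first entered (0 = never) */
} mock_cpu_t;

/* Stand-in for ResetHandlerC(): just records when C code is entered. */
static void mock_reset_handler_c(mock_cpu_t *cpu) {
    cpu->c_entry_step = ++cpu->step;
}

/* Replays the order used in the reset handler from the post. */
static void mock_reset_handler(mock_cpu_t *cpu, uint32_t bank_config, uint32_t estack) {
    cpu->gpr17 = bank_config;  ++cpu->step;
    cpu->gpr16 = 0x00200007u;  ++cpu->step;
    cpu->gpr14 = 0x00AA0000u;  ++cpu->step;
    cpu->msp = estack;         cpu->sp_step = ++cpu->step;
    mock_reset_handler_c(cpu);
}

/* Returns 1 if the stack pointer was set before any C code ran. */
static int mock_sp_set_before_c(const mock_cpu_t *cpu) {
    return cpu->sp_step != 0 && cpu->sp_step < cpu->c_entry_step;
}

/* Small self-check using arbitrary placeholder values. */
static int mock_startup_check(void) {
    mock_cpu_t cpu = {0};
    mock_reset_handler(&cpu, 0x12345678u, 0x20001000u);
    assert(cpu.gpr17 == 0x12345678u);
    assert(cpu.msp == 0x20001000u);
    return mock_sp_set_before_c(&cpu);
}
```

Calling mock_startup_check() returns 1 when the stack pointer step comes before the first C entry, which is the property the real handler is meant to guarantee.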
If that makes sense to you guys, I can create a pull request. It should be reviewed and tested by others before merging, though.
Timo