Question to GCC experts

Frank B

Senior Member
Hi,

I had issues with writes and DMA access to a large array (150 KB).
The reason is the SRAM_L/SRAM_U boundary, where no unaligned access is allowed. Here, unaligned means 16-bit accesses to 0x1FFFFFFE (crash...).
This is a big problem, because it is essential that the code can access this address. Any other kind of access, or an "if" inside the inner loop, would make the code much slower, so: no way.

I found a solution with a new linker file (added the "SCREEN" memory region and the .dmascreen section):
Code:
MEMORY
{
    FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 1024K
    RAM  (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 106K
    SCREEN (rw) : ORIGIN = 0x2000A800, LENGTH = 150K
}


SECTIONS
{
    .text : {
        . = 0;
        KEEP(*(.vectors))        
        /* TODO: does linker detect startup overflow onto flashconfig? */
        . = 0x400;
        KEEP(*(.flashconfig*))
        *(.startup*)
        *(.text*)
        *(.rodata*)
        . = ALIGN(4);
        KEEP(*(.init))
        . = ALIGN(4);
        __preinit_array_start = .;
        KEEP (*(.preinit_array))
        __preinit_array_end = .;
        __init_array_start = .;
        KEEP (*(SORT(.init_array.*)))
        KEEP (*(.init_array))
        __init_array_end = .;
    } > FLASH = 0xFF

    .ARM.exidx : {
        __exidx_start = .;
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
        __exidx_end = .;
    } > FLASH
    _etext = .;

    .usbdescriptortable (NOLOAD) : {
        /* . = ORIGIN(RAM); */
        . = ALIGN(512);
        *(.usbdescriptortable*)
    } > RAM

    .dmabuffers (NOLOAD) : {
        . = ALIGN(4);
        *(.dmabuffers*)
    } > RAM
        
    .usbbuffers (NOLOAD) : {
        . = ALIGN(4);
        *(.usbbuffers*)
    } > RAM

    .data : AT (_etext) {
        . = ALIGN(4);
        _sdata = .; 
        *(.fastrun*)
        . = ALIGN(4);
        *(.data*)
        . = ALIGN(4);
        _edata = .; 
    } > RAM

    .noinit (NOLOAD) : {
        *(.noinit*)
    } > RAM
    
    .bss : {
        . = ALIGN(4);
        _sbss = .;
        __bss_start__ = .;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = .;
        __bss_end = .;
        __bss_end__ = .;
    } > RAM
    
    .dmascreen (NOLOAD) : {
        *(.dmascreen*)
    } > SCREEN
    
    _estack = ORIGIN(RAM) + LENGTH(RAM);
}
and using __attribute__ ((section(".dmascreen"), used)) for the array.
OK... but I really, really don't like this solution: it is an Arduino project, and some day I want to publish it. I really don't want the user to have to install a new linker file that affects all their other projects.
In addition, the size tool reports wrong values after compiling.
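
As a sanity check on the MEMORY block above, the region arithmetic can be written out. This is only a sketch: the constant and function names are made up for illustration, the numbers are copied from the linker file, and 0x20000000 is assumed to be the SRAM_L/SRAM_U boundary discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the MEMORY block of the linker file in this post.
constexpr std::uint32_t kRamOrigin    = 0x1FFF0000u;
constexpr std::uint32_t kRamLength    = 106u * 1024u;  // 106K
constexpr std::uint32_t kScreenOrigin = 0x2000A800u;
constexpr std::uint32_t kScreenLength = 150u * 1024u;  // 150K

// Assumption: the SRAM_L/SRAM_U boundary sits at 0x20000000.
constexpr std::uint32_t kSramBoundary = 0x20000000u;

// First address past the end of a region.
constexpr std::uint32_t region_end(std::uint32_t origin, std::uint32_t length) {
    return origin + length;
}

// RAM ends exactly where SCREEN starts, so the two regions do not overlap.
static_assert(region_end(kRamOrigin, kRamLength) == kScreenOrigin,
              "RAM and SCREEN should be adjacent");

// The whole SCREEN region lies above the boundary.
static_assert(kScreenOrigin >= kSramBoundary,
              "SCREEN should start in SRAM_U");

// Runtime version of the same checks, e.g. for a quick host-side test.
inline void check_layout() {
    assert(region_end(kRamOrigin, kRamLength) == kScreenOrigin);
    assert(kScreenOrigin >= kSramBoundary);
}
```

With these numbers, RAM (106K) and SCREEN (150K) together cover 256K, from 0x1FFF0000 up to 0x20030000, and the boundary falls inside the RAM region rather than inside SCREEN.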

My Question:
Is there a better way to make sure that the array is above the RAM boundary?
 
Hi,

I had issues with writes and DMA access to a large array (150 KB).
The reason is the SRAM_L/SRAM_U boundary, where no unaligned access is allowed. Unaligned means 16-bit accesses to the 0x1FFFFFFE area are not allowed.
This is a big problem for my current code, because it is essential that it can access this address. Any other kind of access, or an "if" inside the inner loop, would make the code much slower, so: no way.

.....

My Question:
Is there a better way to make sure that the array is above the RAM boundary?

I'm not a GCC expert, but why not ensure the DMA addresses are aligned so that crossing the boundary cannot happen? Or did I misunderstand the problem?
I guess you have control over data size and memory at compile time.
 
OK, in a bit more detail:
DMA reads the array and transfers it via SPI to the display. No problem at all (sometimes, not reproducibly, it gets a "hiccup" and reads a zero from the boundary, but that's no big issue).
The bigger problem is that I have a very tight loop, which has to be (and is) very fast, that does 16-bit writes to the array. The write near the boundary results in a crash. I can't write bytes or 32-bit-aligned words; both are slower.

Edit: A bit less detail :) : This is for my C64 emulator. I really need "every single cycle" to make it fast enough. My goal is to emulate the whole C64, with all its chips, in a single Teensy 3.6. The SID emulation and the video chip need most of the Teensy's CPU time, so there's not much time left for the rest (6510 CPU, CIA1 + CIA2, PLA...). But at the moment, it looks really good... (YEAH, I have the VIC emulation working, with all its complicated features. SID works "standalone" with the audio library, too, but still has to be integrated. The whole rest works, too, but the CIAs need some more work to improve the emulation.)
 
Yes. I thought the same. I'm 99% sure :) The remaining 1%, OK... maybe a bug. But before, the array (16-bit elements) was "DMAMEM", which is 32-bit aligned, and it crashed. I would be surprised if there was a bug.
What I also know for sure is that the code really crashes at the boundary.

Then, 0x1FFFFFFE is only 16-bit aligned, not 32-bit aligned.

Edit: Remember the audio library problem some years ago? It was similar: only 16-bit accesses that time, 16-bit aligned, but it crashed.
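
To make the alignment point concrete, here is a small sketch of which accesses actually touch both sides of the boundary. The helper name is hypothetical and 0x20000000 is assumed as the SRAM_L/SRAM_U boundary; it only does address arithmetic and says nothing about what the hardware or the compiler really does.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: SRAM_L ends and SRAM_U begins at this address.
constexpr std::uint32_t kBoundary = 0x20000000u;

// True when the access [addr, addr + size) has bytes on both sides of
// the boundary, i.e. it starts below it and ends above it.
constexpr bool straddles_boundary(std::uint32_t addr, std::uint32_t size) {
    return addr < kBoundary && addr + size > kBoundary;
}

// A 16-bit access at 0x1FFFFFFE ends exactly at the boundary: no straddle.
static_assert(!straddles_boundary(0x1FFFFFFEu, 2), "16-bit access stays in SRAM_L");

// A 32-bit access at the same address covers 0x1FFFFFFE..0x20000001: straddle.
static_assert(straddles_boundary(0x1FFFFFFEu, 4), "32-bit access crosses the boundary");
```

By this arithmetic, a 16-bit-aligned 16-bit write never straddles the boundary by itself; only a wider access starting at 0x1FFFFFFE (or an odd-addressed 16-bit access at 0x1FFFFFFF) would.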
 
Code:
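void setup() {
  delay(1000);
  Serial.print("Hello World\n");
  // 16-bit pointer to the last halfword below the SRAM_L/SRAM_U boundary
  volatile uint16_t * p;
  p = (volatile uint16_t*) 0x1FFFFFFE;
  *p++ = 65535;  // 16-bit write to 0x1FFFFFFE
  Serial.print("No Crash\n");
  *p++ = 65535;  // 16-bit write to 0x20000000
  Serial.print("No Crash\n");
}

// Arduino sketches need a loop(), even if it is empty
void loop() {
}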

GRRR... it does work... so tell me what's wrong... :-( I guess I'll leave it as it is, and when I publish the code, I'll leave it for others to find the problem. Is the compiler doing funny things? My code uses no pointers...
DMASCREEN uint16_t screen[HEIGHT][WIDTH], then I use only indexes.

Maybe I'll try it with an older compiler next week (I use GCC 6).
 