Linker wastes RAM

Frank B

Teensyduino (TD) wastes some RAM.
Is there a way to tell the linker not to insert these huge gaps?
Here is an excerpt from a map file:
Code:
20000840 B _sbss
20000840 00000001 b completed.1
20000844 00000018 b object.0
2000085c 00000004 B scale_cpu_cycles_to_microseconds
20000860 00000004 B systick_cycle_count
20000864 00000004 B systick_millis_count
20000868 00000004 B systick_safe_read
20000c00 000002c0 B _VectorsRam
20000ec0 00000001 B external_psram_size
20000ec4 00000010 B extmem_smalloc_pool
20000ed4 00000004 b s_hotCount
20000ed8 00000004 b s_hotTemp
20000edc 00000004 b s_hot_ROOM
20000ee0 00000004 b s_roomC_hotC
20000ee4 00000008 b endpoint0_buffer
20000eec 00000004 b endpoint0_notify_mask
20000ef0 00000008 b endpoint0_setupdata
20000f00 00000020 B endpoint0_transfer_ack
20000f20 00000020 B endpoint0_transfer_data
20000f40 00000004 b endpointN_notify_mask
20001000 00000280 B endpoint_queue_head
20001280 00000008 b reply_buffer
20001288 00000001 b sof_usage
20001289 00000001 B usb_configuration
2000128a 00000001 B usb_high_speed
2000128b 00000001 b usb_reboot_timer
2000128c 00000004 B usb_timer0_callback
20001290 00000004 B usb_timer1_callback
20001294 00000004 b rx_available
20001298 00000010 b rx_count
200012a8 00000001 b rx_head
200012ac 00000010 b rx_index
200012bc 00000009 b rx_list
200012c6 00000002 b rx_packet_size
200012c8 00000001 b rx_tail
200012e0 00000100 b rx_transfer
200013e0 00000002 b tx_available
200013e2 00000001 b tx_head
200013e3 00000001 b tx_noautoflush
200013e4 00000002 b tx_packet_size
20001400 00000080 b tx_transfer
20001480 00000008 B usb_cdc_line_coding
20001488 00000001 B usb_cdc_line_rtsdtr
2000148c 00000004 B usb_cdc_line_rtsdtr_millis
20001490 00000004 B EventResponder::firstYield
20001494 00000004 B EventResponder::lastInterrupt
20001498 00000004 B EventResponder::firstInterrupt
2000149c 00000001 B EventResponder::runningFromYield
200014a0 00000004 B EventResponder::lastYield
200014a4 00000001 b yield::running
200014a5 00000001 b calibrating
200014a8 00000020 B HardwareSerial::s_serials_with_serial_events
200014c8 00000001 B HardwareSerial::s_count_serials_with_serial_events
20001500 B _ebss
And this is just "Blink" :) The worst gaps are the ones in front of _VectorsRam, endpoint_queue_head and tx_transfer; the first one is almost 1 kB!
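To put numbers on those gaps, here is a small C sketch that takes (address, size) pairs copied by hand from the excerpt above and adds up the unused bytes between the end of each symbol and the start of the next. It does not parse a real .map file, and the names map_entry, gap_after, total_gap, vectors_run and usb_run are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One line of the map file: start address, size and symbol name. */
typedef struct {
    uint32_t addr;
    uint32_t size;
    const char *name;
} map_entry;

/* Unused bytes between the end of e[i] and the start of e[i + 1].
   The entries must be sorted by address and must not overlap. */
static uint32_t gap_after(const map_entry *e, size_t i)
{
    uint32_t end = e[i].addr + e[i].size;
    assert(e[i + 1].addr >= end);
    return e[i + 1].addr - end;
}

/* Prints every non-zero gap in a contiguous run of entries and
   returns the total number of wasted bytes. */
static uint32_t total_gap(const map_entry *e, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i + 1 < count; i++) {
        uint32_t gap = gap_after(e, i);
        if (gap != 0)
            printf("%4u bytes wasted before %s\n", (unsigned)gap, e[i + 1].name);
        total += gap;
    }
    return total;
}

/* Two contiguous runs copied from the map excerpt above. */
static const map_entry vectors_run[] = {
    { 0x20000868, 0x004, "systick_safe_read" },
    { 0x20000c00, 0x2c0, "_VectorsRam" },
    { 0x20000ec0, 0x001, "external_psram_size" },
};

static const map_entry usb_run[] = {
    { 0x20000f40, 0x004, "endpointN_notify_mask" },
    { 0x20001000, 0x280, "endpoint_queue_head" },
    { 0x20001280, 0x008, "reply_buffer" },
};
```

Calling total_gap(vectors_run, 3) or total_gap(usb_run, 3) from main() prints the gap in front of the aligned symbol and returns the total for that run.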

Also, something like this:
Code:
20001298 00000010 b rx_count
200012a8 00000001 b rx_head
200012ac 00000010 b rx_index
is pretty inefficient: it places a 1-byte variable between larger, aligned ones. It would work better if the default "sort by size" did its job (sorting seems to be disabled somewhere, or does not work for an unknown reason).
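A simplified placement model makes the cost of that ordering visible: each variable goes to the next address that satisfies its alignment. The alignments used below (4 bytes for rx_count and rx_index, 1 byte for rx_head) are assumptions read off the map addresses, and var_desc, place_in_order and by_align_then_size are hypothetical names. This models the idea only, not the algorithm ld actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A variable as the linker sees it: size and required alignment. */
typedef struct {
    const char *name;
    uint32_t size;
    uint32_t align;   /* must be a power of two */
} var_desc;

/* Rounds addr up to the next multiple of align. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Places the variables back to back in array order, starting at base,
   and returns the first address after the last one. */
static uint32_t place_in_order(const var_desc *v, size_t n, uint32_t base)
{
    uint32_t addr = base;
    for (size_t i = 0; i < n; i++)
        addr = align_up(addr, v[i].align) + v[i].size;
    return addr;
}

/* qsort() comparator: larger alignment first, then larger size first. */
static int by_align_then_size(const void *pa, const void *pb)
{
    const var_desc *a = pa, *b = pb;
    if (a->align != b->align)
        return a->align < b->align ? 1 : -1;
    if (a->size != b->size)
        return a->size < b->size ? 1 : -1;
    return 0;
}
```

Starting at 0x20001298, the map order (rx_count, rx_head, rx_index) ends at 0x200012bc, which matches the map. After qsort() with by_align_then_size, the same three variables end at 0x200012b9, so in this small case the 3 padding bytes after rx_head disappear.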
OK, you could create extra sections for the vectors etc., but that is not a general solution.
Is there any way to tell the linker to be smarter and sort it?
The well-known "fix" of adding "-fno-common" to the GCC flags does not seem to help.

Edit: Note, this was with GCC 10, but 5.4 is not much different.
Edit: No, I don't believe this is a bug in GCC/LD; it looks more like incorrect usage by Teensyduino.
 
Not really (or I made a mistake somewhere):
Code:
20001000 00000280 B endpoint_queue_head
20001400 000002c0 B _VectorsRam
[...]
Code:
20001880 00000008 b endpoint0_setupdata
20001888 00000001 b completed.1
2000188c 00000018 b object.0
200018a4 00000004 B scale_cpu_cycles_to_microseconds

Also, it sorts by alignment. I'm not sure whether that is correct (perhaps it is?). I know some variables (like the vectors) need a special, huge alignment.

Overall, with GCC 10 it is even worse for Blink and needs 1 kB more.
 
I've been intending to someday try "--sort-section=alignment".

Here's my best guess at an edit to boards.txt:

Code:
teensy41.build.flags.ld=-Wl,--gc-sections,--relax,--sort-section=alignment "-T{build.core.path}/imxrt1062_t41.ld"

If anyone tries this, please let me know whether it makes a worthwhile improvement, and whether programs still work properly.
 
Yes, it is worse (for Blink with GCC 10, and with -fno-common added).

Not sure... wouldn't sorting by size (taking special alignments into account) be more efficient? I have to think about that.
And I thought sorting by size was the default?
Anyway, as soon as we have huge alignments, it is not that easy :) Simple sorting is not smart enough: the gaps could be filled, but simple sorting cannot achieve that.
I'd guess a GCC in 2021 could be smarter...
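Filling the gaps, as opposed to only sorting, can be sketched as a first-fit placer: padding created by a large alignment is kept in a list of holes, and each later variable goes into the first hole it fits in before the end of the section is extended. This is a toy model with made-up names (var_slot, hole, place_first_fit, MAX_HOLES), not an option of GCC or ld. The sizes follow systick_safe_read, _VectorsRam, completed.1 and object.0 from the first map; 4-byte alignment for object.0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOLES 32  /* arbitrary limit for this sketch */

typedef struct {
    uint32_t size;
    uint32_t align;   /* power of two */
    uint32_t addr;    /* filled in by place_first_fit() */
} var_slot;

typedef struct {
    uint32_t start, end;  /* free range [start, end) */
} hole;

static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Places the variables in array order starting at base. Alignment
   padding is remembered as a hole; later variables are put into the
   first hole that fits before the end is extended.
   Returns the first address after the highest placed variable. */
static uint32_t place_first_fit(var_slot *v, size_t n, uint32_t base)
{
    hole holes[MAX_HOLES];
    size_t nholes = 0;
    uint32_t end = base;

    for (size_t i = 0; i < n; i++) {
        int placed = 0;

        /* Try the existing holes first (first fit). */
        for (size_t h = 0; h < nholes && !placed; h++) {
            uint32_t a = align_up(holes[h].start, v[i].align);
            if (a + v[i].size <= holes[h].end) {
                uint32_t front = holes[h].start;
                v[i].addr = a;
                holes[h].start = a + v[i].size;      /* tail stays free */
                if (a > front && nholes < MAX_HOLES) /* and so does the front */
                    holes[nholes++] = (hole){ front, a };
                placed = 1;
            }
        }

        /* Otherwise append at the end and record any alignment padding. */
        if (!placed) {
            uint32_t a = align_up(end, v[i].align);
            if (a > end && nholes < MAX_HOLES)
                holes[nholes++] = (hole){ end, a };
            v[i].addr = a;
            end = a + v[i].size;
        }
    }
    return end;
}
```

Placing those four variables in map order from 0x20000868, completed.1 and object.0 land at 0x2000086c and 0x20000870, inside the hole below _VectorsRam, and the run ends at 0x20000ec0 instead of 0x20000edc with plain appending.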
 
Funny thing about that 920-byte gap:
Code:
20000868 00000004 B systick_safe_read
20000c00 000002c0 B _VectorsRam

It comes from micros(), which uses __LDREXW(&systick_safe_read);

Maybe there is a better way to express that, or maybe that has special support requirements?
 
That's due to the required 1024-byte alignment of _VectorsRam. But yes, the systick variable could be placed elsewhere. It doesn't need to be a global variable and can be on the stack.
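The size of such a gap follows from rounding the first free address a up to the next multiple of the alignment n of the following object. With the addresses from the map: the 920 bytes above are counted from the start of systick_safe_read; measured from its end, the padding is 916 bytes. The same rounding with 4096 gives the gap in front of endpoint_queue_head.

```latex
\begin{aligned}
\operatorname{align}(a, n) &= n \left\lceil \tfrac{a}{n} \right\rceil \\
\operatorname{align}(\mathtt{0x2000086C},\ 1024) &= \mathtt{0x20000C00}, &
\mathtt{0x20000C00} - \mathtt{0x2000086C} &= 916\ \text{bytes} \\
\operatorname{align}(\mathtt{0x20000F44},\ 4096) &= \mathtt{0x20001000}, &
\mathtt{0x20001000} - \mathtt{0x20000F44} &= 188\ \text{bytes}
\end{aligned}
```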
 
Using
Code:
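// non-zero initializer: _VectorsRam goes to .data instead of .bss (only element 0 is set, the rest are zero)
void (* volatile _VectorsRam[NVIC_NUM_INTERRUPTS+16])(void) = {unused_interrupt_vector};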
_VectorsRam ends up in the data section and starts at 0x200000 (perhaps by accident; it's the first variable of the program in ".data").
This is a little better. But there is still a variable (endpoint_queue_head) that is 4096-byte aligned; the two together are the problem.

I'd suggest a dedicated section for _VectorsRam at the start of .data.
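A rough sketch of that suggestion, using GCC's section and aligned variable attributes. The section name ".vectorsram" and the constant NUM_VECTORS are assumptions for illustration (176 entries matches the 0x2c0 bytes in the map, at 4 bytes per pointer on the Teensy). The attributes alone only give the array its own 1024-byte-aligned input section; a linker-script rule that places that section at the very start of RAM is not shown here and would be needed as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for NVIC_NUM_INTERRUPTS + 16 (176 * 4 = 0x2c0).
   On a 64-bit host each pointer is 8 bytes, so the array is larger there. */
#define NUM_VECTORS 176

static void unused_interrupt_vector(void)
{
    /* placeholder handler for this sketch */
}

/* ".vectorsram" is a made-up section name: the linker script would have
   to place it first in RAM so that the 1024-byte alignment costs no
   padding. The non-zero initializer keeps it out of .bss. */
void (* volatile _VectorsRam[NUM_VECTORS])(void)
    __attribute__((used, aligned(1024), section(".vectorsram")))
    = { unused_interrupt_vector };
```

Putting the two huge-alignment objects (this one and endpoint_queue_head) at the start of their regions, rather than between small variables, is what would keep the padding in front of them at zero.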
 