backtrace

flok

Active member
Hi,

How can I get a backtrace?
Not neccessarily in a debugger, just via serial.
Addresses are enough (I can get line numbers and such via e.g. ~/.platformio/packages/toolchain-gccarmnoneeabi-teensy/bin/arm-none-eabi-addr2line -e .pio/build/Teensy4_1/firmware.elf 0x20037d38 ).

I tried __builtin_extract_return_addr(__builtin_return_address(0)) but this returns a value that does not seem to make sense (to me and the tooling).
To elaborate: I'm facing a problem with my software where:

C++:
uint32_t session::get_inc_datasn(const uint32_t data_sn_itt)
{
        if (data_sn_itt == this->data_sn_itt) {
                printf("%p\r\n", __builtin_extract_return_addr(__builtin_return_address(0)));
                return data_sn++;
        }

counts in steps of 1 on Linux and also on an ESP32 while on the Teensy4.1 it counts in steps of 4(!).

With that printf, I see 4 different callers:

Code:
0x16a3
0x16b3
0x16c7
0x16dd

Doing the same on Linux (or the ESP32) shows only 1 caller
Code:
0x560c884af31b
.

With some more printf magic, I see that there's only 1 caller:
Code:
gen_data_in_pdu 0xb199
0x16ad
0x16bd
0x16d1
0x16e7
.

Oh hang on, those ARE valid addresses!

Code:
~/.platformio/packages/toolchain-gccarmnoneeabi/bin/arm-none-eabi-gdb .pio/build/Teensy4_1/firmware.elf
...
Reading symbols from .pio/build/Teensy4_1/firmware.elf...
(gdb) bt
No stack.
(gdb) info symbol 0x16ad
iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int) + 153 in section .text.itcm

Code:
(gdb) display /32i 0x16a3
4: x/32i 0x16a3
   0x16a2 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+142>:    vmax.u16    d4, d10, d1
   0x16a6 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+146>:    mov    r0, r9
   0x16a8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+148>:    bl    0xb874 <session::get_inc_datasn(unsigned long)>
   0x16ac <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+152>:    mov    r10, r0
   0x16ae <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+154>:    mov    r0, r4
   0x16b0 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+156>:    bl    0x14d8 <iscsi_pdu_scsi_cmd::get_Itasktag() const>
   0x16b4 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+160>:    mov    r1, r0
   0x16b6 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+162>:    mov    r0, r9
   0x16b8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+164>:    bl    0xb874 <session::get_inc_datasn(unsigned long)>
   0x16bc <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+168>:    mov    r11, r0
   0x16be <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+170>:    mov    r0, r4
   0x16c0 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+172>:    bl    0x14d8 <iscsi_pdu_scsi_cmd::get_Itasktag() const>
   0x16c4 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+176>:    mov    r1, r0
   0x16c6 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+178>:    mov    r0, r9
   0x16c8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+180>:    mov.w    r11, r11, lsl #8
   0x16cc <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+184>:    bl    0xb874 <session::get_inc_datasn(unsigned long)>
   0x16d0 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+188>:    mov    r3, r0
   0x16d2 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+190>:    mov    r0, r4
   0x16d4 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+192>:    and.w    r11, r11, #16711680    ; 0xff0000
   0x16d8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+196>:    mov    r4, r3
   0x16da <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+198>:    bl    0x14d8 <iscsi_pdu_scsi_cmd::get_Itasktag() const>
   0x16de <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+202>:    mov    r1, r0
   0x16e0 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+204>:    mov    r0, r9
   0x16e2 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+206>:    bl    0xb874 <session::get_inc_datasn(unsigned long)>
   0x16e6 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+210>: 
    ldr    r1, [pc, #180]    ; (0x179c <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+392>)
   0x16e8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+212>:    lsrs    r2, r0, #24
   0x16ea <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+214>:    lsrs    r4, r4, #8
   0x16ec <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+216>:    orr.w    r2, r2, r10, lsl #24
   0x16f0 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+220>:    and.w    r4, r4, #65280    ; 0xff00
   0x16f4 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+224>:    orr.w    r2, r2, r11
   0x16f8 <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+228>:    strd    r7, r8, [sp, #40]    ; 0x28
   0x16fc <iscsi_pdu_scsi_data_in::gen_data_in_pdu(session*, iscsi_pdu_scsi_cmd const&, blob_t const&, unsigned int, unsigned int)+232>:    ldr    r7, [r5, #4]

As you see, the assembly code is calling get_inc_datasn 4 times! But why?!



Ooooooh wait, I think I understand!

This is the offending line:

C++:
pdu_data_in->DataSN     = htonl(s->get_inc_datasn(reply_to_copy.get_Itasktag()));

I used the htonl code from https://forum.pjrc.com/index.php?threads/suggestion-for-ethernet-library-improvement.53132/

C++:
#define htonl(x) ( ((x)<<24 & 0xFF000000UL) | \
                   ((x)<< 8 & 0x00FF0000UL) | \
                   ((x)>> 8 & 0x0000FF00UL) | \
                   ((x)>>24 & 0x000000FFUL) )

Now what happens in that #define: (x) gets evaluated... 4 times!

Replaced by:

C++:
uint16_t HTONS(const uint16_t x)
{
        return (x << 8) | (x >> 8);
}

uint32_t HTONL(const uint32_t x)
{
        return (HTONS(x) << 16) | HTONS(x >> 16);
}
 
Last edited:
But why?!

The memory map on page 36 in the reference manual might help.

You can see the ITCM memory begins at address 0. It is shown as 512K, but really that is the maximum size. The 512K FlexRAM is partitioned where some becomes ITCM and the rest becomes DTCM, starting at address 0x20000000.

Unless you use the special memory keywords, by default your compiled code is allocated in the ITCM and your variables are allocated in DTCM. So for most functions, you can expect they will have addresses between 0 to 0x7FFFF.

Code defined with "FLASHMEM" will be located in the 0x6000000 range, and variables with "DMAMEM" will be allocated in the 0x20200000 OCRAM2 range.

You might look at the memory description on the Teensy 4.1 page. It's not as detailed at the reference manual, as it's meant to be more of an explanation than a detailed reference. But this is where the Teensy-specific details are covered. (if your browser doesn't take you directly to the memory info, scroll down to "Memory" or look for the intra-page navigation links near the top)


Another minor detail you might notice with function address is the LSB might be set. Traditional ARM supports 2 instruction sets, called ARM and Thumb. Cortex-M7 only supports Thumb instructions, but the ABI is defined by architectures which support both. The LSB indicates which type of instruction to execute. ARM instructions are 32 bits and must always be aligned to 32 bit boundaries. Thumb instructions are 16 or 32 bit and must be aligned to 16 bit boundaries. LSB=0 means to execute in ARM mode, and LSB=1 means to execute in Thumb mode. It will always be 1 on Teensy or any other Cortex-M microcontroller. Sometimes the GNU tools will show you the real address, other times you'll see something like 0x16d1, which of course should be understood to mean the function is really at 0x16d0 and the "1" simply means Thumb mode which is always the case on Teensy.
 
Last edited:
Paul, thank you for the clarification!
I got it all working now (mostly; but enough for my proof of concept).
 
Replaced by:

C++:
uint16_t HTONS(const uint16_t x)
{
        return (x << 8) | (x >> 8);
}

uint32_t HTONL(const uint32_t x)
{
        return (HTONS(x) << 16) | HTONS(x >> 16);
}
I would say this is definitely better than the original code (htonl/htons/ntohl/ntohs are functions on most platforms rather than macros) but could go one step further by using gcc's __builtin_bswap16/__builtin_bswap32 to leverage the CPU's byteswap instructions.
 
Back
Top