Profiling using gprof on Teensy 4

Status
Not open for further replies.

ftrias

Well-known member
I created a project to partially support gprof-style profiling of applications on Teensy 4. This shows how much time is spent in each function. In theory, this could also work on Teensy 3, but it doesn't have enough memory to do it well so I didn't implement that yet. This is the repository: https://github.com/ftrias/TeensyProf

It will only support the histogram feature of gprof. The Teensyduino arm cross-compiler does not support more advanced features (but it could! ...if Paul includes gprof executable and libc_p.a). You will need to make a few changes to the Arduino setup described in the README.md file.

Overview
-------------

The profiler samples the current instruction every 1 millisecond. That is at each millisecond it looks to see what function is currently running and keeps a counter of how many times this happens. Over time, this gives an approximate value of how much time each function consumes.

On desktops, gprof will also keep track of which function calls which, but that is not supported on Teensy 4.

After a while, a file with the function counters is sent out to the serial port. A python program is listening on the serial port for this and writes out "gmon.out". This file is then cross-referenced by gprof with the original executable to generate a table of execution times.

For an interesting overview of gprof: http://wwwcdf.pd.infn.it/localdoc/gprof.pdf

Output of Tests Example
------------------

Code:
Flat profile:

Each sample counts as 0.001 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
 50.00      5.08     5.08                             stall_3()
 33.33      8.46     3.38                             stall_2()
 16.66     10.15     1.69                             stall_1()
  0.01     10.15     0.00                             endpoint0_receive

Edit: Explained serial port.
 
Last edited:
Sounds interesting, I'll give it a try.

The output you show is generated by the Linux/Mac/Win executable which is listening on a serial port right?
 
Sounds interesting, I'll give it a try.

The output you show is generated by the Linux/Mac/Win executable which is listening on a serial port right?

I should have explained more. When compiling ends, it copies the elf file to /tmp/build.elf. Then the program uploads and runs as normal. When the profiling ends, the Teensy (using the library) sends a specially formatted data to the serial port. There is a python program listening to the serial port on the desktop. It detects the special codes and writes out the gmon.out file. Then it executes gprof on it cross-referencing the /tmp/build.elf binary.
 
I created a project to partially support gprof-style profiling of applications on Teensy 4. This shows how much time is spent in each function. In theory, this could also work on Teensy 3, but it doesn't have enough memory to do it well so I didn't implement that yet. This is the repository: https://github.com/ftrias/TeensyProf

It will only support the histogram feature of gprof. The Teensyduino arm cross-compiler does not support more advanced features (but it could! ...if Paul includes gprof executable and libc_p.a). You will need to make a few changes to the Arduino setup described in the README.md file.

Overview
-------------

The profiler samples the current instruction every 1 millisecond. That is at each millisecond it looks to see what function is currently running and keeps a counter of how many times this happens. Over time, this gives an approximate value of how much time each function consumes.

On desktops, gprof will also keep track of which function calls which, but that is not supported on Teensy 4.

After a while, a file with the function counters is sent out to the serial port. A python program is listening on the serial port for this and writes out "gmon.out". This file is then cross-referenced by gprof with the original executable to generate a table of execution times.

For an interesting overview of gprof: http://wwwcdf.pd.infn.it/localdoc/gprof.pdf

Output of Tests Example
------------------

Code:
Flat profile:

Each sample counts as 0.001 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
 50.00      5.08     5.08                             stall_3()
 33.33      8.46     3.38                             stall_2()
 16.66     10.15     1.69                             stall_1()
  0.01     10.15     0.00                             endpoint0_receive

Edit: Explained serial port.

I forgot to mention this other modification:

Modify `imxrt1062.ld` in the directory Arduino.../Contents/Java/hardware/teensy/avr/cores/teensy4. All references to `.text.itcm` must be changed to `.text`. Gprof expects the code to be in a segment named `.text`.
 
Unsatisfied with the limits of this library, I decided to do a more complete implementation of gprof. I used this previous work by Erich Styger as a guide: https://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/. Clearly, I have too much time on my hands.

The result works for Teensy 3 & 4 and is in: https://github.com/ftrias/TeensyGProf

It works similarly to my last attempt. The main difference is that it also implements the call hierarchy and thus you can see which function calls which and the total cumulative time used by functions. This is very helpful.

Setup is a little more involved and requires modifying Teensyduino files `boards.txt`, `platform.txt` and `imxrt1062.ld`. See README.md for details.

`imxrt1062.ld` is modified to put all the code in a segment called `.text`. gprof is hard-coded to expect this because it's an almost universal convention. Teensy 3 does this. But for some reason, Teensy 4 does not.

Please post here if you try. I'm curious to know how it works for people.

Here is a partial output of the Tests example.

Code:
#include "TeensyGProf.h"

int stall_3() {
  volatile int x;
  for(int i=0; i<300; i++) {
    for(int j=0; j<10000; j++) {
      x = 0;
    }
  }
  return x;
}
int stall_2() {
  volatile int x;
  for(int i=0; i<200; i++) {
    for(int j=0; j<10000; j++) {
      x = 0;
    }
  }
  return x;
}
int stall_1() {
  volatile int x;
  for(int i=0; i<100; i++) {
    for(int j=0; j<10000; j++) {
      x = 0;
    }
  }
  return x;
}

void runtests() {
  stall_1();
  stall_2();
  stall_3();
}

void setup() {
  Serial.begin(115200);
}

long start = millis();

void loop() {
  if (start) Serial.println("loop");
  runtests();
  if (start && millis() - start > 10000) {
    gprof_end();
    start = 0;
  }
}

OUTPUT:
Code:
Flat profile:

Each sample counts as 0.001 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 49.28      4.96     4.96      110     0.05     0.05  stall_3()
 32.83      8.26     3.30        1     3.30     3.30  stall_2()
 16.44      9.91     1.65        1     1.65     1.65  stall_1()
  0.61      9.97     0.06                             _mcount_internal
  0.57     10.03     0.06                             usb_transfer_status
  0.11     10.04     0.01   210058     0.00     0.00  yield
  0.05     10.04     0.01   210167     0.00     0.00  EventResponder::runFromYield()
  0.04     10.05     0.00                             __gnu_mcount_nc
  0.03     10.05     0.00                             micros
  0.02     10.05     0.00                             usb_serial_available
  0.01     10.05     0.00                             usb_serial_class::available()
  0.01     10.06     0.00                             delay
  0.01     10.06     0.00                             gprof_end
  0.00     10.06     0.00    10289     0.00     0.00  MillisTimer::runFromTimer()
  0.00     10.06     0.00    10289     0.00     0.00  MillisTimer::enableTimerInterrupt(bool)
  0.00     10.06     0.00    10289     0.00     0.00  MillisTimer::disableTimerInterrupt()
  0.00     10.06     0.00    10288     0.00     0.00  systick_isr
  0.00     10.06     0.00      110     0.00     0.09  runtests()
  0.00     10.06     0.00      110     0.00     0.00  Print::println()
  0.00     10.06     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z7stall_3v
  0.00     10.06     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
	   else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
	   function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
	   the function in the gprof listing. If the index is
	   in parenthesis it shows where it would appear in
	   the gprof listing if it were to be printed.


Copyright (C) 2012-2015 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.


		     Call graph (explanation follows)


granularity: each sample hit covers 8 byte(s) for 0.01% of 10.06 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     98.5    0.00    9.91                 loop [1]
                0.00    9.91     110/110         runtests() [2]
-----------------------------------------------
                0.00    9.91     110/110         loop [1]
[2]     98.5    0.00    9.91     110         runtests() [2]
                4.96    0.00     110/110         stall_3() [3]
                3.30    0.00       1/1           stall_2() [4]
                1.65    0.00       1/1           stall_1() [5]
-----------------------------------------------
                4.96    0.00     110/110         runtests() [2]
[3]     49.3    4.96    0.00     110         stall_3() [3]
-----------------------------------------------
                3.30    0.00       1/1           runtests() [2]
[4]     32.8    3.30    0.00       1         stall_2() [4]
-----------------------------------------------
                1.65    0.00       1/1           runtests() [2]
[5]     16.4    1.65    0.00       1         stall_1() [5]
-----------------------------------------------
                                                 <spontaneous>
[6]      0.6    0.06    0.00                 _mcount_internal [6]
-----------------------------------------------
                                                 <spontaneous>
[7]      0.6    0.06    0.00                 usb_transfer_status [7]
-----------------------------------------------
                0.00    0.00   43836/210058      delay [11]
                0.01    0.00  166222/210058      usb_serial_write [9]
[8]      0.2    0.01    0.01  210058         yield [8]
                0.01    0.00  210167/210167      EventResponder::runFromYield() [10]
-----------------------------------------------
                                                 <spontaneous>
[9]      0.1    0.00    0.01                 usb_serial_write [9]
                0.01    0.00  166222/210058      yield [8]
-----------------------------------------------
                0.01    0.00  210167/210167      yield [8]
[10]     0.0    0.01    0.00  210167         EventResponder::runFromYield() [10]
-----------------------------------------------
                                                 <spontaneous>
[11]     0.0    0.00    0.00                 delay [11]
                0.00    0.00   43836/210058      yield [8]
-----------------------------------------------
                                                 <spontaneous>
[12]     0.0    0.00    0.00                 __gnu_mcount_nc [12]
-----------------------------------------------
                                                 <spontaneous>
[13]     0.0    0.00    0.00                 micros [13]
-----------------------------------------------
                                                 <spontaneous>
[14]     0.0    0.00    0.00                 usb_serial_available [14]
-----------------------------------------------
                                                 <spontaneous>
[15]     0.0    0.00    0.00                 gprof_end [15]
-----------------------------------------------
                                                 <spontaneous>
[16]     0.0    0.00    0.00                 usb_serial_class::available() [16]
-----------------------------------------------
                0.00    0.00   10288/10288       gprof_systick_isr [37]
[17]     0.0    0.00    0.00   10288         systick_isr [17]
                0.00    0.00   10289/10289       MillisTimer::runFromTimer() [87]
-----------------------------------------------
                0.00    0.00   10289/10289       systick_isr [17]
[87]     0.0    0.00    0.00   10289         MillisTimer::runFromTimer() [87]
                0.00    0.00   10289/10289       MillisTimer::disableTimerInterrupt() [89]
                0.00    0.00   10289/10289       MillisTimer::enableTimerInterrupt(bool) [88]
-----------------------------------------------
                0.00    0.00   10289/10289       MillisTimer::runFromTimer() [87]
[88]     0.0    0.00    0.00   10289         MillisTimer::enableTimerInterrupt(bool) [88]
-----------------------------------------------
                0.00    0.00   10289/10289       MillisTimer::runFromTimer() [87]
[89]     0.0    0.00    0.00   10289         MillisTimer::disableTimerInterrupt() [89]
-----------------------------------------------
                0.00    0.00     110/110         Print::println(char const*) [113]
[90]     0.0    0.00    0.00     110         Print::println() [90]
-----------------------------------------------
                0.00    0.00       1/1           __libc_init_array [123]
[91]     0.0    0.00    0.00       1         _GLOBAL__sub_I__Z7stall_3v [91]
                0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [92]
-----------------------------------------------
                0.00    0.00       1/1           _GLOBAL__sub_I__Z7stall_3v [91]
[92]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [92]
-----------------------------------------------
 
Last edited:
Wanted to try in steps and started with the following (I'm using a makefile & vsCode not the Arduino IDE)

Board: Teensy 3.2
Changed/added compile flags: -O0 -gp

Result:
Compiling a test sketch, your library and linking does work without problems.
But: The program seems to badly mess up something. The Teensy vanishes from the USB bus and a simple blink does not work anymore. Need to press the program button to revive the Board.

If I link with -gp it complains about the missing libc_p (which was expected I think)

Any ideas?
 
Wanted to try in steps and started with the following (I'm using a makefile & vsCode not the Arduino IDE)

Board: Teensy 3.2
Changed/added compile flags: -O0 -gp

Result:
Compiling a test sketch, your library and linking does work without problems.
But: The program seems to badly mess up something. The Teensy vanishes from the USB bus and a simple blink does not work anymore. Need to press the program button to revive the Board.

If I link with -gp it complains about the missing libc_p (which was expected I think)

Any ideas?

One theory: I believe this happens when you use the "-pg" option in the link stage. This is why the "boards.txt" file adds a new parameter "profile" with "-O0 -pg" and then only adds it in "platform.txt" in the compile stage of C++ files. If you just add it to the "optimize" parameter in "boards.txt" it won't work.
 
Unfortunately not. I only set the -pg for the compiler not the linker

Here the makefile. See the section Flags and Defines.
Code:
#******************************************************************************
# Generated by VisualTeensy (https://github.com/luni64/VisualTeensy)
#
# Board              Teensy 3.2 / 3.1
# USB Type           Serial
# CPU Speed          96 MHz (overclock)
# Optimize           Faster
# Keyboard Layout    US English
#
# 24.08.2019 19:29
#******************************************************************************
SHELL            := cmd.exe
export SHELL

TARGET_NAME      := profTest
BOARD_ID         := TEENSY31
MCU              := mk20dx256

LIBS_SHARED_BASE := C:\Users\lutz\Documents\Arduino\libraries
LIBS_SHARED      := TeensyGProf

LIBS_LOCAL_BASE  := lib
LIBS_LOCAL       :=

CORE_BASE        := C:\PROGRA~1\Arduino\ARDUIN~1.8\hardware\teensy\avr\cores\teensy3
GCC_BASE         := C:\PROGRA~1\Arduino\ARDUIN~1.8\hardware\tools\arm\bin
UPL_PJRC_B       := C:\PROGRA~1\Arduino\ARDUIN~1.8\hardware\tools
UPL_TYCMD_B      := C:\toolchain\TyTools
UPL_JLINK_B      := C:\PROGRA~2\SEGGER\JLINK_~1

#******************************************************************************
# Flags and Defines
#******************************************************************************
FLAGS_CPU   := -mthumb -mcpu=cortex-m4 -fsingle-precision-constant
FLAGS_OPT   := -O0
FLAGS_COM   := -g -pg -Wall -ffunction-sections -fdata-sections -nostdlib -MMD
FLAGS_LSP   :=

FLAGS_CPP   := -fno-exceptions -felide-constructors -std=gnu++14 -Wno-error=narrowing -fno-rtti
FLAGS_C     :=
FLAGS_S     := -x assembler-with-cpp
FLAGS_LD    := -Wl,--gc-sections,--relax,--defsym=__rtc_localtime=$(shell powershell [int](Get-Date -UFormat +%s)[0]) -T$(CORE_BASE)/mk20dx256.ld -lstdc++

LIBS        := -larm_cortexM4l_math -lm

DEFINES     := -D__MK20DX256__ -DTEENSYDUINO=145 -DARDUINO=10807
DEFINES     += -DF_CPU=96000000 -DUSB_SERIAL -DLAYOUT_US_ENGLISH

CPP_FLAGS   := $(FLAGS_CPU) $(FLAGS_OPT) $(FLAGS_COM) $(DEFINES) $(FLAGS_CPP)
C_FLAGS     := $(FLAGS_CPU) $(FLAGS_OPT) $(FLAGS_COM) $(DEFINES) $(FLAGS_C)
S_FLAGS     := $(FLAGS_CPU) $(FLAGS_OPT) $(FLAGS_COM) $(DEFINES) $(FLAGS_S)
LD_FLAGS    := $(FLAGS_CPU) $(FLAGS_OPT) $(FLAGS_LSP) $(FLAGS_LD)
AR_FLAGS    := rcs

#******************************************************************************
# Colors
#******************************************************************************
COL_CORE    := 
COL_LIB     := 
COL_SRC     := 
COL_LINK    := 
COL_ERR     := 
COL_OK      := 
COL_RESET   := 

#******************************************************************************
# Folders and Files
#******************************************************************************
USR_SRC     := src
LIB_SRC     := lib
CORE_SRC    := $(CORE_BASE)

BIN         := .vsteensy/build
USR_BIN     := $(BIN)/src
CORE_BIN    := $(BIN)/core
LIB_BIN     := $(BIN)/lib
CORE_LIB    := $(BIN)/core.a
TARGET_HEX  := $(BIN)/$(TARGET_NAME).hex
TARGET_ELF  := $(BIN)/$(TARGET_NAME).elf
TARGET_LST  := $(BIN)/$(TARGET_NAME).lst

#******************************************************************************
# BINARIES
#******************************************************************************
CC          := $(GCC_BASE)/arm-none-eabi-gcc
CXX         := $(GCC_BASE)/arm-none-eabi-g++
AR          := $(GCC_BASE)/arm-none-eabi-gcc-ar
OBJCOPY     := $(GCC_BASE)/arm-none-eabi-objcopy
SIZE        := $(GCC_BASE)/arm-none-eabi-size
OBJDUMP     := $(GCC_BASE)/arm-none-eabi-objdump
UPL_PJRC    := "$(UPL_PJRC_B)/teensy_post_compile" -test -file=$(TARGET_NAME) -path=$(BIN) -tools="$(UPL_PJRC_B)" -board=$(BOARD_ID) -reboot
UPL_TYCMD   := $(UPL_TYCMD_B)/tyCommanderC upload $(TARGET_HEX) --autostart --wait --multi
UPL_CLICMD  := $(UPL_CLICMD_B)/teensy_loader_cli -mmcu=$(MCU) -v $(TARGET_HEX)
UPL_JLINK   := $(UPL_JLINK_B)/jlink -commanderscript .vsteensy/flash.jlink

#******************************************************************************
# Source and Include Files
#******************************************************************************
# Recursively create list of source and object files in USR_SRC and CORE_SRC
# and corresponding subdirectories.
# The function rwildcard is taken from http://stackoverflow.com/a/12959694)

rwildcard =$(wildcard $1$2) $(foreach d,$(wildcard $1*),$(call rwildcard,$d/,$2))

#User Sources -----------------------------------------------------------------
USR_C_FILES      := $(call rwildcard,$(USR_SRC)/,*.c)
USR_CPP_FILES    := $(call rwildcard,$(USR_SRC)/,*.cpp)
USR_S_FILES      := $(call rwildcard,$(USR_SRC)/,*.S)
USR_OBJ          := $(USR_S_FILES:$(USR_SRC)/%.S=$(USR_BIN)/%.o) $(USR_C_FILES:$(USR_SRC)/%.c=$(USR_BIN)/%.o) $(USR_CPP_FILES:$(USR_SRC)/%.cpp=$(USR_BIN)/%.o)

# Core library sources --------------------------------------------------------
CORE_CPP_FILES   := $(call rwildcard,$(CORE_SRC)/,*.cpp)
CORE_C_FILES     := $(call rwildcard,$(CORE_SRC)/,*.c)
CORE_S_FILES     := $(call rwildcard,$(CORE_SRC)/,*.S)
CORE_OBJ         := $(CORE_S_FILES:$(CORE_SRC)/%.S=$(CORE_BIN)/%.o) $(CORE_C_FILES:$(CORE_SRC)/%.c=$(CORE_BIN)/%.o) $(CORE_CPP_FILES:$(CORE_SRC)/%.cpp=$(CORE_BIN)/%.o)


# User library sources  (see) https://github.com/arduino/arduino/wiki/arduino-ide-1.5:-library-specification
LIB_DIRS_SHARED  := $(foreach d, $(LIBS_SHARED), $(LIBS_SHARED_BASE)/$d/ $(LIBS_SHARED_BASE)/$d/utility/)  # base and /utility
LIB_DIRS_SHARED  += $(foreach d, $(LIBS_SHARED), $(LIBS_SHARED_BASE)/$d/src/ $(dir $(call rwildcard,$(LIBS_SHARED_BASE)/$d/src/,*/.)))                           # src and all subdirs of base

LIB_DIRS_LOCAL   := $(foreach d, $(LIBS_LOCAL), $(LIBS_LOCAL_BASE)/$d/ $(LIBS_LOCAL_BASE)/$d/utility/ )    # base and /utility
LIB_DIRS_LOCAL   += $(foreach d, $(LIBS_LOCAL), $(LIBS_LOCAL_BASE)/$d/src/ $(dir $(call rwildcard,$(LIBS_LOCAL_BASE)/$d/src/,*/.)))                             # src and all subdirs of base

LIB_CPP_SHARED   := $(foreach d, $(LIB_DIRS_SHARED),$(call wildcard,$d*.cpp))
LIB_C_SHARED     := $(foreach d, $(LIB_DIRS_SHARED),$(call wildcard,$d*.c))
LIB_S_SHARED     := $(foreach d, $(LIB_DIRS_SHARED),$(call wildcard,$d*.S))

LIB_CPP_LOCAL    := $(foreach d, $(LIB_DIRS_LOCAL),$(call wildcard,$d/*.cpp))
LIB_C_LOCAL      := $(foreach d, $(LIB_DIRS_LOCAL),$(call wildcard,$d/*.c))
LIB_S_LOCAL      := $(foreach d, $(LIB_DIRS_LOCAL),$(call wildcard,$d/*.S))

LIB_OBJ          := $(LIB_CPP_SHARED:$(LIBS_SHARED_BASE)/%.cpp=$(LIB_BIN)/%.o)  $(LIB_CPP_LOCAL:$(LIBS_LOCAL_BASE)/%.cpp=$(LIB_BIN)/%.o)
LIB_OBJ          += $(LIB_C_SHARED:$(LIBS_SHARED_BASE)/%.c=$(LIB_BIN)/%.o)  $(LIB_C_LOCAL:$(LIBS_LOCAL_BASE)/%.c=$(LIB_BIN)/%.o)
LIB_OBJ          += $(LIB_S_SHARED:$(LIBS_SHARED_BASE)/%.S=$(LIB_BIN)/%.o)  $(LIB_S_LOCAL:$(LIBS_LOCAL_BASE)/%.S=$(LIB_BIN)/%.o)

# Includes -------------------------------------------------------------
INCLUDE          := -I./$(USR_SRC) -I$(CORE_SRC)
INCLUDE          += $(foreach d, $(LIB_DIRS_SHARED), -I$d)
INCLUDE          += $(foreach d, $(LIB_DIRS_LOCAL), -I$d)

# Generate directories --------------------------------------------------------
DIRECTORIES :=  $(sort $(dir $(CORE_OBJ) $(USR_OBJ) $(LIB_OBJ)))
generateDirs := $(foreach d, $(DIRECTORIES), $(shell if not exist "$d" mkdir "$d"))


#******************************************************************************
# Rules:
#******************************************************************************

.PHONY: directories all rebuild upload uploadTy uploadCLI clean cleanUser cleanCore


all:  $(TARGET_LST) $(TARGET_HEX)

rebuild: cleanUser all

clean:   cleanUser cleanCore cleanLib
	@echo $(COL_OK)cleaning done$(COL_RESET)

upload: all
	@$(UPL_PJRC)

uploadTy: all
	@$(UPL_TYCMD)

uploadCLI: all
	@$(UPL_CLICMD)

uploadJLink: all
	@$(UPL_JLINK)


# Core library ----------------------------------------------------------------
$(CORE_BIN)/%.o: $(CORE_SRC)/%.S
	@echo $(COL_CORE)CORE [ASM] $(notdir $<) $(COL_ERR)
	@"$(CC)" $(S_FLAGS) $(INCLUDE) -o $@ -c $<

$(CORE_BIN)/%.o: $(CORE_SRC)/%.c
	@echo $(COL_CORE)CORE [CC]  $(notdir $<) $(COL_ERR)
	@"$(CC)" $(C_FLAGS) $(INCLUDE) -o $@ -c $<

$(CORE_BIN)/%.o: $(CORE_SRC)/%.cpp
	@echo $(COL_CORE)CORE [CPP] $(notdir $<) $(COL_ERR)
	@"$(CXX)" $(CPP_FLAGS) $(INCLUDE) -o $@ -c $<

$(CORE_LIB) : $(CORE_OBJ)
	@echo $(COL_LINK)CORE [AR]  $@ $(COL_ERR)
	@$(AR) $(AR_FLAGS) $@ $^
	@echo $(COL_OK)Teensy core built successfully &&echo.

# Shared Libraries ------------------------------------------------------------
$(LIB_BIN)/%.o: $(LIBS_SHARED_BASE)/%.S
	@echo $(COL_LIB)LIB  [ASM] $(notdir $<) $(COL_ERR)
	@"$(CC)" $(S_FLAGS) $(INCLUDE) -o $@ -c $<

$(LIB_BIN)/%.o: $(LIBS_SHARED_BASE)/%.cpp
	@echo $(COL_LIB)LIB  [CPP] $(notdir $<) $(COL_ERR)
	@"$(CXX)" $(CPP_FLAGS) $(INCLUDE) -o $@ -c $<

$(LIB_BIN)/%.o: $(LIBS_SHARED_BASE)/%.c
	@echo $(COL_LIB)LIB  [CC]  $(notdir $<) $(COL_ERR)
	@"$(CC)" $(C_FLAGS) $(INCLUDE) -o $@ -c $<

# Local Libraries -------------------------------------------------------------
$(LIB_BIN)/%.o: $(LIBS_LOCAL_BASE)/%.S
	@echo $(COL_LIB)LIB  [ASM] $(notdir $<) $(COL_ERR)
	@"$(CC)" $(S_FLAGS) $(INCLUDE) -o $@ -c $<

$(LIB_BIN)/%.o: $(LIBS_LOCAL_BASE)/%.cpp
	@echo $(COL_LIB)LIB  [CPP] $(notdir $<) $(COL_ERR)
	@"$(CXX)" $(CPP_FLAGS) $(INCLUDE) -o $@ -c $<

$(LIB_BIN)/%.o: $(LIBS_LOCAL_BASE)/%.c
	@echo $(COL_LIB)LIB  [CC]  $(notdir $<) $(COL_ERR)
	@"$(CC)" $(C_FLAGS) $(INCLUDE) -o $@ -c $<

# Handle user sources ---------------------------------------------------------
$(USR_BIN)/%.o: $(USR_SRC)/%.S
	@echo $(COL_SRC)USER [ASM] $< $(COL_ERR)
	@"$(CC)" $(S_FLAGS) $(INCLUDE) -o $@ -c $<

$(USR_BIN)/%.o: $(USR_SRC)/%.c
	@echo $(COL_SRC)USER [CC]  $(notdir $<) $(COL_ERR)
	@"$(CC)" $(C_FLAGS) $(INCLUDE) -o "$@" -c $<

$(USR_BIN)/%.o: $(USR_SRC)/%.cpp
	@echo $(COL_SRC)USER [CPP] $(notdir $<) $(COL_ERR)
	@"$(CXX)" $(CPP_FLAGS) $(INCLUDE) -o "$@" -c $<

# Linking ---------------------------------------------------------------------
$(TARGET_ELF): $(CORE_LIB) $(LIB_OBJ) $(USR_OBJ)
	@echo $(COL_LINK)
	@echo [LD]  $@ $(COL_ERR)
	@$(CC) $(LD_FLAGS) -o "$@" $(USR_OBJ) $(LIB_OBJ) $(CORE_LIB) $(LIBS)
	@echo $(COL_OK)User code built and linked to libraries &&echo.

%.lst: %.elf
	@echo $(COL_LINK)
	@echo [LST] $@ $(COL_ERR)
	@$(OBJDUMP) -d -S --demangle --no-show-raw-insn --syms "$<"  > "$@"
	@echo $(COL_OK)Sucessfully built project$(COL_RESET) &&echo.

%.hex: %.elf
	@echo $(COL_LINK)[HEX] $@
	@$(SIZE) "$<"
	@$(OBJCOPY) -O ihex -R.eeprom "$<" "$@"


# Cleaning --------------------------------------------------------------------
cleanUser:
	@echo $(COL_LINK)Cleaning user binaries...$(COL_RESET)
	@if exist $(USR_BIN) rd /s/q "$(USR_BIN)"
	@if exist "$(TARGET_LST)" del $(subst /,\,$(TARGET_LST))

cleanCore:
	@echo $(COL_LINK)Cleaning core binaries...$(COL_RESET)
	@if exist $(CORE_BIN) rd /s/q "$(CORE_BIN)"
	@if exist $(CORE_LIB) del  $(subst /,\,$(CORE_LIB))

cleanLib:
	@echo $(COL_LINK)Cleaning user library binaries...$(COL_RESET)
	@if exist $(LIB_BIN) rd /s/q "$(LIB_BIN)"

# compiler generated dependency info ------------------------------------------
-include $(CORE_OBJ:.o=.d)
-include $(USR_OBJ:.o=.d)
-include $(LIB_OBJ:.o=.d)
 
Tried it with a T3.6, same result, Teensy crashes and vanishes from the bus.
 
Tried it with a T3.6, same result, Teensy crashes and vanishes from the bus.

That's a sophisticated Makefile.

As a first pass, remove FLAGS_COM from C_FLAGS and S_FLAGS. You probably don't want to put instrumentation on the core teensy files.

Can you remove the "@" silencing prefixes and post the output of a clean build so we can see exactly all the commands and command line options that are run?

I didn't try it on T3.4/6.
 
That's a sophisticated Makefile.
Yes, grew somehow over the years. Anyway, the flags are automatically extracted from boards.txt so they should be OK.

Doesn't link without the Flags_COM on the c and assembler files. Do you want me to just remove the -pg on those files?

Code:
CORE [CPP] HardwareSerial4.cpp 
CORE [CPP] AudioStream.cpp 
CORE [CPP] new.cpp 
CORE [CPP] WString.cpp 
CORE [CPP] IPAddress.cpp 
LIB  [CC]  gmon.c 
CORE [CPP] WMath.cpp 
LIB  [CPP] TeensyFile.cpp 
LIB  [CC]  profil.c 
CORE [CPP] HardwareSerial6.cpp 
CORE [CPP] main.cpp 
LIB  [ASM] profiler.S 
CORE [CPP] Stream.cpp 
USER [CPP] main.cpp 
CORE [AR]  .vsteensy/build/core.a 
Teensy core built successfully 


[LD]  .vsteensy/build/profTest.elf
.vsteensy/build/core.a(pins_teensy.o): In function `attachInterrupt':
pins_teensy.c:(.text+0x1b2): undefined reference to `getPinIndex'
.vsteensy/build/core.a(pins_teensy.o): In function `detachInterrupt':
pins_teensy.c:(.text+0x23a): undefined reference to `getPinIndex'
collect2.exe: error: ld returned 1 exit status
make.exe: *** [makefile:235: .vsteensy/build/profTest.elf] Error 1
The terminal process terminated with exit code: 2
 
Ah ha, doesn't crash if I apply the -pg only tho the cpp and .S files. Lets see if it outputs something...
 
Yes, grew somehow over the years. Anyway, the flags are automatically extracted from boards.txt so they should be OK.

Doesn't link without the Flags_COM on the c and assembler files. Do you want me to just remove the -pg on those files?

Code:
CORE [CPP] HardwareSerial4.cpp 
CORE [CPP] AudioStream.cpp 
CORE [CPP] new.cpp 
CORE [CPP] WString.cpp 
CORE [CPP] IPAddress.cpp 
LIB  [CC]  gmon.c 
CORE [CPP] WMath.cpp 
LIB  [CPP] TeensyFile.cpp 
LIB  [CC]  profil.c 
CORE [CPP] HardwareSerial6.cpp 
CORE [CPP] main.cpp 
LIB  [ASM] profiler.S 
CORE [CPP] Stream.cpp 
USER [CPP] main.cpp 
CORE [AR]  .vsteensy/build/core.a 
Teensy core built successfully 


[LD]  .vsteensy/build/profTest.elf
.vsteensy/build/core.a(pins_teensy.o): In function `attachInterrupt':
pins_teensy.c:(.text+0x1b2): undefined reference to `getPinIndex'
.vsteensy/build/core.a(pins_teensy.o): In function `detachInterrupt':
pins_teensy.c:(.text+0x23a): undefined reference to `getPinIndex'
collect2.exe: error: ld returned 1 exit status
make.exe: *** [makefile:235: .vsteensy/build/profTest.elf] Error 1
The terminal process terminated with exit code: 2

Yes. Don't use -pg on c or S files.

Have you tried it with the Arduino app, as described in README.md?

Also, I'd like to see the complete command line options used for each compile to see if it's different than how my system compiles it.

It doesn't seem like the errors shown have anything to do with TeensyGProf. Are you using different "-O" options for different compilations?
 
Ok, writes weird stuff on Serial after the waiting period.
Next step will be to generate a file out of this. Don't want to install python for that. Is it just binary data which needs to be stored to a file?
 
Ok, writes weird stuff on Serial after the waiting period.
Next step will be to generate a file out of this. Don't want to install python for that. Is it just binary data which needs to be stored to a file?

You can look at TeensyFile.cpp for the very simple format used.

You can also configure it to write out the file using hexadecimal and then use something like hex2bin to convert it to a file.
 
One remark:
Would be great if one could pass a Stream into the file generation instead of the hardcoded Serial as output. I more or less always use Serial for PC/Teensy command communication so, having the file spit out on USB Serial is rather troublesome.
 
One remark:
Would be great if one could pass a Stream into the file generation instead of the hardcoded Serial as output. I more or less always use Serial for PC/Teensy command communication so, having the file spit out on USB Serial is rather troublesome.

Very interesting. You can, of course, add your own implementation of the TeensyProf_open(), TeensyProf_close() and TeensyProf_write() functions to send the output to where you want. They're very simple and emulate the standard library open(), close() and write(). I have included 4 implementations: SD card, Serial using a binary format, Serial using hex, and Midi using Sysex messages.
 
Status
Not open for further replies.
Back
Top