Programming Process; teensyduino -> assembly.

Status
Not open for further replies.

Cosford

Well-known member
Hi guys,

I'm studying a masters in Digital Systems engineering and so this is very much a learning experience.

What I'm hoping to do is get an insight into the process of going from Arduino code and Teensyduino to the assembly running on the chip itself. I want to learn more about the compilation process, linker scripts, bootloading and the programming process, as well as anything I've missed in between.

Is there a guide out there for learning this process and how to influence it? I'm aware of some of these steps, but have no idea how or why they work.

Thanks in advance,

Iestyn.
 
Note, in general, except for what the IDE does, everything is part of a normal compiler process. It is pretty much the same in a hosted environment as on an embedded processor, except that for the embedded processor you use a cross compiler: it runs on the host and generates a binary that runs on a different platform (or is in a format that download tools can load to the platform).

In terms of the compilation, logically you have the preprocessor step (where #define, #ifdef, #include, etc. are handled), then the main compilation step that produces object files. I say logically, because most compilers now do preprocessing as part of compilation. The object files are gathered together by the linker to make the executable. In hosted environments, the executable is marked with the shared libraries that must be loaded, and a runtime loader loads the main program and the shared libraries into the address space and starts execution. In an embedded environment you typically convert the executable into a format that the download process uses (S-records, regular ELF files, etc.). Some download agent writes the code to flash memory and arranges for the processor to start there when it is started up. Typically the first thing the executable does is copy some values from flash memory to read/write memory (i.e. initialized variables), clear other parts of read/write memory, set up the stack and jump to the main program.
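
The startup sequence described above can be modelled on the host. This is only a toy sketch in Python, not real startup code: the "flash" and "RAM" are byte arrays, and every address, size and name is invented for illustration.

```python
# Toy model of what embedded startup code does before the main program runs.
# All addresses and sizes are made up for illustration.

def run_startup(flash, ram, data_load, data_start, data_end, bss_start, bss_end):
    """Copy initialized variables from flash to RAM, then zero the .bss range."""
    size = data_end - data_start
    # .data: initial values are stored in flash and copied to their RAM home.
    ram[data_start:data_end] = flash[data_load:data_load + size]
    # .bss: variables without an initializer must start out as zero.
    ram[bss_start:bss_end] = bytes(bss_end - bss_start)


flash = bytearray(64)
flash[16:20] = b"\x01\x02\x03\x04"      # initial values of some hypothetical variables
ram = bytearray(b"\xAA" * 32)           # RAM holds garbage after power-up

run_startup(flash, ram, data_load=16, data_start=0, data_end=4, bss_start=4, bss_end=12)
print(ram[:4].hex(), ram[4:12].hex(), ram[12:16].hex())
```

RAM outside the two ranges is left untouched, which is why code must never rely on the contents of memory that is neither initialized nor zeroed.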

This is greatly simplified of course, and at each step of the way there are differences. For example, GCC, in creating the object code, writes out assembly code as text, and the assembler then reads that text and emits the binary representation of the functions, while other compilers produce the object code directly.

In general, there are multiple ABIs (application binary interfaces) that come into play. ABIs for instance govern what the calling sequence is (what registers arguments are passed in and returned in, what the stack looks like, what sizes and representations the basic types have). Other ABIs govern what the object format looks like, etc. Sometimes there is an official group that decides on the contents of the ABI, sometimes it is more ad hoc.
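
One small, visible piece of an ABI is how a structure is laid out in memory. As a rough illustration (run on the host, so the native result reflects the host's ABI and not the Teensy's), Python's struct module can compare a layout with no padding against the layout the native C rules would produce:

```python
import struct

# "=" asks for standard sizes with no padding; "@" asks for the host's native
# sizes and alignment, as determined by the C compiler for this machine.
packed = struct.calcsize("=ci")   # char followed by int, no padding
native = struct.calcsize("@ci")   # same fields laid out by the native rules

print("packed:", packed, "native:", native)
```

The packed size is always 5 bytes. The native size is usually larger, because the ABI requires the int to be aligned and padding is inserted after the char.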

Within the compiler there are many passes, and depending on the compiler, the code goes through many transformations. Some of these transformations use the same internal format, and add optimizations or flesh out details. Sometimes the transformation is from a more general high level representation to a more detailed format that might use a different representation.

For example, GCC's passes are listed at https://gcc.gnu.org/onlinedocs/gccint/Passes.html. Roughly, the front end parses the source and produces the tree data structure, which then goes through gimplification, then SSA (static single assignment), and then RTL, which is a representation close to the machine language. Finally the back end takes the RTL and emits assembly language.
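
To give a feel for what SSA form means, here is a toy renaming pass in Python. It is not how GCC implements it, and it only handles straight-line code; real SSA construction also has to insert phi functions where control-flow paths merge. The statement format and the `to_ssa` name are made up for this sketch.

```python
def to_ssa(statements):
    """Rename variables in straight-line code so each is assigned exactly once.

    Each statement is (target, [operands...]); operands that were never
    assigned earlier (inputs, constants, operators) are left untouched.
    """
    version = {}          # variable name -> latest version number
    result = []
    for target, operands in statements:
        # Uses refer to the most recent definition of each variable.
        renamed = [f"{op}_{version[op]}" if op in version else op for op in operands]
        # A definition creates a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}_{version[target]}", renamed))
    return result


# x = a + b; x = x * 2; y = x - a
code = [("x", ["a", "+", "b"]), ("x", ["x", "*", "2"]), ("y", ["x", "-", "a"])]
for target, operands in to_ssa(code):
    print(target, "=", " ".join(operands))
```

After renaming, the two assignments to x become x_1 and x_2, so every use points at exactly one definition. That property is what makes many optimizations simpler to express.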

I would imagine any intro to compilers class will go into more detail. I took mine in 1975 - 1979, and the texts I used back then are long out of date. I currently work on the PowerPC backend of GCC (I have worked on x86, MIPS back-ends in the past, and several others). I have never worked on ARM nor AVR backends, and for these hobby projects where I play with microprocessors, I don't go delving into the details. However, the recent AVR compilers use the infrastructure I previously worked on for the SPU/CELL processor for dealing with multiple address spaces. I have more than enough to do in my day job without chasing other work. For my purposes, they work, and I don't have to worry about it.
 
Thanks very much for that summary. I've taken a few notes but I have a few questions, based on some observations working with the Arduino IDE.

1) My understanding of a makefile is that it is just a neat way of passing parameters to the compiler? Is that correct?
2) Looking at my teensyduino installation, there appears to be an .ld file for the two chips in the Teensy 3.0 and 3.1 respectively. What purpose does this .ld file serve and how does it fit into the grand scheme of things? In addition, why can I not seem to find similar files for the various other chips supported by the Arduino IDE?

Thanks once again,

Iestyn.
 
Hi, trying to keep it short...

1) "make" is a dependency checker and interpreter that decides which steps are performed to transform source input into target results. Properly structured makefiles accelerate development by recognizing which dependencies have already been met and forcing a rebuild of targets whose dependencies have not been met. After a modification to a subset of the source files, this facility speeds up development significantly. Given that competent translation of source into target almost always includes a description of translation parameters and options, make is also good at allowing the developer to specify them.

2) The ".ld" files specify how the final linking step (the "ld" program, whose execution is often minimally visible as the last compilation step in producing an executable target) should arrange the various object sections in memory. This is particularly important in the microcontroller world, where memory tends to be a limited resource and has qualitatively different characteristics (flash vs RAM, fast vs slow, etc.).

I found a few files with a cursory search, there are likely others. In this case I used "lds" as the filename extension. Note the presence of the LINKER_SCRIPTS node in the path. :)

Code:
$ find . -name "*lds" -print
./firmwares/wifishield/wifiHD/src/SOFTWARE_FRAMEWORK/UTILS/LINKER_SCRIPTS/AT32UC3A/0512/GCC/link_uc3a0512.lds
./firmwares/wifishield/wifiHD/src/SOFTWARE_FRAMEWORK/UTILS/LINKER_SCRIPTS/AT32UC3A/1256/GCC/link_uc3a1256.lds
./firmwares/wifishield/wifi_dnld/src/SOFTWARE_FRAMEWORK/UTILS/LINKER_SCRIPTS/AT32UC3A/0512/GCC/link_uc3a0512.lds
./firmwares/wifishield/wifi_dnld/src/SOFTWARE_FRAMEWORK/UTILS/LINKER_SCRIPTS/AT32UC3A/1256/GCC/link_uc3a1256.lds
$
 
Ld scripts are commands to the linker to put things in certain locations, e.g.: put the code and read-only constants in read-only flash at this logical address; put read/write variables at this other logical address (and optionally put the initial copy of those variables at a different location in read-only flash, so that the initializer can copy the contents from read-only memory to read/write memory); create the section that needs to be zeroed at this logical address; create the interrupt vectors at this logical address; and so on. In the embedded environment, each different machine or variant likely has its own linker script to reflect the actual hardware. The documentation is at the binutils ld page: http://sourceware.org/binutils/docs-2.25/ld/index.html.
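
The placement decisions described above can be sketched as a small Python model. This is not linker script syntax and not what ld actually does internally; the memory origins, section names and sizes are hypothetical, and alignment is ignored to keep it short.

```python
# Hypothetical memory map; real values come from the chip's documentation.
FLASH_ORIGIN, RAM_ORIGIN = 0x00000000, 0x20000000

def layout(sizes):
    """Assign addresses the way a simple embedded linker script might.

    sizes maps section name -> size in bytes. Returns name -> (run, load)
    addresses. .data runs in RAM, but its initial image is loaded into flash.
    """
    flash, ram = FLASH_ORIGIN, RAM_ORIGIN
    placed = {}
    for name in (".vectors", ".text", ".rodata"):   # read-only: lives in flash
        placed[name] = (flash, flash)
        flash += sizes[name]
    placed[".data"] = (ram, flash)                  # run address in RAM, image in flash
    ram += sizes[".data"]
    flash += sizes[".data"]
    placed[".bss"] = (ram, None)                    # zeroed at startup, takes no flash
    return placed


sizes = {".vectors": 0x100, ".text": 0x2000, ".rodata": 0x300, ".data": 0x40, ".bss": 0x200}
for name, (run, load) in layout(sizes).items():
    load_text = "-" if load is None else f"0x{load:08X}"
    print(f"{name:9} run=0x{run:08X} load={load_text}")
```

The interesting entry is .data: it has two addresses, one where the program expects the variables at run time and one where the initial values sit in flash until the startup code copies them.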

The IDE doesn't really use the concept of make, i.e. rebuilding only the parts of the project that need rebuilding. It assumes you want to rebuild the entire project each time. In the hosted world, I have come across customers with such large scale applications that it takes over a full day to build from scratch (usually on the fastest machine available at the time, with the most memory), and they need the ability to rebuild only small portions for the engineers to do their work.
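
The core of the "only rebuild what changed" idea is a timestamp comparison. A minimal sketch in Python, assuming a target is stale when it is missing or older than any of its sources (the function name and file names are hypothetical, and real make handles much more, such as chains of dependencies):

```python
import os
import tempfile

def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "blink.c")      # hypothetical file names
    obj = os.path.join(tmp, "blink.o")
    open(src, "w").close()
    print(needs_rebuild(obj, [src]))        # object file does not exist yet

    open(obj, "w").close()
    os.utime(src, (1000, 1000))             # pretend the source is old
    os.utime(obj, (2000, 2000))             # ...and the object is newer
    print(needs_rebuild(obj, [src]))

    os.utime(src, (3000, 3000))             # "edit" the source
    print(needs_rebuild(obj, [src]))
```

The three calls cover the cases that matter: no object file yet, an up-to-date object file, and a source file modified after the object was built.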
 
The IDE doesn't really use the concept of make, i.e. rebuilding only the parts of the project that need rebuilding. It assumes you want to rebuild the entire project each time. ...

Assuming that by 'IDE' you mean Arduino; 1.6.3 is only rebuilding changed files for my latest project, quite clearly not rebuilding unchanged files till 'build options changed, rebuilding all' occurs (or Arduino is closed and re-opened on same file, first build is full for sure).
 
Ok, my builds are quick enough that I didn't notice the difference in build time the second time I did the build.
 
... but I have a few questions, based on some observations working with the Arduino IDE.

1) My understanding of a makefile is that it is just a neat way of passing parameters to the compiler? Is that correct?

Arduino does not use make. The sample makefile in hardware/teensy/avr/cores/teensy3 is merely an example, for people who wish to use make instead of Arduino. It's not used at all by Arduino.

Instead, Arduino has its own code which runs gcc, similar to make. The source code is here:

https://github.com/PaulStoffregen/A...re/src/processing/app/debug/Compiler.java#L88

It's configured by a pair of files, boards.txt and platform.txt. You can find some documentation on these files here:

https://github.com/arduino/Arduino/wiki/Arduino-IDE-1.5---3rd-party-Hardware-specification

Many of the finer details are only "documented" by the Java source code.
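
As a rough idea of how key=value configuration files of this kind can drive a compiler invocation, here is a Python sketch that parses properties and expands {placeholder} references in a recipe line. The key names follow the style described in the specification linked above, but the values are invented, the recipe is heavily trimmed, and the code is not taken from the Arduino source.

```python
import re

def parse_properties(text):
    """Parse 'key=value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def expand(pattern, props):
    """Replace {key} placeholders, repeating so nested references resolve."""
    def lookup(match):
        return props.get(match.group(1), match.group(0))  # leave unknown keys as-is
    for _ in range(10):   # bounded, so cyclic definitions cannot loop forever
        expanded = re.sub(r"\{([^{}]+)\}", lookup, pattern)
        if expanded == pattern:
            break
        pattern = expanded
    return pattern


# Hypothetical, heavily trimmed settings; not copied from any real file.
props = parse_properties("""
# toolchain
compiler.path=/opt/tools/bin/
compiler.c.cmd=arm-none-eabi-gcc
compiler.c.flags=-c -Os
recipe.c.o.pattern={compiler.path}{compiler.c.cmd} {compiler.c.flags} {source_file} -o {object_file}
""")
props.update(source_file="blink.c", object_file="blink.o")
print(expand(props["recipe.c.o.pattern"], props))
```

The result is an ordinary gcc-style command line, which is the sense in which these files play the role that a makefile's variables and rules would otherwise play.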

2) Looking at my teensyduino installation, there appears to be an .ld file for the two chips in the Teensy 3.0 and 3.1 respectively. What purpose does this .ld file serve and how does it fit into the grand scheme of things?

Linking is a step near the end where all the compiled object files are combined into the final executable code. Compiling and linking have been pretty standard practice since at least the 1970s, so you can find plenty of (perhaps very old) info about how the process works.

Linking can be very complex on normal operating systems like Windows, Mac OS X and Linux, where shared libraries are resolved to different addresses at runtime. Much of the generic info you might find about linking will not apply to the simpler case on Teensy and Arduino, where a single, fully static output image is generated.

The specific details for how the linker arranges the output vary quite a lot. The gcc linker uses these scripts to allow customizing all the details.

In addition, why can I not seem to find similar files for the various other chips supported by the Arduino IDE?

For AVR chips, they are "built in" to the toolchain. You can find them in hardware/tools/avr/avr/lib/ldscripts. Different extensions are used. I believe "avr5.x" may be the most commonly used by many boards.

The avr-gcc developers decided to create pre-made scripts for all AVR chips. As you can see from the number of files in hardware/tools/avr/avr/lib/ldscripts, that takes a LOT of work, for chips from just one manufacturer.

For ARM, the chips are made by MANY different companies, each of which licenses the ARM intellectual property to make its own chips. ARM has also existed for much longer than AVR. For ARM, the compiler developers do not provide ready-made linker scripts, so there is no "built in" directory like hardware/tools/avr/avr/lib/ldscripts for ARM chips. That's why you can see the .ld scripts in hardware/teensy/avr/cores/teensy3.
 