Assembly

Are there tools other than Arduino to program a Teensy board? The Arduino GCC compiler produces executable code which is far from being optimised. How can I use the GNU AS assembler configured for the ARM Cortex-M7?
 
You have 2 basic options. They are the same regardless of Arduino IDE versus another IDE like PlatformIO or just running the compiler from a makefile. Support for assembly is a feature of the underlying toolchain and it works quite well with Arduino IDE.

Inline assembly within .c or .cpp files is usually the "best" way, because it gives the best integration with the rest of the non-asm code. You can implement entire functions or just a portion of any function this way. The catch is that the special syntax for input, output, and clobber registers is a bit complicated.
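As a minimal sketch of that operand syntax (not code from the Teensy core), here is a saturating add built on the Cortex-M DSP instruction qadd. The function name saturating_add is hypothetical, and the portable fallback is only there so the same file also builds off-target.

```cpp
#include <cstdint>

// Saturating 32-bit add: clamps to INT32_MIN / INT32_MAX instead of wrapping.
// saturating_add is a hypothetical name chosen for this illustration.
static inline int32_t saturating_add(int32_t a, int32_t b) {
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t result;
    asm volatile(
        "qadd %0, %1, %2"      // %0 = saturate(%1 + %2)
        : "=r"(result)         // outputs: write-only, any general register
        : "r"(a), "r"(b)       // inputs: any general registers
        // a third ':' section would list clobbers such as "cc" or "memory";
        // it is omitted here because no clobbers are declared
    );
    return result;
#else
    // Portable fallback for builds that are not targeting an ARM DSP core.
    int64_t wide = static_cast<int64_t>(a) + b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return static_cast<int32_t>(wide);
#endif
}
```

The compiler picks the registers for %0, %1 and %2, which is what lets the asm fragment sit inside ordinary C++ without manual register bookkeeping.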

You can also create "pure" assembly .s files. Generally you would implement only entire functions. Symbols you export will be in the C namespace, so to access them from C++ code you would declare them with extern "C", the same as accessing non-static stuff from C code.
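A rough sketch of the C++ side of that arrangement follows. The file name add_u32.S and the function add_u32 are hypothetical; the assembly is shown only as a comment, and a stand-in definition is compiled when not targeting ARM so the example still links on a PC.

```cpp
#include <cstdint>

/* Hypothetical contents of add_u32.S (arguments arrive in r0 and r1,
   the result is returned in r0):

       .syntax unified
       .thumb
       .global add_u32
       .type   add_u32, %function
       .thumb_func
   add_u32:
       add r0, r0, r1
       bx  lr
*/

// extern "C" turns off C++ name mangling, so the linker looks for the
// plain symbol "add_u32" that the assembly file exports.
extern "C" uint32_t add_u32(uint32_t a, uint32_t b);

#if !defined(__arm__)
// Stand-in so this sketch links when built off-target, where the
// assembly file above is not part of the build.
extern "C" uint32_t add_u32(uint32_t a, uint32_t b) { return a + b; }
#endif

// Ordinary C++ code calls the assembly routine like any other function.
uint32_t add_three(uint32_t a, uint32_t b, uint32_t c) {
    return add_u32(add_u32(a, b), c);
}
```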

However, you can indeed get very well optimized code from the compiler, but with some extra work. Usually the process involves checking the generated assembly. Using Teensy from Arduino IDE, the compiler will create a .lst file with full disassembly of your code. It's written to a temporary folder, so in Arduino IDE you may need to use File > Preferences to turn on verbose output so you can see the compiler commands. Then look at the pathnames to learn the temp folder.

As you read the generated assembly, you'll see the result is sometimes quite inefficient. But you can also learn *why* it's inefficient. The most common problem is using more "live" local variables than the available CPU registers. When this happens, the compiler is forced to allocate local variables on the stack, which is much slower. In some other cases, you'll see it's using only a few of the registers, which is usually an opportunity to partially unroll a loop or otherwise restructure your code. It takes extra work to recompile and reload the generated .lst file each time, but it's still a lot less work than micro-managing every register and instruction by writing asm.
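To make the "partially unroll a loop" idea concrete, here is a small sketch with hypothetical function names. It sums a buffer with four independent accumulators so that more registers are kept busy. Whether this actually beats the simple loop depends on the compiler version and flags, so the .lst file is still the judge.

```cpp
#include <cstddef>
#include <cstdint>

// Straightforward version: one accumulator, one element per iteration.
int32_t sum_simple(const int16_t* data, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Partially unrolled version: four elements per iteration, each with
// its own accumulator, so the additions do not depend on each other.
int32_t sum_unrolled4(const int16_t* data, size_t n) {
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    // Handle the leftover 0..3 elements.
    for (; i < n; ++i) s0 += data[i];
    return s0 + s1 + s2 + s3;
}
```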

You'll also see the compiler often re-orders things in strange ways. It tries to take advantage of the M7's dual issue to execute 2 instructions in 1 clock cycle. The M7 has complex rules about data dependency. Maybe you know all that stuff very well and would compose your asm code that way from the beginning? I have personally written a lot of asm code on older microcontrollers and, at least to me, the ordering of instructions the compiler does almost never looks obvious. This too is a place where you (might) save a lot of time and end up with nearly the same result by letting the compiler generate the asm code, but check the .lst file after each compile and adjust the C / C++ source so it can generate more efficient code.

But if that's not appealing and you really want to write asm code yourself, you certainly can. It works fine with Arduino IDE. You mainly need to choose whether you'll compile .s files with only asm code, where you are limited to exporting global scope C namespace symbols, or use inline asm within .c / .cpp files, which can integrate much more tightly but comes with the complicated syntax needed to accomplish that integration.
 
Thanks for that prompt answer. I have been writing in assembly language for more than 50 years, on antique computers (starting with the French CII 10010 and the Univac 1108) and microprocessors (starting with the Motorola 6800). And also in Basic, Turbo Pascal and LabVIEW on PC computers with DOS and Windows. Now retired, I still make fun things with microcontrollers. I compared what can be done in C++ with Arduino on Microchip's ATtiny814 and in pure assembly with Microchip Studio. How great the difference in speed and compactness is! But the ATtiny is a low-end 8-bit microcontroller, and I am somewhat afraid of the Cortex-M7... So I am still hesitating... I will look at that next year. May it be a happy one for you!
 
As you get into using Teensy next year, I'd highly recommend focusing on ordinary C++ usage, with the knowledge that inline asm is there if you really want or need it.

The reality of these modern chips is that human-written assembly just isn't as huge a benefit as it was in the old days of 8 bit chips. For a specific example, check out this old thread where we optimized the number printing for Teensy LC. To cut to the chase, in msg #36 you can see the extremely crafty C code which turned out slightly faster than my best shot at optimized assembly! The slower asm code can still be found inside Print.cpp, if you want to see it (an example of the inline asm style). Of course that was on Cortex-M0. It's completely unnecessary for Cortex-M7, where we have a 32 bit hardware divide which the compiler uses to great effect.

We do have asm code in the audio library for memory-to-memory copy (an example of the .S file style) which de-interleaves stereo input data to a pair of mono outputs, so there are some cases where it really does help. But those are pretty rare. And even in that case, the C++ code worked pretty well before it was later replaced by asm (by Frank B) to use less CPU time doing the memory copy.
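For readers who have not met the term, de-interleaving in plain C++ looks roughly like the sketch below. This is not the audio library's actual code; the function name, the 16 bit sample type and the left-then-right sample order are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Split an interleaved stereo buffer (assumed order: L0 R0 L1 R1 ...)
// into two mono buffers. "frames" is the number of L/R pairs, so the
// input holds 2 * frames samples and each output holds frames samples.
void deinterleave_stereo(const int16_t* interleaved,
                         int16_t* left, int16_t* right,
                         size_t frames) {
    for (size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];      // even positions: left channel
        right[i] = interleaved[2 * i + 1];  // odd positions: right channel
    }
}
```

A loop like this is a typical candidate for hand tuning because it is pure load/store traffic that runs on every audio block.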
 
Hello Denis,
I read your post about wanting to use other environments to program the Teensy, and it resonated with a little of my experience while returning to the world of microcontrollers again after 30 years. I hope you don't mind the discussion/intrusion.

I don't have your background (well, I have to admit to a little Basic, C and Pascal!), so this may be a little off-course based on your question/request; however, like you, I wanted to do assembly on these contemporary chips. When I realised the structured overlay of modern chips, the complexity in hardware and the (often) RISC approach to the code base, I realised that the learning curve would be so steep as to become a long-winded end in itself, let alone make the machines "do anything".

It was a hard pill to swallow!! I resisted for a short time. Then, I saw the true value of the modern (free!!) tools after simply accepting that I needed to work at the next layer. The modern embedded compilers are amazing, and while I can't compete with your analysis of GCC optimisation, these are crazy fast gizmos, regardless. The Teensy (4.1) hammers!

In the early days with the Z80 and 8051, I used the C environment as a basic structure and would smarten up interrupts or some basic I/O with a dash of assembler. These days, with the PIC32 and the STM32, I realise there is little point in that approach anymore. Now, with the Teensy, it's even more of a contrast to the older Z80/8051 style chips. Since I dispensed with my yearning for low-level bare metal, most programs are easier to follow and debug - and it has been transformational, for me at least.

I have to admit that I hesitated with Teensy for about a year, with work and such taking much of my hobby time away. I also saw that the main programming environment for Teensy is "Arduino", which initially grated a bit. Historically, I have had a few irritations with the Arduino enclave in that it tended (purposely) to significantly abstract elements of the programming environment, the processor and I/O etc. before I was happy that I understood the underlying detail.

I wanted to have the programming reference on one hand and write some register addressing peripheral config on the other (old school).

Sometimes, the details matter, and I feel I cannot easily see them in the Arduino environment. However, the STM32 series with the Eclipse IDE with HAL layers (for better or worse) enabled me to track down all the (untidy!!) interrupt call-back functions, discover the ideas behind them, and model the processor and each peripheral both visually and in code. Arduino is not that.

Otherwise, Arduino gets you going superbly quickly, which is excellent as a starter - "following the bouncing ball". As a temporary minus, the last time I looked, I could not work out how to build a suitably modular program using it. With the later versions of Arduino IDE, that has become a non-issue. These days I can bake in C++ or bare-bones C with the rest of the supplied libraries and have the modules linked as required. It's great!

Regarding optimisation, I sense there is a definite balance between hard optimisation and simply being able to produce stable, workable code. With the Teensy 600MHz clock, optimised instruction pipelines, close-coupled caches and modern processor design, the burden of code level optimisation pretty much evaporates - unless that's actually your hobby - to find the most efficient code. In which case, you're right; you will have to see how structures are coded and called at that level.

I hope you don't mind the intrusion, Denis, or my relating my own experience here - your thoughts triggered a few little ideas that I hope it's OK to share. For my own concerns, these modern environments are unimaginably powerful compared with 30 years ago. Happy to hear your thoughts. :)
Kind Regards,
Steve
 
Steve, it is not an intrusion... I appreciate your comments, following Paul's... OK, I surrender and will program in C++! The project is a real-time fast Fourier transform (8192 points after windowing and "zero padding", four times per second) for continuous high-precision audio frequency analysis.
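The windowing and zero-padding step mentioned above could be sketched as follows. The post does not say which window or input length is used, so the Hann window, the function name window_and_pad and the use of std::vector are all assumptions here; the result would then be handed to whatever FFT routine is chosen.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply the input samples by a Hann window, then append zeros until
// the buffer holds fft_size points (8192 in the project described).
// Input longer than fft_size is truncated.
std::vector<float> window_and_pad(const std::vector<float>& samples,
                                  std::size_t fft_size) {
    const float kTwoPi = 6.28318530717958648f;
    std::vector<float> out(fft_size, 0.0f);  // padding is already zero
    const std::size_t n =
        samples.size() < fft_size ? samples.size() : fft_size;
    for (std::size_t i = 0; i < n; ++i) {
        // Hann window: 0 at both ends, 1 in the middle.
        const float w = (n > 1)
            ? 0.5f * (1.0f - std::cos(kTwoPi * i / (n - 1)))
            : 1.0f;
        out[i] = samples[i] * w;
    }
    return out;
}
```

In a real-time loop one would more likely precompute the window once and reuse fixed buffers instead of allocating a vector on every call.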
 
Thanks, Denis. Can you imagine being able to do that FFT on an old Z80/8051 running at 4 MHz? My mind is still boggling. I also remember spending way too much money on a co-processor for my first 8080 PC, which had the full complement of 640K RAM and a 10 Mbyte hard drive. "Will I ever fill that up?" I asked. (cough!!)

Here we have, for a handful of dollars, a double-precision FP machine doing what used to be the preserve of dedicated DSP chips. (Now that's another marvel right there; that whole DSP subject is so amazing.) One passing Q: Do you need to do floating-point FFTs? This is just a quick query. Fixed point 32 bits will still be pretty amazing, but I guess it's all about the use case.

Nice to know what you will be doing. I also want to try that FFT on Teensy!! 12 years ago, I rewrote some Turbo Pascal FFT source into Delphi and built a spectrum analyser for speaker testing on my old WinXP PC. Of course, you can buy the whole (proper!) commercial package for that purpose for not much money. But, I did learn a lot about dynamic memory management and classes and Windows threads and signal processing from the sound card and such for the effort. I understand your purpose in asking Paul about the above.

I still get mystified questions from friends like - "Why do you want to do that when you can buy it ready-made?" Many people do not understand my answer: "Simply because I want to know how it works." And that is enough. It's enough for me, even at nearly 70. My workshop is full of half-started software and hardware projects, and I don't care, as long as I'm discovering interesting stuff. And this ..... this is really interesting. :) Give the C++ on Teensy stuff a go, see how it runs and then pull it apart. There is always good stuff to learn. These people are a goldmine of sound judgment and worthy ideas.

Kind regards,
Steve
 
Are there tools other than Arduino to program a Teensy board? The Arduino GCC compiler produces executable code which is far from being optimised. How can I use the GNU AS assembler configured for the ARM Cortex-M7?
It isn't hard to use assembly but it is rarely worth the effort. The compilers do a good job.

The trick with writing better assembly language code is to first figure out if the effort is worth it. Is it in some critical bit of code that has severe time constraints? Then possibly. Otherwise probably not.

Or you could write an entire project in assembly. I have done that.
 
I feel like such a babe among sages at only 65. I thoroughly enjoyed 36 years of programming before retiring. Started with assembler (one of my co-workers at the time called it "cave code") on 8085s, 8088s, 8086s, 80186s, 6502s, & 6805s. Progressed to PL/M (which I sarcastically called a "high-order language...about this high" while pinching my forefinger & thumb together) on the Intel micros, then eventually primarily C with a dash of C++ for the majority of my career on PC-compatible hardware boards. All of my work involved real-time communications of one sort or another.

Mark J Culross
KD5RXT
 