Teensyduino 1.62 Beta #1 - Toolchain Update

Paul

Here is a first beta test for Teensyduino 1.62.

Arduino 2.3.x, for Windows, Linux X86_64, MacOS X86_64, MacOS ARM64 (Apple Silicon). Copy this URL to Arduino IDE File > Preferences (or Arduino IDE > Settings on MacOS).

https://www.pjrc.com/teensy/td_162-beta1/package_teensy_0.62.1_index.json

Use Boards Manager to install Teensy version 0.62.1.
(To refresh versions, press Shift-Ctrl-P and click "Arduino: Update Package Index".)

Arduino 1.8.x, Linux 64 bit: https://www.pjrc.com/teensy/td_162-beta1/TeensyduinoInstall.linux64

Arduino 1.8.x, Windows: https://www.pjrc.com/teensy/td_162-beta1/TeensyduinoInstall.exe

No ARMHF or AARCH64 for Raspberry Pi or other SBC yet. I will add the toolchain for AARCH64 soon.

Whether to put that work into ARMHF to run on 32 bit Raspberry Pi is undecided. If anyone feels strongly about 32 bit Raspberry Pi, now is the time to speak up!


The only major change since Teensyduino 1.61 is updating the gcc toolchain from 11.3 to 15.2.

The Wire library was edited to fix a compile error, and boards.txt was edited to suppress a linker warning.
 
This is the first version for MacOS that should be 100% native Apple Silicon. But my M3 MacBook Air has Rosetta 2 installed, so I haven't been able to fully confirm. If you have an M-series Mac that definitely does not have Rosetta installed, please give this a try. Tomorrow I will try to delete Rosetta 2... but it's not so easy to get rid of it.

The new gcc 15.2 toolchain gives a lot more compiler warnings, as we've seen with every prior toolchain update. In particular there are warnings about write() virtual override, which I'm pretty sure come down to the difference in pointer types (const uint8_t *, const void *, const char *) between libraries and the core library.

I also did some quick tests for 100% static init on critical C++ classes. Serial, Wire, and HardwareSerial look good. But SPI probably needs some careful attention.
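One way to get (and check) static initialization in C++17 is to give a class a constexpr constructor that only stores constants. A namespace-scope object built from constant arguments is then constant-initialized: it is placed directly in .data, no constructor runs at startup, and construction order between source files stops mattering. This is only a sketch; FakeUart, its address and its baud value are hypothetical, not the real Teensy HardwareSerial.

```cpp
#include <cstdint>

// Hypothetical UART-like class. The constexpr constructor only stores
// constants, so a global built from constant arguments is constant-initialized.
class FakeUart {
public:
    constexpr FakeUart(std::uintptr_t base, std::uint32_t defaultBaud)
        : base_(base), baud_(defaultBaud) {}
    constexpr std::uintptr_t base() const { return base_; }
    std::uint32_t baud() const { return baud_; }
    void begin(std::uint32_t baud) { baud_ = baud; }  // runtime setup happens later

private:
    std::uintptr_t base_;
    std::uint32_t baud_;
};

// Compile-time probe: this only compiles if the constructor can run
// during constant evaluation. 0x40000000 is an arbitrary example address.
constexpr FakeUart kProbe(0x40000000u, 115200u);
static_assert(kProbe.base() == 0x40000000u, "constructor must be usable at compile time");

// The real global is non-const (begin() modifies it) but is still
// constant-initialized, because its constructor and arguments are constant.
FakeUart uart1(0x40000000u, 115200u);
```

If the build moves to C++20 later, declaring the global as `constinit FakeUart uart1(...)` makes the compiler reject it outright when it is not constant-initialized.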

I have mostly tested on Arduino 2.3.8 for all 4 platforms, and Arduino 1.8.19 on Linux. Haven't touched Arduino 1.8.19 on Windows. Hopefully it is ok?
 
Installed on Win 11 with IDE 1.8.19: no problem.

When building, the write() warning shows up for PrintFile, and nothing else:
Code:
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\SdFat\src/SdFat.h:495:23:   required from here
  495 | class SdFile : public PrintFile<SdBaseFile> {
      |                       ^~~~~~~~~~~~~~~~~~~~~
T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\cores\teensy4/Print.h:61:24: warning: 'virtual size_t Print::write(const uint8_t*, size_t)' was hidden [-Woverloaded-virtual=]
   61 |         virtual size_t write(const uint8_t *buffer, size_t size);
      |                        ^~~~~

Uses a few libraries - builds and seems fine:
Code:
Using library ST7735_t3-dev-big-screen-t4 at version 1.2.1 in folder: T:\T_Drive\tCode\libraries\ST7735_t3-dev-big-screen-t4
Using library SPI at version 1.0 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\SPI
Using library ILI9341_fonts at version 1.0 in folder: T:\T_Drive\tCode\libraries\ILI9341_fonts
Using library Adafruit_FT6206_Library at version 1.1.0 in folder: T:\T_Drive\tCode\libraries\Adafruit_FT6206_Library
Using library Adafruit_BusIO at version 1.17.1 in folder: T:\T_Drive\tCode\libraries\Adafruit_BusIO
Using library Wire at version 1.0 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\Wire
Using library TeensyUserInterface at version 1.3.1 in folder: T:\T_Drive\tCode\libraries\TeensyUserInterface
Using library Audio at version 1.3 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\Audio
Using library SD at version 2.0.0 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\SD
Using library SdFat at version 2.1.2 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\SdFat
Using library SerialFlash at version 0.5 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\SerialFlash
Using library Bounce2 at version 2.55 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\Bounce2
Using library LittleFS at version 1.0.0 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\LittleFS
Using library USBHost_t36 at version 0.2 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\USBHost_t36
Using library EEPROM at version 2.0 in folder: T:\T_Drive\arduino-1.8.19\hardware\teensy\avr\libraries\EEPROM
Using library QNEthernet_WIFI at version 0.36.0-snapshot in folder: T:\T_Drive\tCode\libraries\QNEthernet_WIFI
 
Fresh copy from: https://github.com/PaulStoffregen/teensy41_psram_memtest/blob/master/teensy41_psram_memtest.ino
How does this work? 32 MB tests just 19 secs slower than 16 MB? But 16 MB takes 2X longer than 8 MB.

Code:
EXTMEM Memory Test, 32 Mbyte
 CCM_CBCMR=95AE8304 (105.6 MHz)
testing with fixed pattern 5A698421
...
testing with fixed pattern 00000000
 test ran for 118.39 seconds
All memory tests passed :-)


Code:
EXTMEM Memory Test, 16 Mbyte
 CCM_CBCMR=95AE8304 (105.6 MHz)
testing with fixed pattern 5A698421
...
testing with fixed pattern 00000000
 test ran for 59.19 seconds
All memory tests passed :-)


Code:
EXTMEM Memory Test, 8 Mbyte
 CCM_CBCMR=95AE8304 (105.6 MHz)
testing with fixed pattern 5A698421
...
testing with fixed pattern 00000000
 test ran for 29.60 seconds
All memory tests passed :-)


 
Note that this test isn’t probing enough if FlexSPI pre-fetch is enabled and 16MB parts are fitted. I have a PR in to fix this issue.
How does this work? 32 MB tests just 19 secs slower than 16 MB? But 16 MB takes 2X longer than 8 MB.
Looks correct to me - 118, 59 and 29 seconds. :unsure:
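A quick check of the three posted runs: each takes about 3.7 seconds per MB, so the time scales linearly with size, and going from 16 MB to 32 MB adds about 59 seconds (118.39 - 59.19 = 59.20). The small helper below just does that arithmetic on the numbers from this thread; the names are hypothetical.

```cpp
// Measured run times from the three EXTMEM memtest outputs in this thread.
struct MemtestRun {
    double megabytes;
    double seconds;
};

constexpr MemtestRun kRuns[] = {
    {8.0, 29.60},
    {16.0, 59.19},
    {32.0, 118.39},
};

// If test time is proportional to memory size, this ratio stays constant.
constexpr double secondsPerMegabyte(const MemtestRun &run) {
    return run.seconds / run.megabytes;
}

// Extra time needed when going from run a to the larger run b.
constexpr double extraSeconds(const MemtestRun &a, const MemtestRun &b) {
    return b.seconds - a.seconds;
}
```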
I managed to remove Rosetta 2 from my M3 MacBook Air. Then I discovered Arduino doesn't seem to have their house fully in order for future MacOS. They're tracking it with this GitHub issue.
That’s the behaviour I saw which prompted this post. Good to know they’re tracking it, and have been for a couple of years now. No doubt a hasty bodge will get applied some time after macOS 27 is released.
 
Just downloaded 1.62.1 and tried just one of my graphics OpenGL cases. It seems to be throwing a lot more warnings like:
Code:
...is used uninitialized [-Wuninitialized]

These come from the ST7735 library.

And also one warning that I haven't seen before:
Code:
d:\Users\Merli\Documents\Arduino\libraries\TeensyOpenGL\TeensyGL.cpp: In member function 'void Teensy_OpenGL::glEnd()':
d:\Users\Merli\Documents\Arduino\libraries\TeensyOpenGL\TeensyGL.cpp:701:50: warning: the address of 'Teensy_OpenGL::draw_order' will never be NULL [-Waddress]
  701 |                                 uint16_t order = draw_order? draw_order[quad]:quad;  //face to draw
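Assuming draw_order is declared as an array (the warning text suggests it is, since gcc says its address can never be NULL), the `draw_order ? ... : quad` test always takes the first branch, so the check is dead code. The sketch below shows the two usual ways out; MeshWithArray and MeshWithPointer are hypothetical, not TeensyOpenGL's real classes.

```cpp
#include <cstdint>

// With an array member, a null check can never fail, so it is simply dropped.
struct MeshWithArray {
    std::uint16_t draw_order[4] = {3, 2, 1, 0};
    std::uint16_t faceFor(std::uint16_t quad) const {
        return draw_order[quad];
    }
};

// If "no draw order" is a real case, a pointer member expresses it,
// and the null check becomes meaningful (and -Waddress stays quiet).
struct MeshWithPointer {
    const std::uint16_t *draw_order = nullptr;
    std::uint16_t faceFor(std::uint16_t quad) const {
        return draw_order ? draw_order[quad] : quad;
    }
};
```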
 
That’s the behaviour I saw which prompted this post. Good to know they’re tracking it, and have been for a couple of years now. No doubt a hasty bodge will get applied some time after macOS 27 is released.

After writing msg #5, I played with the ctags code. Found a simple way to get it to compile on modern MacOS with only a minimal edit. Posted that info on their GitHub issue, with screenshots and a copy of the compiled ctags file.

I found conflicting info about Apple's plans to drop Rosetta 2. Some sites say MacOS 27, which ought to be only a few months away; others are saying MacOS 28. Whenever it comes, I'm feeling pretty good we're now on the right path. Hopefully the ctags info I wrote up will help Arduino too.


Note that this test isn’t probing enough if FlexSPI pre-fetch is enabled and 16MB parts are fitted. I have a PR in to fix this issue.

Yeah, can confirm FlexSPI2 prefetch is on my very long list of low priority stuff. I know that's not a very satisfying answer, but this toolchain update has also been a pretty low priority and now it's finally happening.
 
Note that this test isn’t probing enough if FlexSPI pre-fetch is enabled and 16MB parts are fitted. I have a PR in to fix this issue.
Looks correct to me - 118, 59 and 29 seconds. :unsure:
If the memory is written and read from end to front, does that defeat the cache read-ahead?

I flipped the for loops to run from the end to the beginning over the dword count. That added some time for the extra count increment.

Code:
//  for (p = memory_begin; p < memory_end; p++) {   // original: first dword to last
  for (ii = 0, p = memory_end-1; ii < memory_size; p--,ii++) {   // reversed: last dword to first, ii counts dwords

EXTMEM Memory Test, 32 Mbyte
test ran for 136.87 seconds
All memory tests passed :)

EXTMEM Memory Test, 16 Mbyte
test ran for 68.46 seconds
All memory tests passed :)

EXTMEM Memory Test, 8 Mbyte
test ran for 34.21 seconds
All memory tests passed :)
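A host-side sketch of the same reversed idea, using an index that counts down so no pointer is ever formed before the first element (the p-- version above steps p one element before memory_begin on its final iteration, which is technically undefined in C++, though unlikely to matter here). checkFixedPatternReversed() is hypothetical, and the cache flush the real test needs between the write and read passes (the Teensy 4 core provides arm_dcache_flush_delete()) is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Write a fixed pattern from the last 32-bit word down to the first,
// then read it back in the same reversed order.
// Returns false on the first mismatch.
bool checkFixedPatternReversed(volatile std::uint32_t *words, std::size_t count,
                               std::uint32_t pattern) {
    for (std::size_t i = count; i-- > 0;) {
        words[i] = pattern;
    }
    // On real EXTMEM, flush/invalidate the data cache here so the reads
    // below actually come from PSRAM rather than from the cache.
    for (std::size_t i = count; i-- > 0;) {
        if (words[i] != pattern) return false;
    }
    return true;
}
```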
 
As of Teensyduino 1.61, FlexSPI pre-fetch isn’t enabled, so no issues will arise with any test. There’s a PR in to enable it, which gives a useful speed increase but can result in memory errors if a pre-fetch access occurs across a page boundary. The 8MB parts seem unaffected, but the 16MB ones are.
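The failure condition described here can be made concrete with a small helper: a prefetch burst crosses a page boundary when its offset within the page plus its length exceeds the page size. This is a sketch; the function name is hypothetical, and the page size is a parameter because it depends on the PSRAM part, so take the real value from the chip's datasheet (1024 in the test below is only an example).

```cpp
#include <cstdint>

// True if a burst of 'length' bytes starting at 'address' spans more than
// one page of 'pageSize' bytes. A zero-length access never crosses.
constexpr bool crossesPageBoundary(std::uint32_t address, std::uint32_t length,
                                   std::uint32_t pageSize) {
    return length != 0 && (address % pageSize) + length > pageSize;
}
```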

Your reversed-direction test probably would defeat various silicon optimisations, and maybe make the test more likely to pass. But we kinda don’t want that - we want to push it to the point of failure, but not beyond…
 
Should mention that Norton is having a fit over this...

Should only take a few hours....

Oops, meant to put this in:

I have now added the whole teensy areas of Arduino15 to Norton's don't-scan list, so now it builds.

Not sure if you picked up the FlexIO changes or not; should probably look...

For those of us who are used to programming C++ back in the stone age, what new things does the updated toolchain provide?

Should mention, my MTP picture viewer is working. I uploaded a couple of new pictures, renamed them and displayed them...

Getting pretty good using the paws
 
Yes, gcc 15.2 brings the possibility to migrate to C++20. But it won't happen soon. As with migrating from C++11 to C++14 and C++14 to C++17, we move to the new toolchain first but remain on the same C++ dialect for at least a couple of releases, maybe longer depending on how much changes with the core library and some of the main libraries (Wire, SPI, SD, SdFat, etc).
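Because the toolchain and the dialect move separately, library code that wants to use C++20 features only when the build enables them can test `__cplusplus`. A sketch; the function name is hypothetical, while the thresholds are the standard values (201402L, 201703L, 202002L).

```cpp
// Report which C++ dialect the compiler is building with.
// Library code can use the same #if checks to guard C++20-only features.
constexpr const char *cppDialect() {
#if __cplusplus >= 202002L
    return "C++20 or newer";
#elif __cplusplus >= 201703L
    return "C++17";
#elif __cplusplus >= 201402L
    return "C++14";
#else
    return "C++11 or older";
#endif
}
```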

The update to gcc 15.2 also brings support for newer chips. Like language dialects, we won't be using this immediately. But the general idea is to migrate the toolchain well ahead, so future hardware can build on a mature ecosystem of libraries and programs well tested with the toolchain the new hardware requires.

MacOS changes anticipated in September 2027 are also a motivation. Apple is expected to remove Rosetta 2, which is how the gcc 11.3 toolchain runs on Apple Silicon Macs. Like language dialects and future hardware, my goal is to prepare well in advance.
 
I'm hoping for Windows feedback, as that's the platform I personally use the least. My main question today is whether the 32 bit vs 64 bit toolchain makes a noticeable difference.

This beta has the 32 bit toolchain, which should be compatible back to ancient Windows versions. To get the 64 bit compiler, here is a direct link to arm-gnu-toolchain-15.2.rel1-mingw-w64-x86_64-arm-none-eabi.zip at ARM's download site. To use it, navigate to {AppData}\Arduino15\packages\teensy\tools\teensy-compile\15.2.1\arm. In this directory you should see 5 subdirectories: "arm-none-eabi", "bin", "include", "lib" and "libexec". Those are the 32 bit toolchain. When you extract the ZIP archive, you should get those same 5 folders, plus a "share" folder and a "manifest.txt" file. Those last 2 are just documentation. Move or delete the 5 original folders, then move the 5 folders from the ZIP file into place.
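If you want to sanity-check the swap, a few lines of C++17 `std::filesystem` can confirm that the arm folder (for example {AppData}\Arduino15\packages\teensy\tools\teensy-compile\15.2.1\arm) contains the 5 expected subfolders. A sketch; hasToolchainLayout() is a hypothetical helper, and it only checks the folder layout, not the files inside them.

```cpp
#include <array>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: true if 'armDir' contains the five toolchain
// subfolders listed above.
bool hasToolchainLayout(const fs::path &armDir) {
    static const std::array<const char *, 5> kSubdirs = {
        "arm-none-eabi", "bin", "include", "lib", "libexec"};
    for (const char *name : kSubdirs) {
        if (!fs::is_directory(armDir / name)) return false;
    }
    return true;
}
```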

Probably best to restart the Arduino IDE afterwards. Maybe also delete wherever compiled files get cached, but I honestly don't know where that is on Windows.

In theory you should get exactly the same compiled result for Teensy. But the time taken and memory used on your PC during compile might be different? Or maybe they'll be too similar to really notice?

If they are different, the question becomes whether I should dump the 32 bit version and always install the 64 bit toolchain. Arduino IDE / CLI can theoretically have both, and Arduino CLI is supposed to automatically choose which to download. How well that actually works, I'm not sure. So far nearly everything I've heard about version selection is for MacOS choosing between Intel and Apple Silicon.
 
From a compiler point of view, the answer is it depends.

32-bit programs tend to be faster in terms of cache behavior.
  • Oftentimes on x86, the 32-bit code is smaller because the original i386 instructions are small, and when the 64-bit instructions were added, they were often larger because an instruction modifier (the REX prefix) was needed. Smaller instruction size means the i-cache is more effective.
  • The 32-bit calling sequence uses push/pop instructions to build up the stack, while the 64-bit side uses normal stores and loads. These push/pop instructions are smaller than explicit stores/loads. In addition, most x86 processors have special hardware optimizations for push/pop instructions (in part because a push/pop instruction does both a store or load and also updates the stack pointer).
  • A real minor effect on code size is that switch statements and similar things use 64-bit elements to get the address to jump to on 64-bit environments, and 32-bit elements in a 32-bit environment.
  • In 64-bit environments, pointers are 64 bits instead of 32 bits, so the data touched is bigger and the d-cache holds less of it, making it less effective.
  • Due to 64-bit types, there are often unused holes in data structures and on the stack because of alignment constraints, which also means the d-cache is slightly less effective.
  • I don't recall the details, but when I worked at AMD, there were a few SPEC CPU2006 benchmarks that ran faster in 32-bit mode, due to cache issues.
Depending on how the application is compiled, on 32-bit systems it may use the original 80-bit i387 floating-point stack, or it may use the 32-bit/64-bit floating-point registers in the SSE unit. 64-bit systems generally use the SSE unit unless the programmer uses 'long double'. In general, the 80-bit i387 unit is slower than the SSE floating-point support for normal 32/64-bit floating point (though the instructions that access the stack are smaller than loads/stores).

On the other hand, in 64-bit environments, since the program knows it has a large address space, it can hold everything in memory, using simple address modes. In 32-bit environments, some programs that want to process large amounts of data need to maintain their own secondary disk-based caches, which take more cycles to execute.
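A small illustration of the pointer-size and alignment points in the list above: a node holding a 1-byte tag and a pointer gets padding inserted so the pointer is aligned. On common 32-bit x86 ABIs that is usually 3 bytes (an 8-byte struct); on x86-64 it is usually 7 bytes (a 16-byte struct). Node is just an example type.

```cpp
#include <cstddef>

// A typical linked-list node: a 1-byte tag followed by a pointer.
struct Node {
    char tag;
    Node *next;
};

// Bytes of padding the compiler inserts so 'next' is correctly aligned.
constexpr std::size_t nodePadding() {
    return sizeof(Node) - sizeof(char) - sizeof(Node *);
}
```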
 
  • A real minor effect on code size is that switch statements and similar things use 64-bit elements to get the address to jump to on 64-bit environments, and 32-bit elements in a 32-bit environment.
That's generally not true any more: due to requirements for position-independent code/ASLR support, 64-bit programs use 32-bit displacements relative to the switch's jump instruction.
 
That's generally not true any more due to requirements for position-independent code/ASLR support, 64-bit programs use 32-bit displacements relative to the switch's jump instruction.
Using default options, I just built a switch test program using GCC 15 on my Fedora system, and it uses native addresses (using .quad for 64-bit and .long for 32-bit). Of course this is for a Linux system. Perhaps the defaults are different for Windows systems.

Sure, if I build position-independent code that goes into a shared library using -fpic, then it uses relative addresses and 32-bit differences (using switch-label - case-label for 64-bit and case-label@GOTOFF for 32-bit).
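To reproduce the comparison, a dense switch like the one below is a typical jump-table candidate. Compile it with `g++ -O2 -S`, then again adding `-fpic`, and compare the table entries in the generated .s file. This is a sketch: dispatch() is a hypothetical test function, and whether gcc builds a table at all (and how it encodes the entries) depends on the target, the gcc version and whether PIE is the default.

```cpp
// Eight dense cases: a likely candidate for a jump table at -O2.
int dispatch(int op, int x) {
    switch (op) {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 2;
    case 3: return x / 2;
    case 4: return -x;
    case 5: return x * x;
    case 6: return x * 3;
    case 7: return x % 7;
    default: return 0;
    }
}
```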
 
What I'm getting from this is that either might be faster, for various reasons.

I tried a quick test on my Windows 11 laptop, which has Intel Core Ultra 7 165U processor and 16GB RAM. I used a kitchen timer to measure how long compiling the Audio WavFilePlayer takes.

32 bit toolchain took 31 seconds
64 bit toolchain took 27 seconds.

In both tests, I restarted the Arduino IDE and did a first compile without timing. I waited for the Arduino IDE to finish rebuilding its index. Then I used Windows File Explorer to delete {AppData}/Local/arduino, which seems to be where prior compiles get cached. Then I clicked Verify and pressed the timer's start button, and pressed it again to stop when I saw the memory usage info appear.
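For a more repeatable alternative to a kitchen timer, a small std::chrono harness can time a command-line compile. A sketch, assuming arduino-cli is installed and on the PATH; timeSeconds() and timeCommand() are hypothetical helpers, and a CLI compile may not take exactly the same time as clicking Verify in the IDE.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Run any callable and return the elapsed wall-clock time in seconds,
// using a monotonic clock so system clock changes don't affect it.
template <typename F>
double timeSeconds(F &&work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Time a shell command. Returns -1.0 if the command reported failure.
double timeCommand(const std::string &command) {
    int status = 0;
    const double secs = timeSeconds([&] { status = std::system(command.c_str()); });
    return status == 0 ? secs : -1.0;
}
```

For example, `timeCommand("arduino-cli compile --fqbn teensy:avr:teensy41 WavFilePlayer")`, where the Teensy 4.1 board and the sketch path are assumptions; delete the cache as described above before each timed run.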

If anyone else wants to try, my instructions in msg #17 were missing a step. The ARM math library .a files need to be copied into the toolchain's arm-none-eabi/lib folder. I just copied those 4 files from the 32 bit version into that same folder on the 64 bit version.
 
Two more quick tests compiling Audio WavFilePlayer with the new 15.2 toolchain after deleting all cached data.

14 seconds on MacBook Air (M3, 16GB RAM)
25 seconds on MacBook Pro (Intel Core i9 2.4 GHz, 2019 model, 64GB RAM)
7 seconds on Linux Desktop (AMD 9950X, 96GB RAM)

I am kinda curious how a Windows desktop with fast CPU and lots of RAM compares?
 
curious how a Windows desktop with fast CPU and lots of RAM compares?
Any chance you could assemble an IDE 1.8.19 Windows x64 Beta build with the right things installed in the right places?
I built a 2023 i7-13700K with 64GB DDR4 RAM and only SSDs; involved first builds seem slow, especially in IDE 2.

AI says: starting with Windows 10 version 2004 (2020), new PCs ship with 64 bit Windows only, and Windows 11 is 64 bit only; 64 bit has been an option since Windows XP Professional x64 Edition in 2005.
 
I'll package up beta2 shortly, with both 32 and 64 bit toolchain builds for Windows running Arduino IDE 2.3.x.

I'm making new copies of the toolchain for all platforms, with some of the utilities we never use deleted to trim the download size by a few MB. I also worked on teensy_size to use JSON format, so the info can appear normally as white text in IDE 2.3.x, not alarming red.
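For reference, a size tool can hand Arduino IDE 2.x structured output through the platform's recipe.advanced_size.pattern, which (per my reading of the Arduino platform specification) expects a JSON object with "output", "severity" ("info", "warning" or "error") and a "sections" array of name/size/max_size entries. The sketch below builds such an object; it is not teensy_size's actual code, and the field names should be checked against the specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// One memory region reported by the size tool.
struct SizeSection {
    std::string name;
    std::uint64_t size;
    std::uint64_t maxSize;
};

// Escape the characters JSON requires inside a string literal.
std::string jsonEscape(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '"': out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (static_cast<unsigned char>(c) < 0x20) {
                char buf[8];  // other control characters become \u00XX
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(static_cast<unsigned char>(c)));
                out += buf;
            } else {
                out += c;
            }
        }
    }
    return out;
}

// Build the JSON object: "output" is the text shown to the user.
std::string sizeReportJson(const std::string &output, const std::string &severity,
                           const std::vector<SizeSection> &sections) {
    std::ostringstream json;
    json << "{\"output\": \"" << jsonEscape(output) << "\", "
         << "\"severity\": \"" << jsonEscape(severity) << "\", "
         << "\"sections\": [";
    for (std::size_t i = 0; i < sections.size(); ++i) {
        if (i) json << ", ";
        json << "{\"name\": \"" << jsonEscape(sections[i].name) << "\", "
             << "\"size\": " << sections[i].size << ", "
             << "\"max_size\": " << sections[i].maxSize << "}";
    }
    json << "]}";
    return json.str();
}
```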

There will be no Windows 64 bit support for old Arduino IDE 1.8.x.
 
There will be no Windows 64 bit support for old Arduino IDE 1.8.x.
That’s disappointing.

I have two portable and one standard installations of 1.8.19 on one of my main dev machines. I greatly prefer those over 2.x, for which Arduino seem to have taken a Microsoft attitude of “let’s do the stuff we want, break a bunch of useful features, and ignore users’ pleas to fix them”.

Yes, that machine is still running Windows 10, because I have it set up to run as I want. Can’t do it with W11…
 