Teensyduino 1.36 Beta #1 (ARM Toolchain Update)

Status
Not open for further replies.

Paul

Here is a first beta test for Teensyduino 1.36.


Old beta download links removed. Please use the latest version:
https://www.pjrc.com/teensy/td_download.html


The only change since 1.35 is switching to the gcc 5.4 toolchain, which was briefly attempted with 1.34-beta1 before an Arduino release forced reverting to the old toolchain. With any luck we'll have time to test thoroughly.

The LTO (link time optimization) and "Fastest" (-O3 optimization) options may break some programs or libraries. Please report anything you find that doesn't work.
 
I saw this used elsewhere: could __attribute__((optimize("O1"))) be a workable way to control optimization locally? On 1.34b1 I saw that Zilch had an issue, though it may have come from the LTO pass after the compile. I decorated a couple of functions in a simple sketch, and it compiles, changes the code size, and works . . . not tested on a known failing case . . .

Code:
// Only foo() is built at -O3; "..." stands in for the real parameter list.
__attribute__((optimize("O3"))) void foo(  ... )
{
}
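For reference, here is a minimal host-side sketch of the two GCC mechanisms for overriding the optimization level locally: the per-function `optimize` attribute and the `push_options`/`optimize`/`pop_options` pragmas. The function names and bodies are hypothetical placeholders. Current GCC documentation describes the attribute as intended for debugging rather than production code, and other compilers may warn about and ignore both forms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: this one function is compiled at -O1,
// whatever -O level the rest of the sketch is built with (GCC-specific).
__attribute__((optimize("O1")))
uint32_t checksum_o1(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}

// The same kind of override applied to a whole region of code with pragmas.
// push_options/pop_options restore the global settings afterwards.
#pragma GCC push_options
#pragma GCC optimize("O3")
uint32_t checksum_o3(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}
#pragma GCC pop_options
```

Both functions must return the same result; only the generated code differs, so comparing the disassembly is the reliable way to see whether the override took effect.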
 
Here's a list of the problems discovered with 1.34-beta1.


Libraries which might have compile errors:

  • Adafruit_CC3000 buildtest example
  • ks0108 error compiling
  • LowPower fails on all boards, even Teensy 2.0
  • PS2Keyboard errors
  • ST7565 error, C++ overload on srandom()
 
I'll look at delayMicroseconds() after my 1.8.1 and 1.36 downloads finish and get installed. I'll test it against millis() and manitou's T_3.5/3.6 RTC isr(), and then, if needed, try __attribute__((optimize("O1"))) [or "O2"], since that is a single function to test against.
 
A quick test shows that elapsedMicros over 10 seconds agrees with 10 calls to delayMicroseconds( 1000000 ) on both 1.34 and 1.36.

Tested under 1.8.0 with TD_1.34 and 1.8.1 with TD_1.36, with and without LTO and Fastest.

I ran it in a for() loop, so there is 13+ ms of overhead, which seems consistent enough.
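The same kind of measurement can be sketched on a desktop machine, with std::chrono::steady_clock standing in for the Teensy's elapsedMicros and std::this_thread::sleep_for standing in for delayMicroseconds(). This is only an analogue under those assumptions, not the sketch used above; `measure_delay_loop` and `loop_overhead_us` are hypothetical names. The loop overhead is the measured total minus reps × delay.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: run `reps` delays of `delay_us` microseconds
// inside a for() loop and return the total elapsed time in microseconds.
int64_t measure_delay_loop(int reps, int64_t delay_us)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < reps; i++) {
        // Host stand-in for delayMicroseconds(delay_us); sleeps at least this long.
        std::this_thread::sleep_for(std::chrono::microseconds(delay_us));
    }
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}

// Overhead of the loop itself: measured time minus the requested delay time.
int64_t loop_overhead_us(int reps, int64_t delay_us)
{
    return measure_delay_loop(reps, delay_us) - reps * delay_us;
}
```

On a Teensy, an elapsedMicros variable read after the loop would play the role of `measure_delay_loop`'s return value.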
 
For those using TYQT on a Teensy older than the T_3.5: if uploads fail, you may need an updated version of TYQT after the toolchain update. I found this last year with the 1.34 beta; Koromix addressed it, and it worked on my T_3.0 last night, as noted in this linked post.

I didn't get solid confirmation that the __attribute__((optimize("O1"))) decoration on delayMicroseconds() was working, since there was nothing to fix. Without looking at the ASM output I could only go by code size, which didn't seem different for that single-function change. It did compile.

NOTE: As I mentioned before, when editing CORE files, the IDE used to force a rebuild on each compile; with the latest IDEs it never seems to recompile, even after I edited core_pins.h, until I force a full rebuild by changing a 'Tools' menu setting.
 
I'm playing with GCC 6; it has some nice new features plus additions by ARM.
It works, but I haven't had time to do any benchmarks. I wonder whether the new "ARM PURECODE support for ARMv7-M" is helpful (faster code?) for the Teensies.
 
I'm curious about this too. But my guess is it'll be slower if multiple instructions are needed to create literals.
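To illustrate why literals might cost more without a literal pool: on ARMv7-M a 32-bit constant can be built with a MOVW/MOVT pair, where MOVW writes a 16-bit immediate into the low half (zeroing the upper half) and MOVT writes the upper 16 bits while keeping the lower half. The sketch below only models that arithmetic on the host; it is not compiler output, and the function names are hypothetical.

```cpp
#include <cstdint>

// Model of MOVW: load a 16-bit immediate into the low half, clear the high half.
uint32_t movw(uint16_t imm16)
{
    return static_cast<uint32_t>(imm16);
}

// Model of MOVT: replace the high half of reg, keep the low half.
uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | (static_cast<uint32_t>(imm16) << 16);
}

// Two instructions to materialize one 32-bit literal.
uint32_t build_literal(uint32_t value)
{
    uint32_t reg = movw(static_cast<uint16_t>(value & 0xFFFFu));
    return movt(reg, static_cast<uint16_t>(value >> 16));
}
```

With a literal pool, the same constant would instead be a single PC-relative LDR from flash.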

Wasn't there some option to specify different code generation for fast vs slow program/flash memory? Maybe that would make a difference, especially on 3.2 & 3.5 where there's very little cache for the flash memory.

-mslow-flash-data?

This should work with GCC 5, too. I remember doing some tests, but the gain, if any, was very small.
 
Suggestion: maybe put a quick summary of what these options are in the Optimize menu, like what LTO is...

Also, judging by the ordering in the menu, is "Faster" faster than "Fast"?

The order of the items in the menu (disregarding LTO) is:
Faster, Fast, Fastest, Debug, Smallest
 
I agree the optimization menu needs to be fixed; a logical order really makes sense, especially when testing libraries against the different options.

Also, is this the same toolchain build as 1.34? The pragma I was using to disable LTO for certain code sections, "#pragma GCC optimize ("no-lto")", won't compile with this update, though it did with 1.34.

Snooze does not work with this build yet, with either LTO or no LTO. I don't know what's going on, but there are problems.

Paul, is there any guidance on how you plan to proceed with this toolchain update? Are you planning on making all these optimization build options available to the user? I ask because I don't want to spend a lot of time fixing things that won't be used anyway in the standard Teensyduino Arduino IDE update.
 
Yesterday in the SHT31 thread I mentioned that it stopped working properly. I was finding that some of the Wire.endTransmission calls were failing with timeout (4)... From looking at the logic analyzer output on SCL/SDA, it appeared that the Stop was not done on the previous call to requestFrom... I put a hack into the other library: if endTransmission fails, I do it again...
More in the thread: https://forum.pjrc.com/threads/4125...dafruit-sht31d?p=131258&viewfull=1#post131258

I am quite sure this worked on previous toolchain...

I found that if I went back to the released version of Wire, it worked... I checked my changes in these functions, and the differences had to do with using a pointer to the I2C structure to get to member variables like C1 and S, instead of hard-coded registers.

What was interesting was that I then changed a couple of functions that I had as inline to not be inline (I removed the #if 0 code here):
Code:
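// Read the status register (S) of whichever I2C port this pointer refers to.
uint8_t TwoWire::i2c_status(KINETIS_I2C_t  *kinetisk_pi2c)
{
	return kinetisk_pi2c->S;
}

// Spin until the interrupt flag (IICIF) is set, then write 1 to clear it.
void TwoWire::i2c_wait(KINETIS_I2C_t  *kinetisk_pi2c)
{
	while (1) {
		if ((i2c_status(kinetisk_pi2c) & I2C_S_IICIF)) break;
	}
	kinetisk_pi2c->S = I2C_S_IICIF;
}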
And it started working.
I am using this beta on a T3.6 with the Fast optimization setting.

Will investigate more... For example, I'm not sure which of these two may have changed...

Update: inlining the i2c_status function is no problem, but inlining the wait causes the issue....
 
Might there be a "volatile" missing somewhere, so that the compiler thinks it is enough to read the register once and the "break" never occurs? (Inside i2c_status? I did not read the code.)
 
Thanks Frank,
The kinetisk_pi2c pointer is of type KINETIS_I2C_t *, which is defined in kinetis.h:

Code:
typedef struct {
	volatile uint8_t	A1;
	volatile uint8_t	F;
	volatile uint8_t	C1;
	volatile uint8_t	S;
	volatile uint8_t	D;
	volatile uint8_t	C2;
	volatile uint8_t	FLT;
	volatile uint8_t	RA;
	volatile uint8_t	SMB;
	volatile uint8_t	A2;
	volatile uint8_t	SLTH;
	volatile uint8_t	SLTL;
} KINETIS_I2C_t;
And as I mentioned, the i2c_status function, which is the one that dereferences the pointer, appears to work inline... I will try again and look at the generated code to see if anything obvious stands out.
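Along the lines of the volatile question above, here is a minimal host-side sketch of a bounded version of the i2c_wait() poll loop. `MockI2C`, `i2c_wait_bounded()` and the poll limit are hypothetical, and `kIICIF = 0x02` is assumed to match I2C_S_IICIF in kinetis.h. Because S is declared volatile, the compiler must re-read it on every iteration whether or not the function is inlined; the limit keeps a missed flag from spinning forever. The noinline attribute mirrors the workaround of keeping the wait out of line, and the final write models the hardware's write-1-to-clear behaviour, which ordinary memory does not have.

```cpp
#include <cstdint>

// Hypothetical stand-in for the status field of KINETIS_I2C_t.
struct MockI2C {
    volatile uint8_t S;
};

// Assumed to match I2C_S_IICIF in kinetis.h (interrupt flag, bit 1).
constexpr uint8_t kIICIF = 0x02;

// Poll S until IICIF is set or max_polls reads have been made.
// noinline mirrors the workaround of not inlining the wait.
__attribute__((noinline))
bool i2c_wait_bounded(MockI2C *port, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        // volatile read: performed again on every iteration
        if (port->S & kIICIF) {
            port->S = kIICIF;  // on real hardware, writing 1 clears IICIF
            return true;
        }
    }
    return false;  // flag never appeared: let the caller report a timeout
}
```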
 
Just wanted to let you know that when targeting Teensy 3.1/2, Snooze results in the following error only when using "Faster with LTO", "Fastest with LTO" and "Smallest Code with LTO":
Code:
mk20dx256.ld:45 cannot move location counter backwards (from 00000000000004a8 to 0000000000000400)
collect2: error: ld returned 1 exit status

It compiles successfully for the remaining 7 options for Teensy 3.1/2. When targeting Teensy 3.6, all 10 optimization options compile successfully. I find it odd that it works for some of the LTO options. To test this, I used the following sketch:

Code:
#include <Snooze.h>
SnoozeTimer timer;
SnoozeBlock config(timer);
void setup() {
  timer.setTimer(1000);
}
void loop() {
  (void)Snooze.sleep( config );
  (void)Snooze.deepSleep( config );
  (void)Snooze.hibernate( config );
}
 
Yes, I know this issue quite well; it's a problem with an extern variable in the wake.h file. I wanted to fix it, but since I haven't heard from Paul on the direction of this toolchain update, it's on the back burner for now, especially since this error is not consistent across the different compile options and/or Teensies.
 
One way that always works is to edit the linker file (attached). My edit moves the startup code behind the flash config.
This wastes about 100 bytes of flash, but we have 1 MB.

(just in case you get this error and need a quick way to fix it)
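The numbers in the linker error show how far over the limit the code is. On Kinetis parts the flash configuration field sits at 0x400, so everything the linker script places before it must end by then; here the location counter had already reached 0x4a8. The constants below only restate the addresses from the error message; the names are hypothetical.

```cpp
#include <cstdint>

// Addresses taken from the ld error message above.
constexpr uint32_t kFlashConfigAddr = 0x400;  // where the linker script wants to go next
constexpr uint32_t kLocationCounter = 0x4a8;  // where the preceding sections already ended

// Bytes that would have to move out of the region in front of the flash config.
constexpr uint32_t kOverflowBytes = kLocationCounter - kFlashConfigAddr;

static_assert(kOverflowBytes == 0xa8, "0x4a8 - 0x400 == 0xa8");
static_assert(kOverflowBytes == 168, "0xa8 is 168 decimal");
```

Moving the startup code after the flash config removes that pressure, at the cost of the padding mentioned above.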
 

Attachments

  • mk66fx1m0.zip (1.4 KB)
Paul, to prevent this error (it can occur at any time with different libraries when LTO is used), would it be feasible to use the alternate startup address (the linker file from above) when LTO is enabled?
 
Frank, IIRC you ran your T_3.5s at overclocked (OC) speeds and found them to work? Would you suggest that Paul enable one or more T_3.5 OC speeds in this beta/release?
 
I think the 3.6 is a better choice than overclocking a 3.5.
The next logical step is a Cortex-M7 with twice the DMIPS/MHz.
I don't know if NXP has an MCU that can be used.
 
Maybe we should do for the T3.5 what we have for the T3.1: define some of the overclocked options for the T3.5, but leave their menu entries commented out.

Yes, the T3.6 is nicer for higher speeds, but if you need/want 5V... then maybe it should be put in as an option that the user can try out if desired.
 
Boards.txt already has them commented out; I tried 144 the other day, and it's now gone after installing this beta.

Indeed, running the T_3.6 at 240 MHz is way cooler than OC on the T_3.5. But if it works? Or maybe it isn't reliable enough even for conditional OC support? Most of my time on the T_3.5 was during the beta, when pushing it wasn't desired, to assure the KS release.

Just bringing it up because of this thread: Teensy-3-5-overclocking-(not-3-6)
 
I don't see any commented-out ones in either my 1.8.0 with the current released TD or my 1.8.1 with this beta... They both look like:
Code:
teensy35.menu.speed.120=120 MHz
teensy35.menu.speed.96=96 MHz
teensy35.menu.speed.72=72 MHz
teensy35.menu.speed.48=48 MHz
teensy35.menu.speed.24=24 MHz
teensy35.menu.speed.16=16 MHz (No USB)
teensy35.menu.speed.8=8 MHz (No USB)
teensy35.menu.speed.4=4 MHz (No USB)
teensy35.menu.speed.2=2 MHz (No USB)
teensy35.menu.speed.120.build.fcpu=120000000
teensy35.menu.speed.96.build.fcpu=96000000
teensy35.menu.speed.72.build.fcpu=72000000
teensy35.menu.speed.48.build.fcpu=48000000
teensy35.menu.speed.24.build.fcpu=24000000
teensy35.menu.speed.16.build.fcpu=16000000
teensy35.menu.speed.8.build.fcpu=8000000
teensy35.menu.speed.4.build.fcpu=4000000
teensy35.menu.speed.2.build.fcpu=2000000
 
Oops... too right, Kurt. I suppose I did create one manually... it worked for what I did, as I did it.

Commented versions would be a good first step unless there are regular issues where it would cause harm or distraction from undefined behavior.
 