Are there any plans/schedule to update GCC toolchain?

petermitrano · Oct 22, 2018

I'm interested in using `std::optional` and `std::variant` on my Teensy project for fun, and I've noticed that Teensyduino is still on GCC 5.4.

Are there plans to update to GCC 7 or something newer anytime soon? If not, how could I go about doing that manually?

Theremingenieur · Oct 22, 2018

I think PJRC's priority right now is bringing a new product to market. Thus, I wouldn't expect a GCC upgrade soon, especially since the current version works very well with the Teensyduino core files and libraries. "Never touch a running system - never change a winning team"

defragster · Oct 22, 2018

Theremingenieur said:
I think PJRC's priority right now is bringing a new product to market. Thus, I wouldn't expect a GCC upgrade soon, especially since the current version works very well with the Teensyduino core files and libraries. "Never touch a running system - never change a winning team"

That seems to make this the perfect time to update the toolchain, given the nature and power of the pending T_4 and the hope that it will be good for the coolest new AI work with the components in a newer CMSIS (?). Better to get that out of the way and tested on the old MCUs before the T_4 ships than to ship with a toolchain that doesn't allow best use and then change it across the board afterwards. I have seen a few posts looking for the newer CMSIS already - perhaps that is a wholly separate part, independent of GCC, though a newer GCC may do better optimization for the newer T_4 hardware cache and pipelining? If I found the right info, it seems the 600 MHz processor more than doubles the FLASH {off chip} access rate - and probably not 32 bits wide at that?

Frank B · Oct 22, 2018

The fastest way is to download the ARM GCC embedded toolchain as a zip file and overwrite the original toolchain in arduino arm (watch the paths). Pretty easy.

Flash in T4: 4 Bit parallel access, I think(?) - Like the ESP.
There is a cache.

PaulStoffregen · Oct 22, 2018

petermitrano said:
Are there plans to update to GCC 7 or something newer anytime soon?

Yes and no.

Yes, an upgrade will happen eventually.

But no, not "anytime soon". Definitely not before Teensy 4.0 is released.

An update to the arm_math.h library (aka CMSIS-DSP) will probably happen before the toolchain update. That too is a big deal that won't happen until well after next year's hardware is shipping.

brtaylor · Oct 22, 2018

Frank B said:
The fastest way is to download the ARM GCC embedded toolchain as a zip file and overwrite the original toolchain in arduino arm (watch the paths). Pretty easy.

I agree. You will also need to edit 'boards.txt' to use C++17. I use a makefile to compile my project that uses std::variant; you can have a look at my compile and linker options, which are the same ones you'll need in 'boards.txt' for the board that you are using. If I remember correctly, the only relevant option here is -std=gnu++17:
https://github.com/bolderflight/RAPTRS/blob/master/software/Makefile#L153
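Once the toolchain and the -std=gnu++17 option are in place, a small file like the one below is one way to confirm that the compiler really is in C++17 mode and that `std::optional` and `std::variant` are usable. This is only a sketch: `Reading`, `to_volts`, and the 3.3 V / 10-bit scaling are made-up names and numbers for illustration, not anything from the Teensyduino core.

```cpp
#include <optional>
#include <variant>

// Fails at compile time unless the compiler is in C++17 mode or newer
// (for example when built with -std=gnu++17).
static_assert(__cplusplus >= 201703L, "compile with -std=gnu++17 or newer");

// Hypothetical example type: a reading that is either raw ADC counts (int)
// or an already-converted voltage (float).
using Reading = std::variant<int, float>;

// Converts a reading to volts. Returns std::nullopt for a negative raw count,
// which this sketch treats as invalid. The 3.3 V / 10-bit scale is an assumption.
std::optional<float> to_volts(const Reading& r) {
    // std::get_if returns nullptr instead of throwing, which suits builds
    // that disable exceptions.
    if (const int* counts = std::get_if<int>(&r)) {
        if (*counts < 0) return std::nullopt;
        return static_cast<float>(*counts) * (3.3f / 1023.0f);
    }
    if (const float* volts = std::get_if<float>(&r)) {
        return *volts;
    }
    return std::nullopt;  // valueless variant
}
```

If the `static_assert` fires, the -std flag in 'boards.txt' has not taken effect; if the includes fail, the old toolchain is probably still the one being picked up.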

PaulStoffregen · Oct 22, 2018

Regarding the new hardware, there's a consistent theme of "it's complicated"....

defragster said:
... for the newer T_4 hardware cache and pipelining? If I found the right info it seems the 600 MHz processor more than doubles the FLASH {off chip} access rate - and probably not 32 bits wide at that?

Frank B said:
Flash in T4: 4 Bit parallel access, I think(?) - Like the ESP.
There is a cache.

Yes, the flash memory is 4 bit QPI DDR clocked at 60 MHz. So the raw data rate, minus command overhead, is 60 MByte/sec. When accessing the flash directly as addressable memory, there are caches, 32K for data and 32K for instructions... so access is pretty slow for a cache miss, but then you get single cycle speed of the cache. The flash isn't accessed as bytes or words, but bursts that fill cache rows.
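The 60 MByte/sec figure follows directly from the bus width, the DDR clocking, and the clock rate. A compile-time check of that arithmetic, using only the numbers quoted above (the constant names are just illustrative):

```cpp
#include <cstdint>

// Figures quoted in the post above.
constexpr std::uint64_t kClockHz       = 60'000'000;  // 60 MHz flash clock
constexpr std::uint64_t kDataLines     = 4;           // 4 bit QPI
constexpr std::uint64_t kEdgesPerClock = 2;           // DDR: data on both clock edges

// Raw throughput, ignoring command and address overhead.
constexpr std::uint64_t kBitsPerSec  = kClockHz * kDataLines * kEdgesPerClock;
constexpr std::uint64_t kBytesPerSec = kBitsPerSec / 8;

static_assert(kBitsPerSec == 480'000'000, "4 lines x 2 edges x 60 MHz");
static_assert(kBytesPerSec == 60'000'000, "60 MByte/sec before command overhead");
```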

But that's not the normal way you use the chip. The main usage model involves copying your code into RAM at startup. On the new chip, basically everything will default to running like using "FASTRUN" is now. But there too, the "it's complicated" theme applies.

The RAM has a total size of 512K, which is organized in 32K blocks. Each block can be configured to connect in 1 of 3 possible ways, ITCM, DTCM or AXI. All 3 of these buses are 64 bits wide (but technically DTCM is a pair of 32 bit paths). The TCM buses run at the full clock speed. AXI runs at 1/4 of the CPU speed, but the bus provides amazing features. The general rule is you want all your code on the ITCM bus, your stack and all your normal variables accessed via the DTCM bus, and you want buffers accessed by DMA from peripherals on the AXI bus.
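To make the block arithmetic concrete: 512K in 32K blocks gives 16 blocks, each assigned to one of the three buses. The sketch below is only a model of that bookkeeping in plain C++; it does not touch any hardware registers, and the 4/8/4 split in `kExampleMap` is a made-up example, not a PJRC default.

```cpp
#include <array>
#include <cstddef>

// The three ways a RAM block can be connected, per the description above.
enum class Bus { ITCM, DTCM, AXI };

constexpr std::size_t kBlockBytes = 32 * 1024;                 // 32K per block
constexpr std::size_t kTotalBytes = 512 * 1024;                // 512K total
constexpr std::size_t kBlockCount = kTotalBytes / kBlockBytes; // 16 blocks

using BlockMap = std::array<Bus, kBlockCount>;

// Sums the RAM assigned to one bus in a given block map.
constexpr std::size_t bytes_on(const BlockMap& map, Bus bus) {
    std::size_t total = 0;
    for (Bus b : map) {
        if (b == bus) total += kBlockBytes;
    }
    return total;
}

// Hypothetical split: 4 blocks for code, 8 for stack and variables,
// 4 for DMA buffers.
constexpr BlockMap kExampleMap = {
    Bus::ITCM, Bus::ITCM, Bus::ITCM, Bus::ITCM,
    Bus::DTCM, Bus::DTCM, Bus::DTCM, Bus::DTCM,
    Bus::DTCM, Bus::DTCM, Bus::DTCM, Bus::DTCM,
    Bus::AXI,  Bus::AXI,  Bus::AXI,  Bus::AXI,
};

static_assert(kBlockCount == 16, "512K / 32K");
static_assert(bytes_on(kExampleMap, Bus::ITCM) == 128 * 1024, "code");
static_assert(bytes_on(kExampleMap, Bus::DTCM) == 256 * 1024, "stack and variables");
static_assert(bytes_on(kExampleMap, Bus::AXI)  == 128 * 1024, "DMA buffers");
```

The trade-off this makes visible is that every block given to ITCM for code is a block no longer available for variables or DMA buffers.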

The two 32K caches only apply to memory on the AXI bus. Caching is never used on the ITCM and DTCM buses, since those are single cycle access to all of the memory assigned.

defragster · Oct 22, 2018

PaulStoffregen said:
Regarding the new hardware, there's a consistent theme of "it's complicated"....
...

Thanks for the added detail. I wasn't sure if an updated GCC might have any awareness of such hardware organization and help prep for it to minimize stalling. 4 bit QPI, okay. I wasn't sure whether one of the more elaborate flash connections might be used, as that seemed to be what the NXP demo board was using when I glanced at it.

PaulStoffregen · Oct 22, 2018

Definitely not using Hyperflash. The chips are physically too large to fit and they're far too expensive. Performance wise, they only help if you have a huge program where you need a lot of code that executes directly from the flash instead of the ITCM. Even then, the performance pales in comparison to running from RAM on the ITCM bus.

Clearly NXP wants to show off the best their new chips can achieve, and they probably do have customers making designs that aren't very cost sensitive who will go that route.

defragster · Oct 22, 2018

Yeah - HyperFlash - didn't even get as far as finding available parts to see size or cost - new and developing - assumed they could be more than the MCU. Showing off indeed.

Did you settle on a Flash size? I'm not remembering if you posted that.

Nice that FASTRUN can be used more widely and still keep the new cache blocks free - even if it does take away from RAM. The rest of the mix does sound like well engineered complications.

PaulStoffregen · Oct 22, 2018

The flash will be 2 Mbyte, the largest available in a 2x3 mm package.

Frank B · Oct 22, 2018

PaulStoffregen said:
The flash will be 2 Mbyte, the largest available in a 2x3 mm package.

Would it be possible to use the free space for a filesystem?

PaulStoffregen · Oct 22, 2018

Maybe. I believe the way the ESP folks do this involves a partition setting, where a portion of the chip is set aside for filesystem and can't be used for code.

I'm going to consider this sort of thing much later, probably well after beta testing has started. So much more to do before this point...
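As a rough picture of what such a partition setting amounts to, the sketch below splits the 2 Mbyte flash into a code region and a filesystem region and checks which region an offset falls in. Everything here is hypothetical: the `FlashLayout` type, placing the filesystem at the top of the chip, and the 512K size are assumptions for illustration, not how Teensyduino or the ESP tools actually do it.

```cpp
#include <cstdint>

constexpr std::uint32_t kFlashBytes = 2u * 1024u * 1024u;  // 2 Mbyte flash

// Hypothetical layout: code occupies the start of the chip, and the
// filesystem is carved out of the top and is unavailable for code.
struct FlashLayout {
    std::uint32_t fs_bytes;  // size reserved for the filesystem

    constexpr std::uint32_t code_bytes() const { return kFlashBytes - fs_bytes; }
    constexpr std::uint32_t fs_offset() const { return code_bytes(); }

    // True if a flash offset lies inside the filesystem region.
    constexpr bool in_filesystem(std::uint32_t offset) const {
        return offset >= fs_offset() && offset < kFlashBytes;
    }
};

// Example setting: reserve 512K for the filesystem, leaving 1.5 Mbyte for code.
constexpr FlashLayout kExampleLayout{512u * 1024u};

static_assert(kExampleLayout.code_bytes() == 1536u * 1024u, "1.5 Mbyte left for code");
static_assert(kExampleLayout.in_filesystem(kFlashBytes - 1), "top of flash is filesystem");
static_assert(!kExampleLayout.in_filesystem(0), "start of flash is code");
```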

Frank B · Oct 22, 2018

My question was more whether you thought it technically possible. So, yes. Fine.

PaulStoffregen · Oct 22, 2018

Technically, it's supposed to be possible. But this is one of hundreds of things I've not yet actually tried.

If you look at figure 30-1 on page 1477, you'll see the diagram showing the flash arbitrator which is supposed to allow both the AXI/AHB bus and the peripheral bus to gain access to the hardware. So far, everything I've done with the flash has been only through the peripheral bus. I haven't yet used it as memory mapped into the ARM's address space (only been running code from ITCM). But I have done quite a few tests with the LUTs in different config for 1 bit, 4 bit SDR and 4 bit DDR modes.

Something I don't know how to handle is the case where the flash chip is busy writing, but the AHB bus wants access because something is accessing the memory space and you get a cache miss. Maybe the easy way would be to just busy loop with interrupts disabled. A really awesome way might involve reprogramming the FlexSPI's LUTs to do a write suspend and write resume. The hardware is so incredibly configurable, but with great configurability comes great complexity!

defragster · Oct 22, 2018

Some nice T_4 details on yet another thread. Sounds like the potential is there. Frank asked what I was wondering when I saw 2MB for Flash - having a static write area onboard will be useful. If the ESP folks can partition the flash with their hardware, it sounds like the potential is there on the T_4 - they specify that partition in the IDE settings before upload.

Does the T_4 allow EEPROM area like the T_3.6?
