Are there any plans or a schedule to update the GCC toolchain?

Status
Not open for further replies.

petermitrano

New member
I'm interested in using `std::optional` and `std::variant` on my Teensy project for fun, and I've noticed that Teensyduino is still on GCC 5.4.

Are there plans to update to GCC 7 or something newer anytime soon? If not, how could I go about doing that manually?
 
I think PJRC's priority is bringing a new product to market, so I wouldn't expect a GCC upgrade soon, especially since the current version works very well with the Teensyduino core files and libraries. "Never touch a running system - never change a winning team"
 

That seems like the perfect time to update the toolchain, given the nature and power of the pending T_4 and the hope that it will be good for the coolest new AI work with the components in a newer CMSIS (?). Better to get that out of the way and tested on the old MCUs before T_4 ships, rather than afterwards, when the toolchain doesn't allow best use of the chip and then has to be changed across the board. I have seen a few posts looking for the newer CMSIS already. Perhaps that is a wholly separate part, independent of GCC, though a newer GCC may do better optimization for the newer T_4 hardware cache and pipelining? If I found the right info, it seems the 600 MHz processor more than doubles the FLASH {off chip} access rate, and the flash is probably not 32 bits wide at that?
 
The fastest way is to download the ARM GCC embedded toolchain as a zip file and overwrite the original toolchain in Arduino's arm folder (watch the paths). Pretty easy.

Flash in T4: 4 Bit parallel access, I think(?) - Like the ESP.
There is a cache.
 
Are there plans to update to GCC 7 or something newer anytime soon?

Yes and no.

Yes, an upgrade will happen eventually.

But no, not "anytime soon". Definitely not before Teensy 4.0 is released.

An update to the arm_math.h library (aka CMSIS-DSP) will probably happen before the toolchain update. That too is a big deal that won't happen until well after next year's hardware is shipping.
 
The fastest way is to download the ARM GCC embedded toolchain as a zip file and overwrite the original toolchain in Arduino's arm folder (watch the paths). Pretty easy.

I agree. You will also need to edit 'boards.txt' to use C++17. I use a makefile to compile my project that uses std::variant. You can have a look at my compile and linker options, which are the same ones you'll need in 'boards.txt' for the board you are using. If I remember correctly, the only relevant option here is -std=gnu++17:
https://github.com/bolderflight/RAPTRS/blob/master/software/Makefile#L153
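For anyone wondering what the flag buys you, here is a minimal sketch of the two library types from the original question. It is not taken from the linked project: `Reading`, `to_volts`, `safe_divide`, and the 12-bit / 3.3 V ADC scaling are made-up names and values for illustration. It should only compile once the toolchain is GCC 7 or newer and -std=gnu++17 (or -std=c++17) is set.

```cpp
// Needs a C++17 toolchain: GCC 7+ with -std=gnu++17 or -std=c++17.
#include <cstdint>
#include <optional>
#include <variant>

// Hypothetical sensor reading: either a raw ADC count or an already scaled voltage.
using Reading = std::variant<std::uint16_t, float>;

// Converts a reading to volts.
// Assumption for illustration only: 12-bit ADC with a 3.3 V reference.
inline float to_volts(const Reading& r) {
    if (const auto* raw = std::get_if<std::uint16_t>(&r)) {
        return static_cast<float>(*raw) * 3.3f / 4095.0f;
    }
    if (const auto* volts = std::get_if<float>(&r)) {
        return *volts;
    }
    return 0.0f;  // only reached if the variant is valueless
}

// Returns an empty optional instead of dividing by zero.
inline std::optional<int> safe_divide(int numerator, int denominator) {
    if (denominator == 0) {
        return std::nullopt;
    }
    return numerator / denominator;
}
```

The sketch uses `std::get_if` rather than `std::get` so that nothing depends on exceptions, which embedded builds often disable.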
 
Regarding the new hardware, there's a consistent theme of "it's complicated"....

... for the newer T_4 hardware cache and pipelining? If I found the right info it seems the 600 MHz processor more than doubles the FLASH {off chip} access rate - and probably not 32 bits wide at that?

Flash in T4: 4 Bit parallel access, I think(?) - Like the ESP.
There is a cache.

Yes, the flash memory is 4 bit QPI DDR clocked at 60 MHz. So the raw data rate, minus command overhead, is 60 MByte/sec. When accessing the flash directly as addressable memory, there are caches, 32K for data and 32K for instructions... so access is pretty slow for a cache miss, but then you get single cycle speed of the cache. The flash isn't accessed as bytes or words, but bursts that fill cache rows.
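The arithmetic behind the 60 MByte/sec figure can be sketched as below. The function names are made up, and the cache-miss estimate rests on two assumptions that are not stated in this post: a 32-byte cache line and the 600 MHz core clock mentioned earlier in the thread. Command and address overhead is ignored, so a real miss would cost more.

```cpp
#include <cstdint>

// Raw bus throughput in bytes per second, ignoring command and address overhead.
// lanes: number of data lines (4 for quad mode).
// ddr:   true when data moves on both clock edges.
constexpr std::uint64_t raw_bytes_per_sec(std::uint64_t clock_hz, unsigned lanes, bool ddr) {
    return clock_hz * lanes * (ddr ? 2u : 1u) / 8u;
}

// CPU cycles spent waiting for one cache line to arrive over that bus.
constexpr std::uint64_t line_fill_cycles(std::uint64_t cpu_hz,
                                         std::uint64_t line_bytes,
                                         std::uint64_t bus_bytes_per_sec) {
    return line_bytes * cpu_hz / bus_bytes_per_sec;
}

// 4 lanes x 2 edges x 60 MHz = 480 Mbit/s = 60 MByte/s.
constexpr std::uint64_t kFlashRate = raw_bytes_per_sec(60'000'000, 4, true);
static_assert(kFlashRate == 60'000'000, "4-bit DDR at 60 MHz moves one byte per clock");

// Assumptions: 32-byte cache line, 600 MHz core clock.
static_assert(line_fill_cycles(600'000'000, 32, kFlashRate) == 320,
              "one line fill costs hundreds of CPU cycles");
```

Under those assumptions a single miss stalls for roughly 320 core cycles before the cache can serve the line at full speed.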

But that's not the normal way you use the chip. The main usage model involves copying your code into RAM at startup. On the new chip, basically everything will default to running like using "FASTRUN" is now. But there too, the "it's complicated" theme applies.

The RAM has a total size of 512K, which is organized in 32K blocks. Each block can be configured to connect in 1 of 3 possible ways, ITCM, DTCM or AXI. All 3 of these buses are 64 bits wide (but technically DTCM is a pair of 32 bit paths). The TCM buses run at the full clock speed. AXI runs at 1/4 of the CPU speed, but the bus provides amazing features. The general rule is you want all your code on the ITCM bus, your stack and all your normal variables accessed via the DTCM bus, and you want buffers accessed by DMA from peripherals on the AXI bus.

The two 32K caches only apply to memory on the AXI bus. Caching is never used on the ITCM and DTCM buses, since those are single cycle access to all of the memory assigned.
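To make the bank arithmetic concrete, here is a small host-side model of the layout described above. It is only a sketch: `Bus`, `BankMap`, `make_split`, and `bytes_on` are invented names, and the 4/4/8 split in the usage note is an arbitrary example, not a default of the chip or of Teensyduino.

```cpp
#include <array>
#include <cstddef>

// The three ways a RAM bank can be connected, per the description above.
enum class Bus { ITCM, DTCM, AXI };

constexpr std::size_t kBankBytes  = 32 * 1024;   // one configurable block
constexpr std::size_t kTotalBytes = 512 * 1024;  // total RAM
constexpr std::size_t kBankCount  = kTotalBytes / kBankBytes;
static_assert(kBankCount == 16, "512K in 32K blocks gives 16 banks");

using BankMap = std::array<Bus, kBankCount>;

// Hypothetical split: the first itcm_banks banks hold code, the next
// dtcm_banks hold stack and ordinary variables, the rest go to AXI for DMA buffers.
inline BankMap make_split(std::size_t itcm_banks, std::size_t dtcm_banks) {
    BankMap map{};
    for (std::size_t i = 0; i < kBankCount; ++i) {
        if (i < itcm_banks) {
            map[i] = Bus::ITCM;
        } else if (i < itcm_banks + dtcm_banks) {
            map[i] = Bus::DTCM;
        } else {
            map[i] = Bus::AXI;
        }
    }
    return map;
}

// Total bytes reachable through the given bus.
inline std::size_t bytes_on(const BankMap& map, Bus bus) {
    std::size_t total = 0;
    for (Bus b : map) {
        if (b == bus) {
            total += kBankBytes;
        }
    }
    return total;
}
```

For example, `make_split(4, 4)` would leave 128K on ITCM, 128K on DTCM, and 256K on AXI, and only that last 256K would go through the two 32K caches.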
 
Regarding the new hardware, there's a consistent theme of "it's complicated"....
...

Thanks for the added detail. I wasn't sure whether an updated GCC might have any awareness of such hardware organization and help prep for it to minimize stalling. Using 4 bit QPI, okay. I wasn't sure if one of the more elaborate flash connections might be used, as that seemed to be what the NXP demo board was using when I glanced at it.
 
Definitely not using Hyperflash. The chips are physically too large to fit and they're far too expensive. Performance wise, they only help if you have a huge program where you need a lot of code that executes directly from the flash instead of the ITCM. Even then, the performance pales in comparison to running from RAM on the ITCM bus.

Clearly NXP wants to show off the best their new chips can achieve, and they probably do have customers making designs that aren't very cost sensitive who will go that route.
 
Yeah, HyperFlash. I didn't even get as far as finding available parts to see size or cost. It's new and developing, and I assumed the chips could cost more than the MCU. Showing off indeed.

Did you settle on a Flash size? I'm not remembering if you posted that.

Nice that FASTRUN can be used more widely and still keep the new cache blocks free - even if it does take away from RAM. The rest of the mix does sound like well engineered complications.
 
Maybe. I believe the way the ESP folks do this involves a partition setting, where a portion of the chip is set aside for filesystem and can't be used for code.

I'm going to consider this sort of thing much later, probably well after beta testing has started. So much more to do before this point...
 
Technically, it's supposed to be possible. But this is one of hundreds of things I've not yet actually tried.

If you look at figure 30-1 on page 1477, you'll see the diagram showing the flash arbitrator which is supposed to allow both the AXI/AHB bus and the peripheral bus to gain access to the hardware. So far, everything I've done with the flash has been only through the peripheral bus. I haven't yet used it as memory mapped into the ARM's address space (I've only been running code from ITCM). But I have done quite a few tests with the LUTs in different configurations for 1 bit, 4 bit SDR and 4 bit DDR modes.

Something I don't know how to handle is the case where the flash chip is busy writing, but the AHB bus wants access because something is accessing the memory space and you get a cache miss. Maybe the easy way would be to just busy loop with interrupts disabled. A really awesome way might involve reprogramming the FlexSPI's LUTs to do a write suspend and write resume. The hardware is so incredibly configurable, but with great configurability comes great complexity!
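The "easy way" could look roughly like the sketch below. Everything in it is hypothetical: `FlashHooks` and `wait_for_flash_idle` are invented names, and the three function pointers stand in for whatever the real status poll and interrupt control turn out to be. None of it comes from the Teensy core or from NXP's code, and it has not been tried on hardware.

```cpp
// Hypothetical hooks. On real hardware these would wrap the flash status
// poll and the core's interrupt disable/enable. The names are made up.
struct FlashHooks {
    bool (*is_busy)();  // true while the flash chip is still writing
    void (*irq_off)();  // mask interrupts
    void (*irq_on)();   // unmask interrupts
};

// Spins until the flash reports idle, with interrupts masked so that no
// handler can touch memory-mapped flash while the write is in progress.
// Returns the number of polls that found the flash busy.
inline unsigned long wait_for_flash_idle(const FlashHooks& hooks) {
    hooks.irq_off();
    unsigned long busy_polls = 0;
    while (hooks.is_busy()) {
        ++busy_polls;
    }
    hooks.irq_on();
    return busy_polls;
}
```

Two caveats about the sketch: the loop itself would have to run from RAM rather than from flash, and a real version would presumably restore the previous interrupt state instead of unconditionally re-enabling interrupts.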
 
Some nice T_4 details on yet another thread. Sounds like the potential is there. Frank asked what I was wondering when I saw 2MB for Flash: having a static write area onboard will be useful. If the ESP folks can partition the flash with their hardware, it sounds like the potential is there on the T_4. They specify that partition in the IDE settings before upload.

Does the T_4 allow EEPROM area like the T_3.6?
 