Teensy 4? testing mbed NXP MXRT1050-EVKB (600 MHz M7)

I can't seem to find the max GPIO sink/source current???

I don't know the official spec, but I can tell you I've "experimentally" seen ~60 mA into a dead short. ;)


Most interesting: the Reference Manual chapter about the seven PLLs says the chip is capable of running at 1 GHz.

Where did you see that?

Like we have now on Teensy 3.x, the PLLs (usually) run at much higher frequency than the CPU and get divided down by various factors to make the actual CPU clock.

For example, in the K66 the PLL gets divided by 2 (which isn't configurable) so we're actually running the PLL at 360 MHz on Teensy 3.6 when the CPU runs at 180 MHz.

All the info I've seen so far is PLL1 officially runs between 648 and 1296 MHz. The maximum (beyond spec) configurable PLL speed seems to be 1524 MHz. Unlike Kinetis, we get quite a playground of configurable stuff in this chip. The divider right after the PLL ("ARM_PODF") defaults to div-by-2, but supposedly can be set from 1 to 8. There's one more divider ("AHB_PODF") in the path, also configurable from 1 to 8. So if the chip really works with both of these in div-by-1 mode, overclocking all the way up to 1.524 GHz might be possible to at least try.

If that first divider really needs to be div-by-2 (maybe for a 50% duty cycle?) then we may be limited to attempting only up to 762 MHz.

However, the other really unfortunate overclocking problem is the IPG_PODF divider can be at most div-by-4. The IPG clock is pretty much the same as F_BUS we have on Teensy 3.x... all the peripherals use it, at least for access to their registers. The official spec is 150 MHz max, so at div-by-4 any CPU clock above 600 MHz pushes IPG past its spec. Whether this will impose an upper limit on overclocking is a good question.
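
A rough sketch of the divider chain described above, assuming CPU clock = PLL1 / ARM_PODF / AHB_PODF and IPG clock = CPU clock / IPG_PODF. The struct and function names are made up for illustration, the 1200 MHz PLL value in the usage note is just an example setting, and nothing here touches the real clock registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPG_MAX_MHZ 150u  /* official IPG limit quoted in the post */

/* Hypothetical result type: all frequencies in MHz */
typedef struct {
    uint32_t core_mhz;    /* CPU clock after ARM_PODF and AHB_PODF */
    uint32_t ipg_mhz;     /* peripheral clock after IPG_PODF */
    bool     ipg_in_spec; /* true if ipg_mhz <= IPG_MAX_MHZ */
} clock_plan_t;

/* Hypothetical helper: divide the PLL output down the way the post describes */
clock_plan_t plan_clocks(uint32_t pll_mhz, uint32_t arm_podf,
                         uint32_t ahb_podf, uint32_t ipg_podf)
{
    /* Divider ranges as stated in the post: 1..8, 1..8 and at most 4 */
    assert(arm_podf >= 1 && arm_podf <= 8);
    assert(ahb_podf >= 1 && ahb_podf <= 8);
    assert(ipg_podf >= 1 && ipg_podf <= 4);

    clock_plan_t p;
    p.core_mhz    = pll_mhz / arm_podf / ahb_podf;
    p.ipg_mhz     = p.core_mhz / ipg_podf;
    p.ipg_in_spec = (p.ipg_mhz <= IPG_MAX_MHZ);
    return p;
}
```

For example, plan_clocks(1200, 2, 1, 4) gives a 600 MHz core with IPG at exactly 150 MHz, while plan_clocks(1524, 1, 1, 4) gives a 1524 MHz core with IPG at 381 MHz, well beyond the 150 MHz spec.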


But let's not talk about overclocking now :) It's just an interesting piece of information for now. (And I fear we'd have to increase the core voltage, which may be a bit risky.)

Sure, why not talk of overclocking?! It's fun.

And yeah, pushing the core voltage up to 1.5V seems pretty risky...

I must admit, so far I've been doing pretty much everything at 396 MHz, with the expectation it'll scale up or down. The RT1062 runs a little warm at this speed... so with extreme overclocking we might need heatsinks or other cooling.
 
Imax = N x C x V x (0.5 x F)
Where:
N—Number of IO pins supplied by the power line
C—Equivalent external capacitive load
V—IO voltage
(0.5 x F)—Data change rate. Up to 0.5 of the clock rate (F)
In this equation, Imax is in Amps, C in Farads, V in Volts, and F in Hertz.
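
To get a feel for the numbers, here is a small sketch that just evaluates the formula above. The function name and the example values in the usage note (10 pins, 10 pF per pin, 3.3 V, 100 MHz) are made up for illustration and are not from the manual.

```c
/* Imax = N x C x V x (0.5 x F)
   n_pins: number of IO pins on the supply, c_farads: external load per pin,
   v_volts: IO voltage, f_hertz: clock rate. Result is in amps. */
double io_imax_amps(unsigned n_pins, double c_farads,
                    double v_volts, double f_hertz)
{
    return (double)n_pins * c_farads * v_volts * (0.5 * f_hertz);
}
```

With those example values, io_imax_amps(10, 10e-12, 3.3, 100e6) works out to about 0.0165 A, i.e. roughly 16.5 mA for the whole group of pins.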
 
Where did you see that?

i.MX RT1060 Processor Reference Manual
Document Number: IMXRT1060RM
Rev. 0, 08/2018



12.3.2.2 PLLs
Seven PLLs are included in the clock generation section. Two of these PLLs are each
equipped with four Phase Fractional Dividers (PFDs) in order to generate additional
frequencies.
[...]
The seven PLLs are listed below:
PLL1 (also referred to as ARM_PLL) - This is the PLL clocking the ARM core
complex. It is a programmable integer frequency multiplier capable of output
frequency of up to 1.3 GHz. Note that this frequency is higher than the maximum
chip supported frequency 1.0 GHz.

• PLL2 (also referred tp as System_PLL or 528_...

Maybe I read this wrong?
But 1 GHz / 2 (if divided) would be only 500 MHz...


Same text is in other manuals, too.
 
Anyway, we'll have a Raspberry killer, even at 600 MHz (I've read it's faster than an A7 @ 900 MHz, I don't remember where exactly), and no time-consuming OS.

edit: raspberry 1 :)
 
Looks like the old compiler we currently have produces slightly faster code than the newest.

These are with 396 MHz clock and -O3 optimization.

Using the compiler currently in Teensyduino, CoreMark=1636

Code:
*2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 12219
Total time (secs): 
Iterations/Sec   : 1636
Iterations       : 20000
Compiler version : GCC5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]

Using a much newer version, CoreMark=1600

Code:
*2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 12500
Total time (secs): 
Iterations/Sec   : 1600
Iterations       : 20000
Compiler version : GCC7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
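
The "Total time (secs)" field is blank in both runs, but the scores can be reproduced from the other fields if the ticks are assumed to be milliseconds. That is an inference from the numbers, not something the output states. A minimal sketch, with a made-up function name:

```c
#include <stdint.h>

/* Iterations per second from an iteration count and elapsed ticks.
   ticks_per_sec is an assumption: 1000 reproduces the posted scores. */
uint32_t iterations_per_sec(uint32_t iterations, uint32_t total_ticks,
                            uint32_t ticks_per_sec)
{
    /* 64-bit intermediate so iterations * ticks_per_sec cannot overflow */
    return (uint32_t)(((uint64_t)iterations * ticks_per_sec) / total_ticks);
}
```

With 20000 iterations, 12219 ticks gives 1636 and 12500 ticks gives 1600, so the older compiler is a bit over 2% faster here.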
 
Is there a source where I can get a copy of coremark so I can play with it on the Teensy? Thank you.
 
Thank you for the link. Will take a look and give it a try. Guess curiosity is good and fun most of the time!
Thanks again
Mike
 
Here are various coremarkish results for the 1050 @ 600 MHz. Typically, the IAR compiler gives the highest performance. ARM CC is used by the mbed on-line compiler. I don't have local access to the ARM CC or IAR toolchains ($$$).
Code:
NXP reports 3036 or 3014  (IAR 7.80)
ARM CC -O3  2438  (mbed on-line)
GNU GCC 7.2.1 -O3  2444  (MCUXpresso, using free-running GPT for micros())
GNU GCC 7.3.1 -O3  2397
GNU GCC 5.4.1 -O3  2445    4.075 coremarks/MHz
IAR 2943 from https://mcu-things.com/blog/imxrt-how-to-coremark-implementation/

https://www.eembc.org/coremark/scores.php

See post #1 for comparison with Teensy 3*
 
Pretty sure that's leftover copy-paste stuff from iMX6, which does officially support 1 GHz.

Yes, I've found it here: https://www.nxp.com/docs/en/reference-manual/IMX6SLLRM.pdf , 10.3.2.3 (Page 349) - copy & paste, you're right.

Compilers:
https://forum.pjrc.com/threads/29017-Cxxmark-Compiler-Performance-LC-3-1
- showed faster results for newer compilers. The highest compiler version at that time was 5.2.1, and it was faster than the versions before it.

What are GCC's default settings for CM7? Maybe there is some potential to tune the code generation a bit? For example, there are switches for level-1 cache size and some more.
And as we have RAM with waitstates now, we should look a bit closer at variable alignment in our code, and align important arrays and structs to 32 so that prefetching can work optimally.
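
A minimal sketch of what that alignment could look like with GCC, assuming "32" means 32 bytes (the Cortex-M7 cache line size). The buffer, struct and helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32  /* assumption: align to 32 bytes */

/* Hypothetical sample buffer that starts on a 32-byte boundary */
int16_t audio_block[128] __attribute__((aligned(CACHE_LINE_BYTES)));

/* Hypothetical struct: the attribute aligns every instance to 32 bytes
   and rounds sizeof() up to a multiple of 32 */
typedef struct {
    uint32_t head;
    uint32_t tail;
    uint8_t  data[56];
} __attribute__((aligned(CACHE_LINE_BYTES))) ring_t;

ring_t rx_ring;

/* Returns non-zero if p sits on the given byte boundary */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}
```

is_aligned(audio_block, 32) and is_aligned(&rx_ring, 32) can be used as a quick sanity check at startup. Whether this measurably helps depends on where the data lives and how it is accessed, so it is worth benchmarking rather than assuming.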
 
Did a few more Coremark tests using a T3.5 for comparison to the T3.6 and the rt1052. Results are based on the GCC5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]:
Code:
               coremarkish
              iterations/sec
    M7@600mhz      2438        ARM CC -O3    @manitou
  T3.5@144mhz       331        Fastest
  T3.5@168mhz       369        Fastest
  T3.5@144mhz       319        Fastest+pure+LTO
  T3.5@168mhz       357        Fastest+pure+LTO

  From post #1 for T3.6 for easy comparison
  T3.6@180mhz       434         Faster
  T3.6@256mhz       659         Fastest+pure+LTO
  T3.2@120mhz       254         Faster
 
Did you know that you can pass optimizations per function to gcc ?
For example:
__attribute__((optimize("unroll-loops"))) void xyz(void) {...}

A more compatible way is using #pragma
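
A short sketch of both forms side by side. The function names are made up, and both forms are GCC-specific: other compilers may just warn and ignore them.

```c
#include <stdint.h>

/* Attribute form: the option applies to this one function only */
__attribute__((optimize("unroll-loops")))
uint32_t sum_attr(const uint32_t *v, uint32_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Pragma form: the option applies to every function defined between
   push_options and pop_options */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")
uint32_t sum_pragma(const uint32_t *v, uint32_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#pragma GCC pop_options
```

Both functions return the same result; only the code generated for the loop may differ. The push/pop pair keeps the option from leaking into the rest of the file.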
 
T3.6 : A bit difficult to compare with different compiler options
Don't know if this will help but if we just compare -O3 optimization:
Code:
               coremarkish
              iterations/sec
    M7@600mhz      2438        ARM CC -O3
  T3.5@168mhz       369        Fastest (O3)
  T3.6@240mhz       579        Fastest (O3)
I think the takeaway for me is that the 1050 will be about 4x faster than the T3.6 and 6x faster than the T3.5, at least for the Coremark tests.
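
The ratios can be checked straight from the -O3 numbers in the table, along with a per-MHz figure that takes the clock speed out of the comparison. A small sketch with made-up helper names:

```c
/* How many times faster score_a is than score_b */
double speedup(double score_a, double score_b)
{
    return score_a / score_b;
}

/* Iterations/sec per MHz of CPU clock */
double score_per_mhz(double score, double mhz)
{
    return score / mhz;
}
```

speedup(2438, 579) is about 4.2 (M7 vs T3.6 @ 240 MHz) and speedup(2438, 369) is about 6.6 (M7 vs T3.5 @ 168 MHz). Per MHz, that is roughly 4.06 for the M7, 2.41 for the T3.6 and 2.20 for the T3.5, so a good part of the gain comes from the core itself and not just the higher clock.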

Code:
Did you know that you can pass optimizations per function to gcc ?
No - I didn't know that. Where the heck do you find this stuff? Wish we had a single page where we could put all these tips and tricks. I try to keep a collection of bookmarks but I tend to lose track of them.

EDIT: Answered my own question: https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
 
Did you know that you can pass optimizations per function to gcc ?
For example:
__attribute__((optimize("unroll-loops"))) void xyz(void) {...}

A more compatible way is using #pragma

I added the original implementation of __attribute__ optimize back in the GCC 4.5 time period, along with target support for the x86 at the end of the time I worked for AMD. Later versions of the compiler added target support for other backends including ARM.

Note, the #pragma/attribute target support doesn't work with C++, but the pragma/attribute for optimize does. There is grumbling within the GNU GCC community of developers that allowing particular -f options to be set/reset can break things, particularly as more of the optimization is moved to the earlier parts of the compiler. But in the embedded space as well as the HPC (high performance computing) arenas there are times when you need to tune individual functions.

If you select a particular function to have optimize/target attributes, it will cause that function not to be inlined by other functions that don't have the same options set.
 
..and it is great.


Oh, good to know! Does always_inline override that?

Well, it overrides it, but the switches of the outer function control the optimization options. If you don't use always_inline, then a call is done and the outer function has its switches in use. If you are inlining a function without optimization or target options, then it uses the switches of the outer function.
 
Figured out how to get an SD card example working on the 1052 EVKB. I added a timing function based on something @manitou helped me with:

Code:
Make file system......The time may be long if the card capacity is big.
(I am using a 64GB SDXC card)

Make file system......The time may be long if the card capacity is big.
TIME TO MAKE FS: 15796466

Create directory......
TIME TO CREATE DIRECTORY: 98777

Create a file in that directory......
TIME TO CREATE FILE: 2835

Create a directory in that directory......
TIME TO CREATE DIR in that DIR: 74598

List the file in that directory......
General file : F_1.DAT.
Directory file : DIR_2.

Write/read file until encounters error......

Write to above created file.
TIME TO CREATE Write: 3041

Read from above created file.
TIME TO CREATE READ:582
Times are in us and represent writing a 1024-byte buffer.

I just found this post that has some times for the T3.6: https://forum.pjrc.com/threads/54241-Simple-SdFat-write-benchmark-for-T3-5-3-6?highlight=card+speed.

Cheers
 