
Thread: Teensy 4? testing mbed NXP MXRT1050-EVKB (600 Mhz M7)

  1. #76
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,691
    Quote Originally Posted by manitou View Post
    I can't seem to find the max GPIO sink/source current???
    I don't know the official spec, but I can tell you I've "experimentally" seen ~60 mA into a dead short.


    Quote Originally Posted by Frank B View Post
    Most interesting is, in the chapter about the seven PLLs (Reference Manual) they say the CHIP is capable of running with 1 GHz.
    Where did you see that?

    Like we have now on Teensy 3.x, the PLLs (usually) run at much higher frequency than the CPU and get divided down by various factors to make the actual CPU clock.

    For example, in the K66 the PLL gets divided by 2 (which isn't configurable) so we're actually running the PLL at 360 MHz on Teensy 3.6 when the CPU runs at 180 MHz.

    All the info I've seen so far is that PLL1 officially runs between 648 and 1296 MHz. The maximum (beyond spec) configurable PLL speed seems to be 1524 MHz. Unlike Kinetis, we get quite a playground of configurable stuff in this chip. The divider right after the PLL ("ARM_PODF") defaults to div-by-2, but supposedly can be set from 1 to 8. There's one more divider ("AHB_PODF") in the path, also configurable from 1 to 8. So if the chip really works with both of these in div-by-1 mode, overclocking all the way up to 1.524 GHz might be possible to at least try.

    If that first divider really needs to be div-by-2 (maybe for a 50% duty cycle?) then we may be limited to attempting only up to 762 MHz.

    However, the other really unfortunate overclocking problem is that the IPG_PODF divider can be at most div-by-4. The IPG clock is pretty much the same as the F_BUS we have on Teensy 3.x... all the peripherals use it, at least for access to their registers. The official spec is 150 MHz max. Whether this will impose an upper limit on overclocking is a good question.
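    The divider chain described above can be worked through numerically. This is only a sketch of the arithmetic, not register-level code: the names `clock_result` and `clock_tree` are made up for illustration, and it assumes the simple chain PLL1 -> ARM_PODF -> AHB_PODF -> CPU clock, with IPG_PODF dividing the CPU clock.

```c
#include <stdint.h>
#include <stdbool.h>

/* Official IPG clock limit quoted in the post. */
#define IPG_SPEC_MAX_MHZ 150u

/* Hypothetical helper type: the result of walking the divider chain. */
typedef struct {
    uint32_t cpu_mhz;     /* PLL1 / ARM_PODF / AHB_PODF */
    uint32_t ipg_mhz;     /* CPU clock / IPG_PODF (integer MHz, truncated) */
    bool     ipg_in_spec; /* true if the IPG clock is at or below 150 MHz */
} clock_result;

/* All dividers are plain divide-by-N values (1..8 for ARM_PODF and
 * AHB_PODF, 1..4 for IPG_PODF, per the post). No range checking here. */
clock_result clock_tree(uint32_t pll1_mhz, uint32_t arm_podf,
                        uint32_t ahb_podf, uint32_t ipg_podf)
{
    clock_result r;
    r.cpu_mhz = pll1_mhz / arm_podf / ahb_podf;
    r.ipg_mhz = r.cpu_mhz / ipg_podf;
    r.ipg_in_spec = (r.ipg_mhz <= IPG_SPEC_MAX_MHZ);
    return r;
}
```

    For example, `clock_tree(1200, 2, 1, 4)` gives a 600 MHz CPU clock with IPG at 150 MHz, while `clock_tree(1524, 2, 1, 4)` gives 762 MHz with IPG at about 190 MHz, which is above the 150 MHz spec even with the largest IPG divider.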


    But let's not talk about overclocking now. It's just an interesting piece of information for now. (And I fear we would have to increase the core voltage, which may be a bit risky.)
    Sure, why not talk of overclocking?! It's fun.

    And yeah, pushing the core voltage up to 1.5V seems pretty risky...

    I must admit, so far I've been doing pretty much everything at 396 MHz, with the expectation it'll scale up or down. The RT1062 runs a little warm at this speed... so with extreme overclocking we might need heatsinks or other cooling.

  2. #77
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Imax = N x C x V x (0.5 x F)
    Where:
    N—Number of IO pins supplied by the power line
    C—Equivalent external capacitive load
    V—IO voltage
    (0.5 x F)—Data change rate. Up to 0.5 of the clock rate (F)
    In this equation, Imax is in Amps, C in Farads, V in Volts, and F in Hertz.
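    As a worked example of the formula, here is a small sketch in C. The function name `io_imax_amps` and the sample values (8 pins, 15 pF per pin, 3.3 V, 50 MHz) are assumptions chosen for illustration, not figures from the datasheet.

```c
/* Imax = N x C x V x (0.5 x F), with C in farads, V in volts and F in
 * hertz, so the result is in amps. */
double io_imax_amps(unsigned n_pins, double c_farads,
                    double v_volts, double f_hz)
{
    return (double)n_pins * c_farads * v_volts * (0.5 * f_hz);
}
```

    With the sample values, `io_imax_amps(8, 15e-12, 3.3, 50e6)` works out to about 9.9 mA for the whole group of pins.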

  3. #78
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Quote Originally Posted by PaulStoffregen View Post
    Where did you see that?
    i.MX RT1060 Processor Reference Manual
    Document Number: IMXRT1060RM
    Rev. 0, 08/2018



    12.3.2.2 PLLs
    Seven PLLs are included in the clock generation section. Two of these PLLs are each
    equipped with four Phase Fractional Dividers (PFDs) in order to generate additional
    frequencies.
    [...]
    The seven PLLs are listed below:
    PLL1 (also referred to as ARM_PLL) - This is the PLL clocking the ARM core
    complex. It is a programmable integer frequency multiplier capable of output
    frequency of up to 1.3 GHz. Note that this frequency is higher than the maximum
    chip supported frequency 1.0 GHz.

    • PLL2 (also referred to as System_PLL or 528_...
    Maybe I read this wrong..?
    But 1 GHz / 2 (if divided) would be only 500 MHz...


    Same text is in other manuals, too.
    Last edited by Frank B; 11-23-2018 at 12:48 PM.

  4. #79
    Senior Member
    Join Date
    Jul 2014
    Posts
    1,940
    Quote Originally Posted by Frank B View Post

    Same text is in other manuals, too.
    Refer to Paul's observation that the actual manual is full of errors (cut/paste type).
    Note also that in the low-power AN the 600 MHz clock is called 'Overdrive run'.

  5. #80
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Note that I wrote "or it's just wrong information" above.

  6. #81
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Anyway, we'll have a Raspberry killer, even at 600 MHz (I've read it's faster than an A7 @ 900 MHz, I don't remember where exactly), and no time-consuming OS.

    edit: raspberry 1 :-)

  7. #82
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,691
    Pretty sure that's leftover copy-paste stuff from iMX6, which does officially support 1 GHz.

  8. #83
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,691
    Looks like the old compiler we currently have produces slightly faster code than the newest.

    These are with 396 MHz clock and -O3 optimization.

    Using the compiler currently in Teensyduino, CoreMark=1636

    Code:
    *2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 12219
    Total time (secs): 
    Iterations/Sec   : 1636
    Iterations       : 20000
    Compiler version : GCC5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]
    Using a much newer version, CoreMark=1600

    Code:
    *2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 12500
    Total time (secs): 
    Iterations/Sec   : 1600
    Iterations       : 20000
    Compiler version : GCC7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
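    The scores in the two runs can be cross-checked from the tick counts. The sketch below assumes the ticks are milliseconds (12219 ticks for 20000 iterations matches the reported 1636 iterations/sec only under that assumption); the function names are made up for illustration.

```c
#include <stdint.h>

/* Iterations per second from a raw tick count.
 * ticks_per_sec is assumed to be 1000 (millisecond ticks) for the
 * runs shown in this post. */
double coremark_score(uint32_t iterations, uint32_t ticks,
                      uint32_t ticks_per_sec)
{
    return (double)iterations * (double)ticks_per_sec / (double)ticks;
}

/* Score normalized by CPU clock, in CoreMark/MHz. */
double coremark_per_mhz(double score, double cpu_mhz)
{
    return score / cpu_mhz;
}
```

    Under that assumption, `coremark_score(20000, 12219, 1000)` comes out just under 1637 and `coremark_score(20000, 12500, 1000)` is 1600, i.e. roughly 4.1 and 4.0 CoreMark/MHz at 396 MHz.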

  9. #84
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    Is there a source where I can get a copy of coremark so I can play with it on the Teensy? Thank you.

  10. #85
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,691
    Here's the CoreMark code.

    https://github.com/eembc/coremark

    Minor edits to the core_portme.c and core_portme.h files are needed, but they're pretty simple. I also renamed "main" so it can be called from other code, even though technically you're not supposed to edit those files.
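    The thread does not show the actual edits. As a rough idea of what the timing part of such a port can look like, here is a sketch. The four function names and the `CORE_TICKS` / `secs_ret` types follow CoreMark's porting layer (check them against the repository); `board_millis()` and `fake_millis_now` are hypothetical stand-ins for a real millisecond counter such as Arduino's `millis()`, so that the sketch builds on a PC.

```c
#include <stdint.h>

/* In CoreMark these typedefs live in core_portme.h. */
typedef uint32_t CORE_TICKS;
typedef double   secs_ret;

/* Hypothetical tick source. On a real board this would read a
 * millisecond counter instead of a test variable. */
static uint32_t fake_millis_now;
static uint32_t board_millis(void) { return fake_millis_now; }

static CORE_TICKS start_time_val, stop_time_val;

/* Called by the benchmark right before the timed section. */
void start_time(void) { start_time_val = board_millis(); }

/* Called by the benchmark right after the timed section. */
void stop_time(void) { stop_time_val = board_millis(); }

/* Elapsed ticks between start_time() and stop_time(). */
CORE_TICKS get_time(void) { return stop_time_val - start_time_val; }

/* Convert ticks to seconds; 1000 ticks per second is assumed here. */
secs_ret time_in_secs(CORE_TICKS ticks) { return (secs_ret)ticks / 1000.0; }
```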

  11. #86
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    Thank you for the link. Will take a look and give it a try. Guess curiosity is good and fun most of the time!
    Thanks again
    Mike

  12. #87
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,643
    Here are various coremarkish results for the 1050 @ 600 MHz. Typically, the IAR compiler gives the highest performance. ARM CC is used by the mbed on-line compiler. I don't have local access to the ARM CC or IAR toolchains ($$$).
    Code:
    NXP reports 3036 or 3014  (IAR 7.80)
    ARM CC -O3  2438  (mbed on-line)
    GNU GCC 7.2.1 -O3  2444  (MCUXpresso, using free-running GPT for micros())
    GNU GCC 7.3.1 -O3  2397
    GNU GCC 5.4.1 -O3  2445    4.075 coremarks/MHz
    IAR 2943 from https://mcu-things.com/blog/imxrt-ho...mplementation/

    https://www.eembc.org/coremark/scores.php

    See post #1 for comparison with Teensy 3*
    Last edited by manitou; 11-26-2018 at 05:09 PM.

  13. #88
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Quote Originally Posted by PaulStoffregen View Post
    Pretty sure that's leftover copy-paste stuff from iMX6, which does officially support 1 GHz.
    Yes, I've found it here: https://www.nxp.com/docs/en/referenc.../IMX6SLLRM.pdf , 10.3.2.3 (page 349) - copy & paste, you're right.

    Compilers:
    https://forum.pjrc.com/threads/29017...ormance-LC-3-1
    - that thread showed faster results for newer compilers. The highest compiler version at that time was 5.2.1, and it was faster than the versions before it.

    What are GCC's default settings for the CM7? Maybe there is some potential to tune the code generation a bit. For example, there are switches for level-1 cache size and some more.
    And as we now have RAM with wait states, we should look more closely at variable alignment in our code, and align important arrays and structs to 32 so that prefetching can work optimally.
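    A minimal sketch of the alignment idea, assuming "align to 32" means 32 bytes (the Cortex-M7 cache line size) and using GCC's `aligned` attribute. The buffer and struct names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32  /* assumed line size in bytes */

/* Hypothetical sample buffer that starts on a 32-byte boundary. */
static int16_t audio_block[128] __attribute__((aligned(CACHE_LINE)));

/* Hypothetical struct type; every object of this type is 32-byte
 * aligned and its size is padded to a multiple of 32. */
typedef struct {
    uint32_t coeffs[8];
    uint32_t state[8];
} __attribute__((aligned(CACHE_LINE))) filter_t;

static filter_t filter;

/* Returns nonzero if p is a multiple of n bytes. */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

    Whether this actually helps depends on the access pattern, so it is worth measuring before and after.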

  14. #89
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Just found an application note about using the caches: https://www.nxp.com/docs/en/applicat...te/AN12042.pdf

    Edit:
    Importantly, it says that with DMA we have to take care of the cache, for example by marking DMA memory as non-cacheable.
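    Besides marking memory as non-cacheable, a common alternative (not taken from this thread) is to clean or invalidate the data cache over a DMA buffer by address. Those operations work on whole cache lines, so the buffer range first has to be widened to line boundaries. The sketch below shows only that address arithmetic, assuming 32-byte lines; `cache_range` and `cache_align_range` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE 32u  /* assumed Cortex-M7 data cache line size */

/* Hypothetical result type: a line-aligned start address and length. */
typedef struct {
    uintptr_t start;
    size_t    len;
} cache_range;

/* Widen [addr, addr + len) so that it starts and ends on cache-line
 * boundaries. On real hardware the result would then be passed to a
 * cache maintenance routine such as CMSIS SCB_CleanDCache_by_Addr
 * (before a DMA read from memory) or SCB_InvalidateDCache_by_Addr
 * (after a DMA write to memory); those calls are not shown here. */
cache_range cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    cache_range r;
    uintptr_t end = (addr + len + DCACHE_LINE - 1u) & mask;
    r.start = addr & mask;
    r.len = (size_t)(end - r.start);
    return r;
}
```

    Note that widening the range can pull in neighbouring variables that share a line with the buffer, which is one more reason to align DMA buffers to the line size.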


    Last edited by Frank B; 11-25-2018 at 11:20 AM.

  15. #90
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    Did a few more Coremark tests using a T3.5 for comparison to the T3.6 and the rt1052. Results are based on the GCC5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]:
    Code:
                   coremarkish
                  iterations/sec
        M7@600mhz      2438        ARM CC -O3    @manitou
      T3.5@144mhz       331        Fastest
      T3.5@168mhz       369        Fastest
      T3.5@144mhz       319        Fastest+pure+LTO
      T3.5@168mhz       357        Fastest+pure+LTO
    
      From post #1 for T3.6 for easy comparison
      T3.6@180mhz       434         Faster
      T3.6@256mhz       659         Fastest+pure+LTO
      T3.2@120mhz       254         Faster

  16. #91
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    T3.6 : A bit difficult to compare with different compiler options

  17. #92
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Did you know that you can pass optimizations per function to gcc ?
    For example:
    __attribute__((optimize("unroll-loops"))) void xyz(void) {...}

    A more compatible way is using #pragma
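    Both forms side by side, as a sketch for GCC. The function names are made up; `unroll-loops` is the option from the example above. Other compilers may ignore the attribute or the pragmas with a warning.

```c
#include <stdint.h>
#include <stddef.h>

/* Attribute form: the option applies to this one function only. */
__attribute__((optimize("unroll-loops")))
uint32_t sum_attr(const uint32_t *data, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}

/* Pragma form: the option applies to every function defined between
 * push_options and pop_options. */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")
uint32_t sum_pragma(const uint32_t *data, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}
#pragma GCC pop_options
```

    The two functions compute the same result; only the code the compiler generates for them may differ.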

  18. #93
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    T3.6 : A bit difficult to compare with different compiler options
    Don't know if this will help but if we just compare -O3 optimization:
    Code:
                   coremarkish
                  iterations/sec
        M7@600mhz      2438        ARM CC -O3
      T3.5@168mhz       369        Fastest (O3)
      T3.6@240mhz       579        Fastest (O3)
    I think the takeaway for me is that the 1050 will be about 4x faster than the T3.6 and 6x faster than the T3.5, at least for the Coremark tests.
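    The ratios can be checked directly from the three -O3 scores in the table. A small sketch, with a hypothetical `result_t` type and `speedup` helper:

```c
/* One benchmark result: board label, CPU clock and CoreMark score. */
typedef struct {
    const char *board;
    double      mhz;
    double      score;  /* iterations/sec */
} result_t;

/* How many times faster `fast` is than `slow`, by raw score. */
double speedup(result_t fast, result_t slow)
{
    return fast.score / slow.score;
}
```

    With the scores 2438 (M7 @ 600 MHz), 579 (T3.6 @ 240 MHz) and 369 (T3.5 @ 168 MHz), this gives roughly 4.2x over the T3.6 and 6.6x over the T3.5.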

    Code:
    Did you know that you can pass optimizations per function to gcc ?
    No - I didn't know that. Where the heck do you find this stuff? I wish we had a single page where we could put all these tips and tricks. I try to keep a collection of bookmarks but I tend to lose track of them.

    EDIT: Answered my own question: https://gcc.gnu.org/onlinedocs/gcc/F...ttributes.html

  19. #94
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    18,691
    I like "coremarkish".

  20. #95
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    Thank @manitou for that one - I thought it was appropriate

  21. #96
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    2,828
    Quote Originally Posted by Frank B View Post
    Did you know that you can pass optimizations per function to gcc ?
    For example:
    __attribute__((optimize("unroll-loops"))) void xyz(void) {...}

    A more compatible way is using #pragma
    I added the original implementation of __attribute__ optimize back in the GCC 4.5 time period, along with target support for the x86 at the end of the time I worked for AMD. Later versions of the compiler added target support for other backends including ARM.

    Note, the #pragma/attribute target support doesn't work with C++, but the pragma/attribute for optimize does. There is grumbling within the GNU GCC community of developers that allowing particular -f options to be set/reset can break things, particularly as more of the optimization is moved to the earlier parts of the compiler. But in the embedded space as well as the HPC (high performance computing) arenas there are times when you need to tune individual functions.

    If you select a particular function to have optimize/target attributes, it will cause that function not to be inlined by other functions that don't have the same options set.

  22. #97
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,773
    Quote Originally Posted by MichaelMeissner View Post
    I added the original implementation of __attribute__ optimize back in the GCC 4.5 time period
    ..and it is great.

    Quote Originally Posted by MichaelMeissner View Post
    If you select a particular function to have optimize/target attributes, it will cause that function not to be inlined by other functions that don't have the same options set.
    Oh, good to know! Does always_inline override that?

  23. #98
    Senior Member+ MichaelMeissner's Avatar
    Join Date
    Nov 2012
    Location
    Ayer Massachussetts
    Posts
    2,828
    Quote Originally Posted by Frank B View Post
    ..and it is great.


    Oh, good to know! Does always_inline override that?
    Well, it overrides it, but the switches of the outer function control the optimization options. If you don't use always_inline, then a call is made and the outer function has its own switches in use. If you are inlining a function without optimization or target options, then it uses the switches of the outer function.
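    A sketch of the last case described above, for GCC: a helper with no optimize attribute of its own, marked `always_inline`, called from a function that does carry an optimize attribute. The names are hypothetical, and the comments restate the behaviour described in this post rather than anything verified here.

```c
#include <stdint.h>
#include <stddef.h>

/* No optimize/target attribute here, so when this is inlined its body
 * should be compiled with the caller's switches. */
static inline __attribute__((always_inline))
uint32_t square(uint32_t x)
{
    return x * x;
}

/* The outer function sets its own optimization options. */
__attribute__((optimize("O3")))
uint32_t sum_squares(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += square(v[i]);
    return s;
}
```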

  24. #99
    Senior Member
    Join Date
    Jul 2014
    Location
    New York
    Posts
    1,942
    Figured out how to get an SD card example working on the 1052 EVKB. I added a timing function based on something @manitou helped me with:

    Code:
    Make file system......The time may be long if the card capacity is big.
    (I am using a 64GB sdxhc card)
    
    Make file system......The time may be long if the card capacity is big.
    TIME TO MAKE FS: 15796466
    
    Create directory......
    TIME TO CREATE DIRECTORY: 98777
    
    Create a file in that directory......
    TIME TO CREATE FILE: 2835
    
    Create a directory in that directory......
    TIME TO CREATE DIR in that DIR: 74598
    
    List the file in that directory......
    General file : F_1.DAT.
    Directory file : DIR_2.
    
    Write/read file until encounters error......
    
    Write to above created file.
    TIME TO CREATE Write: 3041
    
    Read from above created file.
    TIME TO CREATE READ:582
    Times are in microseconds and represent writing a 1024-byte buffer.
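    To turn these timings into a transfer rate, here is a small sketch. It assumes the write and read figures each cover one 1024-byte buffer, as stated; the function name is made up.

```c
#include <stdint.h>

/* Transfer rate in KiB/s for `bytes` moved in `micros` microseconds. */
double throughput_kib_per_s(uint32_t bytes, uint32_t micros)
{
    return (double)bytes * 1e6 / (double)micros / 1024.0;
}
```

    For the numbers above, 1024 bytes in 3041 us is about 329 KiB/s for the write, and 1024 bytes in 582 us is about 1718 KiB/s for the read.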

    I just found this post that has some times for the T3.6: https://forum.pjrc.com/threads/54241...ght=card+speed.

    Cheers

  25. #100
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    1,643
    Quote Originally Posted by mjs513 View Post
    Figured out how to get a sdcard example working on the 1052 evkb. I added a timing function based on something the @manitou helped me with:
    Excellent. That was on my to-do list. Which example did you work from?
