Thread: Teensyduino 1.34 Beta #1 (ARM Toolchain Update)

  1. #76
    Senior Member+ Frank B
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    4,531
    Quote Originally Posted by manitou
    you're right, RPI3 jessie is still running in 32 bit mode.

    i tested 1.6.12 with 1.34beta1 on mac os,
    coremark:
    previously T3.2@96mhz -O2 189.4 iterations/sec | with LTO fastest 207.29
    previously T3.6@180mhz -O2 384.0 | with LTO fastest 447.7
    ... so many optimization choices ...

    Code:
    T3.6@180mhz coremark
            fastest LTO 447.676389
            fastest     463.692033
            faster LTO  437.121360
            faster      434.528617
            fast  LTO   333.619557
            fast        333.032915
            small LTO   323.248789  no float printf
            small       320.692182
    GCC6 :

    - 180MHz fastest with LTO: Compiler crashes with this sketch ("lto1.exe: internal compiler error: Segmentation fault")
    - 180MHz fastest without LTO:
    Code:
    Start
    2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 13072
    Total time (secs): 13.072000
    Iterations/Sec   : 458.996328
    Iterations       : 6000
    Compiler version : GCC6.2.1 20161205 (release) [ARM/embedded-6-branch revision 243739]
    Compiler flags   : 
    Memory location  : STACK
    seedcrc          : 0xe9f5
    [0]crclist       : 0xe714
    [0]crcmatrix     : 0x1fd7
    [0]crcstate      : 0x8e3a
    [0]crcfinal      : 0xa14c
    Correct operation validated. See readme.txt for run and reporting rules.
    CoreMark 1.0 : 458.996328 / GCC6.2.1 20161205 (release) [ARM/embedded-6-branch revision 243739]  / STACK
    So, again 10 points faster, even without LTO (more than twice as fast as T3.2 @ 96MHz)

    240MHz: 612.199466 ... no comment ..wow..
    My benchmarks were done with "-mpure-code" - seems to be a little bit faster.
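The Iterations/Sec figure in the log above is simply the iteration count divided by the run time, so the comparison with the T3.2 can be checked by hand. Below is a minimal sketch of that arithmetic. The function names `coremark_score` and `coremark_per_mhz` are hypothetical helpers for illustration, not part of the CoreMark sources, and the tick rate of 1000 per second is an assumption inferred from the log (13072 ticks reported as 13.072 s).

```c
#include <assert.h>
#include <stdint.h>

/* Iterations per second from a raw tick count.
 * ticks_per_sec is assumed to be 1000 (millisecond ticks), which is
 * consistent with the log: 13072 ticks shown as 13.072000 seconds. */
double coremark_score(uint32_t iterations, uint32_t total_ticks,
                      uint32_t ticks_per_sec)
{
    assert(total_ticks > 0 && ticks_per_sec > 0);
    double seconds = (double)total_ticks / (double)ticks_per_sec;
    return (double)iterations / seconds;
}

/* Score normalized by CPU clock, useful when comparing boards
 * that run at different frequencies. */
double coremark_per_mhz(double score, double mhz)
{
    assert(mhz > 0.0);
    return score / mhz;
}
```

With the numbers from the log, `coremark_score(6000, 13072, 1000)` gives about 459 iterations/sec. Dividing by the 207.29 quoted for the T3.2 at 96 MHz with LTO gives a ratio of roughly 2.2, and `coremark_per_mhz` with 180 MHz gives about 2.55 CoreMark/MHz.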
    Last edited by Frank B; 01-22-2017 at 06:32 PM.

  2. #77
    Junior Member
    Join Date
    Apr 2015
    Posts
    10
    Quote Originally Posted by defragster
    I got a pi 3 months back - unpowered yet - but I understood it was still using 32 bit Jessie for compatibility to all existing code/usage?
    I was under the impression that it is now 64 bit. I know Odroid is trying to transition to 64 bit Linux on the C2.

  3. #78
    Senior Member+ KurtE
    Join Date
    Jan 2014
    Posts
    3,653
    My guess is that there are some alternate 64 bit setups for the RPI3, but I don't know of any mainline ones yet, although I have not really looked.

    Yes, the Odroid C2's main setup is 64 bit. There are still issues with it, for example when trying to run Arduino on it. I played around enough to get the main parts of the compiler and downloads to work, but have not gotten the Serial monitor to work. More details in the thread: http://forum.odroid.com/viewtopic.php?f=136&t=21249

    I started looking into building a 64 bit version from sources, but ran into issues: pieces of the build come from zip files or the like that have components for the different distros, and there is not one for 64 bit ARM. So I punted.

  4. #79
    Senior Member PaulStoffregen
    Join Date
    Nov 2012
    Posts
    18,343
    In case anyone's wondering, I am not eager to expand Linux's portion of the Teensyduino release process from 3 of 5 to 4 of 6 files built.

    Even if I were, my position is the same as before the 32 bit linuxarm build: I will officially support whatever architectures Arduino.cc officially supports with their non-beta releases. Until Arduino.cc adds a 64 bit linuxarm build, I will not do it. I know that's probably not the answer some Odroid enthusiasts want to hear, but hopefully a clear answer is better than uncertainty?

  5. #80
    Senior Member+ KurtE
    Join Date
    Jan 2014
    Posts
    3,653
    Thanks Paul,

    Actually I would be happy with the 32 bit stuff working fine. And for me just having the compiler and upload is fine... There are obviously alternatives to using the terminal monitor.

    Actually I would be even happier to be able to do all of it from the command line. The current Arduino added Linux support for command line to work without GUI, which I tried and it worked all the way up to upload, which failed... But that is another story...

  6. #81
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    Quote Originally Posted by duff
    Here is what the delayMicroseconds disassembly looks like with -O3 and LTO enabled on a T3:

    Code:
    000018b8 <L_783_delayMicroseconds>:
        18b8:    3b01          subs    r3, #1
        18ba:    d1fd          bne.n    18b8 <L_783_delayMicroseconds>
        18bc:    f892 3200     ldrb.w    r3, [r2, #512]    ; 0x200
        18c0:    f892 1280     ldrb.w    r1, [r2, #640]    ; 0x280
        18c4:    b2db          uxtb    r3, r3
        18c6:    2900          cmp    r1, #0
        18c8:    d1f2          bne.n    18b0 <main+0x28>
        18ca:    b13b          cbz    r3, 18dc <L_783_delayMicroseconds+0x24>
        18cc:    6803          ldr    r3, [r0, #0]
        18ce:    f023 0302     bic.w    r3, r3, #2
        18d2:    6003          str    r3, [r0, #0]
        18d4:    e7ef          b.n    18b6 <main+0x2e>
        18d6:    f882 5100     strb.w    r5, [r2, #256]    ; 0x100
        18da:    e7ec          b.n    18b6 <main+0x2e>
        18dc:    6803          ldr    r3, [r0, #0]
        18de:    f043 0303     orr.w    r3, r3, #3
        18e2:    6003          str    r3, [r0, #0]
        18e4:    e7e7          b.n    18b6 <main+0x2e>
        18e6:    f8df 8078     ldr.w    r8, [pc, #120]    ; 1960 <L_869_delayMicroseconds+0x58>
        18ea:    f8df c078     ldr.w    ip, [pc, #120]    ; 1964 <L_869_delayMicroseconds+0x5c>
        18ee:    f8df e078     ldr.w    lr, [pc, #120]    ; 1968 <L_869_delayMicroseconds+0x60>
        18f2:    4f18          ldr    r7, [pc, #96]    ; (1954 <L_869_delayMicroseconds+0x4c>)
        18f4:    4e18          ldr    r6, [pc, #96]    ; (1958 <L_869_delayMicroseconds+0x50>)
        18f6:    4d19          ldr    r5, [pc, #100]    ; (195c <L_869_delayMicroseconds+0x54>)
        18f8:    4c15          ldr    r4, [pc, #84]    ; (1950 <L_869_delayMicroseconds+0x48>)
        18fa:    e010          b.n    191e <L_869_delayMicroseconds+0x16>
        18fc:    b1eb          cbz    r3, 193a <L_869_delayMicroseconds+0x32>
        18fe:    6803          ldr    r3, [r0, #0]
        1900:    f023 0302     bic.w    r3, r3, #2
        1904:    6003          str    r3, [r0, #0]
        1906:    4623          mov    r3, r4
    
    
    00001908 <L_869_delayMicroseconds>:
        1908:    3b01          subs    r3, #1
        190a:    d1fd          bne.n    1908 <L_869_delayMicroseconds>
        190c:    f898 3000     ldrb.w    r3, [r8]
        1910:    f89c 3000     ldrb.w    r3, [ip]
        1914:    f89e 3000     ldrb.w    r3, [lr]
        1918:    783b          ldrb    r3, [r7, #0]
        191a:    7833          ldrb    r3, [r6, #0]
        191c:    782b          ldrb    r3, [r5, #0]
        191e:    f892 3200     ldrb.w    r3, [r2, #512]    ; 0x200
        1922:    f892 1280     ldrb.w    r1, [r2, #640]    ; 0x280
        1926:    b2db          uxtb    r3, r3
        1928:    2900          cmp    r1, #0
        192a:    d0e7          beq.n    18fc <L_783_delayMicroseconds+0x44>
        192c:    b113          cbz    r3, 1934 <L_869_delayMicroseconds+0x2c>
        192e:    f882 9100     strb.w    r9, [r2, #256]    ; 0x100
        1932:    e7e8          b.n    1906 <L_783_delayMicroseconds+0x4e>
        1934:    f882 9080     strb.w    r9, [r2, #128]    ; 0x80
        1938:    e7e5          b.n    1906 <L_783_delayMicroseconds+0x4e>
        193a:    6803          ldr    r3, [r0, #0]
        193c:    f043 0303     orr.w    r3, r3, #3
        1940:    6003          str    r3, [r0, #0]
        1942:    e7e0          b.n    1906 <L_783_delayMicroseconds+0x4e>
        1944:    4004b014     andmi    fp, r4, r4, lsl r0
        1948:    43fe1014     mvnsmi    r1, #20
        194c:    1fff8e08     svcne    0x00ff8e08
        1950:    00f42400     rscseq    r2, r4, r0, lsl #8
        1954:    1fff8e0c     svcne    0x00ff8e0c
        1958:    1fff8e00     svcne    0x00ff8e00
        195c:    1fff8dff     svcne    0x00ff8dff
        1960:    1fff8e09     svcne    0x00ff8e09
        1964:    1fff8e0a     svcne    0x00ff8e0a
        1968:    1fff8e0b     svcne    0x00ff8e0b
    Ouch, seems like all the inline assembly gets mucked up with LTO enabled! I found this out with my Zilch library which heavily uses inline assembly.

    For reference, here is delayMicroseconds using -O3 without LTO:
    Code:
    0000048a <L_36_delayMicroseconds>:
         48a:    3b01          subs    r3, #1
         48c:    d1fd          bne.n    48a <L_36_delayMicroseconds>
         48e:    bd08          pop    {r3, pc}
         490:    00f42400     rscseq    r2, r4, r0, lsl #8
    No, it doesn't. The inline assembly part is just the 'subs ...; bne.n ...' which is identical in both cases.

    BTW, Zilch Simple_Task works with higher optimization levels, if I change zilch.cpp:
    Code:
    void task_swap( volatile stack_frame_t *prevframe, volatile stack_frame_t *nextframe ) {
    to either:
    Code:
    void __attribute__ ((noinline)) task_swap( volatile stack_frame_t *prevframe, volatile stack_frame_t *nextframe ) {
    or:
    Code:
    void __attribute__ ((naked)) task_swap( volatile stack_frame_t *prevframe, volatile stack_frame_t *nextframe ) {
    I think there is a GCC bug here, since simply adding a proper clobber list to the asm statement doesn't work.

    \\

    In general, GCC has no idea what the inline assembly does and assumes it doesn't change memory or registers. You need to add proper clobber lists, which you don't have for the Zilch inline assembly.
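To illustrate the point about clobber lists, here is a minimal sketch in GCC extended asm. It is not the Zilch code or the Teensy core source. `spin_loop` and `store_word` are hypothetical names chosen for this example, and the non-ARM branches exist only so the sketch also compiles on a desktop GCC or Clang. The compiler does not parse the assembly text; it relies on the operand constraints and the clobber list to know what the statement reads, writes, and destroys.

```c
#include <stdint.h>

/* Busy-wait for n loop iterations, in the style of the 'subs; bne' pair
 * seen in the disassembly above. */
void spin_loop(uint32_t n)
{
    if (n == 0) return;              /* the asm loop below needs n > 0 */
#if defined(__thumb2__)
    __asm__ volatile(
        "1: subs %0, #1   \n\t"
        "   bne  1b       \n\t"
        : "+r" (n)   /* n is both read and modified by the asm */
        :            /* no input-only operands */
        : "cc"       /* subs updates the condition flags */
    );
#else
    /* Host fallback (assumption: only here for desktop builds).
     * The empty asm keeps the loop from being optimized away. */
    while (n--) {
        __asm__ volatile("" ::: "memory");
    }
#endif
}

/* A store performed inside asm. GCC cannot see the write, so the
 * "memory" clobber tells it that memory contents may have changed and
 * that cached values must not be reused across the statement. */
void store_word(volatile uint32_t *addr, uint32_t value)
{
#if defined(__thumb2__)
    __asm__ volatile("str %1, [%0]"
                     :                          /* no outputs */
                     : "r" (addr), "r" (value)  /* inputs in registers */
                     : "memory");               /* asm writes memory */
#else
    *addr = value;
    __asm__ volatile("" ::: "memory");          /* compiler barrier only */
#endif
}
```

Any register that the asm changes but that does not appear as an operand would also have to be named in the clobber list (for example "r4"). Without that, the optimizer is free to keep live values in those registers across the statement, which matters more once LTO inlines the function into its callers.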

  7. #82
    Senior Member duff
    Join Date
    Jan 2013
    Location
    Las Vegas
    Posts
    909
    My latest version of Zlich currently only supports T3.2 and works with all optimizations except Fastest w/ LTO.
    Last edited by duff; 09-14-2017 at 08:10 PM.

  8. #83
    Senior Member
    Join Date
    Jan 2013
    Posts
    154
    Quote Originally Posted by duff
    My latest version of Zlich currently only supports T3.2 and works with all optimizations except Fastest w/ LTO.
    @duff, FYI, your link (https://github.com/duff2013/Zilch_Beta) is broken and Zilch is misspelled in the above post. However, I DID find the Zilch project here: https://github.com/duff2013/Zilch

    Is that the correct one?

  9. #84
    Senior Member duff
    Join Date
    Jan 2013
    Location
    Las Vegas
    Posts
    909
    Quote Originally Posted by markonian
    @duff, FYI, your link (https://github.com/duff2013/Zilch_Beta) is broken and Zilch is misspelled in the above post. However, I DID find the Zilch project here: https://github.com/duff2013/Zilch

    Is that the correct one?
    Thanks, I'll fix that!
