Forum Rule: Always post complete source code & details to reproduce any issue!
Page 106 of 167 FirstFirst ... 6 56 96 104 105 106 107 108 116 156 ... LastLast
Results 2,626 to 2,650 of 4151

Thread: Teensy 4.0 First Beta Test

  1. #2626
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by mjs513 View Post
    Figured it was time to put all this data into perspective
    Haha - it's a bit premature to make pretty graphs!

    The new code has absolutely no optimizations yet, and it still appears to have some serious bugs. I'm about to start looking into why the latency test is failing with 64 bytes.

    Please, would you do me a small favor and keep the benchmark stuff text only for now? These sorts of graphs do make a strong impression on people who don't read the finer details. I definitely would like to see this sort of graph comparison in a few weeks. But please, let's only do this when comparing well tested and optimized code against the very mature and highly optimized Teensy 3.x USB code, and maybe other Arduino compatible boards with stable released code.

    On benchmarking, here's another really simple speed test I wrote yesterday. Let it run for about 5 seconds, or until the lines/sec number becomes fairly stable.

    uint32_t count, prior_count;
    uint32_t prior_msec;
    uint32_t count_per_second;

    // Uncomment this for boards where SerialUSB needed for native port
    //#define Serial SerialUSB

    void setup() {
    Serial.begin(1000000);
    while (!Serial) ;
    count = 10000000;
    prior_count = count;
    count_per_second = 0;
    prior_msec = millis();
    }

    void loop() {
    Serial.print("count=");
    Serial.print(count);
    Serial.print(", lines/sec=");
    Serial.println(count_per_second);
    count = count + 1;
    uint32_t msec = millis();
    if (msec - prior_msec > 1000) {
    prior_msec = prior_msec + 1000;
    count_per_second = count - prior_count;
    prior_count = count;
    }
    }
    On this test too, Teensy 3.x with years of careful optimization handily beats the no-optimization code on Teensy 4.0. But that situation won't last long. I will do quite a lot of optimization on T4 soon. Its USB is 480 Mbit/sec speed and the USB controller works at the transfer level rather than the token level as we have on Teensy 3.x, and of course we have a much faster CPU, so the potential for amazing speed is there. But with USB, even incredibly fast hardware does poorly without careful optimization work in the software.

    If you run this speed test on some of Arduino's boards, it's pretty amazing how much the speed varies. Some are amazingly slow. It's also kind of interesting to compare to Teensy 2.0, where the code has a lot of optimization but the hardware is only 8 bits at 16 MHz without DMA.

  2. #2627
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    Quote Originally Posted by PaulStoffregen View Post
    Haha - it's a bit premature to make pretty graphs!

    The new code has absolutely no optimizations yet, and it still appears to have some serious bugs. I'm about to start looking into why the latency test is failing with 64 bytes.

    Please, would you do me a small favor and keep the benchmark stuff text only for now? These sorts of graphs do make a strong impression on people who don't read the finer details. I definitely would like to see this sort of graph comparison in a few weeks. But please, let's only do this when comparing well tested and optimized code against the very mature and highly optimized Teensy 3.x USB code, and maybe other Arduino compatible boards with stable released code.

    On benchmarking, here's another really simple speed test I wrote yesterday. Let it run for about 5 seconds, or until the lines/sec number becomes fairly stable.



    On this test too, Teensy 3.x with years of careful optimization handily beats the no-optimization code on Teensy 4.0. But that situation won't last long. I will do quite a lot of optimization on T4 soon. Its USB is 480 Mbit/sec speed and the USB controller works at the transfer level rather than the token level as we have on Teensy 3.x, and of course we have a much faster CPU, so the potential for amazing speed is there. But with USB, even incredibly fast hardware does poorly without careful optimization work in the software.

    If you run this speed test on some of Arduino's boards, it's pretty amazing how much the speed varies. Some are amazingly slow. It's also kind of interesting to compare to Teensy 2.0, where the code has a lot of optimization but the hardware is only 8 bits at 16 MHz without DMA.
    Sorry Paul
    I went ahead and deleted the graphs, with all the numbers being thrown out I was loosing perspective I understand there is still a lot to do with USB including optimizations but I am a kind of visual person so I like pretty pictures but fully understand your point. To be honest was a bit leery about running any of the tests on USB but wanted to get a baseline of where we are now compared to where it ends up when you get done with USB. Like you said it, probably a bit premature for pictures.

    Going to go ahead and run the tests and maybe just put it in a spreadsheet for now - no graphs I promise.

  3. #2628
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    @PaulStoffregen

    Ran your speed test on a variety of boards besides the T4 for later comparison. Wow you are right abou the Arduino boards. Even my Nucleo-H7 board has some issues. Couldn't find my Arduino Due board - forgot where I put it.

    Code:
    	T4 (9f4d02c)	T3.6	T.3.5	2.0++	Uno	mega	Nucleo144-H7	101    DUE
    Avg	    3999	26873.5	26877.5	8577	1268	1267.5	2818.5         475.5   376.5
    min	    3998	26841	26837	8568	1267	1266	2818	        475     376
    max	    4000	26906	26918	8586	1269	1269	2819	        476     377
    Last edited by mjs513; 05-01-2019 at 01:55 PM. Reason: Updated for DUE

  4. #2629
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    You can also expect to see different speed with versus without a USB hub. You might think a hub would slow things down, but in many cases you'll see better performance for 12 Mbit/sec devices due to how the hub's transaction translator works. The TT effectively creates another fully dedicated 12 Mbit channel to the downstream (from the hub) device. Many hubs have TTs that are surprisingly good at estimating what packets they can schedule before the SOF, which also translates into a little more speed in some cases.

    Don't be shy about putting T4's numbers on the forum. That's perfectly ok. But please keep comparisons text only. Maybe also mention which version/commit you tested since we're in rapidly changing beta...

  5. #2630
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    5,060
    @PaulStoffregen and @mjs513, I tried a few boards....

    T4 appeared to be about 3990
    OpenCM9.04 2670
    OpenCR1 15515
    T3.6 26300

    So T3... goes pretty fast

  6. #2631
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    @KurtE and @PaulStoffregen

    Just added T4 with the latest commit (as of 3hours ago) to post #2628.... lots of stuff left on USB.. don't envy you Paul

  7. #2632
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by defragster View Post
    WOW - the flashcode text has 5 Views! Paul hopefully you are one of them?
    Yes, one of those was me.

    I looked at this briefly yesterday, doing a few quick visual checks of the hexdump (msg #2582) against the original .hex file (msg #2305). While it looked ok, nothing like 256 byte chunks all 00 or FF, I quickly realized I didn't have a copy of the original in hexdump format, and neither an easy way to turn the hexdump back into intel hex .HEX format.

    I didn't want to compare against a hexdump of another board, in case there's a bug in my test fixture code which wrote those images into the top sector of the flash. In that case, every board might have the same corrupted restore program.

    Just now I wrote a little program which turns the original .HEX file into the same hexdump as that code on msg #2576 creates. Here's the hexdump of the original code.

    Code:
    0000: 46 43 46 42 00 00 01 56 00 00 00 00 01 03 03 00
    0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0040: 00 00 00 00 01 04 02 00 00 00 00 00 00 00 00 00
    0050: 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00
    0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0080: EB 04 18 0A 06 32 04 26 00 00 00 00 00 00 00 00
    0090: 05 04 04 24 00 00 00 00 00 00 00 00 00 00 00 00
    00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00B0: 06 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00D0: 20 04 18 08 00 00 00 00 00 00 00 00 00 00 00 00
    00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0100: D8 04 18 08 00 00 00 00 00 00 00 00 00 00 00 00
    0110: 02 04 18 08 04 20 00 00 00 00 00 00 00 00 00 00
    0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0130: 60 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01C0: 00 01 00 00 00 10 00 00 01 00 00 00 00 00 00 00
    01D0: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    01E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0200: D1 00 20 40 2C 10 00 60 00 00 00 00 00 00 00 00
    0210: 20 10 00 60 00 10 00 60 00 00 00 00 00 00 00 00
    0220: 00 00 00 60 10 16 00 00 00 00 00 00 00 60 00 20
    0230: 91 10 00 60 F0 B5 15 4B 40 20 4F F4 80 31 4F F4
    0240: 80 54 4F F4 00 55 4F F4 40 46 43 F2 42 07 1A 69
    0250: 12 F0 02 0F 03 D0 9E 61 59 61 9F 61 F7 E7 12 F4
    0260: 00 5F 01 D1 5D 61 F2 E7 12 F4 80 5F 01 D1 5C 61
    0270: ED E7 00 2A EB DA 12 F4 80 3F 01 D0 99 61 E6 E7
    0280: 52 06 01 D4 58 61 E2 E7 F0 BD 00 BF 00 80 0D 40
    0290: 70 B5 72 B6 00 23 35 4A 35 4C 36 49 98 18 88 42
    02A0: 03 D2 18 59 98 50 04 33 F8 E7 33 4A 00 20 33 49
    02B0: 8A 42 4F F0 00 03 02 D2 10 60 04 32 F8 E7 30 49
    02C0: 10 22 30 48 30 4C 03 60 0A 60 30 4A 14 60 30 4C
    02D0: 0C 60 30 4C 14 60 30 4C 0C 60 30 4C 14 60 30 4C
    02E0: 0C 60 30 49 11 60 01 21 01 60 BF F3 4F 8F BF F3
    02F0: 6F 8F C2 F8 B0 31 BF F3 4F 8F BF F3 6F 8F 50 F8
    0300: 80 2C 6F F0 7F 44 A0 F5 58 60 27 4E 42 F4 40 32
    0310: C0 F8 00 2D 25 4A 13 60 04 60 25 4C 23 60 11 60
    0320: FF F7 88 FF 00 F0 B2 F8 22 4A 93 6E 43 F0 40 43
    0330: 93 66 05 22 20 4B C3 F8 48 21 1C 22 C3 F8 38 23
    0340: A3 F5 70 33 08 22 5A 60 C3 F8 84 20 25 68 00 F0
    0350: 1B F9 22 68 AB 1A 23 F0 7F 43 B3 42 F7 D9 08 21
    0360: 16 4B 15 46 C3 F8 8C 10 F1 E7 00 BF 00 00 00 20
    0370: 10 16 00 60 60 01 00 20 00 10 00 20 C0 32 00 20
    0380: 9C ED 00 E0 94 ED 00 E0 25 00 08 03 A0 ED 00 E0
    0390: 12 00 00 20 25 00 08 13 14 00 00 40 33 00 10 13
    03A0: 15 00 00 60 2F 00 0B 07 A0 86 01 00 10 E0 00 E0
    03B0: 18 E0 00 E0 00 C0 0F 40 00 80 1F 40 00 C0 1B 40
    03C0: 30 B5 23 4C E9 B9 00 2A 22 4B 48 F2 80 02 4F F0
    03D0: 01 01 08 BF 80 22 A3 64 19 60 18 BF 4F F4 80 31
    03E0: 5A 60 4F F0 00 02 08 BF 00 21 9A 60 1A 4B E2 64
    03F0: D3 F8 B0 21 42 F4 80 32 C3 F8 B0 21 17 4A 11 60
    0400: 21 E0 09 04 16 4B 01 25 41 F0 80 01 98 60 A3 60
    0410: 59 60 00 F5 80 51 1D 60 D9 60 00 F5 00 51 19 61
    0420: 00 F5 40 51 00 F5 80 40 59 61 98 61 00 23 E3 60
    0430: 09 4B D3 F8 B0 11 29 43 C3 F8 B0 11 D3 F8 B0 11
    0440: 00 29 FB D1 BF E7 D3 F8 B0 21 00 2A FB D1 30 BD
    0450: 00 30 00 20 20 20 00 20 00 00 2E 40 00 10 00 20
    0460: 00 20 00 20 08 4B 00 22 6F F0 7F 41 1A 60 64 22
    0470: 50 43 1A 46 13 68 C3 F1 FF 13 03 F5 7F 43 83 42
    0480: F8 D3 8B 42 F6 D0 70 47 18 E0 00 E0 35 4A 00 20
    0490: D2 F8 80 30 43 F0 03 03 70 B5 C2 F8 80 30 03 23
    04A0: 31 4A 53 71 02 23 04 32 98 54 01 33 2F 49 16 2B
    04B0: FA D1 2F 4B 0A 24 2F 4A D3 F8 20 32 23 F0 7F 43
    04C0: 93 42 9C BF 0A 22 53 43 02 22 B3 FB F4 F0 04 FB
    04D0: 10 33 30 33 8B 54 03 46 90 1C 0B B1 02 46 F4 E7
    04E0: 21 4B 18 71 02 23 93 42 06 D2 C8 5C 8C 5C CC 54
    04F0: 02 33 88 54 02 3A F6 E7 40 F6 61 72 1E 4B 1F 4C
    0500: C3 F8 20 21 4F F0 00 43 63 63 1D 4B D3 F8 40 21
    0510: 42 F0 02 02 C3 F8 40 21 D3 F8 40 51 18 4E 15 F0
    0520: 02 05 F9 D1 4F F0 00 43 32 20 A3 63 FF F7 9A FF
    0530: 4F F0 80 43 4F F4 20 72 29 46 A3 63 0A 23 25 60
    0540: 10 4C C6 F8 A8 31 20 46 00 F0 5A F9 4F F4 81 03
    0550: 23 60 4F F4 80 03 23 64 01 23 C6 F8 58 41 C6 F8
    0560: 40 31 70 BD 00 C0 0F 40 00 10 00 20 04 10 00 20
    0570: 00 44 1F 40 7F 96 98 00 00 80 0D 40 00 90 0D 40
    0580: 00 00 2E 40 00 30 00 20 2D E9 F8 43 90 4D D5 F8
    0590: 44 61 F0 07 C5 F8 44 61 0C D4 71 06 40 F1 10 81
    05A0: 8B 4B D3 F8 AC 21 C3 F8 AC 21 D3 F8 BC 21 C3 F8
    05B0: BC 21 FA E0 D5 F8 BC 31 BB B1 86 4C C5 F8 BC 31
    05C0: 22 68 13 42 11 D0 84 4B 00 27 1A 68 83 4B 27 60
    05D0: 9A 42 08 D1 01 20 FF F7 45 FF C5 F8 40 71 32 20
    05E0: FF F7 40 FF FB BE 00 23 A3 76 79 4C D4 F8 AC 31
    05F0: 25 46 27 46 00 2B D0 D0 79 49 C4 F8 AC 31 8B 6A
    0600: D1 F8 2C E0 D4 F8 40 21 42 F4 00 52 C4 F8 40 21
    0610: D4 F8 40 21 92 04 F5 D5 D5 F8 40 21 22 F4 00 52
    0620: C5 F8 40 21 4F F0 01 12 C5 F8 B4 21 D7 F8 B4 21
    0630: 12 F0 01 18 FA D1 9A B2 40 F2 81 60 82 42 0D D8
    0640: B2 F5 D0 6F 3C D2 B2 F5 A0 6F 40 F0 A5 80 1B 0C
    0650: 5B 06 43 F0 80 73 C5 F8 54 31 97 E0 40 F6 21 10
    0660: 82 42 00 F0 85 80 40 F6 21 20 82 42 00 F0 8E 80
    0670: B2 F5 10 6F 40 F0 90 80 C3 F3 07 43 DF F8 54 91
    0680: 58 4A 40 46 89 F8 1B 30 57 4B C5 F8 C4 21 C5 F8
    0690: C8 31 C5 F8 CC 21 4F F4 00 12 C5 F8 D0 31 4F F4
    06A0: 80 03 C1 F8 00 21 42 46 C1 F8 C0 30 C1 F8 C0 31
    06B0: C1 F8 00 32 41 46 FF F7 83 FE 89 F8 1A 80 71 E0
    06C0: 1B 0C 4A 4A 1F FA 8E FC 90 68 00 28 64 D0 B2 F8
    06D0: 00 80 98 45 4A D1 B2 F8 02 80 E0 45 46 D1 1B 0A
    06E0: 03 2B 4F EA 1E 43 0C BF 02 78 92 88 93 42 28 BF
    06F0: 13 46 93 B9 3E 4B 01 22 4F F0 80 08 8B 60 83 E8
    0700: 04 01 00 22 9A 60 CA 60 D4 F8 B0 31 43 F0 01 03
    0710: C4 F8 B0 31 2F 4B 1A 60 23 E0 1B 04 35 4A 4F F0
    0720: 01 0E 43 F0 80 03 90 60 C2 F8 00 E0 53 60 00 F5
    0730: 80 53 8A 64 D3 60 00 F5 00 53 13 61 00 F5 40 53
    0740: 00 F5 80 40 53 61 00 23 90 61 CB 64 D4 F8 B0 31
    0750: 43 F4 80 33 C4 F8 B0 31 D5 F8 B0 31 00 2B FB D1
    0760: C8 E7 D5 F8 B0 31 00 2B FB D1 1B E0 0C 32 AB E7
    0770: 21 4A 93 42 10 D1 21 4B 9E 45 0D D1 15 4B 9A 7E
    0780: 52 B9 01 22 04 21 14 48 9A 76 02 E0 42 46 41 46
    0790: 40 46 FF F7 15 FE 05 E0 D4 F8 C0 31 43 F0 01 13
    07A0: C4 F8 C0 31 D4 F8 AC 31 24 E7 D3 F8 B0 11 08 4A
    07B0: 00 29 FA D1 4F F0 FF 33 C2 F8 B4 31 D2 F8 84 31
    07C0: 73 07 02 D5 02 4B D3 F8 84 31 BD E8 F8 83 00 BF
    07D0: 00 00 2E 40 00 10 00 20 80 32 00 20 A9 45 C2 6B
    07E0: 00 30 00 20 02 00 CC 00 CC 00 02 00 00 00 00 20
    07F0: 20 20 00 20 00 20 00 20 21 09 00 03 01 00 04 00
    0800: 02 44 03 46 93 42 02 D0 03 F8 01 1B FA E7 70 47
    0810: 00 01 00 00 12 00 00 00 D1 00 00 20 00 02 00 00
    0820: 49 00 00 00 88 00 00 20 00 22 01 00 21 00 00 00
    0830: 21 01 00 20 00 21 01 00 09 00 00 00 BA 00 00 20
    0840: 00 22 00 00 1C 00 00 00 42 01 00 20 00 21 00 00
    0850: 09 00 00 00 9A 00 00 20 00 03 00 00 04 00 00 00
    0860: 84 00 00 20 01 03 09 04 18 00 00 00 09 01 00 20
    0870: 02 03 09 04 26 00 00 00 E3 00 00 20 03 03 09 04
    0880: 16 00 00 00 04 10 00 20 00 00 00 00 00 00 00 00
    0890: 00 00 00 00 04 03 09 04 09 02 49 00 02 01 00 C0
    08A0: 32 09 04 00 00 02 03 00 00 00 09 21 11 01 00 01
    08B0: 22 1C 00 07 05 83 03 40 00 01 07 05 04 03 40 00
    08C0: 01 09 04 01 00 02 03 00 00 00 09 21 11 01 00 01
    08D0: 22 21 00 07 05 81 03 40 00 01 07 05 02 03 20 00
    08E0: 02 12 01 10 01 00 00 00 40 C0 16 86 04 79 02 01
    08F0: 02 03 01 26 03 54 00 65 00 65 00 6E 00 73 00 79
    0900: 00 64 00 75 00 69 00 6E 00 6F 00 20 00 52 00 61
    0910: 00 77 00 48 00 49 00 44 00 18 03 54 00 65 00 65
    0920: 00 6E 00 73 00 79 00 64 00 75 00 69 00 6E 00 6F
    0930: 00 06 C9 FF 09 04 A1 5C 75 08 15 00 26 FF 00 95
    0940: 40 09 75 81 02 95 20 09 76 91 02 95 04 09 76 B1
    0950: 02 C0 06 AB FF 0A 00 02 A1 01 75 08 15 00 26 FF
    0960: 00 95 40 09 01 81 02 95 40 09 02 91 02 C0 00 00
    It is indeed a perfect match to the first 2416 bytes of hexdump from your board. So it seems something is going wrong when the restore process copies it back to the first sector of the flash chip. At least your board and probably all the other beta #2 boards have a correctly written restore image in their top flash sector. That's really good news, since I set the permanent sector lock in the flash chips, so those top 4096 bytes can never be erased or altered.

    The problem Frank described in msg #2579 is probably also the restore process failing to copy the entire image correctly. Looks like I'll be making a bootloader update in the next week or two....

  8. #2633
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by KurtE View Post
    T3.6 26300

    So T3... goes pretty fast
    Yup, a *lot* of optimization work when into that.

    Those lines are 33 bytes, when the speed prints as a 5 digit number. So 26300 lines/sec is 867.9 kbytes/sec.

    When I run it here on a fast Linux desktop and through a good quality multi-TT hub, I see 31030 lines/sec. At 33 bytes/line, that's 1023.9 kbytes/sec. When everything is running very efficiently, you tend to see the bandwidth in a multiple of 64000 bytes/sec. The packets are 64 bytes. So 1024 kbytes/sec means the system is managing to fit 16 max size packets into each 1ms USB frame. The absolute theoretical USB maximum (not accounting for bit stuffing overhead) is 19 packets per frame.

    This analysis and admittedly most of my device-side experience is based on 12 Mbit speed where the max packet size for bulk protocol is 64 bytes. On T4 we have 480 Mbit speed. The frames are 125 us. We're currently still using only 64 byte packets, but the protocol allows 512. On my to-do list is moving the different USB descriptors depending on the speed, so we can go up to 512 but still be compatible if 12 Mbit speed is used.

    But before I really start on optimizations, I need to figure out why the latency test is failing in certain sizes. Probably a bug on my part...

  9. #2634
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,091
    Quote Originally Posted by mjs513 View Post
    @PaulStoffregen

    Ran your speed test on a variety of boards besides the T4 for later comparison. Wow you are right abou the Arduino boards. Even my Nucleo-H7 board has some issues. Couldn't find my Arduino Due board - forgot where I put it.

    Code:
    	T4 (9f4d02c)	T3.6	T.3.5	2.0++	Uno	mega	Nucleo144-H7	101    DUE
    Avg	    3999	26873.5	26877.5	8577	1268	1267.5	2818.5         475.5   376.5
    min	    3998	26841	26837	8568	1267	1266	2818	        475     376
    max	    4000	26906	26918	8586	1269	1269	2819	        476     377
    With T4B2 1.47-beta1 and latest github core with printf's in USB core commented out, i get (lines/sec)
    Code:
           T4B2  T3.6
    avrg  25299  29083
    min   23277  28982
    max   25587  29092
    host is linux laptop Ubuntu (16.04), and I cut/paste from Sermon.

    Code:
                    latency @ 1 bytes: 1.00 ms average, 1.11 maximum
                    latency @ 2 bytes: 1.00 ms average, 1.34 maximum
                    latency @ 12 bytes: 1.00 ms average, 1.08 maximum
                    latency @ 30 bytes: 1.00 ms average, 1.04 maximum
                    latency @ 62 bytes: 1.00 ms average, 1.08 maximum
                    latency @ 71 bytes: 1.00 ms average, 1.05 maximum
                    error reading result, r=0
                    error reading result, r=0
                    error reading result, r=0
                    latency @ 128 bytes: 30.24 ms average, 1000.00 maximum
                    latency @ 500 bytes: 0.97 ms average, 1.13 maximum
                    latency @ 1000 bytes: 1.97 ms average, 2.08 maximum
                    latency @ 2000 bytes: 3.94 ms average, 4.13 maximum
                    latency @ 4000 bytes: 7.88 ms average, 10.26 maximum
                    latency @ 8000 bytes: 15.52 ms average, 16.10 maximum
    On an older Ubuntu 14.04 with USB hub (part of Dell monitor)
    Code:
           T4B2  T3.6
    avrg  25463  34738
    min   25290  34612
    max   25595  34910
    redirected kermit output to a file for 5 seconds or do
    stty -F /dev/ttyACM0 115200
    cat /dev/ttyACM0 > /tmp/tmp

    and ctrl-c after 5 seconds or so, and get rid of first 100000 lines or so.
    Code:
    T3.6 to linux64
    tail -n 300000 /tmp/tmp | awk -F = '{print $3}' | stats
    300000 pts min 35626.000000 max 35709.000000 avrg 35656.383903 cov 0.000697 rmse 24.860356
    Last edited by manitou; 05-04-2019 at 10:22 AM.

  10. #2635
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    @manitou
    I just commented all the printf's in usb_serial and still get the same lines/sec on the Windows10x64 machine. Is there some place else I need to comment out printf's? Your numbers are quite a bit different than what I am getting for the T4B2

  11. #2636
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    5,060
    @mjs513 - Look in usb.c line 597

    I think I mentioned the output in post #2620... His results loook similar to what I was seeing except I did not update yet to continue after error...

  12. #2637
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    Quote Originally Posted by KurtE View Post
    @mjs513 - Look in usb.c line 597

    I think I mentioned the output in post #2620... His results loook similar to what I was seeing except I did not update yet to continue after error...
    Commented that line out as well but no change in lines/sec. Now here is something strange. I left all the printf's in and got the same lines/sec.

  13. #2638
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by mjs513 View Post
    Your numbers are quite a bit different than what I am getting for the T4B2
    Pretty sure this is a Linux vs Windows difference. I'm seeing ~11000 lines/sec here, regardless of the debug printing.

    Which chip your computer's motherboard has could also make quite a difference. T4 is extremely fast but the code has no optimization at all. You can expect widely varying results depending on small USB details that normally wouldn't be very important!

    To completely turn off all debug printing, comment out the first line in this file.

    https://github.com/PaulStoffregen/co...debug/printf.h

    However, looks like some direct calls to printf_debug() slipped into the code some time ago. Fixed it just now. Grab the latest commit to be able to use this to turn off all Serial4 printing.

  14. #2639
    Senior Member+ manitou's Avatar
    Join Date
    Jan 2013
    Posts
    2,091
    Quote Originally Posted by mjs513 View Post
    @manitou
    I just commented all the printf's in usb_serial and still get the same lines/sec on the Windows10x64 machine. Is there some place else I need to comment out printf's? Your numbers are quite a bit different than what I am getting for the T4B2
    Actually, I'm not sure how one is supposed to capture the output from the sketch? I was copying and pasting from Sermon on Ubuntu, but that seems like the speed of the display might be a factor ?? There seems to a be some jitter in lines/second. I'm not sure best way to capture free-running /dev/ttyACM0 ...

  15. #2640
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    Quote Originally Posted by PaulStoffregen View Post
    Pretty sure this is a Linux vs Windows difference. I'm seeing ~11000 lines/sec here, regardless of the debug printing.

    Which chip your computer's motherboard has could also make quite a difference. T4 is extremely fast but the code has no optimization at all. You can expect widely varying results depending on small USB details that normally wouldn't be very important!

    To completely turn off all debug printing, comment out the first line in this file.

    https://github.com/PaulStoffregen/co...debug/printf.h

    However, looks like some direct calls to printf_debug() slipped into the code some time ago. Fixed it just now. Grab the latest commit to be able to use this to turn off all Serial4 printing.
    I just downloaded the latest core change and no difference to speak of, instead of 3999 I am seeing 4000. I did try to comment that line out but like you said some got through so got errors thank your for fixing.

    I don't have any machines that run Linux so have nothing to compare to. Never realized that there would be that much variation. But feel better now that its not a problem on my end with USB. Its a Linux vs Windows issue from what you are indicating. Still some difference between the two OS's.

  16. #2641
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,253
    got this error
    Code:
    C:\Users\Walter\Documents\arduino-1.8.8\hardware\teensy\avr\cores\teensy4\usb_desc.h:212:28: error: 'ENDPOINT_RECEIVE_ONLY' undeclared (first use in this function)
    
       #define ENDPOINT2_CONFIG ENDPOINT_RECEIVE_ONLY

  17. #2642
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    5,060
    @mjs513 @Paul... -
    If it Run it on my Linux box, (My Old dev machine I7 about gen 2) with those lines commented out... Have not picked up the last changes yet...

    I think I am seeing the T4 output about 27500 lines/sec...

    Now Same sketch moved to my Windows machine (did not reprogram T4)...
    On Windows 10 machine running on Hub: 4000

    Without hub still 4000.
    Using Same HUb that was on Linux box still 4K

    Using Same HUB with TyCommander: still 4K...

    So it does look like Windows...

    And for what it is worth if I connect it up to an RPI3 board and do cat /dev/ttyACM0
    Which is going through my putty (KiTTy) window.

    I am getting about 6950 lines/sec
    Last edited by KurtE; 05-01-2019 at 05:02 PM.

  18. #2643
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,695
    Quote Originally Posted by KurtE View Post
    @mjs513 @Paul... -
    If it Run it on my Linux box, (My Old dev machine I7 about gen 2) with those lines commented out... Have not picked up the last changes yet...

    I think I am seeing the T4 output about 27500 lines/sec...

    Now Same sketch moved to my Windows machine (did not reprogram T4)...
    On Windows 10 machine running on Hub: 4000

    Without hub still 4000.
    Using Same HUb that was on Linux box still 4K

    Using Same HUB with TyCommander: still 4K...

    So it does look like Windows...
    @KurtE

    Have no doubt that its a windows problem - just surprising since I never ran Linux before - sometime this week going to look for one of old laptops and actually install Linux but that is going to be a whole nother adventure for me

  19. #2644
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,051
    Hopefully the edited added feedback and stat points in the updated latency_test.exe will allow confirming functionality and give a quick way to watch perf changes by comaprison.

    Quote Originally Posted by PaulStoffregen View Post
    ...

    The new code has absolutely no optimizations yet, and it still appears to have some serious bugs. I'm about to start looking into why the latency test is failing with 64 bytes.

    On benchmarking, here's another really simple speed test I wrote yesterday. Let it run for about 5 seconds, or until the lines/sec number becomes fairly stable.
    ...
    Mods to that into lps_test.c { made for Windows Only } takes the SerMon out of the picture: This is clearly a Windows limitation - it reads blocks of 8K chars and prints limited view and repeats for 50 times::
    ** NOT See following post - same code runs MUCH faster on T_3.x :: instead of 4K it gets 55K to 71K
    Code:
    T:\T_Downloads\pjrc_latency_test>lps_test.exe COM25
    port COM25 opened
    
     #0 :
    >>, lines/sec=3956
    count=10229436, lines/sec=1991
    count=10229437, lines/sec=1
    c<<
    
    // …
    
     #49 :
    >>
    count=10242165, lines/sec=3957
    count=10242166, lines/sec=3957
    count=10242167,<< ------         elapsed time 3.277 secs for 404800 bytes
    Leave the provided #2626 sketch run on Teensy,
    Attached Files Attached Files
    Last edited by defragster; 05-02-2019 at 12:35 AM.

  20. #2645
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,051
    Should have tried that on T_3.2 before posting:

    #49 :
    >>c=55520
    count=10712847, lines/sec=55520
    count=10712848, lines/sec=55520
    count<<
    TyCommander Sermon T_3.1:
    Code:
    count=20561130, lines/sec=26814
    count=20561131, lines/sec=26814
    count=20561132, lines/sec=26814
    T_3.6 at 180 MHz:
    #49 :
    >>t=11525506, lines/sec=71516
    count=11525507, lines/sec=71516
    count=11525508, li<<
    TyCommander Sermon T_3.6:
    Code:
    count=22780433, lines/sec=26862
    count=22780434, lines/sec=26862
    count=22780435, lines/sec=26862
    TyCommander Sermon T_4:
    Code:
    count=10019464, lines/sec=3995
    count=10019465, lines/sec=3995
    count=10019466, lines/sec=3995

  21. #2646
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by WMXZ View Post
    got this error
    Code:
    C:\Users\Walter\Documents\arduino-1.8.8\hardware\teensy\avr\cores\teensy4\usb_desc.h:212:28: error: 'ENDPOINT_RECEIVE_ONLY' undeclared (first use in this function)
    So far, Serial is the only USB Type setting which compiles. Not going to work on the others until more issues are resolved.

  22. #2647
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    Quote Originally Posted by defragster View Post
    #49 :
    >>t=11525506, lines/sec=71516
    count=11525507, lines/sec=71516
    count=11525508, li<<
    I don't understand this. For 12 Mbit/sec USB, 36848 lines/sec should be the theoretical maximum.

  23. #2648
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,051
    Quote Originally Posted by PaulStoffregen View Post
    I don't understand this. For 12 Mbit/sec USB, 36848 lines/sec should be the theoretical maximum.
    I can't speak to that ... unless 50 iterations aren't enough to stabilize … Teensy sketch is printing that information? Perhaps a bug there in calc as on some occasions SerMon will show numbers faster than that on T_3.x. It will show numbers of 134K at startup and 44K T_3.1

    Unless there is LOSS of data in the process? where buffered data is dumped?

    The posted ZIP has the mods I made to get this from Windows. Somebody with Linux could examine my code and make it work on Linux. There may be settings on the port that don't preclude data dumping to bit bucket.

    I'll go up the count from 50 to 100 ...

  24. #2649
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,051
    opps - it just was not stabilized with 50 8K reads. Here is the same with 200 64K reads:


    T_3.1 Com 19:
    Code:
     #199 :
    >> lines/sec=25258
    count=112147896, lines/sec=25258
    count=112147897, lines/sec=2<< ------    elapsed time 15.180 secs for 1619200 bytes
    T_3.6 Com 8:
    Code:
     #199 :
    >>299982629, lines/sec=25946
    count=299982630, lines/sec=25946
    count=299982631, l<< ------      elapsed time 15.034 secs for 1619200 bytes
    No change for T_4-2 COM 25:
    Code:
      #199 :
    >>count=10500344, lines/sec=3967
    count=10500345, lines/sec=3967
    count=10500346, << ------        elapsed time 104.117 secs for 1619200 bytes
    Source for :: lps_test.c - LINUX/MAC code not updated to above in receive_bytes_X()
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdarg.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    
    // One of these must be defined, usually via the Makefile
    //#define MACOSX
    //#define LINUX
    #define WINDOWS
    
    #if defined(MACOSX) || defined(LINUX)
    #include <termios.h>
    #include <sys/select.h>
    #define PORTTYPE int
    #define BAUD B115200
    #if defined(LINUX)
    #include <sys/ioctl.h>
    #include <linux/serial.h>
    #endif
    #elif defined(WINDOWS)
    #include <windows.h>
    #define PORTTYPE HANDLE
    #define BAUD 115200
    #define gettimeofday gettimeofday_
    #else
    #error "You must define the operating system\n"
    #endif
    
    // function prototypes
    PORTTYPE open_port_and_set_baud_or_die(const char *name, long baud);
    void close_port(PORTTYPE port);
    int receive_bytes_X(PORTTYPE port, int cnt);
    void delay(double sec);
    void die(const char *format, ...) __attribute__ ((format (printf, 1, 2)));
    
    int gettimeofday(struct timeval * tp, struct timezone * tzp);
    
    
    int receive_bytes_X(PORTTYPE port, int cnt)
    {
    	int count=0;
    	char buf[65537];
    	int len;
    	len = sizeof(buf)-1;
    #if defined(MACOSX) || defined(LINUX)
    	int r;
    	int retry=0;
    
    	// non-blocking read mode
    	fcntl(port, F_SETFL, fcntl(port, F_GETFL) | O_NONBLOCK);
    	while (count < len) {
    		r = read(port, buf + count, len - count);
    		//printf("read, r = %d\n", r);
    		if (r < 0 && errno != EAGAIN && errno != EINTR) return -1;
    		else if (r > 0) count += r;
    		else {
    			// no data available right now, must wait
    			fd_set fds;
    			struct timeval t;
    			FD_ZERO(&fds);
    			FD_SET(port, &fds);
    			t.tv_sec = 1;
    			t.tv_usec = 0;
    			r = select(port+1, &fds, NULL, NULL, &t);
    			//printf("select, r = %d\n", r);
    			if (r < 0) return -1;
    			if (r == 0) return count; // timeout
    		}
    		retry++;
    		if (retry > 1000) return -100; // no input
    	}
    	fcntl(port, F_SETFL, fcntl(port, F_GETFL) & ~O_NONBLOCK);
    #elif defined(WINDOWS)
    	COMMTIMEOUTS timeout;
    	DWORD n;
    	BOOL r;
    	int waiting=0;
    
    	GetCommTimeouts(port, &timeout);
    	timeout.ReadIntervalTimeout = MAXDWORD; // non-blocking
    	timeout.ReadTotalTimeoutMultiplier = 0;
    	timeout.ReadTotalTimeoutConstant = 0;
    	SetCommTimeouts(port, &timeout);
    	for ( int ii=0; ii<cnt; ii++ ) {
    		count = 0;
    		while (count < len) {
    			r = ReadFile(port, buf + count, len - count, &n, NULL);
    			if (!r) die("read error\n");
    			if (n > 0) count += n;
    			else {
    				if (waiting) break;  // 1 sec timeout
    				timeout.ReadIntervalTimeout = MAXDWORD;
    				timeout.ReadTotalTimeoutMultiplier = MAXDWORD;
    				timeout.ReadTotalTimeoutConstant = 1000;
    				SetCommTimeouts(port, &timeout);
    				waiting = 1;
    			}
    		}
    		buf[80]=0;
    		if ( !(ii%8) || (ii>cnt-8) ) printf( "\n #%d : \n>>%s<<", ii, buf );
    	}
    #endif
    	return count;
    }
    
    int main(int argc, char **argv)
    {
    	PORTTYPE fd;
    	int blkcnt = 200;
    
    	if (argc < 2) die("Usage: :) latency_test3 <comport>\n");
    	fd = open_port_and_set_baud_or_die(argv[1], BAUD);
    //	if (argc > 2 && argv[2][0] >'0' && argv[2][0] <= '9' ) loopNum = argv[2][0] - '0';
    //	if (argc > 3 && argv[3][0] >'0' && argv[3][0] <= '9' ) numLengths = ( numLengths / (argv[3][0] - '0'));
    	printf("port %s opened\n", argv[1]);
    
    	struct timeval begin, end;
    	double elapsed;
    	int totBytes;
    
    	int r;
    	// test begin
    	gettimeofday(&begin, NULL);
    	r = receive_bytes_X( fd, blkcnt );
    	gettimeofday(&end, NULL);
    	elapsed = (double)(end.tv_sec - begin.tv_sec) * 1000.0;
    	elapsed += (double)(end.tv_usec - begin.tv_usec) / 1000.0;
    	printf(" ------\t elapsed time %.3f secs for %d bytes\n", elapsed/1000.0, blkcnt*8096 );
    	close_port(fd);
    	return 0;
    }
    
    
    /**********************************/
    /*  Serial Port Functions         */
    /**********************************/
    
    
    PORTTYPE open_port_and_set_baud_or_die(const char *name, long baud)
    {
    	PORTTYPE fd;
    #if defined(MACOSX)
    	struct termios tinfo;
    	fd = open(name, O_RDWR | O_NONBLOCK);
    	if (fd < 0) die("unable to open port %s\n", name);
    	if (tcgetattr(fd, &tinfo) < 0) die("unable to get serial parms\n");
    	if (cfsetspeed(&tinfo, baud) < 0) die("error in cfsetspeed\n");
    	tinfo.c_cflag |= CLOCAL;
    	if (tcsetattr(fd, TCSANOW, &tinfo) < 0) die("unable to set baud rate\n");
    	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_NONBLOCK);
    #elif defined(LINUX)
    	struct termios tinfo;
    	struct serial_struct kernel_serial_settings;
    	int r;
    	fd = open(name, O_RDWR);
    	if (fd < 0) die("unable to open port %s\n", name);
    	if (tcgetattr(fd, &tinfo) < 0) die("unable to get serial parms\n");
    	if (cfsetspeed(&tinfo, baud) < 0) die("error in cfsetspeed\n");
    	if (tcsetattr(fd, TCSANOW, &tinfo) < 0) die("unable to set baud rate\n");
    	r = ioctl(fd, TIOCGSERIAL, &kernel_serial_settings);
    	if (r >= 0) {
    		kernel_serial_settings.flags |= ASYNC_LOW_LATENCY;
    		r = ioctl(fd, TIOCSSERIAL, &kernel_serial_settings);
    		if (r >= 0) printf("set linux low latency mode\n");
    	}
    #elif defined(WINDOWS)
    	COMMCONFIG cfg;
    	COMMTIMEOUTS timeout;
    	DWORD n;
    	char portname[256];
    	int num;
    	if (sscanf(name, "COM%d", &num) == 1) {
    		sprintf(portname, "\\\\.\\COM%d", num); // Microsoft KB115831
    	} else {
    		strncpy(portname, name, sizeof(portname)-1);
    		portname[n-1] = 0;
    	}
    	fd = CreateFile(portname, GENERIC_READ | GENERIC_WRITE,
    		0, 0, OPEN_EXISTING, 0, NULL);
    	if (fd == INVALID_HANDLE_VALUE) die("unable to open port %s\n", name);
    	GetCommConfig(fd, &cfg, &n);
    	//cfg.dcb.BaudRate = baud;
    	cfg.dcb.BaudRate = 115200;
    	cfg.dcb.fBinary = TRUE;
    	cfg.dcb.fParity = FALSE;
    	cfg.dcb.fOutxCtsFlow = FALSE;
    	cfg.dcb.fOutxDsrFlow = FALSE;
    	cfg.dcb.fOutX = FALSE;
    	cfg.dcb.fInX = FALSE;
    	cfg.dcb.fErrorChar = FALSE;
    	cfg.dcb.fNull = FALSE;
    	cfg.dcb.fRtsControl = RTS_CONTROL_ENABLE;
    	cfg.dcb.fAbortOnError = FALSE;
    	cfg.dcb.ByteSize = 8;
    	cfg.dcb.Parity = NOPARITY;
    	cfg.dcb.StopBits = ONESTOPBIT;
    	cfg.dcb.fDtrControl = DTR_CONTROL_ENABLE;
    	SetCommConfig(fd, &cfg, n);
    	GetCommTimeouts(fd, &timeout);
    	timeout.ReadIntervalTimeout = 0;
    	timeout.ReadTotalTimeoutMultiplier = 0;
    	timeout.ReadTotalTimeoutConstant = 1000;
    	timeout.WriteTotalTimeoutConstant = 0;
    	timeout.WriteTotalTimeoutMultiplier = 0;
    	SetCommTimeouts(fd, &timeout);
    #endif
    	return fd;
    
    }
    
    
    
    void close_port(PORTTYPE port)
    {
    #if defined(MACOSX) || defined(LINUX)
    	close(port);
    #elif defined(WINDOWS)
    	CloseHandle(port);
    #endif
    }
    
    
    /**********************************/
    /*  Misc. Functions               */
    /**********************************/
    
    void delay(double sec)
    {
    #if defined(MACOSX) || defined(LINUX)
    	usleep(sec * 1000000);
    #elif defined(WINDOWS)
    	Sleep(sec * 1000);
    #endif
    }
    
    
    void die(const char *format, ...)
    {
    	va_list args;
    	va_start(args, format);
    	vfprintf(stderr, format, args);
    	exit(1);
    }
    
    #if defined(WINDOWS)
    static const unsigned __int64 epoch = ((unsigned __int64) 116444736000000000ULL);
    int
    gettimeofday_(struct timeval * tp, struct timezone * tzp)
    {
        FILETIME    file_time;
        SYSTEMTIME  system_time;
        ULARGE_INTEGER ularge;
    
        GetSystemTime(&system_time);
        SystemTimeToFileTime(&system_time, &file_time);
        ularge.LowPart = file_time.dwLowDateTime;
        ularge.HighPart = file_time.dwHighDateTime;
    
        tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L);
        tp->tv_usec = (long) (system_time.wMilliseconds * 1000);
    
        return 0;
    }
    #endif

  25. #2650
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,321
    I see you're using Windows GetSystemTime() as a replacement for Unix gettimeofday(). Not good. Use GetSystemTimePreciseAsFileTime(). Though for this sort of use, redesigning the code to use QueryPerformanceCounter() is probably the most correct way, and that would work on Windows 7 and earlier.

    Still, I don't understand how the results could be so wrong. If this is still unresolved *much* later after the USB code has been pretty well optimized, remind me and maybe I'll look into it. But for now, trying to focus on debugging Teensy rather than debugging benchmarks.
    Last edited by PaulStoffregen; 05-01-2019 at 07:56 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •