Teensy 4.0 First Beta Test

Status
Not open for further replies.
FWIW, in https://www.pjrc.com/teensy/usb_serial.html there was a USBsend sketch and host receiver (serial_listen.c or serial_read.c). For T3.6 it reports 1152.09 kbytes/sec to linux host with USB hub and 1152.11 kbytes/sec for T3.2. For T4B2 (latest github no debug printf's) it reports 7729.36 kbytes/sec -- a bit too fast me thinks, so maybe it's not a valid benchmark.

Code:
// https://forum.pjrc.com/threads/29078-USB-Transmission-speed
// https://www.pjrc.com/teensy/usb_serial.html
// USB Serial Transmit Bandwidth Test
// Written by Paul Stoffregen, paul@pjrc.com
// This benchmark code is in the public domain.
//
// Within 5 seconds of opening the port, this program
// will send a message as rapidly as possible, for 10 seconds.
//
// To run this benchmark test, use the serial_read.exe (Windows) or
// serial_listen (Mac, Linux) program, which can read the data
// efficiently without saving it.
// http://www.pjrc.com/teensy/serial_listen.c
// http://www.pjrc.com/teensy/serial_read.c
// http://www.pjrc.com/teensy/serial_read.exe
//
// You can also run a terminal emulator and select the option
// to capture all text to a file.  However, some terminal emulators
// may limit the speed, depending upon how they update the screen
// and how efficiently their code processes the incoming data.  The
// Arduino Serial Monitor is particularly slow.  Only use it to
// verify this sketch works.  For actual benchmarks, use the
// efficient receive tests above.
//
// Full disclosure: Paul is the author of Teensyduino. 
//
// Results can vary depending on the number of other USB devices
// connected.  For fastest results, disconnect all others.


#define USBSERIAL Serial       // for Leonardo, Teensy, Fubarino
//#define USBSERIAL SerialUSB  // for Due, Maple

void setup()
{
  USBSERIAL.begin(115200);
}

void loop()
{
  // wait for serial port to be opened
  while (!USBSERIAL) ;

  // give the user 5 seconds to enable text capture in their
  // terminal emulator, or do whatever to get ready
  for (int n=5; n; n--) {
    USBSERIAL.print("10 second speed test begins in ");
    USBSERIAL.print(n);
    USBSERIAL.println(" seconds.");
    if (!USBSERIAL) break;
    delay(1000);
  }

  // send a string as fast as possible, for 10 seconds
  unsigned long beginMillis = millis();
  do {
    USBSERIAL.print("USB Fast Serial Transmit Bandwidth Test, capture this text.\r\n");
  } while (millis() - beginMillis < 10000);
  USBSERIAL.println("done!");

  // after the test, wait forever doing nothing,
  // well, at least until the terminal emulator quits
  while (USBSERIAL) ;
}

EDIT:
win10x64 + USB hub: T3.6 870.32 KBs, T4B2 1197.95 KBs
macos: T3.6 1168.07 KBs, T4B2 6209.18 KBs
 
For T4B2 (latest github no debug printf's) it reports 7729.36 kbytes/sec -- a bit too fast me thinks, so maybe it's not a valid benchmark.

Might be a valid result, since T4 is using 480 Mbit/sec USB speed, and that test uses fairly large message sizes with little other work being done.

I made the lines/sec test to intentionally exercise the commonly used parts of Arduino's Print class and try to optimize this common case of lines assembled from several small fragments. Until the optimization is very good, we can expect to see much slower speeds in the lines/sec test than we get in this older and much "easier" test where a single, fairly large and fixed message is repeatedly sent.

According to the USB 2.0 spec, with the 64 byte max packet size we're using now, the theoretical best speed is 32,256,000 bytes/sec.

speeds.png

So 7.7 Mbyte/sec isn't too shocking, only about 24% of what should be possible with perfect optimization.
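
The arithmetic behind those two figures can be checked with a few lines of host-side C++. This is only a sketch: 8000 microframes per second is the USB 2.0 high-speed rate, the count of 63 transactions per microframe is inferred here from the 32,256,000 figure (32,256,000 / 8000 / 64) rather than quoted from the spec, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// 125 us microframes at USB 2.0 high speed.
constexpr uint32_t kMicroframesPerSec = 8000;

// Best-case bulk throughput in bytes/sec for a given max packet size and
// number of transactions per microframe (hypothetical helper).
uint32_t bulkBytesPerSec(uint32_t packetBytes, uint32_t packetsPerMicroframe) {
  assert(packetBytes > 0 && packetsPerMicroframe > 0);
  return packetBytes * packetsPerMicroframe * kMicroframesPerSec;
}

// Measured rate as a percentage of the theoretical rate.
double percentOfTheoretical(double measuredBytesPerSec,
                            uint32_t theoreticalBytesPerSec) {
  return 100.0 * measuredBytesPerSec / theoreticalBytesPerSec;
}
```

With these, `bulkBytesPerSec(64, 63)` gives 32,256,000, and `percentOfTheoretical(7729360.0, 32256000)` comes out just under 24%, assuming the reported kbytes are units of 1000 bytes.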

My hope is to eventually get into that 40-50 Mbyte/sec range! ;) ... and to achieve that speed when people use the Print class in ordinary ways to print text and numbers.
 
I committed a fix for USB serial receive.

Performance is still far from where I want to get, but it should at least pass the latency test now. Please let me know if you're able to get the test to fail again?
 
Just ran @defragster's latency test (one before last update) and it got through the test with no errors like before, again I am on a Win10x64 machine so...
Code:
port COM23 opened
waiting for board to be ready:
.ok
latency @    1 bytes:   3.28 ms average, 13 max hits,   0.00 2nd max,  16.00 maximum
latency @    2 bytes:   3.29 ms average, 14 max hits,   0.00 2nd max,  16.00 maximum
latency @   12 bytes:   3.44 ms average, 15 max hits,  15.00 2nd max,  16.00 maximum
latency @   16 bytes:   3.27 ms average, 13 max hits,  15.00 2nd max,  16.00 maximum
latency @   30 bytes:   3.44 ms average, 14 max hits,   0.00 2nd max,  16.00 maximum
latency @   31 bytes:   3.44 ms average, 14 max hits,   0.00 2nd max,  16.00 maximum
latency @   63 bytes:   3.44 ms average, 14 max hits,   0.00 2nd max,  16.00 maximum
latency @   64 bytes:   3.43 ms average, 14 max hits,  15.00 2nd max,  16.00 maximum
latency @   65 bytes:   6.72 ms average, 27 max hits,   0.00 2nd max,  16.00 maximum
latency @   71 bytes:   6.56 ms average, 26 max hits,   0.00 2nd max,  16.00 maximum
latency @  126 bytes:   6.72 ms average, 27 max hits,   0.00 2nd max,  16.00 maximum
latency @  127 bytes:   6.72 ms average, 27 max hits,   0.00 2nd max,  16.00 maximum
latency @  128 bytes:   6.72 ms average, 27 max hits,   0.00 2nd max,  16.00 maximum
latency @  129 bytes:   9.84 ms average, 40 max hits,  15.00 2nd max,  16.00 maximum
latency @  500 bytes:  26.56 ms average, 20 max hits,  31.00 2nd max,  32.00 maximum
latency @  512 bytes:  26.41 ms average, 23 max hits,  31.00 2nd max,  32.00 maximum
latency @  640 bytes:  33.08 ms average, 10 max hits,  46.00 2nd max,  47.00 maximum
latency @ 1000 bytes:  52.85 ms average, 12 max hits,  53.00 2nd max,  63.00 maximum
latency @ 1278 bytes:  65.99 ms average, 11 max hits,  69.00 2nd max,  79.00 maximum
latency @ 1279 bytes:  66.07 ms average, 10 max hits,  78.00 2nd max,  79.00 maximum
latency @ 1280 bytes:  66.09 ms average,  5 max hits,  63.00 2nd max,  79.00 maximum
latency @ 1281 bytes:  69.21 ms average, 15 max hits,  78.00 2nd max,  79.00 maximum
latency @ 2000 bytes: 105.46 ms average, 31 max hits,  93.00 2nd max, 110.00 maximum
latency @ 2047 bytes: 105.61 ms average, 29 max hits, 109.00 2nd max, 110.00 maximum
latency @ 2048 bytes: 105.45 ms average, 30 max hits, 109.00 2nd max, 110.00 maximum
latency @ 2049 bytes: 108.73 ms average, 34 max hits,   0.00 2nd max, 110.00 maximum
latency @ 4000 bytes: 207.62 ms average, 20 max hits,   0.00 2nd max, 219.00 maximum
latency @ 4095 bytes: 210.87 ms average,  7 max hits, 219.00 2nd max, 223.00 maximum
latency @ 4096 bytes: 210.85 ms average, 34 max hits, 218.00 2nd max, 219.00 maximum
latency @ 4097 bytes: 214.23 ms average, 46 max hits, 219.00 2nd max, 224.00 maximum
latency @ 8000 bytes: 411.52 ms average, 26 max hits, 421.00 2nd max, 422.00 maximum
 UP ----- pass #1        elapsed time 253.120 secs for 4106700 bytes
 
OK fetched latest cores from github and disabled debug printf, T4B2 latency_test on linux laptop
Code:
latency @ 1 bytes: 0.14 ms average, 0.35 maximum
latency @ 2 bytes: 0.13 ms average, 0.35 maximum
latency @ 12 bytes: 0.16 ms average, 2.50 maximum
latency @ 30 bytes: 0.12 ms average, 0.15 maximum
latency @ 62 bytes: 0.14 ms average, 1.21 maximum
latency @ 71 bytes: 0.25 ms average, 0.31 maximum
latency @ 128 bytes: 0.25 ms average, 0.33 maximum
latency @ 500 bytes: 0.38 ms average, 0.48 maximum
latency @ 1000 bytes: 0.50 ms average, 0.51 maximum
latency @ 2000 bytes: 0.79 ms average, 0.91 maximum
latency @ 4000 bytes: 1.36 ms average, 2.25 maximum
latency @ 8000 bytes: 2.39 ms average, 3.36 maximum

looks good. zoom zoom

on windows 10x64
Code:
latency @ 1 bytes: 0.16 ms average, 15.57 maximum
latency @ 2 bytes: 0.31 ms average, 15.62 maximum
latency @ 12 bytes: 0.31 ms average, 15.77 maximum
latency @ 30 bytes: 0.16 ms average, 15.63 maximum
latency @ 62 bytes: 0.31 ms average, 15.63 maximum
latency @ 71 bytes: 0.16 ms average, 15.62 maximum
latency @ 128 bytes: 0.31 ms average, 15.62 maximum
latency @ 500 bytes: 0.31 ms average, 15.62 maximum
latency @ 1000 bytes: 0.62 ms average, 15.63 maximum
latency @ 2000 bytes: 0.78 ms average, 15.63 maximum
latency @ 4000 bytes: 1.41 ms average, 15.68 maximum
latency @ 8000 bytes: 2.66 ms average, 15.68 maximum
 
@manitou
Reran on my win10x64 machine - you are right with debug_printf off I get pretty much the same numbers as you:
Code:
port COM23 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.22 ms average, 15.57 maximum
latency @ 2 bytes: 0.16 ms average, 15.62 maximum
latency @ 12 bytes: 0.31 ms average, 15.62 maximum
latency @ 30 bytes: 0.31 ms average, 15.62 maximum
latency @ 62 bytes: 0.22 ms average, 15.62 maximum
latency @ 71 bytes: 0.16 ms average, 15.62 maximum
latency @ 128 bytes: 0.31 ms average, 15.62 maximum
latency @ 500 bytes: 0.31 ms average, 15.62 maximum
latency @ 1000 bytes: 0.53 ms average, 15.62 maximum
latency @ 2000 bytes: 0.69 ms average, 15.64 maximum
latency @ 4000 bytes: 1.32 ms average, 15.65 maximum
latency @ 8000 bytes: 2.32 ms average, 15.67 maximum
I do see some variation, which I expected based on Paul's earlier comment on machine chip configuration.

EDIT: If I read my notes right it's now better than the T3.6 as well. As @manitou said "vroom vroom vroom"
 
@...

I picked up the latest stuff and tried it on Windows 10 64-bit, and it does complete. It also makes a big difference when I turn off the two printf statements (one in usb.c and the other in usb_serial.c).
As you can see in the two runs below (the first with the printf statements on, the second with them off):

Code:
C:\Users\kurte\Desktop\latency_test>latency_test.exe COM7
port COM7 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 3.74 ms average, 7.45 maximum
latency @ 2 bytes: 3.53 ms average, 7.59 maximum
latency @ 12 bytes: 3.61 ms average, 7.36 maximum
latency @ 30 bytes: 3.91 ms average, 7.42 maximum
latency @ 62 bytes: 4.18 ms average, 7.99 maximum
latency @ 71 bytes: 7.44 ms average, 11.37 maximum
latency @ 128 bytes: 7.46 ms average, 11.04 maximum
latency @ 500 bytes: 27.21 ms average, 31.16 maximum
latency @ 1000 bytes: 53.78 ms average, 57.18 maximum
latency @ 2000 bytes: 106.42 ms average, 109.88 maximum
latency @ 4000 bytes: 208.34 ms average, 211.96 maximum
latency @ 8000 bytes: 412.80 ms average, 416.85 maximum

C:\Users\kurte\Desktop\latency_test>latency_test.exe COM7
port COM7 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.26 ms average, 1.03 maximum
latency @ 2 bytes: 0.26 ms average, 1.18 maximum
latency @ 12 bytes: 0.26 ms average, 1.22 maximum
latency @ 30 bytes: 0.26 ms average, 1.24 maximum
latency @ 62 bytes: 0.25 ms average, 1.15 maximum
latency @ 71 bytes: 0.27 ms average, 1.21 maximum
latency @ 128 bytes: 0.25 ms average, 1.24 maximum
latency @ 500 bytes: 0.36 ms average, 1.35 maximum
latency @ 1000 bytes: 0.47 ms average, 1.42 maximum
latency @ 2000 bytes: 0.74 ms average, 1.64 maximum
latency @ 4000 bytes: 2.01 ms average, 3.83 maximum
latency @ 8000 bytes: 3.39 ms average, 5.04 maximum

C:\Users\kurte\Desktop\latency_test>
 
Updated cores for USB fix and results agree with KurtE - No Errors on Latency test { of course I'm using the updated version - went back to deprecated gettimeofday } - and much faster with those two debug prints gone.
** I removed the TWO bothersome prints - but left on "status =" with PRINT_DEBUG_STUFF define … oops …
> But that didn't make a change to the numbers - though for the group below it goes up 0.1 secs in general when the other Teensys are on the hub and active.

>> lps_test is the same or lower at ~2800 lines per second.

Updated version uses running 'A-Z' data for verification on receive, and it shows only ~5% of the 100 are at the MAX time - on multiple runs the ones over 5% change - so it isn't a particular transfer size with a problem:
Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25
port COM25 opened
waiting for board to be ready:
.ok
latency @    1 bytes: 0.25 ms average,   5 max hits,    0.50 2nd max,   0.51 maximum
latency @    2 bytes: 0.25 ms average,   3 max hits,    0.50 2nd max,   0.51 maximum
latency @   12 bytes: 0.25 ms average,   3 max hits,    0.50 2nd max,   0.51 maximum
latency @   16 bytes: 0.25 ms average,   2 max hits,    0.50 2nd max,   0.69 maximum
latency @   30 bytes: 0.25 ms average,   3 max hits,    0.50 2nd max,   0.51 maximum
latency @   31 bytes: 0.25 ms average,   3 max hits,    0.50 2nd max,   0.51 maximum
latency @   63 bytes: 0.25 ms average,   4 max hits,    0.51 2nd max,   0.52 maximum
latency @   64 bytes: 0.26 ms average,   4 max hits,    0.63 2nd max,   0.70 maximum
latency @   65 bytes: 0.26 ms average,   4 max hits,    0.50 2nd max,   0.55 maximum
latency @   71 bytes: 0.25 ms average,   7 max hits,    0.50 2nd max,   0.63 maximum
latency @  126 bytes: 0.26 ms average,   2 max hits,    0.51 2nd max,   0.51 maximum
latency @  127 bytes: 0.26 ms average,   3 max hits,    0.50 2nd max,   0.71 maximum
latency @  128 bytes: 0.26 ms average,   3 max hits,    0.50 2nd max,   0.51 maximum
latency @  129 bytes: 0.27 ms average,   3 max hits,    0.50 2nd max,   0.75 maximum
latency @  500 bytes: 0.35 ms average,   4 max hits,    0.59 2nd max,   0.64 maximum
latency @  512 bytes: 0.35 ms average,   8 max hits,    0.51 2nd max,   0.99 maximum
latency @  640 bytes: 0.39 ms average,   9 max hits,    0.51 2nd max,   0.73 maximum
latency @ 1000 bytes: 0.50 ms average,   3 max hits,    0.99 2nd max,   1.00 maximum
latency @ 1278 bytes: 0.56 ms average,   2 max hits,    1.00 2nd max,   1.01 maximum
latency @ 1279 bytes: 0.56 ms average,   3 max hits,    1.00 2nd max,   1.01 maximum
latency @ 1280 bytes: 0.55 ms average,   7 max hits,    0.99 2nd max,   0.99 maximum
latency @ 1281 bytes: 0.57 ms average,   5 max hits,    0.99 2nd max,   1.04 maximum
latency @ 2000 bytes: 0.76 ms average,   8 max hits,    1.02 2nd max,   1.14 maximum
latency @ 2047 bytes: 0.76 ms average,   4 max hits,    1.04 2nd max,   1.04 maximum
latency @ 2048 bytes: 0.80 ms average,   6 max hits,    1.08 2nd max,   1.26 maximum
latency @ 2049 bytes: 0.78 ms average,   6 max hits,    1.06 2nd max,   1.10 maximum
latency @ 4000 bytes: 1.34 ms average,   7 max hits,    1.63 2nd max,   1.65 maximum
latency @ 4095 bytes: 1.39 ms average,   4 max hits,    1.60 2nd max,   1.61 maximum
latency @ 4096 bytes: 1.34 ms average,   4 max hits,    1.56 2nd max,   1.75 maximum
latency @ 4097 bytes: 1.37 ms average,   2 max hits,    1.49 2nd max,   1.96 maximum
latency @ 8000 bytes: 2.42 ms average,   5 max hits,    2.79 2nd max,   2.86 maximum
 UP ----- pass #1        elapsed time 1.867 secs for 4106700 bytes
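
For reference, the running 'A-Z' check mentioned above can be sketched in host-side C++ along these lines. It is an illustration only, not the code in the updated latency_test: the function names and the idea of carrying a start offset between blocks are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fill a block with a running 'A'..'Z' pattern starting at `offset`
// (hypothetical helper, not taken from latency_test).
std::string makePattern(std::size_t length, std::size_t offset) {
  std::string block(length, '\0');
  for (std::size_t i = 0; i < length; ++i) {
    block[i] = static_cast<char>('A' + (offset + i) % 26);
  }
  return block;
}

// Return the index of the first byte that breaks the pattern,
// or std::string::npos if the whole block matches.
std::size_t firstMismatch(const std::string& received, std::size_t offset) {
  for (std::size_t i = 0; i < received.size(); ++i) {
    if (received[i] != static_cast<char>('A' + (offset + i) % 26)) return i;
  }
  return std::string::npos;
}
```

The sender would build each block with `makePattern()` and the receiver would run `firstMismatch()` on what comes back; any result other than `std::string::npos` points at the first corrupted or dropped byte.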
 
Noted above - no fail on latency_test.

Paul - did you see the lines per sec sketch edit to keep the wild T_3.x numbers in line, based on Serial.availableForWrite()? Is that the best fix for those puzzling numbers when Serial is non-blocking?
 
SDFat Library

Started playing with the SDFat library, made a couple of changes and ran the SDInfo sketch using an external reader (had to reduce the SPI clock to 8 MHz - old reader). It compiles and runs partway:
Code:
Card type: SDXC

Manufacturer ID: 0X3
OEM ID: SD
Product: SC64G
Version: 8.0
Serial number: 0XA2D04DD0
Manufacturing date: 9/2012

cardSize: 63864.57 MB (MB = 1,000,000 bytes)
flashEraseSize: 128 blocks
eraseSingleBlock: true
OCR: 0XC0FF8000

SD Partition Table
part,boot,type,start,length
1,0X0,0XC,63,124735425
2,0X0,0X0,0,0
3,0X0,0X0,0,0
4,0X0,0X0,0,0
error: 
File System initialization failed.
Have zero familiarity with this lib so surprised I got this far. Guess more debugging to do.

EDIT:
SDFormatter seems to work no problem on a 128GB card.

Well FreeCluster seems to be working:
Code:
Please edit SdFatConfig.h and set
MAINTAIN_FREE_CLUSTER_COUNT nonzero for
maximum freeClusterCount() performance.

Type any character to start

First call to freeClusterCount scans the FAT.

freeClusterCount() call time: 31425528 micros
freeClusters: 1953455
freeSpace: 128021.625 MB (MB = 1,000,000 bytes)

Create and write to Cluster.test

Second freeClusterCount call is faster if
MAINTAIN_FREE_CLUSTER_COUNT is nonzero.

Ok after reformatting the SD Card I reran the SDInfo sketch and this time it ran fine so it was a matter of formatting:
Code:
init time: 3 ms

Card type: SDXC

Manufacturer ID: 0X95
OEM ID: SU
Product:      
Version: 0.2
Serial number: 0X520AB128
Manufacturing date: 3/2015

cardSize: 128042.66 MB (MB = 1,000,000 bytes)
flashEraseSize: 128 blocks
eraseSingleBlock: true
OCR: 0XC0FF8000

SD Partition Table
part,boot,type,start,length
1,0X0,0XC,8192,250075136
2,0X0,0X0,0,0
3,0X0,0X0,0,0
4,0X0,0X0,0,0

Volume is FAT32
blocksPerCluster: 128
clusterCount: 1953456
freeClusters: 1953454
freeSpace: 128021.56 MB (MB = 1,000,000 bytes)
fatStartBlock: 10436
fatCount: 2
blocksPerFat: 15262
rootDirStart: 2
dataStartBlock: 40960
I will post the lib if you all want to play - I haven't tried SDIO_ext - not sure how that works

EDIT:
Just pushed to WIP repository: https://github.com/mjs513/WIP/tree/master/SdFat

EDIT2: I also tested it on the Audio Shield and it works there as well.
 
Paul - did you see the lines per sec sketch edit to keep the wild T_3.x numbers line based on Serial.availableForWrite()? Is that the best fix for those puzzling numbers when Serial is non-blocking?

Nope, not yet. Spent the time tracking down that receive size bug, and also looking at ways to minimize the receive latency on T4.

A lot has been posted lately. Can you point me to the best message / code I should review that makes the problem occur *more* than all others? I'm interested in learning why it's happening, what's really going on to cause the timing to go so far off the rails. Also really want to know if anyone can reproduce this on Linux? For me, testing stuff with Windows takes about 10X longer....

Until I fully understand what's really happening (which honestly might not happen until long after T4 release), really not so interested in the workarounds. Not going to spend time on figuring out which way is best to avoid the not-yet-understood problem.
 
Paul - this is the post #2665 - this sketch change resolves the problem as I see it.

Answers coded below in LPS_TEST.INO - SerMon's and lps_test.exe agree OverRun Surge is fixed for T_3's.

Oddity Question:: Why does this not affect Linux the same?

> T4 does not yet have a working Serial.availableForWrite(), and it already stops when dis-connected.
> Using this on T_3's to limit the count increment solves the issue:: if ( Serial.availableForWrite() > 15 )

My test code from before to show the millis is still in place and holds at zero. This code updated on github.com/Defragster/T4_demo/... /pjrc_latency_test

Code:
// https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=204681&viewfull=1#post204681
uint32_t count, prior_count;
uint32_t prior_msec;
uint32_t count_per_second;

// Uncomment this for boards where SerialUSB needed for native port
//#define Serial SerialUSB

void setup() {
  Serial.begin(1000000);
  while (!Serial) ;
  count = 10000000;
  prior_count = count;
  count_per_second = 0;
  prior_msec = millis();
}
int blog = 0;
void loop() {
  Serial.print("c#");
  Serial.print(count);
  Serial.print(" b#");
  Serial.print(blog);
  Serial.print(", lines/s=");
  Serial.println(count_per_second);
  #if !defined(__IMXRT1062__)
  if ( Serial.availableForWrite() > 15 )
  #endif
    count = count + 1;
  uint32_t msec = millis();
  if (msec - prior_msec > 1000) {
    prior_msec = prior_msec + 1000;
    blog = (msec - prior_msec) / 10;
    count_per_second = count - prior_count;
    prior_count = count;
  }
}
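
The once-per-second bookkeeping in loop() can be exercised off-target by feeding it a simulated clock. The following host-side C++ sketch mirrors the sketch's variables (count, prior_count, prior_msec, count_per_second); the LineRate struct, its tick() method and the canWrite flag standing in for Serial.availableForWrite() > 15 are illustrative names, not part of the posted code.

```cpp
#include <cassert>
#include <cstdint>

struct LineRate {
  uint32_t count = 10000000;
  uint32_t prior_count = 10000000;
  uint32_t prior_msec = 0;
  uint32_t count_per_second = 0;

  // One pass of loop(): `msec` plays the role of millis(), `canWrite`
  // the role of the availableForWrite() gate used on the T_3.x boards.
  void tick(uint32_t msec, bool canWrite) {
    if (canWrite) count = count + 1;
    if (msec - prior_msec > 1000) {
      prior_msec = prior_msec + 1000;  // advance by a fixed step of one second
      count_per_second = count - prior_count;
      prior_count = count;
    }
  }
};
```

Calling `tick()` once per simulated millisecond with `canWrite` true settles at 1000 counts per window, while holding `canWrite` false keeps count_per_second at 0, which is the behaviour the gate is meant to give when the output buffer is full.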

BTW: Since it has to run a second or two now to get counts ... that next param is important

ALSO - the counts reported now are more like 20K lines/sec - even when they were working, the free running count++ in loop() was overestimating on T_3's.

And opening a TyComm second instance to second Teensy drops the count on both - as one would expect with the machine bandwidth getting stretched.

Here is a fresh run showing it starting at 0 and then going up to expected and believable value with that sketch edit - including the 'b#0' showing time skew from the 1 sec check at 0:
Code:
T:\T_Downloads\pjrc_latency_test>lps_test.exe COM8 4
port COM8 opened
repeat 80
surge 0 delay
#0 : __>> c#225627184 b#0, lines/s=12870 <<__
#1 : __>> c#225628932 b#0, lines/s=0 <<__
#2 : __>> c#225630687 b#0, lines/s=0 <<__
#3 : __>> c#225632443 b#0, lines/s=0 <<__
#4 : __>> c#225634198 b#0, lines/s=0 <<__
#5 : __>> c#225635954 b#0, lines/s=0 <<__
#6 : __>> c#225637709 b#0, lines/s=0 <<__
#7 : __>> c#225639465 b#0, lines/s=0 <<__
#8 : __>> c#225641467 b#0, lines/s=12546 <<__
#9 : __>> c#225643515 b#0, lines/s=12546 <<__
#10 : __>> c#225645563 b#0, lines/s=12546 <<__
#11 : __>> c#225647611 b#0, lines/s=12546 <<__
#12 : __>> c#225649659 b#0, lines/s=12546 <<__
#13 : __>> c#225651707 b#0, lines/s=12546 <<__
#14 : __>> c#225653755 b#0, lines/s=12546 <<__
#15 : __>> c#225655803 b#0, lines/s=12546 <<__
#16 : __>> c#225657851 b#0, lines/s=12546 <<__
#17 : __>> c#225659899 b#0, lines/s=12546 <<__
#18 : __>> c#225661947 b#0, lines/s=12546 <<__
#19 : __>> c#225663995 b#0, lines/s=12546 <<__
#20 : __>> c#225666043 b#0, lines/s=25668 <<__
#21 : __>> c#225668091 b#0, lines/s=25668 <<__
#22 : __>> c#225670139 b#0, lines/s=25668 <<__
 
Paul - FYI side note - your changes to Cores on T4 took the lps_test from 4k down to 3K lines/sec:

Today:
Code:
#78 : __>> c#10172166 b#0, lines/s=2853 <<__

Yesterday:
Code:
#7 : __>> count=10645383, lines/sec=3957 <<__
 
I still see ~11000 lines/sec with Linux. :)

YAY for you :)

Odd. All debug prints disabled here …

IDE SerMon and TyComm and CmdLine report

give counts about :: 2833, 2856 and 2842 here on Windows 10.

I just thought I'd mention it while what you changed was fresh because it had an effect. For 1062 that sketch change didn't alter anything, so it would seem to be Teensy USB edits.

Odd the EXE is a couple lps UNDER TyComm - would expect it to be faster

Doubled the 64K buffer to 128K for sustained reads and no change, moved the line per buffer print from the inner loop to outside and no change
 
@defragster
Just saw these posts and just wanted to let you know I ran your sketch only and am seeing about 4000 lines/sec. This is prior to your latest change and direct from the sketch. If I run your .exe I am seeing 3999-4000 lps. I never got around to updating to your latest GitHub changes. Maybe something in your new push changed? Again this is on my Win10x64 machine with printf's turned on.
 
@mjs513 ...
odd … I went back to zip (3) and (4) exe's and they are both giving me the 2850 and (2) doesn't have that EXE created yet ...

Maybe the printf's speed it up :)
 
Looks like the same size .exe I have, and same results against my T4-2

Code:
#3 : __>> c#11714742 b#0, lines/s=2849 <<__

That is against the updated sketch … is that what you see?
 
Did you try running yesterday's code to check that it still gives ~4000 lines/sec on your machine? You know, just to rule out the possibility something may have changed with your computer...

If it sounds as if I'm unconcerned whether the speed is 4000 or 2800 or 11000 or 27000, that's because indeed I really do not care at this point. Testing the performance of utterly unoptimized code may be fun, but it's kind of pointless. It really doesn't mean much. It's not even really an indication Linux is better than Windows.

As I start optimizing, we're going to see these numbers climb well into the 6 digit range. That's when they'll matter.

What is really important at this early stage is correctness. This is the time to focus on making sure errors aren't happening. Fixing errors, like the receive bug with the latency test, becomes harder as the code becomes more optimized.
 
@Paul - as noted that was just FYI - in case it rang a bell while those changes were fresh in your mind. I did not run the old sketch - IIRC the change was to add one line - with an ifdef !1062 so I assumed there would not be any change. I'll see if I can confirm.

IT WAS the old sketch - it shows the 4,000 - found here Teensy-4-0-First-Beta-Test
> Didn't compare yet - but there was something else …
I added two more 'Serial.print()' and that was it - back to 4K without them



It isn't a worry - I'm sure it will all come together in the end. I only wanted you to know that the T_3's are now reporting expected numbers, and to check that you saw my change as a valid reason why: since they don't block on serial, the counts were inordinately high on start, and the loop() flow somehow kept over-incrementing the count into the following seconds for some time.


@mjs513 - that link I posted for MinGW had some clear steps and that is all I did - that and making sure another reasonable thing or two were checked IIRC.
 
Well that got to the bottom of that!

>> Serial.printf( "count= %d, lines/sec=%d \n", count, count_per_second );

Getting 20K lines/sec in SerMon, TyComm and with the lps_test.exe
#75 : __>> count= 10203664, lines/sec=19978 <<__

Serial.print() is WAY slow at this time - changing to Serial.printf().

Doesn't help T_3.6 @ 180 or 256 MHz - now 19K lps, so T_3.6 is almost 1K faster with .print()

Code:
// https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=204681&viewfull=1#post204681
uint32_t count, prior_count;
uint32_t prior_msec;
uint32_t count_per_second;

void setup() {
  Serial.begin(1000000);
  while (!Serial) ;
  count = 10000000;
  prior_count = count;
  count_per_second = 0;
  prior_msec = millis();
}

void loop() {
  Serial.printf( "count= %d, lines/sec=%d \n", count, count_per_second );
#if !defined(__IMXRT1062__)
  if ( Serial.availableForWrite() > 15 )
#endif
    count = count + 1;
  uint32_t msec = millis();
  if (msec - prior_msec > 1000) {
    prior_msec = prior_msec + 1000;
    count_per_second = count - prior_count;
    prior_count = count;
  }
}
 
changing to Serial.printf().

Please don't.

By doing this, you're changing the benchmark from what it was meant to test to something else, more similar to the very old benchmark Manitou ran yesterday (showing 7.7 Mbyte/sec speed with his fast Linux system).

FWIW, with this change I get 59370 lines/sec on Linux. It's not "fixing" anything, just changing to measuring something else which runs faster.

If you do keep running this, please call it something very different. A speed benchmark is meaningful only if we all do it the same way, so the numbers are comparable.
 
@Paul:
INDEED :: On my machine I already left the old code in place when I made the change for that reason - it is under #ifdef, as I just wanted to see the diff, since adding two lines dropped it 25%. I'll change it to a comment for reference.

Amazing it makes that much difference on Linux! Though that is an increase proportional to what the PC saw.

Just started writing the INVERSE test - so far showing 18K lines/sec if what I have is working:
> Using PC code to send similar block of text: len = sprintf( buf, "count= %d, lines/sec=%d \n", count, count_per_second );

For now the sketch just counts '\n' chars per second as it is doing: c = Serial.read();
> Then once per second it prints the count out on Serial4, and I'm seeing 18K


Very crude and no double check yet the data is right as my Teensy doesn't have a display and it just happened ...
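
A minimal sketch of the counting side of such an inverse test, written as plain C++ so it can be tried on a PC. It assumes the received bytes arrive in arbitrary chunks and that a line is simply anything terminated by '\n'; countNewlines is a hypothetical helper, and on the Teensy the same tally would be kept per byte as each one comes back from Serial.read().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count '\n' bytes in one received chunk (hypothetical helper).
uint32_t countNewlines(const char* data, std::size_t length) {
  assert(data != nullptr || length == 0);
  uint32_t lines = 0;
  for (std::size_t i = 0; i < length; ++i) {
    if (data[i] == '\n') ++lines;
  }
  return lines;
}
```

Because only the terminator is counted, a line split across two chunks is still counted exactly once, in whichever chunk carries its '\n'.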
 
Paul - just did 7+ 15 sec Restore holds - on computer then on USB battery all failed.

After 15 secs a blip of the red bootloader LED - then release and nothing.

Then I realized the two Teensy Debug Serial Rx/Tx pin devices were powered - same USB Hub.

So it's the same effect as with a normal power-on by button or by plugging powering USB into the T4: the powered UART pins halt the CPU startup, here for the bootloader-controlled restore.

<edit>: The entry in MSG #6 updated with added link to this post and put a comment about UART power and MCU startup.
 