Teensy 4.0 First Beta Test

Status
Not open for further replies.
@manitou and @defragster

Thanks for the links. Just ran the latency_test code in post https://forum.pjrc.com/threads/7826-USB-to-digital-I-O-delay?p=18682&viewfull=1#post18682 just as a test and curiousity to see it would work on a Windows 10 (64) machine. These are the results with the current USB implementation - have to go look to see what its doing, probably will have to recompile - but first to get VS again:

Code:
port COM23 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 3.12 ms average, 15.65 maximum
latency @ 2 bytes: 3.28 ms average, 15.66 maximum
latency @ 12 bytes: 3.13 ms average, 15.68 maximum
latency @ 30 bytes: 3.28 ms average, 15.66 maximum
latency @ 62 bytes: 3.28 ms average, 15.67 maximum
latency @ 71 bytes: 3.28 ms average, 15.68 maximum
error reading result, r=0

EDIT: just as a sanity check for a T3.5 on Windows 10 x64
Code:
port COM7 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.00 ms average, 0.00 maximum
latency @ 2 bytes: 0.16 ms average, 15.57 maximum
latency @ 12 bytes: 0.16 ms average, 15.62 maximum
latency @ 30 bytes: 0.16 ms average, 15.62 maximum
latency @ 62 bytes: 0.31 ms average, 15.62 maximum
latency @ 71 bytes: 0.47 ms average, 15.62 maximum
latency @ 128 bytes: 0.63 ms average, 15.70 maximum
latency @ 500 bytes: 1.87 ms average, 15.68 maximum
latency @ 1000 bytes: 3.59 ms average, 15.66 maximum
latency @ 2000 bytes: 7.03 ms average, 15.66 maximum
latency @ 4000 bytes: 13.75 ms average, 15.67 maximum
latency @ 8000 bytes: 27.34 ms average, 31.27 maximum
 
Last edited:
You probably need to write to the IOMUXC_GPT2_IPP_IND_CAPIN1_SELECT_INPUT register, since by default EMC41 is routed to GPT2 capture input 1. Details on page 998 of the 1060 ref manual.

Ugh. 1052/1062 architecture is not getting many "style points". :(

Thanks Paul, setting the IOMUXC reg got GPT2 capture working on Teensy 4 beta2 (1062).

Just for the record, one of my GPS breakouts has a 1 us pps pulse width. That width is too small for GPT capture with the GPT timer being clocked by 32khz crystal. You need a width of at least 30 us. My ultimate GPS has a 100 ms pps pulse width so GPT@32khz works fine. And both GPS pps work with GPT running from 24mhz crystal.
 
However, it seems it does not because this crashes:

#define TESTADR (*( uint32_t *)0x20020000)
TESTADR = 7;

I've committed startup code to automatically partition the memory, on 1062. ITCM gets configured with the smallest size needed to accommodate the amount of code you compiled, then all the rest gets used as DTCM.

https://github.com/PaulStoffregen/cores/commit/24620b1a715b356865c48d075adbb70da96b8c0d

1062 always has 512K of RAM on AXI bus at 0x20200000, so I didn't leave any of the TCM-capable RAM accessible on the AXI bus at 0x20280000.

1052 is still 128K-128K-256K. Probably not going to mess with changes for 1052.
 
PaulStoffregen said:
I've committed startup code to automatically partition the memory, on 1062. ITCM gets configured with the smallest size needed to accommodate the amount of code you compiled, then all the rest gets used as DTCM.
That's a nice approach for memory - a couple of sketches I had would hit the DTCM limit on the 1052. Have to go find them.
 
Ugh. 1052/1062 architecture is not getting many "style points". :(

Especially with the pins names, it feels like NXP's engineers played a game of trying to come up with the most error-prone naming convention possible. Can't recall how many times I've mistakenly wrote a B1 instead of B0, so used something like B1_00 when I meant AD_B1_00, or B0_01 when I meant B1_00. It's just crazy. Who names stuff this way?!
 
...and we can detect the needed ITCM at runtime, so it's possible to automatically select the needed ITCM/DTCM sizes in 32KB steps - at runtime.
I'll do that tomorrow.

was about to do a very similar aproach.. even your lines
Code:
    _flexram_bank_config = 0xAAAAAAAA | ((1 << (_itcm_block_count * 2)) - 1);
_itcm_block_count = (SIZEOF(.text) + 0x7FFE) >> 15;
looked similiar, when I planned it today..
 
@manitou and @defragster

Thanks for the links. Just ran the latency_test code in post https://forum.pjrc.com/threads/7826-USB-to-digital-I-O-delay?p=18682&viewfull=1#post18682 just as a test and curiousity to see it would work on a Windows 10 (64) machine. These are the results with the current USB implementation - have to go look to see what its doing, probably will have to recompile - but first to get VS again:

[

Hi... - I thought I would try it out on my Linux box... But so far the latency app is not talking to the T4 beta 2 board... For the heck of it I also tried to have it talk to OpenCM9.04 board, it also did not talk to it...

I installed the Arduino 1.8.9 I installed latest beta Teensyduino I git cloned my version of cores... Probably should check that Paul did not update it overnight. And if I try to run it...

I rebuilt the app (changed makefile for Linux). updated board with sketch file...

When I try running ./latency_test /dev/ttyACM0

Waiting for board to be ready:
.... unable to write, r = -1...

If it I try it running: sudo ./latency_test/dev/ttyACM0

It shows set linux low latency mode
port /dev/ttyACM0 opened
waiting for board to be ready:
...........unable to write, r= -1

Yes udev rules installed...

Ubuntu 18.04.2 LTS.

Will try again later..
 
Hi...
It shows set linux low latency mode
port /dev/ttyACM0 opened
waiting for board to be ready:
...........unable to write, r= -1

Yes udev rules installed...

Ubuntu 18.04.2 LTS.

Will try again later..

Kurt, know this is going to sound funny but did you have a serial monitor open somewhere. Got that message the fist I ran it, I had the serial monitor open at the time.
 
Not getting latency_test to work.

I tried the Transmit Bandwidth Benchmark program serial_read.exe and it works:

Run to run has some variability … it seems

T4-2:
Total bytes read: 73220096
Speed 7322.01 kbytes/sec

// no Send_Now
Total bytes read: 71761920
Speed 7176.19 kbytes/sec

T_3.6 @ 256MHz:
Total bytes read: 13926400
Speed 1392.64 kbytes/sec

// no Send_Now
Total bytes read: 28639232
Speed 2863.92 kbytes/sec

sketch:
char oBuf[65];
void setup() {
Serial.begin(115200);
Serial.flush();
for ( int ii = 0; ii < 65; ii++ )
oBuf[ii] = 32 + ii;
}
void loop() {
Serial.write(oBuf, 64);
// Serial.send_now();
}
 
Code:
port COM23 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 3.12 ms average, 15.65 maximum
latency @ 2 bytes: 3.28 ms average, 15.66 maximum
latency @ 12 bytes: 3.13 ms average, 15.68 maximum
latency @ 30 bytes: 3.28 ms average, 15.66 maximum
latency @ 62 bytes: 3.28 ms average, 15.67 maximum
latency @ 71 bytes: 3.28 ms average, 15.68 maximum
error reading result, r=0

Most of this latency is probably due to the printf() stuff in rx_event() in usb_serial.c.

The debugging printf() are designed to use polling, so they complete putting all the bytes into the serial hardware before returning. That's slow, but very helpful when debugging the code which might crash soon after the printing.
 
If you want to see the FlexRAM Mapping:
Code:
PROGMEM
void flexRamInfo(void)
{
#if defined(__IMXRT1052__) || defined(__IMXRT1062__)
  int itcm = 0;
  int dtcm = 0;
  int ocram = 0;
  Serial.print("FlexRAM-Banks: [");
  for (int i = 15; i >= 0; i--) {
    switch ((IOMUXC_GPR_GPR17 >> (i * 2)) & 0b11) {
      case 0b00: Serial.print("."); break;
      case 0b01: Serial.print("O"); ocram++; break;
      case 0b10: Serial.print("D"); dtcm++; break;
      case 0b11: Serial.print("I"); itcm++; break;
    }
  }
  Serial.print("] ITCM: ");
  Serial.print(itcm * 32);
  Serial.print(" KB, DTCM: ");
  Serial.print(dtcm * 32);
  Serial.print(" KB, OCRAM: ");
  Serial.print(ocram * 32);
#if defined(__IMXRT1062__)
  Serial.print("(+512)");
#endif
  Serial.println(" KB");
  extern unsigned long _stext;
  extern unsigned long _etext;
  extern unsigned long _sdata;
  extern unsigned long _ebss;
  extern unsigned long _flashimagelen;
  extern unsigned long _heap_start;

  Serial.print("MEM (static usage): ITCM:");
  Serial.print((unsigned)&_etext - (unsigned)&_stext);
  Serial.print(", DTCM:");
  Serial.print((unsigned)&_ebss - (unsigned)&_sdata);
  Serial.print("(");
  Serial.print(dtcm * 32768 - ((unsigned)&_ebss - (unsigned)&_sdata));
  Serial.print(" Bytes free)");
  Serial.print(", OCRAM:");
  Serial.print((unsigned)&_heap_start - 0x20200000);
  Serial.print(", Flash:");
  Serial.println((unsigned)&_flashimagelen);
  Serial.println();
#endif
}
}
Output looks like this:
Code:
FlexRAM-Banks: [DDDDDDDDDDDDDDDI] ITCM: 32 KB, DTCM: 480 KB, OCRAM: 0(+512) KB
MEM (static usage): ITCM:22944, DTCM:17088(474432 Bytes free), OCRAM:0, Flash:35008
 
Last edited:
Most of this latency is probably due to the printf() stuff in rx_event() in usb_serial.c.

The debugging printf() are designed to use polling, so they complete putting all the bytes into the serial hardware before returning. That's slow, but very helpful when debugging the code which might crash soon after the printing.

Thanks Paul. Forgot about debug on Serial4. I thought I read somewhere in these pages there was a way to disable it. Don't want to get rid of it since it has come in real handy at times.
 
Kurt, know this is going to sound funny but did you have a serial monitor open somewhere. Got that message the fist I ran it, I had the serial monitor open at the time.

I didn't, and tried a few different things and then rebooted. I was trying out the OpenCM9.04 board, to see if it output anything so tried with monitor window and typed x and then got it to respond, so new inputs were working...
Then tried Latency test on it:
(STM32F103CB (ARM Cortex-M3) - http://emanual.robotis.com/docs/en/parts/controller/opencm904/
Code:
latency_test  latency_test.c  latency_test.exe  latency_test.pde  Makefile
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 3.00 ms average, 3.06 maximum
latency @ 2 bytes: 3.00 ms average, 3.04 maximum
latency @ 12 bytes: 3.00 ms average, 3.06 maximum
latency @ 30 bytes: 3.00 ms average, 3.10 maximum
latency @ 62 bytes: 3.00 ms average, 3.13 maximum
latency @ 71 bytes: 3.00 ms average, 3.10 maximum
latency @ 128 bytes: 3.00 ms average, 3.06 maximum
latency @ 500 bytes: 6.00 ms average, 6.05 maximum
latency @ 1000 bytes: 6.00 ms average, 6.07 maximum
latency @ 2000 bytes: 6.00 ms average, 6.07 maximum
latency @ 4000 bytes: 12.00 ms average, 12.09 maximum
latency @ 8000 bytes: 21.00 ms average, 21.09 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$

Then tried T3.6 and again it did not work until I first tried talking to it with the Serial monitor and closing...
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.............^C
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 5.00 ms average, 5.07 maximum
latency @ 2 bytes: 5.00 ms average, 5.06 maximum
latency @ 12 bytes: 5.00 ms average, 5.07 maximum
latency @ 30 bytes: 5.00 ms average, 5.06 maximum
latency @ 62 bytes: 5.00 ms average, 5.06 maximum
latency @ 71 bytes: 5.00 ms average, 5.09 maximum
latency @ 128 bytes: 5.00 ms average, 5.07 maximum
latency @ 500 bytes: 6.00 ms average, 6.08 maximum
latency @ 1000 bytes: 6.36 ms average, 7.07 maximum
latency @ 2000 bytes: 8.99 ms average, 9.06 maximum
latency @ 4000 bytes: 12.00 ms average, 12.06 maximum
latency @ 8000 bytes: 20.00 ms average, 20.07 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$
Now for T4 Beta 2, again tried terminal monitor first, then ran the app. It went a ways and then failed.
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.............^C
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 5.00 ms average, 5.07 maximum
latency @ 2 bytes: 5.00 ms average, 5.06 maximum
latency @ 12 bytes: 5.00 ms average, 5.07 maximum
latency @ 30 bytes: 5.00 ms average, 5.06 maximum
latency @ 62 bytes: 5.00 ms average, 5.06 maximum
latency @ 71 bytes: 5.00 ms average, 5.09 maximum
latency @ 128 bytes: 5.00 ms average, 5.07 maximum
latency @ 500 bytes: 6.00 ms average, 6.08 maximum
latency @ 1000 bytes: 6.36 ms average, 7.07 maximum
latency @ 2000 bytes: 8.99 ms average, 9.06 maximum
latency @ 4000 bytes: 12.00 ms average, 12.06 maximum
latency @ 8000 bytes: 20.00 ms average, 20.07 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$
Tried it again and it died same way...

Edit: Thought I would try T3.2:
Code:
latency @ 1 bytes: 5.00 ms average, 5.13 maximum
latency @ 2 bytes: 5.00 ms average, 5.12 maximum
latency @ 12 bytes: 5.00 ms average, 5.11 maximum
latency @ 30 bytes: 5.00 ms average, 5.08 maximum
latency @ 62 bytes: 5.00 ms average, 5.11 maximum
latency @ 71 bytes: 5.00 ms average, 5.10 maximum
latency @ 128 bytes: 5.00 ms average, 5.09 maximum
latency @ 500 bytes: 6.00 ms average, 6.07 maximum
latency @ 1000 bytes: 8.00 ms average, 8.11 maximum
latency @ 2000 bytes: 11.00 ms average, 11.09 maximum
latency @ 4000 bytes: 17.00 ms average, 17.09 maximum
latency @ 8000 bytes: 29.00 ms average, 29.09 maximum
Since I had done a few, different boards, thought I would try the more expensive Robotis OpenCR 1.0 board:
STM32F746ZGT6 / 32-bit ARM Cortex®-M7 with FPU (216MHz, 462DMIPS) - http://emanual.robotis.com/docs/en/parts/controller/opencr10/
Code:
urt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.00 ms average, 1.06 maximum
latency @ 2 bytes: 1.00 ms average, 1.09 maximum
latency @ 12 bytes: 1.00 ms average, 1.05 maximum
latency @ 30 bytes: 1.00 ms average, 1.04 maximum
latency @ 62 bytes: 1.00 ms average, 1.04 maximum
latency @ 71 bytes: 1.00 ms average, 1.03 maximum
latency @ 128 bytes: 1.00 ms average, 1.01 maximum
latency @ 500 bytes: 2.00 ms average, 2.07 maximum
latency @ 1000 bytes: 3.00 ms average, 3.06 maximum
latency @ 2000 bytes: 5.00 ms average, 5.08 maximum
latency @ 4000 bytes: 8.73 ms average, 9.07 maximum
latency @ 8000 bytes: 16.00 ms average, 16.06 maximum
 
Last edited:
Oops on the above, I forgot I removed the call to send_now...
So with that T3.6 is:
Code:
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.10 ms average, 0.23 maximum
latency @ 2 bytes: 0.10 ms average, 0.20 maximum
latency @ 12 bytes: 0.11 ms average, 0.16 maximum
latency @ 30 bytes: 0.13 ms average, 0.20 maximum
latency @ 62 bytes: 0.20 ms average, 0.26 maximum
latency @ 71 bytes: 0.25 ms average, 0.30 maximum
latency @ 128 bytes: 0.37 ms average, 0.42 maximum
latency @ 500 bytes: 1.13 ms average, 1.25 maximum
latency @ 1000 bytes: 2.13 ms average, 2.27 maximum
latency @ 2000 bytes: 4.08 ms average, 4.18 maximum
latency @ 4000 bytes: 7.99 ms average, 8.12 maximum
latency @ 8000 bytes: 15.77 ms average, 15.90 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$
Which is a tad different!
Not much different than just calling flush:
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.11 ms average, 0.19 maximum
latency @ 2 bytes: 0.11 ms average, 0.18 maximum
latency @ 12 bytes: 0.11 ms average, 0.17 maximum
latency @ 30 bytes: 0.14 ms average, 0.20 maximum
latency @ 62 bytes: 0.20 ms average, 0.24 maximum
latency @ 71 bytes: 0.25 ms average, 0.29 maximum
latency @ 128 bytes: 0.35 ms average, 0.45 maximum
latency @ 500 bytes: 1.13 ms average, 1.24 maximum
latency @ 1000 bytes: 2.10 ms average, 2.30 maximum
latency @ 2000 bytes: 4.06 ms average, 4.19 maximum
latency @ 4000 bytes: 8.00 ms average, 8.14 maximum
latency @ 8000 bytes: 15.77 ms average, 15.90 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$
And again OpenCM board with flush:
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.00 ms average, 1.02 maximum
latency @ 2 bytes: 1.00 ms average, 1.05 maximum
latency @ 12 bytes: 1.00 ms average, 1.05 maximum
latency @ 30 bytes: 1.00 ms average, 1.06 maximum
latency @ 62 bytes: 1.00 ms average, 1.08 maximum
latency @ 71 bytes: 1.00 ms average, 1.05 maximum
latency @ 128 bytes: 2.00 ms average, 2.08 maximum
latency @ 500 bytes: 5.00 ms average, 5.08 maximum
latency @ 1000 bytes: 5.00 ms average, 5.08 maximum
latency @ 2000 bytes: 5.01 ms average, 5.95 maximum
latency @ 4000 bytes: 10.00 ms average, 10.12 maximum
latency @ 8000 bytes: 19.04 ms average, 20.00 maximum
kurt@kurt-XPS-8300:~/Downloads/latency_test$
 
@KurtE

Ok you inspired me to breakout my Nucleo-144 H743Z1 board which is again a 32bit, M7 that can get up to 480Mhz. I am using the STM32duino core so I can use the Arduino IDE:

Code:
latency_test COM25
port COM25 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.94 ms average, 15.64 maximum
latency @ 2 bytes: 1.09 ms average, 15.70 maximum
latency @ 12 bytes: 1.88 ms average, 15.67 maximum
latency @ 30 bytes: 4.06 ms average, 15.65 maximum
latency @ 62 bytes: 6.09 ms average, 15.65 maximum
latency @ 71 bytes: 7.03 ms average, 15.67 maximum
latency @ 128 bytes: 11.87 ms average, 15.66 maximum
latency @ 500 bytes: 43.90 ms average, 46.90 maximum
latency @ 1000 bytes: 87.95 ms average, 93.77 maximum
latency @ 2000 bytes: 174.33 ms average, 187.47 maximum
latency @ 4000 bytes: 347.98 ms average, 361.25 maximum
latency @ 8000 bytes: 694.92 ms average, 703.11 maximum
No idea why so much different unless I have some setting wrong - never used it before so just a shot in the dark.
 
@mjs513 - Did you try adding in something like Serial.flush() ?

For what it is worth, on the OpenCM board, it used to not implement it, which I found painful, so I put in a hack (which they took), that when you called flush(), it zeroed out the count down counter to do the timeout and force the write, so in the case I saw that most of the smaller ones were 1ms made told sense...

I tried running the stuff again on T4, I commented out the printf that Paul mentioned and ran it again:
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.34 ms average, 1.41 maximum
latency @ 2 bytes: 1.31 ms average, 1.41 maximum
latency @ 12 bytes: 1.34 ms average, 1.63 maximum
latency @ 30 bytes: 1.33 ms average, 1.42 maximum
latency @ 62 bytes: 1.34 ms average, 1.46 maximum
error reading result, r=0
kurt@kurt-XPS-8300:~/Downloads/latency_test$
I also tried running it with sudo, which did not make much difference.
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ sudo ./latency_test /dev/ttyACM0
set linux low latency mode
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.32 ms average, 1.41 maximum
latency @ 2 bytes: 1.31 ms average, 1.40 maximum
latency @ 12 bytes: 1.31 ms average, 1.40 maximum
latency @ 30 bytes: 1.32 ms average, 1.43 maximum
latency @ 62 bytes: 1.32 ms average, 1.43 maximum
error reading result, r=0
Still dies once packets are > 62 bytes... Not sure what packet size is... Can probably guess 64?
 
@KurtE

Yeah - tried it with and without - only minor change in the decimal place. Haven't really looked at their core for Serial but it was on my todo list to see if it would work. So at least I could cross one out.

As for the T4, glad the numbers latency improved. What concerns me is that it dies at 62bytes for you and I got to 71bytes. I should give it a try. Ok I commented it out as well just now and got almost the same numbers:
Code:
d:\Users\Merli\Documents>latency_test COM23
port COM23 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.25 ms average, 15.66 maximum
latency @ 2 bytes: 1.41 ms average, 15.63 maximum
latency @ 12 bytes: 1.41 ms average, 15.69 maximum
latency @ 30 bytes: 1.41 ms average, 15.67 maximum
latency @ 62 bytes: 1.41 ms average, 15.66 maximum
latency @ 71 bytes: 1.41 ms average, 15.67 maximum
error reading result, r=0
but I am on Windows10x64 so that may add a little overhead
 
Now I hooked up my FTDI cable to Serial4 at 115200 and then reprogrammed the T4 and found there was still getting Serial outputs ("run_callbacks"), so I commented out that line as well in usb.c

Ran the test again:
Code:
kurt@kurt-XPS-8300:~/Downloads/latency_test$ ./latency_test /dev/ttyACM0
port /dev/ttyACM0 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.00 ms average, 1.08 maximum
latency @ 2 bytes: 1.00 ms average, 1.06 maximum
latency @ 12 bytes: 1.00 ms average, 1.07 maximum
latency @ 30 bytes: 1.00 ms average, 1.06 maximum
latency @ 62 bytes: 1.00 ms average, 1.06 maximum
Will update in second from windows achine with output shown for Serial4... Nothing interesting.
Update:
Code:
***********IMXRT Startup**********
test 1 -1234567 3
CCM_ANALOG_PLL_USB1=80003000
  enable USB clocks
CCM_ANALOG_PLL_USB1=80003040
Increasing voltage to 1250 mV
need to switch to alternate clock during reconfigure of ARM PLL
USB PLL is running, so we can use 120 MHz
Freq: 12 MHz * 100 / 2 / 1
ARM PLL=80002042
ARM PLL needs reconfigure
ARM PLL=80002064
New Frequency: ARM=600000000, IPG=150000000
USB reset took 3 loops
analogInit
usb_serial_reset
before C++ constructors
after C++ constructors
before setup
usb_serial_reset
usb_serial_configure
usb_cdc_line_coding, baud=0
after setup
usb_cdc_line_coding, baud=9600

And for what it is worth, I then tried the latency_test.exe on my main Windows 10 machine and I commented out the two printf statements like I did on the Linux box...
Code:
C:\Users\kurte\Desktop\latency_test>latency_test.exe COM7
port COM7 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.00 ms average, 1.02 maximum
latency @ 2 bytes: 1.00 ms average, 2.00 maximum
latency @ 12 bytes: 1.00 ms average, 1.01 maximum
latency @ 30 bytes: 1.00 ms average, 1.01 maximum
latency @ 62 bytes: 1.00 ms average, 1.01 maximum
latency @ 71 bytes: 1.00 ms average, 2.02 maximum
error reading result, r=0
So again look like timing out at 1ms timer... and as @mjs513 it shows result for 71 bytes before it dies, versus 62 bytes on my Linux machine.
 
Last edited:
I end up with this - see edits below : VERY ODD IT DIES THERE !!!!:
Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25
port COM25 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 0.85 ms average, 16.00 maximum
latency @ 2 bytes: 1.00 ms average, 16.00 maximum
latency @ 12 bytes: 1.01 ms average, 16.00 maximum
latency @ 30 bytes: 1.00 ms average, 16.00 maximum
latency @ 62 bytes: 1.00 ms average, 16.00 maximum
latency @ 71 bytes: 1.00 ms average, 16.00 maximum
[COLOR="#FF0000"]error reading result, r=0
error reading result, r=0
latency @ 128 bytes: 20.16 ms average, 1000.00 maximum[/COLOR]
latency @ 500 bytes: 1.00 ms average, 16.00 maximum
latency @ 1000 bytes: 2.01 ms average, 16.00 maximum
latency @ 2000 bytes: 4.02 ms average, 16.00 maximum
latency @ 4000 bytes: 8.02 ms average, 22.00 maximum
latency @ 8000 bytes: 15.66 ms average, 31.00 maximum

I hacked out the 'die' on error at end of 'do_test()' - once it completed without error - this prints and continues:
Code:
	gettimeofday_(&end, NULL);
	elapsed = (double)(end.tv_sec - begin.tv_sec) * 1000.0;
	if (r < 1) printf("error reading result, r=%d\n", r); // NOT DIE
	else
	{
  	  if (memcmp(buf, "012x", 4) != 0) die("error: incorrect response\n");
	  elapsed += (double)(end.tv_usec - begin.tv_usec) / 1000.0;
	  //printf("  len=%d, elased: %.2f ms\n", len, elapsed);
	}
	return elapsed;
}


installed MinGW - got yelled at for gettimeofday() so swapped it with this stackoverflow version as '_'()::
Code:
static const unsigned __int64 epoch = ((unsigned __int64) 116444736000000000ULL);
gettimeofday_(struct timeval * tp, struct timezone * tzp)
{
    FILETIME    file_time;
    SYSTEMTIME  system_time;
    ULARGE_INTEGER ularge;

    GetSystemTime(&system_time);
    SystemTimeToFileTime(&system_time, &file_time);
    ularge.LowPart = file_time.dwLowDateTime;
    ularge.HighPart = file_time.dwHighDateTime;

    tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L);
    tp->tv_usec = (long) (system_time.wMilliseconds * 1000);

    return 0;
}

Added this to undefined PRINT_DEBUG_STUFF:: #define printf_debug(...)
And put that in the two places the noted spew still came from.
 
Last edited:
In case it is a useful edit - I made it try more values - and showed length where it would die - here on 64:

Here is the list tried: int lengths[] = { 1,2,12,16,30,31,63,64,65,71,126,127,128,129,500,512,640,1000,1278,1279,1280,1281,2000,2047,2048,2049,4000,4095,4096,4097,8000 };

Moving 64 up in the list makes it FAIL there - then it is fine?

Updated "latency_test.c" to show if a BAD return string is received. AND modified SKETCH {below} to send a BAD reply when char before 'x' is not '0' as sent. It does not trigger, and does not seem to add to reported latency timing. Next step would be to send and test for characters other than '0' in succession.



Here is partial output - it repeats this with no errors:
Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25
port COM25 opened
waiting for board to be ready:
.ok
latency @ 1 bytes: 1.00 ms average, 16.00 maximum
latency @ 2 bytes: 1.00 ms average, 16.00 maximum
latency @ 12 bytes: 1.00 ms average, 16.00 maximum
latency @ 16 bytes: 0.85 ms average, 16.00 maximum
latency @ 30 bytes: 0.84 ms average, 16.00 maximum
latency @ 31 bytes: 0.85 ms average, 16.00 maximum
latency @ 61 bytes: 0.85 ms average, 16.00 maximum
latency @ 62 bytes: 1.00 ms average, 16.00 maximum
latency @ 63 bytes: 1.00 ms average, 16.00 maximum
error reading len=64 result, r=0
error reading len=64 result, r=0
error reading len=64 result, r=0
error reading len=64 result, r=0
error reading len=64 result, r=0
error reading len=64 result, r=0
error reading len=64 result, r=0
latency @ 64 bytes: 70.15 ms average, 1000.00 maximum
latency @ 65 bytes: 1.01 ms average, 16.00 maximum
latency @ 71 bytes: 1.00 ms average, 16.00 maximum
latency @ 126 bytes: 1.00 ms average, 16.00 maximum
latency @ 127 bytes: 0.84 ms average, 16.00 maximum
latency @ 128 bytes: 0.15 ms average, 15.00 maximum
latency @ 129 bytes: 0.85 ms average, 16.00 maximum
latency @ 500 bytes: 0.99 ms average, 16.00 maximum
latency @ 512 bytes: 1.01 ms average, 16.00 maximum
latency @ 640 bytes: 1.38 ms average, 16.00 maximum
latency @ 1000 bytes: 2.00 ms average, 16.00 maximum
latency @ 1278 bytes: 3.01 ms average, 16.00 maximum
latency @ 1279 bytes: 3.00 ms average, 16.00 maximum
latency @ 1280 bytes: 2.47 ms average, 16.00 maximum
latency @ 1281 bytes: 3.00 ms average, 16.00 maximum
latency @ 2000 bytes: 4.00 ms average, 16.00 maximum
latency @ 2047 bytes: 4.01 ms average, 16.00 maximum
latency @ 2048 bytes: 4.00 ms average, 16.00 maximum
latency @ 2049 bytes: 5.01 ms average, 16.00 maximum
latency @ 4000 bytes: 7.87 ms average, 16.00 maximum
latency @ 4095 bytes: 8.03 ms average, 23.00 maximum
latency @ 4096 bytes: 8.02 ms average, 23.00 maximum
latency @ 4097 bytes: 9.02 ms average, 22.00 maximum
latency @ 8000 bytes: 15.64 ms average, 31.00 maximum
 ---
latency @ 8000 bytes: 15.50 ms average, 22.00 maximum
latency @ 4097 bytes: 9.01 ms average, 22.00 maximum
latency @ 4096 bytes: 8.02 ms average, 23.00 maximum
latency @ 4095 bytes: 8.04 ms average, 16.00 maximum
latency @ 4000 bytes: 8.02 ms average, 22.00 maximum
latency @ 2049 bytes: 5.01 ms average, 16.00 maximum
latency @ 2048 bytes: 4.01 ms average, 16.00 maximum
latency @ 2047 bytes: 4.01 ms average, 16.00 maximum
latency @ 2000 bytes: 4.00 ms average, 16.00 maximum
latency @ 1281 bytes: 2.87 ms average, 16.00 maximum
latency @ 1280 bytes: 2.63 ms average, 16.00 maximum
latency @ 1279 bytes: 3.00 ms average, 16.00 maximum
latency @ 1278 bytes: 3.00 ms average, 16.00 maximum
latency @ 1000 bytes: 2.00 ms average, 16.00 maximum
latency @ 640 bytes: 1.22 ms average, 16.00 maximum
latency @ 512 bytes: 1.00 ms average, 16.00 maximum
latency @ 500 bytes: 1.00 ms average, 16.00 maximum
latency @ 129 bytes: 1.00 ms average, 16.00 maximum
latency @ 128 bytes: 0.16 ms average, 16.00 maximum
latency @ 127 bytes: 1.00 ms average, 16.00 maximum
latency @ 126 bytes: 1.01 ms average, 16.00 maximum
latency @ 71 bytes: 1.00 ms average, 16.00 maximum
latency @ 65 bytes: 1.00 ms average, 16.00 maximum
latency @ 64 bytes: 0.31 ms average, 16.00 maximum
latency @ 63 bytes: 1.01 ms average, 16.00 maximum
latency @ 62 bytes: 1.00 ms average, 16.00 maximum
latency @ 61 bytes: 1.00 ms average, 16.00 maximum
latency @ 31 bytes: 1.00 ms average, 16.00 maximum
latency @ 30 bytes: 1.01 ms average, 16.00 maximum
latency @ 16 bytes: 1.00 ms average, 16.00 maximum
latency @ 12 bytes: 0.84 ms average, 16.00 maximum
latency @ 2 bytes: 1.00 ms average, 16.00 maximum
latency @ 1 bytes: 1.01 ms average, 16.00 maximum
 ---

Sketch:
Code:
void setup() {
  Serial.begin(115200);
  Serial.flush();
}

void loop() {
  while (Serial.available()) {
    byte c = Serial.read();
    if (c == 'x') {      // 'x' is end of input message
      Serial.write('0');
      Serial.write('1');
      Serial.write('2');
      Serial.write('x');
      Serial.send_now(); // comment out for non-Teensy boards
    }
    else if (c != '0') {      // 'x' is end of input message
      Serial.write('B');
      Serial.write('A');
      Serial.write('D');
      Serial.write( c );
      Serial.send_now(); // comment out for non-Teensy boards
    }
  }
}
 

Attachments

  • latency_test.c
    9.4 KB · Views: 79
Last edited:
Sorry - not sorry - CMDLINE is too much like fun … I went to the trouble of getting minGW and is fast, I should add it to SublimeText.

Paul: There seems to be an issue the first time a Full 64 bytes is used.

I was wondering about how many times a MAX is hit higher ( >= 2nd max ) and what the 2nd highest Max is.
ZERO 2nd max means the max shown is the norm Max for what ever count there is:
Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25
port COM25 opened
waiting for board to be ready:
.ok
latency @    1 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @    2 bytes: 1.00 ms average,   3 max hits,    0.00 2nd max,   16.00 maximum
[COLOR="#FF0000"]error reading len=64 result, r=0
error reading len=64 result, r=0 // plus 5 repeats
[/COLOR]latency @   64 bytes: 70.15 ms average,          7 max hits,    0.00 2nd max,   1000.00 maximum
latency @   12 bytes: 1.00 ms average,   3 max hits,    0.00 2nd max,   16.00 maximum
latency @   16 bytes: 1.00 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   30 bytes: 1.01 ms average,   5 max hits,    0.00 2nd max,   16.00 maximum
latency @   31 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @   63 bytes: 1.00 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   64 bytes: 0.16 ms average,   1 max hits,    0.00 2nd max,   16.00 maximum
latency @   65 bytes: 1.00 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   71 bytes: 1.01 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @  126 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum

Rather than upload here is the change for this:
Code:
void do_test_100_times(PORTTYPE port, int len)
{
	const int num = 100;
	int maxn=0;
	double ms, total=0, max2=9876543210, max=0;
	int i;

	for (i=0; i<num; i++) {
		ms = do_test(port, len);
		total += ms;
		if (ms > max) {
			if (ms != max2) max2 = max;
			max = ms;
			maxn++;
		}
		else if (ms == max && max >= max2) maxn++;
	}
	printf("latency @ %4d bytes: ", len);
	printf("%4.2f ms average, ", total / num);
	printf("\t%2d max hits, ", maxn);
	printf("\t%4.2f 2nd max, ", max2);
	printf("\t%4.2f maximum\n", max);
}
 
Code:
Updated the Code/Sketch: [ATTACH]16518._xfImport[/ATTACH]

Here are bottom line results - same exe and sketch - T_3.6 {faster and NO errors} .vs. T4-2 {no debug output} .vs. T_3.1 {WINNER}:
Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM8 1 1 [B][U]T_3.6 @240[/U][/B]
port COM8 opened
[B] DOWN --- pass #1        elapsed time 5.635 secs for 4106700 bytes
[/B]
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25 1 1 [B][U]T4-2 @600[/U][/B]
port COM25 opened
[B] DOWN --- pass #1        elapsed time 9.488 secs for 4106700 bytes

T:\T_Downloads\pjrc_latency_test>latency_test.exe COM19 1 1 [B][U]T3.1 @96[/U][/B]
port COM19 opened
 DOWN --- pass #1        elapsed time 5.344 secs for 4106700 bytes
[/B]

Updates include:
>> Rather than write only '0' it writes 'A-Z,...' for test length w/exe's [prefilled buffer] and like sketch resets on 'x' and indexes per character and validates each from 'A-Z' on the same schedule.
>> Added optional param 2 :: # > single digit 1-9 tells it how many times to loop through the list Up then Down
>> Added optional param 3 :: # > single digit 1-9 tells it what part of the list to run. That is length of the array of test sizes is divided by the entered number
---- For example this says repeat twice and do one third of the list (which includes 64): >latency_test.exe COM25 2 3
>> Added @# where '@0 error reading len=64 ' shows it failing on first seven test iterations - then works fine for the last 93.
>> Shows overall time for each pass Up and Down: DOWN --- pass #1 elapsed time 0.918 secs for 35500 bytes
>> Just added bytes per test loop
>> ZIP above has source for .c and sketch as they work, also Win EXE and gcc BLD.bat
>> Bonus Linux folder to see if gcc on my Windows 'WSL' machine works [ ubuntu 18.04 ] - I can't test as I can't access COM's [feedback please!]
]] exe/Sketch can be upgraded past 8000 bytes
]] Sketch will need update to test with read(buf, cnt)
]] Param 3 could also specify which 'list' to use - for long/short/different test

Code:
T:\T_Downloads\pjrc_latency_test>latency_test.exe COM25 2 3
port COM25 opened
waiting for board to be ready:
.ok
latency @    1 bytes: 1.00 ms average,   5 max hits,    15.00 2nd max,  16.00 maximum
latency @    2 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @   12 bytes: 1.00 ms average,   3 max hits,    0.00 2nd max,   16.00 maximum
latency @   16 bytes: 1.00 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   30 bytes: 1.01 ms average,   5 max hits,    0.00 2nd max,   16.00 maximum
latency @   31 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @   63 bytes: 1.00 ms average,   3 max hits,    0.00 2nd max,   16.00 maximum
[B]@0 error reading len=64 result, r=0
@1 error reading len=64 result, r=0
@2 error reading len=64 result, r=0
@3 error reading len=64 result, r=0
@4 error reading len=64 result, r=0
@5 error reading len=64 result, r=0
@6 error reading len=64 result, r=0
latency @   64 bytes: 70.15 ms average,          7 max hits,    0.00 2nd max,   1000.00 maximum[/B]
latency @   65 bytes: 1.01 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   71 bytes: 1.00 ms average,   5 max hits,    15.00 2nd max,  16.00 maximum
 UP ----- pass #1        elapsed time 7.927 secs for 35500 bytes
latency @   71 bytes: 1.01 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   65 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @   64 bytes: 0.15 ms average,   1 max hits,    0.00 2nd max,   15.00 maximum
latency @   63 bytes: 1.01 ms average,   5 max hits,    0.00 2nd max,   16.00 maximum
latency @   31 bytes: 1.00 ms average,   5 max hits,    15.00 2nd max,  16.00 maximum
latency @   30 bytes: 1.00 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @   16 bytes: 1.01 ms average,   4 max hits,    0.00 2nd max,   16.00 maximum
latency @   12 bytes: 0.99 ms average,   4 max hits,    15.00 2nd max,  16.00 maximum
latency @    2 bytes: 1.01 ms average,   5 max hits,    0.00 2nd max,   16.00 maximum
latency @    1 bytes: 1.00 ms average,   5 max hits,    15.00 2nd max,  16.00 maximum
 DOWN --- pass #1        elapsed time 0.918 secs for 35500 bytes
 
Last edited:
@KurtE and @defragster

Figured it was time to put all this data into perspective, so for a Windows10 assuming we all have x64's:

… Graph Deleted ….

Other than the obvious that the T4 latency lies between the T3.6 and the T3.5 at this point in USB development there are a couple of points.

As we all pointed out there is an issue at sending 64bytes and it dies at 71bytes for an error.

The second is there appears to be discontinuities in the data Tim presented for the T4 at 2000/4000 for example. Don't know why - but we don't have anything to compare it too yet.

Also, low byte transfers T4 a bit worse than T3.5/3.6

EDIT: for those that prefer log graphs:

… Graph Deleted ….

Graphs for personal use only see next post
 
Last edited:
Status
Not open for further replies.
Back
Top