Teensyduino 1.56 Beta #1

Status
Not open for further replies.
I tried running the test here. I programmed a Teensy 4.1 with this SerialTestT.ino, first using the core library files currently staged for 1.56-beta2 (same as on my github account today).

Then I compiled and ran this serialtestT.c. Here is the result I see.

Capture1.PNG

Next I replaced usb_serial.c with Frank's usb_serial.c having extra memory barriers.

When I run it again, I see this.

Capture2.PNG
 
I'm struggling to understand why 64 bytes was chosen as the string length. That's an exact integer divide of the 512 byte packet size and 2048 transaction size.
 
Ok, I've run it several dozen times now. Often it starts with many lines counts less than 100000, but then settles to working smoothly. I tried removing some of the memory barriers or even all of them. Even without any memory barriers in usb_serial.c, it takes longer to settle to a working state, but once it does I see Lines-Delta 100000 printed. Here's a screenshot from a test without any memory barriers, where it printed a *lot* of junk, then settled to running normally.

capture.png
 
Here is the test program ported to Linux.


Code:
// Code taken from
// https://www.pololu.com/docs/0J73/15.6
// and modified for my needs.

// modified for Linux

// gcc serialtestT_linux.c -O3 -Wall -s -o serialtest

// Frank B, 10/2021

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

// Opens the specified serial port, configures its timeouts, and sets its
// baud rate.  Returns a handle on success, or INVALID_HANDLE_VALUE on failure.
int open_serial_port(const char * device, uint32_t baud_rate)
{

  int fd = open(device, O_RDWR);
  if (fd < 0) {
    printf("could not open %s\n", device);
    exit(1);
  }
  return fd;
}

// Writes bytes to the serial port, returning 0 on success and -1 on failure.
int write_port(int port, uint8_t * buffer, size_t size)
{ 
  int r = write(port, buffer, size);
  return r;
}

// Reads bytes from the serial port.
// Returns after all the desired bytes have been read, or if there is a
// timeout or other error.
// Returns the number of bytes successfully read into the buffer, or -1 if
// there was an error reading.
int read_port(int port, char * buffer, size_t size)
{
  int n = read(port, buffer, size);
  if (n < 0) {
    printf("Failed to read from port");
    exit(1);
  }
  return n;
}
int main(int argc, char **argv)
{
  const char *device;

  if ( argc != 2 ) {
      fprintf(stderr, "Usage: %s COMx\n", argv[0]);
      return 1;
  }
  device = argv[1];

  printf("Port: %s\n", device);

  uint32_t baud_rate = 9600;

  int port = open_serial_port(device, baud_rate);

  printf("After start, you'll probably see some garbage. Things will normalize after a few seconds.\n");
  printf("Waiting...");

  unsigned int lines = 0;
  unsigned int pos = 0;
  unsigned int lcnt = 0;

  int r;

  char line[256];
  char buf[1000000];

  memset(buf, 0, sizeof buf);
  memset(line, 0, sizeof line);

  do {
    r = read_port(port, buf, 1);
    //if (r) printf("%c", buf[0]);
  } while(  r < 1 || buf[0] != '\n' );

  printf("Start.\n");
  unsigned int gCnt = 0;
  unsigned int zCnt = 1;
  do {

    r = read_port(port, buf, sizeof buf);
    pos = 0;
    while (r-- && pos < sizeof buf) {
      char ch = buf[pos++];
      if (lcnt < sizeof line) {
        line[lcnt++] = ch;
      }
      if (ch == '\n') {
        line[lcnt] = '\0';
        if (lcnt != 64 || '1' != line[0] || 'z' != line[30] ) {
          if ( lines == 100000) {
            printf( "." );
            fflush(stdout);
            if ( !( gCnt % 50) )
              printf("\nLines-Delta: %d. Received: %s", lines, line);
            gCnt++;
          }
          else {
            if ( 0 != zCnt ) {
              printf("\nLines-Delta: %d. Received: %s", lines, line);
            }
            else {
              printf( "._X" );
            }
            zCnt = lines;
          }
          lines = 0;
        }
        else lines++;
        lcnt = 0;
      }
    }

  } while (1);


  close(port);
  return 0;
}

During this run, I compiled the Audio lib Guitar example, launched a virtualbox machine with Windows XP and Corel Draw, and ran the Gimp to take this screenshot.

screenshot.png
 
I'm struggling to understand why 64 bytes was chosen as the string length. That's an exact integer divide of the 512 byte packet size and 2048 transaction size.

serialtestT.c updated as below on : github.com/defragster/cdcbench
> quick updated require 3 params - updated to allow default of 63.

**<edit> note tested to be :: char [0] == '1' and [30] == 'z'

Assumed 64 would give best throughput without troubles- though trouble areas from mixed buffers is a worthy test.
> there is a 63 in the .c file - seems a 0 based count. As Frank noted it could have been given a cmdline override.
> in the INO change that to match sizeof : szTest[]:
//char szTest[] = "13579abcdefghijklmnopqrstuvwxyz02468ABCDEFGHIJKLMNOPQRSTUVWXYZ\n"; // SZ63
char szTest[] = "13579abcdefghijklmnopqrstuvwxyz0\n"; //SZ33

Will updated github with changed .c for overriding default of 63:
Code:
Usage: serialtestT.exe COMx [SZx]

Have seen some starts sync like butter - others churn with a screen of garbage false tries before sync. Wondered if the open_serial_port() might do another Flush...() - just edited that in to try. Added this before 'return port;':
Code:
  success = FlushFileBuffers(port);
  if (!success)
  {
    print_error("Failed to flush serial port #2");
    CloseHandle(port);
    return INVALID_HANDLE_VALUE;

Here are ends of three parallel stats runs after some hours:
Code:
stats: 100K=108001 less=54 repeated 20  //PowerShell T_4.1 com11
stats: 100K=282001 less=191 repeated 289 // T_MM com15
stats: 100K=116001 less=113 repeated 3176 // T_4.1 com37
> where repeated is an added fail after a fail to minimize screen spew (if coded right)

Assume that is Windows on Mac at half the speed of Frank and Linux box 10% more than Frank's.
 
Last edited:
Using above edited code on 3 1062 Teensys at the same time:
with : serialtestT.exe COM15 SZ33
for : char szTest[] = "13579abcdefghijklmnopqrstuvwxyz0\n"; //SZ33

Code:
Lines-Delta: 100000. Received: 3400000 Bytes in 135543 us = 200.67 Megabits per second
        stats: 100K=4001 less=2 repeated 0

Lines-Delta: 100000. Received: 3400000 Bytes in 130493 us = 208.44 Megabits per second
        stats: 100K=8001 less=6 repeated 0

Lines-Delta: 100000. Received: 3400000 Bytes in 136862 us = 198.74 Megabits per second
        stats: 100K=8001 less=0 repeated 0

One started with lots of NOISE after the other two were running, and they started after single sync to group.
 
Confirmed the usb_serial.c in use matches what Frank has posted on github.

@Paul - is there some magic number for the transfer? Franks's does 10B, and I've now tested 64B, 34B and 33B to work.

Ran the three in parallel - they were very good - then using machine introduced a few skips - with the three running in parallel - but nothing dramatic - quick to recover.

Running: serialtestT.exe COM15 SZ33 { also as posted as github.com/FrankBoesing/cdcbench/tree/main/AltVersion }
Code:
Lines-Delta: 100000. Received: 3400000 Bytes in 128842 us = 211.11 Megabits per second
        stats: 100K=104001 less=42 repeated 1

Lines-Delta: 100000. Received: 3400000 Bytes in 137223 us = 198.22 Megabits per second
        stats: 100K=98001 less=30 repeated 64

Lines-Delta: 100000. Received: 3400000 Bytes in 138422 us = 196.50 Megabits per second
        stats: 100K=98001 less=20 repeated 0

Switched to : .\SerialTestT COM11 SZ32 with INO : char szTest[] = "13579abcdefghijklmnopqrstuvwxyz\n"; //SZ32
>> AT FIRST seemed to NOT be running as smoothly - though early misses to to have corrected.
Mbps Sum of the three stayed about the same ~600 Mbps from 64B to 34B seems similar
Code:
Lines-Delta: 100000. Received: 3300000 Bytes in 124594 us = 211.89 Megabits per second
        stats: 100K=8001 less=79 repeated 2

        stats: 100K=8001 less=30 repeated 37

        stats: 100K=10001 less=34 repeated 1

Did a restart on all three and all three started quickly and are doing well with 33 byte packets. And all three just completed 2001 packets without error:
Code:
Lines-Delta: 100000. Received: 3300000 Bytes in 127398 us = 207.22 Megabits per second
        stats: 100K=2001 less=0 repeated 0

**The code with error counting ignores errors until the first completed successful transfer of the 100,000 packets.

The DOS box 'select and copy' caused 3 errors on one Teensy, the only errors up to 6001 sets of 3.3 MB - the other two showed NO errors up to the same 6001 transfer sets.
 
But it makes no sense to works with code that has known bugs. I only want to see problems with my new code.

Do we really have known bugs? Or is it just assumed bugs so far?


Otherwise, it will only distract us and lead us down the wrong path.

Yes, indeed....

Please try running the tests with this copy of usb_serial.c, which completely removes tx_noautoflush, usb_serial_flush_callback, and use of a timer to run it. If the known bug is a problem with memory synchronization where usb_serial_flush_callback() is interfering because tx_noautoflush isn't updated properly, then surely this copy without any usb_serial_flush_callback(), without any tx_noautoflush variable should work, right?

My skepticism is adding the memory barriers may be slow the transfers just enough to lessen the Windows reception issues when tested using a very fast desktop Windows system. Slower speed might give the false impression of fixing a bug. Maybe. It is really just a guess at this point, which is why I made this copy of usb_serial.c without any autoflush to possibly interfere with normal sending. Obviously it's not usable for normal programs without autoflush, but for the sake of testing this possible bug, hopefully this can confirm or deny whether incorrect autoflush really is the cause of these problems?

FWIW, this is Frank's usb_serial.c with everything about autoflush commented out. So if you wish to test any of the memory barriers Frank added, just remove the comments to put them back into the code.
 

Attachments

  • usb_serial.c
    13.2 KB · Views: 54
@Paul - is there some magic number for the transfer? Franks's does 10B, and I've now tested 64B, 34B and 33B to work.

I really do not know at this point. If there is a bug on the Teensy side, and I'm not trying to claim my code is bug free, I do not yet understand the problem well enough to answer this sort of question.

Sometimes when a bug is a subtle timing problem or race condition, changes like buffer and write sizes (or almost anything really) affect it indirectly by altering the timing of other parts of the program. These sorts of problems are very frustrating to diagnose.
 
I really do not know at this point. If there is a bug on the Teensy side, and I'm not trying to claim my code is bug free, I do not yet understand the problem well enough to answer this sort of question.

Sometimes when a bug is a subtle timing problem or race condition, changes like buffer and write sizes (or almost anything really) affect it indirectly by altering the timing of other parts of the program. These sorts of problems are very frustrating to diagnose.

Not sure there is a problem this shows - beyond the 'edges' where Windows drops the ball when the traffic gets mishandled at these speeds.


Just dropped the p#183 in CORES and rebuilt for the two T_4.1's and then T_MM under test 'together' (on unique USB ports) on this machine using the last part of p#182 for SZ32.
They all started funky on first run it seemed. So restarted them and they have passed the 4001 and a bit beyond with no errors.
Code:
        stats: 100K=4001 less=0 repeated 0
*Note they may not all be adding up to 600Mbps - seems one of the three can be closer to 150 thatn 200 MBps - see image:
serialTestPJRCusb_.png
> One show 130 Mbps - if this slowdown for safe data expected that is okay.
Will know in some hours as they are left now.

**NOTE: One odd thing - I walk away and the screen blanks. When I return and wake the screens - it is typical I to failed 100K groups in the preceding lines. All returns to normal by the time the screens are lit.

Something did cause this just after the screen shot: { the other two both passed 8001 with ZERO errors }
Code:
Lines-Delta: 100000. Received: 3300000 Bytes in 130195 us = 202.77 Megabits per second
..................................................
Lines-Delta: 100000. Received: 3300000 Bytes in 128250 us = 205.85 Megabits per second
..................................................
Lines-Delta: 100000. Received: 3300000 Bytes in 129912 us = 203.21 Megabits per second
................................................
X_Lines-Delta: 58531. Received: 1
X_Lines-Delta: 0. Received: .X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_.X_
X_Lines-Delta: 63. Received: 13579abcdefghijklmnopqrstuvwxyz3579abcdefghijklmnopqrstuvwxyz

X_Lines-Delta: 4540. Received: 3300000 Bytes in 174869 us = 150.97 Megabits per second
..
Lines-Delta: 100000. Received: 3300000 Bytes in 130002 us = 203.07 Megabits per second
** at 8001 (and 10001) that one is less=10 and repeated 162 (the .X_'s)
Another perfect 0 loss at 10001, and loss=3 on the third.
 
Last edited:
I'll test your code later - have other tasks now.. maybe this evening or tomorrow.
Paul, do you remember, back in july (or june?) we fixed a very similar issue (data loss). VMuzik reported that the issue was _not_ fixed for him.
(Patch was never done for the other usb_serial*.c)

So, have I got it right that you don't see missing data - as opposed to your MAC screenshot? (On my system there is a difference, and it works better with the barriers.)
Your speed shows that Windows USB serial is not slower than linux. (at least on my machine). The slower speed of the monitor must have an other reason. GUI?
 
I'll test your code later - have other tasks now.. maybe this evening or tomorrow.
Paul, do you remember, back in july (or june?) we fixed a very similar issue (data loss). VMuzik reported that the issue was _not_ fixed for him.
(Patch was never done for the other usb_serial*.c)

So, have I got it right that you don't see missing data - as opposed to your MAC screenshot? (On my system there is a difference, and it works better with the barriers.)
Your speed shows that Windows USB serial is not slower than linux. (at least on my machine). The slower speed of the monitor must have an other reason. GUI?

p.s. Why _exactly_ does it take some time after start to run smoothly? Can we fix that?
 
Paul, do you remember, back in july (or june?) we fixed a very similar issue (data loss).

Yes, I believe it was this commit.

https://github.com/PaulStoffregen/c...725c09bbb5d7a23f12c1b2d2c7f7e9e274ff20bb93814

It has both writes to tx_noautoflush and memory barriers *before* some of those writes. No real work went into investigating whether the memory barriers were really necessary.


VMuzik reported that the issue was _not_ fixed for him.

I don't have a link to that message.


(Patch was never done for the other usb_serial*.c)

I can apply it to the others. Do you believe I should apply the same patch now, or do something else?

I am planning to package up 1.56-beta2, mainly to roll up the recently changes for MTP. I could apply the change to the other 2 now, with the same, or with more, or without any memory barriers. Any advice on which I should do for 1.56-beta2 ?


So, have I got it right that you don't see missing data - as opposed to your MAC screenshot?

I have not tested recently on MacOS. I haven't even compiled serialtestT.c on MacOS yet.

I did run it several times on Linux. The result always looks similar to the screenshot on msg #179, with or without the memory barriers. So far I've only tested Linux on a fast desktop. Might be interesting to see how it does with a slow system on Raspberry Pi or Jetson Nano.


(On my system there is a difference, and it works better with the barriers.)

Is the improvement because the memory barriers fix a bug on Teensy, or only because the memory barriers slow the speed?


Your speed shows that Windows USB serial is not slower than linux. (at least on my machine). The slower speed of the monitor must have an other reason. GUI?

I do not know. I do not even know what is truly causing the problems with this much simpler test.
 
p.s. Why _exactly_ does it take some after start to run smoothly? Can we fix that?

Not sure why it gets off to a bad start SOME TIMES. Made an edit to flush the buffers a 2nd time after OPEN and then didn't hurt, but some first runs - maybe Windows USB is asleep with the ports closed - reprogrammed then suddenly pumping data?

As posted saw this again and restarted all three and then NO TROUBLE starting.

This may be a reflection of the PJRC_lps test that starts slow # lines - then doubles over time as Windows compensates?

All three at 18001 sets of 100K reads - no new errors.
 
Earlier (a few days ago) I saw, or at least I thought I saw, a substantial difference with vs without the memory barriers.

But now I can't reproduce it. Maybe it was my imagination? Or a mistake. Probably a mistake! Sorry :(
 
Ugh, *another* mistake! I can indeed reproduce it. Just now I was copying the file to the wrong name, so I ended up retesting the same no-barrier code over and over. (maybe it's about time for me to give this a rest....)
 
Here is the lines/sec benchmark with availableForWrite() wait, using usb_serial.c with memory barriers, received by Teensyduino serial monitor on Linux.

screenshot1.png


Here is the same test, but using usb_serial.c from msg #183 (just commenting out all the memory barriers).

screenshot2.png


Edit: Just to be clear, I see no signs of missing or corrupted lines on Linux when testing with the availableForWrite() check. MacOS still has some issues, and this is admittedly just using the GUI rather than code which checks every line, but everything I can see shows Linux is receiving all lines perfectly.



This is the code Arduino compiled for both tests, just the usual lines/sec benchmark but with availableForWrite() added.

Code:
uint32_t count, prior_count;
uint32_t prior_msec;
uint32_t count_per_second;

// Uncomment this for boards where "SerialUSB" needed for native port
//#define Serial SerialUSB

void setup() {
  Serial.begin(1000000); // edit for highest baud your board can use
  while (!Serial) ;
  count = 10000000; // starting with 8 digits gives consistent chars/line
  prior_count = count;
  count_per_second = 0;
  prior_msec = millis();
}

void loop() {
  while (Serial.availableForWrite() < 50) ; //wait
  Serial.print("count=");
  Serial.print(count);
  Serial.print(", lines/sec=");
  Serial.println(count_per_second);
  count = count + 1;
  uint32_t msec = millis();
  if (msec - prior_msec > 1000) {
    // when 1 second as elapsed, update the lines/sec count
    prior_msec = prior_msec + 1000;
    count_per_second = count - prior_count;
    prior_count = count;
  }
}
 
Last edited:
However, with SerialTestT.ino, I see only a very slight difference in speed, just barely more than the margin of error / variance.

But the lines/sec test shows there definitely is a difference. It's very likely the precise low-level timing changes, even if the overall speed of the simpler test using a fixed string isn't sensitive to those small changes. The more complex test with 2 numbers that change and 5 Serial.write() calls per line definitely shows the memory barriers do have an effect.
 
One more observation, then I'm going to set this down for a while...

CPU speed appears to have almost no impact on SerialTestT.ino. But it does greatly influence usb_serial_print_speed.ino.

Both change if the transmit buffer size is increased to 8192 or 16384 bytes. The ZLP setting also affects both.


At this point, I really have only 1 immediate question: for 1.56-beta2, should I apply the same patch to usb_serial2.c and usb_serial3.c, or put in the path plus extra memory barriers, or just leave them as-is and revisit this for 1.56-beta3 or later?
 
That was exactly the test where you saw dropped data on MAC - right?

We're just going around in circles here.

It's not about maximum speed.
It's about maximum reliability.
 
I don't care what you do with the other usb_serial2/3
They don't have the fix. For me fixing eveything would be priority. But I don't care - personally I use them very rarely.
 
If it helps (perhaps for the USB Garbage problem too) I wouldn't even care if the speeds would drop to 25%. As long it fixes problems.
 
Status
Not open for further replies.
Back
Top