Serial Communication Problems Teensy 4.0

juwi

Hi!

OS: Win10 pro 64bit
PC: Thinkpad extreme
Arduino: 1.8.12
Python: 3.7.6
pyserial: pyserial-3.4-py2.py3-none-any.whl


I'm seeing wrong data in my serial data stream and can't find anything I'm doing wrong :(

Arduino code:

Code:
/*
 Name:		usbserialtest.ino
 Created:	4/25/2020 10:49:44 PM
 Author:	ju
*/

// the setup function runs once when you press reset or power the board
void setup() {
	Serial.begin(512000);
}

// the loop function runs over and over again until power down or reset
void loop() {
	Serial.println(233.3423423, 3);
}

Python Code:

Code:
import serial
import time

sp = serial.Serial(port="COM3", baudrate=512000, timeout=None)

count=0
lt = time.time()
while True:
    try:
        data_encoded = sp.readline()
        data = data_encoded.decode("ascii").strip()
    except Exception as e:
        print(e, data_encoded)

    count += 1
    if count % 1000 == 0:
        print(count, time.time() - lt)
        lt = time.time()


Error manifestation:

Code:
1000 0.11618208885192871
2000 0.11321306228637695
3000 0.10345697402954102
'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128) b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff[...(a lot of ffs deleted here)]\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\r\n'
4000 0.3161895275115967
5000 0.12297606468200684

Does anybody have an idea?
This is already a test case; my real application sends angle data from encoders, which is where I ran into this. The error does not show up on every run, but often enough (about every 3rd or 4th run).

Best regards
Julian
 
This code on Teensy 4 will overwhelm the PC with output; it needs a sensible delay mechanism of some sort to let the PC keep up.

Code:
// the loop function runs over and over again until power down or reset
void loop() {
  Serial.println(233.3423423, 3);
  //delayMicroseconds(200);
  delay(1);
}

One of these explicit delays might give better results to start with: about 1000 (delay(1)) or 5000 (delayMicroseconds(200)) prints per second. Without them it will be 500,000 or more per second.
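To put those numbers in perspective, here is a rough estimate. It assumes each `Serial.println(233.3423423, 3)` puts the 9 bytes `233.342\r\n` on the wire and that the print itself takes negligible time compared to the delay; `rate_for_delay` is just a throwaway helper name.

```python
LINE = b"233.342\r\n"   # assumption: what println(233.3423423, 3) puts on the wire

def rate_for_delay(delay_s):
    """(lines/s, bytes/s) if every println is followed by delay_s seconds of delay."""
    lines_per_s = 1.0 / delay_s
    return lines_per_s, lines_per_s * len(LINE)

for label, delay_s in [("delay(1)", 1e-3), ("delayMicroseconds(200)", 200e-6)]:
    lps, bps = rate_for_delay(delay_s)
    print(f"{label:24s} ~{lps:7.0f} lines/s  ~{bps / 1000:6.1f} kB/s")

# Unthrottled, at the 500,000+ lines/s estimated above:
unthrottled = 500_000
print(f"{'no delay':24s} ~{unthrottled:7d} lines/s  ~{unthrottled * len(LINE) / 1e6:6.1f} MB/s")
```

That is roughly 9 kB/s with `delay(1)` and 45 kB/s with `delayMicroseconds(200)`, versus several MB/s unthrottled.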
 
Hi!

Thanks for the fast response.
But I still think this is not right. Why should a transmission function transmit garbage when it is called often?
And, worse, in my case delaying anything is not an option. My original code sends updates when changes occur; delaying that by default would be wrong.
Changes are user interactions, so sometimes there are a lot of them, sometimes not.

As I understand it, the print function should buffer into a USB buffer and block when the buffer is full. But sending garbage? No.

Best regards
Julian
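Since a fixed delay is not an option here, one alternative (a generic rate-limiting pattern, not something proposed in this thread) is to coalesce updates: remember only the newest value and send it at most once per interval. Bursts of user input then cannot outrun the PC, the PC still ends up with the latest angle, and an isolated change goes out immediately. It is shown in Python purely for illustration; on the Teensy the same logic would live in `loop()` with `micros()` or `millis()` as the clock. `CoalescingSender` and its parameters are hypothetical names.

```python
import time

class CoalescingSender:
    """Send at most one update per min_interval seconds, always the newest value.

    Hypothetical helper: `send` is whatever actually transmits a value
    (on the Teensy that would be a Serial.println call).
    """

    def __init__(self, send, min_interval, clock=time.monotonic):
        self.send = send
        self.min_interval = min_interval
        self.clock = clock
        self.pending = None        # newest value not yet sent
        self.last_sent_at = None   # clock() value of the last transmission

    def update(self, value):
        """Record a new value (e.g. an encoder angle) and send it if allowed."""
        self.pending = value
        self.poll()

    def poll(self):
        """Call on every loop iteration so a held-back value goes out eventually."""
        if self.pending is None:
            return
        now = self.clock()
        if self.last_sent_at is None or now - self.last_sent_at >= self.min_interval:
            self.send(self.pending)
            self.pending = None
            self.last_sent_at = now

# Usage with a fake clock: a burst of four changes at t = 0, then one poll at t = 2 ms.
fake_now = [0.0]
sent = []
sender = CoalescingSender(sent.append, min_interval=0.001, clock=lambda: fake_now[0])
for angle in (10, 11, 12, 13):
    sender.update(angle)
fake_now[0] = 0.002
sender.poll()
print(sent)   # [10, 13]
```

The interval is the knob: 1 ms caps the output at about 1000 lines per second, similar to the `delay(1)` suggestion, but only while values are actually changing.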
 
Again, it is always hard to know exactly what is going on.
But my first guess would not be the Teensy with the issue.

My first guess would be the Python/PySerial can not handle the volume of data. Who knows, maybe it is overflowing some internal buffers or the underlying system code.

So first thing I would do is to try to isolate which side has the issue.
 
Hmm, ok, how can I do that?

Could I use different software, written in C++ or C#, as the receiving end?

Best regards
Julian
 
But my first guess would not be the Teensy with the issue.
My first guess would be the Python/PySerial can not handle the volume of data.
No; there are odd issues even with C code (using termios on Linux) that can easily handle the full USB bandwidth, yet keels over with a Teensy 4.0.

It looks like a "lockup" of some sort when either endpoint sends a NAK. It shouldn't; the URB should simply be resent later.

The test code I used on Linux (would also run on Mac) is here.

This is something I intend to look into, but thus far have been blocked by real life (except for trivial tests like the linked one). (Apologies for not supplying Windows source; I don't have Windows on any of my machines, and haven't used it for over a decade.)
 
Hi Nominal Animal - As you said there may be issues with the interaction of the Teensy 4 USB...
But I also in the past had major issues with pyserial (at least on RPIs) .

What I was maybe going to suggest is something similar to what you have using termios. But only in one direction. After all this test is only doing output from T4...

So I was going to suggest a simple test on the Linux side that just receives the packets and only prints debug output if there is an issue, plus maybe a progress report every 100 or 1000 messages, and see if that works. And go from there.
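Something like the following could serve as that one-way receiver. It is a sketch in Python to match the original script (a termios C program would take pyserial out of the picture entirely), assuming the Teensy runs the fixed `233.342` test sketch; `check_lines`, `EXPECTED` and `report_every` are hypothetical names. It prints nothing except mismatched lines and a progress line every `report_every` lines.

```python
import io
import time

EXPECTED = b"233.342"   # assumption: the fixed value printed by the test sketch

def check_lines(stream, expected=EXPECTED, report_every=1000, max_lines=None):
    """Read lines from `stream`, printing only mismatches and periodic progress.

    `stream` is anything with readline(): a serial.Serial port or an io.BytesIO.
    Stops on an empty read (EOF or read timeout) or after max_lines lines.
    Returns (lines_read, bad_lines).
    """
    lines = bad = 0
    last = time.monotonic()
    while max_lines is None or lines < max_lines:
        raw = stream.readline()
        if not raw:
            break
        lines += 1
        if raw.rstrip(b"\r\n") != expected:
            bad += 1
            # Show only the first bytes so a long run of 0xff cannot flood the console.
            print(f"line {lines}: unexpected {raw[:32]!r} ({len(raw)} bytes)")
        if lines % report_every == 0:
            now = time.monotonic()
            elapsed = now - last
            rate = report_every / elapsed if elapsed > 0 else float("inf")
            print(f"{lines} lines, {bad} bad, ~{rate:.0f} lines/s")
            last = now
    return lines, bad

# Offline self-test: five good lines with one corrupted line in the middle.
sample = b"233.342\r\n" * 3 + b"\xff" * 40 + b"\r\n" + b"233.342\r\n" * 2
print(check_lines(io.BytesIO(sample), report_every=2))
```

Against the Teensy it would be called as something like `check_lines(serial.Serial(port="COM3", timeout=1))`; with a timeout, an empty read ends the run instead of blocking forever.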
 
Just going through this thread, and it's kind of reminding me that even if you set the baud rate on the T4 to 115200 or whatever, the Teensy is still going to dump data to the serial port as fast as it can, which in this case is probably pretty close to 480 Mbit/s (I think; I forget our actual number on a Windows machine). So if you set the baud in PySerial to 512000, it's trying to read data at a much slower rate than the T4 is dumping out to the serial port.

To test this, I changed the baud rate in the Python script to 4000000:
Code:
sp = serial.Serial(port="COM35", baudrate=4000000 , timeout=None)
and I started seeing the data streaming, with the occasional dropout:
Code:
233.342
233.342
(8000, 2.6690001487731934)
233.342
233.342
Just a thought
 
For my understanding the print function shall buffer into a USB buffer and when the buffer is full it should block

This is what it does, but it doesn't wait forever. After a timeout, print will return and the printed data is lost, which might explain your messed-up serial stream. A Teensy 4.0 will transmit >15 MByte/second if you don't throttle it (https://forum.pjrc.com/threads/58629-Win10-amp-T4-Serial-Communication-Tests). I'm not sure Python is quick enough to read that away... I'd try defragster's suggestion and add some delay, just to see if it works then.
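To make "it blocks, but not forever" concrete, here is a toy model in Python. It is not the Teensy USB code: buffer size, timeout and drain rate are made-up numbers and `simulate` is a hypothetical name. It only shows the mechanism described above, where a writer that outpaces the reader for long enough loses whole chunks of output, which the PC would see as missing lines.

```python
def simulate(n_lines, line_len, capacity, drain_per_tick, timeout_ticks):
    """Toy model of a writer pushing fixed-size lines into a bounded buffer.

    Writing is instantaneous while there is room. When the buffer is full the
    writer waits one tick at a time; each tick the reader removes up to
    drain_per_tick bytes. If there is still no room for a whole line after
    timeout_ticks ticks, the writer gives up and the line is lost.
    Returns (delivered, dropped).
    """
    buffered = delivered = dropped = 0
    for _ in range(n_lines):
        waited = 0
        while capacity - buffered < line_len and waited < timeout_ticks:
            buffered = max(0, buffered - drain_per_tick)   # reader drains a little
            waited += 1
        if capacity - buffered >= line_len:
            buffered += line_len
            delivered += 1
        else:
            dropped += 1                                   # timed out: data silently lost
    return delivered, dropped

# Reader keeps up with the writer: the writer only ever blocks briefly.
print(simulate(100, line_len=9, capacity=64, drain_per_tick=9, timeout_ticks=2))  # (100, 0)
# Reader too slow: some writes time out and their lines never arrive.
print(simulate(100, line_len=9, capacity=64, drain_per_tick=2, timeout_ticks=2))
```

Adding a delay on the sender side is equivalent to lowering the write rate until the "reader keeps up" case applies.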
 
For what it is worth, I tried a setup almost identical to yours, but on my Linux machine running Ubuntu 18.04...

I modified the Arduino sketch slightly to not start outputting anything until it is connected to USB:
Code:
/*
 Name:    usbserialtest.ino
 Created: 4/25/2020 10:49:44 PM
 Author:  ju
*/

// the setup function runs once when you press reset or power the board
void setup() {
  while(!Serial) ;
  Serial.begin(512000);
}

// the loop function runs over and over again until power down or reset
void loop() {
  Serial.println(233.3423423, 3);
}

And updated your Python script to:
Code:
import serial
import time

sp = serial.Serial(port="/dev/ttyACM0", baudrate=512000, timeout=None)

count=0
lt = time.time()
while True:
    try:
        data_encoded = sp.readline()
        data = data_encoded.decode("ascii").strip()
    except Exception as e:
        print(e, data_encoded)

    count += 1
    if count % 1000 == 0:
        print(count, time.time() - lt)
        lt = time.time()

I ran the Python script in a terminal window: python3 ./test.py

It ran for a long time, until I hit Ctrl-C.
Code:
...
6024000 0.05694746971130371
6025000 0.059212446212768555
6026000 0.059168100357055664
6027000 0.05825209617614746
6028000 0.0617518424987793
6029000 0.057840824127197266
6030000 0.06173419952392578
6031000 0.05881047248840332
6032000 0.057424068450927734
6033000 0.062371015548706055
6034000 0.057694196701049805
6035000 0.061856985092163086
6036000 0.05755043029785156
6037000 0.05863547325134277
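For scale, the timings in that log can be converted into a data rate. A rough sketch, assuming each line is the 9 bytes `233.342\r\n`; `rates` is just a helper name I made up, and the log lines are copied from the output above:

```python
import re

LOG = """\
6024000 0.05694746971130371
6025000 0.059212446212768555
6026000 0.059168100357055664
6027000 0.05825209617614746
"""

LINE_BYTES = 9  # assumption: each line is "233.342\r\n"

def rates(log_text, lines_per_report=1000, line_bytes=LINE_BYTES):
    """Yield (lines/s, bytes/s) for each 'count seconds' report line."""
    for match in re.finditer(r"^(\d+)\s+([0-9.]+)$", log_text, re.MULTILINE):
        seconds = float(match.group(2))
        lps = lines_per_report / seconds
        yield lps, lps * line_bytes

for lps, bps in rates(LOG):
    print(f"{lps:8.0f} lines/s  {bps / 1000:6.1f} kB/s")
```

Over the whole log this comes to roughly 16,000 to 17,600 lines per second, or about 145 to 160 kB/s, far below the >15 MByte/second quoted above for an unthrottled Teensy 4.0. That suggests the Python reading loop, rather than the USB link, sets the pace in this run.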
 
But I also in the past had major issues with pyserial (at least on RPIs) .
Me too, that's why I do tests using raw termios on Linux.

What I was maybe going to suggest is something similar to what you have using termios. But only in one direction. After all this test is only doing output from T4...
Excellent idea; doing one-way tests will tell us a lot more about exactly what is happening.

Unfortunately, I just noticed I have a bulging battery issue on this laptop (am now draining it carefully, and taking a final backup of my files), so if you don't hear from me for a while, it is because I'm offline :(
 
But of course I should probably try this on Windows 10, as that is the actual setup specified here.
 
If I understand your Python code correctly, you increase the read speed by only printing out every 1000th line? That makes sense; it looks like this lets the PC empty the buffers quickly enough to prevent buffer timeouts on the Teensy side. Or did I misunderstand the idea?
 
Just catching up on this - an unrestrained T4 will cause data loss/corruption/explosion sending USB to a PC - at some point. It can buffer only so much and then things get wonky.

Running the test (the PJRC lines-per-second test linked above, it seems) at 500,000 lines of ~32 bytes per line is indeed 16 MB/sec. That test can start well, with a burst to 800K+ lines per second, and then stall at a lower rate. But stop the Teensy and the PC keeps feeding data to the PC app for 10-20 seconds, IIRC.

You can see this with SerMon, TyComm or a PC app. All behave about the same: the PC is the weak link. The Teensy sends data whenever the PC approves it, and then the PC ends up presenting buffered data (typically VALID DATA) for a long time after the Teensy stops.

So the T_4 must only send as fast as the PC can reasonably and reliably handle. Years back, the T_3.6 was able to blow away a PC running the IDE SerMon until Paul cleaned up the Java code, and that was only 25,000 lines per second. Now it is running 20 times faster, and Paul spent a week cleaning that up so his IDE Teensy SerMon stays functional.

Start with restrained T_4 output as suggested in post #2: 1 ms, then 200 us once you see it working. Then find out how it goes when the delay is reduced further, while keeping the PC end functional.

The T_4 USB stack may have bugs - but it is a functional fire hose … do not drink from it carelessly.


>> Using a hacked version of the Win PC app with the GUI printing limited, I did some data testing: the data 'sampled' good and may have been 900K lps, but that is one place where it ran long after the T_4 output stopped. It has been too many months, but the T_4 had no trouble filling PC buffers over USB.
 
@KurtE, @defragster -- If you happen to have a fast-ish Linux machine (virtual machine is perfectly okay), you might wish to test the following program pair. I finished these first; I'll implement the Teensy sender next. Both programs show usage when run without parameters.

The first program is receiver.c, a pthreaded Linux program that receives Xorshift64* sequences least significant byte first, gathering statistics on the transfer. It sends a single byte (value 1) when it is ready to receive, and another (value 0) to indicate it is satisfied. It can use any character device or named pipe, but it assumes it is a tty-like device (serial port or similar). I recommend using gcc -Wall -O2 receiver.c -pthread -o receiver to compile this one. The source is in public domain (CC0).
Code:
// SPDX-License-Identifier: CC0-1.0
// https://spdx.org/licenses/CC0-1.0.html
#define  _POSIX_C_SOURCE  200809L
#define  _GNU_SOURCE
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <limits.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <termios.h>
#include <signal.h>
#include <time.h>
#include <errno.h>

/* Receive size histogram upper edge */
#ifndef  READ_HISTOGRAM
# define  READ_HISTOGRAM  65536
#endif

/* Silence unused parameter warnings */
#ifndef  UNUSED
# define  UNUSED        __attribute__ ((unused))
#endif

/* Signal used internally to indicate worker thread is done */
#ifndef  WORKER_SIGNAL
# define  WORKER_SIGNAL   (SIGRTMIN+0)
#endif

/* Clock source used for measuring receive duration */
#ifndef  WORKER_CLOCK
# define  WORKER_CLOCK  CLOCK_MONOTONIC
#endif

/* Access mode used for the memory mapping file */
#ifndef  MAPPING_MODE
# define  MAPPING_MODE  0640
#endif

/* Flags used when memory mapping a file */
#ifndef  MAPPING_FLAGS
# ifdef  MAP_NORESERVE
#  define  MAPPING_FLAGS  MAP_SHARED | MAP_NORESERVE
# else
#  define  MAPPING_FLAGS  MAP_SHARED
# endif
#endif

/* Xorshift64* seed */
#ifndef  SEED
# define  SEED  1
#endif

/* Worker thread: Reads input from a character device.
*/
pthread_t        main_thread;
pthread_t        worker_thread;

struct timespec  receive_started;
struct timespec  receive_finished;

const union sigval  worker_failure = { .sival_ptr = (void *)(intptr_t)1 };
const union sigval  worker_success = { .sival_ptr = (void *)(intptr_t)0 };

const char      *mapping = NULL;   /* Name of memory-mapped file, if any. */
size_t           mapping_len = 0;  /* Size of the memory-map, if any. */
int              src = -1;
unsigned char   *data = NULL;    /* Pointer to storage */
size_t           size = 0;       /* Amount to read */
volatile size_t  have_ = 0;      /* Use data_have() instead! */

size_t          *read_histogram = NULL;
size_t           read_histogram_max = 0;

static inline size_t data_have(void) { return __atomic_load_n(&have_, __ATOMIC_SEQ_CST); }
static inline size_t data_have_add(const size_t bytes) { return __atomic_add_fetch(&have_, bytes, __ATOMIC_SEQ_CST); }

void *worker(void *payload UNUSED)
{
    unsigned char   start[1] = { 1 };
    unsigned char   end[1] = { 0 };
    size_t          have = data_have();
    ssize_t         n;

    /* Send start byte. */
    do {
        n = write(src, start, 1);
    } while (n == -1 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK));
    if (n != 1) {
        void *retval = (void *)(intptr_t)(n == -1 ? errno : EINVAL);
        pthread_sigqueue(main_thread, WORKER_SIGNAL, worker_failure);
        return retval;
    }

    /* Record start time. */
    clock_gettime(WORKER_CLOCK, &receive_started);

    /* Receive loop. */
    if (read_histogram) {
        while (have < size) {

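            n = read(src, data + have, size - have);
            if (n == -1) {
                void *retval = (void *)(intptr_t)errno;
                if (errno == EINTR)
                    continue;
                /* Tell the main thread, or it would wait for a signal forever. */
                pthread_sigqueue(main_thread, WORKER_SIGNAL, worker_failure);
                return retval;
            }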

            if (n < (ssize_t)1) {
                ++read_histogram[0];
            } else
            if ((size_t)n < read_histogram_max) {
                ++read_histogram[n];
            } else {
                ++read_histogram[read_histogram_max];
            }

            have = data_have_add(n);
        }
    } else {
        while (have < size) {

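            n = read(src, data + have, size - have);
            if (n == -1) {
                void *retval = (void *)(intptr_t)errno;
                if (errno == EINTR)
                    continue;
                /* Tell the main thread, or it would wait for a signal forever. */
                pthread_sigqueue(main_thread, WORKER_SIGNAL, worker_failure);
                return retval;
            }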

            have = data_have_add(n);
        }
    }

    /* Record stop time. */
    clock_gettime(WORKER_CLOCK, &receive_finished);

    /* Send end mark. */
    do {
        n = write(src, end, 1);
    } while (n == -1 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK));
    if (n != 1) {
        void *retval = (void *)(intptr_t)(n == -1 ? errno : EINVAL);
        pthread_sigqueue(main_thread, WORKER_SIGNAL, worker_failure);
        return retval;
    }

    /* Set completion signal. */
    pthread_sigqueue(main_thread, WORKER_SIGNAL, worker_success);
    return (void *)0;
}

static int parse_size(const char *s, size_t *to)
{
    const char     *end;
    unsigned long   uval, scale;
    size_t          val;

    if (!s || !*s)
        return errno = EINVAL;

    scale = 1;
    end = s;
    errno = 0;
    uval = strtoul(s, (char **)(&end), 0);
    if (errno)
        return errno;
    if (!end || end == s)
        return errno = EINVAL;

    while (isspace((unsigned char)(*end)))
        end++;

    if (end[0] == 'k') {
        scale = 1024;
        end++;
    } else
    if (end[0] == 'M') {
        scale = 1024 * 1024;
        end++;
    } else
    if (end[0] == 'G') {
        scale = 1024UL * 1024UL * 1024UL;
        end++;
    }

    while (isspace((unsigned char)(*end)))
        end++;

    if (*end)
        return errno = EINVAL;

    val = uval * scale;
    if ((unsigned long)(val / scale) != uval)
        return errno = ENOMEM;

    if (to)
        *to = val;

    return 0;
}

static uint64_t      state = SEED;
static uint64_t      left_state = 0;
static unsigned int  left = 0;

static inline uint64_t  next_state(void)
{
    uint64_t  x = state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    state = x;
    return x * UINT64_C(2685821657736338717);
}

static inline unsigned char  next_byte(void)
{
    unsigned char  result;

    if (!left) {
        left = 8;
        left_state = next_state();
    }

    result = left_state;
    left_state >>= 8;
    left--;

    return result;
}

void cleanup(void)
{
    /* Unmap or free. */
    if (mapping_len > 0) {
        munmap(data, mapping_len);
    } else {
        free(data);
    }
    mapping_len = 0;
    data = NULL;

    /* Discard read histogram, if any. */
    if (read_histogram) {
        free(read_histogram);
        read_histogram = NULL;
    }

    /* Close device. */
    if (src != -1) {
        /* But first, discard any data in kernel buffers. */
        tcflush(src, TCIOFLUSH);
        if (close(src) == -1) {
            fprintf(stderr, "Error closing device: %s.\n", strerror(errno));
        }
        src = -1;
    }

    /* Unlink mapping, if any. */
    if (mapping) {
        if (unlink(mapping) == -1) {
            fprintf(stderr, "Error removing mapping: %s.\n", strerror(errno));
        }
        mapping = NULL;
    }
}

int main(int argc, char *argv[])
{
    pthread_attr_t  attrs;
    siginfo_t       info;
    sigset_t        signals;
    void           *retval;
    int             result, exit_status = EXIT_SUCCESS;

    setlocale(LC_ALL, "");

    if (argc < 3 || argc > 4) {
        const char *argv0 = (argc > 0) ? argv[0] : "(this)";

        printf("\n");
        printf("Usage: %s [ -h | --help ]\n", argv0);
        printf("       %s DEVICE SIZE [ MAPPING ]\n", argv0);
        printf("\n");
        printf("Where:\n");
        printf("       DEVICE   is the character device to talk to,\n");
        printf("       SIZE     is the number of bytes to receive,\n");
        printf("       MAPPING  is the name of the memory-mapped file\n");
        printf("                to use, if storing data to a file.\n");
        printf("\n");
        printf("You can use suffix k for units of 1024 bytes,\n");
        printf("                   M for units of 1048576 bytes, and\n");
        printf("                   G for units of 1073741824 bytes.\n");
        printf("\n");
        printf("When running, you can send a USR1 or USR2 signal\n");
        printf("to request progress updates.\n");
        printf("\n");
        if (argc == 1 || argc == 2)
            return EXIT_SUCCESS;
        else
            return EXIT_FAILURE;
    }

    /* Parse size. */
    if (parse_size(argv[2], &size) || size < 1) {
        fprintf(stderr, "%s: Invalid size.\n", argv[2]);
        return EXIT_FAILURE;
    }

    /* Open the device. */
    src = open(argv[1], O_RDWR | O_CLOEXEC);
    if (src == -1) {
        fprintf(stderr, "Cannot open device %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }

    /* Set termios properties, if a termios-like device. */
    if (isatty(src)) {
        struct termios  settings;

        memset(&settings, 0, sizeof settings);
        if (tcgetattr(src, &settings) == 0) {

            settings.c_iflag &= ~(BRKINT | PARMRK | INPCK | ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXOFF);
            settings.c_iflag |= IGNBRK | IGNPAR;

            settings.c_oflag &= ~(OPOST | ONLCR | ONOCR | ONLRET);

            settings.c_cflag &= ~(CSIZE | PARENB | CLOCAL);
            settings.c_cflag |= CS8 | CREAD | HUPCL;

            settings.c_lflag &= ~(ISIG | ICANON | ECHO | ECHOE | ECHOK | ECHONL | TOSTOP | IEXTEN);

            settings.c_cc[VMIN] = 0;
            settings.c_cc[VTIME] = 10;

            tcflush(src, TCIOFLUSH);
            tcsetattr(src, TCSANOW, &settings);
            tcflush(src, TCIOFLUSH);

            fprintf(stderr, "Device %s prepared for communications.\n", argv[1]);
            fflush(stderr);
        }
    }

    /* Block signals we respond to.  This will be inherited by the worker thread. */
    sigemptyset(&signals);
    sigaddset(&signals, SIGINT);
    sigaddset(&signals, SIGHUP);
    sigaddset(&signals, SIGTERM);
    sigaddset(&signals, SIGUSR1);
    sigaddset(&signals, SIGUSR2);
    sigaddset(&signals, WORKER_SIGNAL);
    if (sigprocmask(SIG_BLOCK, &signals, NULL) == -1) {
        fprintf(stderr, "Cannot block signals: %s.\n", strerror(errno));
        close(src);
        return EXIT_FAILURE;
    }

    if (argc == 4) {
        size_t  page, tail;
        int     mapfd;

        mapping = argv[3];
        page = sysconf(_SC_PAGESIZE);

        tail = size % page;
        if (tail) {
            mapping_len = size + page - tail;
        } else {
            mapping_len = size;
        }

        /* Create mapping file. */
        mapfd = open(mapping, O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, MAPPING_MODE);
        if (mapfd == -1) {
            fprintf(stderr, "Cannot create mapping file %s: %s.\n", mapping, strerror(errno));
            close(src);
            return EXIT_FAILURE;
        }
        if (ftruncate(mapfd, (off_t)mapping_len) == -1) {
            fprintf(stderr, "Cannot resize mapping file %s: %s.\n", mapping, strerror(errno));
            close(src);
            close(mapfd);
            unlink(mapping);
            return EXIT_FAILURE;
        }
        result = posix_fallocate(mapfd, (off_t)0, (off_t)mapping_len);
        if (result) {
            fprintf(stderr, "Cannot allocate enough storage space for mapping file %s: %s.\n", mapping, strerror(result));
            close(src);
            close(mapfd);
            unlink(mapping);
            return EXIT_FAILURE;
        }
        data = mmap((void *)0, mapping_len, PROT_READ | PROT_WRITE, MAPPING_FLAGS, mapfd, (off_t)0);
        if ((void *)data == MAP_FAILED) {
            fprintf(stderr, "Cannot memory-map %s: %s.\n", mapping, strerror(errno));
            close(src);
            close(mapfd);
            unlink(mapping);
            return EXIT_FAILURE;
        }
        if (close(mapfd) == -1) {
            fprintf(stderr, "Error closing %s: %s.\n", mapping, strerror(errno));
            close(src);
            close(mapfd);
            munmap(data, mapping_len);
            unlink(mapping);
            return EXIT_FAILURE;
        }

        /* Clear the mapping to zeroes. */
        fprintf(stderr, "Clearing %s .. ", mapping);
        fflush(stderr);
        memset(data, 0, mapping_len);
        fprintf(stderr, "Done; mapping is okay.\n");
        fflush(stderr);

    } else {

        data = malloc(size);
        if (!data) {
            fprintf(stderr, "Cannot allocate %zu bytes of memory.\n", size);
            close(src);
            return EXIT_FAILURE;
        }

        fprintf(stderr, "Clearing receive buffer .. ");
        fflush(stderr);
        memset(data, 0, size);
        fprintf(stderr, "Done.\n");
        fflush(stderr);
    }

    /* Allocate receive size histogram. */
    read_histogram_max = READ_HISTOGRAM;
    read_histogram = calloc(read_histogram_max + 1, sizeof read_histogram[0]);
    if (!read_histogram) {
        fprintf(stderr, "Cannot allocate memory for receive statistics.\n");
        cleanup();
        return EXIT_FAILURE;
    }

    /* Initialize shared variables. */
    main_thread = pthread_self();
    clock_gettime(WORKER_CLOCK, &receive_started);

    /* Create the worker thread, using a smallish stack. */
    pthread_attr_init(&attrs);
    pthread_attr_setstacksize(&attrs, 2*PTHREAD_STACK_MIN);
    result = pthread_create(&worker_thread, &attrs, worker, NULL);
    if (result) {
        pthread_attr_destroy(&attrs);
        fprintf(stderr, "Cannot create receiver thread: %s.\n", strerror(result));
        cleanup();
        return EXIT_FAILURE;
    }

    /* Respond to signals. */
    while (1) {

        if (sigwaitinfo(&signals, &info) == -1) {
            fprintf(stderr, "Error catching signals: %s.\n", strerror(errno));
            pthread_cancel(worker_thread);
            exit_status = EXIT_FAILURE;
            break;
        }

        if (info.si_signo == SIGUSR1 || info.si_signo == SIGUSR2) {
            struct timespec  now;
            double           elapsed;
            size_t           bytes;

            bytes = data_have();
            clock_gettime(WORKER_CLOCK, &now);
            elapsed = (double)(now.tv_sec - receive_started.tv_sec)
                    + (double)(now.tv_nsec - receive_started.tv_nsec) / 1000000000.0;

            printf("Received %zu bytes in %.3f seconds (%.0f bytes/second)\n",
                   bytes, elapsed, (double)bytes / elapsed);
            fflush(stdout);
            continue;
        }

        if (info.si_signo == SIGINT || info.si_signo == SIGHUP || info.si_signo == SIGTERM) {
            const char *name = (info.si_signo == SIGINT) ? "INT" :
                               (info.si_signo == SIGHUP) ? "HUP" :
                               (info.si_signo == SIGTERM) ? "TERM" : "unknown";

            fprintf(stderr, "Aborted by %s signal.\n", name);
            fflush(stderr);

            pthread_cancel(worker_thread);
            break;
        }

        if (info.si_signo == WORKER_SIGNAL) {

            /* Ignore signals not sent by our worker thread. */
            if (info.si_pid != getpid() || info.si_code != SI_QUEUE)
                continue;

            /* Success? */
            if (info.si_value.sival_ptr == worker_success.sival_ptr) {
                double  elapsed = (double)(receive_finished.tv_sec - receive_started.tv_sec)
                                + (double)(receive_finished.tv_nsec - receive_started.tv_nsec) / 1000000000.0;
                size_t  bytes = data_have();
                double  rate = (double)bytes / elapsed;

                printf("Received %zu bytes in %.3f seconds (", bytes, elapsed);
                if (rate >= 1000000000.0)
                    printf("%.1f Gbytes/sec, ", rate/1073741824.0);
                else if (rate >= 1000000.0)
                    printf("%.1f Mbytes/sec, ", rate/1048576.0);
                else if (rate >= 1000.0)
                    printf("%.1f kbytes/sec, ", rate/1024.0);
                else
                    printf("%.0f bytes/sec, ", rate);

                rate *= 8.0;
                if (rate >= 1000000000.0)
                    printf("%.1f Gbits/sec)\n", rate/1073741824.0);
                else if (rate >= 1000000.0)
                    printf("%.1f Mbits/sec)\n", rate/1048576.0);
                else if (rate >= 1000.0)
                    printf("%.1f kbits/sec)\n", rate/1024.0);
                else
                    printf("%.0f bits/sec)\n", rate);

                fflush(stdout);
            } else {
                fprintf(stderr, "Receiver thread failed.\n");
                fflush(stderr);
                exit_status = EXIT_FAILURE;
            }

            break;
        }

        fprintf(stderr, "(Ignored signal %d)\n", info.si_signo);
        fflush(stderr);
    }

    /* Reap worker thread. */
    retval = NULL;
    result = pthread_join(worker_thread, &retval);
    if (result) {
        fprintf(stderr, "Lost receiver thread: %s.\n", strerror(result));
        exit_status = EXIT_FAILURE;
    } else
    if ((intptr_t)retval == 0) {
        printf("Receiver thread completed successfully.\n");
    } else {
        fprintf(stderr, "Receiver thread failed: %s.\n", strerror((int)(intptr_t)retval));
        exit_status = EXIT_FAILURE;
    }

    if (read_histogram) {
        size_t  i, n = 0;

        for (i = 1; i <= read_histogram_max; i++)
            n += read_histogram[i];

        printf("# Receive statistics (%zu successful reads, %zu additional attempts):\n",
               n, read_histogram[0]);
        printf("# Size  Number of read()s\n");
        for (i = 1; i < read_histogram_max; i++) {
            if (read_histogram[i] > 0) {
                printf("%8zu %zu\n", i, read_histogram[i]);
            }
        }
        if (read_histogram[read_histogram_max] > 0) {
            printf("%8zu+ %zu\n", read_histogram_max, read_histogram[read_histogram_max]);
        }
    }

    if (data_have() == size) {
        size_t  i;

        printf("Verifying data .. ");
        fflush(stdout);

        for (i = 0; i < size; i++)
            if (data[i] != next_byte())
                break;

        if (i >= size) {
            printf("No errors.\n");
        } else
        if (i > 0) {
            printf("Only first %zu bytes match.\n", i);
        } else {
            printf("All of the data is wrong.\n");
        }
    }

    cleanup();
    return exit_status;
}
It takes two or three command-line parameters: the device path to use, the amount of data to receive, and optionally a file name if a file-backed memory map is to be used. Run without parameters to get usage.
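The test data is deterministic, so it can be regenerated on the PC side in other languages too. Here is a Python port of `next_state()` / `next_byte()` from receiver.c above (same seed 1, shifts 12/25/27 and multiplier), for example to check a capture made with another tool. The function names are mine; Python integers are unbounded, so every step is masked to 64 bits to mimic `uint64_t` arithmetic.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 2685821657736338717   # same constant as in receiver.c / sender.c

def xorshift64star_bytes(n, seed=1):
    """Return the first n bytes of the Xorshift64* stream, least significant byte first."""
    state = seed & MASK64
    out = bytearray()
    while len(out) < n:
        x = state
        x ^= x >> 12
        x ^= (x << 25) & MASK64        # C truncates the shift to 64 bits
        x ^= x >> 27
        state = x
        value = (x * MULTIPLIER) & MASK64
        out += value.to_bytes(8, "little")   # least significant byte first
    return bytes(out[:n])

def first_mismatch(received, seed=1):
    """Index of the first byte that differs from the expected stream, or None."""
    expected = xorshift64star_bytes(len(received), seed)
    for i, (a, b) in enumerate(zip(received, expected)):
        if a != b:
            return i
    return None
```

For example, `first_mismatch(open("capture.bin", "rb").read())` (with `capture.bin` a hypothetical dump of the received bytes) gives the same information as the receiver's "Only first N bytes match." message.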

The second program, sender.c, is a local emulation of a Teensy sending the data, using the POSIX pseudoterminal interface. It too is in the public domain (CC0). It introduces at least as much overhead as the USB CDC-ACM driver does in the Linux kernel, and in my experiments the limiting factor seems to be the number of write syscalls; however, this is an old, slow laptop. It also gathers statistics on the transfers.
Code:
// SPDX-License-Identifier: CC0-1.0
// https://spdx.org/licenses/CC0-1.0.html
#define _XOPEN_SOURCE 600
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <signal.h>
#include <locale.h>
#include <ctype.h>
#include <poll.h>
#include <pty.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

#ifndef  INTERVAL_MS
# define  INTERVAL_MS  1000
#endif

#ifndef  SEED
# define  SEED 1
#endif

static uint64_t  state = SEED;
static uint64_t  left_state = 0;
static int       left = 0;

static inline void reset(void)
{
    state = SEED;
    left = 0;
    left_state = 0;
}

static inline uint64_t next_state(void)
{
    uint64_t  x = state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    state = x;
    return x * UINT64_C(2685821657736338717);
}

static unsigned char *refill(unsigned char *dst, const size_t len)
{
    unsigned char *const end = dst + len;

    /* Emit the leftovers of last state. */
    while (dst < end && left > 0) {
        *(dst++) = left_state;
        left_state >>= 8;
        left--;
    }

    /* Emit in units of 8 chars. */
    while (dst + 8 <= end) {
        uint64_t  u = next_state();
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
        u >>= 8;
        *(dst++) = u;
    }

    /* Leftovers */
    while (dst < end) {
        if (left < 1) {
            left = 8;
            left_state = next_state();
        }

        *(dst++) = left_state;
        left_state >>= 8;
        left--;
    }

    dst = end;

    return dst;
}

/* Start and end characters
*/
const unsigned char  request_start = 1;
const unsigned char  request_stop = 0;

/* Signal handlers
*/
static volatile sig_atomic_t  done = 0;

static void handle_done(int signum)
{
    if (!done) {
        if (signum > 0 && signum < 127)
            done = signum;
        else
            done = 127;
    }
}

static int install_done(int signum)
{
    struct sigaction  act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_done;
    act.sa_flags = 0;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

static int parse_size(const char *s, size_t *to)
{
    const char     *end;
    unsigned long   uval, scale;
    size_t          val;

    if (!s || !*s)
        return errno = EINVAL;

    scale = 1;
    end = s;
    errno = 0;
    uval = strtoul(s, (char **)(&end), 0);
    if (errno)
        return errno;
    if (!end || end == s)
        return errno = EINVAL;

    while (isspace((unsigned char)(*end)))
        end++;

    if (end[0] == 'k') {
        scale = 1024;
        end++;
    } else
    if (end[0] == 'M') {
        scale = 1024 * 1024;
        end++;
    } else
    if (end[0] == 'G') {
        scale = 1024UL * 1024UL * 1024UL;
        end++;
    }

    while (isspace((unsigned char)(*end)))
        end++;

    if (*end)
        return errno = EINVAL;

    val = uval * scale;
    if ((unsigned long)(val / (size_t)scale) != uval)
        return errno = ENOMEM;

    if (to)
        *to = val;

    return 0;
}

int main(int argc, char *argv[])
{
    struct termios  settings;
    int             masterfd;
    int             sending = 0;

    size_t         *written = NULL;

    unsigned char  *data, *next, *ends;
    size_t          size, total;

    char           *temp;

    setlocale(LC_ALL, "");

    if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        const char *argv0 = (argc > 0) ? argv[0] : "(this)";

        printf("\n");
        printf("Usage: %s [ -h | --help ]\n", argv0);
        printf("       %s SIZE\n", argv0);
        printf("\n");
        printf("This program creates a pseudoterminal,\n");
        printf("prints the path to the slave tty, and\n");
        printf("spouts data as fast as it can, in chunks\n");
        printf("of SIZE bytes.\n");
        printf("\n");
        printf("Send INT (by pressing Ctrl+C) or HUP signal\n");
        printf("to have the program exit, cleaning up the\n");
        printf("pseudoterminal.\n");
        printf("\n");

        return (argc == 2) ? EXIT_SUCCESS : EXIT_FAILURE;
    }

    if (install_done(SIGINT) ||
        install_done(SIGHUP) ||
        install_done(SIGTERM)) {
        fprintf(stderr, "Cannot install signal handlers: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (parse_size(argv[1], &size) || size < 1) {
        fprintf(stderr, "%s: Invalid chunk size.\n", argv[1]);
        return EXIT_FAILURE;
    }
    total = 0;

    data = malloc(size);
    if (!data) {
        fprintf(stderr, "Not enough memory for %zu byte chunks.\n", size);
        return EXIT_FAILURE;
    }
    next = data;
    ends = data;

    written = calloc(size + 1, sizeof written[0]);
    if (!written) {
        fprintf(stderr, "Not enough memory for write statistics.\n");
        free(data);
        return EXIT_FAILURE;
    }

    masterfd = posix_openpt(O_RDWR | O_NOCTTY);
    if (masterfd == -1) {
        fprintf(stderr, "Cannot create a pseudoterminal: %s.\n", strerror(errno));
        free(written);
        free(data);
        return EXIT_FAILURE;
    }
    if (grantpt(masterfd) == -1) {
        fprintf(stderr, "Cannot grant access to pseudoterminal slave: %s.\n", strerror(errno));
        close(masterfd);
        free(written);
        free(data);
        return EXIT_FAILURE;
    }
    if (unlockpt(masterfd) == -1) {
        fprintf(stderr, "Cannot unlock pseudoterminal slave: %s.\n", strerror(errno));
        close(masterfd);
        free(written);
        free(data);
        return EXIT_FAILURE;
    }

    memset(&settings, 0, sizeof settings);
    if (tcgetattr(masterfd, &settings) == -1) {
        fprintf(stderr, "Cannot obtain pseudoterminal master termios settings: %s.\n", strerror(errno));
        close(masterfd);
        free(written);
        free(data);
        return EXIT_FAILURE;
    }
    settings.c_iflag &= ~(BRKINT | PARMRK | INPCK | ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXOFF);
    settings.c_iflag |= IGNBRK | IGNPAR;
    settings.c_oflag &= ~(OPOST | ONLCR | OCRNL | ONOCR | ONLRET | OFILL);
    settings.c_cflag &= ~(CSIZE | PARENB | HUPCL);
    settings.c_cflag |= CS8 | CREAD | CLOCAL;
    settings.c_lflag &= ~(ISIG | ICANON | ECHO | ECHOE | ECHOK | ECHONL | IEXTEN);
    settings.c_cc[VMIN] = 0;
    settings.c_cc[VTIME] = 0;
    tcsetattr(masterfd, TCSANOW, &settings);

    fcntl(masterfd, F_SETFL, O_NONBLOCK);

    /* Note: 'temp' is only valid temporarily; see 'man ptsname'. */
    temp = ptsname(masterfd);
    if (!temp || !*temp) {
        fprintf(stderr, "Cannot obtain pseudoterminal slave path.\n");
        close(masterfd);
        free(written);
        free(data);
        return EXIT_FAILURE;
    }
    printf("%s\n", temp);
    fflush(stdout);

    reset();
    next = data;
    ends = refill(next, size);
    memset(written, 0, (size + 1) * sizeof written[0]);
    total = 0;

    while (!done) {
        struct pollfd  fds[1];
        unsigned char  buf[1];
        ssize_t        n;

        fds[0].fd = masterfd;
        fds[0].events = (sending) ? POLLIN | POLLOUT : POLLIN;
        fds[0].revents = 0;

        poll(fds, 1, INTERVAL_MS);

        if (fds[0].revents & POLLIN) {
            if (read(masterfd, buf, 1) == 1) {
                if (buf[0] == request_stop && sending) {
                    size_t  i, ncalls = 0;

                    for (i = 1; i <= size; i++)
                        ncalls += written[i];

                    fflush(stderr);
                    printf("# Size Number of successful write() calls\n");
                    for (i = 1; i <= size; i++) {
                        if (written[i]) {
                            printf("%12zu %zu\n", i, written[i]);
                        }
                    }
                    fflush(stdout);
                    if (written[0] > 0) {
                        fprintf(stderr, "Stopped sending; sent %zu bytes total using %zu write() calls plus %zu attempts.\n", total, ncalls, written[0]);
                    } else {
                        fprintf(stderr, "Stopped sending; sent %zu bytes total using %zu write() calls.\n", total, ncalls);
                    }
                    fflush(stderr);
                    sending = 0;
                    total = 0;
                    reset();
                    next = data;
                    ends = refill(data, size);
                    memset(written, 0, (size + 1) * sizeof written[0]);
                    continue;
                } else
                if (buf[0] == request_start && !sending) {
                    fprintf(stderr, "Started sending.\n");
                    sending = 1;
                    /* Fallthrough */
                } else
                    continue;
            }
        }

        if (!sending || !(fds[0].revents & POLLOUT))
            continue;

        do {
            n = write(masterfd, next, (size_t)(ends - next));
            if (n > 0) {
                ++written[n];
                next += n;
                total += n;
                if (next >= ends) {
                    next = data;
                    ends = refill(next, size);
                }
            } else {
                ++written[0];
            }
        } while (n > 0);

        if (n == -1 && (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK)) {
            fprintf(stderr, "Stopped sending due to error: %s.\n", strerror(errno));
            sending = 0;
            total = 0;
            reset();
            next = data;
            ends = refill(data, size);
            memset(written, 0, (size + 1) * sizeof written[0]);
        }
    }

    close(masterfd);
    free(written);
    free(data);

    return EXIT_SUCCESS;
}

Considering the typical 64-byte USB packet size, I would be interested in what kind of results others get with 64-byte chunks.
In one window, compile and run the sender first:
Code:
gcc -Wall -O2 sender.c -o sender && ./sender 64
It will output the slave tty device name, typically /dev/pts/N. Make a note of it, and compile and run the receiver in another window:
Code:
gcc -Wall -O2 receiver.c -pthread -o receiver && ./receiver /dev/pts/N 100M
You can stop them by pressing Ctrl+C in each window. The pseudoterminal (and memory-mapped file, if you use one) will be cleaned up.

The sequence sent and verified is Xorshift64*, least significant byte first, with seed 1 (you can change the seed at compile time using -DSEED=seed; for Xorshift64*, it must be a nonzero 64-bit integer).

On this HP Pavilion 11 x360 (Celeron N2820), the sender uses 1638720 64-byte writes plus 808 attempts; the receiver reaches about 76 Mbit/s (9.5 Mbytes/s) with typical reads returning some multiple of 64 bytes of data, between 1 and 4095 bytes in general.

Of course, this does not mean that on this machine 76 Mbit/s would be any kind of a limit. If I tell the sender to use 1024 byte chunks instead of 64 byte chunks, the receiver exceeds 480 Mbit/s. I suspect that on this slow machine, the number of syscalls (by the writer in particular) is the limiting factor, as context switches are relatively slow on this CPU. When receiving USB data, there should be much less overhead (assuming a machine with not too much CPU load).
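A rough check of the syscall-count explanation, using only the figures above (taking 1 Mbit as 10^6 bits and counting payload bytes only):

```latex
\begin{aligned}
\frac{76\ \text{Mbit/s}}{64 \times 8\ \text{bit per write()}} &\approx 148\,400\ \text{write() calls/s} \\
\frac{480\ \text{Mbit/s}}{1024 \times 8\ \text{bit per write()}} &\approx 58\,600\ \text{write() calls/s}
\end{aligned}
```

So the 1024-byte run moves over six times as much data while making well under half as many write() calls per second, which fits (but does not prove) the idea that per-call overhead is what limits the 64-byte case.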

Next, I'll implement the Teensy 4.0 Arduino sketch that works like the sender.c here.
 
No true Linux here, nor a VM on Windows. I did set up WSL, but the last time I tried, I could not get it to see the Teensy's USB, and I stopped caring after that.

Note: the T_4's 480 Mbps USB uses 512-byte packets, not 64-byte ones, in case that factors into your testing.
 
Ah, right; with 512 byte packets, even this old HP Pavilion 11 x360 can do 270 Mbits/s (34 Mbytes/s) locally. I bet an i5/i7 or a relatively recent AMD can do >480Mbits/s locally with 512 byte packets.
 
@nominal - I do have a Linux setup I could use here, being my old i7 desktop Ubuntu 18.04...

But currently tied up with other higher priority things (at least for me).

I'll assume you believe your test works for your local programs. But I am not sure how well that translates to actually using the actual subsystems like the ACM Serial support. Maybe one side or the other makes an assumption about when an IOCTL is sent to the driver, or maybe it is about what the driver does when it receives data faster than it can typically handle...

I mean it could be real simple things like, I ran into issues with Linux (RPI, BBK, UP...) when doing servo support for Dynamixel servos. At one point I was using a board that had an FTDI chip on it, which was adding a lot of latency. So I added a call to tcdrain to tell it to flush the data at the end of my write packet. This worked great. But then I went to a Teensy and the tcdrain killed the performance. My guess at the time was (is) that the FTDI driver had the IOCTL support to detect that the output queue was finished and it output everything, so it did that and returned. My guess was that the ACM driver did not have this support, so it just put in a long logical sleep. So my code now only does the drain if the device has a name like /dev/ttyUSB... Note: this was not Teensy specific; I ran into it with other boards like the Robotis OpenCM board as well.

Obviously the best way to detect the issues would probably be to hook the Teensy up to your Linux box through something like https://www.totalphase.com/products/beagle-usb480-power-standard/
But I don't have one of these, and my logic analyzer's USB protocol stuff can only handle the slower speeds (1.5 and 12 Mbit/s).

Hopefully Paul will get a chance to look at it at some point, but probably not for a long while, with only the two of them trying to release the T4.1 while also keeping all of the boards, and the parts to make them, in stock...

I will try again to localize this at some point soon, but this is beyond my pay grade (free ;) )
 
No worries! Anyway, here is the Teensyduino sketch for Teensy 4.0 with USB Serial:
Code:
// SPDX-License-Identifier: CC0-1.0

#define  SEED  1
#define  SIZE  512

static uint64_t  state;
static uint64_t  left_state;
static int       left = 0;

static inline uint64_t next(void)
{
  uint64_t  x = state;
  x ^= x >> 12;
  x ^= x << 25;
  x ^= x >> 27;
  state = x;
  return x * UINT64_C(2685821657736338717);
}

static inline void reseed(void)
{
  state = SEED;
  left = 0;
  left_state = 0;
}

void refill(uint8_t *dest, size_t len)
{
  uint8_t *const ends = dest + len;

  while (left > 0 && dest < ends) {
    *(dest++) = left_state;
    left_state >>= 8;
    left--;
  }

  while (dest + 8 <= ends) {
    uint64_t temp = next();
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;
    temp >>= 8;
    *(dest++) = temp;    
  }

  if (dest < ends) {
    left = 8;
    left_state = next();
    while (dest < ends) {
      *(dest++) = left_state;
      left_state >>= 8;
      left--;
    }
  }  
}

constexpr uint8_t  request_start = 1;
constexpr uint8_t  request_stop = 0;

constexpr int      ledpin = 13;

uint8_t  data[SIZE];
bool     sending;

void setup() {
  Serial.begin(9600);  // the baud rate is ignored for USB Serial
  pinMode(ledpin, OUTPUT);
  sending = false;
}

void loop() {
  if (Serial) {
    int request;
    
    while (Serial.available() && (request = Serial.read()) >= 0) {
      if (request == (int)request_start) {
        reseed();
        sending = true;
        digitalWriteFast(ledpin, HIGH);
      } else
      if (request == (int)request_stop) {
        sending = false;
        digitalWriteFast(ledpin, LOW);
      }
    }

    if (sending) {
      refill(data, SIZE);
      Serial.write(data, SIZE);
    }
    
  } else
  if (sending) {
    digitalWriteFast(ledpin, LOW);
    sending = false;
  }
}
This works without any issues on my Linux machine (Ubuntu 20.04 LTS, Lubuntu desktop; HP Pavilion 11 x360), with the receiver consistently reaching 25.7 Mbytes/sec (205.6 Mbits/sec of data payload) for various SIZE values like 512, and 15 or 17 ("particularly good" and "particularly bad" values). The LED on the Teensy is lit while it is trying to send data.

(This is already 42% of the maximum theoretical bandwidth, by the way, and plenty for many data acquisition tasks.)
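The arithmetic behind that figure, plus a comparison against the high-speed bulk limit as I understand the USB 2.0 specification (at most 13 bulk transactions of 512 bytes per 125 µs microframe, i.e. 8000 microframes per second), which is the most that payload can actually use:

```latex
\begin{aligned}
25.7\ \text{MB/s} \times 8 &= 205.6\ \text{Mbit/s} \\
\frac{205.6\ \text{Mbit/s}}{480\ \text{Mbit/s}} &\approx 0.428 \\
13 \times 512\ \text{B} \times 8000\ \text{s}^{-1} &= 53\,248\,000\ \text{B/s} \approx 426\ \text{Mbit/s} \\
\frac{205.6\ \text{Mbit/s}}{426\ \text{Mbit/s}} &\approx 0.483
\end{aligned}
```

Measured against what bulk transfers can carry, rather than the raw 480 Mbit/s signalling rate, the result is closer to 48%.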

I have not seen any data failures or any other issues yet, even when interrupting and restarting the receiver program. (That means the issue is not in Teensyduino mishandling USB NAKs as I believed might be the case; it is something more complex, and very possibly in the end-user/host program handling.)

Interestingly, on the Linux end, I can see that some of the read()s returned just a few bytes, or a few bytes below or above a multiple of 1024. This indicates a possible inefficiency in the tty layer, which is not a surprise considering the things it can do to the transferred data (CR-LF translation, among others). That in turn means that developing USB bulk transfer support for 512-byte messages in Teensyduino (and corresponding OS-specific, native Python interface modules) could be useful for us sensor/data acquisition/scientist folks, maybe.
 
But I am not sure how well that translates to actually using the actual subsystems like the ACM Serial support.
My ping-pong test shows that Teensy 4.0 has issues with high-bandwidth 1024-byte messages, but it does not tell us where the problem is.

When you suggested testing transfers in just one direction (and because I suspected a glitch when host responds with NAK to Teensy), I decided to implement a Teensy→Host verifying benchmark.
But first, I wanted to know what kind of bandwidth one should expect (since in Linux this goes through the tty layer), so I wrote a simulator that uses the tty layer (a POSIX pseudoterminal) to transfer the data between the two programs. On my particular older machine, with 512-byte chunks, they transferred around 30 Mbytes/sec (240 Mbits/s of payload data). I ran tests of at least 100M (100 Mbytes transferred) each, by the way.

Then, I implemented the Teensy 4.0 sketch that produces the same stream (selectable at compile time by redefining SEED), letting Teensyduino USB layer manage how/when they are delivered to the host computer. Regardless of the chunk size (tested 15, 17, and 512 byte chunks), I got 25+ Mbytes/sec (200+ Mbits/s of payload) on the receiving end.

My original belief of the issue (related to NAK responses) was proven wrong, because this is very fast and utterly stable: no lockups or data errors, and over 200 Mbit/s data payload bandwidth.
While this did not tell us where the problem with the ping-pong test is, it does tell us that high-bandwidth USB Serial Teensy→Host transfers (with occasional control messages the other way) are robust and work at least up to 200 Mbit/s (of payload). Which is quite a lot.

If others could test the receiver and the Teensy sketch on a faster Linux machine (this one is just a Celeron N2820), we could see whether the bandwidth varies from computer to computer. If it does vary, then the tty layer or the Linux kernel USB Serial driver is the bottleneck. If it does not vary, it is limited by the Teensy and/or Teensyduino. This would be interesting to know, but not otherwise important.

However, if anyone can see any data verification errors (anything except "Verifying data .. no errors." on the last line of output), or transfer lockups, or errors, it would negate my conclusions in this message. I would prefer wider testing before asserting any kind of reliability for these conclusions.

I mean it could be real simple things like, I ran into issues with Linux (RPI, BBK, UP...) when doing servo support for Dynamixel servos. At one point I was using a board that had an FTDI chip on it, which was adding a lot of latency. So I added a call to tcdrain to tell it to flush the data at the end of my write packet. This worked great. But then I went to a Teensy and the tcdrain killed the performance. My guess at the time was (is) that the FTDI driver had the IOCTL support to detect that the output queue was finished and it output everything, so it did that and returned.
Or, more likely, the FTDI driver only started flushing the data and returned immediately, while the CDC-ACM driver waited until the send queue was empty.

There is a similar issue with e.g. fsync()/fdatasync(): if you want to maximize bandwidth (say, you are a mail transfer agent, and you want to ensure the saved messages hit storage before continuing, but you are transferring to many mailboxes in parallel), you need to do that in a separate thread, because the call will block. With tcdrain(), you'd use a separate "drainer" thread per device, and a semaphore or a mutex and a condition variable, so that posting on the semaphore or signaling on the condition variable would be enough to start the drain without blocking the main thread. (If I recall the details correctly, that is!)

Having now looked at the tty layer internals, I think it would be nice if one had a CDC-ACM driver with a non-TTY character device interface, closer to socket semantics, now that devices start to support really high transfer rates. The TTY layer and termios interface definitely was not intended for high-bandwidth bulk data transfers.

But yes, I do completely agree with you and Defragster.

Simply put, I have not proven a problem, only showed one problematic case (ping-pong with 1024+ byte messages) and speculated for its causes. With the receiver test program above and the Teensy 4.0 sketch, I've shown that at least on my machine, the Teensy→Host USB Serial can do 25+ Mbytes/sec (200 Mbits/s of payload) very robustly without issues. With the sender program, anyone can check the local bandwidth over the tty layer on their own Linux machine, as a sort of a sanity check on the Teensy sketch.

I think my next step is to write a better Teensy sketch, using CDC-ACM for bulk data, and HID (bidirectional) for command and status transfers; and a paranoid-careful host side. Hopefully that will either pinpoint the problematic situation, or clear the issue completely (as a host-side programming or tty layer issue).

My end goal is to find the coding pattern (in a Teensy sketch, and in a Linux program using termios) that provides robust high-bandwidth data transfers over USB Serial, between a Teensy 4.0 and a Linux host.
 