Teensy 4.0 USB serial problem - hangs after some amount of data

Status
Not open for further replies.

goodney

Member
Just got a Teensy 4.0 today, Yay! But already found an issue ;-(

I started with the code from here: https://github.com/benkrasnow/T4_USB_speedtest

Specifically the T4_send_data. If I program this to the Teensy, then run on my Mac:

Code:
$ cat /dev/cu.usbmodem64919501 |hexdump

This will hang after some amount of data. I simplified the code to the following:

Code:
// USB Virtual Serial Receive Speed Benchmark
//
// This program sends data as rapidly as possible



// use one of these to define
// the USB virtual serial name
//
#define USBSERIAL Serial      // Arduino Leonardo, Teensy, Fubarino
//#define USBSERIAL SerialUSB   // Arduino Due, Maple

char buf[64];

void setup() {
  USBSERIAL.begin(9600);
  USBSERIAL.setTimeout(0);
  for(int i=0;i<64;i++)
  {
    buf[i] = i;
  }
}


void loop() {
  
  while (1) {
    USBSERIAL.write(buf, 64);
  }
  
}

After somewhere between 5 and 10 megabytes of transfer, it hangs. Sometimes as much as 40 MB will transfer.

ctrl-c and then re-run will work. Resetting the Teensy is not necessary. Verified on two different Macs. One 10.14.6 one 10.13.4.
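Since the sketch above sends a repeating 0..63 ramp, a captured chunk can be checked for dropped or corrupted bytes rather than eyeballed in hexdump. A minimal sketch of such a check follows; the function name `first_ramp_break` is made up for illustration, and it assumes the captured bytes are already in memory on the host.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scans a captured buffer and returns the index of the
// first byte that does not follow its predecessor in the repeating 0..63 ramp
// produced by the sketch. Returns `len` if the whole buffer is consistent.
std::size_t first_ramp_break(const std::uint8_t* data, std::size_t len) {
  for (std::size_t i = 1; i < len; ++i) {
    const std::uint8_t expected =
        static_cast<std::uint8_t>((data[i - 1] + 1) % 64);
    if (data[i] != expected) {
      return i;  // byte i breaks the ramp: data was lost or altered before it
    }
  }
  return len;
}
```

A clean result here would suggest the stall is not preceded by data loss, though it says nothing about where the stall itself comes from.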
 
Known problem of too much speed.

Because the Teensy 4.0's 480 Mbit USB has much higher output (currently pushing out perhaps 7 MB/sec), it is a known problem that it can overwhelm the PC, more so on Macs, as discussed on other threads (IIRC the T4_Beta thread). The prior Teensy max output rate was closer to 1 MB/s, and even that could on occasion overwhelm a PC when not limited, depending on the efficiency of the code on the receiving end.

Paul worked a great deal to speed up the IDE serial monitor's handling of the data throughput, but there is a limit to what the computer can handle. The solution is to limit the output rate on the Teensy side.
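One possible way to limit the output rate is a token bucket. The sketch below is only an illustration under assumptions: the `TokenBucket` class is hypothetical (it is not part of Teensyduino), it is written as portable C++ so it can be tried off-device, and a suitable rate would have to be found by experiment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Teensyduino): a token bucket that caps the
// average send rate in bytes per second while allowing bursts of up to
// `burst_bytes`. Time is passed in as a microsecond counter.
class TokenBucket {
 public:
  TokenBucket(std::uint32_t bytes_per_sec, std::uint32_t burst_bytes,
              std::uint32_t now_us)
      : rate_(bytes_per_sec), burst_(burst_bytes), tokens_(burst_bytes),
        last_us_(now_us), frac_(0) {
    assert(bytes_per_sec > 0 && burst_bytes > 0);
  }

  // Returns true (and spends the tokens) if `len` bytes may be sent now.
  bool try_consume(std::uint32_t len, std::uint32_t now_us) {
    refill(now_us);
    if (len > tokens_) return false;
    tokens_ -= len;
    return true;
  }

 private:
  void refill(std::uint32_t now_us) {
    // Unsigned subtraction stays correct across one counter wraparound.
    const std::uint32_t elapsed_us = now_us - last_us_;
    last_us_ = now_us;
    // Work in byte-microseconds and carry the remainder to the next call.
    const std::uint64_t scaled =
        static_cast<std::uint64_t>(elapsed_us) * rate_ + frac_;
    const std::uint64_t total = tokens_ + scaled / 1000000u;
    if (total >= burst_) {
      tokens_ = burst_;  // bucket is full; discard the excess
      frac_ = 0;
    } else {
      tokens_ = static_cast<std::uint32_t>(total);
      frac_ = static_cast<std::uint32_t>(scaled % 1000000u);
    }
  }

  std::uint32_t rate_;
  std::uint32_t burst_;
  std::uint32_t tokens_;
  std::uint32_t last_us_;
  std::uint32_t frac_;
};
```

In a sketch like the one in the first post, this might be driven as `if (bucket.try_consume(64, micros())) USBSERIAL.write(buf, 64);`, so that the write is simply skipped on passes through the loop where the budget is used up.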

This overwhelming of the PC will only get worse, as the Teensy 4.0 sustained output rate can double at least once more as the USB code is further refined and optimized.
 
Sorry, but "overwhelming the PC" is not a thing. There are many devices that transfer continuously over the USB bus at high speed and do not hang (think audio devices or video capture). I have other microcontrollers that I've left transferring data over USB serial for days at a time with no problem.

Can Paul weigh in here? Is this a PC side or Teensy side problem? It seems like some sort of race condition on the Teensy side. If possible I'd like as technical an explanation as possible so I can help fix it if necessary!
 
Which version of Teensyduino are you using? Click Arduino > About to check.

Version 1.48 was released just yesterday, after weeks of beta testing. One of the major changes in 1.48 is this work to make the serial monitor much more efficient.

https://www.pjrc.com/improving-arduino-serial-monitor-performance/

Can Paul weigh in here?

Sure I can. I hope you'll take a moment to read or at least skim that blog article. Maybe look over some of the prior threads on this forum also? The speedup work has been a huge effort that reaches deeply into seldom traveled parts of Java.

But, I'm sad to say, the results on Macintosh are not nearly as good as Windows and Linux. I have indeed poured many hours into the Mac specific code, and I'm afraid it seems there's little more I can do.

The remaining problems on Mac appear to be 2 issues.

1: Apple's CDC serial driver, or perhaps other USB stuff in the kernel, appears to consume a lot of CPU time at these higher speeds. If you run Applications > Utilities > Activity Monitor while Arduino with Teensyduino 1.48 is receiving maximum speed data, and watch the little scrolling graph at the bottom of the window, you'll see the lion's share of the CPU usage is within the OS kernel.

2: Java appears to run less efficiently on Mac than it does on Windows. This is a lesser problem than the excessive CPU use by something in Apple's kernel code, but still a contributing factor. I know this because I've done many tests with the same Java JAR files running on the same Macbook Air using Apple Bootcamp to reboot into Windows 10. The Java JRE on Windows 10 uses much less CPU time to run the exact same Java code doing approximately the same work.

If you're running version 1.47, first upgrade to 1.48 so you'll get all the work I've already done. It does make a huge improvement.

But once you're at 1.48, I'm afraid that's probably as efficient as it's going to get... unless Apple makes improvements in the part of MacOS that's consuming so much CPU use within the kernel, or Oracle optimizes the Java runtime better for MacOS (or maybe if Arduino someday switches to a non-Oracle JRE). I've improved everything else, everything where I can control the code, as much as I can. I just can't do anything about Apple's drivers and Oracle's JRE.


It seems like some sort of race condition on the Teensy side.

The Teensy side is currently running much slower than it could. I have been putting off optimization work on the Teensy side, since the Java-based PC side has been so terrible. Now with 1.48, I'm going to soon start optimizing the Teensy side more.

For Macintosh, I'm afraid that's not great news. But with 1.48, Windows and Linux handle the speed very well... and that speed will soon become much faster.
 
Hi Paul:

I downloaded the code today, so I'm on 1.8.10 with 1.48.

Thanks for the more technical description. It's too bad that you've had to limit the CDC speed. The application I have in mind will benefit from all the speed possible, so don't hold back.

But as you can see from my example, I'm not running in Java or inside the Arduino IDE. I'm running on the command line.

So the take-away is that this is in the AppleCDC driver? Interesting...

Thanks for the feedback, I'll test on Linux.
 
My post was a bit nebulous from seeing this in the OP:
$ cat /dev/cu.usbmodem64919501 |hexdump
The IDE SerMon is not in use when doing that.

Trying with the teensy_serialmon from TD 1.48 will show what Paul did, as described in great detail in the linked blog post.

Also, alternate types of USB interface "(think audio devices or video capture)" have special-case code to handle them at expected rates, with special-case protocols and handlers, versus a generic data interface. Do those even stream at 7 MB/sec? TV and movies come from the ISP at under 4 Mbit/sec.
 
I've read the blog post, but in general, I don't care about the serial monitor. I care about what Paul said at the end of the post:

Eventually more microcontrollers will appear on the market with 480 Mbit or faster USB, and fast enough processors and USB code capable of sustained printing at these speeds and perhaps much faster.

I hope you agree that a modern PC should be able to process incoming data at 480 Mbit, regardless of the source. We (microcontroller firmware developers) shouldn't have to worry about *what* our device is connected to. The spec says 480 Mbit, we should be able to send 480 Mbit.

I'm trying to understand the parameters of the problem so I can help solve it, and right now it looks like a race condition or timing issue in the AppleCDC driver or somewhere else in the kernel. Mac OS shouldn't hang here, especially since the timing of the USB transfers comes from the host.

My current thought is to try to reproduce this using a Raspberry Pi or similar in OTG mode with a CDC gadget. If I can do that, maybe I can submit a cogent enough bug report to Apple.
 
As a follow on: can Teensy become an Audio Class device?

And/or is there support in Teensyduino for implementing our own USB device personalities?
 
but in general, I don't care about the serial monitor.
....
I'm trying to understand the parameters of the problem so I can help solve it

Ok, sounds like you're familiar with using the command line.

On your Mac, open a terminal and change to the Arduino.app folder. Then go into Contents/Java/hardware/tools. There you will find a command line program called "teensy_serialmon". It expects 1 input, which is the name shown in the lower right corner of the Arduino IDE. For example, on my Macbook Air it shows "usb:14200000". So the command line would be:

Code:
./teensy_serialmon usb:14200000

When you run this, it will connect to the Teensy at that USB location and print everything the Teensy sends to your Mac to stdout.

Before you run this, make sure you have Activity Monitor running and visible. When I've run this on my Macbook Air, the total CPU usage is even worse for Terminal than having the Arduino IDE display the data!


As a follow on: can Teensy become an Audio Class device?

Teensy 3.2, 3.5, 3.6 can.

Today this is not yet supported on Teensy 4.0, but I will be adding it "soon".


And/or is there support in Teensyduino for implementing our own USB device personalities?

That really depends on what you consider "support".

Start by looking at usb_desc.h and usb_desc.c. If you're using MacOS, remember these are inside the application bundle, so control-click Arduino and "show package contents" to access the files and folders in the bundle. Read the comments in those files to get started.

A few people have published custom USB stuff. One that emulates Xbox HID is pretty popular. It's come up many times on this forum. Maybe you can find it with a little searching? Look for "xinput".
 
> "...There are many devices that transfer continuously over the USB bus at high-speed and do not hang..." I'd be interested if there are any examples using the USB device type Teensy 4 currently is. It's my guess that T4 is currently the very fastest of its type and it is pushing some host code in ways the developers have not done. That's just a guess though.

I can tell you that my original Saleae Logic analyzer often has trouble at 24 MHz sampling and has to slow down, even on a high-end PC. Even using all 8 bits that's 192 Mbps. No doubt there is packet overhead but it seems well short of the theoretical 480 Mbps of USB2. Maybe the current stuff does better, but back in my video editing days, I had nothing but trouble with the early USB2 video digitizers, they would frequently lock up the app or the entire PC.
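The figures quoted in this thread can be put side by side with a little arithmetic. The helpers below are hypothetical names introduced only for this illustration; they compute raw payload rates and compare them with the 480 Mbit/s high-speed signalling rate, ignoring packet and protocol overhead.

```cpp
#include <cstdint>

// Raw payload rate in bits per second for a source that produces
// `bits_per_unit` bits `units_per_sec` times per second.
constexpr std::uint64_t stream_bits_per_sec(std::uint64_t units_per_sec,
                                            unsigned bits_per_unit) {
  return units_per_sec * bits_per_unit;
}

// Share of the 480 Mbit/s high-speed signalling rate, in percent.
constexpr double percent_of_high_speed(std::uint64_t bits_per_sec) {
  return 100.0 * static_cast<double>(bits_per_sec) / 480000000.0;
}

// 24 MHz sampling of 8 channels, as in the logic analyzer example.
static_assert(stream_bits_per_sec(24000000, 8) == 192000000,
              "24 MHz x 8 bits = 192 Mbit/s");

// About 7 MB/s of serial data, taking 1 MB as 10^6 bytes.
static_assert(stream_bits_per_sec(7000000, 8) == 56000000,
              "7 MB/s = 56 Mbit/s");
```

By this count the logic analyzer stream is 40% of the signalling rate, and roughly 7 MB/s of serial data is under 12% of it.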
 
Just an FYI in case anyone comes across this thread. It is definitely a timing issue on the Mac OS side. For example, the following command works just fine and achieves ~7.5MB/s:
Code:
dd if=/dev/cu.usbmodem64919501 ibs=512 obs=64
Reading in chunks of 512 and outputting in chunks of 64 will of course tweak the timing (making it slightly slower).

tl;dr On Mac OS we have to slow things down :(
 