Teensy 3.2 USB Serial "non-blocking" write isn't really non-blocking

CraigF

Well-known member
My teensy 3.2 device has an ISR that reads a sensor and writes to a circular inBuffer that handles overflow gracefully. A main loop processes data from the inBuffer, triggers some time-sensitive external actions, and writes logging output to a circular outBuffer that also overflows gracefully. The last step in the main loop is a non-blocking write to the USB Serial port, which is read by a host PC.

Everything runs fine with data rates of 1 - 2 Mbps and no problems for days on end, until the host PC gets distracted and fails to read the serial port for some period of time. At that point my Serial.Write call intermittently blocks ("forever" or until the PC begins reading again) even though the code is written to be non-blocking. And of course, blocking is a disaster in my scenario.

Here's where the code blocks:

Code:
const int serialWriteTimeout = 1000;	// microseconds
void SendDataToHost()
{
	elapsedMicros serialWriteTime;
	while (serialWriteTime < serialWriteTimeout)
	{
		int bytesAvail = outBuffer.getCountAvailForRead();
		if (bytesAvail == 0)
			break;	// quit when there's no remaining data to send
		int serialOutAvail = Serial.availableForWrite();	// normally this will be 64
		if (serialOutAvail == 0)	
			continue;
		int bytesWanted = (bytesAvail > serialOutAvail) ? serialOutAvail : bytesAvail;
		int bytesToSend = outBuffer.readBytes(USB_OutBuffer, bytesWanted);
		Serial.write(USB_OutBuffer, bytesToSend);	
                // on the teensy this shouldn't block because the serial buffer has promised there's room for this data
                // Unfortunately, it DOES block, at least some of the time.
	}	
}

After convincing myself this was indeed the problem, I looked into the serial implementation and found that in Nov 2014 Paul added a ToDo in the code:
Code:
int usb_serial_write_buffer_free(void)
{
	uint32_t len;

	tx_noautoflush = 1;
	if (!tx_packet) {
		if (!usb_configuration ||
		  usb_tx_packet_count(CDC_TX_ENDPOINT) >= TX_PACKET_LIMIT ||
		  (tx_packet = usb_malloc()) == NULL) {
			tx_noautoflush = 0;
			return 0;
		}
	}
	len = CDC_TX_SIZE - tx_packet->index;
	// TODO: Perhaps we need "usb_cdc_transmit_flush_timer = TRANSMIT_FLUSH_TIMEOUT"
	// added here, so the SOF interrupt can't take away the available buffer
	// space we just promised the user could write without blocking?
	// But does this come with other performance downsides?  Could it lead to
	// buffer data never actually transmitting in some usage cases?  More
	// investigation is needed.
	// https://github.com/PaulStoffregen/cores/issues/10#issuecomment-61514955
	tx_noautoflush = 0;
	return len;
}

This (loss of atomicity) is the likely mechanism causing my non-blocking code to block. My questions are:

Question #1: Is there anything that can be done purely in user code to achieve reliably non-blocking data transmission?

Question #2: Has anybody else already investigated this?

Question #3: I have added the line suggested in Paul's note and Serial.Write now appears to behave as expected/desired. Can anybody explain what kinds of scenarios might cause issues (IOW, how to test).

Question #4: Is there a best practice for using a modified library while still applying bulk library updates periodically?

Thanks!

-- Craig
 
Last edited:
Everything runs fine with data rates of 1 - 2 Mbps and no problems for days on end, until the host PC gets distracted and fails to read the serial port for some period of time. At that point my Serial.Write call intermittently blocks ("forever" or until the PC begins reading again) even though the code is written to be non-blocking.

I would expect this to happen
you write non-blocking into USB buffer, but if the PC is not reading the data the buffer may fill up, and then?
You have to decide, what to do
drop data or wait until PC drains data again.
It is a well known problem in all real time application.
My own preferred solution is to drop the data, (this requires that the data are time-stamped, so you can see the dropout)
For this you need the possibility to read the buffer status before writing data to USB buffer.

if
Code:
void usb_tx_isochronous(uint32_t endpoint, void *data, uint32_t len)
does that I do not know.
If not maybe a method could be added to usb_dev.c
 
Yeah, this is a really tough situation. That's why I left that TODO in the code...

The possible downside to adding that line is certain behavior from your code could create a deadlock condition, where you're waiting for a certain set of conditions, but the act of waiting prevents them from happening. Even if your code never does that, I need to consider these things carefully before publishing for everyone to use.
 
Last edited:
I would expect this to happen
you write non-blocking into USB buffer, but if the PC is not reading the data the buffer may fill up, and then?
You have to decide, what to do
drop data or wait until PC drains data again.
It is a well known problem in all real time application.
My own preferred solution is to drop the data, (this requires that the data are time-stamped, so you can see the dropout)
For this you need the possibility to read the buffer status before writing data to USB buffer.

if
Code:
void usb_tx_isochronous(uint32_t endpoint, void *data, uint32_t len)
does that I do not know.
If not maybe a method could be added to usb_dev.c

Thanks for the response.

My device can handle the scenario where the PC is not listening -- as described in my original post, I am applying some algorithms and delivering real-world outputs directly from the device, and you can think of the link to the PC as being a logging function. The device simply throws away data when it runs out of buffer space. So I can tolerate the PC not listening, but I *cannot* tolerate my teensy code blocking on a "non-blocking write" because that shuts down the primary function of the unit. :cool:
 
Last edited:
Yeah, this is a really tough situation. That's why I left that TODO in the code...

The possible downside to adding that line is certain behavior from your code could create a deadlock condition, where you're waiting for a certain set of conditions, but the act of waiting prevents them from happening. Even if your code never does that, I need to consider these things carefully before publishing for everyone to use.

Hi Paul. Thanks for responding, and thanks for making this processor available in such a great form factor with so much library support!

After several days of testing I've concluded that the current library is not non-blocking in any meaningful sense of the word. It will honor the non-blocking behavior some of the time, but will block within 2000 msec 100% of the time if the PC stutters and then takes a long timeout (e.g., the stutters and timeouts commonly caused by Adobe Acrobat Reader).

Could you please give me a tiny clue more than you have in your comment? Under what conditions would my querying Serial.availableForWrite() create a deadlock? I have a hard time picturing it, since I'm not requesting any locks and the library has complete control of any internal locks... As you can see in the code that I posted, I merely check bytes available for write on each loop. The loop time is typically about 15 usec, but every 10th loop or so it will be 15,000 usec (because an aliquot of data was received and needs to be processed). I could check less frequently, if that's the potential problem.
 
...
Could you please give me a tiny clue more than you have in your comment? Under what conditions would my querying Serial.availableForWrite() create a deadlock? I have a hard time picturing it, since I'm not requesting any locks and the library has complete control of any internal locks... As you can see in the code that I posted, I merely check bytes available for write on each loop. The loop time is typically about 15 usec, but every 10th loop or so it will be 15,000 usec (because an aliquot of data was received and needs to be processed). I could check less frequently, if that's the potential problem.

I read Paul's comment as regarding a general solution in place of the TODO. If the user code never writes when space is not available ( by querying Serial.availableForWrite() ) then there will be no blocking and the user code can decide when to discard data that won't fit. The general solution would be at the lower level of the TODO and that is where the Catch_22 comes in causing potential for deadlock.
 
I read Paul's comment as regarding a general solution in place of the TODO. If the user code never writes when space is not available ( by querying Serial.availableForWrite() ) then there will be no blocking and the user code can decide when to discard data that won't fit. The general solution would be at the lower level of the TODO and that is where the Catch_22 comes in causing potential for deadlock.

ummm -- perhaps I've misunderstood what's going on, but from what I can see the situation is a little different. If you look at my code you will see that I never write when space is not available. Nonetheless, the Serial.write call consistently blocks whenever the overall throughput is moderately high and the receiving machine hiccups.

The reason it happens is that the library code as written is not atomically "thread-safe," so the service IRQ can give away a buffer between the time it was promised to me and the time I try to write to it. In that case the code will block.

I put in place the library fix that Paul suggested (three years ago in his ToDo) and thus far it is working perfectly. However, I'm still concerned that I may encounter some other failure mode, since I really don't understand the special circumstances under which my code could result in a Serial.write deadlock. :cool:
 
I've put this on my list of bugs to fix. It certainly is a bug in Teensyduino's core library.

Can't promise a time frame, and realistically it may be weeks or months, but it's now on my list.

After several days of testing I've concluded that the current library is not non-blocking in any meaningful sense of the word. It will honor the non-blocking behavior some of the time, but will block within 2000 msec 100% of the time if the PC stutters and then takes a long timeout (e.g., the stutters and timeouts commonly caused by Adobe Acrobat Reader).

Normally I don't investigate issues without a complete program. In this case I will. But still, when I do work on this, all I have to go on is the code fragment you posted. I'm going to have to fill in the rest with guesswork... and in this scenario I'm guessing you've probably got data streaming in both directions? Maybe?

If you can give me anything more to go on, it'll save me time when I get around to this bug, and it'll improve the changes I come up with a fix that really addresses your circumstances.
 
I've put this on my list of bugs to fix. It certainly is a bug in Teensyduino's core library.

Can't promise a time frame, and realistically it may be weeks or months, but it's now on my list.



Normally I don't investigate issues without a complete program. In this case I will. But still, when I do work on this, all I have to go on is the code fragment you posted. I'm going to have to fill in the rest with guesswork... and in this scenario I'm guessing you've probably got data streaming in both directions? Maybe?

If you can give me anything more to go on, it'll save me time when I get around to this bug, and it'll improve the changes I come up with a fix that really addresses your circumstances.


I'll post a complete repro.

Only reason I didn't do so before is that I was specifically asking about the safety of the proposed fix or any user-code workarounds (e.g., wait until exactly 64 bytes are available).

In case you find yourself thinking about it before I post the repro, the Teensy side is straightforward: timer interrupt places 32K bytes/sec into a circBuffer, teensy main loop listens (non-blocking) for a single control character to End Program, and streams Serial.write (non-blocking) whenever data is available.

Host side is slightly harder to repro -- my organic host glitches are induced by a deadlock interaction between Visual Studio, NetGenie, Akamai, and Adobe Acrobat, but I'll write a host-side simulator.


Weeks or months is fine, because the ToDo fix has unblocked me completely. I just don't want to find myself scratching my head over some seemingly unrelated glitch once the device is in the field :)

Thanks!

-- Craig
 
Last edited:
I've been thinking on this for the last few days. My initial plan is probably going to involve adding a flag to remember an availableForWrite promise has been made. Later when the packet is to be automatically committed to send, if the flag is set the SOF interrupt code will do an extra check to allocate a blank buffer and make sure the promise isn't broken. I might also make the availableForWrite() promise a little more conservative, perhaps promise one less buffer if there are a certain number available, so when it comes time to make sure the promise it kept there will be a higher likelihood the partially full packet can be sent without rescheduling for a future SOF.
 
I'll post a complete repro.

Planning to look at this issue over the weekend, and (hopefully) get a fix into 1.36-beta3. If things slip (going to freeze most changes after beta3), this issue will remain in the 1.36 release and probably get fixed for 1.37 or 1.38 later this year.

If you've got any code to share which reproduces the problem, it'd save me some time and improve the odds I'll get this slipped into 1.36-beta3 this weekend.
 
Paul, my design depends on "Serial(1,2,3).availableForWrite()" returning a believable commited value so the various serial service routines can run un-blocked.

As of 6/29/2020, has it been fixed?
 
Hi Paul,

I am developping a similar application with a Teensy 4.0 and I have the exact same issue as described by CraigF: Every ~2ms, the Serial.write() function blocks the code for ~5µs even though I took care to use the Serial.availableForWrite() function before.

I have few questions:
  • Is it the same issue than in the Teensy 3.6?
  • If so, could you please give me more details about how to apply the fix you suggested? I don't really understand this:


You could try adding that line mentioned in the comment. Or you could just subtract 64, or a 63 for a worst case partial packet of 1 byte which got taken away by the automatic transmit flushing.

Thanks
 
Serial.availableForWrite() should work on Teensy 4.0. If you've found a circumstance where it is not reliable, please start a new thread (this very old thread is about Teensy 3.2) and post a complete test case for reproducing the problem.

Sadly, on Teensy 3.x, Serial.availableForWrite() can be tricky because USB buffers are shared between all endpoints. The USB implementation is completely different on Teensy 3 vs 4. Teensy 4 has dedicated buffers for each endpoint. Best to open a new thread if you're having this problem on Teensy 4.0. Again, please remember you need to show a complete program and steps to reproduce the problem.
 
Back
Top