Forum Rule: Always post complete source code & details to reproduce any issue!
Results 1 to 11 of 11

Thread: Teensy 3.2 USB Serial "non-blocking" write isn't really non-blocking

  1. #1
    Member
    Join Date
    Apr 2016
    Location
    Honolulu, Hawaii
    Posts
    22

    Teensy 3.2 USB Serial "non-blocking" write isn't really non-blocking

    My teensy 3.2 device has an ISR that reads a sensor and writes to a circular inBuffer that handles overflow gracefully. A main loop processes data from the inBuffer, triggers some time-sensitive external actions, and writes logging output to a circular outBuffer that also overflows gracefully. The last step in the main loop is a non-blocking write to the USB Serial port, which is read by a host PC.

    Everything runs fine with data rates of 1 - 2 Mbps and no problems for days on end, until the host PC gets distracted and fails to read the serial port for some period of time. At that point my Serial.Write call intermittently blocks ("forever" or until the PC begins reading again) even though the code is written to be non-blocking. And of course, blocking is a disaster in my scenario.

    Here's where the code blocks:

    Code:
    const int serialWriteTimeout = 1000;	// microseconds
    void SendDataToHost()
    {
    	elapsedMicros serialWriteTime;
    	while (serialWriteTime < serialWriteTimeout)
    	{
    		int bytesAvail = outBuffer.getCountAvailForRead();
    		if (bytesAvail == 0)
    			break;	// quit when there's no remaining data to send
    		int serialOutAvail = Serial.availableForWrite();	// normally this will be 64
    		if (serialOutAvail == 0)	
    			continue;
    		int bytesWanted = (bytesAvail > serialOutAvail) ? serialOutAvail : bytesAvail;
    		int bytesToSend = outBuffer.readBytes(USB_OutBuffer, bytesWanted);
    		Serial.write(USB_OutBuffer, bytesToSend);	
                    // on the teensy this shouldn't block because the serial buffer has promised there's room for this data
                    // Unfortunately, it DOES block, at least some of the time.
    	}	
    }
    After convincing myself this was indeed the problem, I looked into the serial implementation and found that in Nov 2014 Paul added a ToDo in the code:
    Code:
    int usb_serial_write_buffer_free(void)
    {
    	uint32_t len;
    
    	tx_noautoflush = 1;
    	if (!tx_packet) {
    		if (!usb_configuration ||
    		  usb_tx_packet_count(CDC_TX_ENDPOINT) >= TX_PACKET_LIMIT ||
    		  (tx_packet = usb_malloc()) == NULL) {
    			tx_noautoflush = 0;
    			return 0;
    		}
    	}
    	len = CDC_TX_SIZE - tx_packet->index;
    	// TODO: Perhaps we need "usb_cdc_transmit_flush_timer = TRANSMIT_FLUSH_TIMEOUT"
    	// added here, so the SOF interrupt can't take away the available buffer
    	// space we just promised the user could write without blocking?
    	// But does this come with other performance downsides?  Could it lead to
    	// buffer data never actually transmitting in some usage cases?  More
    	// investigation is needed.
    	// https://github.com/PaulStoffregen/cores/issues/10#issuecomment-61514955
    	tx_noautoflush = 0;
    	return len;
    }
    This (loss of atomicity) is the likely mechanism causing my non-blocking code to block. My questions are:

    Question #1: Is there anything that can be done purely in user code to achieve reliably non-blocking data transmission?

    Question #2: Has anybody else already investigated this?

    Question #3: I have added the line suggested in Paul's note and Serial.Write now appears to behave as expected/desired. Can anybody explain what kinds of scenarios might cause issues (IOW, how to test).

    Question #4: Is there a best practice for using a modified library while still applying bulk library updates periodically?

    Thanks!

    -- Craig
    Last edited by CraigF; 02-17-2017 at 06:52 AM.

  2. #2
    Senior Member
    Join Date
    Jul 2014
    Posts
    1,381
    Quote Originally Posted by CraigF View Post
    Everything runs fine with data rates of 1 - 2 Mbps and no problems for days on end, until the host PC gets distracted and fails to read the serial port for some period of time. At that point my Serial.Write call intermittently blocks ("forever" or until the PC begins reading again) even though the code is written to be non-blocking.
    I would expect this to happen
    you write non-blocking into USB buffer, but if the PC is not reading the data the buffer may fill up, and then?
    You have to decide, what to do
    drop data or wait until PC drains data again.
    It is a well known problem in all real time application.
    My own preferred solution is to drop the data, (this requires that the data are time-stamped, so you can see the dropout)
    For this you need the possibility to read the buffer status before writing data to USB buffer.

    if
    Code:
    void usb_tx_isochronous(uint32_t endpoint, void *data, uint32_t len)
    does that I do not know.
    If not maybe a method could be added to usb_dev.c

  3. #3
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    15,159
    Yeah, this is a really tough situation. That's why I left that TODO in the code...

    The possible downside to adding that line is certain behavior from your code could create a deadlock condition, where you're waiting for a certain set of conditions, but the act of waiting prevents them from happening. Even if your code never does that, I need to consider these things carefully before publishing for everyone to use.
    Last edited by PaulStoffregen; 02-17-2017 at 09:21 AM.

  4. #4
    Member
    Join Date
    Apr 2016
    Location
    Honolulu, Hawaii
    Posts
    22

    Cool

    Quote Originally Posted by WMXZ View Post
    I would expect this to happen
    you write non-blocking into USB buffer, but if the PC is not reading the data the buffer may fill up, and then?
    You have to decide, what to do
    drop data or wait until PC drains data again.
    It is a well known problem in all real time application.
    My own preferred solution is to drop the data, (this requires that the data are time-stamped, so you can see the dropout)
    For this you need the possibility to read the buffer status before writing data to USB buffer.

    if
    Code:
    void usb_tx_isochronous(uint32_t endpoint, void *data, uint32_t len)
    does that I do not know.
    If not maybe a method could be added to usb_dev.c
    Thanks for the response.

    My device can handle the scenario where the PC is not listening -- as described in my original post, I am applying some algorithms and delivering real-world outputs directly from the device, and you can think of the link to the PC as being a logging function. The device simply throws away data when it runs out of buffer space. So I can tolerate the PC not listening, but I *cannot* tolerate my teensy code blocking on a "non-blocking write" because that shuts down the primary function of the unit.
    Last edited by CraigF; 02-20-2017 at 06:24 AM.

  5. #5
    Member
    Join Date
    Apr 2016
    Location
    Honolulu, Hawaii
    Posts
    22
    Quote Originally Posted by PaulStoffregen View Post
    Yeah, this is a really tough situation. That's why I left that TODO in the code...

    The possible downside to adding that line is certain behavior from your code could create a deadlock condition, where you're waiting for a certain set of conditions, but the act of waiting prevents them from happening. Even if your code never does that, I need to consider these things carefully before publishing for everyone to use.
    Hi Paul. Thanks for responding, and thanks for making this processor available in such a great form factor with so much library support!

    After several days of testing I've concluded that the current library is not non-blocking in any meaningful sense of the word. It will honor the non-blocking behavior some of the time, but will block within 2000 msec 100% of the time if the PC stutters and then takes a long timeout (e.g., the stutters and timeouts commonly caused by Adobe Acrobat Reader).

    Could you please give me a tiny clue more than you have in your comment? Under what conditions would my querying Serial.availableForWrite() create a deadlock? I have a hard time picturing it, since I'm not requesting any locks and the library has complete control of any internal locks... As you can see in the code that I posted, I merely check bytes available for write on each loop. The loop time is typically about 15 usec, but every 10th loop or so it will be 15,000 usec (because an aliquot of data was received and needs to be processed). I could check less frequently, if that's the potential problem.

  6. #6
    Senior Member defragster's Avatar
    Join Date
    Feb 2015
    Posts
    4,694
    Quote Originally Posted by CraigF View Post
    ...
    Could you please give me a tiny clue more than you have in your comment? Under what conditions would my querying Serial.availableForWrite() create a deadlock? I have a hard time picturing it, since I'm not requesting any locks and the library has complete control of any internal locks... As you can see in the code that I posted, I merely check bytes available for write on each loop. The loop time is typically about 15 usec, but every 10th loop or so it will be 15,000 usec (because an aliquot of data was received and needs to be processed). I could check less frequently, if that's the potential problem.
    I read Paul's comment as regarding a general solution in place of the TODO. If the user code never writes when space is not available ( by querying Serial.availableForWrite() ) then there will be no blocking and the user code can decide when to discard data that won't fit. The general solution would be at the lower level of the TODO and that is where the Catch_22 comes in causing potential for deadlock.

  7. #7
    Member
    Join Date
    Apr 2016
    Location
    Honolulu, Hawaii
    Posts
    22
    Quote Originally Posted by defragster View Post
    I read Paul's comment as regarding a general solution in place of the TODO. If the user code never writes when space is not available ( by querying Serial.availableForWrite() ) then there will be no blocking and the user code can decide when to discard data that won't fit. The general solution would be at the lower level of the TODO and that is where the Catch_22 comes in causing potential for deadlock.
    ummm -- perhaps I've misunderstood what's going on, but from what I can see the situation is a little different. If you look at my code you will see that I never write when space is not available. Nonetheless, the Serial.write call consistently blocks whenever the overall throughput is moderately high and the receiving machine hiccups.

    The reason it happens is that the library code as written is not atomically "thread-safe," so the service IRQ can give away a buffer between the time it was promised to me and the time I try to write to it. In that case the code will block.

    I put in place the library fix that Paul suggested (three years ago in his ToDo) and thus far it is working perfectly. However, I'm still concerned that I may encounter some other failure mode, since I really don't understand the special circumstances under which my code could result in a Serial.write deadlock.

  8. #8
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    15,159
    I've put this on my list of bugs to fix. It certainly is a bug in Teensyduino's core library.

    Can't promise a time frame, and realistically it may be weeks or months, but it's now on my list.

    Quote Originally Posted by CraigF View Post
    After several days of testing I've concluded that the current library is not non-blocking in any meaningful sense of the word. It will honor the non-blocking behavior some of the time, but will block within 2000 msec 100% of the time if the PC stutters and then takes a long timeout (e.g., the stutters and timeouts commonly caused by Adobe Acrobat Reader).
    Normally I don't investigate issues without a complete program. In this case I will. But still, when I do work on this, all I have to go on is the code fragment you posted. I'm going to have to fill in the rest with guesswork... and in this scenario I'm guessing you've probably got data streaming in both directions? Maybe?

    If you can give me anything more to go on, it'll save me time when I get around to this bug, and it'll improve the changes I come up with a fix that really addresses your circumstances.

  9. #9
    Member
    Join Date
    Apr 2016
    Location
    Honolulu, Hawaii
    Posts
    22
    Quote Originally Posted by PaulStoffregen View Post
    I've put this on my list of bugs to fix. It certainly is a bug in Teensyduino's core library.

    Can't promise a time frame, and realistically it may be weeks or months, but it's now on my list.



    Normally I don't investigate issues without a complete program. In this case I will. But still, when I do work on this, all I have to go on is the code fragment you posted. I'm going to have to fill in the rest with guesswork... and in this scenario I'm guessing you've probably got data streaming in both directions? Maybe?

    If you can give me anything more to go on, it'll save me time when I get around to this bug, and it'll improve the changes I come up with a fix that really addresses your circumstances.

    I'll post a complete repro.

    Only reason I didn't do so before is that I was specifically asking about the safety of the proposed fix or any user-code workarounds (e.g., wait until exactly 64 bytes are available).

    In case you find yourself thinking about it before I post the repro, the Teensy side is straightforward: timer interrupt places 32K bytes/sec into a circBuffer, teensy main loop listens (non-blocking) for a single control character to End Program, and streams Serial.write (non-blocking) whenever data is available.

    Host side is slightly harder to repro -- my organic host glitches are induced by a deadlock interaction between Visual Studio, NetGenie, Akamai, and Adobe Acrobat, but I'll write a host-side simulator.


    Weeks or months is fine, because the ToDo fix has unblocked me completely. I just don't want to find myself scratching my head over some seemingly unrelated glitch once the device is in the field

    Thanks!

    -- Craig
    Last edited by CraigF; 02-21-2017 at 09:16 PM.

  10. #10
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    15,159
    I've been thinking on this for the last few days. My initial plan is probably going to involve adding a flag to remember an availableForWrite promise has been made. Later when the packet is to be automatically committed to send, if the flag is set the SOF interrupt code will do an extra check to allocate a blank buffer and make sure the promise isn't broken. I might also make the availableForWrite() promise a little more conservative, perhaps promise one less buffer if there are a certain number available, so when it comes time to make sure the promise it kept there will be a higher likelihood the partially full packet can be sent without rescheduling for a future SOF.

  11. #11
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    15,159
    Quote Originally Posted by CraigF View Post
    I'll post a complete repro.
    Planning to look at this issue over the weekend, and (hopefully) get a fix into 1.36-beta3. If things slip (going to freeze most changes after beta3), this issue will remain in the 1.36 release and probably get fixed for 1.37 or 1.38 later this year.

    If you've got any code to share which reproduces the problem, it'd save me some time and improve the odds I'll get this slipped into 1.36-beta3 this weekend.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •