Large delays in code when using Serial.print, etc.

Status
Not open for further replies.

mHo2

Member
Hi All,
I seem to be receiving huge delays with writing to the serial port in a teensy.
I'm currently running the following code:

Code:
void loop() {
 // put your main code here, to run repeatedly:
   for(int i = 0; i < 30; i++) { 
   bool sysReady = initiate();
   if(sysReady){
     readInput();      //read input from Pi
   }
 }
   draw();       //send entire array from Pi to LED's
    if(receivedData && shiftRegCount == 0) {
     readSensors();  //Send entire array to Pi
   }
}

This code continually grabs 30 bytes to be read from a device on the other end of the serial port, then draws one column on my matrix, then sends information back to the device on the other end of the serial.

When I don't send serial communication from the teensy to the other device (in this case my PC running pySerial) I get awesome framerates, maybe 60fps. However when I start sending data back, my program slows down to a crawl. See attached video:

http://youtu.be/sMRN28Zhrjw

Any suggestions?

The functions referenced in the above code are as follows:
Code:
//========================READ SENSORS==========================
void readSensors(){
  for(int i=0; i<yMax; i++){
    //
    //digitalWrite(S0,bitRead(i,S0));
    //digitalWrite(S1,bitRead(i,S1));
    //digitalWrite(S2,bitRead(i,S2));
    //digitalWrite(S3,bitRead(i,S3));
    //digitalWrite(4,0);        we can just ground this
    
    for (int j=0; j<xMax; j++){
        //Serial.write(char(255));
        //Serial.write(char((i)*10 + (j) + 1));
        //Serial.write(char(i+j+1));
        //Serial.write(char(0));
    }
  }
  Serial.println("This is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, this is a test, ");
} 
void draw(){
  for(int j = 0; j<48; j++) {
      setGSData(j,ledArray[shiftRegCount][j%16][j/16]);
  }
    //Serial.println(j);
    //delay(50);  
  //Send out values and shift the shift register
    sendGSData(shiftSelect[shiftRegCount]);
  //Increment count
   ++shiftRegCount;
   if(shiftRegCount > 7) {
      shiftRegCount = 0;
   }
}

//=========================READ INPUT FROM PI==============================
void readInput() {
   inByte = Serial.read();  //reads first character in buffer from incoming serial data
   payloadByte = int(inByte);  //typecasts inByte from a char to an int
   if (inPacket) {  //If we're in a packet (have read start bit), then continue
     if(payloadByte == 0) {  //Checks to see if we're reading the stop bit
         inPacket = false;      //Sets inPacket to false, meaning we're not in a packet and will look for another start bit
         //Once we hit the stop bit, package has been read and we update the mask
         //ledArray[currentValue[0]][currentValue[1]][6]=255; 
         if(++updateCount > 159) {
           updateCount = 0;
           if(writeToLeftArray == 0) {
             writeToLeftArray = 1; 
             shifter = 3;
           }
           else {
              writeToLeftArray = 0; 
              shifter = 0;
           }
         }
     }
     else {
         if(payloadByte == 255) {  
           inPacket = true;
           packetPlacement = 0;
         }
         if(packetPlacement == 0) {                  //First piece of data in packet denote position on table
             int x = 0;
             int y = 0;
             if(payloadByte > 160) {
               payloadByte = 160; 
             }
             xyconvert(payloadByte-1, &x, &y);       //Converts position to x-y coordinates
             currentValue[0] = x;
             currentValue[1] = y;
         }                                           //Next three bytes will denotes r,g,b brightness,
         else if(payloadByte == 1) {     //                      
             payloadByte = 0;
         }
         else if(payloadByte == 254) {
             payloadByte = 255;
         }
         
         if(packetPlacement > 0 && packetPlacement < 4) {
           if(writeToLeftArray == 1){                                                                  //Determines which array to write to
             ledArray[currentValue[0]][currentValue[1]][packetPlacement-1] = map(payloadByte,0,255,0,4095);            //Writing to first array
           }else{
             ledArray[currentValue[0]][currentValue[1]][packetPlacement+2] = map(payloadByte,0,255,0,4095);            //Writing to second array
           }
         }
         if(packetPlacement < 0 || packetPlacement > 3) {
             Serial.println("Error: Incorrect data format");
         }
         if(++packetPlacement > 6) {
           inPacket = false;
         }    
     }
   }
   else {
     if(payloadByte == 255) {  //ASCII start bit
         inPacket = true;
         receivedData = true;
         packetPlacement = 0;
     }
   }
 //Logic to flip from reading to left to reading from right and vice versa for data input from serial
}
 
Hi

Is Serial.println() a blocking function (Only returns when it has completed and the data has been transmitted)?

If it is you will need to use an alternative one that puts the debug output to be sent to a buffer and sends it out by interrupts or DMA. This will then not impact speed of the application in any noticeable way.

Regards

Mark
 
Good question, uTasker. I should find that out. It almost might be worthwhile to attempt to multithread this. D'you know if that's possible?
 
true mHo2!
but I've noticed the same, Teensy 3x serial over usb blocks..

Interesting, I read up more on http://www.pjrc.com/teensy/td_serial.html
It states it doesn't..
"On a standard Arduino, this function waits while the data is transmitted. With Teensyduino, Serial.print() and Serial.println() typically return quickly when the message fits within the USB buffers. See Transmit Buffering below."
But further on it states:
All USB bandwidth is managed by the host controller chip in your PC or Mac. When a full or partially filled buffer is ready to transmit, it actual transmission occurs when the host controller allows. Usually this host controller chip requests any scheduled transfers 1000 times per second, so typically actual transmission occurs within 0 to 1 ms after the buffer is scheduled to transmit. If other devices are using a lot of USB bandwidth, priority is given to "interrupt" (keyboard, mouse, etc) and "isychronous" (video, audio, etc) type transfers.

When the host controller receives the data, the operating system then schedules the receiving program to run. On Linux, serial ports opened with the "low latency" option are awakened quickly, others usually wait until a normal "tick" to run. Windows and MacOS likely add process scheduling delays. Complex runtime environments (eg, Java) can also add substantial delay.

Perhaps it's on the other devices end? Does anyone know if there's a way to make this non-blocking or some how multi-threaded? The 'draw' function i use is time critical and can't have that.
 
Serial.print() writes your data into the USB buffers. If there's enough space in the buffers, it returns quickly. If not, it waits for more space.

On Teensy 2.0, there are always two 64 byte buffers. On Teensy 3.0 & 3.1, there's a pool of many buffers, but the pool is shared between all endpoints. One downside is if the PC sends a lot of data to Teensy, but you don't read it, that incoming data can fill up all the buffers and prevent Serial.print() from getting any buffers to write data. So that's the first thing to check... if your PC is simultaneously sending lots of data that's filling up the USB buffers.

The other potential issue is whether code on the PC side is consuming Teensy's data fast enough. USB has end-to-end flow control, so the outgoing packets don't become free buffers on the Teensy side until the PC acknowledges reception. You'd think a many-GHz PC wouldn't limit the speed, but often it can. Especially reading 1 byte at a time is very slow on Windows (incredibly slow, like only several kbytes/sec), due to some limitations on the Windows drivers. Reading in small chunks and updating a GUI synchronously (the code doesn't try to read more until the GUI repaint is done) also tends to limit the speed.
 
@Paul

Is there any way to make this a non blocking call to print regardless of buffer space? I can try to play with my read/write/draw ratios, but i'm not confident that it will work well.

EDIT:
I can definitely provide full code and details on how to reproduce the problem. The only issue is there is a lot of it. I'm running the Teensy 3.1 on win 7 (eventually it'll run on raspbian). I'm also using a custom TLC5940 library for the Teensy 3x..
 
Hi

The Kinetis USB transmitter has a ping-pong buffer (per bulk endpoint) and so it should be possible to write up to 128 bytes (64 byte endpoints) in total and leave the USB to send these 2 buffers. If the dual-buffers are not being used (just 1) the limit would be one write or up to 64 bytes in size.
If the routine waits until it can place its data in the buffer (if the buffer is presently in use) it could cause a worst case delay of maybe 1ms but the host will usually be sending IN token much faster than that.

You didn't clarify what "huge" delays means: in the ns, us, ms, seconds or years regions?

If you are writing a large block of data (more than enough to require > 2 buffers to be written to) the driver can not put this to the USB buffers and so either waits (blocking) or else deposits the data in an internal queue and returns. The USB driver then feeds the content of the internal queue to the USB engine whenever there is a free buffer space.
Using an 8k buffer (for example) it would then be possible to write up to 8k of data from the application without any delays apart from the copy time (to the queue). The USB driver would then handle about 51 USB interrupts to clear the buffer in 64 byte chunks and the transfer should be complete in about 8ms. The application would be doing further work in the meantime and so not notice this, and it coudl be adding more output data to the USB queue ithout any consequences (up to the queue length limit).

There is a description of the USB interface from the uTasker project at http://www.utasker.com/docs/uTasker/USB_User_Guide.PDF which discusses buffered rx and tx of this nature.

Regards

Mark
 
Full details are as follows:

Used devices:
Teensy 3.1
Tlc5940(x3)
SN74HC595
8x8 LED matrix (bi-color)
PC with 3gHz quad core CPU and full 2.0 USB spec.

Used software:
Arduino 1.0.5r2 IDE
TLC5940 custom library
Teensy supplied Serial library
Python 2.7
pySerial
threading
time


Python code:
http://pastebin.com/2RYkQYgf
Arduino code:
http://pastebin.com/jyffW6vd
TLC5940 Code*:
http://pastebin.com/hgMjc0mz
http://pastebin.com/v0SXL58c
* - not sure if the original creator of these wanted their name attached after I edited these, so it's out for the time being.

HOW TO RECREATE:

Hook up the TLC to the following pins on the teensy:
TLC5940 GSCLK -> teensy pin 9 (PWM)
TLC5940 BLANK -> teensy pin 10
TLC5940 XLAT -> teensy pin 12
TLC5940 VPRG -> teensy pin 15

TLC5940 SCLK -> teensy pin 13
TLC5940 SIN -> teensy pin 11
TLC5940 SOUT -> unconnected (I'm not querying the TLC5940 and thus don't
need communication from it)


Hook up the shift register to the following pins on the teensy:
Latch pin = 2
Clock pin = 7
Data pin = 8
Output Enable pin = 6

Interconnect them to your LED matrix as you would normally.

Run python code while teensy is connected to your PC (make sure to change the python code to accept the device that is common to your OS! so linux would be /dev/tty*, but windows would be COM*)
 
If you are writing a large block of data (more than enough to require > 2 buffers to be written to) the driver can not put this to the USB buffers

Actually, Teensy 3's USB stack can do exactly that. A large pool of buffers are shared among all the endpoints and dynamically assigned to the ping-pong hardware.

It's very easy to achieve the full ~1 Mbyte/sec USB speed on Teensy 3, even if your program uses libraries like LiquidCrystal that have some delays, as long as the software on the PC side keeps up.
 
I took a quick look at your Python code. I haven't run it yet (Robin & I have non-work plans this evening) and I'm not a Python expert....

But it looks like you might be attempting a command-response protocol, where the Pi sends something and waits for a reply before sending the next thing. There's quite a bit of code and it's somewhat complex, so I can't unravel it all in just a few minutes (the time I have left today).

If you are indeed using a send-and-wait-reply on the Pi side, that approach always going to be slow. This has been discussed many times before on this forum and other places, and I need to get off the computer now, so I'm not going to rewrite all that...
 
Nope, only in the case of the initial handshake (to confirm that a teensy is on the other end loaded with my code) is a send-and-wait-reply style of comms. The rest blindly sends it in the form of:
255
pos
R
G
B
0

and receives data by
255
pos
intensity
0
 
Nope, only in the case of the initial handshake (to confirm that a teensy is on the other end loaded with my code) is a send-and-wait-reply style of comms. The rest blindly sends it in the form of:
255
pos
R
G
B
0

and receives data by
255
pos
intensity
0

Just updating on the issue, still no solution, but my current workaround is to actually use two teensy's. One for data capture and one for data display. Both being connected to python.
 
Status
Not open for further replies.
Back
Top