Data dropouts in serial transfer over USB

There are a couple of issues to consider here. The belief in the forum I referenced is that Hantek is probably overstating the speed of their product through some contrived specsmanship. They are using a Cypress processor and apparently DMA to do block transfers where the blocks are fairly large. One also needs to note the difference between MSPS (megasamples per second), MBps (megabytes per second), and Mbps (megabits per second). I believe that USB 2 is rated at 480 Mbps, or 60 MBps. By my reading, the Teensy can't keep up with those numbers, so maybe the transfer method they use and the DLL on the PC side will help if it fits the application.
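To keep the three units apart, here is a minimal sketch of the conversions. The function names are made up for illustration, the 24 MSPS value is an arbitrary example rather than a Hantek spec, and the 480 Mbps and 12 Mbps figures are raw signalling rates, so real payload throughput will be lower once protocol overhead is counted.

```python
def mbits_to_mbytes(mbps):
    """Megabits per second -> megabytes per second (8 bits per byte)."""
    return mbps / 8.0


def msps_to_mbytes(msps, bytes_per_sample=1):
    """Megasamples per second -> megabytes per second for a given sample width."""
    return msps * bytes_per_sample


if __name__ == "__main__":
    print(mbits_to_mbytes(480))                    # USB 2 high-speed signalling rate
    print(mbits_to_mbytes(12))                     # 12 Mbit/sec figure quoted later in the thread
    print(msps_to_mbytes(24, bytes_per_sample=1))  # arbitrary example sample rate
```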

I was specifically looking at accelerometer data from a Teensy LC to a PC and watching it using processing.exe. The data just zooms by, whereas I was looking for impulses and needed to find them in the buffer. The open6022 has fairly stable software triggering, so one can set a threshold trigger like on an o'scope and start at that point in the buffer. That works very well on the open6022 and is junk on the software supplied by Hantek, which isn't even slightly stable. For my purposes, outputting the accelerometer data on a D/A pin and watching it on the Hantek with the open source software will work. All I can do is confirm that the open source software works with a dirt cheap USB o'scope and that its software triggering is rock stable.
 
There is an updated version of TYQT that is working properly with my sketch at 6.9 Mbps [863,123 Bytes/sec]. I think the Teensy 3.x USB is quoted at just over that; the LC will be slower.

<Edit> - I was overshooting on throttle increment - it runs reliably with 58 usec wait with TYQT:
__throttle [ NO SEND NOW ] usecs =58
__1000 xfer ms time=62__1000 xfer Wait Loops ms=6951__Test Time ms=2319

This is 1,155,670 bytes per second - or 9.245364 mega bits per second

<Edit 2:> this is too fast - some 1224 strings are lost at this rate with GUI display - but it still runs!

https://forum.pjrc.com/threads/27825-Teensy-Qt?p=90764&viewfull=1#post90764

I just checked and 40,000 strings of 67 chars printed into TYQT in 3.105 seconds with no data loss or corruption from Teensy 3.2. Use the sketch in that post; to run slower, change "#define THROTTLEINIT 50" to the wait, in microseconds, that you want imposed at loop entry.

It has done this a total of maybe 150 times with no stalling! The current instance has run 66 times where SerMon would never make 8. This is faster than the command line - and way more reliable and usable than IDE SerMon - or the old TYQT. There is NO PROBLEM WITH TEENSY USB that I can see - any data issues are on the PC receive side.

current sketch: View attachment USB_Throttle2.ino
 
My suggestions were purely on the PC side. I have no opinion about the Teensy side other than to say everything I've tried has worked correctly so that's not where I'd start to look for a problem.
 
@veng1 - my experience agrees with yours. If there is data loss, it is on the receiver end. And as the TYQT update shows, it can be done well without slowing the Teensy too much, and the Teensy is not behind the data loss or the stalls or hangs in the USB transfer. Thanks PJRC and KOROMIX!

I just confirmed no data loss (one test sample) at 1.143 MBps - 9.146 Mbps. The sender Teensy in my case monitors the processor microseconds for any indication of blocking and then adds delay before the next set of transmissions so no print blocks. With no feedback from the receiver the Teensy has no way of knowing if the data was able to be buffered safely on receipt. In my case I print 67 characters on one line. I captured the TYQT output and there are 40,000 well formed lines at this rate (the first 9999 lines are at least one char short of 67 because the content is the 'iteration count')!
__throttle [ NO SEND NOW ] usecs =60
__1000 xfer ms time=63__1000 xfer Wait Loops ms=7071__Test Time ms=2344

<edit> back to the OP topic - after running 117 cycles TYQT is doing well!!! The throttle went higher, to 120 usecs, I assume because TYQT was taking longer as it manages all the buffered data - dumping the old and re-using the space. This was a natural fallout of my provided sketch monitoring blocking from the prints. I cleared the TYQT buffer and the time is back down. The GUI display and buffer management are impacting TYQT performance - but it is not locking, stalling, or dropping the connection. Clearing the buffer is the answer in this case, as this is not normal operation.

<edit> switched back to SerMon and same code won't run 3 complete passes where TYQT never fails. TYQT grows and shrinks RAM with buffer shuffling - the IDE SerMon only GROWS - doesn't shrink on closing a stalled SerMon. Going to recompile on an LC and see what happens.
 
The OP didn't ask about performance on an LC - but this shows the promise of the offered code self adjusting to the limitations at hand where now the LC USB hardware changes things. The time spent in the loop() printing takes longer of course at 80 usecs versus 30 usecs - but the net USB output is remarkably similar - especially as the OP found the problem is on the receiving end! I have no measure of CPU utilization in this code.

These times are probably too fast and may have unverified dropped lines in TYQT:
The non send_now() case still wins in running with 100 usecs wait needed to clear the buffer versus 60 usecs on the T_3.1
The send_now() case runs 226 usecs wait needed to clear the buffer versus 209 usecs on the T_3.1

I tuned from too fast toward slower on the T_3.x, and the code reflected that. The LC test shows that starting the throttle well over the needed time ends up adjusting it further upwards because of the time spent in loop(). I had a hardcoded 10 usecs of tolerance for deciding when a larger throttle was needed. The LC easily overruns this in normal operation, as would the T_3.1 on occasion, so the fix for now is to raise the 10 to 20:
Code:
#define THROTTLEERROR 20
if ( runningtemp > ( throttle + THROTTLEERROR )) {
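As a rough model of how that tolerance check could drive the self-adjusting throttle, here is a minimal Python sketch. It is not the attached sketch: the function names, the 1 usec step, and the sample loop times are all assumptions made for illustration. Only the comparison against throttle plus tolerance comes from the fragment above.

```python
THROTTLE_ERROR_US = 20   # tolerance, mirroring THROTTLEERROR above
THROTTLE_STEP_US = 1     # hypothetical increment, not taken from the thread


def next_throttle(throttle_us, measured_us,
                  tolerance_us=THROTTLE_ERROR_US, step_us=THROTTLE_STEP_US):
    """Raise the throttle only when the measured loop time overruns it
    by more than the tolerance; otherwise leave it unchanged."""
    if measured_us > throttle_us + tolerance_us:
        return throttle_us + step_us
    return throttle_us


def settle(throttle_us, measured_times_us, tolerance_us=THROTTLE_ERROR_US):
    """Feed a series of measured loop times through next_throttle()."""
    for measured_us in measured_times_us:
        throttle_us = next_throttle(throttle_us, measured_us, tolerance_us)
    return throttle_us


if __name__ == "__main__":
    loop_times_us = [62, 64, 95, 70, 130, 61]   # made-up measurements
    print(settle(60, loop_times_us, tolerance_us=10))
    print(settle(60, loop_times_us, tolerance_us=20))
```

With the smaller tolerance, ordinary jitter in the loop time is enough to push the throttle up, which is the behaviour the change from 10 to 20 is meant to reduce.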

Teensy LC USB results - running against TYQT:
__throttle [ NO SEND NOW ] usecs =100
__1000 xfer ms time=111__1000 xfer Wait Loops ms=666__Test Time ms=4074

__throttle [ with SEND NOW ] usecs =226
__1000 xfer ms time=235__1000 xfer Wait Loops ms=16766__Test Time ms=8705

Versus the Teensy 3.1:
__throttle [ NO SEND NOW ] usecs =60
__1000 xfer ms time=64__1000 xfer Wait Loops ms=7081__Test Time ms=2345

__throttle [ with SEND NOW ] usecs =209
__1000 xfer ms time=213__1000 xfer Wait Loops ms=46796__Test Time ms=7880
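To compare the four runs on a common footing, the result lines can be turned into byte rates. This is a sketch under one assumption: that "Test Time ms" covers the 40,000 lines of 67 characters described earlier in the thread (the earlier 2319 ms and 1,155,670 bytes per second figures are consistent with that). The regular expression and names are illustrative only.

```python
import re

RESULT_RE = re.compile(
    r"xfer ms time=(\d+)__1000 xfer Wait Loops ms=(\d+)__Test Time ms=(\d+)"
)
TEST_BYTES = 40000 * 67   # assumption: bytes sent during one "Test Time"

RESULTS = {
    "LC, no send_now": "__1000 xfer ms time=111__1000 xfer Wait Loops ms=666__Test Time ms=4074",
    "LC, send_now": "__1000 xfer ms time=235__1000 xfer Wait Loops ms=16766__Test Time ms=8705",
    "T_3.1, no send_now": "__1000 xfer ms time=64__1000 xfer Wait Loops ms=7081__Test Time ms=2345",
    "T_3.1, send_now": "__1000 xfer ms time=213__1000 xfer Wait Loops ms=46796__Test Time ms=7880",
}


def parse_result(line):
    """Pull the three integer fields out of one result line."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError("not a result line: %r" % line)
    xfer_ms, wait_loops, test_ms = (int(group) for group in match.groups())
    return {"xfer_ms": xfer_ms, "wait_loops": wait_loops, "test_ms": test_ms}


def bytes_per_second(test_ms, total_bytes=TEST_BYTES):
    """Average byte rate over one test run."""
    return total_bytes * 1000.0 / test_ms


if __name__ == "__main__":
    for label, line in RESULTS.items():
        rate = bytes_per_second(parse_result(line)["test_ms"])
        print("%-20s %10.0f bytes/s %7.3f Mbps" % (label, rate, rate * 8 / 1e6))
```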
 
I implemented binary packet-based communication and a Python listener that reads the port in a dedicated process. I don't see any dropped packets anymore.

I haven't yet figured out if there is an ideal packet length, and I haven't really pushed hard on benchmarking. I'm happy with this solution. Thanks to everybody for your help and suggestions.

firmware:
Code:
#include <Arduino.h>

// adapted from https://pypi.python.org/pypi/cobs/
uint8_t packet[256];

// payload length should be less than 254 to fit inside packet buffer of length 256
// with a deterministic overhead of 2 bytes (initial length and termination 0x00)

void cobs_encode(uint8_t * dst_ptr, uint8_t * src_ptr, uint8_t src_len) 
{   
    // encode data with constant overhead byte stuffing
    uint8_t * dst_code_write_ptr = dst_ptr;
    uint8_t * dst_write_ptr      = dst_code_write_ptr + 1;
    uint8_t search_len = 1;
    uint8_t src_byte;
    
    uint8_t * src_end_ptr = src_ptr + src_len;
    /* Iterate over the source bytes */
    if (src_ptr < src_end_ptr)
    {
        while(1)
        {
            src_byte = *src_ptr++;
            if (src_byte == 0)
            { /* We found a zero byte */
                *dst_code_write_ptr = (char) search_len;
                dst_code_write_ptr = dst_write_ptr++;
                search_len = 1;
                if (src_ptr >= src_end_ptr)
                {
                    break;
                }
            }
            else
            { /* Copy the non-zero byte to the destination buffer */
                *dst_write_ptr++ = src_byte;
                search_len++;
                if (src_ptr >= src_end_ptr)
                {
                    break;
                }
                if (search_len == 0xFF)
                { /* We have a long string of non-zero bytes */
                    *dst_code_write_ptr = (char) search_len;
                    dst_code_write_ptr = dst_write_ptr++;
                    search_len = 1;
                }
            }
        }
    }
    /* We've reached the end of the source data.
     * Write the last code (length) byte.
     */
    *dst_code_write_ptr = (char) search_len;
     dst_code_write_ptr = dst_write_ptr;
}

static const char filler_ascii[146] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz0123456789";

void setup()
{
    Serial.begin(9600); // baud rate is ignored for Teensy USB serial; USB runs at 12 Mbit/sec
    delay(5000); //wait for user to start up serial listener.
}

#define PACKET_SIZE 128

void loop() 
{
    uint32_t time_prev = 0;
    uint32_t time_start = micros();
    const uint32_t packets = 10000;
    time_prev = time_start;
    uint32_t actual_delay = 0;
    uint32_t excess_delay = 5; //sets first loop delay
    
    Serial.write(0x00);
    
    for(uint32_t iteration_count = 0; iteration_count<packets; iteration_count++){
        
        uint32_t extra_delay = 0;
        
        uint32_t target_delay = 1 + (uint32_t) ( (float) iteration_count * 999.0f / packets );
        
        if (actual_delay > target_delay){ // we are slower than desired
            excess_delay = actual_delay - target_delay;
        }
        if (target_delay > excess_delay) { // we should slow down
            extra_delay = target_delay - excess_delay;
            delayMicroseconds(extra_delay);
        }
        
        uint32_t time_now = micros();

        uint8_t payload[PACKET_SIZE-2]; //we need to save two bytes for COBS overhead
        uint32_t i = 0;
        payload[i++] = 0x00; // placeholder; overwritten with the XOR parity byte below
        
        for (uint8_t k = 0; k < 4; k++) { payload[i++] = ((uint8_t *) &iteration_count)[k]; }
        
        actual_delay = time_now - time_prev;
        for (uint8_t k = 0; k < 4; k++) { payload[i++] = ((uint8_t *) &actual_delay)[k]; }
        
        float attempted_xfer_rate = 8000.0f * (float) PACKET_SIZE / target_delay;
        for (uint8_t k = 0; k < 4; k++) { payload[i++] = ((uint8_t *) &attempted_xfer_rate)[k]; }

        for (uint8_t k = 0; k < 4; k++) { payload[i++] = ((uint8_t *) &extra_delay)[k]; }
                
        for (uint8_t k = 0; i < PACKET_SIZE-2; k++) { // leave room for parity and length code
            payload[i++] = (uint8_t) filler_ascii[k];
        }
        for (uint8_t i = 1; i < PACKET_SIZE-2; i++) { // compute parity
             payload[0] ^= payload[i];
        }
        
        cobs_encode(packet, payload, PACKET_SIZE-2);
        packet[PACKET_SIZE-1] = 0x00; //add terminating 0x00
        
        Serial.write(packet, PACKET_SIZE);
        time_prev = time_now;
    }

    while(1){};
}

listener:

Code:
#!/usr/bin/env python

import sys, os
import multiprocessing
import Queue #needed separately for the Empty exception
import time, datetime
import serial  # requires pyserial
import struct

class SerialReadProcess(multiprocessing.Process):
    def __init__(self, output_queue, port):
        multiprocessing.Process.__init__(self)
        self.output_queue = output_queue
        self.exit = multiprocessing.Event()
        self.f = None
        self.port = port
    def shutdown(self):
        self.exit.set()
    def run(self):
        with serial.Serial(self.port, timeout=0.1) as ser:
            while not self.exit.is_set():
                bytes_available = ser.inWaiting()
                if (bytes_available > 0):
                    try:
                        self.output_queue.put(ser.read(bytes_available), False)
                    except Queue.Full:
                        continue

def collectRawData(activeport):
    q = multiprocessing.Queue(maxsize=10000)
    s = SerialReadProcess(q, activeport)
    s.start()
    # write data from the queue to disk
    f = open(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'raw.txt'), 'wb')
    f.seek(0)
    data_collection_started = False
    print "READING DATA"
    while(True):
        try:
            data = q.get(False)
            f.write(data)
            data_collection_started = True;
            last_data_collection_time = time.time()
            sys.stdout.write('.')
        except Queue.Empty:
            if data_collection_started:
                if (time.time() - last_data_collection_time) > 1:
                    print "all done!"
                    break
    f.close()
    s.shutdown()
    
    

def parseRawData():
    
    print "PARSING DATA TO CSV"
    
    PACKET_SIZE = 128
    
    with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'parsed.csv'), 'w') as out:                   
        out.write("iteration, forced delay (usec), loop time (usec), attempted xfer (kbps), achieved xfer (kbps), filler data\n") #header line
        with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'raw.txt'), 'rb') as f:
            state = 0
            message = []
 
            byte_number = 1
            
            data = f.read(1)
            while data:
                #print byte_number, state, " ".join(hex(ord(n)) for n in data), ord(data[0]) == 0
                
                if (state == 0) and (ord(data[0]) == 0): #sync to next start
                    state = 1
                    message = []
                    data = f.read(PACKET_SIZE - 1) #grab encoded bytes without next 0x00
                    byte_number += PACKET_SIZE - 1
                elif (state == 1):
                    i = 0
                    while i<(PACKET_SIZE-1):
                        len_code = ord(data[i])
                       
                        if len_code == 0:
                            message = []
                            state = 0
                            break
                        len_code -= 1
                        i+=1
                        
                        if (len_code>0):
                            #print byte_number, len_code, i, len_code+i
                            for k in range(len_code):
                                if (k+i) > (PACKET_SIZE-2) or (ord(data[k+i]) == 0):
                                    message = []
                                    state = 0 #bad packet
                                    break
                                else:
                                    message.append(ord(data[k+i]))
                                    
                            
                        message.append(0)
                        i += len_code

                    if len(message) == (PACKET_SIZE-1):
                        parity_check = 0
                        for m in message:
                            parity_check = parity_check ^ m
                        if parity_check==0:
                            k =1
                            iteration_count     = struct.unpack("<L", ''.join([chr(m) for m in message[k:k+4]]))[0]
                            k+=4
                            loop_time           = struct.unpack("<L", ''.join([chr(m) for m in message[k:k+4]]))[0]
                            k+=4
                            attempted_xfer_rate = struct.unpack("<f", ''.join([chr(m) for m in message[k:k+4]]))[0]
                            k+=4
                            excess_delay_time   = struct.unpack("<L", ''.join([chr(m) for m in message[k:k+4]]))[0]
                            k+=4
                            ascii_filler    =  ''.join([chr(m) for m in message[k:len(message)-1]])
                            
                            #out.write("iteration, forced delay (usec), loop time (usec), attempted xfer (kbps), achieved xfer (kbps), filler data\n") #header line                 #
                            buffer = str(iteration_count) + ',' + str(excess_delay_time) + ',' + str(loop_time)+ ',' + str(attempted_xfer_rate) + ',' + str( 8000.0 * PACKET_SIZE / loop_time) + ',' + str(ascii_filler) + '\n'
                            out.write(buffer)
                            #sys.stdout.write('.')
                            
                    message = []
                    state = 0 
                else:
                    state = 0
                    data = f.read(1) #skip bad packets
    print "all done!"
    
if __name__ == '__main__':
    collectRawData('COM7')
    parseRawData()
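For anyone who wants to check the framing independently of the listener above, here is a compact Python 3 sketch of COBS encoding and decoding plus the XOR parity check. It is not the code I ran: the function names are my own for this illustration, and it assumes frames are separated by 0x00 and that the first payload byte is the parity byte, as in the firmware.

```python
def cobs_encode(payload):
    """Encode bytes so the result contains no 0x00 (no trailing delimiter added)."""
    out = bytearray([0])          # placeholder for the first code byte
    code_index, code = 0, 1
    for byte in payload:
        if byte == 0:
            out[code_index] = code
            code_index, code = len(out), 1
            out.append(0)         # placeholder for the next code byte
        else:
            out.append(byte)
            code += 1
            if code == 0xFF:      # long run of non-zero bytes: start a new block
                out[code_index] = code
                code_index, code = len(out), 1
                out.append(0)
    out[code_index] = code
    return bytes(out)


def cobs_decode(frame):
    """Reverse cobs_encode(); raise ValueError on a malformed frame."""
    out = bytearray()
    i = 0
    while i < len(frame):
        code = frame[i]
        block = frame[i + 1:i + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("corrupt or truncated COBS frame")
        out += block
        i += code
        if code != 0xFF and i < len(frame):
            out.append(0)         # every block except the last implies a zero
    return bytes(out)


def xor_parity(data):
    """XOR of all bytes; 0 for a payload whose first byte is its parity byte."""
    parity = 0
    for byte in data:
        parity ^= byte
    return parity


def add_parity(body):
    """Prepend the parity byte so the XOR of the whole payload is zero."""
    return bytes([xor_parity(body)]) + bytes(body)


def decode_stream(stream):
    """Yield each payload from a 0x00-delimited stream that decodes and passes parity."""
    for frame in stream.split(b"\x00"):
        if not frame:
            continue
        try:
            payload = cobs_decode(frame)
        except ValueError:
            continue              # drop bad frames and resync on the next 0x00
        if xor_parity(payload) == 0:
            yield payload
```

Splitting on 0x00 first means a damaged frame costs only that one packet, since the next delimiter resynchronizes the parser.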
 
I have completed my benchmarks. With optimized python code on my desktop I can get ~6.5Mbps average transfer rate if I'm willing to accept a variable loop time, or ~2.0Mbps if I want to avoid blocking. I don't think the packet size fed to Serial.write matters very much, other than the overhead in packet encoding and the lost data efficiency due to the COBS/parity/CRC overhead.

Here's what a 6.5Mbps transfer looks like:
16_0.jpg

and a "safe" 2.0Mbps transfer:
16_50.jpg

I counted a loop that took longer than 300usec as a "stall". Here's how the stall rate per 5000 packets grows:
stall count vs rate.png
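The stall count itself is simple to reproduce from a list of loop times. A minimal sketch, assuming the loop times are already available in microseconds (for example from the "loop time (usec)" CSV column); the function name is made up for this example:

```python
def stall_counts(loop_times_us, threshold_us=300, window=5000):
    """Count loops longer than threshold_us in each consecutive window of packets."""
    counts = []
    for start in range(0, len(loop_times_us), window):
        chunk = loop_times_us[start:start + window]
        counts.append(sum(1 for t in chunk if t > threshold_us))
    return counts


if __name__ == "__main__":
    sample = [120, 118, 450, 121, 119, 301, 300, 125]   # made-up loop times
    print(stall_counts(sample, window=4))
```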

Finally, here's a plot of the average loop execution time vs. data rate:
average loop time vs rate.png

Here's the code and a csv for the test conditions in the plots:

View attachment teensyserialtest.zip

I've also run some tests using Chrome.serial or a compiled Go program as the listener. Both are anecdotally faster than PySerial, but I haven't been thorough in benchmarking. I'm particularly excited that Chrome.serial is fast, since I will ultimately be using it in an extension for my product (Flybrix).
 
I don't think the packet size fed to Serial.write matters very much

Write sizes shouldn't matter, other than consuming a little more CPU time. Teensyduino's USB serial code packs your writes efficiently into 64 byte packets, regardless of their size. Partial (inefficient) packets are only transmitted if you call Serial.flush() or Serial.send_now(), or if you stop writing for 4-5 milliseconds. After a 5 millisecond timeout with no more Serial.write(), whatever partial packet was in progress is transmitted. The main purpose for Serial.send_now() is to cause that last partial packet to transmit ASAP, rather than 5 ms later.
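A toy model of that packing behaviour, written in Python purely for illustration. It is not the Teensyduino implementation and the function name is invented. It only counts how many packets result when writes are packed back to back, compared with flushing after every write.

```python
def pack_writes(write_sizes, packet_size=64, flush_each=False):
    """Return the lengths of the packets produced by a series of writes.

    flush_each=False models writes packed back to back into full packets.
    flush_each=True models a flush (send_now style) after every write.
    """
    packets = []
    buffered = 0
    for size in write_sizes:
        buffered += size
        while buffered >= packet_size:
            packets.append(packet_size)
            buffered -= packet_size
        if flush_each and buffered:
            packets.append(buffered)   # partial packet forced out early
            buffered = 0
    if buffered:
        packets.append(buffered)       # last partial packet, sent after the idle timeout
    return packets


if __name__ == "__main__":
    writes = [67] * 10                 # ten 67-byte lines, as in the earlier tests
    print(len(pack_writes(writes)))
    print(len(pack_writes(writes, flush_each=True)))
```

In this model ten 67-byte writes need roughly twice as many packets when every write is flushed, which points in the same direction as the slower send_now() results reported earlier in the thread.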
 
In case someone stumbles onto this - I have managed to get reliable and lossless teensy -> PC usb serial transfer, using the following code, without modifying the teensy library (cores/teensy3/usb_serial.c: usb_serial_write() function, as mentioned here).

Code:
static void
serial_write_nofail(const uint8_t * buffer, size_t size)
{
        while (size > 0) {
                int w = Serial.write(buffer, size);
                if (w > 0) {
                        buffer += w;
                        size -= w;
                }
        }
}

https://github.com/iger/spiflash

Used like so, in two terminals:
Code:
pv /dev/ttyACM? > /tmp/test-output  # first start reading
printf "R" > /dev/ttyACM?           # then activate output

Though it has some caveats:
  • requires preformatted buffer - no print/println
  • code blocks while waiting, though it should be avoidable by hoisting the retry logic into the main loop
  • listener should connect before writes start, otherwise some data seems to get lost before writes start blocking
  • if a data stream is written PC -> teensy, then reverse (teensy -> PC) writes block indefinitely, so the function will lock
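On the second caveat, here is one way the retry could be hoisted into the main loop, sketched in Python rather than Arduino C++ so it can be run anywhere. Everything here is hypothetical: PendingWrite, FakePort, and run_main_loop are invented for the example, and FakePort merely imitates a write call that accepts only part of the buffer, which is the behaviour the loop in serial_write_nofail() relies on.

```python
class PendingWrite:
    """Bytes still waiting to be accepted by a port that may take only part of them."""

    def __init__(self, data):
        self._remaining = memoryview(bytes(data))

    def pump(self, write):
        """Offer the unsent bytes to write() once; return True when nothing is left."""
        if len(self._remaining) > 0:
            accepted = write(self._remaining)
            if accepted > 0:
                self._remaining = self._remaining[accepted:]
        return len(self._remaining) == 0


class FakePort:
    """Stand-in for the serial port: accepts a limited number of bytes per call."""

    def __init__(self, budgets):
        self._budgets = list(budgets)
        self.received = bytearray()

    def write(self, data):
        budget = self._budgets.pop(0) if self._budgets else 64
        chunk = bytes(data[:budget])
        self.received += chunk
        return len(chunk)


def run_main_loop(data, port):
    """Send data without blocking; count the loop passes left free for other work."""
    pending = PendingWrite(data)
    other_work_passes = 0
    while not pending.pump(port.write):
        other_work_passes += 1    # sensor reads, timing checks, etc. would go here
    return other_work_passes
```

Each pass through the loop makes at most one write attempt, so the rest of the loop body keeps running while the host is slow to drain the data.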

Code was tested with receiver rate limiting and no data was lost.
Code:
pv --rate-limit 100k -W /dev/ttyACM? > /tmp/test
Note that cat and dd don't work for reading a tty (they quit immediately), but pv works fine.

Tested on linux, Teensy 3.2, teensyduino 1.8.5 (I think).
 