Serial.write xmit dropping data

Status
Not open for further replies.

ardvark

Member
I have a Teensy 4 and have it collect a bunch of data and then stream it back to the computer over usb via Serial.write(). For small data sets, this works fine, but if I send 100kB or more, I get some data loss where at least a few bytes get dropped. I've tried to do both send (Teensy) and receive (python on computer) in packets, but it hasn't made much of a difference. I thought that Serial.write() should be blocking if the USB buffer is full, but I also tried adding a check with Serial.availableForWrite(), but that didn't help either. If I slow down the sending with a delay in the Teensy send loop, everything works fine, so it seems like the Teensy isn't able to properly throttle the send rate based on what the host can keep up with. Am I wrong that I should be able to just dump data over Serial.write() and it will block as needed for the computer to get all the data? If not, is there a way to prevent data loss (other than a big delay in the sending)?

I have a pretty simple Teensy firmware and python script that shows the problem. Here's the python code to receive the data and check that it is right:

Code:
import serial, struct
import numpy as np

comm = 6
wordsToGet = 1000000

ser = serial.Serial('COM{}'.format(comm),115200,timeout=2)

expectedData = np.array(range(wordsToGet))
expectedData = expectedData.astype(np.uint16)

ser.write('{}\n'.format(wordsToGet).encode())
data=b''
received = 0
while( wordsToGet != received):
    packetSize = min(1024,wordsToGet-received)
    packet = ser.read(packetSize*2) # 2 bytes per word
    if (len(packet) != packetSize*2):
        raise Exception('Error, got only {} bytes, but expecting {}'.format(len(packet),packetSize*2))
    data += packet
    received += packetSize

data = np.array(struct.unpack('H'*wordsToGet,data))
diffSquared = (data-expectedData)*(data-expectedData)

if (np.sum(diffSquared) != 0):
    print('Error, data returned not as expected')
    print(np.array(diffSquared))
else:
    print('{} words successfully received'.format(wordsToGet))

del ser

I've attached the full .ino sketch I'm using, but the sending part of the code is

Code:
while ( localWordsToSend != localWordsSent) {
      uint packetSize = min((uint) SENDSIZE,localWordsToSend - localWordsSent);
      while ( (uint) Serial.availableForWrite() <  (packetSize * sizeof(data[0])));
      Serial.write( (uint8_t*) &data[localWordsSent], packetSize * sizeof(data[0]));
      localWordsSent += packetSize;
      
      //delayMicroseconds(2500); //necessary to avoid missed data
    }

Any suggestions or ideas on what is going on and/or work-arounds would be appreciated. I'm using Teensduino 1.55 and the receive side is python 3.8 (64 bit) on Windows 10.

Thanks!
 

Attachments

  • CleanTeensyDataProblem.ino
    1.6 KB · Views: 78
This is just a very wild shot in the dark, but try planting this instruction at the end of your Teensy's sending while loop:
Code:
asm("dsb");

it is an assembly command called the "Data Synchronization Barrier" and it essentially stalls the ARM processor for a little bit as all the pending memory writes finish. It is used to ensure critical code segments is allowed to fully execute before the processor moves on.
I'm not sure if it helps, but i found it invaluable in keeping my interrupts from acting weird.
 
There is definitely a bug in the Windows driver. There may also be bugs on the Teensy side. It's recently being discussed on this lengthy thread.

https://forum.pjrc.com/threads/68291-Teensyduino-1-56-Beta-1

To try Frank's proposed fix for the Teensy side, get his usb_serial.c file here.

https://github.com/FrankBoesing/cdcbench/blob/main/usb_serial.c

It's meant to replace the copy in C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy4. Maybe it will help? I'd really like to hear it if improve the situation for your Python application.

Using Serial.availableForWrite() on the Teensy side can also help in some situations.

Sadly, it's pretty clear Window has bugs in its driver. So far there is no known solution other than limiting the speed of transmission.

Linux works very well. MacOS is still in question, though it is much better than Windows.
 
Thanks for the quick reply. I tried using the usb_serial.c at the https://github.com/FrankBoesing/cdcbench/blob/main/usb_serial.c, but I still get dropped data. I also added the "dsb" assembly code mentioned earlier but it didn't make a difference either.

Using Serial.availableForWrite() on the Teensy side can also help in some situations.

I already have the line
Code:
while ( (uint) Serial.availableForWrite() <  (packetSize * sizeof(data[0])));

in my loop, but if there's a another way to use the availableForWrite, let me know.

Happy to try other things if anyone has a suggestion. I'll try to test it on Linux box as that would (somewhat) isolate python / pyserial. Thanks
 
Small update. When it errors, I updated the python code to now print how far it got before it ran out of data with the new line
Code:
raise Exception('Error, got only {} bytes, but expecting {}, received {} of {} bytes total'.format(len(packet),packetSize*2,received,wordsToGet))

and I noticed that it always fails very close to the end of the transmission. I'll get something like
Exception: Error, got only 1920 bytes, but expecting 2048, received 1396736 of 1400000 bytes total

and if I double the amount of data, it'll get twice as far, and die right at the end
Exception: Error, got only 1792 bytes, but expecting 2048, received 2798592 of 2800000 bytes total

Maybe this points to the issue being a windows / python / pyserial issue, but maybe this helps somehow. Maye some terrible work-around is to send empty data at the end and ignore it. Also, this behavior holds with both versions of usb_serial.c
 
Another terrible fix is to add a delay when sending. This way I got it pretty reliable.
But..as said.. terrible.
 
Interestingly, I did very similar tests in summer (sending from Teensy to Win10 PC) and ended up with exactly the same result as described by ardvark. I was using the standard C# serial classes. I first assumed some c# issue and switched to calling the win32 API directly (similar to what Frank did) but got exactly the same result.

As ardvark, I was also able to track the issue down to the the missing blocks at the end of large (>100kB) blocks transmitted by Serial.write(). Regardless of the block size, data was never missing within one block, always at the end and (IIRC) always in multiples of 256 bytes. The receiving Win32 code then ran into a timeout since the missing data never arrived. Placing delays between writing the 100kB blocks fixed the issue as soon as the resulting overall transmission rate was smaller than some 12-13 MByte/second.

I'm not yet convinced that this is a Windows issue. Unfortunately I don't have an USB analyzer but shouldn't it be easy to find out if the missing bytes actually left the Teensy? Serial.write() reports the correct number of bytes sent but AFIAK this only reports the number of bytes written into the Teensy buffers not the bytes actually transmitted, right?

Does anybody know of another Arduino board using USB CDC at 480MBit/s? Would be interesting to see if this generates the same issues.
 
I updated my python script to report how fast the data transfer is and what data points are bad. I ran this script on a Linux machine for different transfer sizes, and I get the following results (no data corruption):

Code:
100000 words successfully received, 0.2 MBytes in 0.0 seconds (33.22MBPS)
200000 words successfully received, 0.4 MBytes in 0.0 seconds (31.63MBPS)
400000 words successfully received, 0.8 MBytes in 0.0 seconds (32.05MBPS)
1000000 words successfully received, 2.0 MBytes in 0.1 seconds (18.08MBPS)
2000000 words successfully received, 4.0 MBytes in 0.5 seconds (8.03MBPS)
4000000 words successfully received, 8.0 MBytes in 2.2 seconds (3.65MBPS)
10000000 words successfully received, 20.0 MBytes in 52.2 seconds (0.38MBPS)

For reference, under windows I got
Code:
100000 words successfully received, 0.2 MBytes in 0.0 seconds (34.05MBPS)
200000 words successfully received, 0.4 MBytes in 0.0 seconds (33.15MBPS)  -- starts to fail

So under Linux, I didn't get transfer errors, but my transfer rate started to plummet when I started sending over 1MB or so. And by 20MB, it was just a crawl. Maybe this is the underlying problem but Linux handles it better.

Thinking the problem was at the end of the transfer, I edited my firmware to add a delay when it got near the end of the transfer, but this did not make a difference. Looking more closely at the data, I saw that the data corrupt often happened much earlier than I thought -- it dropped data sometimes at the beginning of the transfer, but it wasn't until the end that I noticed bytes were missing. So I edited the firmware (attached as v2) to store the my data as uint32_t and un-aligned my memory size with a power of 2 to make it unlikely to drop data and not see the wrong data content.

With this firmware, I updated my python code to
Code:
import serial, struct,time
import numpy as np

comm = 6
wordsToGet = 1100000
wordSize = 4

ser = serial.Serial('COM{}'.format(comm),115200,timeout=2)

expectedData = np.array(range(wordsToGet))
expectedData = expectedData.astype(np.uint32)
expectedData = np.mod(expectedData,100000)

ser.write('{}\n'.format(wordsToGet).encode())
data=b''
received = 0
start = time.time()
while( wordsToGet != received):
    packetSize = min(1024,wordsToGet-received)
    packet = ser.read(packetSize*wordSize)
    if (len(packet) != packetSize*wordSize):
        print('Error, got only {} bytes, but expecting {}, received {} of {} bytes total'.format(len(packet),packetSize*wordSize,received,wordsToGet))
        break
    data += packet
    received += packetSize

stop = time.time()
data = np.array(struct.unpack('I'*int(len(data)/wordSize),data))
diffSquared = (data-expectedData[0:len(data)])*(data-expectedData[0:len(data)])
print('Length of data received: {}/{}'.format(len(data),len(expectedData)))
if (np.sum(diffSquared) != 0):
    errorIndices = (data-expectedData[0:len(data)] != 0).nonzero()[0]
    print('Error, data returned not as expected. {} data points of {} are incorrect ({:01f}%)'.format(len(errorIndices),len(expectedData),100*len(errorIndices)/len(expectedData)))    
    print(errorIndices)
    print('{} words received, {} MBytes in {:0.2f} seconds ({:0.2f}MBPS)'.format(len(data),len(data)*wordSize/1e6,stop-start,len(data)*wordSize/(stop-start)/1e6))

else:
    print('{} words successfully received, {} MBytes in {:0.2f} seconds ({:0.2f}MBPS)'.format(wordsToGet,wordsToGet*wordSize/1e6,stop-start,wordsToGet*wordSize/(stop-start)/1e6))

del ser

and then I reran my tests. In windows, I got

Code:
11000 words successfully received, 0.044 MBytes in 0.00 seconds (36.75MBPS)
22000 words successfully received, 0.088 MBytes in 0.00 seconds (29.41MBPS)
44000 words successfully received, 0.176 MBytes in 0.01 seconds (30.45MBPS)
110000 words successfully received, 0.44 MBytes in 0.01 seconds (31.40MBPS)
220000 words successfully received, 0.88 MBytes in 0.05 seconds (18.71MBPS)
440000 words successfully received, 1.76 MBytes in 0.21 seconds (8.43MBPS)
1100000 words successfully received, 4.4 MBytes in 1.04 seconds (4.22MBPS)
2200000 words successfully received, 8.8 MBytes in 3.70 seconds (2.38MBPS) -- sometimes fails

Length of data received: 4355072/4400000
Error, data returned not as expected. 4274688 data points of 4400000 are incorrect (97.152000%)
[  80384   80385   80386 ... 4355069 4355070 4355071]
4355072 words received, 17.420288 MBytes in 18.21 seconds (0.96MBPS)
Which is significantly better in that I can transfer ~5MB before I see data loss. But as with before under Linux, I see the transfer speed plummet.

Under Linux, I get
Code:
110000 words successfully received, 0.44 MBytes in 0.01 seconds (33.93MBPS)
220000 words successfully received, 0.88 MBytes in 0.03 seconds (34.71MBPS)
440000 words successfully received, 1.76 MBytes in 0.05 seconds (33.03MBPS)
770000 words successfully received, 3.08 MBytes in 0.15 seconds (21.22MBPS)
1010000 words successfully received, 4.04 MBytes in 0.28 seconds (14.35MBPS)

Error, got only 256 bytes, but expecting 4096, received 1079296 of 1100000 bytes total
Length of data received: 1079296/1100000
Error, data returned not as expected. 14272 data points of 1100000 are incorrect (1.297455%)
[1065024 1065025 1065026 ... 1079293 1079294 1079295]
1079296 words received, 4.317184 MBytes in 2.30 seconds (1.87MBPS)

Which is actually worse. So now I'm seeing data loss with Linux. What's interesting is show sharp the data throughput drops as I increase the transfer size from 3MB to 4MB to 4.5MB.

One change from using 32-bit data is that this changed the packet size for sending and receiving by a factor of 2. I played around with that on Windows on the python receive side and a larger packet size helps a bit (smaller packet size say errors with maybe 30% smaller data sets), but increasing it above what I have now (1024*4Bytes) no longer helps.

Changing the SENDSIZE on the firmware to 512 instead of 1024 gives Windows problems like previously with the 16bit data:
Code:
110000 words successfully received, 0.44 MBytes in 0.02 seconds (24.70MBPS)
Error, got only 1920 bytes, but expecting 4096, received 218112 of 220000 bytes total
Length of data received: 218112/220000
Error, data returned not as expected. 81760 data points of 220000 are incorrect (37.163636%)
[136352 136353 136354 ... 218109 218110 218111]
218112 words received, 0.872448 MBytes in 3.05 seconds (0.29MBPS)

On the teensy side, it seems doing writes of 2048 Bytes at a time does not play well with Windows and doing writes of size 4096 Bytes helps windows, but seems worse for Linux. I wonder if whatever causes the data speed to drop is the underlying issue and the packet size and OS implementations vary in how they handle this and if / when it causes data drops.
 

Attachments

  • CleanTeensyDataProblem-v2.ino
    1.6 KB · Views: 64
...but there are TWO issues.
a) The bug in Windows.
b) Teensy sometimes drops TX data.

..aaaand finally
(c) a) can looks sometimes like b) but it's just the windows bug. Found no way to reproduce the reliable. But i'm very sure.

However, end of the day, its the same for the user.

It's even worse than you might imagine:
Today I saw coolterm repeating OLD data. it printed active data - then some hundrets of lines old - then the new again.
Apart from the issue that it prints random memory contents.

So, it's not enough to use a counter that repeats from time to time. for tests. Its needed to use a real UID to find problems. And to check that UID as often as possible.

This all leads to nothing. It just means: Don't use CDC for a reliable high speed TX.
 
Or simply use existing protocols. Z-Modem can prevent all these problems. The code is available on the Internet.
It uses its own CRC and repeats a packet if necessary. On the other hand, it slows everything down drastically.
 
I'm going to try to summarize:

PySerial + Teensy 4 sees throughput drop drastically for large sends (Windows & Linux)
PySerial + Teensy 4 sees dropped packets for large sends (Windows & Linux* -- Linux seems to avoid this when Teensy configured with certain send sizes)

So these two issues appear to be Teensy related or PySerial related, but not OS. Based on the comment luni had of seeing very similar issues with C# or Win32 API on Windows, it seems reasonable to conclude the issue is not with PySerial, so that leaves the Teensy as the most likely culprit.

I did another test where I just added a delay in my python script during each receive but this did not fix the dropped data / data corruption issue, which, I think, again points to the Teensy.

Does anyone have ideas on why the transmit speed drops for large sends? I suspicious that this is a clue to what going on.
 
it seems reasonable to conclude the issue is not with PySerial, so that leaves the Teensy as the most likely culprit.
Hm, both Python and all the other shown tests use the underlying Windows CDC driver (usbser.sys). So, it might definitely be something related to this driver. I ordered a STM32F746 evaluation board today which claims to have an USB high speed CDC interface. I'd say testing with another hardware and a different USB stack should clarify on which side of the USB cable the problem is generated.
 
To save you from all the work: that's all investigated already.
For sure, the Teenay knows nothing about Firefox data, certificates, CSS files, HTML, and my emails. Even .lst files. But all that is spit out as garbage.

And I posted links where users of other MCU reported this already. So, not sure what needs to be clearyfied?
 
So, not sure what needs to be clearyfied?

I saw your reports about the printed stuff. It looks like some buffer overrun or something similar on the PC side. So this is certainly a bug. I'm just wondering if this (call it buffer overrun) is the reason for the dropped blocks or a consequence. I just don't know.
I'll have a look at the links you posted. I assume in the Teensyduino beta thread?
 
Yes, the missing blocks is an other issue.
Unfortunately, the windows bug can cause this, too.. not easy to find the difference.
But yes, to test this, an other MCU is useful.
Unfortunately, at the end the day: As long Windows has this problem, its all pretty pointless :-(
Paul posted a screenshot of missing data on MAC if this helps.
You also must make sure that you don't hit the 120ms timeout.
And he posted that he got higher speeds with overclocking on MAC and Linux and the few existing barriers removed. So this says pretty clear Teensy has a problem, too, as higher speeds are only possible with dropped blocks (See MAC issue)
Unfortuantely, his lknecount benchmark is not usable for this kind of tests - watching at scrolling lines with a sermon that is not fast enough makes no sense, too.
 
Last edited:
Hm, both Python and all the other shown tests use the underlying Windows CDC driver (usbser.sys). So, it might definitely be something related to this driver. I ordered a STM32F746 evaluation board today which claims to have an USB high speed CDC interface. I'd say testing with another hardware and a different USB stack should clarify on which side of the USB cable the problem is generated.

Actually, I did a test with Linux using PySerial and it always showed the same throughput drop with larger transfers and it sometimes showed data loss as well, depending on size of the Serial.write() packets on the Teensy. (See above post). Now it is possible the throughput drop is a PySerial thing, but the fact that I'm seeing such similar behavior in Windows and Linux makes me think it isn't a simple Windows driver issue. And since you saw similar stuff with the Win32 API and not PySerial, it seems unlikely that it is a PySerial thing.

That said, a test on different hardware might be interesting. I have an i.MX RT1060 Evaluation Kit (same chip as Teensy 4) and I tried to using their SDK CDC examples to test this and it seemed to show similar behavior but I struggled to get high speed transfer for large datasets without accidentally overwriting the send buffer, so it wasn't super conclusive. But this post maybe points in the direction of hardware.

Frank, I don't understand why you think this has all been fully investigated before. The only link you posted to a discussion was here which is talking about relatively slow USB transmission, not the issue here with high speed transfers of large files. The Windows bug you reference in that post does sound like what is happening here, however, I'm seeing the same problem in Linux, so this isn't (only) a Windows driver bug. Additionally, I've seen no discussion on why the throughput drops to a crawl (again Windows and Linux) which is at least a legit problem / bug and potentially is related to the data corruption problems as well.
 
We have studied this intensively. I have even written a test program that confirms missing packages. Defragster has done hours of tests.
I do not deny that there are missing packages?!?

In the links, in the answers, you'll even find a ZIP file with a log that shows garbage. Not from me...

I can only test Windows. And there it is not so easy because of the additional garbage problem. We have TWO problems here.. said this here in this thread. Post#11.
But the additional barriers helped me 95%. On my computer, with my system. But there are still missing packages. What is the reason for that, I don't know. It may well be that Linux is missing more.
I could only run this on a VM. And that is not useful.

So, yes, I know, and Defragster knows about the missing data. Not sure if Paul knows. I've the impression he denys the existance (on Linux/MAC)
But, currently, new features are important for Paul. I have put enough time into this and even had discussions with Paul about his MAC Screenshot which shows missing data. Can't help more, here, sorry.. don't want.
And I can't give any save information about other OS. Please read the other thread(s).
 
Last edited:
Now it is possible the throughput drop is a PySerial thing, but the fact that I'm seeing such similar behavior in Windows and Linux makes me think it isn't a simple Windows driver issue.

You might be seeing the 120ms transmit timeout, where Teensy believes the PC side has crashed and starts intentionally discarding data so the main program doesn't stall forever in Serial.print().

Any chance you could retry your test with a check for Serial.availableForWrite() added? That function tells you how much space is ready in the USB transmit buffer. If your program waits until there is enough space, you'll avoid the transmit timeout (and if the PC side really does stop accepting data, you wait code will end up waiting...)
 
You might be seeing the 120ms transmit timeout, where Teensy believes the PC side has crashed and starts intentionally discarding data so the main program doesn't stall forever in Serial.print().

Any chance you could retry your test with a check for Serial.availableForWrite() added? That function tells you how much space is ready in the USB transmit buffer. If your program waits until there is enough space, you'll avoid the transmit timeout (and if the PC side really does stop accepting data, you wait code will end up waiting...)

Actually, I left it in the whole time:

Code:
while ( localWordsToSend != localWordsSent) {
      uint packetSize = min((uint) SENDSIZE,localWordsToSend - localWordsSent);
      while ( (uint) Serial.availableForWrite() <  (packetSize * sizeof(data[0])));
      Serial.write( (uint8_t*) &data[localWordsSent], packetSize * sizeof(data[0]));
      localWordsSent += packetSize;
      
      //delayMicroseconds(2500); //necessary to avoid missed data
    }

So all the test I reported, Linux and Windows, is with that in.
 
I can confirm that if you force the timeout (on PC, by adding a >120ms delay) , the garbage problem gets massively worse.
To be clear: Of course there are missing data then. That's the nature of this timeout code. But the timeout leads to more garbage, too.

Edit: But as the test program runs high speed but still shows missing data and garbage - with the availableForWrite() added - it means that things are way more complicated as it may seem. The test program does _not_ lead to timeouts. It was it's purpose to test that.
Then, if you just slow the TX down, it will help for all OS. I guess, if you fix it for Windows , on Teensy side, it will fix possible problems with the other OSs too.
 
Last edited:
..and this bug makes it difficult to detect a timeout on teensy, from the Sketch:
Code:
            if (transmit_previous_timeout) return [COLOR=#b22222]sent[/COLOR];
            if (systick_millis_count - wait_begin_at > TX_TIMEOUT_MSEC) {
                // waited too long, assume the USB host isn't listening
                transmit_previous_timeout = 1;
                return [COLOR=#b22222]sent[/COLOR];
[..]
            }
            if (!usb_configuration) return [COLOR=#b22222]sent[/COLOR];
Why doesn't it return 0 ? - It did not sent.

Edit: Ooops, no it indeed returns 0. But not -1.

However, Teensy 3.x returns -1.
Code:
                    return -1;

Arduino says:

"write() will return the number of bytes written"
So, someone might want to check what write returns. (Edit: On T3.x)
 
Last edited:
..and
Code:
// transmit a character.  0 returned on success, -1 on error
int usb_serial_putchar(uint8_t c)
{
    return usb_serial_write(&c, 1);
}
returns exactly what on which Teensy?
For sure not -1 and not 0 on "success" on T4

Arduino:
"If the transmit buffer is full then Serial.write() will block until there is enough space in the buffer."

Hm, they do not mention a timeout.
 
Last edited:
Status
Not open for further replies.
Back
Top