Serial.write xmit dropping data

And now take a look here:

Code:
void loop() {
    Serial.print("count=");
    Serial.print(count);
    Serial.print(", lines/sec=");
    Serial.println(count_per_second);
    count = count + 1;
    uint32_t msec = millis();
    if (msec - prior_msec > 1000) {
        // when 1 second has elapsed, update the lines/sec count
        prior_msec = prior_msec + 1000;
        count_per_second = count - prior_count;
        prior_count = count;
    }
}

It does not check whether the data was actually printed. In case of a timeout it should perhaps log "timeout" somewhere.
(My test has the same issue)

You can probably use the return value for your TX to Python. If it is 0, there is a problem and you should probably re-send the last xyz bytes.
 
I think I've had success working around (at least) parts of the issue in Win10. It turned out that the data dropping only occurs if the Teensy is sending freely into the Win10 host. As documented in Frank's links, this was observed elsewhere and was already reported to MS, without success.

Things change if the host initiates the transfer of a data block. This seems to work perfectly. I was able to send hundreds of megabytes without errors at about 13 MBytes per second. There are no artificial delays in the code, and the 13 MB/s does not depend on the block size. Interestingly, when I did my first experiments with a freely sending Teensy in summer, I needed to throttle the sending speed to about the same 13 MB/s to get error-free transmission.

Here is what I did on the Teensy side:

  • Wait for a 'start block' command including the requested blocksize from the PC
  • Send this block as quickly as possible using a simple Serial.write(buffer, blocksize)
  • Print the return value of Serial.write and the calculated transmission rate on a debug output (SerialUSB1)
  • Wait for next start command

Code:
Stream& dbg = SerialUSB1;

EXTMEM uint32_t block[512 * 1024];  // 2MB

void setup()
{
    for (unsigned i = 0; i < sizeof(block) / sizeof(block[0]); i++)  // increasing number in buffer to check transmission
    {
        block[i] = i;
    }

    while (!SerialUSB1) {}
    dbg.println("Waiting for start command...");
    Serial.setTimeout(100);
}

uint32_t blockLSize;

void loop()
{
    if (Serial.read() == 's') // wait for start command
    {
        if (Serial.readBytes((char*)&blockLSize, sizeof(blockLSize)) != sizeof(blockLSize)) return; // read the requested blocksize from the PC
        dbg.printf("\nRequested: %.1f kBytes\n", blockLSize / 1024.0f);

        uint32_t t0 = micros();
        size_t sent = Serial.write((uint8_t*)block, blockLSize);  // Serial.write returns the number of bytes written
        uint32_t dt = micros() - t0;

        float rate = 1E6 * sent / dt / 1024 / 1024;
        dbg.printf("sent: %.1f, dt: %d µs, bytes rate: %.1f MB/s \n", sent / 1024.0f, dt, rate);
    }
}

On the PC side (C#) I request a large amount of data (some 100 MB) from the Teensy. For the tests I split the requests into blocks of 100 kB up to 1 MB size and store the results in a large buffer. To verify the data, I simply check for consecutive numbers in the buffer after the transfer is finished.

Code:
using Microsoft.Win32.SafeHandles;
using System;
using System.Diagnostics;
using System.IO;
using static SerialSpeedPI.NativeMethods;


namespace SerialSpeedPI
{
    class Program
    {
        static void Main(string[] args)
        {
            SafeFileHandle file = CreateFile(@"\\.\COM17", GENERIC_READ | GENERIC_WRITE, 0, IntPtr.Zero, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, IntPtr.Zero);

            const int totalSize = 100_000_000;
            const int blockSize = 1_000_000;
            byte[] buffer = new byte[totalSize];

            int total = 0;
            var sw = new Stopwatch();

            using (FileStream lpt = new FileStream(file, FileAccess.ReadWrite))
            {
                sw.Start();
                while (total < totalSize)
                {
                    int requestSize = Math.Min(blockSize, totalSize - total);
                    Console.Write($"requested: {requestSize} Bytes | ");

                    lpt.WriteByte((byte)'s');
                    lpt.Write(BitConverter.GetBytes(requestSize), 0, 4);
                    lpt.Flush();

                    int read = lpt.Read(buffer, total, requestSize);  // read in data in large chunks
                    total += read;
                    Console.WriteLine($"got: {read,8} Bytes | total:{total,7} Bytes");
                }
                double dt = sw.ElapsedMilliseconds / 1000.0;
                Console.WriteLine($"---------------------------------------------\n");
                Console.WriteLine($"total requested: {totalSize / 1024 / 1024.0:0.##} MB; total received: {total / 1024 / 1024.0:0.##} MB; rate: {total / dt / 1024.0 / 1024.0:0.#} MB/s\n");

                Console.WriteLine("checking buffer");

                int i = 0;
                while (i < totalSize)
                {
                    UInt32 oldVal = 0;
                    bool ok = true;  // track whether this block verified cleanly

                    Console.Write($"{i} - {i + blockSize}");

                    for (int j = 4; j < blockSize; j += 4)  // each requested block restarts at 0, so skip the first word
                    {
                        UInt32 val = BitConverter.ToUInt32(buffer, i + j);
                        UInt32 delta = val - oldVal;
                        oldVal = val;
                        if (delta != 1)
                        {
                            Console.WriteLine($"  Error {delta} @{i + j}");
                            ok = false;
                            break;
                        }
                    }
                    i += blockSize;
                    if (ok) Console.WriteLine("  OK");  // only report OK when no error was found
                }
            }
        }
    }
}

This won't fix the issues if one needs to listen to a freely sending Teensy (Sermon, HTerm, ...), but it is certainly a feasible pattern to get data quickly from the Teensy to a Win10 PC. The next step is to check how high-level access via the normal C# SerialPort class behaves...

It would be interesting to see whether this pattern also helps with Python.
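For reference, here is a minimal, untested sketch of how the same request/response pattern might look with Python and pySerial, matching the Teensy sketch above ('s' followed by the 4-byte little-endian block size, then the raw block). The port name and sizes are assumptions.

Code:
import struct
import serial  # pySerial

PORT       = "COM17"      # assumption: adjust to your system, e.g. /dev/ttyACM0 on Linux
BLOCK_SIZE = 1_000_000    # bytes requested per block
TOTAL_SIZE = 100_000_000  # total bytes to transfer

with serial.Serial(PORT, timeout=5) as ser:
    total = 0
    while total < TOTAL_SIZE:
        request = min(BLOCK_SIZE, TOTAL_SIZE - total)
        # host initiates the transfer: start command plus requested block size
        ser.write(b"s" + struct.pack("<I", request))
        block = ser.read(request)  # reads until 'request' bytes arrive or the timeout hits
        if len(block) != request:
            print(f"short read: got {len(block)} of {request} bytes")
            break
        total += len(block)
    print(f"received {total} bytes")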

Output on the PC side:
Code:
....
requested: 1000000 Bytes | got:  1000000 Bytes | total:86000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:87000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:88000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:89000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:90000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:91000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:92000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:93000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:94000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:95000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:96000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:97000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:98000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:99000000 Bytes
requested: 1000000 Bytes | got:  1000000 Bytes | total:100000000 Bytes
---------------------------------------------

total requested: 95.37 MB; total received: 95.37 MB; rate: 12.9 MB/s

checking buffer
0 - 1000000  OK
1000000 - 2000000  OK
2000000 - 3000000  OK
3000000 - 4000000  OK
4000000 - 5000000  OK
5000000 - 6000000  OK
6000000 - 7000000  OK
7000000 - 8000000  OK
...

On the Teensy side:
Code:
Requested: 976.6 kBytes
sent: 976.6, dt: 75335 µs, bytes rate: 12.7 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 73509 µs, bytes rate: 13.0 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 72114 µs, bytes rate: 13.2 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 69912 µs, bytes rate: 13.6 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 78350 µs, bytes rate: 12.2 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 76489 µs, bytes rate: 12.5 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 69389 µs, bytes rate: 13.7 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 70599 µs, bytes rate: 13.5 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 79616 µs, bytes rate: 12.0 MB/s 

Requested: 976.6 kBytes
sent: 976.6, dt: 78576 µs, bytes rate: 12.1 MB/s 
....
 
Here is a measurement of the transfer rate vs. block size, from 1000 bytes up to 1E6 bytes per transfer block, for a total transfer size of some 10 MB (10x10^6 bytes). No transmit errors detected.

[Plot: transfer rate vs. block size (Screenshot 2021-10-08 210749.jpg)]
 
Nice :)

Now add a CRC and a block counter, transmit as fast as possible, and re-request broken blocks only.
Windows can do 2.5x the speed.
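A rough sketch of what such a check could look like on the receiving side, in Python; the frame layout (4-byte little-endian block counter, payload, 4-byte CRC32) is an assumption here, and the Teensy side would need matching code.

Code:
import struct
import zlib

def check_block(frame: bytes, expected_counter: int):
    """Return the payload if counter and CRC match, else None (caller re-requests)."""
    counter = struct.unpack_from("<I", frame, 0)[0]
    payload = frame[4:-4]
    crc_rx = struct.unpack_from("<I", frame, len(frame) - 4)[0]
    if counter != expected_counter:                   # dropped or duplicated block
        return None
    if (zlib.crc32(payload) & 0xFFFFFFFF) != crc_rx:  # corrupted block
        return None
    return payload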
 
luni -- That's amazing. I swear I did that same test and it didn't work. But I tried your Teensy code with a Python script and I'm able to reproduce the same behavior: 30 MBPS for each data transmit and about 15 MBPS total when looping the sends. So your solution works with Python / pySerial on Windows and Linux. I'll have to look more closely at my Teensy code to see why it didn't work.
 
I did some more testing, this time with Linux, to hopefully rule out the Windows comm port bug. Using my latest Teensy code, if I try sends of different sizes, I get:

Code:
Length of data received: 400000/400000
400000 words successfully received, 1.6 MBytes in 0.05 seconds (33.16MBPS)

Length of data received: 800000/800000
800000 words successfully received, 3.2 MBytes in 0.19 seconds (16.99MBPS)

Error, got only 2560 bytes, but expecting 4096, received 1034240 of 1040000 bytes total
Length of data received: 1034240/1040000
Error, data returned not as expected. 2496 data points of 1040000 are incorrect (0.240000%)
[1031744 1031745 1031746 ... 1034237 1034238 1034239]
1034240 words received, 4.13696 MBytes in 2.28 seconds (1.81MBPS)

Basically, sending 400,000 32-bit words works great. 800,000 words drops the speed to half, and 1,040,000 words gives data errors and a drop in speed. I used Wireshark to look at the USB packets, and to my unsophisticated eye they look fine. The 400k words take 45 ms to send, 800k takes 100 ms -- so about the same speed down the wire despite what I see in the Python program. That maybe implies the delay is not in the USB but in the host-side driver. The 1M word send takes 125 ms to complete, which again is about 32 MBPS and not the 1.8 MBPS I see in my program.

So then I went back to my original firmware (attached in my first post) that uses 16-bit words and, for whatever reason, does not have data corruption / dropped packets on Linux, but does show things slowing down. With that code, I see:
Code:
Length of data received: 400000/400000
400000 words successfully received, 0.8 MBytes in 0.02 seconds (32.71MBPS)

Length of data received: 800000/800000
800000 words successfully received, 1.6 MBytes in 0.07 seconds (22.10MBPS)

Length of data received: 3200000/3200000
3200000 words successfully received, 6.4 MBytes in 1.36 seconds (4.70MBPS)

So basically full speed, half speed, and 1/10 speed in throughput. When I look at the USB packets, the USB transmit does match the time I see -- there are more 64-byte URB_BULK in packets between the 1088-byte URB_BULK in data packets, and the transmit rate drops.

Maybe someone can learn more from the USB packets than I can, so they are attached in pcapng format (Wireshark can open them). The files are:

LinuxUSB-400000-Good.pcapng -- 32bit words, 400k words, full speed
LinuxUSB-800000-Slow.pcapng -- 32bit words, 800k words, lower speed
LinuxUSB-1040000-Corrupt.pcapng -- 32bit words, 1040k words, dropped data and corruption
LinuxUSB-OrigCode-400000-Good.pcapng -- 16bit words, 400k words, full speed
LinuxUSB-OrigCode-800000-Slow.pcapng -- 16bit words, 800k words, lower speed
LinuxUSB-OrigCode-3200000-VerySlow.pcapng -- 16bit words, 3200k words, 1 tenth the speed

I zipped the files (bzip compression to fit) into 2 files -- one with the first 3 steps (32bit words, newer firmware code) and the other with 16bit words and the original firmware.
 

Attachments

  • LinuxUSBPackets-32bitWords.zip (1.3 MB)
  • LinuxUSBPackets-Orig-16bitWords.zip (1.1 MB)
I analyzed the USB packets I posted in https://forum.pjrc.com/threads/68386-Teensyduino-1-56-Beta-2?p=290757&viewfull=1#post290757 (not sure if I should be posting here, in the 1-56-Beta-1 thread, the 1-56-Beta-2 thread, or a new thread about Serial speed -- please let me know if I should post elsewhere).

I plotted the down-the-wire USB speed vs. time for both the short data transfer (0.8 MB) and the long one (3.2 MB). What I found surprising is that the large data set started to slow down after only 20 ms, while the short data set ran at full speed for 40 ms. That means the problem starts before any buffers are full. Just knowing that there will be more bytes coming somehow creates a problem.

[Plots: SpeedVsTimeZoom.png, SpeedVsTimeFull.png]

As a quick reminder, this was generated using the original firmware posted at the top of this thread, under Linux. There are no data errors, but the transfer speed, as you can see, plummets. I believe the same problem happens in Windows; however, other Windows bugs make it a less clean environment for looking at the data.
 
Ok, sorry, please ignore the previous post's data -- I accidentally used the wrong data set. Here's data that shows the (somewhat averaged) instantaneous data rate vs. bytes sent for two sends: 0.8 MB and 6.4 MB. I've reported it both with the OS measuring the timing and by looking at the USB packets down the wire. Now all the data is about the same -- how the timing is measured does not make much difference, and sending a smaller file follows the same speed curve as sending a longer file.

[Plot: SpeedVsTime-Good.png]

P.S. In the previous graph, I was showing two different runs with a different amount of data sent.
 
This is just speculation, but I wonder if, once some buffer fills up, the Teensy waits until it can send more data, and as soon as it can, it sends a small amount, then another small amount, since that is all the space that has freed up in the buffer. These small transfers aren't efficient; if it were to wait longer and then send one long burst, it could be much more efficient. luni's solution of requesting the data in chunks is effectively introducing this delay.
 
I took a longer data set and there is a sudden change in rate at 4.8 MB (this number is not consistent shot to shot -- sometimes it happens at 5.5 MB, for example). I've attached plots on both linear and log scales to show how slow the data rate gets. At the end of 9.5 MB, I'm getting 0.2 MBPS, but it is still dropping exponentially (linear on the log scale).

[Plots: SpeedDropOverTime.png, SpeedDropOverTime-LogScale.png]
 
For completeness, here are the same plots, but tested under Windows. Some data was dropped / corrupted, but I just ignored it. The weird discontinuity at ~5 MB is gone, and the rate drop isn't as clearly linear on a log scale, but the general shape and speed are about the same, with the data rate going to 0.2 MBPS after 5 MB or so.

In Linux, I need to do my Serial.writes in a loop with 2048 bytes per write or else I get dropped data. In this Windows data I kept that, but I found that going to 4096 bytes per send is better in Windows -- it stretches the x-axis a bit, so it sends at higher speeds for a little longer.

[Plots: WindowsDataSlowLinear.png, WindowsDataSlowLog.png]
 
On the 1.56b1 thread, ongoing results of the serial-issue testing are being posted.

Indeed, Windows (on this machine) can receive at rates over 400 or 500 Mbps, as suggested by the T_4.1 send rate below. I just saw this on starting the Windows-side 'c' code .EXE:
Code:
X_Lines-Delta: 33835. Received: 3800000 Bytes in 60426 us = 503.09 Mbps
Lines-Delta: 100096. Received: 3800000 Bytes in 134677 us = 225.73 Mbps
        stats: 100K=1 less=0 repeated 0 rmax=2000000
..................................................
Lines-Delta: 100096. Received: 3800000 Bytes in 229703 us = 132.34 Mbps
.............^C
Code:
X_Lines-Delta: 37628. Received: 3800000 Bytes in 65136 us = 466.72 Mbps
 bytes: 40.
Lines-Delta: 100096. Received: 3800000 Bytes in 135698 us = 224.03 Mbps

That is two starts of the EXE - the high rates occur before the 'c' code synchronizes to the data transmission already flowing IN PROGRESS from the T_4.1. The "X_" means the buffer was not parsed for a complete set of 100,096 packets between those strings reporting the "Mbps".

The transmission blocks are each 3.8 MB; the T_4.1 then repeats that without delay and measures the transmit time per block of 100,096 packets.

Once the 'c' code starts parsing the data, it slows down and the data backs up in Windows buffers. The 'c' code has a 2 MB read buffer, and reads usually return the full 2 MB - so it is not keeping up with the incoming data stream even at the 224 Mbps rate.
>> Posted on the Beta 1 thread - the T_4.1 can add significant delay before the 'c' code sees a drop in its ability to keep up with the incoming data. Adding a 50ms delay during send block still comes in over 200 Mbps, making that delay 150ms slowed incoming data to 140 Mbps with transmit time of 180 ms
-> this suggests that Windows is telling the T_4.1 to slow the send by a similar amount once it has filled as much buffered data as it is prepared to.

Adding code to: while ( 2MB read ) { 'read request 2MB', discard, sum the amount read } // Note: a single console 'c' print in that loop slows the read rate enough to extend the loop as the buffered data refills!
>> The extent of that Windows buffering is 50 MB to over 100 MB, using that while() to measure!
>> Once data backs up to that magnitude, received buffers can be garbled or even contain system garbage data - but it can work for 24 hours at a steady stream of 220 Mbps with few or no errors, although it is then working on buffered data that is maybe 7 seconds old.

Taking this back to the Beta 1 thread as a summary ...
 
Ardvark,

at this point you should probably take a look at the Teensy source code.
It's not that complicated if you just ignore the calls into other source files.
The interesting part is the timeout code: it's the only place where data is intentionally dropped.

Edit:
And yes, I also think that the reason luni's idea works is that it results in pauses.
A delay() does the same, but his method is at least somewhat speed-dependent and better than a dumb delay().

As said, a CRC in each 2048-byte packet will give you maximum speed, if done right, and it fixes the Windows garbage too. (Don't forget the packet counter as well, to detect dropped packets.)
 
I don't know much about Python, so an honest question: are you sure that this is not a Python thing? E.g., in #1 I see this:

Code:
 data += packet

I don't know how Python handles such things, but in other high-level languages this can cause the data buffer to be copied over and over. The time penalty gets worse as the buffer grows. What happens to the transfer rate if you don't append your received packets to data? Can you preallocate a data buffer in Python?
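To illustrate the question, here is a minimal, hypothetical pySerial sketch of the preallocation idea: instead of growing data with data += packet (which can copy the whole buffer again and again), each packet is written into a buffer allocated once up front. Port name and sizes are assumptions.

Code:
import serial  # pySerial

PORT       = "/dev/ttyACM0"  # assumption: adjust to your system
TOTAL_SIZE = 100_000_000
PACKET     = 2048

with serial.Serial(PORT, timeout=1) as ser:
    data = bytearray(TOTAL_SIZE)  # allocate once, up front
    view = memoryview(data)       # writable window into data, no copies
    total = 0
    while total < TOTAL_SIZE:
        n = ser.readinto(view[total:total + min(PACKET, TOTAL_SIZE - total)])
        if not n:
            break                 # timeout: no data arrived
        total += n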
 
Well, this is super embarrassing. You are 100% correct, and I've been chasing my own tail. I stopped storing the data, and I can hold 32 MBPS throughput for hundreds of MB with no slowdown. I added functionality to test whether the data is right (per packet, not at the end, so still not storing large data; a sketch of the check is below), and on Linux with 2048-byte writes I get no data errors and ~16 MBPS for any size of data. I'm sure that the 32 -> 16 MBPS drop is from my Python code checking the data in the loop. So that's great.
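For reference, a hypothetical sketch of such a per-packet check for the increasing 32-bit counter pattern; the helper name is illustrative, and the packet length is assumed to be a multiple of 4 bytes.

Code:
import struct

def check_packet(packet: bytes, expected: int) -> int:
    """Verify consecutive little-endian uint32 values; return the next expected value."""
    for (val,) in struct.iter_unpack("<I", packet):
        if val != expected:
            raise ValueError(f"expected {expected}, got {val}")
        expected = val + 1
    return expected

# carry 'expected' across packets instead of storing the whole transfer
expected = 0
for packet in (struct.pack("<2I", 0, 1), struct.pack("<2I", 2, 3)):
    expected = check_packet(packet, expected)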

So, Linux works 100% as desired / expected when reading the data in packets of 2048 bytes or less. (2048 maximizes speed, but that too may be an artifact of my code, since it does more data checking with smaller packet sizes.) When I went to 4096-byte packets, I got data errors. Somewhere between 3800 and 4000 bytes seems to cause a problem. Not sure what's going on there, but receiving small(ish) packets isn't hard. (And that may be a Python / pySerial bug.)

As for Windows, other data-corruption bugs prevent me from testing the data integrity, but the speed is basically the same: 32 MBPS if I don't test the data, 16 MBPS if I do (and ignore the result, since the data is often bad).

Again, sorry for all the distractions; all the speed trouble was an artifact of my code. Good catch, luni!
 