Win10 & T4 Serial Communication Tests

Status
Not open for further replies.

luni

Well-known member
I wrote a quick Win10 app to test the serial communication speed of a T4.0 and a T3.6 under somewhat realistic conditions. The app continuously downloads a 25kB block to the Teensy. The Teensy copies the received data to a buffer, then calculates and checks a simple checksum. The download speed is the ratio of the download size to the overall download time, including all sending/receiving, buffer copying and checksum calculation overhead.
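
For reference, the rate calculation described above can be sketched as a tiny host-side helper. The function name is hypothetical and I am assuming 1 MByte = 1e6 bytes; the actual app on GitHub may compute or round it differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the SerialTester sources.
// Average transfer rate in MByte/s, assuming 1 MByte = 1e6 bytes.
// elapsedSeconds must cover the whole round: sending, receiving,
// buffer copying and checksum calculation.
double throughputMBps(std::uint64_t totalBytes, double elapsedSeconds)
{
    assert(elapsedSeconds > 0.0);
    return static_cast<double>(totalBytes) / elapsedSeconds / 1e6;
}
```

As an example of the arithmetic, 27,000,000 bytes moved in 2 seconds comes out at 13.5 MByte/s.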

Here are the pretty amazing results:
Code:
T3.6  (TD 1.48):          0.8 MByte/sec
T4.0  (TD 1.48):          3.0 MByte/sec
T4.0  (Current Core):    13.5 MByte/sec

I didn't observe a single checksum error over about 30GB of downloaded data.
The Win10 app (C#) only uses standard serial sending routines.

Looks like the download rate is limited by the Teensy receiving code. Using single byte reads instead of copying the whole block of available data with readBytes() significantly reduces the speed.
Source code and precompiled binaries are available on GitHub https://github.com/luni64/SerialTester
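
To illustrate the point about single byte reads versus block reads, here is a small host-side model. FakeSerial is a made-up stand-in, not the Teensy core, and it only counts how many read calls are needed to drain the same data. The call count is just a proxy for per-call overhead; the real speed difference depends on the core implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a serial port, for illustration only.
struct FakeSerial
{
    std::vector<char> data;   // bytes "received" so far
    std::size_t pos = 0;      // current read position
    std::size_t calls = 0;    // number of read()/readBytes() calls made

    std::size_t available() const { return data.size() - pos; }

    // Returns one byte per call, or -1 if nothing is pending.
    int read()
    {
        ++calls;
        if (pos >= data.size()) return -1;
        return static_cast<unsigned char>(data[pos++]);
    }

    // Copies up to n pending bytes in a single call.
    std::size_t readBytes(char* dst, std::size_t n)
    {
        ++calls;
        if (n > available()) n = available();
        if (n == 0) return 0;
        std::memcpy(dst, data.data() + pos, n);
        pos += n;
        return n;
    }
};

// Drain the port one byte at a time; returns the number of calls used.
std::size_t drainByteWise(FakeSerial& s, char* dst)
{
    while (s.available() > 0) *dst++ = static_cast<char>(s.read());
    return s.calls;
}

// Drain the port by always taking everything that is available.
std::size_t drainBlockWise(FakeSerial& s, char* dst)
{
    while (s.available() > 0) dst += s.readBytes(dst, s.available());
    return s.calls;
}
```

With 512 pending bytes the byte-wise loop makes 512 calls, the block-wise loop makes one.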

Attachment: speedtest.jpg
 
Thanks for testing. Really good to see the new USB code is running faster!

If you feel like experimenting, any chance you might try running this with different numbers of buffers inside usb_serial.c? This is the line to edit.

Code:
#define RX_NUM  8

Also, if you can add a delay to stand in for the "work" required to actually use the incoming data for something more than just computing a CRC check, it would be really interesting to see how extra buffers affect the overall speed when your program is spending time doing other stuff.

I should admit, I picked 8 buffers quite arbitrarily. It's only 4K of RAM. Maybe 4K is a good default? Or maybe it should be higher?
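
The 4K figure follows from the buffer count, assuming each RX buffer holds one packet of at most 512 bytes (the packet size quoted elsewhere in this thread). A minimal sketch of that arithmetic; the names here are made up for illustration and are not from usb_serial.c:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: every RX buffer holds one packet of at most 512 bytes.
constexpr std::size_t kRxPacketBytes = 512;

// RAM taken by the receive buffers for a given RX_NUM.
constexpr std::size_t rxBufferBytes(std::size_t rxNum)
{
    return rxNum * kRxPacketBytes;
}

// Smallest number of buffers that can hold `bytes` of pending data.
std::size_t rxNumFor(std::size_t bytes)
{
    assert(bytes > 0);
    return (bytes + kRxPacketBytes - 1) / kRxPacketBytes;   // round up
}
```

So 8 buffers cost 4096 bytes, 32 buffers cost 16 kB, and holding one complete 25kB test block would take 50 buffers.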
 
I can certainly try. However, my gut feeling is that the download speed is currently limited by the maximum of 512 bytes reported by Serial.available(). Here is the receiving function. (The sent text has a '\0' for EOF; the buffer lives in OCRAM.) (Complete sketch: https://github.com/luni64/SerialTester/blob/master/src/firmware/echo/src/main.cpp)

Code:
int32_t readString()
{
    char* p = buffer;
    char* bufEnd = buffer + bufLen;

    while (true)
    {
        size_t av = Serial.available();
        while (av > 0 && p + av < bufEnd)
        {
            p += Serial.readBytes(p, av);
            if (*(p - 1) == 0)
            {
                return p - buffer;
            }
            av = Serial.available();
        }
        if (av > 0) return -1; // buffer overrun

        // blink while waiting for data
        static elapsedMillis stopwatch = 0;
        if (stopwatch > 250)
        {
            stopwatch = 0;
            digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
        }
    }
}
 
@luni - Nice. I took a quick look at the 'software' and 'firmware' folders - looks like usable files are in there somewhere for both CPUs :) I grabbed the first line of the Latin text (930+ bytes) and threw that into the short send I was doing at 600KB/sec - two prints of that took it to 10MB+/sec. It didn't disturb my minimal parse that views the T4 feedback once a second.

Anyhow - my question: your sketch is a main.cpp. What happens to your build if you put an empty src.ino file in that folder? With the folder named src and the 'sketch' in main.cpp, the IDE opens src.ino, and since main.cpp is there it builds that too and produces a working file like: "T:\\TEMP\\arduino_build_382146/src.ino.TEENSY40.hex"

This would allow IDE building of your ZIP with the method you use - it worked here.

I opened VS 2019, built your solution (debug) and saw 4.68 MB/sec through a hub or 5.18 MB/sec on the front port. This is with TD1.49b1, which should be current with the latest GitHub USB update from 7 days back.
 
@Paul: Tested with RX_NUM = 32. Speed went up from 13.5 to 13.6 but that might as well be some other effect or just by chance.

@Defragster: I cloned the core files from Paul's GitHub repo yesterday. There was some activity on that repo in the last few days, so maybe your core is outdated? There is a .hex file in the firmware folder, compiled with the latest core.
Regarding the empty *.ino: I never tried that, I usually rename main.cpp to main.ino if I want to compile with the IDE. I'll try your trick later and update the repo accordingly.
 

I did use your built-in firmware upload - same speed, 4.86? It seems to go to the panic flash after 1 iteration with a send of 0.02MB?

It finds a single T4 nicely on the hub or the left front port - but not on the right front port - though other code does, and TyComm does in bootloader.

Interesting that you have UPLOAD in the code - but it doesn't recognize a Teensy in bootloader mode - at least as tried.

The USB change looked like it was 7 days old - the recent code I picked up was other edits from KurtE. But I did try the included HEX, cool trick.
 
If your program spends time doing work to "consume" the incoming data, like a blocking write to SPI for 4-wire addressable LEDs, those buffers will allow USB to receive the next incoming data while you work on using the data you've recently read. If you get 4096 from Serial.available(), that's a sure sign it filled all the buffers with maximum size packets while you were working. If the USB host sends smaller packets, the USB code tries to combine them, but only in fairly simple ways. A number of available bytes less than 4096 could still mean all the buffers were used.
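
The last point, that fewer than 4096 available bytes can still mean every buffer is in use, can be shown with a toy model. This is only a sketch under simplifying assumptions: RxModel is a made-up name, each packet occupies exactly one buffer, and the packet combining mentioned above is ignored, so it represents the worst case rather than the real USB code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the receive buffers; not the real USB code.
struct RxModel
{
    std::size_t numBuffers;          // plays the role of RX_NUM
    std::size_t packetSize;          // max bytes per buffer (512 assumed)
    std::vector<std::size_t> used;   // bytes held by each occupied buffer

    // Try to store one incoming packet; false means no buffer is free
    // and the host would have to wait.
    bool receive(std::size_t len)
    {
        assert(len > 0 && len <= packetSize);
        if (used.size() >= numBuffers) return false;
        used.push_back(len);
        return true;
    }

    // Total pending bytes, i.e. what an available()-style call would report.
    std::size_t available() const
    {
        std::size_t sum = 0;
        for (std::size_t n : used) sum += n;
        return sum;
    }

    bool full() const { return used.size() >= numBuffers; }
};
```

Filling 8 buffers with 512-byte packets reports 4096 bytes available. Filling them with 64-byte packets reports only 512 bytes, yet the model is just as full and the next packet is refused.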

This is the really tricky part about benchmarking. If the benchmark doesn't account for the time a "typical" program would spend to actually use the incoming data, of course the results will be faster and very little of the buffering will ever be used. But does that really show the sort of speed people can expect in applications where their code has to do real work with the incoming data? My focus isn't just on the best speed for benchmarks, but making sure USB performs well when people write simple programs that aren't optimal.

My main concern is what size buffer should be the default in the upcoming 1.49 release.
 
So, I added a background worker which delays for 1ms once per second. Not much, but this already causes problems.

Code:
IntervalTimer timer;

void worker()
{
    // delay(1);
    Serial1.println(millis());
}

void setup()
{
    timer.begin(worker, 1'000'000);
    timer.priority(128);

    pinMode(LED_BUILTIN, OUTPUT);
    Serial1.begin(230400);
    Serial1.println("Start");
}

As soon as I uncomment the delay(1) it starts to choke and loses data, regardless of the RX_NUM setting (8, 32, 64). This is somewhat strange, since 64x512 = 32kB should be enough at 15MB/sec and a delay of 1ms, right?

The strange thing is that Windows reliably stops sending if the Teensy doesn't read the data from the USB. Can it be that the data gets lost somewhere on the Teensy side? It might also be that I have a bug somewhere; I can have a closer look in the evening.

Edit: delayMicroseconds(200) in the worker is OK. 300µs already causes data loss. 300µs is quite some time, but it should not be an issue at the observed data rate?
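
The back-of-envelope estimate behind that question can be written out as follows. This is only a rough sketch: the function names are made up, the 512-byte buffer size and a steady 15 MByte/s (taken as 15e6 bytes/s) are assumptions, and it says nothing about where the data actually gets lost.

```cpp
#include <cassert>
#include <cstddef>

// Bytes the host could deliver while the sketch is busy for pauseSeconds.
double bytesDuringPause(double rateBytesPerSec, double pauseSeconds)
{
    assert(rateBytesPerSec >= 0.0 && pauseSeconds >= 0.0);
    return rateBytesPerSec * pauseSeconds;
}

// True if rxNum buffers of 512 bytes (assumed size) can absorb that many bytes.
bool pauseFitsInBuffers(double rateBytesPerSec, double pauseSeconds, std::size_t rxNum)
{
    const double capacity = static_cast<double>(rxNum * 512);
    return bytesDuringPause(rateBytesPerSec, pauseSeconds) <= capacity;
}
```

At 15 MByte/s a 1ms pause corresponds to about 15kB, which fits into 64 buffers (32kB) but not into 8 (4kB). A 200µs pause is about 3kB and a 300µs pause about 4.5kB, so with 8 buffers the first fits and the second does not, while 32 or 64 buffers would hold either.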
 
Interesting that you have UPLOAD in the code - but it doesn't recognize a Teensy in bootloader mode - at least as tried.

Yes, the upload button was just a quick hack, I can fix that later today. Strange that you get transmission issues. Runs perfectly here (as long as I don't add more than 200µs delay). With or without a hub.
 
The strange thing is that Windows reliably stops sending if the Teensy doesn't read the data from the USB. Can it be that the data gets lost somewhere on the Teensy side?

Difficult to say when I can't run it here. This USB code is pretty new, so it's quite possible there may be undiscovered bugs.

If you can give me a test case to run here, I can try investigating. I can watch the actual USB communication with a protocol analyzer.
 
The app I posted in #1 just sends that 25kB string over and over. I can write a simpler console application if that helps? But let me check my code first, there might be a bug on my side as well.
 
Oh my... :eek:
This was a silly bug on the Teensy side. For some reason I was assuming that each transmitted 25kB block is always aligned with the data in the USB buffers. I fixed that and get the following result:

Max transmission speed: 17 MByte/sec
  • This is just copying the received data to a buffer and checking the size and checksum of the received data.
  • If I connect additional devices (e.g. a USB serial cable) the transmission rate goes down by about 1MByte/sec per additional device.
  • I transferred some 100GB without error.
  • The number of RX buffers makes no difference (tested 8 - 64).
To simulate some load I placed a delay(n) after reading a received 25kB block.
  • The transmission speed of course goes down, but the transmission stays absolutely stable; I observed no lost or corrupted data.
  • I tested delays from 1ms up to 1s: no problem, no lost data. Looks like the handshake is working perfectly and Windows is kindly waiting until the Teensy buffers are free again. Again, I did not do anything special on the Windows side, just writing out that 25kB string in a loop. At 0.1s delay the transfer rate goes down to 240kB/s with 8 RX buffers and stays the same with 64 RX buffers (which absolutely makes sense, since the sender stops sending when the buffers are full).

So far everything looks perfect. Since Win10 stops sending when the Teensy buffers are full, the number of RX buffers does not seem to matter much in this case.
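
As a rough plausibility check of the 240kB/s figure: if every 25kB block is followed by a fixed pause, the expected average rate is the block size divided by transfer time plus pause. The sketch below uses a made-up function name and assumes 25kB = 25 * 1024 bytes (as in the sketch code) and 17 MByte/s = 17e6 bytes/s.

```cpp
#include <cassert>

// Expected average rate (bytes/s) when every block is followed by a fixed pause.
double rateWithPause(double blockBytes, double maxRateBytesPerSec, double pauseSeconds)
{
    assert(blockBytes > 0.0 && maxRateBytesPerSec > 0.0 && pauseSeconds >= 0.0);
    const double transferSeconds = blockBytes / maxRateBytesPerSec;
    return blockBytes / (transferSeconds + pauseSeconds);
}
```

For a 0.1s pause this gives roughly 252,000 bytes/s, in the same range as the measured 240kB/s. The pause dominates the transfer time (about 1.5ms per block), which fits the observation that the buffer count does not change the result.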

@Paul: let me know if you want something special tested

Here is the Teensy code I used (the Win app is still the same as in #1).
Code:
#include "Arduino.h"

constexpr size_t bufLen = 1024 * 30;
char* buffer = new char[bufLen];

void checkBuf(size_t cnt);
void panic();
void yield() {} // doesn't make a big difference...

void setup()
{
    pinMode(LED_BUILTIN, OUTPUT);
    Serial1.begin(230400);
    Serial1.println("Start");  // debug info on Serial1
}

void loop()
{
    if (Serial.available() > 0)
    {
        int cnt = Serial.readBytes(buffer, 1024 * 25);  // Just read in 25kB
        checkBuf(cnt);                                  // check for correct size, terminator and checksum
        delay(100);   // <------  Change this to simulate load
    }

    static elapsedMillis stopwatch = 0;  // blink
    if (stopwatch > 250)
    {
        stopwatch = 0;
        digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
    }
}

void checkBuf(size_t cnt)
{
    if (cnt != 1024 * 25)
    {
        Serial1.println("Wrong packet size");
        panic();
    }

    if (buffer[cnt - 1] != '\0')
    {
        Serial1.println("Terminator missing");
        panic();
    }

    int chkSum = 0;
    for (size_t i = 0; i < cnt; i++)
    {
        chkSum += buffer[i];
    }

    if (chkSum != 2400596)
    {
        Serial1.println("Checksum Error");
        panic();
    }
}

void panic()
{
    while (1)
    {
        digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
        delay(25);
    }
}
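
The sketch above avoids the alignment problem by always reading a fixed 25kB with readBytes(). A different, more general way to stay independent of how the data is split across USB buffers is to scan for the '\0' terminator and carry any leftover bytes over to the next read. The host-side sketch below shows that idea; MessageAssembler is a hypothetical helper and is not what the posted code does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects raw chunks of any size and splits them
// into '\0'-terminated messages, regardless of where the chunks end.
class MessageAssembler
{
public:
    // Feed one chunk as it arrives; returns every message completed by it.
    std::vector<std::string> feed(const char* chunk, std::size_t len)
    {
        assert(chunk != nullptr || len == 0);
        std::vector<std::string> done;
        for (std::size_t i = 0; i < len; ++i)
        {
            if (chunk[i] == '\0')
            {
                done.push_back(pending_);   // terminator closes the message
                pending_.clear();
            }
            else
            {
                pending_.push_back(chunk[i]);
            }
        }
        return done;
    }

    // Bytes of the next, still incomplete message.
    std::size_t pendingBytes() const { return pending_.size(); }

private:
    std::string pending_;
};
```

Feeding "abc" and then a chunk containing 'd', '\0', 'e', 'f' yields one complete message "abcd" and leaves 2 bytes pending for the next message. On a Teensy the same logic would have to work on a fixed buffer instead of std::string to avoid heap churn at these data rates.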
 
Really good to know it's working well. I've already started on the other USB types with this receive code. Will probably start pushing code to github later today.

Looks like 8 buffers will be a pretty reasonable default then?
 
That sounds great Paul. The other simple test done here looked good and reliable - it was not built for speed, but it was reliable. And as noted, dumping some @luni_LATIN text quickly pushed it to 10 MB Rx in serialEvent() without breaking the other reverse LPS on receive test I wrote.

@luni - Using the new sketch code above I still get a low 4.94 MB/s and then PANIC after 1 pass? Maybe the VS2019 build I'm getting is troubled, or the USB timing is different here? I didn't look closely, but putting an exit on PANIC shows that is where it is going. If GitHub gets an update for your current code, let me know and I'll look again.

Also, with src.ino in that folder, the Windows batch build I do triggers a normal Arduino IDE build of main.cpp from SublimeText when the Frank_B style Compile.Cmd is set up with the TSET tool I made. Hopefully you can create that empty file in your system and ignore it; then your sketch build will be IDE compatible - even if it is always called "SRC" :)
 