RawHID performance

Status
Not open for further replies.

DrM

Well-known member
Using RawHID to send data, and reading with PyUSB, the data transfer rate seems to be about 15 µs/byte, or 67 kB/sec.

On the Teensy (3.2, 96MHz), it is simply looping over

Code:
RawHID.send(&buffer[i], 100);  // send one 64-byte packet, 100 ms timeout
i += 64;                       // advance to the next packet

On the python side it is looping over

Code:
barray = self.dev.read(self.endpoint_in.bEndpointAddress, 64, 1000)

pieces.append(barray)

Notice that on the Python side I am not using HID, and am instead reading the endpoint directly. Also, the Python is running on a >3 GHz Linux host, so it should be able to keep up.

Using timers outside the Python loop, or inside the loop before the append, it adds up to the same 112 milliseconds to transfer 116 packets (7.4 kB). In other words, that is the time spent in the PyUSB read().
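As a quick check of the arithmetic, here is a small sketch that recomputes the rate from the figures quoted above (116 packets of 64 bytes in 112 ms). The variable names are only illustrative.

```python
# Measured figures quoted in the post above.
PACKETS = 116          # packets received
PACKET_BYTES = 64      # RawHID packet size in bytes
ELAPSED_S = 0.112      # total time spent in read(), in seconds

total_bytes = PACKETS * PACKET_BYTES            # 7424 bytes
rate_kb_s = total_bytes / ELAPSED_S / 1000      # throughput in kB/s
us_per_byte = ELAPSED_S / total_bytes * 1e6     # microseconds per byte
ms_per_packet = ELAPSED_S / PACKETS * 1000      # milliseconds per packet

print(f"{total_bytes} bytes, {rate_kb_s:.1f} kB/s, "
      f"{us_per_byte:.1f} us/byte, {ms_per_packet:.2f} ms/packet")
# prints: 7424 bytes, 66.3 kB/s, 15.1 us/byte, 0.97 ms/packet
```

So each 64-byte packet costs just under one millisecond on average.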


Is that what is expected?

Is there a faster way to do this?

Is there a faster, lower level USB interface available with an API?


Thank you
 
IIRC - it has been some time since I read the posted RawHID notes ... or Python ...

RawHID may be clocked at 1 ms per 512-bit (64-byte) packet, giving a MAX of 512 Kb/sec, which is 64 KB/sec
> See post #4 - 64 Byte packets at 1Khz ...

USB Serial can get over 1MB/sec from the T_3.2's 12 Mb/sec USB - see :: pjrc.com/teensy/benchmark_usb_serial_receive.html
>> That is as old as T_3.0 ...

When using USB Serial with python at times it took tuning of the host side code to get to MAX - but that may have been to keep up with 40X faster 480 Mbps T_4.x
 
@defragster The Serial interface is unworkable for this because of the limited buffering in serial drivers, both linux and windows.

My transfers are 7.4K or 15K, (remember that 4k buffer in the driver), and I need to do at least 10 of these per second.

In practice it is close to impossible not to lose data on back-to-back serial transfers larger than 4k. And most often it is both unreliable and randomly hangs.

The RawHID transfers are reliable and have never hung, but they are too slow. I suspect the HID is the problem and what the world needs is a true low level interface to the Teensy.

P/S I just noticed a detail in your post, 1ms clocked??? How do we fix or change it?
 
P/S I just noticed a detail in your post, 1ms clocked??? How do we fix or change it?

T3.2 is a USB Full-Speed device. According to the USB spec, such devices can request a maximum HID report polling rate of 1 kHz from the host. Each report is limited to 64 bytes. Thus the theoretical limit is 64,000 bytes/s (62.5 KiB/s).
The T4.x boards implement USB High-Speed, where the device can request a maximum polling rate of 8 kHz (about 500 kB/s max). Here are some test results demonstrating a 500 kB/s HID transfer with a T4: https://forum.pjrc.com/threads/63948-USB-2-0-Teensy-port-at-8000-Hz-poll-rate
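The limits above are just polling rate times report size. A minimal sketch of that arithmetic follows, compared against the "at least 10 transfers of 15K per second" requirement mentioned earlier in the thread. The function name is made up for illustration, and treating "15K" as 15 * 1024 bytes is an assumption.

```python
def hid_max_bytes_per_s(poll_hz: int, report_bytes: int = 64) -> int:
    """Upper bound when at most one report is delivered per polling interval."""
    return poll_hz * report_bytes


full_speed = hid_max_bytes_per_s(1000)   # 1 kHz polling: 64000 bytes/s
high_speed = hid_max_bytes_per_s(8000)   # 8 kHz polling: 512000 bytes/s

# Assumption: "15K" frames are 15 * 1024 bytes, ten of them per second.
required = 10 * 15 * 1024                # 153600 bytes/s

for name, limit in (("full speed", full_speed), ("high speed", high_speed)):
    print(f"{name}: {limit / 1024:.1f} KiB/s, "
          f"enough for 10 x 15K frames/s: {limit >= required}")
# prints:
# full speed: 62.5 KiB/s, enough for 10 x 15K frames/s: False
# high speed: 500.0 KiB/s, enough for 10 x 15K frames/s: True
```

Under these assumptions, the 1 kHz limit falls short of the stated requirement by more than a factor of two, regardless of how fast the host reads.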

Using Serial:

My transfers are 7.4K or 15K, (remember that 4k buffer in the driver), and I need to do at least 10 of these per second.
150 kB/s max seems rather moderate. I would be very surprised if a modern PC had problems with such a low rate. Here are some Win10 tests using a T4 at a transmission rate of > 15 MB/s. Windows has no problem handling that. https://forum.pjrc.com/threads/58629-Win10-amp-T4-Serial-Communication-Tests. I wouldn't assume Linux is any slower.

Buffer: AFAIK the 4k buffer is only the default value. You can increase the buffer size to 2GB if needed: https://docs.microsoft.com/en-us/do...lport.readbuffersize?view=dotnet-plat-ext-5.0
 
@luni Then it seems the answer is that we need a USB interface that does not involve HID.

Some of that must be in the code base already.
 
@luni Then it seems the answer is that we need a USB interface that does not involve HID.

Some of that must be in the code base already.

As indicated, the T_3.2's normal USB Serial interface can reach 1 MB/second with reliable communication to a host computer: Windows/Mac/Linux can all handle that in Serial Monitor.

They can get a bit spotty somewhere over 6 or 12 MB/sec talking to a T_4.x high speed device, when the computer software isn't up to the task of keeping the buffers processed.

Well tested for T_4.0 when Paul spent a week improving SerMon. And at that rate they should keep up with T_3.6 at full speed - which exceeds what a T_3.2 can put out.
 
@defragster Serial transfer in back to back units larger than the 4K receive buffer in the serial driver is not reliable. It will always depend on speed and load of the host. And, computer science (apologies to Feynman) is full of examples of protocols that address exactly that situation.

Moreover, I have tested it, and lived with it. It has been an unending set of workarounds and kluges. My data frames have to be less than 4K or the experiment will stop in the middle, maybe after hours have been invested. The device that produces 8K frames is dedicated, and it still loses data and hangs inside the serial driver from time to time.

And, in fact, it was on this very forum, that one of our more expert participants acknowledged all of the above, and pointed me to the RawHID API as the solution.

So, now that we understand why RawHID is not completely the answer, the 1ms polling, is there no way around that? Is there no raw USB interface? Or how do we go about creating one?
 
@defragster @luni The example you cited seems to be about transfer from computer to Teensy. That is not what we are talking about.

We are talking about transfer from Teensy to the host computer. Hence the problem with the 4K receive buffer on the host. And notice I showed RawHID.send() in the code snippets.
 
In practice it is close to impossible not to lose data on back-to-back serial transfers larger than 4k. And most often it is both unreliable and randomly hangs.

Can you show us an example where it is unreliable? I've never seen that. IF (I really don't think so) there is a problem, we can fix it.

Everyone who reported problems here had a) a host (or host software) that was too slow, b) buggy software on the host, or c) Teensy software that was not good.
You might want to look for problems there.
I know of a Windows bug when transfers are too fast: Windows has a bug in the driver. If you experience this problem, send a little slower, or use a more reliable OS.

And if you meant that in a more general way: the whole "mailbox" or "FidoNet" scene (30-40 years ago) wouldn't have existed. At that time there were serial transfers only. Modems. And yes, for files > 4k...
 
@FrankB I think everyone is not understanding the problem regarding transfers from the Teensy to the host using the serial interface.

Here is an example

https://forum.pjrc.com/threads/62681-USB-Serial-Communication-problems-in-Windows


When the host is a general purpose platform and the receive buffer is only 4K, this is not ever going to be reliable. Period.

It is not something you can fix.


Re your last comment;

The code example in the above link is as simple as can be, and the host has a 3.4 GHz i5-3570 with 32GB of 1600 memory, normally running Linux.

In that post I mention Windows; you will have to trust me that similar results are obtained in Linux, though it takes a bit longer to fail.

(I just noticed that the example sends one formatted datum per line, so it is potentially even slower on the Teensy side, though I imagine the formatting might be faster than the transfer. I have other examples with raw binary transfers; they do not work any better.)



P/S I started this thread asking about the RawHID problem. @luni or @defragster answered that, it is the 1ms polling for "full speed" devices. Thank you, that was really helpful.

Now the question is how to solve the problem. One solution would be if the interface could be set to "high speed", or if we could do larger than 64 byte transfers, perhaps. (I don't know enough about USB protocols to know if that makes sense.)

Probably the best solution would be a true, raw USB API, so that we can take advantage of the full set of USB capabilities.
 
There is a mention in post #3, nothing more. Later you just say "unreliable".
So if the 4k is your problem, use less? 2x 2k?
 
Using timers outside the Python loop, or inside the loop before the append, it adds up to the same 112 milliseconds to transfer 116 packets (7.4 kB). In other words, that is the time spent in the PyUSB read().
Does this not mean that Python is the problem?
I know, Python can be VEEEERY slow.
 
In practice it is close to impossible not to lose data on back-to-back serial transfers larger than 4k. And most often it is both unreliable and randomly hangs.

Does "in practice" mean actual experience using USB virtual serial on Teensy?

Or is "in practice" general experience with serial communication?

We get this sometimes on this forum, where people mistakenly believe USB virtual serial works the same way as ordinary asynchronous serial communication, using baud rates and start / stop bits (and no RTS/CTS flow control). Indeed the end user appearance is the same as ordinary serial communication with much higher speed.

But in truth the underlying protocols are nothing alike. USB virtual serial uses USB bulk transfer protocol, which uses packets with CRC checks and automatic retransmission, all done at a very low level by the USB hardware. Bulk transfer also always has end-to-end flow control, similar in concept to RTS/CTS, but implemented using USB tokens (similar to fragments of packets) rather than a dedicated wire. If the receiver can't keep up, the transmitter automatically slows, so you don't get data loss. You can't disable the flow control. It's a fundamental part of the USB bulk transfer protocol, always present no matter what you do.

USB virtual serial also has special messages, so the traditional signals like DTR & RTS are emulated. But there too, the actual underlying implementation is a very reliable protocol with CRC checks, ACK & NAK tokens, and automatic retransmission at the hardware level if any communication errors are detected.
 
We are talking about transfer from Teensy to the host computer. Hence the problem with the 4K receive buffer on the host.
You are right.
Anyway, I'd really like to reproduce that. What I understood so far is: if you repeatedly send about ten 15kB blocks per second from a T3.2 to a Win10 PC (Python), you sometimes get transmission errors, right? Anything else to consider?
 
If you write your own software to receive USB serial data, the buffer size you use has a huge performance impact on Windows and Linux. To get good performance, you need to use a large buffer (32K is diminishing returns) and a non-blocking read which fills as much of the buffer as it can.

You might think a multiple-GHz many-core processor with gigabytes of memory wouldn't limit the performance, but if you use small buffer sizes, or if you do a GUI update for each byte or chunk of data, rather than processing everything and then doing 1 GUI update, you will indeed limit the speed. PCs are fast, but not nearly fast enough to compensate for such inefficient programming.
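A minimal sketch of the "large buffer, read whatever is available" pattern described above, using only the Python standard library. This is not Paul's code. It assumes a POSIX host where the device is already open as a file descriptor in raw mode; the names `read_available` and `read_frame` are illustrative, and a pipe stands in for the serial device so the sketch runs without hardware.

```python
import os
import select

BUF_SIZE = 32 * 1024  # large read buffer, following the advice above


def read_available(fd: int, timeout_s: float = 1.0, buf_size: int = BUF_SIZE) -> bytes:
    """Wait up to timeout_s for data, then return whatever is ready (at most buf_size)."""
    ready, _, _ = select.select([fd], [], [], timeout_s)
    if not ready:
        return b""                  # timed out, nothing arrived
    return os.read(fd, buf_size)    # returns immediately with the bytes that are ready


def read_frame(fd: int, frame_bytes: int, timeout_s: float = 1.0) -> bytes:
    """Collect frame_bytes bytes, or fewer if the source stalls or closes."""
    chunks = []
    remaining = frame_bytes
    while remaining > 0:
        chunk = read_available(fd, timeout_s, min(BUF_SIZE, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demo: a pipe stands in for the serial device.
r, w = os.pipe()
os.write(w, bytes(7424))            # one 7.4 kB frame
frame = read_frame(r, 7424)
print(len(frame))                   # prints: 7424
```

The point of the design is that each system call moves as many bytes as are waiting, instead of one call per byte or per 64-byte packet.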
 
@PaulStoffregen When you say to write your own software to receive USB serial data and use a buffer of up to 32K, what API are you using? Is that available, say, in IOCTL? Or do you mean write something at the level of a driver? Can you offer an example?


Aside, ugggh! I just re-did my entire system to use the RawHID (and some housekeeping too, but the interface drives the architecture to some extent because of the performance requirements)
 
P/S @PaulStoffregen "In practice", in this instance, means with the Teensy. I see the data drops and hangs on the said Linux platform receiving large data transfers from the teensy 3.2


I buffer up all of the data, in a dedicated task, and then queue it to another task for further processing and graphics. Turning the other processing off, does not solve the issue. Also, as in the other thread I posted on this, I have tried it stripped down to minimum code.

I agree though that it can easily be performance issues in Pyserial, but it is still a fundamental problem. Not keeping up, should not result in data loss or crashes. I don't recall if I tried writing a c-code using the regular serial interface, I think I might have tried that too. Maybe I'll try it again. I was hoping to keep everything in pure python for portability.


@luni, by all means, do try it.
 
Here is a collection of only slightly cryptic comments on the problem with serial receive from a Teensy, in general and in pyserial. So, it's not just me.

https://www.xspdf.com/resolution/50011123.html

Teensy is not mentioned on this site ?!?

Anyhow, you get a reliable connection if you just use the Teensy Serial Monitor, or any other good terminal software.
Ok, they don't use Python.

You could try to slow down the speed a bit. There is plenty of room, you're not transmitting that much.
And I'd ask in a python forum, that's a better place.

If you want raw USB: sure, doable. Just write the code :)
 
This was the REF code for Teensy unconstrained print to test computer reception speed with USB serial:

github.com/PaulStoffregen/USB-Serial-Print-Speed-Test

Not seeing it run now but seems each print line was 32 bytes

IIRC:
T_3.6 could send 25K+ lines/sec :: over 800 KB/sec
maybe closer to 20K on lesser T_3.x's?

But T_4.0 was reliable up to 200K-250K lines/sec and beyond, i.e. over 6.4 MB/sec

Paul spent a week working around lame Java code and ended up with Windows at 500K lines/sec, with peaks of around 850K that might falter and overwhelm the PC, but that is in the 15 MB/sec region

Paul found Linux was better and, IIRC, could reliably sustain 900K to 1M lines per second of Teensy writing to the computer, as he saw it.

Other code was written without a GUI display but with some validation of the received text, and speeds were similar in "C" with buffering as seen here. And indeed at some point the computer mishandled buffers, but that was in excess of 700K or 800K 32-byte lines per second

That shows the high-end capability: any capable computer can reliably gather over 6 MB/sec of Serial USB output from a Teensy.

So a T_3.2 pushing one 50th of that should be programmatically doable.
 
@defragster Do you have the host side of that, for Linux? I think the operative word is that it kept up, it was running C, and nothing else was happening.

What happens when the machine is loaded and doesn't get to that undersized buffer in time?


What this tells us is that pyserial is probably not an appropriate way to talk to the Teensy for volume transfers; I can agree to that. (I am sorry if it seemed like I was defending it, actually.)

But it tells us also that we need a high priority c-coded task managing the serial interface and interacting with our program through pipes or queues (or shared memory perhaps).

This is what I had in mind at the top of this thread, when I referred to the solution for the serial interface as impractical. Burdensome might be a better word.

The right way to do this is to provide an interface where the issue does not exist to begin with.
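For what it is worth, the "dedicated task that drains the interface and queues the data to the actual program" idea can be sketched in a few lines. This is only an illustration in Python with a thread and a queue, not the high priority C task described above, and it does nothing about scheduling priority. It assumes a POSIX file descriptor; a pipe stands in for the device, and the names `reader_task` and `start_reader` are made up.

```python
import os
import queue
import threading


def reader_task(fd: int, out: queue.Queue, chunk_size: int = 32 * 1024) -> None:
    """Drain fd as fast as possible and hand each chunk to the consumer."""
    while True:
        chunk = os.read(fd, chunk_size)   # blocks until some data is ready
        if not chunk:                     # empty read: the writer closed the stream
            out.put(None)                 # sentinel tells the consumer to stop
            return
        out.put(chunk)


def start_reader(fd: int) -> queue.Queue:
    """Run reader_task in a daemon thread and return the queue it fills."""
    out: queue.Queue = queue.Queue()
    threading.Thread(target=reader_task, args=(fd, out), daemon=True).start()
    return out


# Demo: a pipe stands in for the device.
r, w = os.pipe()
q = start_reader(r)
os.write(w, bytes(15 * 1024))             # one 15K frame
os.close(w)                               # end of stream

received = bytearray()
while (chunk := q.get()) is not None:     # consumer side: slow processing goes here
    received += chunk
print(len(received))                      # prints: 15360
```

The reader does nothing but read and enqueue, so slow processing or graphics in the consumer only grows the queue instead of delaying the next read.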
 
Maybe you want to look at ancient protocols like XModem, YModem, ZModem that worked great 40 years ago.
Source code and descriptions for them should be on the net.
Or write your own: calculate a CRC for small blocks and do a re-transmit if needed. There is enough time to do the re-transmit several times.
Teensy 3.2 has hardware CRC (use the FastCRC library), which is pretty fast.
You might want to add a timeout for the case of missing bytes, which I guess is the main problem with slow Python.

Or as said - easiest - slow down the tx.
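The "CRC for small blocks" idea could look roughly like this on the host side. It is a sketch only: the frame layout (16-bit sequence number, 16-bit length, payload, CRC-32 trailer) is a hypothetical format invented for illustration, and it assumes the Teensy side computes the same standard CRC-32 that `zlib.crc32` does.

```python
import struct
import zlib

HEADER = struct.Struct("<HH")   # hypothetical layout: sequence number, payload length
TRAILER = struct.Struct("<I")   # CRC-32 over header + payload


def pack_block(seq: int, payload: bytes) -> bytes:
    """Build one frame: header, payload, CRC-32 trailer."""
    head = HEADER.pack(seq & 0xFFFF, len(payload))
    return head + payload + TRAILER.pack(zlib.crc32(head + payload))


def unpack_block(frame: bytes):
    """Return (seq, payload), or None if the frame is truncated or fails the CRC."""
    if len(frame) < HEADER.size + TRAILER.size:
        return None
    seq, length = HEADER.unpack_from(frame)
    if len(frame) != HEADER.size + length + TRAILER.size:
        return None                              # missing or extra bytes
    body = frame[:-TRAILER.size]
    (crc,) = TRAILER.unpack_from(frame, len(body))
    if zlib.crc32(body) != crc:
        return None                              # corrupted block
    return seq, body[HEADER.size:]


frame = pack_block(7, b"hello")
print(unpack_block(frame))                       # prints: (7, b'hello')
```

A receiver would treat a `None` result, or a read timeout, as the cue to ask for the block with that sequence number again.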
 
@FrankB, I tried slowing down the transmit. It does not work at any reasonable speed. I tried everything one might think of except coding a compiled task to act as a user space intermediate level driver.

I think of it this way, the transfer, absent adequate buffering, creates a hard real time requirement. Soft real-time is not easily going to provide a robust solution.

On a general purpose host, short of writing a driver, the best you can do is approximate a real time solution by running some task at high priority to manage the interface and in effect queue the data to the actual program.

Since there are also commands and responses it becomes quite a bit of work.

And all of that is for a problem that need not exist to begin with.
 
Well, the Teensy Monitor does work too. How can this be? :) If I take your word for it, it shouldn't be possible. :)


Here is a description of YModem, from 1987. http://pauillac.inria.fr/~doligez/zmodem/ymodem.txt
It worked over flaky acoustic-coupler / modem telephone lines. Missing bytes and even more wrong bytes were normal. (as USB uses a CRC itself you shouldn't see wrong data)
Should be more than enough .. and it is simple.

And... review your code. I'm beginning to think that it just has bugs... more and more obvious.
 