Serial (USB) transmission time inconsistency

Status
Not open for further replies.

Siridakis

Member
Hello.

I'm working on a project where I want to sample some sensors periodically, and one of them is a KY-037 (which is basically an electret microphone).
For now I want to continuously sample and transmit the data to a PC via USB cable (so, using "Serial" methods).
I'm sampling the KY-037 at 50 kHz, and I know that's a lot of data to transmit in real time, so I began some tests to see if it's possible.
I'm using a Teensy 3.6, by the way.

The current test (I'll post the code below) uses 2 buffers to store the audio samples, transmitting one while the other is being filled, and so on. I'd like to check whether I can transmit a buffer faster than the time it takes for the other buffer to fill (meaning I could keep doing this without losing any samples). I'm also reading data from a BME280 sensor and transmitting it along with every audio buffer transmission.

I did some measurements with a Tektronix DPO2012B Scope and it mostly works fine, but every few seconds something weird occurs. Normally each transmission takes a short amount of time, less than it takes to fill a buffer (say, for instance, about 849us each transmission) but then, every few seconds (around 7 to 12s) there's one transmission that takes over 80ms! Sometimes over 100ms! And this is where things don't work as intended, because in this time the next buffer fills up and I begin losing samples until the transmission is finished.

I'll attach the code below. I tried to add as many comments as I could, and perhaps a better explanation of the problem, in the code itself. I apologize if some comments seem obvious; I just wanted to make it as simple to understand as I could.

I'll also attach some Scope shots I took.
Print 1 shows the normal behaviour: each yellow (CH1) pulse is a sample (analogRead()) and each blue (CH2) pulse is buffer1 being transmitted. This print shows that the sampling continues while the buffer is being transmitted (because the transmission is faster than the time it takes to fill the second buffer).
Print 2 shows the abnormal behaviour. The blue pulse takes way longer (844.9us normally and then 86.8ms when this happens). The sampling continues until the next buffer is full and then stops until the end of this transmission.
These prints were taken with the following test parameters: buffer size = 128 samples, Fs = 50 kHz. But the same kind of problem occurs with a bigger buffer size and/or a lower sampling frequency.
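
As a rough check of the timing budget described above, the sketch below computes how long one buffer takes to fill and approximately how many samples are dropped during a long transmission. It is a host-side C++ illustration, not part of the attached sketch: the function names are made up for this example, and it assumes sampling simply stops once the spare buffer is full, as print 2 suggests.

```cpp
#include <cstdint>

// Time to fill one buffer of n samples at fs_hz, in microseconds.
inline uint32_t bufferFillTimeUs(uint32_t n, uint32_t fs_hz) {
    return static_cast<uint32_t>((static_cast<uint64_t>(n) * 1000000ULL) / fs_hz);
}

// Approximate samples lost when one transmission lasts tx_us.
// The other buffer absorbs the first fill time; every sample period
// after that is assumed to be dropped until the transmission ends.
inline uint32_t samplesLost(uint32_t tx_us, uint32_t n, uint32_t fs_hz) {
    const uint32_t fill_us = bufferFillTimeUs(n, fs_hz);
    if (tx_us <= fill_us) {
        return 0;
    }
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(tx_us - fill_us) * fs_hz) / 1000000ULL);
}
```

With 128 samples at 50 kHz a buffer fills in 2560 us, so an ~845 us transmission fits comfortably, while an 86.8 ms transmission would cost roughly 4200 samples under these assumptions.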


After reading https://www.pjrc.com/teensy/td_serial.html#txbuffer I suspected this has something to do with the way data is transmitted using Serial methods (buffered first, then transmitted later), but I ran these tests with Serial.send_now() or Serial.flush() added and the behaviour is the same.

Can someone please help me understand what's happening and how I can avoid this?

Any help is appreciated. Thanks in Advance.
 

Attachments

  • KY037_BME280.ino
  • print1.jpg
  • print2.jpg

USB packet data size is 64 bytes - conforming to that may help. AFAIK Teensy holds sending until buffer is full (with a short delay before sending partial buffer) - so sending perpetual blocks of 32 bytes in total will use the Teensy internal buffering and allow it to keep the line full to the PC.

The text above doesn't show the size of the data element - is it 2 bytes?

Also what is the PC connection point? USB3 direct port likely faster than through a Hub.

Also the PC's receiver has to be efficient. If it does not properly take and remove the packets it will slow the Teensy changing the throughput.

Seeing about 18,500 packets of 63 bytes+NewLine with T_3.6 to USB3 port, and under 14,000/sec through a hub. Dropping to 32 byte .prints - the number printed per second doubles to 37K.

IIRC .send_now and .flush aren't generally helpful. They can force partial packet sending and can hold up the sketch.
 
Hi, thank you so much for replying.

The text above doesn't show the size of the data element - is it 2 bytes?
The data sampled with the adc is being stored in a buffer of uint16_t, so 2 bytes. But I'm sending 128 samples (and using println, so I believe it adds 1 byte to every sample sent). I'm also sending some string tags before and after those 128 samples.
Along with all that I'm sending other tags with other data from the BME280 sensor (floats in this case). Here's a snippet from the code I've attached:
Code:
      Serial.println("[a]");
      for(j=0;j<N;j++)
      {
        Serial.println(buff1[j]);
      }
      Serial.println("[/a]");
      Serial.println("[T]");
      Serial.println(bme.getTemperature_C(),2);
      Serial.println("[/T]");
      Serial.println("[H]");
      Serial.println(bme.getHumidity_RH(),2);
      Serial.println("[/H]");
      Serial.println("[P]");
      Serial.println(bme.getPressure_Pa(),6);
      Serial.println("[/P]");
      Serial.send_now();


Also what is the PC connection point? USB3 direct port likely faster than through a Hub.
USB3 direct port, no Hubs. Sometimes I've used a USB2 port, but I've read that Teensy always uses full-speed (12Mbit/s). Isn't that achievable through USB2?


Also the PC's receiver has to be efficient. If it does not properly take and remove the packets it will slow the Teensy changing the throughput.
I'm currently receiving through the Arduino serial monitor (and will probably use a Processing sketch later).


Seeing about 18,500 packets of 63 bytes+NewLine with T_3.6 to USB3 port, and under 14,000/sec through a hub. Dropping to 32 byte .prints - the number printed per second doubles to 37K.
Not sure if I understand. So is it better to send 32 bytes than 64?
I don't think I really get what defines a packet here. Is each call to Serial.print a packet? (As in, do I need to have all 32 bytes in a single call?)


Seeing about 18,500 packets of 63 bytes+NewLine with T_3.6 to USB3 port, and under 14,000/sec through a hub. Dropping to 32 byte .prints - the number printed per second doubles to 37K.
If this is a test you've made, could you please post the code so I can reproduce it here and maybe learn a bit more about what I've said earlier (like what defines a packet)?


IIRC .send_now and .flush aren't generally helpful. They can force partial packet sending and can hold up the sketch.
I thought using Serial.send_now would prompt the USB buffer to be sent now instead of waiting for it to fill up even more and then holding up the sketch for a long continuous time while the buffer is being sent. By my tests it doesn't really look that this is happening (actually I haven't noticed any changes by using Serial.send_now).


I really appreciate you taking your time to help me.
I know I have a lot to learn, so I hope we can continue this discussion further.
 
Here is the sketch I used (attached as MaxUsbLC.ino). It was quickly made to see how fast a T_LC could send 40 byte packets. Then I saw speed differences depending on the USB connection path and the SerMon used, and compared T_3.5 and T_3.6.

And modified for your question. It prints 31 characters then a NEW_LINE for a total of 32.

A packet is formed by the USB software - AFAIK it collects data sent to Serial and sends out packets when they are a full 64 bytes, if a packet remains unfilled after a fixed delay it will be sent.

Note was just to show I tried both 64 and 32 byte prints and the results seemed equal.

Any and all characters sent to Serial need to be counted when they are part of the data sent.

If you buffered the sampled data for printing as a 32 byte block, then, since the second block is roughly ~30 bytes now with minor [sprint] formatting, adjusting that to exactly 32 bytes would allow either block to print in turn and result in fully filled 64 byte USB buffers. That might prevent a backlog where partial packets are waiting to be sent. Also, sending fewer than 128 samples of 2 bytes at a time would prevent those 256 bytes from leaving empty space in the USB queue, only for it to get hit with 4 full buffers all at once. That alone might make a difference: do a print after each 32 samples { 32 adjusted down for filler around the sample data } for 64 byte prints.
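
One way to get exactly one full 64 byte block per write is to skip the text formatting and pack 32 raw samples into 64 bytes. The following is only a sketch of that idea: packSamples and the constants are made-up names, the little-endian byte order is an assumption, and the PC side would have to decode the binary stream accordingly. On the Teensy the filled block could then be sent with a single Serial.write(block, 64).

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kPacketBytes = 64;  // USB packet size mentioned in this thread
constexpr size_t kSamplesPerPacket = kPacketBytes / sizeof(uint16_t);  // 32 samples

// Pack kSamplesPerPacket 16-bit samples into out[], low byte first.
// out must point to at least kPacketBytes bytes.
inline void packSamples(const uint16_t* samples, uint8_t* out) {
    for (size_t i = 0; i < kSamplesPerPacket; ++i) {
        out[2 * i]     = static_cast<uint8_t>(samples[i] & 0xFF);  // low byte
        out[2 * i + 1] = static_cast<uint8_t>(samples[i] >> 8);    // high byte
    }
}
```

Compared with println(), this sends a fixed 2 bytes per sample regardless of the value, at the cost of a stream that is no longer readable in a serial monitor.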


USB2 and 3 both work at max Teensy speed - but going through a HUB adds a level of overhead. My PC front panel has USB3 so that is what I used and saw faster transfers than through either a USB2 or USB3 HUB.

send_now will force a print of a partial buffer - which means a full transfer is done and any unused bytes in that packet will cascade into the next packet minimizing total throughput - the difference can be minimal at times.
 
Because you are using Serial.println() to send the data, you are transferring quite a bit more data than you may realize.

All of the data sent is first converted to ASCII so your 16bit values take 1 byte/char per digit plus the trailing newline char. Assuming 5 digits per 16bit a/d val, each data transmission consists of ~800 bytes (including the header & BME data). This is roughly consistent with your timing of ~850us per transmission for a throughput of ~950 kB/sec which is close to the ~1MB+ upper limit.
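
To make that estimate concrete, here is a small host-side sketch that counts the bytes println() produces for the audio samples alone. It assumes println() appends "\r\n" (2 bytes), which is what the Arduino Print class does; the helper names are invented for illustration, and the tag lines and BME280 values are not included.

```cpp
#include <cstdint>

// Number of decimal digits needed to print an unsigned value.
inline uint32_t decimalDigits(uint32_t v) {
    uint32_t digits = 1;
    while (v >= 10) {
        v /= 10;
        ++digits;
    }
    return digits;
}

// Bytes sent by Serial.println(v): the digits plus "\r\n".
inline uint32_t printlnBytes(uint32_t v) {
    return decimalDigits(v) + 2;
}

// Bytes for n samples that all print with as many digits as `typical`.
inline uint32_t audioBlockBytes(uint32_t n, uint32_t typical) {
    return n * printlnBytes(typical);
}
```

For 128 samples this gives 640 bytes when the readings are around 300 and 896 bytes for 5-digit readings, which is in the same ballpark as the ~800 bytes estimated above.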

At 64 bytes per USB buffer, 800 bytes takes 12.5 buffers. Checking "cores/teensy3/usb_desc.h" it appears that the default number of allocated USB buffers is 12, so your 800 byte transmission requires more buffers than are available, and the transmission will block until enough data has been sent to free the necessary buffers. Still, your throughput is pretty good considering that you haven't tried any optimizations.

As to why you are seeing the 80-100 ms "stalls" every 7-12 sec, I'd look at the PC side for the cause. Remember that USB bandwidth is always controlled by the host (PC). If the host side stutters, the Teensy USB code has no choice but to wait. If all the USB buffers are full, then the USB send code will block, waiting for the host to receive more data and free the needed buffers.

The Arduino SerMon may be the culprit (likely), or it may be a resource issue in the PC USB driver (less likely). I believe that you can simply close the Arduino SerMon and the Teensy will still send data across the link. The data will simply be tossed on the PC side as there's no consumer for it. If your "stalling" goes away when the SerMon is closed, then you'll have a good idea as to where the latency is coming from.
 
I would agree with the limiting factor likely being the Arduino Serial Monitor just can't keep up with this data rate. Since this is using ordinary Serial, which is CDC-ACM protocol, which means "bulk" pipes/endpoints for the data, when the host can't keep up the result is stalls. That's how things are designed to work.

While I don't believe it's the cause of this 100 ms stall, the Teensy side isn't running very efficiently. An IntervalTimer runs a function every 20 us which does an analogRead() which takes ~10 us, so about 50% of all Teensy CPU time is spent in an interrupt waiting on analogRead().

A much more efficient (and easy) way would involve using the audio library to acquire the data, and the Audio USB feature (in Tools > USB Type) to stream the audio data to the PC. USB audio uses isochronous pipes/endpoints, which are designed to automatically drop data but never stall. Then you could pass the BME280 sensor data over regular serial. The downside is you'd lose the tight coupling, since the PC side would have to receive the audio data and sensor data as separate streams.
 
Thank you all for the help.

I'll try to comment on each reply here.

defragster:
Thanks for sending me the code, I'll take a look at it as soon as I can.
About your comments, I don't think I fully understand some parts:

Also sending in smaller than 128 samples of 2 bytes would prevent 256 bytes from leaving empty space in the USB queue - only to get hit with 4 buffers full all at once. That alone might make a difference doing a print after each 32 samples { 32 adjusted down for filler around the sample data } for 64 byte prints.
Maybe I'm lost in translation here. When you say 4 buffers full at once: I thought that every time a 64 byte buffer is filled it has to be transmitted before any other data can be buffered, but from dgranger's reply I assume I can fill up to 12 buffers before having to wait for data to be transferred. Is that correct?
I thought 128 samples of 2 bytes each were a good call exactly because it wouldn't leave any partially filled buffers.

send_now will force a print of a partial buffer - which means a full transfer is done and any unused bytes in that packet will cascade into the next packet minimizing total throughput - the difference can be minimal at times
When you said that any unused bytes will cascade into the next packet, is that because a full transfer takes the same amount of time for a partial or a full buffer? (After the transfer the buffer is left empty, so no data remains occupying buffer space for the next transfer, right?)


dgranger:
All of the data sent is first converted to ASCII so your 16bit values take 1 byte/char per digit plus the trailing newline char. Assuming 5 digits per 16bit a/d val, each data transmission consists of ~800 bytes (including the header & BME data). This is roughly consistent with your timing of ~850us per transmission for a throughput of ~950 kB/sec which is close to the ~1MB+ upper limit.
You are absolutely correct, I wasn't taking into account the fact that I'm not sending raw bytes, but instead converting the values to strings by using print/println. In my tests the ADC is in its default 10-bit mode and the data is always around 300, so it's about 3 characters/bytes per sample, but I should really consider the worst case of 5 digits, since I intend to change to 16 bits later.

At 64 bytes per USB buffer, 800 bytes takes 12.5 buffers. Checking "cores/teensy3/usb_desc.h" it appears that the default # of allocated USB buffers is 12, so your 800 byte transmission requires more buffers than are available and the transmission will block until enough data has been sent freeing the necessary buffer resource. Still, your throughput is pretty good considering that you haven't tried any optimizations.
Thanks for that info. As stated above, I thought the transmission would block after just 1 buffer of 64 bytes was filled. Is there a way to change the number of USB buffers available? And where are these buffers really? Do they take up part of Teensy's RAM? Is there a function that returns how many buffers are full at a given moment?

The Arduino SerMon may be the culprit (likely), or it may be a resource issue in the PC USB driver (less likely). I believe that you can simply close the Arduino SerMon and the Teensy will still send data across the link. The data will simply be tossed on the PC side as there's no consumer for it. If your "stalling" goes away when the SerMon is closed, then you'll have a good idea as to where the latency is coming from.
I believe this is really the case! I tried to receive the data using a processing sketch to store it in a .txt file instead of using the Arduino SerMon and the 100ms+ stalls were gone! Thanks a lot for that one!
It left me wondering if there's an easier (or a proper) way to debug this sort of thing that I'm missing.


PaulStoffregen:
While I don't believe it's the cause of this 100 ms stall, the Teensy side isn't running very efficiently. An IntervalTimer runs a function every 20 us which does an analogRead() which takes ~10 us, so about 50% of all Teensy CPU time is spent in an interrupt waiting on analogRead().

A much more efficient (and easy) way would involve using the audio library to acquire the data, and the Audio USB feature (in Tools > USB Type) to stream the audio data to the PC. USB audio uses isochronous pipes/endpoints, which are designed to automatically drop data but never stall. Then you could pass the BME280 sensor data over regular serial. The downside is you'd lose the tight coupling, since the PC side would have to receive the audio data and sensor data as separate streams.
Yes, half of CPU time is spent doing the sampling and I'm worried about it even more considering I plan to change the adc to 16bit.
About using the Audio USB feature, I didn't even know it was possible, and I'm not sure how to go about it. To receive the data that way (and, for instance, store it in a file), would I still be listening to a COM port? Probably not, I assume. Is there a place you could point me to for learning about it in a simple fashion (like a tutorial or something)?
I didn't use the audio library because I don't really know what's happening in the background, and the way I did it seemed simpler (based on my current knowledge): I could be aware of everything from the sampling to the transfer and check whether I was losing samples or not.
I was also thinking about using the intervalTimer to start a sampling (by writing to registers or maybe using adc library) and then using the ADC interrupt to read the data when the sampling is finished, as a way to have more CPU time available for other tasks. Does it seem like a bad approach?


I'll take this chance to ask a question in a more project guidance fashion: For now I'm using USB to have a prototype, but the idea was to send this data over WiFi. I've never used WiFi and I'm really worried about the data transfer rates. Since it seems I'm needing almost the full 12Mb/s speed here, is it reasonably achievable through WiFi, considering I'll have to send the data to some ESP8266 module, which I imagine will be way slower than 12Mb/s? I've read that 10Mb/s is achievable through SPI, so I think there might be a way. Please send me your thoughts on this.
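
For a rough idea of the sustained rate the link (USB or WiFi) would have to carry, as opposed to the burst rate seen on the scope, the average can be estimated from the sampling rate and the bytes sent per sample. This is a minimal sketch with an invented helper name; it ignores the tags, the BME280 values and any protocol overhead.

```cpp
#include <cstdint>

// Average payload rate in bits per second for fs_hz samples per second,
// each sample taking bytes_per_sample bytes on the wire.
inline uint64_t sustainedBitsPerSecond(uint32_t fs_hz, uint32_t bytes_per_sample) {
    return static_cast<uint64_t>(fs_hz) * bytes_per_sample * 8ULL;
}
```

At 50 kHz this works out to 0.8 Mbit/s for raw 16-bit samples and 2.8 Mbit/s for 5-digit ASCII lines ending in "\r\n", so sending binary instead of text lowers the requirement considerably.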

Once again, thank you all for helping and I'm sorry for the long post.
 
About using the Audio USB feature, I didn't even know it was possible, and I'm not sure how to go about it. To receive the data that way (and, for instance, store it in a file), would I still be listening to a COM port? Probably not, I assume. Is there a place you could point me to for learning about it in a simple fashion (like a tutorial or something)?

It's like connecting a USB sound card to the computer. The computer will detect the Teensy as an audio device.

https://www.pjrc.com/teensy/td_libs_Audio.html

Please take a look at the video, the design-tool and the sample-code in Teensyduino.

SPI: The SPI hardware can do much more than 10 Mbit/s, but the achievable rate depends more on the efficiency of your code.
 
About:
Maybe I'm lost in translation here. When you say 4 buffers full at once: I thought that every time a 64 byte buffer is filled it has to be transmitted before any other data can be buffered, but from dgranger's reply I assume I can fill up to 12 buffers before having to wait for data to be transferred. Is that correct?
I thought 128 samples of 2 bytes each were a good call exactly because it wouldn't leave any partially filled buffers.

I was saying it is better to send proper sized chunks to USB as you have them than to buffer a larger amount of data - the 128 samples - and sending that in bulk. The USB system is designed to buffer and transmit ASAP with the buffers built into the system.

Sending the smaller sets of data will allow the USB system to send the data as soon as it can see 64 bytes, and perhaps offer best throughput keeping 'the pipe full'. Sending 128 samples - intermixed with the other data will fill the buffers with data after perhaps some downtime when the USB could have been sending the initial data already.

It looks like dgranger read the code and calculated actual data output - that may exceed even optimal use - if the PC falls behind.
 
I assume I can fill up to 12 buffers before having to wait for data to be transferred. Is that correct?

Close, but not quite correct. The USB Serial transmit code checks how many buffers are still unused and stops when there are only 4 left. This is done in an attempt to prevent a large amount of transmitted data from preventing anything being received by hogging all the buffers.
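
Put as arithmetic: with the default of 12 buffers mentioned earlier and 4 held back, about 8 buffers (512 bytes) can be queued for transmit before a write blocks, assuming none of the buffers are currently in use for receiving. A small sketch; the function name and parameters are illustrative and are not taken from the Teensy core.

```cpp
#include <cstdint>

// Bytes that can be queued for transmit when `reserved` buffers out of
// `total_buffers` are held back, each buffer holding packet_bytes bytes.
inline uint32_t queueableTxBytes(uint32_t total_buffers, uint32_t reserved,
                                 uint32_t packet_bytes) {
    if (total_buffers <= reserved) {
        return 0;
    }
    return (total_buffers - reserved) * packet_bytes;
}
```

So queueableTxBytes(12, 4, 64) gives 512 bytes, which is less than one ~800 byte transmission from the original sketch.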


Is there a way to change the number of USB buffers available?

Yes, you would need to edit usb_desc.h.

The maximum is 31 buffers, so don't go crazy with a huge number.


And where are these buffers really? Do they take up part of Teensy's RAM?

They are allocated in Teensy's RAM. If you make this change, you should see the amount of memory Arduino reports as used for global variables increase in proportion to the number of extra buffers.


Is there a function that returns how many buffers are full at a given moment?

There are functions in usb_mem.c and usb_dev.c, but these are generally meant to be used only from the other USB code. You can probably use extern "C" to declare them from C++, but this is not recommended. They're not part of the official API that will remain stable long-term as new versions of Teensyduino are released.

The official API is Serial.availableForWrite(). This does have a known bug for certain cases where you write less than a full buffer and then it gets auto-flushed after a 4 ms timeout. But generally this is the best way if you want to know how much data you can transmit without blocking.
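
A minimal sketch of how Serial.availableForWrite() might be used to avoid blocking: write a block only when the port reports room for all of it, otherwise keep the block and retry later. writeIfRoom and FakePort are invented names for this example; FakePort exists only so the helper can be exercised on a PC, and on the Teensy the Serial object would be passed instead. The known bug mentioned above still applies.

```cpp
#include <cstddef>
#include <cstdint>

// Write len bytes only if the port reports room for all of them, so the
// call should not block. Port is any type that provides availableForWrite()
// and write(const uint8_t*, size_t), such as the Teensy Serial object.
template <typename Port>
bool writeIfRoom(Port& port, const uint8_t* data, size_t len) {
    const int room = port.availableForWrite();
    if (room < 0 || static_cast<size_t>(room) < len) {
        return false;  // not enough room: caller keeps the data and retries
    }
    port.write(data, len);
    return true;
}

// Host-side stand-in (hypothetical) used only to try writeIfRoom() off the board.
struct FakePort {
    int room;  // bytes that can currently be written without blocking
    int availableForWrite() const { return room; }
    size_t write(const uint8_t*, size_t len) {
        room -= static_cast<int>(len);
        return len;
    }
};
```

In a sketch this could be called as writeIfRoom(Serial, block, 64) from loop(); when it returns false, sampling carries on and the block is offered again on the next pass instead of stalling inside the write.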
 