Test of Serial Transfer, fails for large transfers

Status
Not open for further replies.
As noted - the problem is on the 'Host' side with send rates it isn't capable of keeping up with. When the device properly constrains the rate of output - the host receives the USB certified data.

Code:
in 83606 microseconds 	 bytes printed	1358975
DONE with Serial.availableForWrite() 6144	 bytes printed1359031
	134528
	94468
	82122
	80400
	80917
	81530
	83485
	91171
	79962
	80900
	79729
	80146
	79757
	80651
	78930
	79543
	81370
	85431
	83606

That shows 19 consecutive runs against TyCommander as SerMon - I ran again and did get TyComm buffer garbage once.

Then Ran HTerm with repeat every 0.6 secs ( nice feature ):
Code:
in 78573 microseconds <9> bytes printed<9>1358975<\r>
DONE with Serial.availableForWrite() 6144<9> bytes printed1359031<\r>
<9>79931<\r>
<9>87684<\r>
<9>79127<\r>
<9>82425<\r>
<9>82524<\r>
<9>82398<\r>
<9>82751<\r>
<9>81820<\r>
<9>82652<\r>
<9>81065<\r>
<9>82197<\r>
<9>84160<\r>
<9>77892<\r>
<9>79228<\r>
<9>82198<\r>
<9>79504<\r>
<9>82649<\r>
<9>85746<\r>
<9>80033<\r>
<9>78824<\r>
<9>82272<\r>
<9>81643<\r>
<9>78573<\r>

23 iterations against HTerm counts 31,260,335 bytes - total transfer time about 1.9 seconds

Code updated - see following post
Here is the gated sketch on T_4.1 that runs:
Code:
// https://forum.pjrc.com/threads/67870-Test-of-Serial-Transfer-fails-for-large-transfers
static uint32_t cnt = 0;
void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 8000 );
  cnt += Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  cnt += Serial.println("Send USB data to get output ...");
}

char databuffer[4096];
#define NumRuns 30
uint32_t sTimes[NumRuns];
uint32_t iiT = 0;
void loop() {
uint32_t cntW = 0;
  if ( Serial.available() ) {
    while ( Serial.available() ) Serial.read();
    char *pc = databuffer;
    cnt += Serial.printf( "Starting with Serial.availableForWrite() %u\n", Serial.availableForWrite() );
    int n;
    for (n = 0; n < 130; n++ ) {
      pc[n] = 'a' + n % 26;
    }
    pc[n++] = '\n';
    pc[n++] = 0;

    uint32_t prTime = micros();
    for (n = 0; n < 10000; n++ ) {
      cnt += Serial.print( n );
      cnt += Serial.print( "\t" );
      cnt += Serial.print( databuffer );
[B]      while ( Serial.availableForWrite() < 132 ) { // T_3.2 and ??? has max 64B USB buffers so 132 will halt sketch here unless changed. Max Bytes printed early in setup().
        Serial.flush();
        //delayMicroseconds(25);
        cntW++;
      }[/B]
    }
    prTime = micros() - prTime;
    cnt += Serial.printf( "\nin %lu microseconds \t bytes printed\t", prTime);
    cnt += Serial.println(cnt);
    cnt += Serial.print("Count While flush repeat : ");
    cnt += Serial.println(cntW);
    delay(4);
    sTimes[iiT++] = prTime;
    if ( iiT >= NumRuns ) iiT = 0;
    uint32_t tSum=0;
    uint32_t iiS=0;
    for ( uint ii = 0; ii < NumRuns; ii++) {
      if ( sTimes[ii] != 0 ) {
        iiS++;
        tSum+=sTimes[ii];
        cnt += Serial.print("\t");
        cnt += Serial.println(sTimes[ii]);
      }
    }
    cnt += Serial.printf( "\tTx us sum=%lu Tx us Avg= %u\n ", tSum, tSum/iiS );
    cnt += Serial.printf( "DONE with Serial.availableForWrite() %u\t bytes printed\t", Serial.availableForWrite() );
    cnt += Serial.println(cnt);
  }
}

That BOLD while is essential to decent function - No count of the number of times it runs was made, but with that 'break' to allow buffer transfer the host was able to keep up and get valid data
 
Last edited:
FASTER write throughput, and NO HOST breakage with output wait in while()

Updated CODE in prior post #5.
Added count of the repeats of:
Code:
      while ( Serial.availableForWrite() < 132 ) {
        Serial.flush();
        //delayMicroseconds(25);
        cntW++;
      }

SUMMARY of below /TLDR : That while() behavior on Teensy allows the HOST to RELIABLY receive at 16 MB/sec without error. INCLUDING Teensy Ports IDE SerMon ( though it's FIRST run is over 330ms then they are in the same 80ms range! ). That Pausing while() keeps HOST from getting bad buffer garbage it seems and continued running!

That is in a loop that repeats 10,000 times and it runs over 400 to 500,000 times causing a NEEDED wait for free buffer space ( according to Serial.availableForWrite() )
Commented delayMicroseconds has the effect of lowering the 'flush repeat' count - but just allows buffer to clear - the count was about 2500 times with delay of 25 us. That is ~63ms of a total print time of 87 ms
{ put 10us delay back in and that is 120 ms of 137 for the printing :: Count While flush repeat : 12035 > us taken 137384 in first after upload that is always looking slower. On a run of 84265us the flush repeat 10us time is 6812 * 10us }

NOTE: With this the TyComm Serial output takes a noticeable pause, on some occasions - probably as it is writing ALL incoming to DISK file - but IT RECOVERS. Any issue would be the return from Serial.flush() before there was enough space in device or host buffers to continue. With that there is no more stalled Teensy not taking Serial input after an unfinished set of 10K prints of 130 bytes, plus the diagnostic info.
Here is above code on T_4.1 running against TyCommander - where the list of numbers is the us taken per 10K writes. These times are MUCH faster than previously seen now that Teensy is pausing output for buffer clearing.
Code:
[B][U]Count While flush repeat : 422585[/U][/B]
	105222
	104126
	125550
	81159
	80109
	80470
	85989
	92844
	80057
	80767
	79781
	80715
	81246
	79813
	81143
	82874
	88209
	80536
	79805
	79133
	79605
	79863
	132463
	79176
	79495
	Tx us sum=2180150 Tx us Avg= 87206
 DONE with Serial.availableForWrite() 6144	 bytes printed	33980842

All print's have a cnt+= to track the bytes sent and HTerm count of bytes matches, less 10 bytes, after 30 iterations - where the 10 bytes are in the final byte count with '\n'!
HTerm Summary of 30 runs, showing 2.535 seconds transmitting 40.77 MB:
Code:
Tx us sum=[B]2535247[/B] Tx us Avg= 84508
 DONE with Serial.availableForWrite() 6144    bytes printed    [B]40777554[/B]
 
Last edited:
@PaulStoffregen
> Code in p#51 shows good perf and no loss of USB data or Host data corruption
> Notes in p#52 offers details

It seems overwhelming the OUTPUT is filling ALL USB buffers? The while loop ends up with MAX expected ~16MB/sec throughput - where ~3/4 of Tx time is waiting for buffer space

Without that wait ALL buffers consumed leaves none for INPUT from HOST? Not sure if any of this fits/aids your understanding for making any improvement?

ADDED NOTE for p#52: That code ran a few times with NO ERROR on IDE native SerMon with same speed - then the next iteration KILLED IDE and IDE SerMon ... 'poof' ... not managing MB's of data buffer allocs?

Restarted back in HTerm - it counted 122339232 bytes transferred in 3 sets of 30 iterations, and the code shows 11 less bytes printed 122339221. With 9 digit count and '\n' that is off by ONE byte in 122MB.
 
@defragster,
please be always clear if you talk about T3 or T4, otherwise posts are difficult to follow
AFAIK, T3,x use dynamic and T4.x use static allocation of buffers
 
@defragster,
please be always clear if you talk about T3 or T4, otherwise posts are difficult to follow
AFAIK, T3,x use dynamic and T4.x use static allocation of buffers

Indeed - T_4.1 is mentioned (a bit low) in p#51 and #52 to which p#53 refers - as was attempted in prior posts - but the posted speeds are only possible on T_4.x.

The point was to stress the Host - and with any reasonable Host receive code - only the T_4.x can do that.

Initially ran on T_3.2 as noted back then - and indeed the posted sketch won't run on a T_3.2 as it has only 64 B from Serial.availableForWrite().
Edited comment on this line: while ( Serial.availableForWrite() < 132 ) { // T_3.2 and ??? has max 64B USB buffers so 132 will halt sketch here unless changed. Max Bytes printed early in setup().

To run on T_3.2 - when blocks need written and the Host can't process fast enough this edit should work.:
Code:
      [B]while ( Serial.availableForWrite() < 64 ) {[/B]
        //cnt += Serial.print( "\tFLUSH ...\n" );
        Serial.flush();
        delayMicroseconds(10);
        cntW++;
      }
    }

Results in 55.2 ms of waiting and 2163 ms of Tx time:
Code:
in 2163166 microseconds 	 bytes printed	5436492
Count While flush repeat : 55237
	2180080
	2194550
	2156186
	2163166
	Tx us sum=8693982 Tx us Avg= 2173495
 DONE with Serial.availableForWrite() 29	 bytes printed	5436669
 
So, I see two strategies to ensure complete transfer
Code:
      while ( Serial.availableForWrite() < 132 ) {
        Serial.flush();
        //delayMicroseconds(25);
        cntW++;
      }
and
Code:
size_t written = 0;
      while (written != frameSize)                               // need to account for blocked usb transmission (pc busy, buffer too small etc)
      {                        
            written += Serial.write(frame + written, frameSize - written);
      }

Is that correct?
 
Saw the second go by but didn't look at that. I wasn't feeling up to doing the buffer math and trusting that it might help without drops or dupes of chars.

Picked the path posted and got proper looking results - in great time - with no SerMon hangs/stalls or displayed buffer corruption observed as posted.

Seems the second path could still run into the trouble the first method avoids when buffers go full.

Before adding that while() the transfer times were much larger - perhaps that was the 'long wait' with some part of the mentioned 120ms stall?

Would be easy enough to rewrite p#51 code to swap perhaps from method 1 to 2 for the fixed buffer - suspect the times or transfer would not be as good if what seems to be the case with first method. Time to go offline here for the day ...

Not that the second would be generically easy { i.e. inline print( "spew" ) } to use without sprintf()'ing to a local buffer then pushing to a func() to perform as posted in that code.

Problem that while() suggests is that Serial.flush() OFTEN returns when there is not actually enough usable buffer space for continued full speed writing. Without getting the Host in a bad way.

Note: the cntW++ in the while is just a debug diagnostic, and the 132 was chosen because of the print width of 2nd display in portrait mode as it fits in TyComm. And delay_us was used for short time - but including a yield might be better overall.
 
Status
Not open for further replies.
Back
Top