UART speed weirdness

Status
Not open for further replies.

john-mike

I'm trying to send bytes between two Teensy 3s but can't seem to transfer them any faster by raising the baud past 115200.

Code:
HardwareSerial Uart = HardwareSerial();
byte j,RX,g;
long d,t,prev;
#define rate 115200*10

void setup() {
  //Uart.begin(115200);
  Uart.begin(rate);
  Serial.begin(rate);
  pinMode(13,OUTPUT);
  pinMode(2,INPUT_PULLUP);
}

void loop() {
  if (digitalRead(2)==0){    
    RX=0;   
  }
  else{    
    RX=1;  
  }

  //////////////// SEND 

  if (RX==0){
    if ((millis()-prev)>500 ){
      j++;
      g+=64;
      prev=millis();
      t=micros();
      Uart.write(j);
      Uart.write(d);
      d=micros()-t;
    }



    if (g>0){
      g--;
      digitalWrite(13, HIGH);
    }
    if (g==0){ 
      digitalWrite(13, LOW);
    }	
    if (g==1){     
      Serial.println(d);
    }	
  }

  ////////////////////////// RECEIVE


  if (RX==1){
    /*
     int incomingByte;
     if (Serial.available() > 0) {
     incomingByte = Serial.read();
     Serial.print("USB received: ");
     Serial.println(incomingByte, DEC);
     //   Uart.print("USB received:");
     //  Uart.println(incomingByte, DEC);
     }
     */

    if (Uart.available()) {
      j+=128;
      t=micros();
      byte b1 = Uart.read();
      byte b2 = Uart.read();
      d=micros()-t;

      Serial.print(b1);		
      Serial.print(" ");
      Serial.println(b2);
      Serial.print("     ");
      Serial.println(d);
    }

    if (j>0){
      j--;
      digitalWrite(13, HIGH);
    }
    if (j==0){ 
      digitalWrite(13, LOW);
    }	

  }
}

No matter how high I raise the baud it still takes ~11us to send the two bytes and about 2-3us to read them. Eventually at around 1.5MHz it locks up.
I've tried Serial2 and 3 but since they don't have FIFOs both bytes aren't received in the same cycle (I guess?)

On my scope I do see the frequency going faster when I change it above 115200.

I saw the other posts on UARTs, specifically http://forum.pjrc.com/threads/23687-Teensy-3-0-UART-datarate, but I'm wondering if there's another setting I'm missing, or if it's just that higher rates don't work, or if there's a bottleneck I'm not seeing.
 
I'm not surprised you're having trouble going faster than 115200. At higher rates, baud rate mismatches between the two ends matter more, slew rates become a limiting factor, and UART FIFOs struggle to keep up. Software driver race conditions or buffer overflows can also show up at Mbit/sec speeds.
At megabit speeds, I'd use SPI or I2C.
 
- At some point the program's execution time will outweigh the UART's transmission speed and so no more improvement will be obtained.
- Removing debug output will probably speed things up
- At a certain point it becomes advantageous to use DMA on the UARTs rather than interrupts or polling.
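
The first point can be put into rough numbers. This is only a sketch: it assumes 8N1 framing (10 bits on the wire per byte) and a fixed software cost per byte, and the function names and the 4 us figure in the usage note are illustrative, not measurements from this thread.

```cpp
#include <algorithm>

// Time one byte occupies the wire, in microseconds, assuming 8N1 framing
// (1 start + 8 data + 1 stop = 10 bits). The framing is an assumption.
double wire_us_per_byte(double baud) {
    return 10.0 * 1e6 / baud;
}

// Per-byte time achieved for sustained streaming when the software needs
// `cpu_us` per byte (a hypothetical figure): whichever is slower sets the pace.
double effective_us_per_byte(double baud, double cpu_us) {
    return std::max(wire_us_per_byte(baud), cpu_us);
}
```

With a hypothetical 4 us of software time per byte, `effective_us_per_byte(115200, 4.0)` is about 86.8 us (the wire is the limit), while `effective_us_per_byte(4000000, 4.0)` stays at 4 us no matter how much further the baud rate is raised.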

Regards

Mark
 
Thanks y'all.

- Removing debug output will probably speed things up
You mean the digital writes and serial? They are outside the speed test.

- At a certain point it becomes advantageous to use DMA on the UARTs rather than interrupts or polling.
I haven't seen any posts on how to implement DMA in a general sense. Any ideas?
 
Even the non-FIFO ports should work at this speed.

I'll give this a try here soon and see if I can figure out what's up.....
 
I'm running the code now on a pair of Teensy 3.1 boards. It seems to be working fine.

sc.png

scope_10.png
 
>> Debug output
I didn't look closely, but if there is no debug output inside the timed section then it can't be what is slowing the operation.

>> DMA
Below I have copied some code for DMA operation from the uTasker project. When the DMA transfer has completed there is an interrupt, and this is used by the output queue manager. How this is done in your project will depend on how the queue manager needs to do it. This is for Tx (the easiest case, and often the most useful). There is equivalent code for reception but there are various strategies - the most practical is to use a free-running reception buffer and poll in the application.
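
The "free-running reception buffer" idea can be sketched on the application side as follows. This is not uTasker code: `RxRing`, `available` and `poll` are hypothetical names, and `write_pos` simply stands in for wherever the DMA controller has got to in the buffer; how that position is read from real hardware is not shown.

```cpp
#include <cstddef>
#include <cstdint>

// Application-side reader for a free-running circular reception buffer.
// `write_pos` stands in for the position the DMA controller has reached;
// how that is obtained on real hardware is not shown here.
template <std::size_t N>
struct RxRing {
    std::uint8_t data[N] = {};
    std::size_t read_pos = 0;

    // Number of unread bytes, given the producer's current write position.
    std::size_t available(std::size_t write_pos) const {
        return (write_pos + N - read_pos) % N;
    }

    // Copy up to `max` unread bytes into `out`; returns the count copied.
    std::size_t poll(std::size_t write_pos, std::uint8_t *out, std::size_t max) {
        std::size_t n = 0;
        while (n < max && read_pos != write_pos) {
            out[n++] = data[read_pos];
            read_pos = (read_pos + 1) % N;
        }
        return n;
    }
};
```

Note that this sketch cannot detect the producer lapping the reader, so the application has to poll often enough that fewer than N bytes arrive between polls.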

It is important to get DMA priorities right on the Kinetis since conflicts cause DMA errors which stall the operation.

The driver supports all UARTs in Rx/Tx DMA mode (or mixtures of DMA and interrupt driven) and has been used in applications with up to 6 UARTs (eg. on K60) at over 1.5M Baud on each.

Regards

Mark

Code:
Initialisation extract:

if (pars->ucDMAConfig & UART_TX_DMA) {
    KINETIS_DMA_TDC *ptrDMA_TCD = (KINETIS_DMA_TDC *)eDMA_DESCRIPTORS;
    ptrDMA_TCD += UART_DMA_TX_CHANNEL[Channel];
    ptrDMA_TCD->DMA_TCD_SOFF = 1;                                    // source increment one byte
    ptrDMA_TCD->DMA_TCD_DOFF = 0;                                    // destination not incremented
    ptrDMA_TCD->DMA_TCD_ATTR = (DMA_TCD_ATTR_DSIZE_8 | DMA_TCD_ATTR_SSIZE_8); // transfer sizes always single bytes
    ptrDMA_TCD->DMA_TCD_DADDR = (unsigned long)&(uart_reg->UART_D);  // destination is the UART's data register
    ptrDMA_TCD->DMA_TCD_NBYTES_ML = 1;                               // each request starts a single transfer
    ptrDMA_TCD->DMA_TCD_CSR = (DMA_TCD_CSR_DREQ | DMA_TCD_CSR_INTMAJOR); // stop after the defined number of service requests and interrupt on completion
    fnEnterInterrupt((irq_DMA0_ID + UART_DMA_TX_CHANNEL[Channel]), UART_DMA_TX_INT_PRIORITY[Channel], (void (*)(void))_uart_tx_dma_Interrupt[Channel]); // enter DMA interrupt handler
    uart_reg->UART_C5 |= UART_C5_TDMAS;                              // use DMA rather than interrupts for transmission
    POWER_UP(6, SIM_SCGC6_DMAMUX0);                                  // enable DMA multiplexer 0
    *(unsigned char *)(DMAMUX0_BLOCK + UART_DMA_TX_CHANNEL[Channel]) = ((DMAMUX_CHCFG_SOURCE_UART0_TX + (2 * Channel)) | DMAMUX_CHCFG_ENBL); // connect UART tx to DMA channel
    uart_reg->UART_C2 |= (UART_C2_TIE);                              // enable the tx dma request (DMA not yet enabled) rather than interrupt mode
}
else {
    uart_reg->UART_C5 &= ~(UART_C5_TDMAS);                           // disable tx DMA so that tx interrupt mode can be used
}

// Start transfer of a block via DMA
//
extern QUEUE_TRANSFER fnTxByteDMA(QUEUE_HANDLE channel, unsigned char *ptrStart, QUEUE_TRANSFER tx_length)
{
    KINETIS_DMA_TDC *ptrDMA_TCD = (KINETIS_DMA_TDC *)eDMA_DESCRIPTORS;
    ptrDMA_TCD += UART_DMA_TX_CHANNEL[channel];

    ptrDMA_TCD->DMA_TCD_BITER_ELINK = ptrDMA_TCD->DMA_TCD_CITER_ELINK = tx_length; // the number of service requests (the number of bytes to be transferred)
    ptrDMA_TCD->DMA_TCD_SADDR = (unsigned long)ptrStart;                 // source is tty output buffer
    DMA_ERQ |= (DMA_ERQ_ERQ0 << UART_DMA_TX_CHANNEL[channel]);           // enable request source
    return tx_length;
}
 
I'm running the code now on a pair of Teensy 3.1 boards. It seems to be working fine.
Yes, the code works fine at 115200, but I was hoping it could be faster. As you were saying in this post, the UART should be able to go over 115200.
 
No matter how high I raise the baud it still takes ~11us to send the two bytes and about 2-3us to read them.

I believe part of what you're measuring is the function call and complex code inside the micros() function. Even though it reports the microsecond clock time, the function itself takes a few microseconds.
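
One way to see how much of a measurement is the timer itself is to time two back-to-back clock reads and subtract that from later results. The sketch below uses `std::chrono::steady_clock` so it runs on a PC; on a Teensy the same idea would apply with `micros()` as the clock. The function names are hypothetical, and `samples` is assumed to be greater than zero.

```cpp
#include <chrono>
#include <cstdint>

// Average cost, in nanoseconds, of one pair of back-to-back clock reads.
// `samples` must be greater than zero.
std::int64_t timer_overhead_ns(int samples) {
    using clock = std::chrono::steady_clock;
    std::int64_t total = 0;
    for (int i = 0; i < samples; ++i) {
        auto t0 = clock::now();
        auto t1 = clock::now();
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    }
    return total / samples;
}

// Subtract the calibrated overhead from a raw measurement, clamping at zero.
std::int64_t corrected_ns(std::int64_t raw_ns, std::int64_t overhead_ns) {
    return raw_ns > overhead_ns ? raw_ns - overhead_ns : 0;
}
```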

On Teensy 3.1, I measured 7-8 us to send 2 bytes. Maybe you're testing Teensy 3.0? Teensy 3.1 is faster.

However, I modified your code to send 10 bytes instead of 2. I'll post the full code below. My measurement went from 7-8 for 2 bytes (about 4 us/byte) to 23-24 for 10 bytes (2.4 us/byte).

2.4 us/byte is about 417 kbytes/sec (434 kbytes/sec at the 2.3 us/byte end of the measured range), or fast enough to sustain over 4 Mbit/sec transmission. Even then, I'm sure some of that time is still measuring slowness in micros().
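
The conversion from a measured per-byte time to throughput can be checked with a couple of one-liners. The function names are made up for illustration, and the Mbit/sec figure assumes 8N1 framing, i.e. 10 bits on the wire per byte.

```cpp
// Sustained byte rate, in kbytes/sec, from a measured time per byte in microseconds.
double kbytes_per_sec(double us_per_byte) {
    return 1000.0 / us_per_byte;
}

// Equivalent line rate in Mbit/sec, assuming 8N1 framing (10 bits per byte).
double mbit_per_sec(double us_per_byte) {
    return 10.0 / us_per_byte;
}
```

At 2.3 us/byte (23 us for 10 bytes) this gives roughly 435 kbytes/sec; at 2.4 us/byte it gives roughly 417 kbytes/sec, or about 4.17 Mbit/sec on the wire.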

I also verified with my oscilloscope. The bytes are always transmitted without any delays between them.

Eventually at around 1.5MHz it locks up.

I increased the baud rate to 2000000, but I could not reproduce any lockup. Here's the code I tested.


Code:
HardwareSerial Uart = HardwareSerial();

byte j, RX, g;
long d,t,prev;

#define rate 2000000

void setup() {
  //Uart.begin(115200);
  Uart.begin(rate);
  Serial.begin(rate);
  pinMode(13,OUTPUT);
  pinMode(2, INPUT_PULLUP);
}

void loop() {
  if (digitalRead(2)==0) {    
    RX=0;   
  }
  else{    
    RX=1;  
  }

  //////////////// SEND 

  if (RX==0) {
    if ((millis()-prev)>500 ) {
      j++;
      g+=64;
      prev=millis();
      t=micros();
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(j);
      Uart.write(d);
      d=micros()-t;
    }
    if (g>0){
      g--;
      digitalWrite(13, HIGH);
    }
    if (g==0){ 
      digitalWrite(13, LOW);
    }	
    if (g==1) {     
      Serial.println(d);
    }	
  }

  ////////////////////////// RECEIVE


  if (RX==1){
    /*
     int incomingByte;
     if (Serial.available() > 0) {
     incomingByte = Serial.read();
     Serial.print("USB received: ");
     Serial.println(incomingByte, DEC);
     //   Uart.print("USB received:");
     //  Uart.println(incomingByte, DEC);
     }
     */

    if (Uart.available() >= 10) {
      j+=128;
      t=micros();
      byte b1 = Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      byte b2 = Uart.read();
      d=micros()-t;

      Serial.print(b1);
      Serial.print(" ");
      Serial.println(b2);
      Serial.print("     ");
      Serial.println(d);
    }

    if (j>0) {
      j--;
      digitalWrite(13, HIGH);
    }
    if (j==0){ 
      digitalWrite(13, LOW);
    }	

  }
}
 
Thanks so much Paul. Yes I'm using two 3.0.

I didn't realize micros() was so slow. That explains some other timing problems I've been having.

I can only afford about 10us so I guess I'll just stick with a single byte and use a 3.1.
 
You can also increase the writing speed, if you're sending more data, by writing a block with a single call. For example:

Code:
      byte buf[10];
      buf[0] = j;
      buf[1] = j;
      buf[2] = j;
      buf[3] = j;
      buf[4] = j;
      buf[5] = j;
      buf[6] = j;
      buf[7] = j;
      buf[8] = j;
      buf[9] = d;

      t=micros();
      Uart.write(buf, 10);
      d=micros()-t;

A single write with a 10 byte buffer is approx 3X faster than 10 single byte writes.
 
Please post complete code and specific details to reproduce the lockup condition.

Same code but
#define rate 115200*10
was
#define rate 15000000

At that point I was no longer getting anything in the serial window and since I don't have a scope that fast I couldn't see what was going on.
 
I suspect 3 or 4 Mbit/sec might be an upper limit on the usable speed.

Honestly, not much testing has ever been done above 1 Mbit/sec. I've got those 2 boards still connected on my desk, so if you post another test case today, I can pretty easily give it a try.
 
I've got it running on my desk at 4 Mbit/sec, using a pair of Teensy 3.1 boards, running this code on both:

Code:
HardwareSerial Uart = HardwareSerial();

byte j, RX, g;
long d,t,prev;

#define rate 4000000

void setup() {
  //Uart.begin(115200);
  Uart.begin(rate);
  Serial.begin(rate);
  pinMode(13,OUTPUT);
  pinMode(2, INPUT_PULLUP);
}

void loop() {
  if (digitalRead(2)==0) {    
    RX=0;   
  }
  else{    
    RX=1;  
  }

  //////////////// SEND 

  if (RX==0) {
    if ((millis()-prev)>500 ) {
      j++;
      g+=64;
      prev=millis();
      byte buf[10];
      buf[0] = j;
      buf[1] = j;
      buf[2] = j;
      buf[3] = j;
      buf[4] = j;
      buf[5] = j;
      buf[6] = j;
      buf[7] = j;
      buf[8] = j;
      buf[9] = d;

      t=micros();
      Uart.write(buf, 10);
      d=micros()-t;
    }
    if (g>0){
      g--;
      digitalWrite(13, HIGH);
    }
    if (g==0){ 
      digitalWrite(13, LOW);
    }	
    if (g==1) {     
      Serial.println(d);
    }	
  }

  ////////////////////////// RECEIVE


  if (RX==1){
    /*
     int incomingByte;
     if (Serial.available() > 0) {
     incomingByte = Serial.read();
     Serial.print("USB received: ");
     Serial.println(incomingByte, DEC);
     //   Uart.print("USB received:");
     //  Uart.println(incomingByte, DEC);
     }
     */

    if (Uart.available() >= 10) {
      j+=128;
      t=micros();
      byte b1 = Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      Uart.read();
      byte b2 = Uart.read();
      d=micros()-t;

      Serial.print(b1);
      Serial.print(" ");
      Serial.println(b2);
      Serial.print("     ");
      Serial.println(d);
    }

    if (j>0) {
      j--;
      digitalWrite(13, HIGH);
    }
    if (j==0){ 
      digitalWrite(13, LOW);
    }	

  }
}

Here's what I'm getting from the receiving board:

sc.png

I'm looking at the waveform on my oscilloscope. All 10 bytes are filling about 5 divisions, at 5 us/div. 100 bits in 25 us is 4 Mbit/sec.
 
Here's my scope screen, with 10 bytes at 4 Mbit/sec, from the code and screenshot of reply #18.

scope_12.png

Looks like 4 Mbit/sec is working fine. Of course, that code is writing the 10 bytes with a single write(buf, len). 10 individual writes might not be fast enough to sustain 4 Mbit/sec data.

These tests are on Teensy 3.1 running at 96 MHz. Teensy 3.0 runs somewhat slower, due to less flash memory bandwidth and cache.
 
Ok write(buf, len) and 4MHz on the 3.1 should be just fast enough then.
Thank you for the very thorough response!
 
If you're going to send just 1 byte, with a lengthy dead time between, you might be able to get away with just writing directly to the UART's data register. That's an incredibly dirty trick, but it will execute in just a couple clock cycles. You might need to reconfigure the UART not to generate interrupts.
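
The direct register write described above follows a simple pattern: check that the transmit data register is empty, then store the byte. The sketch below uses a stand-in struct so it can be run on a PC, and it has not been tested on hardware. `UartRegs` and `write_direct` are hypothetical names. On a Teensy 3.x the corresponding registers should be `UART0_S1` and `UART0_D`, with `UART_S1_TDRE` as the flag, and it assumes the regular serial driver's transmit interrupt is not also feeding the same UART.

```cpp
#include <cstdint>

// Stand-in for a UART's status and data registers so the pattern can be
// run on a PC. On a Teensy 3.x the real registers would be UART0_S1 and
// UART0_D, with UART_S1_TDRE as the "transmit data register empty" flag.
struct UartRegs {
    volatile std::uint8_t S1;  // status register 1
    volatile std::uint8_t D;   // data register
};

constexpr std::uint8_t TDRE = 0x80;  // bit 7 of S1 on Kinetis UARTs

// Write one byte straight to the data register if the transmitter can
// accept it. Returns false, without blocking, when it cannot.
bool write_direct(UartRegs &uart, std::uint8_t b) {
    if (!(uart.S1 & TDRE)) return false;
    uart.D = b;
    return true;
}
```

Returning false instead of spinning keeps the call to a handful of instructions; with a lengthy dead time between bytes the flag should always be set by the time the next byte is ready.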

It certainly looks like 4 Mbit/sec can work with the regular code, using block write, where write() is a C++ vtable call that manipulates a buffer and the actual transmission is done with an interrupt. Still, lots of overhead for all those nice features.

If you have code that reproduces a lockup bug, please post it, plus specific instructions for reproducing the problem. If there's a bug, I can usually fix it, but only after I manage to reproduce it.
 
Also, consider that this measurement technique probably over-estimates the speed on the receiving side. The interrupt code is part of the overhead, for both transmitting and receiving. For transmitting, the interrupt probably runs immediately after the write, before micros() can acquire the time. But for receiving, the interrupt happens before available() indicates data has arrived, so the 2 calls to micros() do not capture the interrupt overhead. Receiving isn't really as fast as this test makes it seem. It's probably pretty similar to transmitting.
 
writing directly to the UART's data register
Any hint on how to do that?

I tried it again on the two 3.0s (I only have one 3.1 available) and it works at 4M but with lots of errors.
Sorry, I must have had something else screwy before. Or, like I did up there, inserted another 0.

Code:
HardwareSerial Uart = HardwareSerial();
byte j,RX,g;
long d,t,prev;
#define rate 4000000

void setup() {
  //Uart.begin(115200);
  Uart.begin(rate);
  Serial.begin(rate);
  pinMode(13,OUTPUT);
  pinMode(2,INPUT_PULLUP);
}

void loop() {
  if (digitalRead(2)==0){    
    RX=0;   
  }
  else{    
    RX=1;  
  }

  //////////////// SEND 

  if (RX==0){
    if ((millis()-prev)>500 ){
      j++;
      g+=64;
      prev=millis();
      t=micros();
      Uart.write(j);
      Uart.write(d);
      d=micros()-t;
    }



    if (g>0){
      g--;
      digitalWrite(13, HIGH);
    }
    if (g==0){ 
      digitalWrite(13, LOW);
    }	
    if (g==1){     
      Serial.println(d);
    }	
  }

  ////////////////////////// RECEIVE


  if (RX==1){
    /*
     int incomingByte;
     if (Serial.available() > 0) {
     incomingByte = Serial.read();
     Serial.print("USB received: ");
     Serial.println(incomingByte, DEC);
     //   Uart.print("USB received:");
     //  Uart.println(incomingByte, DEC);
     }
     */

    if (Uart.available()) {
      j+=128;
      t=micros();
      byte b1 = Uart.read();
      byte b2 = Uart.read();
      d=micros()-t;

      Serial.print(b1);		
      Serial.print(" ");
      Serial.println(b2);
      Serial.print("     ");
      Serial.println(d);
    }

    if (j>0){
      j--;
      digitalWrite(13, HIGH);
    }
    if (j==0){ 
      digitalWrite(13, LOW);
    }	

  }
}
 