Serial communication at high baudrates

Status
Not open for further replies.
Hello,

I am trying to communicate over a serial interface between a Teensy LC and a Teensy 3.5 using an RS-485 transceiver (https://www.sparkfun.com/products/10124). I'd like to transmit data at 3 000 000 baud (the Teensy 3.5 runs at only 96 MHz).
I uncommented the line that activates the 9-bit data format in HardwareSerial.h, because later I would like to use the address match feature of the UARTs with the 9th bit as the address mark bit:
Code:
#define SERIAL_9BIT_SUPPORT
I tested communication by sending 32 byte data frames:
Code:
uint8_t data[32];

for(uint8_t i = 0; i <32; i++) {

data[i] = i;

}

Serial1.write(data,32);
At 500 000 baud everything works fine. But when I increase the baud rate to 1 500 000 or 3 000 000 baud, writing to the transmit buffer with Serial1.write() takes longer than at lower baud rates.
At 500 000 baud, Serial1.write() returns after only about 105 microseconds for the first two 32-byte frames (following frames take more time because of the 64-byte transmit buffer).
At the higher baud rates, Serial1.write() takes about 300 microseconds to return.
I know that the time Serial1.write() takes to return is no measure of the UART's speed: Serial1.write() just writes to the TX buffer of the HardwareSerial object, which is then transmitted byte by byte at the specified baud rate, so returning from Serial1.write() does not mean all data has been transmitted. But why does Serial1.write() take longer to return when I increase the baud rate? Is it because data leaves the buffer so fast that the transmit interrupts stretch the time spent in Serial1.write() that much? That would mean Serial1.write() adds an overhead of more than twice the time the UART needs to transmit the data frames. If so, is there a way to speed this up, so that the transmitting controller needs less REAL time to send the data?

Thanks in advance!
 
Hard to say if it can work, all the 6Mhz serial work I have done is with the 3.1 and 3.2. The LC lacks the FIFO on the serial ports.

If you have an O-scope I would look at the data going in/out.
 
Sorry, it is unclear to me which board this code is running on: the T3.5 or the LC? If it is the LC, then as Donziboy2 mentioned it may have to do with the fact that the LC has no hardware queue, so it is servicing many more interrupts during the time it takes to queue the data into your software queue. The T3.x boards have hardware queues on some of the Serial ports (typically Serial1 and Serial2), which allow the interrupt to move several items to the queue (or from the queue) in one interrupt, thus reducing the number of interrupts needed to process the data.
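
To put a rough number on that interrupt load: without a FIFO, the transmit interrupt has to run once per character, so the time budget per interrupt is one character time. The sketch below is only an estimate, assuming a 48 MHz CPU clock on the LC and 8N1 framing (10 bits per character); the helper name is made up for illustration.

```cpp
#include <stdint.h>

// CPU clock cycles that elapse while one character is on the wire.
// bits_per_char is 10 for 8N1 (start + 8 data + stop); a 9-bit data
// format would need 11.
constexpr uint32_t cycles_per_char(uint32_t cpu_hz, uint32_t baud,
                                   uint32_t bits_per_char = 10) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(cpu_hz) * bits_per_char) / baud);
}
```

Under those assumptions there are about 960 cycles per character at 500 000 baud but only about 160 at 3 000 000 baud, which has to cover interrupt entry, the ISR itself, and whatever code is still queuing bytes.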
 
Sorry, I don't have an oscilloscope.

Yeah, I meant it running on the LC.

Actually, I just need to send data at fixed intervals, every 500 ms. I planned to send 1600 bytes in under 8 ms, but in between I would have time to prepare the data transfer. Is there a way to "block" the UART's transmit function? Then I could increase the TX buffer size and put the data frames directly into the buffer. When it's time to send, I could just "activate" the transmit function again, and the controller would only have to handle the transmit interrupts during that time-critical phase.
Or would this not help either, because the bottleneck is a hardware problem?
 
First issue: if you look at the table of baud rates at the end of this web page: https://www.pjrc.com/teensy/td_uart.html
you will find that the LC does not generate the same baud rates as the T3.x boards above 500000. So there will be an issue with this.

You are saying that you wish to send 1600 bytes in under 8 ms... At a baud rate of 500000 it should take about 32 ms (1600 bytes at 10 bits each is 16000 bits), so under 8 ms needs at least 2000000 baud, if my quick calculations are correct.
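
A quick way to check this kind of estimate is to compute the wire time directly. The sketch below assumes 8N1 framing, i.e. 10 bits on the wire per byte (11 if the 9-bit format is used), and ignores any gaps between bytes; the function names are illustrative only.

```cpp
#include <stdint.h>

// Ideal wire time in microseconds for n_bytes sent back to back.
constexpr uint64_t wire_time_us(uint32_t n_bytes, uint32_t baud,
                                uint32_t bits_per_char = 10) {
  return (static_cast<uint64_t>(n_bytes) * bits_per_char * 1000000ULL) / baud;
}

// Lowest baud rate (rounded up) that moves n_bytes within deadline_us.
constexpr uint64_t min_baud_for(uint32_t n_bytes, uint32_t deadline_us,
                                uint32_t bits_per_char = 10) {
  return (static_cast<uint64_t>(n_bytes) * bits_per_char * 1000000ULL +
          deadline_us - 1) / deadline_us;
}
```

With these assumptions, 1600 bytes need 32 ms at 500000 baud and about 5.33 ms at 3000000 baud, and an 8 ms deadline needs at least 2000000 baud (2200000 with 11-bit frames).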

I guess the real question is: is there an issue if your writes to the buffer take longer? As long as the double buffering is working properly, it should keep the UART busy.

Now if the issue is that you need to do some more work during this time and you don't want the code held up while it is copying bytes... It is possible to change the size of the buffer by editing the source file serial1.c
and changing the lines:

#ifndef SERIAL1_TX_BUFFER_SIZE
#define SERIAL1_TX_BUFFER_SIZE 64 // number of outgoing bytes to buffer
#endif

But it then applies to all programs... Also have to be careful on size as the LC only has 8K of ram... Will at some point have a way to increase the size on a per program basis... I had a prototype that did this, but it may be awhile before anything like it gets into the build.
 
Need to have Paul update that table to include the 1.5M, 3M and 6M baud rates. Both 1.5M and 3M should be possible with the LC at 48 MHz; 6M is good for Teensy 3.1 - 3.6.
 
Hi Donziboy2: yes and no. That is yes you can run at 3M, but I am not sure it is buying you anything...

That is I put in a simple test program, based off of the program code snippet
Code:
void setup() {
  while (!Serial && (millis() < 2000)) ;
  Serial.println("Test Start");
  Serial1.begin(3000000);
  pinMode(2, OUTPUT);
  pinMode(3, OUTPUT);
}

void loop() {
  uint32_t start_time = millis();
  uint8_t data[32];

  for (uint8_t i = 0; i < 32; i++) {
    data[i] = i;
  }
  // We wish to output 1600 characters. 
  digitalWriteFast(2, HIGH);
  for (int i = 0; i < 50; i++) {
    digitalWriteFast(3, HIGH);
    Serial1.write(data, 32);
    digitalWriteFast(3, LOW);
  }
  digitalWriteFast(2, LOW);
  // quick and dirty..
  delay(500-(millis()-start_time));
}
I have my logic analyzer setup to show stuff.

As you can see in the Logic output, we are not getting a full 3mbs output:
screenshot.jpg

Or a closeup showing the gaps between bytes:
screenshot2.jpg

So It took about 13ms at 3mbs to output the 1600 characters

At 500000 it took about: 32 ms

The interesting thing was at 1.5mbs it took about 10.7ms - so less time than 13ms...
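
One way to read these three measurements is to convert each into an effective line rate. This is only a back-of-the-envelope sketch, assuming 10 bits on the wire per character (8N1, which matches the 8-bit writes in the test program); the helper name is hypothetical.

```cpp
#include <stdint.h>

// Effective line rate in bit/s for n_bytes sent in elapsed_us microseconds.
constexpr uint32_t effective_bps(uint32_t n_bytes, uint32_t elapsed_us,
                                 uint32_t bits_per_char = 10) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(n_bytes) * bits_per_char * 1000000ULL) /
      elapsed_us);
}
```

On those figures, the 500000 and 1.5M runs are essentially back to back (about 500 kbit/s and about 1.495 Mbit/s effective), while the 3M run averages only about 1.23 Mbit/s, which fits the gaps between bytes seen in the closeup.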

Note: if I had a large buffer of data and want to write out and I want to do other stuff while doing this, I would arrange my code differently...

That is I would use the method availableForWrite() to know how many bytes are free on the output queue and write a max of that number of characters and then do some work then repeat...
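
A minimal sketch of that "write what fits" pattern follows. It is written as a template so it can be tried on a PC; on a Teensy the port argument would be Serial1, which provides availableForWrite() and write(buffer, size). The function name and the FakePort stand-in are made up for illustration and are not part of any library.

```cpp
#include <stddef.h>
#include <stdint.h>

// Queue as much of data[offset..len) as the port can accept right now
// without blocking, and return the new offset. Call it again later
// (after doing other work) until the returned offset equals len.
template <typename Port>
size_t writeWhatFits(Port& port, const uint8_t* data, size_t len,
                     size_t offset) {
  if (offset >= len) return len;            // nothing left to send
  int room = port.availableForWrite();      // free bytes in the TX queue
  if (room <= 0) return offset;             // queue full, try again later
  size_t n = len - offset;
  if (n > static_cast<size_t>(room)) n = static_cast<size_t>(room);
  port.write(data + offset, n);
  return offset + n;
}

// Hypothetical host-side stand-in for a serial port, for testing only.
struct FakePort {
  int room;        // free bytes reported by availableForWrite()
  size_t written;  // total bytes accepted so far
  int availableForWrite() const { return room; }
  size_t write(const uint8_t*, size_t n) {
    written += n;
    room -= static_cast<int>(n);
    return n;
  }
};
```

In loop() this would be used roughly as `offset = writeWhatFits(Serial1, data, sizeof(data), offset);` followed by whatever other work needs doing, so the sketch never blocks inside Serial1.write().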
 
Kurt - can you rerun your tests using your 'write what fits' every time you pass by - keeping the queued data to write from going empty. Based on prior forum notes - I suspect you'll see it use more of the requested speed.

Unless it has changed - when the write queue runs empty then cycles are lost before getting it restarted when new Write() data pushes in.

<edit>: thread linked in post #9 is the one I was referring to: High-Speed-Serial-comms-between-2-Teensy-3-1
 
So It took about 13ms at 3mbs to output the 1600 characters

At 500000 it took about: 32 ms

The interesting thing was at 1.5mbs it took about 10.7ms - so less time than 13ms...
I think this is the same issue my initial question was about. It takes more time for Serial1.write() to return at higher baud rates (3M) than at lower ones (1.5M). But I will "lose" this time later anyway, when the interrupts are served to send out the data, right?

Note: if I had a large buffer of data and want to write out and I want to do other stuff while doing this, I would arrange my code differently...

That is I would use the method availableForWrite() to know how many bytes are free on the output queue and write a max of that number of characters and then do some work then repeat...
My problem is that I need to send data at a defined point in time. I use the LC to get sensor readings every 10 ms via the I2C bus. Because of the limited RAM of the LC, I would like to send the collected data to the 3.5 at fixed time intervals and write it to the SD card. The goal is to have 8 LC/sensor units (with the possibility of expanding to 16 units later) reading the sensor measurements and sending them to the 3.5, where all data is saved on the SD card. So I would like to implement a serial bus using the RS-485 standard. That's also the reason why I use the 9-bit data format and the address match feature of the UART.
So my problem is that I have time slots in which I could fill the buffer, but I need a way to let the UART empty the buffer only at defined time intervals.
 
@defragster, @donziboy2, @no_user_name - yes running Teensy LC, Don't think it would matter much if I called the code with only the number of bytes that the queue has free, as then my code would probably just spin outside of the call instead of spin inside the call (unless I had something else to do with that time).

As for calling Serial1.write(buff, cnt): it can help somewhat to speed up the actual call on T3.x, but on the LC it is defined like this:
Code:
void serial_write(const void *buf, unsigned int count)
{
	const uint8_t *p = (const uint8_t *)buf;
	while (count-- > 0) serial_putchar(*p++);
}

For the fun of it I may try a quick hack to bypass the Serial.write code and avoid the interrupts by using the registers directly, to see how that would work. The code will be hard looping here, so again 100% usage of the board... Not much else... A better approach may be to use DMA to transfer the buffer, but that is probably more than I want to play with. There was a DMA Serial library for the T3.2 that I think @duff did a few years ago. Not sure how much work it would take... Now off to try the quick hack... Will report back.

Edit: Forgot to mention again this is only doing 8 bit writes as while your test program you mentioned enabling 9 bits, your data showed 8 bit data... To do 9 bit code you would need to do a little more work and play with another hardware register... Look at serial1.c in cores3 at the putchar function...
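
For reference, the extra register step for 9-bit writes can be sketched on its own. The expression below mirrors the one used in the serial_putchar code quoted further down this thread, where bit 8 of the word is copied into bit 6 (mask 0x40) of UART0_C3 before the low 8 bits are written to UART0_D. The helper name is hypothetical, and the function works on plain values so it can be checked without hardware.

```cpp
#include <stdint.h>

// Return the new C3 value: bit 6 is replaced by bit 8 of `word`,
// all other bits of c3 are left unchanged.
constexpr uint8_t c3_with_ninth_bit(uint8_t c3, uint32_t word) {
  return static_cast<uint8_t>((c3 & ~0x40u) | ((word & 0x100u) >> 2));
}
```

On the hardware this would correspond to `UART0_C3 = c3_with_ninth_bit(UART0_C3, c);` followed by `UART0_D = c;`.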
 
Quick update: I updated the code as I mentioned to not use interrupts... Replace the one Serial.write() with code
Code:
#if 1
    // This code assumes that the serial registers are setup and TX and RX are enabled but the interrupt is not... 
    for (int j = 0; j < 32; j++) {
        while  (!(UART0_S1 & UART_S1_TDRE)) ;   // Wait for uart1 to say it has room to put
        UART0_D = data[j];
    }
 
#else    
    Serial1.write(data, 32);
#endif
This may be specific for LC and for serial1 and...
screenshot.jpg
But with it and baud rate of 3000000 the time it took to output was about: 5.33ms

I tried building for 6mbs, but the output still shows it is going out at 3mbs and same speed as 3mbs
 
I think the max divisor for serial is 16, so at 48 MHz you are limited to 3 Mbaud serial.

Edit: actually it's more a minimum than a max.
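
That limit would also explain why a 6M request still comes out at 3M. A rough sketch, assuming the UART is clocked at 48 MHz, always oversamples by 16, and uses an integer divisor that cannot go below 1; the round-to-nearest step and the function names are assumptions for illustration, not the core library's actual code.

```cpp
#include <stdint.h>

// Integer divisor for a requested baud rate with fixed 16x oversampling,
// rounded to nearest and clamped so it never drops below 1.
inline uint32_t baud_divisor(uint32_t uart_clock_hz, uint32_t baud) {
  uint32_t base = uart_clock_hz / 16;
  uint32_t div = (base + baud / 2) / baud;
  return div == 0 ? 1 : div;
}

// Baud rate that actually results from that divisor.
inline uint32_t actual_baud(uint32_t uart_clock_hz, uint32_t baud) {
  return uart_clock_hz / 16 / baud_divisor(uart_clock_hz, baud);
}
```

Under these assumptions, requests for 3000000 and 6000000 both give a divisor of 1 and therefore 3000000 baud, while 1500000 gives a divisor of 2.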
 
Not sure if anyone would be interested, but I hacked up the test program to output the full TX buffer using DMA... It worked on the TLC, then I edited it for the T3.2... (I assume it still works on the TLC...)
Code:
#define BUFFER_SIZE 1600
uint8_t data[BUFFER_SIZE];  // larger buffer
#include "DMAChannel.h"


DMAChannel dmaTX;

#ifdef HAS_KINETISK_UART0_FIFO
#define C2_ENABLE    UART_C2_TE | UART_C2_RE | UART_C2_RIE | UART_C2_ILIE
#else
#define C2_ENABLE   UART_C2_TE | UART_C2_RE | UART_C2_RIE
#endif
#define C2_TX_ACTIVE    C2_ENABLE | UART_C2_TIE

void setup() {
  while (!Serial && (millis() < 2000)) ;
  Serial.println("Test Start");
  Serial1.begin(3000000);
  pinMode(2, OUTPUT);
  pinMode(3, OUTPUT);

  for (uint16_t i = 0; i < sizeof(data); i++) {
    data[i] = (i & 0xff);
  }

  // Try to setup dma channel
  dmaTX.disable();
  dmaTX.destination((volatile uint8_t&)UART0_D);
  dmaTX.attachInterrupt( uart0_dma_tx_isr );
  dmaTX.interruptAtCompletion( );
  dmaTX.disableOnCompletion( );
  dmaTX.triggerAtHardwareEvent( DMAMUX_SOURCE_UART0_TX );
  UART0_C5 = UART_C5_TDMAS; // enable DMA on TX
  Serial.println("End of setup");
}

void uart0_dma_tx_isr() {
  digitalWriteFast(3, !digitalReadFast(3));
  dmaTX.clearComplete();
  dmaTX.clearInterrupt( );  // You need to clear the interrupt to do again later.
  dmaTX.disable();
#ifdef KINETISK
  UART0_C2 = C2_ENABLE;
#endif  
}



void loop() {
  uint32_t start_time = millis();

  // We wish to output 1600 characters.
  digitalWriteFast(2, HIGH);
  dmaTX.sourceBuffer( data, sizeof(data ));   // May want as sourcBuffer...
#ifdef KINETISK
  UART0_C2 = C2_TX_ACTIVE;
#endif
  dmaTX.enable();   // Start a transfer
  digitalWriteFast(2, LOW);
  // quick and dirty..
  delay(500 - (millis() - start_time));
}

The output took 5.33 ms on both the TLC and the T3.2 at 3 Mbaud, and the setup for the DMA took about 1 us... There is an interrupt that shows when the DMA output is almost complete... I say almost because the DMA interrupt fires when the last byte has been put into the output queue, not when it has actually been fully output on the TX pin. On the TLC that is about one character time, whereas with the T3.x (Serial1/2) it is the buffer size, or 4 bytes...

screenshot.jpg
 
Thank you very much for your help.
@KurtE I implemented your bypass solution as shortcut in serial_putchar() like it is done in HardwareSerial.cpp in Arduino cores.
Code:
void serial_putchar(uint32_t c)
{
	uint32_t head, n;

	if (!(SIM_SCGC4 & SIM_SCGC4_UART0)) return;
	if (transmit_pin) transmit_assert();

	// start of editing
	if (tx_buffer_tail == tx_buffer_head && (UART0_S1 & UART_S1_TDRE)) {
		if(use9Bits) UART0_C3 = (UART0_C3 & ~0x40) | ((c & 0x100) >> 2);
		UART0_D = c;
		return;
	}
	// end of editing

	head = tx_buffer_head;
	if (++head >= SERIAL1_TX_BUFFER_SIZE) head = 0;
	while (tx_buffer_tail == head) {
		int priority = nvic_execution_priority();
		if (priority <= IRQ_PRIORITY) {
			if ((UART0_S1 & UART_S1_TDRE)) {
				uint32_t tail = tx_buffer_tail;
				if (++tail >= SERIAL1_TX_BUFFER_SIZE) tail = 0;
				n = tx_buffer[tail];
				if (use9Bits) UART0_C3 = (UART0_C3 & ~0x40) | ((n & 0x100) >> 2);
				UART0_D = n;
				tx_buffer_tail = tail;
			}
		} else if (priority >= 256) {
			yield();
		}
	}
	tx_buffer[head] = c;
	transmitting = 1;
	tx_buffer_head = head;
	UART0_C2 = C2_TX_ACTIVE;
}
I will test whether this shortcut gains enough speed, probably tomorrow, and post my results.
Edit: There are two lines missing in the above code:
Code:
if (tx_buffer_tail == tx_buffer_head && (UART0_S1 & UART_S1_TDRE)) {
	if(use9Bits) UART0_C3 = (UART0_C3 & ~0x40) | ((c & 0x100) >> 2);
	UART0_D = c;
	transmitting = 1;
	UART0_C2 = C2_TX_ACTIVE;
	return;
}
Since we need to enable the TC interrupt to deassert the transmit pin again and set transmitting back to 0, this is not a good solution, so I have to use either the hard looping or the DMA solution.
 
I'm hesitant to give the impression on the website that Teensy LC supports very high baud rates, even if the UART can technically be configured at those speeds. My philosophy is that casual reading of the info should give an honest idea of the real capability. Showing very high speeds where the supporting code can't really manage to keep up the pace isn't the way I like to do things.
 
I'm hesitant to give the impression on the website that Teensy LC supports very high baud rates, even if the UART can technically be configured at those speeds. My philosophy is that casual reading of the info should give an honest idea of the real capability. Showing very high speeds where the supporting code can't really manage to keep up the pace isn't the way I like to do things.

What about 1.5Mhz, 3Mhz, and 6Mhz for T3.1-T3.6?
 