Teensy4.1 MAX baud rate

MO_YA_NE

Hi!
I have been using Teensyduino for three days.
I want to communicate with other MPUs at high speed using RS485 communication.
I set the Teensy 4.1 hardware serial to 6 Mbps, but higher values didn't work.
Is there any way to get higher speeds?

Code:
void setup() {
//  Serial1.begin(3000000);     //3M OK
    Serial1.begin(6000000);     //6M OK
//  Serial1.begin(12000000);    //equal to 6M

}

void loop() {
  Serial1.write(0x55);
  delay(1000);
}
 
Currently the Serial code on T4.x uses the fixed 24 MHz clock, and the minimum divide is probably 4, which gives your 6M...

I believe it would be possible to make it faster. I have never tried it:

In the cores\teensy4 directory:
startup.c - Fixes the Serial to 24mhz:
Code:
	// UARTs run from 24 MHz clock (works if PLL3 off or bypassed)
	CCM_CSCDR1 = (CCM_CSCDR1 & ~CCM_CSCDR1_UART_CLK_PODF(0x3F)) | CCM_CSCDR1_UART_CLK_SEL;
This could be adjusted to use the PLL3 clock, which I think is 480 MHz.
So in that register you would want to clear the CCM_CSCDR1_UART_CLK_SEL bit (page 1062).
Then if you look at the clock tree (page 1017), you will see that PLL3_SW_CLOCK / 6 gets passed in, so 80 MHz.
The startup code above sets the UART_CLK_PODF field to 0, so divide by 1...

Then the HardwareSerial code needs to be updated to know about this. Which again would not be hard.

There is a define at line 63 of HardwareSerial.cpp: #define UART_CLOCK 24000000

That is used in:
Code:
void HardwareSerial::begin(uint32_t baud, uint16_t format)
{
	//printf("HardwareSerial begin\n");
	float base = (float)UART_CLOCK / (float)baud;
We should instead read CCM_CSCDR1 and compute the clock, which is not hard:
Simply check whether the CCM_CSCDR1_UART_CLK_SEL bit is set: if so, the clock is 24 MHz; otherwise it is 80 MHz divided by the UART_CLK_PODF setting.

Note the PODF if I read correctly is either 0(1) or 0x3f which is: 2^6 or 64, which I have not tried or used...
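As a rough illustration of the check described above, here is a minimal host-side sketch (not Teensy core code) that derives the UART clock from a CCM_CSCDR1 value. The helper name `uartClockHz` is hypothetical. It assumes UART_CLK_SEL is bit 6, that UART_CLK_PODF occupies bits 5..0 and divides by (value + 1), and that the divider sits after the clock mux; that reading of PODF differs from the note above, so check it against the reference manual before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit layout of CCM_CSCDR1: bit 6 = UART_CLK_SEL, bits 5..0 = UART_CLK_PODF.
constexpr uint32_t UART_CLK_SEL_BIT = 1u << 6;
constexpr uint32_t UART_CLK_PODF_MASK = 0x3Fu;

// Hypothetical helper: derive the UART root clock in Hz from a CCM_CSCDR1 value.
uint32_t uartClockHz(uint32_t cscdr1) {
    // Mux: bit set selects the 24 MHz oscillator, bit clear selects PLL3/6 = 80 MHz.
    const uint32_t source = (cscdr1 & UART_CLK_SEL_BIT) ? 24000000u : 80000000u;
    // Assumption: the 6-bit PODF field divides by (field value + 1).
    const uint32_t divider = (cscdr1 & UART_CLK_PODF_MASK) + 1u;
    assert(divider >= 1u && divider <= 64u);
    return source / divider;
}
```

On the Teensy itself the argument would be the live register, for example `uartClockHz(CCM_CSCDR1)`, and begin() could use the result in place of the fixed UART_CLOCK define.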

Then the rest should take care of itself... It may of course change the actual generated baud rates and how close some of them are...
Example: 6 Mbaud with a 24 MHz base is 24/6 = 4, but at 80 MHz it is 80/6 = 13.333
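To make the divide arithmetic above concrete, here is a minimal host-side sketch of a divisor search. It is not copied from the Teensy core; it assumes the LPUART baud rate is clock / (OSR x divisor) with an oversampling ratio of 4..32 and a divisor of 1..8191, and the names `BaudSetting`, `pickBaudSetting` and `actualBaud` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct BaudSetting {
    int osr;      // oversampling ratio, assumed range 4..32
    int divisor;  // baud divisor, assumed range 1..8191
};

// Hypothetical helper: pick the OSR/divisor pair whose product is closest to clock/baud.
BaudSetting pickBaudSetting(double uartClockHz, double baud) {
    assert(uartClockHz > 0.0 && baud > 0.0);
    const double base = uartClockHz / baud;
    BaudSetting best{4, 1};
    double bestErr = 1e20;
    for (int osr = 4; osr <= 32; ++osr) {
        const double ideal = base / osr;           // divisor we would like to use
        int div = static_cast<int>(ideal + 0.5);   // nearest integer divisor
        if (div < 1) div = 1;
        if (div > 8191) div = 8191;
        const double err = std::fabs(div - ideal) / ideal;  // relative error
        if (err < bestErr) {
            bestErr = err;
            best = {osr, div};
        }
    }
    return best;
}

// Baud rate actually produced by a given setting.
double actualBaud(double uartClockHz, BaudSetting s) {
    return uartClockHz / (s.osr * s.divisor);
}
```

Under this model a 24 MHz clock tops out at 24/4 = 6 Mbaud, so a 12 Mbaud request lands on 6 Mbaud, and an 80 MHz clock tops out at 20 Mbaud. A 6 Mbaud request at 80 MHz cannot be met exactly; the closest pair here is OSR 13 with divisor 1, about 6.15 Mbaud.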
 
Thanks for your reply Kurt-san.
I hope I can get a higher baud rate with the method you suggested.

I'm still not familiar with the clock division of the IMXRT1060.
I'll try your ideas as I peruse the hardware manual.
 
Thank you Mr. KurtE ! It was exactly what you said!
I am very grateful for your advice.
I succeeded in raising the baud rate to 20Mbps.

I just rewrote the UART_CLK_SEL bit in the CCM_CSCDR1 register to 0.
Look at the code and the execution result.

Code:
void setup() {
//  Serial1.begin(3000000);     //3M OK
//  Serial1.begin(6000000);     //6M OK
    Serial1.begin(20000000);    //20 Mbaud once the UART clock is switched below
    CCM_CSCDR1=105450240;       //0x06490B00: UART_CLK_SEL bit set to 0
}

void loop() {
  Serial1.write(0x55);
  delay(1000);
}
20Mhz_uart.jpg
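The constant 105450240 is 0x06490B00 in hex, and writing it overwrites every field in the register, not just UART_CLK_SEL. A minimal host-side sketch of the narrower read-modify-write follows; the helper name `clearUartClkSel` is hypothetical, and the bit position (bit 6) is an assumption taken from the register description.

```cpp
#include <cassert>
#include <cstdint>

// Assumed position of UART_CLK_SEL within CCM_CSCDR1.
constexpr uint32_t UART_CLK_SEL_BIT = 1u << 6;

// Hypothetical helper: return the register value with only UART_CLK_SEL cleared.
uint32_t clearUartClkSel(uint32_t cscdr1) {
    const uint32_t updated = cscdr1 & ~UART_CLK_SEL_BIT;
    assert((updated & UART_CLK_SEL_BIT) == 0u);  // all other bits are untouched
    return updated;
}
```

On the Teensy the same idea would be written against the live register with the macro quoted from startup.c, for example `CCM_CSCDR1 &= ~CCM_CSCDR1_UART_CLK_SEL;`. Starting from 0x06490B40 (the posted value with the select bit set), this yields the posted 105450240.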
 
@PaulStoffregen - Wondering if I should update HardwareSerial to check the state of the CCM_CSCDR1
And properly configure the baud rate for the input clock of either 24 or 80mhz?

My guess is to leave the default for 24... Or could potentially add code that if the user asks for > 6mbs and no other active Hardware Serial ports, we could update that register as well.
Could probably check if the other hardware ports are configured and reconfigure, but run into issues like what happens if the queue is active or input coming or ...
 
> I succeeded in raising the baud rate to 20Mbps.

Have you found that this baud rate provides reliable data transfer?
 
I have not tried to send and receive actual data yet.
I just saw the output waveform of TX1.
I'm going to connect Teensy 4.1 with Renesas RX72T and Microchip PIC32MZ with RS485 and try the actual data communication.
3boad.jpg
 
I tried a program that connects the Teensy 4.1 to the PIC32MZ and echoes data back.
The Teensy 4.1 sends characters at 20 Mbps, and the PIC32MZ just adds 1 and returns them.
It looks like it is working fine.
Code:
void setup() {
  pinMode(3, OUTPUT);
  
    while (!Serial) ;
  
//  Serial1.begin(3000000);     //3M OK
//  Serial1.begin(6000000);     //6M OK
    Serial1.begin(20000000);    //20MHz
    CCM_CSCDR1=105450240;       //UART_CLK_SEL bit set to 0
}

void loop() {
  char c1;
  Serial.printf("Press any key>");
  while(Serial.available()<=0){}
  c1=Serial.read();

  digitalWrite(3, HIGH);        //R/W=H
  Serial1.write(c1);
  Serial1.flush();
  digitalWrite(3, LOW);       //R/W=L

  while(Serial1.available()<=0){}
  c1=Serial1.read();

  Serial.printf("c1=%d\n\r",c1);    //
  
}
teensy_pic32_2.jpg
teensy_pic32_1.jpg
 
I think a reasonable test would be to send a block of data at full speed, read back an echo and compare to check for any errors.
 
NXP UARTs fine at 9.375 Mbits/sec in T3.5/T3.6 related project

FWIW, I have been involved in a project that drives Kinetis K60 and K66 microcontroller UARTs at their max rate (150 MHz sysclock / 16 = 9.375 Mbit/s) across optical links through 50 MHz-rated transceivers. In early prototyping we drove these speeds through 12-18 inch (30-45 cm) flying leads between overclocked K64F dev boards, without detectable errors (in an electromagnetically challenging environment no less...). Our little protocol uses CRC32 to check packet integrity for packet sizes ranging from 6 to about 400 bytes, and we have never seen any issues attributable to transmission errors. This relates to the Teensy in that the Teensy 3.5 is based on the K64F and the Teensy 3.6 is based on the K66.

Our system consists of 20 networked nodes, and they are stable and consistent. I have not tried the rt106x, being way busy with my day job. I would assume your tests should be successful even though the rt106x would be set to 8x oversampling rather than the 16x oversampling in the K6x (not configurable to the best of my knowledge).

As you run experiments with your units, I would be very interested in knowing whether you have difficulties.
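For readers who want to try the CRC32 packet check mentioned above, here is a minimal sketch. The post does not say which CRC32 variant or packet layout the project uses, so this assumes the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) and a hypothetical layout with the CRC in the last four bytes, little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32, reflected form (polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF).
uint32_t crc32(const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the bit shifted out was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical packet layout: payload followed by its CRC, least significant byte first.
bool packetIsIntact(const uint8_t* packet, size_t len) {
    if (packet == nullptr || len < 4) return false;
    const size_t payloadLen = len - 4;
    const uint32_t received =
        static_cast<uint32_t>(packet[payloadLen]) |
        (static_cast<uint32_t>(packet[payloadLen + 1]) << 8) |
        (static_cast<uint32_t>(packet[payloadLen + 2]) << 16) |
        (static_cast<uint32_t>(packet[payloadLen + 3]) << 24);
    return crc32(packet, payloadLen) == received;
}
```

A table-driven version would be faster on a microcontroller; the bitwise form is used here only because it is short enough to check by eye.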
 
I extended the harness length to 2.5 m and experimented again.
I sent and received 5000 bytes at 20 Mbps.

All 5000 bytes came back OK.
I ran the test about 10 times and got the same result each time.


I am trying to apply this to a 250W-motor driver.
I am wondering if it is possible to communicate at 20Mbps in a high noise environment.

Code:
char Big_Data[5000];

void setup() {
  pinMode(3, OUTPUT);
  
    while (!Serial) ;
  
//  Serial1.begin(3000000);     //3M OK
//  Serial1.begin(6000000);     //6M OK
    Serial1.begin(20000000);    //20MHz
    CCM_CSCDR1=105450240;       //UART_CLK_SEL bit set to 0
}

void loop() {
  char c;
  int i,data_num;
  int ok_cnt=0;
    
  data_num=5000;
    
  Serial.printf("Press any key>");
  while(Serial.available()<=0){}
  c=Serial.read();
  Serial.printf("\n\r");

  digitalWrite(3, HIGH);        //R/W=H
  for(i=0;i<data_num;i++){
    Serial1.write(255*i/data_num);
    Serial1.flush();
  }
  digitalWrite(3, LOW);       //R/W=L

  for(i=0;i<data_num;i++){
    while(Serial1.available()<=0){}
    Big_Data[i]=Serial1.read();
  }

  for(i=0;i<data_num;i++){
    if(Big_Data[i]==255*i/data_num+1) ok_cnt++;
    Serial.printf("%d %d\n\r",Big_Data[i],ok_cnt);    //
  }
  Serial.printf("OK_Count=%d\n\r",ok_cnt);      //

  
}
rs_485_2.5m.jpg
rs_485_2.5m_2.jpg
 
Using differential signalling is going to maximize your chance of success for this, so let's hope it works first time!
 
Serial.write not working as expected (for me)

I have an unusual problem with a seemingly blocking call to write() while using Serial2 at 20 Mbaud.

I have put together a minimum working example to show my problem.
Further down, several results are shown: 20 Mbaud, 16 Mbaud, and a comparison at 6 Mbaud between pll3_80m and osc_clk as the UART clock.

For now I can confirm that
- it does not depend on the serial port; it is the same for all of them.
- it does not depend on the size of the buffer that is sent, as long as the additional buffer is big enough (Serial2.addMemoryForWrite(addSndBuf, ADDBUFFERSIZE))
- it does not depend on whether an additional buffer is added
- it does depend on the baud rate

I have not found anything specific for the Teensy (4), but the Arduino documentation describes the write() function as non-blocking.
"If there is enough empty space in the transmit buffer, Serial.write() will return before any characters are transmitted over serial"

But obviously it seems to be different for the Teensy. Even worse: for a lower baud rate the write function returns faster (this seems to be true from 6 Mbaud upward).

Is there a way to make Serial.write() non-blocking so that it returns immediately?
I need these few microseconds desperately for other stuff, as I need both fast transmission of data and low latency of Serial.write().
Also, it is not the behaviour I expected. I am sorry if I expect something that is not intended or if I missed a note somewhere. I am leaning on Arduino's notes here.

Just to be clear about what I expect the behaviour to be:
I expected the write function to return immediately, as the transmit buffer has enough free space.
That means sending should take 1-2 microseconds and flush about 70 to 80 microseconds. Also, the number of bytes available for write after the write function should be less than before, which would indicate unsent bytes.
The last point can be seen with lower baud rates, where most of the bytes get sent during flush(), as expected, although the time for write() is still too high.

EDIT:
I have little experience with interrupts on the Teensy, so it's just a thought: would it be possible to achieve a fast-returning write() by disabling the responsible interrupt right before write() and immediately reactivating it afterwards?

Code:
// bare minimum for blocking Serial2.write

#define BAUD 20000000
#define ADDBUFFERSIZE (255)

uint8_t* addSndBuf;

uint8_t* buf2;
uint16_t bufLength2 = 132;

int microsA = 0;
int microsB = 0;
int microsC = 0;
int microsD = 0;

int A = 0;
int B = 0;
int C = 0;

void setup() {
   Serial.begin(115200);
  while (!Serial && millis() < 2000) {
    // wait up to 2 seconds for Arduino Serial Monitor
  }


  addSndBuf = new uint8_t[ADDBUFFERSIZE];
  Serial2.addMemoryForWrite(addSndBuf, ADDBUFFERSIZE);
  Serial2.begin(BAUD, SERIAL_8E1); 
  CCM_CSCDR1=105450240;       //UART_CLK_SEL bit set to 0

  buf2 = new uint8_t[bufLength2];
}

void loop() {
  microsA = micros();
  for (int i = 0; i < bufLength2; i++) buf2[i] = i;
  microsB = micros();
  A = Serial2.availableForWrite();
  Serial2.write(buf2, bufLength2);
  B = Serial2.availableForWrite();
  microsC = micros();
  Serial2.flush();
  C = Serial2.availableForWrite();
  microsD = micros();

  Serial.println("+++++++++++++++++++++++++");
  Serial.print("sending takes = "); Serial.println(microsC-microsB);
  Serial.print("flush takes   = "); Serial.println(microsD-microsC);
  Serial.print("logging takes = "); Serial.println(microsD-microsA);
  Serial.print("bytes to send = "); Serial.println(bufLength2);
  Serial.print("A = "); Serial.println(A);
  Serial.print("B = "); Serial.println(B);
  Serial.print("C = "); Serial.println(C);
  Serial.println("+++++++++++++++++++++++++");

  delay(100);   // more than enough time to send!
}

20 MBaud
Code:
+++++++++++++++++++++++++
sending takes = 72
flush takes   = 1
logging takes = 75
bytes to send = 132
A = 294
B = 294
C = 294
+++++++++++++++++++++++++

16 MBaud
Code:
+++++++++++++++++++++++++
sending takes = 25
flush takes   = 67
logging takes = 94
bytes to send = 132
A = 294
B = 201
C = 294
+++++++++++++++++++++++++

6 MBaud; pll3_80m
Code:
+++++++++++++++++++++++++
sending takes = 18
flush takes   = 219
logging takes = 238
bytes to send = 132
A = 294
B = 175
C = 294
+++++++++++++++++++++++++

6 MBaud; osc_clk
Code:
+++++++++++++++++++++++++
sending takes = 18
flush takes   = 225
logging takes = 245
bytes to send = 132
A = 294
B = 176
C = 294
+++++++++++++++++++++++++
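One way to sanity-check the four result sets above is to compare them with the minimum time the bytes need on the wire. The sketch below is plain arithmetic with hypothetical helper names; it assumes the SERIAL_8E1 frame from the test sketch is 11 bits (start, 8 data, parity, stop) and that frames are sent back to back.

```cpp
#include <cassert>

// Bits on the wire per byte: start bit + data bits + optional parity + stop bits.
int bitsPerFrame(int dataBits, bool parity, int stopBits) {
    assert(dataBits > 0 && stopBits > 0);
    return 1 + dataBits + (parity ? 1 : 0) + stopBits;
}

// Minimum time in microseconds to shift out `bytes` frames back to back.
double wireTimeMicros(int bytes, int frameBits, double baud) {
    assert(bytes >= 0 && frameBits > 0 && baud > 0.0);
    return bytes * static_cast<double>(frameBits) * 1e6 / baud;
}
```

For 132 bytes this gives about 72.6 us at 20 Mbaud, 90.75 us at 16 Mbaud and 242 us at 6 Mbaud. The measured write() plus flush() totals above (73, 92, 237 and 243 us) sit close to those figures, which suggests the 72 us spent inside write() at 20 Mbaud is roughly the whole transmission time rather than extra overhead.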
 
I am not sure if the Arduino documentation is necessarily 100% correct. That is if you tell it to write 50 bytes and while doing so, an ISR happens that says feed me... it will grab the next byte(s) off of the queue during the isr...

You can always use the method: Serial2.availableForWrite()
which will give you the number of bytes free in the output queue, which is a good number not to exceed if you don't want it to block...
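A minimal sketch of that suggestion: only hand write() as many bytes as availableForWrite() reports, and return the count so the caller can retry the rest later. `MockPort` is a hypothetical stand-in so the example runs off-target, and `writeNonBlocking` is a hypothetical helper; on a Teensy the same template could be called with Serial2, assuming its availableForWrite() and write(buffer, length) behave as in the Arduino API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a serial port so the sketch can run on a PC.
class MockPort {
public:
    explicit MockPort(size_t freeSlots) : free_(freeSlots), queued_(0) {}
    int availableForWrite() const { return static_cast<int>(free_); }
    size_t write(const uint8_t* buf, size_t len) {
        (void)buf;
        assert(len <= free_);  // a real port would block here instead
        free_ -= len;
        queued_ += len;
        return len;
    }
    size_t queued() const { return queued_; }
private:
    size_t free_;    // free bytes in the output queue
    size_t queued_;  // bytes accepted so far
};

// Queue at most as many bytes as fit right now; return how many were taken.
template <typename Port>
size_t writeNonBlocking(Port& port, const uint8_t* buf, size_t len) {
    const int room = port.availableForWrite();
    if (room <= 0) return 0;
    const size_t space = static_cast<size_t>(room);
    const size_t n = (len < space) ? len : space;
    return port.write(buf, n);
}
```

This only guards against write() waiting for queue space. It does not change how long write() takes when the queue already has room.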
 
Thanks for your fast response!
As you can see in my code, I already use availableForWrite(), and it shows the whole buffer is empty prior to calling write(), so that's no issue.
I agree, that arduino is most likely not correct on that regard. It is clear to me, that write() gets interrupted by the ISR, which explains its lengthy execution.

I have recently used a non-blocking I2C library, which works as expected. Every call takes less than a microsecond. Isn't that possible for UART?
 
If you look at cores\Teensy4\HardwareSerial.cpp, the one-byte write() function calls write9bit(), and that's where all the work of writing to the UART registers is done. This function has a set of disable/enable calls that don't exist for the equivalent function in T3, and I don't know why. The one-byte write() function is called by the multi-byte write() function in Print.cpp.

The "problem" is that at 20 MHz, interrupts happen so fast that sending is complete by the time the multi-byte write() function returns. I did an experiment as follows. First, I commented out the disable/enable pair at the bottom of write9bit(), and added a pair of disable/enable around the while() loop in the multi-byte write(). With this configuration, all of the data gets written to the buffer before any transmit interrupts occur, and you get the behavior you want, as shown below. At lower baud rates, you get behavior more like this because the interrupt rate does not overwhelm the CPU. I think it would be "safe" to make these changes. Note that it does end up taking slightly longer for all of the data to actually be sent. This may be the tradeoff Paul was looking at. The code as-is provides the fastest serial throughput.

+++++++++++++++++++++++++
sending takes = 12
flush takes = 73
logging takes = 86
bytes to send = 132
A = 294
B = 167
C = 294
+++++++++++++++++++++++++
 
That looks good!
It sounds like that's about the idea I had in my head :D

I will try that myself later.
But for some real talk now: does it really change anything? It seems the interrupts themselves consume a lot of time, otherwise they wouldn't all occur during the write() call, right?
When I "postpone" the interrupts until after write(), don't they cost me the same amount of time, just spread across my other computational stuff?
Of course it's still helpful, as I can start an I2C or SPI transfer immediately after write(), but I am still curious.
 
> does it really change anything?

That's a good question. Yes, there is no free lunch. The original code starts sending immediately on the call to write(). The modified code disables interrupts, and thereby prevents sending data, until all of the bytes are in the transmit buffer. However, with the modified code, even though you return to loop() in about 12 us, the processor will then be almost entirely consumed by TX interrupts until all bytes have been sent. Therefore, very little can happen in your main loop(). That's just how it is with interrupt-driven serial comm at 20 MHz. Did you gain anything? Not really, and in fact you consumed 10 or 11 us more of processing time by writing everything to the software buffer with interrupts disabled.
 
You are right, I have to see whether or not it is useful. Like I said, I could use it to start other communications while serial is busy.
It might be a stupid question, but why is UART interrupt-driven in the first place?
 
> why is UART interrupt-driven in the first place?

Because the alternative is polling. Do you really need 20 MHz? If you use 1 MHz, the behavior will be (relatively) non-blocking, and you'll be able to do other things during the time TX is happening.

EDIT: One question someone else might be able to answer is whether DMA can be used with the UART to reduce/eliminate CPU interrupts.
 
@KurtE fleshed out/wrote most of the T_4.x serial code during Beta for the huge set of 7-8 Serial UARTS on the 1062, so would have better answers.

The FIFOs are smaller (4 bytes?) on the T_4.x's 1062, which has more UARTs. The T_3.x's may have larger 8-byte FIFOs, making for fewer interrupts, but the FIFOs are not on all ports, at least on T_3.2.

Interrupts are responsive as needed to keep the Rx and Tx data flowing without polling. Alternative might be DMA - but that is its own issue and wasn't done that way.
 
16 Mbaud might be enough too, but I need to send >1.5 MByte/sec.
I just tried something. In addition to your suggestion of pulling the disable/enable out of the single-byte write call, I pulled "port->CTRL |= LPUART_CTRL_TIE;" out of there too. It's enough to enable the transmit interrupt once after all the bytes are in the buffer.
This makes the multi-byte write() much faster; it takes 8 us now. That is still a lot if you ask me, but I don't understand the rest of the write9bit function well enough to make any more assumptions about what is necessary and what is not.
 
> It's enough to enable the transmit interrupt once after all the bytes are in the buffer.

That has the same effect as disabling interrupts while putting bytes in the software buffer, in terms of giving you non-blocking behavior for write(), but I'd say it's preferable, since interrupts are not disabled on every byte write(). It results in slightly less throughput for very high baud rates, but I would take that trade-off.

I assume the disable/enable calls at the end of write9bit() are there because of the setting of TIE, and all of the possible timing permutations and what might be in the hardware and software FIFOs. It might have been a "defensive" measure, and it would be interesting to hear from @KurtE as to why that is necessary for T4 and not for T3. The way this all works is different than I've seen for other platforms, where there is a more distinct separation between the software FIFO and the UART driver. There are definitely situations where I would want to be able to make calls to write() without causing interrupts to be disabled.
 
Just out of curiosity, I looked at the STM32 Arduino Core file HardwareSerial.cpp, and the approach there I think has some advantages.

For T4.x, the write() function in HardwareSerial.cpp takes a single uint8_t argument. The multi-byte write() function is in Print.cpp, and it has a while loop that calls the single-byte write() the appropriate number of times.

For STM32, the "primary" write() function in HardwareSerial.cpp is the multi-byte write(). The single-byte write() is in the same file, and it just calls the multi-byte write() with a size of 1. This makes more sense because, for example, the transmitter only needs to be enabled once, no matter the size of the write.

For T3.x, there is a putchar() function that writes only 1 byte. For UARTs with hardware FIFO, there is a multi-byte write() that is independent of putchar(), and for UARTs without hardware FIFO, the multi-byte write() simply calls putchar() in a loop.
 
Paul does take Pull requests. So if you feel like trying a different approach, go for it.
The code was developed knowing 24mhz clock going into Serial and goal of baud maybe up to 2meg, pushing to maybe 3 or 4...

Note: at one point I did have it handling the Print class write(buffer, cnt), and you can go back to that; the issue is that it needs to handle all of the cases.

Simple things like by default, it uses a circular buffer to hold the write data... But there is also an option to add secondary buffer, so instead of first wrapping around it continues on to the second buffer and when reaches second buffer, it goes back to start of first...
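The two-buffer arrangement described above can be sketched as one logical ring whose slots are mapped onto a primary array and an optional extra array. This is an illustration only, not the core's code; the class name `TwoPartQueue` and its members are hypothetical. One slot is kept empty to tell full from empty, so the capacity is the total size minus one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One logical ring spread over two arrays (hypothetical illustration).
class TwoPartQueue {
public:
    TwoPartQueue(uint8_t* primary, size_t primarySize, uint8_t* extra, size_t extraSize)
        : primary_(primary), extra_(extra), primarySize_(primarySize),
          total_(primarySize + extraSize), head_(0), tail_(0) {
        assert(primary != nullptr && primarySize > 0);
    }
    size_t availableForWrite() const {
        const size_t used = (head_ >= tail_) ? head_ - tail_ : total_ - (tail_ - head_);
        return total_ - 1 - used;  // one slot stays empty
    }
    bool push(uint8_t b) {
        const size_t next = (head_ + 1 == total_) ? 0 : head_ + 1;
        if (next == tail_) return false;  // queue is full
        slot(next) = b;
        head_ = next;
        return true;
    }
    bool pop(uint8_t& b) {
        if (head_ == tail_) return false;  // queue is empty
        tail_ = (tail_ + 1 == total_) ? 0 : tail_ + 1;
        b = slot(tail_);
        return true;
    }
private:
    // Indices past the primary array continue into the extra array.
    uint8_t& slot(size_t i) { return (i < primarySize_) ? primary_[i] : extra_[i - primarySize_]; }
    uint8_t* primary_;
    uint8_t* extra_;
    size_t primarySize_;
    size_t total_;
    size_t head_;  // last slot written
    size_t tail_;  // last slot read
};
```

With a 40-byte primary buffer (an assumption) plus the 255-byte buffer added in the earlier test sketch, this model reports 294 free bytes when empty, the same figure that sketch printed for A.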

It needs to handle the transmitter enable stuff and likewise half duplex. I ran into some complexities here: there was an issue where the last byte was put into the TX shift register, so the FIFO queue was empty and things were set up so that the next interrupt would be for queue empty, but during that time another character came in and was put into the FIFO, the trigger for it did not work, and so the transmit enable pin was not set to the right state... Note: the only reason the interrupts are disabled is that the update of the port CTRL register was getting tromped on.

As for disabling the interrupts for a longer period of time, I would try to avoid it. If you are truly trying to run at 20 Mbps and you, for example, set up a 10K secondary buffer and want to output 4K to the buffer, and you disable interrupts for the whole time, then if you are also receiving on this port or another at the same rate, it would not take long to overrun your RX FIFO and lose data.
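The overrun risk can be put into rough numbers. The sketch below is plain arithmetic with hypothetical helper names; it assumes 10-bit frames (8N1) and takes the 4-byte hardware FIFO depth mentioned earlier in the thread as a guess rather than a confirmed figure.

```cpp
#include <cassert>

// Bytes that arrive on the wire during a window in which no RX interrupt can run.
double bytesArrivingDuring(double windowMicros, double baud, int frameBits) {
    assert(windowMicros >= 0.0 && baud > 0.0 && frameBits > 0);
    return windowMicros * 1e-6 * baud / frameBits;
}

// True if the hardware FIFO alone cannot hold what arrives in that window.
bool rxWouldOverrun(double windowMicros, double baud, int frameBits, int fifoDepth) {
    return bytesArrivingDuring(windowMicros, baud, frameBits) > fifoDepth;
}
```

At 20 Mbaud a byte arrives every 0.5 us, so a 4-byte FIFO fills in about 2 us. The 12 us window measured earlier for queuing 132 bytes with interrupts disabled corresponds to roughly 24 incoming bytes, well past that depth.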

Could probably hack in code to not disable IRQ to set the flag if we know that the queue has >N bytes in it. But not sure the extra code would be worth it.

Could maybe define a new option like: SERIAL_FAST_QUEUE which would put everything onto the queue before updating the head pointer and the TIE, would need to make sure to handle where you actually fill the buffer and then wait for space to come available, as to make sure that it is setup to do transfers...
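A rough host-side sketch of what such an option might do: copy the whole block into the queue first, then publish the new head and arm the transmit interrupt once. SERIAL_FAST_QUEUE is only an idea in the post above, so everything here (`TxQueue`, `enqueueBatch`, the counter standing in for setting LPUART_CTRL_TIE) is hypothetical. Unlike the behaviour described above, this version returns a short count when the queue lacks room instead of waiting for space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kQueueSize = 64;

struct TxQueue {
    uint8_t data[kQueueSize] = {};
    size_t head = 0;     // last slot written; the ISR would send up to here
    size_t tail = 0;     // last slot sent
    int tieEnables = 0;  // counts how often the transmit interrupt was armed
};

size_t freeSlots(const TxQueue& q) {
    const size_t used = (q.head >= q.tail) ? q.head - q.tail : kQueueSize - (q.tail - q.head);
    return kQueueSize - 1 - used;
}

// Copy the block, then publish head and arm the interrupt a single time.
// Returns the number of bytes queued, which may be less than len.
size_t enqueueBatch(TxQueue& q, const uint8_t* buf, size_t len) {
    assert(buf != nullptr || len == 0);
    const size_t room = freeSlots(q);
    const size_t n = (len < room) ? len : room;
    size_t h = q.head;
    for (size_t i = 0; i < n; ++i) {
        h = (h + 1 == kQueueSize) ? 0 : h + 1;
        q.data[h] = buf[i];  // not yet visible to the consumer
    }
    if (n > 0) {
        q.head = h;      // single publish of all queued bytes
        ++q.tieEnables;  // stands in for port->CTRL |= LPUART_CTRL_TIE
    }
    return n;
}
```

On real hardware the head and tail would need to be volatile and the CTRL update would still have to be protected against the race described above; neither is modelled here.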

Hope that helps

Kurt
 