Is there a faster alternative to Serial1.write? (4.1)

intj

I'm sampling a port every 1 µs using IntervalTimer. My main loop processes the samples, and for the most part each sample is processed right after the ISR has collected it. It's acceptable if the processor falls behind, because it can quickly catch up again: it can process about 4 samples in the time between two interrupts. I also need to print data to the serial port occasionally. I'm using Serial1 for this (Serial1.write), and I'm trying to write one character between processing samples. My main loop does only two things: A) check whether there is a new sample to process and, if so, process it; B) check whether there is data to write to the serial port and, if so, write the next byte. Right now I'm only sending 1 to 4 bytes at a time with long pauses in between, so I'm not worried about the FIFO. I doubt it matters, but the baud rate is 230,400.

I have 2 issues/questions.
1: Writing a byte to the hardware serial port seems to take unusually long, about 1 µs. I can run a ton of code in that amount of time, so I'm not sure why it takes so long. Is there a faster way? I don't care about RX on this port and don't even need the FIFO; I just want to write one byte and start the UART transmission.

2: When I write the byte, it sometimes delays my 1 µs IntervalTimer interrupt by a small amount. My interrupt cannot be delayed. The serial write is low priority for me; I don't care if it gets delayed. Is there a way to lower the priority of the serial function and/or raise the priority of the IntervalTimer ISR?

Many thanks for any help!
 
There are some places in the hardware serial code, including write, where interrupts are disabled briefly. If your timer interrupt is your highest priority, I suggest raising its priority to 0 (highest) as shown below rather than changing the priorities of the UART interrupts. When you say "sampling a port" do you mean reading digital inputs or analog inputs? I can't really comment on the execution time issue without seeing your code. It would help if you put together a small program that contains the basics of your program, i.e. the IntervalTimer and its handler, and a simple loop() that shows how you are processing the data read in the timer handler, how you are writing to UART, and where/how you are measuring execution time.

Code:
  mytimer.begin( handler, period );    // configure timer
  mytimer.priority( 0 );            // set highest priority
 
You can increase the timer's priority like this:

sampleTimer.priority(64); // raise the priority of the sample IntervalTimer

Lower priority numbers actually mean higher priority. Think of it this way: higher numbers have to wait longer than lower numbers.

Priority 64 should mean that your timer interrupt handler won't be interrupted by the hardware serial port. However, the hardware serial ports do disable interrupts occasionally--usually when accessing the serial port read and write buffers (which are separate from the FIFOs).
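The comparison the NVIC makes can be modelled in a few lines of host-side C. This is a minimal sketch, assuming the i.MX RT1062 implements 4 priority bits (so only the upper nibble of the 0-255 value counts and effective levels step by 16) and that all implemented bits act as preemption priority; `prio_level` and `can_preempt` are hypothetical helper names, not Teensy APIs. It does not model the periods where interrupts are disabled outright, which block everything regardless of priority.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 implemented priority bits, as on the i.MX RT1062.
   The NVIC ignores the low 4 bits of the 8-bit priority value. */
#define PRIO_BITS 4u

/* Effective hardware level: 0 (most urgent) .. 15 (least urgent). */
static unsigned prio_level(uint8_t prio)
{
    return (unsigned)prio >> (8u - PRIO_BITS);
}

/* An incoming interrupt can preempt a running handler only if its
   level is strictly lower (more urgent). Equal levels must wait. */
static bool can_preempt(uint8_t incoming, uint8_t running)
{
    return prio_level(incoming) < prio_level(running);
}
```

With the values that appear later in this thread, a UART at 64 can preempt a timer ISR at 128, but cannot preempt a timer ISR at 48 or at 64.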

If you are doing anything more complex than Serial1.write(mybyte), such as Serial1.print(mybyte), you will incur many clock cycles of delay during the format conversion.
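To illustrate the difference: in the standard Arduino `Print` class, `print()` with an unsigned byte argument formats it as decimal text, so `print(0x55)` sends the two characters "85", while `write(0x55)` sends the single raw byte 0x55. The host-side sketch below uses `snprintf` as a stand-in for that conversion; it is not the Teensy core's actual formatting code, and both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for print(uint8_t): decimal text conversion.
   Returns the number of characters that would be transmitted. */
static int format_like_print(uint8_t value, char *out, size_t out_len)
{
    return snprintf(out, out_len, "%u", (unsigned)value);
}

/* write(uint8_t) sends the raw byte unchanged: always 1 byte. */
static int bytes_sent_by_write(uint8_t value)
{
    (void)value;
    return 1;
}
```

Besides the conversion work itself, the formatted form can also put more bytes on the wire, each costing a full frame time.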
 
At 230400 baud, each bit takes 4.34 µs. Each byte in 8N1 format (1 start bit, 8 data bits, 1 stop bit) is 10 bits long, so it takes 43.4 µs.

I don't have a clear idea of your code, but you did say "I'm sampling a port every 1 µs using IntervalTimer" and "I'm trying to write one character between processing the samples". You didn't say how many samples you process between bytes, but if it's fewer than 44 samples, you can of course expect Serial1.write() to have to wait when the buffer inevitably fills up.
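The bit time, byte time and the 44-sample threshold can be checked with a few lines of host-side C. This is a sketch assuming 8N1 framing (10 bits per byte) and the 1 µs sample period described in the thread; the function names are hypothetical.

```c
#include <stdint.h>

/* 8N1 framing: 1 start bit + 8 data bits + 1 stop bit. */
#define BITS_PER_FRAME_8N1 10u

/* Duration of one bit, in microseconds. */
static double bit_time_us(uint32_t baud)
{
    return 1e6 / (double)baud;
}

/* Duration of one 8N1 byte on the wire, in microseconds. */
static double byte_time_us(uint32_t baud)
{
    return BITS_PER_FRAME_8N1 * bit_time_us(baud);
}

/* Samples that arrive while one byte is being sent, rounded up:
   the minimum number of samples between consecutive write() calls
   if the TX buffer is never to grow. */
static uint32_t samples_per_byte(uint32_t baud, double sample_period_us)
{
    double n = byte_time_us(baud) / sample_period_us;
    uint32_t whole = (uint32_t)n;
    return (n > (double)whole) ? whole + 1u : whole;
}
```

At 230400 baud and a 1 µs sample period this gives 44 samples per byte; at ten times the baud rate it drops to 5.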

Maybe this is relevant, maybe not. It's difficult to give more specific advice when all we have is a vague description of your program. It would be much better if you showed us the real code. Best would be a small but complete program that lets anyone reproduce the problem by copying it into the Arduino IDE and uploading it to a Teensy.
 
Thanks a lot for looking into this and I will try the timer.priority as it may solve one of my issues. The main issue for me is how long it takes to write a byte to UART.

Below is the example code. It writes just one byte per second to Serial2:

C:
#include <Arduino.h>
#include <stdint.h>
#include <inttypes.h>

#define TEST_PIN_YEL 27
#define TEST_PIN_GRN 25
#define TEST_PIN_BLU 32

#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)

IntervalTimer ttimer;

volatile uint32_t in32[256];
volatile uint8_t in_i;
volatile uint8_t out_i;

void timer_isr(void) {
    digitalWrite(TEST_PIN_YEL, HIGH);
    in32[in_i++] = IMXRT_GPIO6_DIRECT;
    digitalWrite(TEST_PIN_YEL, LOW);
}

void setup(void) {
    in_i = 0;
    out_i = 0;

    //Test pins
    pinMode(TEST_PIN_YEL, OUTPUT);    //ISR
    pinMode(TEST_PIN_GRN, OUTPUT);    //Process sample
    pinMode(TEST_PIN_BLU, OUTPUT);    //TX UART char

    //Serial2
    pinMode(8, OUTPUT);
    pinMode(7, INPUT);

    Serial2.begin(230400);

    pinMode(23, INPUT);
    pinMode(22, INPUT);
    pinMode(21, INPUT);
    pinMode(20, INPUT);
    pinMode(19, INPUT);
    pinMode(18, INPUT);
    pinMode(17, INPUT);
    pinMode(16, INPUT);
    pinMode(15, INPUT);
    pinMode(14, INPUT);
    pinMode(41, INPUT);
    pinMode(40, INPUT);
    pinMode(39, INPUT);
    pinMode(38, INPUT);

    ttimer.begin(timer_isr, 1);
}

void loop(void) {
    static uint32_t sample_count;
    sample_count = 0;

    while (1) {

        if (in_i != out_i) {

            digitalWrite(TEST_PIN_GRN, HIGH);
            //process sample here
            out_i++;
            sample_count++;
            digitalWrite(TEST_PIN_GRN, LOW);

            if (sample_count > 1000000) {
                sample_count = 0;

                digitalWrite(TEST_PIN_BLU, HIGH);
                Serial2.write(0x55);
                digitalWrite(TEST_PIN_BLU, LOW);

            }
        }
    }
}
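One detail of this example worth spelling out: because `in_i` and `out_i` are `uint8_t` and `in32[]` has exactly 256 entries, the indices wrap from 255 back to 0 on their own, so no modulo or bounds check is needed. The number of samples waiting is the difference of the two indices, also truncated to 8 bits. The host-side sketch below models only that index arithmetic; `backlog` and `advance` are hypothetical helpers. If the backlog ever reaches 256, the ISR overwrites samples that have not been processed yet, and `in_i == out_i` would then wrongly report an empty buffer.

```c
#include <stdint.h>

/* Samples written by the ISR but not yet consumed by loop().
   Unsigned 8-bit subtraction wraps modulo 256, so this stays
   correct after in_i has wrapped past 255 and out_i has not. */
static uint8_t backlog(uint8_t in_i, uint8_t out_i)
{
    return (uint8_t)(in_i - out_i);
}

/* Advance an index the same way in32[in_i++] does. */
static uint8_t advance(uint8_t idx)
{
    return (uint8_t)(idx + 1u);
}
```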

Here is a screenshot from the scope showing the test pins (yellow = ISR, green = process sample, blue = Serial2.write):

[Scope screenshot: teensy.jpg]
 
Try using digitalWriteFast() rather than digitalWrite(). I can't remember the execution times, but it's much faster. I think the least intrusive way to measure execution time is to read the running cycle counter ARM_DWT_CYCCNT before and after Serial2.write(), compute the delta and print to USB serial. Be sure to use uint32_t variables.
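The reason for `uint32_t`: `ARM_DWT_CYCCNT` is a free-running 32-bit counter that wraps about every 7.2 seconds at 600 MHz, and unsigned subtraction still gives the correct elapsed count across a wrap. On the Teensy, the reading would be taken as `uint32_t t0 = ARM_DWT_CYCCNT;` before `Serial2.write()` and `uint32_t t1 = ARM_DWT_CYCCNT;` after it. The host-side sketch below covers only the arithmetic, with hypothetical helper names and an assumed 600 MHz clock.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT readings. Correct across a
   single wrap of the 32-bit counter because unsigned arithmetic
   is modulo 2^32. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to nanoseconds for a CPU clock given in Hz
   (e.g. 600000000 for a Teensy 4.1 at its default speed). */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / f_cpu_hz);
}
```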
 
The code in post #5 appears to send one Serial2 character every million samples, i.e. once per second?

If that is the case, then the UART buffers are not being overfilled, so there should be no waiting or data loss.

As post #4 notes, transmitting a byte at 230 kbaud takes a fixed time. With an empty buffer, though, .write() should ideally take only as long as it needs to place the byte in the TX buffer. The UART finding the buffered byte may then trigger the TX interrupt, which adds a delay while it is handled; that seems to be intended to start the transfer sooner. But that single byte fits in the FIFO, and once it has gone, nothing else needs to be done?

Does the timing change if the UART baud rate is set to double or ten times?
 
A 1 µs timer interrupt is very short.
If the T4.1 runs at 600 MHz, one period is only 600 CPU cycles, so after the ISR itself you may have just a few cycles of CPU time before the next timer interrupt strikes. If it runs at a 150 MHz CPU speed, expect no time left at all for the code in loop()...
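In numbers, the cycle budget per interrupt period is the CPU clock multiplied by the period. A host-side sketch with a hypothetical helper name; it ignores interrupt entry/exit overhead and the ISR body, both of which come out of this same budget.

```c
#include <stdint.h>

/* CPU cycles available in one timer period.
   f_cpu_hz: core clock in Hz; period_ns: timer period in ns. */
static uint32_t cycles_per_period(uint32_t f_cpu_hz, uint32_t period_ns)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * period_ns) / 1000000000ull);
}
```

A 1 µs period is 600 cycles at 600 MHz and only 150 at 150 MHz; 100 ns at 600 MHz is 60 cycles.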


The call to Serial2.write() takes quite a few instructions and triggers an LPUART interrupt. Processing that interrupt also takes time, and when the LPUART interrupt has the same or a higher priority than the timer interrupt, it will block the timer interrupt.

Changing all your digitalWrite() calls to digitalWriteFast() helps a little, because at a 600 MHz CPU speed the 1 µs interrupt service routine then needs about 100 ns less to toggle the output pin.

Modifying the interrupt priority settings also helps.
Try this in setup() instead of the ttimer.begin(timer_isr, 1) line:
Code:
    ttimer.begin(timer_isr, 1);
    Serial.print("default priority ttimer is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_PIT));

    ttimer.priority (3*16); //  lower number = higher priority
    Serial.print("modified priority ttimer is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_PIT));

//    Serial.print("Prio LPUART6 (serial1) is ");
//    Serial.println(NVIC_GET_PRIORITY(IRQ_LPUART6));
    Serial.print("Prio LPUART4 (serial2) is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_LPUART4));

Back to the first question, "Is there a faster alternative to Serial1.write? (4.1)": yes, use DMA_UART instead of the default HardwareSerial.
Or, since you're not really interested in checking whether the UART TX register (or FIFO) is empty, just write the 8-bit word directly into the relevant UART data register. For Serial2 on the T4 that is LPUART4: base address 0x40190000, DATA register at offset 0x1C.

As in:

Code:
#include <Arduino.h>
#include <stdint.h>
#include <inttypes.h>

#define TEST_PIN_YEL 27
#define TEST_PIN_GRN 25
#define TEST_PIN_BLU 32

#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)

IntervalTimer ttimer;

volatile uint32_t in32[256];
volatile uint8_t in_i;
volatile uint8_t out_i;

void timer_isr(void) {
    digitalWriteFast(TEST_PIN_YEL, HIGH);
    in32[in_i++] = IMXRT_GPIO6_DIRECT;
    digitalWriteFast(TEST_PIN_YEL, LOW);
}

void setup(void) {
    in_i = 0;
    out_i = 0;

    //Test pins
    pinMode(TEST_PIN_YEL, OUTPUT);    //ISR
    pinMode(TEST_PIN_GRN, OUTPUT);    //Process sample
    pinMode(TEST_PIN_BLU, OUTPUT);    //TX UART char

    //Serial2
//    pinMode(8, OUTPUT);
//    pinMode(7, INPUT);

    Serial2.begin(230400);

    pinMode(23, INPUT);
    pinMode(22, INPUT);
    pinMode(21, INPUT);
    pinMode(20, INPUT);
    pinMode(19, INPUT);
    pinMode(18, INPUT);
    pinMode(17, INPUT);
    pinMode(16, INPUT);
    pinMode(15, INPUT);
    pinMode(14, INPUT);
    pinMode(41, INPUT);
    pinMode(40, INPUT);
    pinMode(39, INPUT);
    pinMode(38, INPUT);

    ttimer.begin(timer_isr, 1);
    Serial.print("default priority ttimer is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_PIT));

    ttimer.priority (3*16); //  lower number = higher priority
    Serial.print("modified priority ttimer is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_PIT));

//    Serial.print("Prio LPUART6 (serial1) is ");
//    Serial.println(NVIC_GET_PRIORITY(IRQ_LPUART6));
    Serial.print("Prio LPUART4 (serial2) is ");
    Serial.println(NVIC_GET_PRIORITY(IRQ_LPUART4));
 
    LPUART4_CTRL &= ~(LPUART_CTRL_TIE | LPUART_CTRL_TCIE); // disable TX interrupts
}


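// imxrt.h is already pulled in by Arduino.h (LPUART4_CTRL above compiles);
// included here only to make the register names' origin explicit.
#include <imxrt.h>

// Write one byte straight into LPUART4's (Serial2's) DATA register.
// No check is made for free space in the TX FIFO, so bytes written
// while it is full may be lost. Fine for one byte per second.
void Serial2WriteFast (uint16_t x)
{
  LPUART4_DATA = x;
}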

void loop(void) {
    static uint32_t sample_count;
    sample_count = 0;

    while (1) {

        if (in_i != out_i) {

            digitalWriteFast(TEST_PIN_GRN, HIGH);
            //process sample here
            out_i++;
            sample_count++;
            digitalWriteFast(TEST_PIN_GRN, LOW);

            if (sample_count > 1000000) {
                sample_count = 0;

                digitalWriteFast(TEST_PIN_BLU, HIGH);

                Serial2WriteFast (0x55);
//                Serial2.write(0x55);

                digitalWriteFast(TEST_PIN_BLU, LOW);

            }
        }
    }
}
 
To my surprise, changing to digitalWriteFast made a huge difference, increasing the performance by nearly 300%.
I will try DMA_UART too and report back. The green test pin is so fast now that my scope can't even capture it anymore :D I probably need to use the little ground spring at this point.
On a side note, I've been really impressed with the Teensy 4.1. I originally started with an ESP32 and it was hopeless. Then I tried a Teensy 3.2 that I had lying around; I got the 1 µs interrupt to work, but there was no time left for processing. Still impressive. Then I bought the Teensy 4.1 and, yeah, it's just really impressive.

Here is a new screenshot; the only change is digitalWriteFast:

[Scope screenshot: 1711826035834.png]
 
Thank you so much, sicco! I tried your LPUART4_DATA approach. I didn't think it could get much faster, but it got way faster yet again, by 200%. Wow.

[Screenshot: 1712194073164.png]


I'm just so impressed with this little board that I bought 2 more because I'm sure I'll use them at some point, many thanks!

[Image: 1712194474363.png]
 
I have one more update, to show the effects of the timer interrupt priority. Below is the actual program that's performing work and without any of the optimizations implemented from this thread. Yellow is the ISR that is sampling the input pin, green is processing the sample, and cyan is writing data to the serial port.
[Scope screenshot: 1712230524326.png]

The thing to notice is the irregular ISR interval, which I can't have.
The ISR has a priority of 128 and the UART a priority of 64.
Here is the same thing with the ISR priority changed to 48:
[Scope screenshot: 1712231053924.png]

The intervals look perfect. Just to make sure I'm not seeing things, here it is with persistent display turned on:
[Scope screenshot: 1712231290206.png]

It's completely perfect as far as I'm concerned. I also tried priority 0, which made it slightly better still, but that's beyond what I need, so I'll stick with 48 for now. This concludes what this thread was about; again, many, many thanks to you guys!
 
