Help with delayMicroseconds() and UART init, inside an IRQ_UART0_STATUS function


shawn

Well-known member
I'm seeing that, depending on the position of UART0 initialization, delayMicroseconds() either has an effect or it doesn't.

Some system details:
  1. The following code is running from within an IRQ_UART0_STATUS interrupt function.
  2. Signals and timings are being measured with a Saleae logic analyzer.
  3. Code snippet 2 takes 150us longer to react than code snippet 1. The only difference is the location of the uart_.begin() call.
  4. The test timing code around delayMicroseconds(150) prints an expected 150 or 151.
  5. Removing delayMicroseconds(150) from code snippet 2 doesn't change its execution time, meaning that the presence or absence of delayMicroseconds(150) has no effect.

Code snippet 1:
Code:
uint32_t eopTime = micros();
uart_.end();
uint32_t delay = 176;
uint32_t dt = micros() - eopTime;
uint32_t t = micros();  // Testing code
//if (dt < delay) {
//  delay -= dt;
  delayMicroseconds(150);  // Changed to 150 instead of 'delay', for illustration
//}
Serial.println(micros() - t); // Testing code, always prints 150 or 151
uart_.begin(baud1, format1);
constexpr uint8_t zero[1] = {0};
txFunc_(zero, 1);  // This writes directly to UART0 using the registers

Code snippet 2, which takes approximately 150us longer:
Code:
uint32_t eopTime = micros();
uart_.end();
uart_.begin(baud1, format1);  // This line has moved
uint32_t delay = 176;
uint32_t dt = micros() - eopTime;
uint32_t t = micros();  // Testing code
//if (dt < delay) {
//  delay -= dt;
  delayMicroseconds(150);  // Changed to 150 instead of 'delay' for illustration
//}
Serial.println(micros() - t); // Testing code, always prints 150 or 151
constexpr uint8_t zero[1] = {0};
txFunc_(zero, 1);  // This writes directly to UART0 using the registers

I suspect that because I'm doing this inside an interrupt, things are affected in a way I don't understand. My next step is probably to write my own baud-setting routine, but I was hoping to use the convenience of uart_.begin().
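
For what it's worth, here's roughly the shape such a routine would take. This is only a sketch (uart0SetBaud is a name I made up), following the divisor math Teensyduino's serial1.c uses for UART0 on a Teensy 3.2:

Code:
// Sketch of a minimal baud-setting routine for UART0 on a Teensy 3.2 (K20),
// following the divisor math in Teensyduino's serial1.c. UART0 is clocked
// from F_CPU, and the low 5 bits of the divisor are the BRFA fine adjust.
// Note that BDH is only latched in when BDL is written afterwards.
static void uart0SetBaud(uint32_t baud) {
  uint32_t divisor = ((F_CPU * 2) + (baud >> 1)) / baud;  // rounded 32x divisor
  UART0_BDH = (divisor >> 13) & 0x1f;  // high bits of SBR
  UART0_BDL = (divisor >> 5) & 0xff;   // low bits of SBR; takes effect here
  UART0_C4 = divisor & 0x1f;           // BRFA fractional fine adjust
}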

I'm sure delayMicroseconds() works fine; however, the following two facts lead me to believe something is up around this call, maybe due to the compiler or something:
  1. The timing difference between the two code snippets is about 150us.
  2. Removing delayMicroseconds(150) has no effect.
 
The interesting parts for me are:
  1. Printing the time difference between two micros() calls wrapped around delayMicroseconds(150) always prints 150 or so for both code snippets, so I'm sure (assuming micros() is correct) that call is working, and
  2. Removing delayMicroseconds(150) from code snippet 2 has no effect on the timing.
 
The cycle counter could be worked into similar math to get those exact values. It isn't shown which Teensy is in use, but there are F_CPU cycles per second at whatever the compiled clock is, from a T_3.2 to a T_3.6. Using uint32_t's and subtracting 'diff = after - before' returns the elapsed number of cycles correctly, even across rollover.
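
For example (the values here are made up to show the wraparound case):

Code:
// Rollover-safe elapsed-cycle math with uint32_t's:
uint32_t before = 0xFFFFFF00u;            // counter just before rollover
uint32_t after  = 0x00000040u;            // counter just after rollover
uint32_t diff   = after - before;         // 0x140 = 320 cycles, still correct
float us = diff / (F_CPU / 1000000.0f);   // cycles -> microseconds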
 
Removing delayMicroseconds(150) from code snippet 2 has no effect on the timing.

I'm deeply skeptical of this. 150 us is pretty much forever in the context of this sort of code!

But the cold reality is I'm going to ignore this unless you post a complete program I can run here on a Teensy to actually observe the problem. These code fragments are pretty much worthless for getting to the bottom of whatever's wrong.
 
That's a fair request. I'll put something more complete together. I was hoping there'd be something obvious I'm not seeing, but yes, I'll create a runnable program.
 
Here is a sketch using cycle counts to show how much time is spent in a measured code section, using delayMicroseconds.

This won't work on a T_LC. I'm not sure which Teensy, or which versions of the IDE and TeensyDuino, are in use.

Code:
uint32_t cycles;
volatile uint32_t udiff = 0;
void setup()
{
  Serial.begin(115200);
  while (!Serial && millis() < 5000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);

  ARM_DEMCR |= ARM_DEMCR_TRCENA;
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;

  // put this outside loop to measure a time period
  cycles = ARM_DWT_CYCCNT;
  { // measure time spent here
    delayMicroseconds( 150 );
  }
  udiff = ARM_DWT_CYCCNT - cycles;

  // This in loop will show period
  if ( 0 != udiff ) ShowDiff( );
}

void ShowDiff( ) {
  float fdiff;
  Serial.print( "\t took ");
  fdiff = udiff;
  fdiff = fdiff / (F_CPU / 1000000.0);
  Serial.print( fdiff );
  Serial.println();
  udiff = 0;
}

void loop() {
  // repeat test just to see over a range
  for ( int ii = 1; ii < 10; ii++ ) {
    cycles = ARM_DWT_CYCCNT;
    delayMicroseconds( ii * 50 ); //     delay( ii * 50 );
    udiff = ARM_DWT_CYCCNT - cycles;
    Serial.print( " us Delay = ");
    Serial.print( ii * 50 );
    Serial.print( "\t w/Cycles diff ");
    Serial.print( udiff );
    if ( 0 != udiff ) ShowDiff( );
  }
  Serial.println();
  delay( 4000 );
}
 
I've made a reproducible example; it's attached as RDMResponderDelay.zip.

Update: Use Teensy 3.2’s. The “Teensy1” and “Teensy2” below refer to “Teensy#1” and “Teensy#2”.

This is a branch of my open-source TeensyDMX project, to which I'm adding custom responder capabilities. I've trimmed much of the code to keep it small.

The steps to reproduce what I'm seeing
  1. Prepare Teensy1 and load it with examples/RDMSender/RDMSender.ino. This will cause it to send a basic RDM request over Serial1 every two seconds.
  2. Prepare Teensy2 with the main project. I've packaged it using the PlatformIO structure, so either build with that or in your own way.
  3. Connect Teensy1.TX1 to Teensy2.RX1. When Teensy2 sees the packet, it will send a simple response over Teensy2.TX1. Note that Teensy2.TX1 doesn't need to be connected to anything.
  4. With a logic analyzer, observe Teensy1.TX1 and Teensy2.TX1. In my setup, I have Teensy2.TX1 set to trigger on a negative edge, and by default it also captures the 100ms before that, so the sent packet (Teensy1.TX1) can be seen too. If you don't have a logic analyzer, I've attached two images (below) that show the two signals, both with and without the delayMicroseconds(150) line. Channel 5 (bottom) is Teensy1.TX1 and Channel 4 (top) is Teensy2.TX1.
  5. You are observing the time difference between the end of the Teensy1.TX1 packet and the start of the Teensy2.TX1 packet.
  6. Comment out line 299 of Responder.cpp and re-run to see that the time difference doesn't really change. This line is the delayMicroseconds(150) call that executes just before the BREAK is sent from Teensy2.TX1. You should see a time difference that is almost the same as that produced by the code containing the delay.
  7. As a third experiment, try a larger delay, for example 1000us. That obviously has an effect, but it falls short by about 200us. It's the smaller delay values that appear to have no effect; it's almost as if delays greater than about 200us have 200us subtracted.

The images
With the delay (284us): with-delay.jpg
Without the delay (279us): without-delay.jpg

What's happening
I'm using my own ISRs to process incoming serial data (Teensy2). When the start of a packet (the BREAK) is detected via a framing error, bytes are collected and passed to the Responder object (line 278 of Receiver.cpp), in this case an instance of RDMResponder. When the state machine inside RDMResponder::processByte() detects that a complete packet was received, RDMResponder::processPacket() is called and the response data is returned to be sent over Teensy2.TX1.
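
In rough outline, the receive side works like this (a simplified sketch, not the actual code from the zip; the comments mark where the real project hooks in):

Code:
// Simplified sketch of the receive ISR described above:
void uart0_status_isr_sketch() {
  uint8_t status = UART0_S1;
  if ((status & UART_S1_FE) != 0) {
    // Framing error: the BREAK that starts a packet. Reading S1 and
    // then D clears the flag.
    (void)UART0_D;
    // Reset the byte collector and the RDMResponder state machine here.
  } else if ((status & UART_S1_RDRF) != 0) {
    uint8_t b = UART0_D;  // a normal data byte
    // Pass b to Receiver::receiveByte() -> RDMResponder::processByte().
    (void)b;
  }
}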

The relevant code is in Receiver::receiveByte(), in Receiver.cpp. Once a packet is received from the Responder (again, line 278 of Receiver.cpp), it is sent directly to TX1: first a BREAK (the single zero byte, line 303 of Receiver.cpp), then the packet data (line 312 of Receiver.cpp). delayMicroseconds(150) is called just before that. So, in theory, that delayMicroseconds() call should affect the time between the last byte received from Teensy1.TX1 and the first byte sent from Teensy2.TX1.
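
In outline, that send path is (baud2, format2, packet, and packetLen are placeholder names here, not the exact identifiers in the zip):

Code:
// Simplified outline of the send path described above:
uart_.end();
// ... the delayMicroseconds(150) in question happens here ...
uart_.begin(baud1, format1);      // slower baud/format, so that one
constexpr uint8_t zero[1] = {0};  // zero byte reads as a BREAK
txFunc_(zero, 1);
uart_.begin(baud2, format2);      // back to 250kbps 8N2 for the data
txFunc_(packet, packetLen);       // the response packet itself (line 312)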

Another experiment is to relocate the uart_.begin() call (line 287) to just below line 301. Notice that delayMicroseconds() then has a slightly different, longer effect.

From my vantage point, relocating the pre-BREAK uart_.begin() call affects how delayMicroseconds() works. This probably has something to do with this being inside an interrupt, but that's my best guess. It looks mysterious, but maybe I'm missing something obvious.

Thank you to those willing to look at this. I know how solid the Teensy subsystems are, so I'm more inclined to believe this is my fault; I just don't understand how.
 
One could also use an RDM-capable DMX device and pass the signals through an RS485 converter, but I feel this setup is simpler. :)
 
Oh, wait, no: if you read that as Teensy 2.0, I'm referring to “Teensy #1” and “Teensy #2”. Use Teensy 3.2's.
 
Thanks for the info. If you were measuring time and questioning whether interrupts are messing with delayMicroseconds(), the above code will show how long it's really taking.
It would also be interesting to know the Arduino IDE version and the TeensyDuino sources in use.
 
PlatformIO v3.6.2a2, framework-arduinoteensy v1.144.0 (spelunking in the download, I found this: teensy31.build.flags.defs=-D__MK20DX256__ -DTEENSYDUINO=144).

I'll go try your timing code and see what happens...
 
[...time passes...]
It looks like delayMicroseconds(150) takes just about 150us, according to that cycle counter.
(Curious, why did you make udiff volatile but not cycles?)
 
Now this is interesting. When I put the cycle counter around the whole function, the delayMicroseconds() call makes no difference to the cycle count. Maybe the compiler is doing something unexpected... I just tried compiling with -O0 to no effect.
 
It looks like delayMicroseconds(150) takes just about 150us, according to that cycle counter.
(Curious, why did you make udiff volatile but not cycles?)

I expected udiff might be set within an interrupt and referenced outside, in loop(), so it needs to be volatile. By the same token, cycles is only used locally within that one context, so it doesn't.
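
In sketch form (the ISR here is illustrative, not from the sketch above):

Code:
volatile uint32_t udiff = 0;  // crosses the ISR/loop() boundary; volatile
                              // keeps the compiler from caching the read in loop()

void some_isr() {
  uint32_t cycles = ARM_DWT_CYCCNT;  // only ever touched in this one context
  // ... timed work ...
  udiff = ARM_DWT_CYCCNT - cycles;
}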
 
PlatformIO v3.6.2a2, framework-arduinoteensy v1.144.0 (spelunking in the download, I found this: teensy31.build.flags.defs=-D__MK20DX256__ -DTEENSYDUINO=144).

Can you post both programs usable with Arduino?

Sorry, I really don't have time to mess with PlatformIO right now.
 
I believe the problem is ultimately due to a limitation in the UART hardware. After initializing the hardware, it seems to spend 1 character time doing some sort of internal initialization before anything you've tried to transmit actually goes onto the TX1 pin.

To see this in action, I put some digitalWriteFast lines into your code, to pulse pins 11 and 12 during the portions of the code which wait.

Code:
    // COMMENT OUT THE FOLLOWING LINE TO SEE THAT THERE'S NO EFFECT:
    digitalWriteFast(12, HIGH);
    delayMicroseconds(1);
    //delayMicroseconds(149);
    digitalWriteFast(12, LOW);

    setRXNotTX(false);
    constexpr uint8_t zero[1] = {0};
    txFunc_(zero, 1);
    digitalWriteFast(12, HIGH);
    delayMicroseconds(1);
    digitalWriteFast(12, LOW);

and in the uart0_tx() function:

Code:
  // Wait for transmission complete
  digitalWriteFast(11, HIGH);
  while ((UART0_S1 & UART_S1_TC) == 0);
  digitalWriteFast(11, LOW);

With the delay set to only 1 microsecond, here's what I see. Pin 12 is the red trace and pin 11 is the blue trace.

file1.png

If I uncomment the extra 149 us delay, it's easy to see the delayMicroseconds() function is having an effect.

file2.png

The red trace definitely is growing from a tiny 1 us pulse to a wide 150 us pulse.

I'm pretty sure the reason you're not seeing the UART waveforms change much is some sort of initialization or timeout the UART does at startup. I can't find any mention of this in Freescale's documentation. Maybe there's something we're doing in Serial1.begin() that isn't optimal? Or maybe this is simply the way this UART works? I just don't know why it does this.

But at least these waveforms can help with understanding what's determining this unexpected timing. The delayMicroseconds() function definitely is working.

Here's a quick photo of how I tested.

DSC_0261_web.jpg

I'll also attach the modified code, even though the edits are pretty trivial.
 

Attachments

  • forum_54269.zip
Thanks, Paul, for looking at this. This would explain why moving the uart_.begin() call to after the delay extends the total duration. It's almost as if UART initialization runs concurrently with other code?

I did some more investigating and narrowed the effect down to setting the UART_BDL register after setting the UART_BDH register. It looks like it takes roughly 200us for the baud rate to "lock in" (because of that single-character delay you mention). I'm currently looking for some state I can examine to determine when the UART_BDL register actually finishes taking effect. I'm hoping that one or some of the "transmit- or receive-ready" register bits indicate this fact.

Or maybe it takes time to enable the transmitter? I noticed that disabling UART_C2_TE takes approximately 41us. Maybe something similar happens when enabling it? Looking more...

[Update:] Or maybe the effect isn't seen until an actual transmit via UART_D and UART_S1? The effect certainly seems to be delayed until UART_S1_TC is polled. Hmmm... I guess I just have to factor in one extra character of delay time.
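
A sketch of what factoring it in might look like, assuming an 11-bit frame (1 start + 8 data + 2 stop, as in 8N2) and that the character time can simply be deducted from the software wait; baud1 and eopTime are as in the first post:

Code:
// One character time at the BREAK baud:
uint32_t charTimeUs = (11u * 1000000u) / baud1;

// Deduct it from the pre-BREAK wait, since the UART itself appears to
// eat about one character time after begin():
uint32_t wait = 176;
if (wait > charTimeUs) {
  wait -= charTimeUs;
}
uint32_t dt = micros() - eopTime;
if (dt < wait) {
  delayMicroseconds(wait - dt);
}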
 
It's almost as if UART initialization runs concurrently with other code?

Yeah, it does appear as if the UART is doing some sort of internal initialization that takes approximately 1 byte time. Or maybe it's related to the idle detection time? I really don't know. To be honest, this is an aspect of the UART I haven't explored much. Usually Serial1.begin() is called just once in setup(), so most applications never see this sort of issue while running.

Since work is under way on Teensy 4, which will have completely different UART hardware, maybe this is a good time to talk about what features are missing from the normal HardwareSerial API that require this sort of direct register access? We're going to keep making the 3.x hardware for quite some time. But the new hardware will offer so much more capability at approximately the same price... so any code tightly tied to the UART hardware isn't going to be portable to vastly improved hardware.
 
I wonder, from my (I admit) restricted point of view, whether there is still much need for UART communication outside of slow legacy applications. If I understand correctly, CAN bus can handle multi-node communication much more efficiently. And as long as communication remains local, there are higher-speed serial communication engines like SPI.

When I hear UART, my first association is the 75-baud mechanical teletype machines I had to repair during my military service in 1983...
 
Maybe in some distant future stage lighting gear will all migrate to CAN bus. But the reality of today's market is DMX, which is 8N2 serial at 250 kbps.
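
For reference, that framing on a Teensy hardware serial port is just:

Code:
Serial1.begin(250000, SERIAL_8N2);  // 250 kbps, 8 data bits, no parity, 2 stop bits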
 
Maybe in some distant future stage lighting gear will all migrate to CAN bus. But the reality of today's market is DMX, which is 8N2 serial at 250 kbps.
Aaaah, thank you for broadening my horizons! I didn't think of DMX. Now that I think of it, I would have expected that, similar to MIDI for everything audio, DMX could have moved to USB.
 