Teensy 4.1: release thread from interrupt

would any of these help?
I think, not really.

But I found something interesting in the processor reference manual in "49.6.1.7 LPUART Status Register (STAT)", the field labeled RXEDGIF. Am I reading that correctly? Is that an interrupt on the start of receiving a character?
 
But I found something interesting in the processor reference manual in "49.6.1.7 LPUART Status Register (STAT)", the field labeled RXEDGIF. Am I reading that correctly? Is that an interrupt on the start of receiving a character?
That allows for an interrupt on an active edge on the RX pin. If you want an interrupt on receipt of a byte, you would use RIE.
 
A few places across the CORES code disable interrupts. Interrupts are not LOST, only held pending until interrupts are re-enabled, so the 'delay' they incur is limited to that 'brief' time.
In the code tree there is a file cores/teensy4/util/atomic.h. I think this is not used in the T4 code, but it is an interesting bit of history.

It seems to be a carry-over from the AVR, which on quick perusal seems to lack exclusive store and load instructions. It is a kluge using instructions for global interrupts and priority. It seems some were quite happy that it was ported to the M3. But the M3 does have STREX/LDREX.
 
That allows for an interrupt on an active edge on the RX pin. If you want an interrupt on receipt of a byte, you would use RIE.
Aha! Yes indeed that is it.

A quick grep indicates that it is not referenced in the code. Does the driver use DMA instead?

It is a little convoluted to find it and I think the text doesn't mention it.

CTRL[RIE] (page 2928) enables interrupts from STAT[RDRF] (register full, page 2924, last line) and STAT[RDRF] is set either on a byte or when the FIFO crosses WATER[RXWATER].

So, that's simple enough. It would be a wonderful driver. Why isn't it used?
 
That time delay is the problem. It also accounts for the jitter in interrupt latency that we discussed in another thread. And that is a severe limitation. You simply can't do hard real-time like that. We really ought to hunt all of these down, or all of those that might show up routinely as in serial, and fix them, and add conspicuous warnings to the documentation of all the others.
Is it not the case that the following will always need to be considered when assessing interrupt jitter:

1) fact: a higher priority interrupt (lower interrupt number on the Teensy) will always block a lower priority interrupt, and therefore, delay the execution/completion of the lower priority interrupt
2) fact: a higher priority interrupt will always suspend a lower priority interrupt, and therefore, delay the execution/completion of the lower priority interrupt

Because basic interrupt processing hierarchy exists, and because interrupt processing operates while respecting/enforcing those priority relationships, won't interrupt processing always thus have some jitter ?? And, this jitter will be potentially quite unpredictable, since you can't necessarily predict/control when your interrupt processing will be interrupted, by what, & how many times !! Changing the use of a global interrupt disable may reduce that particular source of jitter, but will certainly not eliminate jitter completely. The only way to guarantee no jitter is to be top dog on the interrupt hierarchy (NMI, in the old days . . . I'm showing my age).

Mark J Culross
KD5RXT
 
So.... in HardwareSerial.cpp, it seems the RIE is set and interrupts are connected, and perhaps "irq_handler()" is the code at


So, all we need to do is use the STREX/LDREX to set a flag there, and in available(), instead of disabling interrupts and checking the buffer, we check the flag. We only use STREX/LDREX to clear it, i.e. when we read bytes from the buffer.
 
Is it not the case that the following
When we design a real time system, what interrupts may occur and when is part of the design.

Having something that gratuitously and unnecessarily loops over disabling interrupts, will in many cases make it unnecessarily impossible to implement a given real time system on a given platform. That is the case here.
 
CTRL[RIE] (page 2928) enables interrupts from STAT[RDRF] (register full, page 2924, last line) and STAT[RDRF] is set either on a byte or when the FIFO crosses WATER[RXWATER].

Why isn't it used?

Looks to me like it is used, via macro CTRL_ENABLE

Code:
#define CTRL_ENABLE       (LPUART_CTRL_TE | LPUART_CTRL_RE | LPUART_CTRL_RIE | LPUART_CTRL_ILIE)
#define CTRL_TX_ACTIVE              (CTRL_ENABLE | LPUART_CTRL_TIE)
#define CTRL_TX_COMPLETING    (CTRL_ENABLE | LPUART_CTRL_TCIE)
#define CTRL_TX_INACTIVE           CTRL_ENABLE
 
Not sure why you keep saying it loops over it when that only happens during yield(), which your program shouldn't be entering unless it's idle. And the duration of time that interrupts are disabled is practically nothing; looking at the C code it's probably less than a dozen opcodes, which is nothing compared to the overhead of exception entry.
 
There are about 6-7 uses of __disable_irq - IIRC only one is not on UART Buffer empty - that is Tx of a byte? :: if (head == tail) {
They are short snips of code that should take a dozen or two CPU cycles.
IIRC the UART FIFO's are 4 bytes and watermark is 2 bytes?
STREX/LDREX is great for read protection. A single value write could work in some cases, but the STREX/LDREX will trigger a repeating while(), so some manipulations could be problematic, like {ii++}, and many of the UART protections can manipulate both HEAD and TAIL values as well as sometimes storing a value at the new indicated location.

This code runs in only 36-39 cycles after Paul added the _ASM to save a couple of cycles, and that is more than it takes to run with IRQs disabled:
Code:
uint32_t micros(void)
{
    uint32_t smc, scc;
    do {
        __LDREXW(&systick_safe_read);
        smc = systick_millis_count;
        scc = systick_cycle_count;
    } while ( __STREXW(1, &systick_safe_read));
    uint32_t cyccnt = ARM_DWT_CYCCNT;
    asm volatile("" : : : "memory");
    uint32_t ccdelta = cyccnt - scc;
    uint32_t frac = ((uint64_t)ccdelta * scale_cpu_cycles_to_microseconds) >> 32;
    if (frac > 1000) frac = 1000;
    uint32_t usec = 1000*smc + frac;
    return usec;
}
 
I am also very confused because here this started out as a discussion of using USB serial and is now discussing the details of the hardware UART serial driver... not to mention the original issue even earlier was how to wait for a flexPWM interrupt.

What's the practical issue here that can be physically demonstrated? Seems like a bunch of conjecture and hand-waving.
 
Looks to me like it is used, via macro CTRL_ENABLE

Yes, we crossed messages. I found it and the handler. It looks like the patch is pretty doable, just takes a little care and attention to detail.

It is tricky though and quite easy to mess it up. We will have to think about it carefully.

I notice we have another experienced realtime person in the thread. So, maybe let's confer or collaborate?

I think the initial points of interest are the irq handler, i.e. the producer, available() and all of the consumers meaning every variation of read.

First thoughts are that the irq handler increments the flag, available() only reads the flag so it does not need any synchronization since we are not threading reads, and the reads can similarly check the flag, and then only need synchronization when they fetch the last character they are going to fetch and decrement the flag by that amount.

Something like that. Anybody have a better scheme?
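
A minimal sketch of the scheme described above, written so it also compiles on a desktop. It assumes std::atomic in place of hand-written LDREX/STREX (on ARMv7-M, compilers typically build atomic read-modify-write operations from those instructions), and the names rx_pending, on_rx_irq, available and consume are hypothetical, not the HardwareSerial API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical count of bytes produced by the ISR and not yet consumed.
static std::atomic<uint32_t> rx_pending{0};

// Producer: called from the (simulated) receive ISR after n bytes are stored.
void on_rx_irq(uint32_t n) {
    rx_pending.fetch_add(n, std::memory_order_release);
}

// available(): a plain atomic load, no interrupt masking.
uint32_t available() {
    return rx_pending.load(std::memory_order_acquire);
}

// Consumer: claim up to `want` bytes and return how many were claimed.
// Only the consumer decrements, so the count can only grow between the
// load and the subtraction, and `take` never exceeds the current count.
uint32_t consume(uint32_t want) {
    uint32_t have = rx_pending.load(std::memory_order_acquire);
    uint32_t take = have < want ? have : want;
    if (take != 0) {
        uint32_t before = rx_pending.fetch_sub(take, std::memory_order_acq_rel);
        assert(before >= take);  // invariant of the single-consumer scheme
        (void)before;
    }
    return take;
}
```

The only read-modify-write operations are the increment in the producer and the decrement in the consumer; available() never needs a retry loop.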
 
many of the UART protections can manipulate both HEAD and TAIL values as well as sometimes storing a value at the new indicated location.

The trick is to design it so as to minimize the synchronizations and what is synchronized. But many of the tricks are not new.

Generally, when something is incremented by only one thread, it might not need to be synchronized.
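
One way to read that remark is the classic single-producer/single-consumer ring buffer, sketched below under assumptions: the type SpscRing and its members are hypothetical and are not the layout used in HardwareSerial.cpp. The head index is written only by the producer (the ISR) and the tail index only by the consumer (loop()), so neither side needs to mask interrupts or run a retry loop.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring buffer.
// head is written only by the producer, tail only by the consumer.
template <size_t N>
struct SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    uint8_t data[N];
    std::atomic<uint32_t> head{0};  // total bytes ever pushed
    std::atomic<uint32_t> tail{0};  // total bytes ever popped

    // Producer side: returns false when the buffer is full (byte dropped).
    bool push(uint8_t b) {
        uint32_t h = head.load(std::memory_order_relaxed);
        uint32_t t = tail.load(std::memory_order_acquire);
        if (h - t == N) return false;
        data[h & (N - 1)] = b;
        head.store(h + 1, std::memory_order_release);  // publish the byte
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool pop(uint8_t &out) {
        uint32_t t = tail.load(std::memory_order_relaxed);
        uint32_t h = head.load(std::memory_order_acquire);
        if (h == t) return false;
        out = data[t & (N - 1)];
        tail.store(t + 1, std::memory_order_release);  // free the slot
        return true;
    }

    // Number of bytes waiting, computed without any locking.
    uint32_t available() const {
        return head.load(std::memory_order_acquire) -
               tail.load(std::memory_order_relaxed);
    }
};
```

Because the indices are free-running counters and N is a power of two, the subtraction stays correct when the 32-bit counters wrap.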
 
Okay, I am signing off for the night. This was super productive I think. We now have a clue.
 
I am also very confused because here this started out as a discussion of using USB serial and is now discussing the details of the hardware UART serial driver... not to mention the original issue even earlier was how to wait for a flexPWM interrupt.

What's the practical issue here that can be physically demonstrated? Seems like a bunch of conjecture and hand-waving.

This got tacked onto another thread, I forgot how. Here is a summary of what is going on, or at least as it seems to me:

The problem is that interrupt latencies are unreliable on the platform.

One very likely culprit is repeatedly globally disabling and enabling interrupts to synchronize access to the read buffer in Serial.available(), Serial.read(), etc. These are called from loop.

The file cores/teensy4/atomic.h documents that disabling interrupts for this purpose was a kluge that started with the AVR. The AVR lacks instructions for exclusive memory access. There is no other way to support atomic operations on memory in the AVR.

The ARM processors do have these instructions, LDREX and STREX. Most modern processors do, and scenarios such as synchronizing access to a buffer are exactly what those kinds of instructions were invented for.

Perhaps it remains to convince you that abusing global interrupts in this way is a problem and not a matter of speculation or handwaving. But from our previous interactions, which I enjoy, I am guessing you know this.
 
But Serial is the object for the USB serial port, and you've been discussing the mechanics of the UART serial port... I find it hard to believe there's any practical limitation here other than one being imagined.
 
This is a very interesting discussion, I think it's possibly time to add my 2p (2¢ or equivalent in your local currency...)
1) fact: a higher priority interrupt (lower interrupt number on the Teensy) will always block a lower priority interrupt, and therefore, delay the execution/completion of the lower priority interrupt
2) fact: a higher priority interrupt will always suspend a lower priority interrupt, and therefore, delay the execution/completion of the lower priority interrupt
I think there's another point missing here, which strikes to the heart of the issue:
3) any code, whether foreground or in an ISR, has the ability to globally mask interrupts using __disable_irq(). An immediate consequence of that is that a lower-priority ISR can block a higher-priority one
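
Point 3 can be put in numbers with a toy model. This is only a sketch under assumptions: MaskWindow and entry_delay are hypothetical names, and time is counted in abstract cycles. What it shows is that the delay depends only on when the interrupt arrives relative to the masked window, and that priority never enters the calculation.

```cpp
#include <cstdint>

// Hypothetical model: interrupts are globally masked during [start, end).
struct MaskWindow {
    uint32_t start;
    uint32_t end;
};

// Cycles an interrupt arriving at `arrival` waits before its handler can
// start. Priority is deliberately not a parameter: global masking ignores it.
uint32_t entry_delay(MaskWindow w, uint32_t arrival) {
    if (arrival >= w.start && arrival < w.end) {
        return w.end - arrival;  // held pending until interrupts are re-enabled
    }
    return 0;  // arrived outside the masked window
}
```

In this model the worst case is an interrupt that arrives right at the start of the window, which waits for the full masked duration regardless of how high its priority is.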

As a general point, I think perhaps @DrM has unreasonably high expectations of the Teensyduino ecosystem. It is conceived as a hobbyist platform, owing its roots to Arduino. It is also a mainly open-source system, which results in very variable code quality.

A minor further issue is that although ostensibly a "bazaar" model (anyone can contribute), in practice there's a strong element of the "cathedral" model - everything that goes into a Teensyduino release is filtered via a single person, who even if he's doing nothing else cannot be expected to scrutinise every line submitted. There is thus a very cautious approach to significant changes, exacerbated by zero delegation - there is no mechanism for a Trusted Committee for Removal Of Interrupt Masking, or indeed any other desirable sweeping change.

On a specific note, @DrM, can you generate the canonical small sketch, compilable using the Arduino IDE, and runnable with minimal hardware, which demonstrates the issue you actually have? I'm guessing something like a hardware timer output to produce pulses at a known interval, a GPIO interrupt to capture them, and a separate (UART? USB serial?) process which demonstrably interferes with the capture due to using __disable_irq(). It would be a bonus if the interrupt-disabling process could be shown to stop working [reliably] if it leaves interrupts enabled.

I count 57 instances of __disable_irq() in the Teensy 4 cores, so fixing them all would definitely be a slog, but not an impossible goal.
 
The load and store exclusive instructions only work for normal memory. They don't work for addresses which are hardware registers or memory mapped peripherals, or in ARM-speak strongly ordered memory.

Maybe they can be put to good use in some places we currently disable interrupts, and maybe some of those interrupt disables are overly conservative and cautious in the first place. But to think these special instructions can replace all interrupt disable within code that manipulates hardware is probably overly optimistic. They only work with ordinary memory, not hardware registers.
 
The load and store exclusive instructions only work for normal memory. They don't work for addresses which are hardware registers or memory mapped peripherals, or in ARM-speak strongly ordered memory.

Hi Paul,

The exclusive load and store are used to protect a single fixed address in normal memory, which is then used to protect whatever you want to protect.

Think of it as functioning like a semaphore or mutex. In fact that is what it is. Except that instead of threads, we are only dealing with the isr and loop, or other isrs.
 
Actually they do work for any memory. The address used in the instructions is irrelevant because the cortex-m7 has a reservation size of 0xFFFFFFFF, i.e. the entire memory space. This simplifies the operations to this:
- the ldrex* instructions set the exclusive access flag
- clrex, any exception entry or the strex* instructions all clear the exclusive access flag
- strex* only succeeds if the exclusive access flag is set when it begins execution. The address used does not need to match the preceding ldrex* instruction.

They're really not a suitable replacement for semaphores or mutexes, and especially not appropriate to use in an ISR because you have to use them in a loop and it may take an indeterminate amount of time for the strex instruction to succeed - that's exactly the sort of problem that is trying to be avoided. They are designed more for atomically updating lists, or any other sort of read-modify-write operation.
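
The three rules above can be sketched as a small software model, which also shows why the retry loop can take an indeterminate number of passes. This is an illustration under assumptions, not the hardware: LocalMonitor and atomic_increment are hypothetical names, and the `interrupts` parameter simply simulates exceptions landing between the load and the store.

```cpp
#include <cstdint>

// Hypothetical software model of the simplified monitor described above.
struct LocalMonitor {
    bool exclusive = false;

    // ldrex*: load the value and set the exclusive access flag.
    uint32_t ldrex(const uint32_t *addr) {
        exclusive = true;
        return *addr;
    }

    // strex*: store only if the flag is set, and always clear the flag.
    // Returns 0 on success and 1 on failure, like the real instruction.
    uint32_t strex(uint32_t value, uint32_t *addr) {
        bool ok = exclusive;
        exclusive = false;
        if (ok) *addr = value;
        return ok ? 0u : 1u;
    }

    void clrex() { exclusive = false; }
    void exception_entry() { exclusive = false; }
};

// Atomic increment as a retry loop. `interrupts` is how many times a
// simulated exception lands between the load and the store; `attempts`
// reports how many passes the loop needed.
uint32_t atomic_increment(LocalMonitor &m, uint32_t *counter,
                          int interrupts, int *attempts) {
    uint32_t v;
    *attempts = 0;
    do {
        v = m.ldrex(counter) + 1;
        if (interrupts > 0) {
            m.exception_entry();  // simulated interrupt clears the flag
            --interrupts;
        }
        ++*attempts;
    } while (m.strex(v, counter) != 0);
    return v;
}
```

With no simulated interrupts the loop finishes in one pass; every interrupt that lands inside the window costs one more pass, which is the unbounded part.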
 
As a general point, I think perhaps @DrM has unreasonably high expectations of the Teensyduino ecosystem.

I am still flabbergasted by that remark, i.e. that because Arduino is amateur, therefore Teensy should be a dysfunctional hack and amateur, too. Thank G-d for our Cathedral and the team here that attends and contributes.

A) That some of the rest of the arduino world might be amateur, is not an excuse for disabling what would otherwise be a fantastic platform.

B) Loading the platform with a bunch of ill-advised and thoroughly amateur hacks does not make it easier for the next amateur to pile their own unnecessary ill-advised hack on top and expect it to do anything more than intermittently blink an LED.

I think we are better than that.
 
The address used does not need to match the preceding ldrex* instruction.
Super cool and a really good point, though what it might be used for in our case does not immediately come to mind.

In our case, an ISR servicing the incoming serial data and loop() checking or retrieving it, we probably want to stay with the usual construct, increment the contents of one memory location to indicate the data has been produced (added to the buffer), and decrement when it is consumed.

We might need to keep it as short as possible to avoid having loop() cause the ISR to miss something. Hence, skilful use: after pushing the character onto the buffer, do the strex/ldrex thing to increment the flag.

Don't do the strex/ldrex thing to access the hardware or get the character from the FIFO and push it onto the buffer. Do it afterwards with a single flag location, to signal you did it.


Correction: If we are only using the fifo, and not a second buffer, then things are simpler and more robust. (I need to look at the code more closely.) In that case, the ISR only needs to ascertain how many characters were added and then do the LDREX/STREX to set the flag. Characters are never lost except on overflow. Perfect.
 
P.S. It occurs to me that if done properly, a lot of the cautions about what you can do in another ISR go away. It really is worthwhile to fix this.
 
The exclusive load and store are used to protect a single fixed address in normal memory, which is then used to protect whatever you want to protect.

Think of it as functioning like a semaphore or mutex. In fact that is what it is. Except that instead of threads, we are only dealing with the isr and loop, or other isrs.

I don't understand how this is useful for interrupts. Maybe you could explain in deeper detail?

To give some detail myself, I'm struggling to imagine how an always-enabled interrupt would handle the case where the semaphore/mutex is locked by the main program. For example, let's imagine simple hardware serial (not USB just to keep this simple) where the bytes are arriving at a fast baud rate. What do we do with those freshly arrived bytes? If we leave them in the hardware without clearing status flags, the interrupt will just keep repeatedly triggering without letting the main program run to finish up its work and release the semaphore. If we clear the interrupt status, the main program gets to run when we return, but we won't get another interrupt until even more data arrives. But if we didn't read the data before clearing the interrupt status, it'll very likely be overwritten as more data arrives. Maybe I'm just missing something or lacking imagination because I've always crafted interrupt code the normal way, but I just can't see how practical interrupt code would be written to handle this scenario where it gets to always run as soon as possible but lacks access to the memory needed to actually store the incoming data.
 