Teensy 4.1: release thread from interrupt

I don't understand how this is useful for interrupts. Maybe you could explain in deeper detail?

Good morning Paul,

You are right. However, the problem is the same, but the synchronization technique needs to be a little different.

Perhaps it is not needed, but to recap the problem, it is the following and in specfically when this is called from loop, as is typical.


My original idea to solve this is probably the right one (apologies for the digression):

Where as it is now looping over __disable_irq() .... __enable_irq(); instead, check a flag and only when the flag is set indicating that there is something to retrieve, and only after retrieving the data (not peek), only then call __disable_irq() and only to change the flag. The ISR of course sets or increments the flag, depending on what semantic you want it to have.

And better still, and if feasible, don't use __disable_irq(), but instead only disable that one interrupt. I am not familiar enough with the internals (of the interrupt architecture on the arm) to know if that would be sufficiently robust.


Thank you Paul for picking up the discussion.

(Apology for the edit, it seemed to be that the typos made it unreadable.)
 
Last edited:
Full quote, for context.
As a general point, I think perhaps @DrM has unreasonably high expectations of the Teensyduino ecosystem. It is conceived as a hobbyist platform, owing its roots to Arduino. It is also a mainly open-source system, which results in very variable code quality.
Response to the first sentence only was
I am still flabbergasted by that remark, i.e. that because Arduino is amateur, therefore Teensy should be a disfunctional hack and amateur, too.
I'm not saying anything about what Teensy or Teensyduino "should be". I am saying that its roots are in Arduino, and I don't believe anyone would argue that the intent of Arduino was to bring embedded coding to a mass audience, i.e. hobbyists. I did not mention the word amateur.

You yourself have correctly observed that core elements of the code could be done better (which is the case for pretty much any piece of code), some of which have a detrimental impact on the applications you want to be able to write. Many, of course, are not so impacted.

I am 100% suggesting one possible reason for the variable code quality, in that it's been contributed by many people of differing levels of ability, and by the very nature of the "bazaar" model, has not necessarily been subjected to rigorous quality control. Some people write code and test harnesses for Continuous Integration purposes; some write decent quality code with few (but not zero) bugs, which nevertheless makes it into released libraries; and some are happy to submit a Pull Request for code which clearly hasn't received much testing at all.
Thank G-d for our Cathredal and the team here that attends and contributes.
Amen to the latter, but I am TBH getting a little disgruntled with the former.
A) That some of the rest of the arduino world might be amateur, is not an excuse for disabling what would otherwise be a fantastic platform.
I have zero idea what you're driving at here. Who said anything about "disabling" Teensy?
B) Loading the platform with a bunch of ill advised ad thoroughly amateur hacks, does not make it easier for the next amateur to pile their own unnecessary ill advised hack on top and expect it to anything more than intermittently blink an led.

I think we are better than that.
Darn right we are, but we could also be better than we currently are.

I believe the existing semi-cathedral approach does prevent the majority of the "bunch of ill advised ad thoroughly amateur hacks" from getting into the codebase, but equally it makes it stunningly frustrating to get improvements added, or even considered. I'm aware of at least one contributor who left the Teensy community in disgust for that very reason, as far as I could tell (though I'm equally aware he's back on the forum under another ID, and is also pretty bad at considering Pull Requests...).

As it stands, the only certain path to fixing your issues within Teensyduino is to fix them yourself, test them thoroughly, and put in a Pull Request. If you don't feel "familiar enough with the internals", then you need to provide the aforementioned sketch which demonstrates the issue, and hope someone else is motivated enough to do the fix, testing and PR.

I truly have no idea how the process of making Teensyduino better can itself be made better.
 
As it stands, the only certain path to fixing your issues within Teensyduino is to fix them yourself, test them thoroughly
Paul has been pretty amazing about this.

This one turns out to be pretty simple, I might take a look at it, or maybe Paul will get to it first. I have a very busy few days ahead of me.

The other thing, it is the internals of the interrupt architecture on the ARM, as regards to disabling a specific interrupt from loop() rather than its ISR, that I was referring to in the immediate preceding post to Paul. I would hope there is a way to do it atomically other than disabling global interrupts, but nothing surprises me.
 
By and large, I believe it is possible to disable specific interrupts, at least those for peripherals which use the NVIC (Nested Vectored Interrupt Controller); things like SysTick and PendSV are treated differently.

The Teensyduino imxrt.h file provides access in a fairly simple way, so you have the macro NVIC_DISABLE_IRQ(n) to disable IRQ n; you have to find the IRQ number from the Reference Manual, table 4-2. The act of disabling the IRQ is a single register write, so it's atomic, but of course there's a gotcha - we are not provided with an atomic way to read and disable an IRQ, so we can put it back the way it was.

Assuming @jmarsh is correct in post #96 (and I've always found him to be well-informed!) it would presumably be possible to implement a better macro using LDREX/STREX so we could write something like
C++:
bool wasEnabled = NVIC_READ_AND_DISABLE_IRQ(n);
// do some stuff here
if (wasEnabled)
    NVIC_ENABLE_IRQ(n);
 
bool wasEnabled = NVIC_READ_AND_DISABLE_IRQ;


Yes, atomic read and disable for an IRQ. Perfect.

However, that too is not something that should be looped over. The solution for simple producer-consumer scenarios is still a flag, and only synchronize the flag.
 
Hmm ... I'm not sure we're talking about one single thing here, it may be two separate issues.

Looking at the code you linked in post #101, I absolutely agree it's far from ideal for the application to loop fast while calling the HardwareSerialIMXRT::available() method. I can see a number of reasons for that assertion:
  • the time taken for the global disable / enable cycle, which increases latency beyond your acceptable limits
  • even at 1Mbps, characters only arrive every 10µs, during which 6,000 CPU cycles could be doing useful work
  • calling this in a fast loop will increase the probability that the issue is seen, but it will never be zero
    • of course, the fast loop might help show a flaw that would otherwise be hard to trace...
  • using global interrupts is Just Wrong here and most other places; the LPUART interrupts are the only ones that should be masked
    • with the caveat that no-one should be calling any of the methods from interrupt code, a point often overlooked by many
For that specific piece of code, there would be a couple of options:
  • maintain a separate available_characters value, and just return it when requested. I'm not a fan of that approach, because it duplicates a value you can calculate, which can give rise to a mismatch between the source values and the pre-calculated one
  • just disable the LPUART interrupts - that won't interfere with your ISR, or indeed anyone else's
For all cases where interrupts are disabled, they should be as specific as possible (i.e. only for the peripheral / source that might result in erroneous results if it fired during the calculation), and they should never be enabled, only restored to the prior state.

Switching to the (I think) separate issue of the NVIC atomic read and disable, I believe it's inherent in the LDREX / STREX architecture that a loop is required: if the STREX fails, you have to loop back, re-claim the memory area and try again. As noted in post #96, there's no built-in limit to the number of tries (though it could be coded in), so the time required could be arbitrarily long. However, I can't immediately see a need to disable an interrupt from within an ISR, so it only impacts foreground code, with a low probability, and thus maybe doesn't matter (counter-examples welcomed at this point).

I may have a crack at writing and testing NVIC_READ_AND_DISABLE_IRQ(), just to satisfy my curiosity if nothing else.
 
I am happy it is not in that driver, and embarassed or course, but also a little dissapointed tha the culprit is still not found
 
I am happy it is not in that driver, and embarassed or course, but also a little dissapointed tha the culprit is still not found

Culprit to what? We are so far off in the weeds, can you please tell us again what exactly is your question regarding USB Serial?
 
I previously ran timing measurements for basic operations, set a pin, read and set, interrupt latency, etc. I tried it using a scope and then using the cycle counter. The two methods agree.

It looks like in some instances the Teensy might be sporadically loosing about 50 instruction cycles. That is a lot for real time.

Aside, the UNO R4 looses about 200 cycles sporadically.
 
50 instructions is practically nothing at 600MHz, it's ~83 nanoseconds. It's likely less than the overhead of exception entry, and there's a systick interrupt that is constantly firing every 1ms.
This still isn't a practical problem unless you have a real-world scenario where it affects something.
 
a real-world scenario where it affects something.

Sure, easy,

The timing for the gates for a CCD, the timing for an ADC, similarly a time to digital converter for photon tagging, and simply counting pulses from a PMT.

The problem for the first two is partially resolved with the flexpw, it still needs interrupts in both cases and in the first it is limited because it is only a 16 bit counter.

What is the systick interrupt being used for?
 
Why am I doing with this Teensy, you might ask.

The project is to make open designs for high end instrumentation available and affordable. The collaborators in the designs are late career instrumentation physicists. We have had downloads by underfunded scientists on five continents. Some use our designs because they can program their experiments in the Arduino environment.

And of course the Teensy is simply the best of all of the arduinos for this purpose.
 
The timing for the gates for a CCD, the timing for an ADC, similarly a time to digital converter for photon tagging, and simply counting pulses from a PMT.
These are all timing/trigger problems, that can be solved with hardware timers. It's not clear what interrupt latency has to do with any of them.
Timers aren't limited to 16 bits; quad timers are 16-bits each with four timers in each module that can be cascaded, giving a maximum total of 64-bits.
 
I previously ran timing measurements for basic operations, set a pin, read and set, interrupt latency, etc. I tried it using a scope and then using the cycle counter. The two methods agree.

It looks like in some instances the Teensy might be sporadically loosing about 50 instruction cycles. That is a lot for real time.

Maybe the best path forward would be whatever extra work is needed to publish these timing measurement tests and clearly document how we should run them (eg, scope connected, other external hardware, etc).
 
These are all timing/trigger problems, that can be solved with hardware timers. It's not clear what interrupt latency has to do with any of them.
Timers aren't limited to 16 bits; quad timers are 16-bits each with four timers in each module that can be cascaded, giving a maximum total of 64-bits.

Simple example, let's say we run the convert signal for an ADC from a FlexPWM submodule. How do you start the SPI 30 nsec after the trailing edge of each CNVST pulse?

You can use an interrupt and setup the relative timing with other channel of the submodule that is not connected to a pin.
 
You can use an interrupt and setup the relative timing with other channel of the submodule that is not connected to a pin.
You don't need the interrupt / cpu intervention. That timer can be routed through the crossbar to trigger SPI directly.
Then SPI can be read by DMA to fill a buffer, only requiring CPU intervention when the buffer is full (and automatically switching to a new buffer to give the CPU plenty of time to process the old one).
 
@jmarsh, based on your post #96 I tried to make an atomic NVIC_READ_DISABLE_IRQ(n) function. However, it's not behaving when used with an actual NVIC_ICERx register - the STREXW claims the store didn't occur, but in fact it did (which AFAIK is not a valid result, according to ARM).

Code appended here - can you see any obvious error I've made? (If you define FAKE_IT then it tweaks the code to use RAM, which does appear to work as expected):
C++:
/*
 * Test of possible NVIC_READ_DISABLE_IRQ() implementation
 */
#include <arm_math.h>

#define noFAKE_IT

#if defined(FAKE_IT)
#undef NVIC_ICER0
#undef NVIC_ENABLE_IRQ
#undef NVIC_DISABLE_IRQ
#undef NVIC_IS_ENABLED

uint32_t dummy[20];
uint32_t& NVIC_ICER0 = dummy[0];

#define NVIC_ENABLE_IRQ(n)      (*(&NVIC_ICER0 + ((n) >> 5))  =  (1 << ((n) & 31)))
#define NVIC_DISABLE_IRQ(n)     (*(&NVIC_ICER0 + ((n) >> 5)) &= ~(1 << ((n) & 31)))
#define NVIC_IS_ENABLED(n)      (*(&NVIC_ICER0 + ((n) >> 5)) & (1 << ((n) & 31)))
#endif

/*
 * Attempt at atomic
 * "read of existing IRQ enable status and disable it".
 */
inline int NVIC_READ_DISABLE_IRQ(int n)
{
  volatile uint32_t* icer = (&NVIC_ICER0) + ((n) >> 5);
  uint32_t mask = 1 << ((n) & 31);
  int retval;
  int m=0;

  Serial.printf("Addr: %08X; mask: %08X; value: %08X; tries: ", (uint32_t) icer, mask, *icer);
  do
  {
    retval = (__LDREXW(icer) & mask) ? 2 : 0;
    asm("dmb");
#if defined(FAKE_IT)
    retval += __STREXW(0,icer); // have to fake it if it's a memory location
#else 
    // When using the NVIC registers, this returns a "locked out" result,
    // i.e. says the store did NOT take place, BUT in fact the IRQ has
    // been disabled
    retval += __STREXW(mask,icer);
#endif 
  } while (NVIC_IS_ENABLED(n) && m++ < 10);
  Serial.printf("%d; result=%d\n", m, retval);

  return retval;
}

#define IRQ_TEST IRQ_Reserved1

void printState(int irq, int wasEnabled = -1)
{
  if (wasEnabled >= 0 && (wasEnabled & 1))
    Serial.println("STREXW said store did not complete");
  Serial.printf("IRQ %d is %sabled\n", irq, NVIC_IS_ENABLED(irq)?"en":"dis");
}

void setup()
{
  while (!Serial)
    ;

  Serial.println("Start from disabled state:");
  printState(IRQ_TEST);
  int wasEnabled = NVIC_READ_DISABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST, wasEnabled);
  if (wasEnabled & 2)
    NVIC_ENABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST);

  Serial.println("\nStart from enabled state:");
  NVIC_ENABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST);

  wasEnabled = NVIC_READ_DISABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST, wasEnabled);
  if (wasEnabled & 2)
    NVIC_ENABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST);

  Serial.println("\nDisable:");
  NVIC_DISABLE_IRQ(IRQ_TEST);
  printState(IRQ_TEST);
}

void loop() {
  // put your main code here, to run repeatedly:

}

I realise this may well be diverging from OP's original issue...
 
I think it's safe to say the System Control Space range invokes undefined behaviour.
You don't need to do this though, since there are NVIC registers that let you enable/disable specific interrupts with direct writes rather than read-modify-write sequences (that could be interrupted).
 
But ... if you need to restore the NVIC IRQ enable flag to its previous state, you need to read it and then modify it ... and an interrupt could occur between those two accesses resulting in the read-back value being stale.

Most code fails to restore the previous state, just enables an interrupt that it has previously disabled, without regard to its state prior to the disable.
 
Back
Top