Teensy 4.1 Freeze (~1786ms)

Status
Not open for further replies.

UdoZ

Well-known member
Hi, I am currently porting the first of many programs from Arduino Due + Ethernet Shield to Teensy 4.1 + NativeEthernet, as Teensy 4.1 promises much better performance and capabilities. Unfortunately, I am hitting an issue I cannot understand, and I would very much appreciate any input.
The program does home automation, reading data from devices installed throughout my home, using either Ethernet, KNX (Serial1) or RF (Serial2). The original program has worked perfectly for several years. The program itself consists of many subroutines, which are mostly short but very project-dependent. Posting the complete software here would not be helpful at this stage because of its length, particularly as most subroutines are not even running/called when the freeze happens.

Teensy acts both as server and client, with a typical Loop() time of less than 40us, increasing to approx. 150us when the HueBridge is called for switching. All other processing is faster.

I am using EEPROM at the start to read project-dependent parameters. These parameters can be changed through a web interface (not active at this moment), so EEPROM is only accessed during Setup().

There is a single timer ISR, which reads the KNX bus (9600baud) every 60ms. This interrupt takes approx. 300us at most, usually less than 25us. All other information is polled/processed inside Loop().

There is also an Interrupt on Pin 9, triggered when a command is received by RF (pre-processed by a Mini Pro), but that interrupt is very rare, very short and was never activated during those instances of “freeze”.

I am also accessing an OLED display by SDI once per minute, usually only writing a few characters. This activity does not happen during the Loop() iterations that show those long freezes.

I have stated my concerns (issue #16) on https://github.com/vjmuzik/NativeEthernet/issues but received no response, hence I am trying here too, adding some information I have collected since then:

Initially (see my description on GitHub), I could detect that when calling client.available() [both when acting as server or client], Teensy 4.1 froze for 1786.5ms. Not every time, but infrequently, on average approx. 1.5 times per hour. In other words, NativeEthernet took that long to provide the number of bytes read by the client.

Since then, for testing purposes, I removed all those connections to servers which gave long responses, such as reading out all data from my two Philips Hue Bridges. In addition, I added many timing markers, allowing me to measure how long the different subroutines ran.
It showed me that this 1786ms freeze also happens outside of calls to client.available(), even outside any subroutine actively accessing the Ethernet port at that time.

I don’t know how NativeEthernet works in detail, and I also do not know which housekeeping Teensy might do in the background. Therefore my current assumption is that something is going on inside NativeEthernet (or Serial, Serial1, Serial2, SDI), maybe a buffer overrun or similar.

Before undertaking a further strip-down of my program, or even considering replacing NativeEthernet altogether (using an SDI-connected Ethernet shield instead), I would very much appreciate it if somebody has an idea where I should look first.

Thanks for reading and thanks for any assistance.
 
This might be a coincidence, but 1786ms is quite close to 2^30 processor clock cycles at 600MHz, or 2^28 timer ticks at 150MHz. Just a wild hunch ... a missed timer somewhere.
 
I believe I also have the same problem as you do. I have been trying to migrate my code from a teensy 3.6 to 4.1 and I also started having these freezes for about 1700ms at seemingly random places in the code. In my project the teensy is sending and receiving data over UART and Ethernet. The frequency of the freezes is similar to yours; however, sometimes it happens several times in a rather short time, and after that it might work fine for a couple of hours before it happens again.

My original thought when I started finding these freezes was that something was happening with the Native Ethernet library, since that was the only change in the code I made when I switched from the 3.6 to the 4.1. So I changed back to using the wiz850io I used with the teensy 3.6, but the freezing persisted. The exact same code will run just fine on the teensy 3.6 but freezes randomly on the 4.1.

I have been trying to find where in the code the freeze occurs by checking micros() before and after an operation to see how long it takes, but I have not been able to find a specific place; it seems to be random. There could of course be something in my code causing these freezes for me, but since the code has been tested extensively on the teensy 3.6 I don't believe that is the case.

I would also be thankful for any insight into this problem and any ideas of what I can do to figure out what is causing this.
 
Posting the complete software here would at this stage not be helpful, because of its length, particularly as most subroutines are not even running/called, when the freeze happens.

Does the problem happen with smaller programs?
 
This might be a coincidence, but 1786ms is quite close to 2^30 processor clock cycles at 600MHz, or 2^28 timer ticks at 150Mhz. Just a wild hunch ... a missed timer somewhere.

I'm fighting a similar issue on T4.1: the same freeze, happening randomly, about 1.8 s at a 600 MHz clock but 2.8 s at an MCU clock of 396 MHz.
The freeze was observed with I2S-DMA that runs at elevated priority. Visible symptoms are: LED stopped flickering, DMA count dropped and other metrics changed.
I2S data rate is elevated (750 Hz), processing inside the ISR is at most 50%. PC is connected with MTPDisk Serial.
My program is rather complex, and effect is not always reproducible.
The relation to clock speed is interesting, as it indicates that it is not due to a well defined timeout.

My working hypothesis is that somewhere in the system interrupts are disabled while waiting for something. If this is related to a MCU clock counter, then such a freeze could make sense.
So far I have not found the culprit. I completely removed the Time (TimeLib) library, as it is based on the MCU clock count, but that did not eliminate the freeze.
I'm using the following libraries: SD, SdFat, SPI, Wire, MTP_t4 and some CMSIS FFT float routines; the rest is within the sketch folder. All compiled with a makefile.

@Paul, is there a list of the core's ISR priorities somewhere? (I know tempmon, which should not have an impact, triggers at priority 0, default is 128, but USB?)

Edit: Actual situation: I corrected the Makefile to use "-mfpu=fpv5-d16" and "-larm_cortexM7lfsp_math", as suggested in cores/Teensy4/Makefile, instead of "-mfpu=fpv4-sp-d16" and "-larm_cortexM4lf_math"; I do not recall where those came from.
For the time being I cannot reproduce the freeze.
 
ANOTHER INSTANCE FOUND:

I made some more measurements with a further stripped-down program (additionally removed for this test: NativeEthernetUdp.h and the associated NTP time setting, which had only happened during Setup()).

Firstly, I confirm Captain's observation that the ~1700ms freeze happens at irregular times. Tonight I saw freezes at 19:22, 19:32, 20:42, 21:48, 1:09 (3 times within a minute; I don't know if they followed each other directly, as I did not (yet) measure this), 1:52 (again 3 times), 4:06, 5:56 and 7:46. The freeze lasted 1701ms once, and mostly 1786 - 1789.5 ms.

However, the more interesting part seems this:

I use a short subroutine which measures 2 simple switch positions, followed by an analog reading of a phototransistor (V << Vcc), details see here:


i_SwitchUp+=digitalRead(SwitchUp); ZeitStopA=micros();
i_SwitchDw+=digitalRead(SwitchDw); ZeitStopB=micros();
i_Hell+=analogRead(HellSensor); ZeitStopC=micros();


These values are measured over 8 Loop()s and are then averaged, at which time i_SwitchUp=0; i_SwitchDw=0; i_Hell=0;

It is: SwitchUp=41 (INPUT_PULLUP) SwitchDw=40 (INPUT_PULLUP) HellSensor=A9 (INPUT)


All Freezes, observed last night, happened between ZeitStopB and ZeitStopC, at the time analogRead() was called. I don't believe this is a coincidence but it surely shouldn't be caused by analogRead(), if everything works as it is supposed to do.

This new observation, together with the earlier fact that the freeze was often (but not always) observed during client.available() [when the original program still included the complete readout of the Philips HueBridges, which each sent around 6000 bytes], might help; at least, I hope so.

Regarding Paul's question whether the freeze also happens with smaller programs: I can only say that I will now remove part after part of my program, because as it stands, Teensy 4.1 isn't a suitable replacement for Arduino Due.
 
It would be good to have a (short) program that we can try.
This is the code of analogRead:
Code:
int analogRead(uint8_t pin)
{
    if (pin > sizeof(pin_to_channel)) return 0;
    if (calibrating) wait_for_cal();
    uint8_t ch = pin_to_channel[pin];
    if (ch == 255) return 0;
//    printf("%d\n", ch);
//    if (ch > 15) return 0;
    if(!(ch & 0x80)) {
        ADC1_HC0 = ch;
        while (!(ADC1_HS & ADC_HS_COCO0)) ; // wait
        return ADC1_R0;
    } else {
        ADC2_HC0 = ch & 0x7f;
        while (!(ADC2_HS & ADC_HS_COCO0)) ; // wait
        return ADC2_R0;
    }
}
(you can find it in analog.c)
so, indeed there are some waiting loops.
It might be a compiler problem or bug, or another weird (CPU pipelines?) thing.
I could imagine that the while() continues when an interrupt occurs?
That's a wild shot into the blue...
You could try this (for both while loops):
Code:
ADCx_HC0 = ch; // x = 1 or 2
asm ("dsb":::"memory");
while (!(ADCx_HS & ADC_HS_COCO0)) { asm ("dsb":::"memory"); }; // wait
I'm curious if that works? It would be really great to have a program ...
 

I suspect we need memory barriers and even dsb way more often than we have now, perhaps due to the M7's six-stage pipelines.
 
Frank, I surely will try that, but it would not iron out those freezes when while(!client.available()) { "wait"} is executed, would it?
However, for the moment, I have simply removed that single "analogRead()" from my test program altogether - let's see what happens.
 
What versions of TeensyDuino are in use?

Freeze is not a Fault - as some code still running?

Not related to Ethernet as that was removed?

This 'freeze' is generally unseen/unreported behavior. Smaller program would perhaps be shareable and reproducible and also remove some mystery about why it happens.

@mjs513 had one 'similar ??' in a T_4.? Beta - but a reason and fix were found for that - and not recurred. So long ago not sure if it was Beta hardware 4.0 or 4.1 - but it was startup with an SPI display IIRC ...
 
No, but perhaps there are things that influence each other.. these things are hard to predict...

I've seen this more often. I'm not 100% sure, but I think "volatile" (the registers are declared this way) does NOT mean "do it ASAP". It only means "do it". It's not really a barrier, if I get it right. It's a sequence point.
So the write directly before the while may not happen in time, or may still reside in a memory access pipeline?
(A comment from someone who knows the compiler internals or C internals better than me would be appreciated; perhaps I'm totally wrong with my assumption.)

@MichaelMeissner? Ping :)
 
MPU:
The setting looks like this:
Code:
    SCB_MPU_RBAR = 0x40000000 | REGION(i++); // Peripherals
    SCB_MPU_RASR = DEV_NOCACHE | READWRITE | NOEXEC | SIZE_64M;
I see a problem here: SIZE_64M.
Shouldn't the 0x40000000 area be much larger? At least the ADC is beyond that.

Edit:
No, wrong, I miscalculated :)
So, the MPU knows about that.
 

I am using Teensyduino 1.8.5 inside IDE 1.8.13. NativeEthernet has (so far) not been removed, because it is an essential part of my current project. As I said, I am slowly cutting down my program to make fault finding easier, but that takes time, because the freeze isn't a frequent issue.

Well, finally I consider unexpected delays of significant duration (which I named here Freeze) are a real issue for real time programs, whether we call them faults or "by design" :)
 
TeensyDuino version reported in the Help / About window of the IDE would show a different number than 1.8.5 or 1.8.13. Current version is 1.53 with 1.54 in Beta.

When there is a "Fault" - that is an unhandled exception that stops any processing.
 
Sorry for my typo, I use 1.53 within 1.8.13.

did that here yesterday too, mistyping TD ver # :) ...

Be assured this isn't in any way being dismissed - just questions to get to the root of the problem. FrankB good to have on the issue.

@mjs513 - can you provide context for that odd Beta behavior with display? It may not relate - but it is the closest thing to this that seems to have happened and it was resolved by some 'simple startup' change ...
 
More tests - still not detecting the origin of those freezes!

I did another test over night, with these changes:

Removed: Teensy RTC, which was in the past only called at Setup to set the internal clock
Re-Added: <NativeEthernetUdp.h>, used once in setup for setting internal clock - re-added, as earlier removal had made no change

Furthermore, I changed analog.c inside Programme(x86)/Arduino/hardware/teensy/avr/cores/teensy4/ as follows (hopefully I have interpreted Frank's suggestion properly, as until now I have never made changes inside the IDE):

Code:
int analogRead(uint8_t pin)
{
	if (pin > sizeof(pin_to_channel)) return 0;
	if (calibrating) wait_for_cal();
	uint8_t ch = pin_to_channel[pin];
	if (ch == 255) return 0;
//	printf("%d\n", ch);
//	if (ch > 15) return 0;
	if(!(ch & 0x80)) {
		ADC1_HC0 = ch;
		asm ("dsb":::"memory");
		while (!(ADC1_HS & ADC_HS_COCO0)) { asm ("dsb":::"memory"); }; // wait
		return ADC1_R0;
	} else {
		ADC2_HC0 = ch & 0x7f;
		asm ("dsb":::"memory");
		while (!(ADC2_HS & ADC_HS_COCO0)) { asm ("dsb":::"memory"); }; // wait
		return ADC2_R0;
	}
}

There were again freezes at 20:17, 21:05, 22:13, 6:09 (3 freezes, the 2nd 3 loops later, the 3rd then 4 loops after the 2nd freeze), 6:32, 7:12
All freezes were exactly between ZeitStopB and ZeitStopC inside

Code:
ZeitStopB=micros();
i_Hell+=analogRead(HellSensor);
ZeitStopC=micros();



As we can see, the time between freezes can be very short or very long.

Unfortunately, I do not understand Frank's addition to analog.c, hence I need your assistance as to what we can learn from this additional test. Apologies for the slow progress in stripping down my program, but rare freezes are difficult to pin down.
 
Additional info on the event of 3 freezes close to each other [6:09 (3 freezes, the 2nd 3 loops later, the 3rd then 4 loops after the 2nd freeze)]

First freeze: the Loop() which showed the 1st freeze ran less than 1ms (time resolution 1ms) before analogRead was called. The sub checking 2 switches and that single analog signal runs last inside my loop. This 1st Loop() also included one call to the server within my program (serving time 48us).
2nd & 3rd freeze: Loop() before the freeze also took less than 1ms, no client connected and no external server called by NativeEthernet.

During this time there was also no external interrupt served by my program. The timer interrupt read the 9600baud KNX bus 59 or 60 times (as expected, as period=30ms), with an average run time of less than 120us during this specific event.
 
micros() doesn't disable interrupts; it does one interrupt-restart check to get an atomic read of two DWORDs. I assume micros() was only there to watch timing?

What if those are replaced with ZeitStopB=millis() or ZeitStopB=ARM_DWT_CYCCNT in both places?
Code:
  ZeitStopB=micros();
  i_Hell+=analogRead(HellSensor);    
  ZeitStopC=micros();

Second thing, with the current micros() code in place: is that calculation of i_Hell critical? Replace the analogRead(HellSensor); with a digitalReadFast( SomePin ); perhaps with the pin held HIGH and i_Hell+=digitalReadFast( SomePin ) * 'expected value';

Those checks would rule out micros() and if somehow it still fails with digitalRead() that would show this to be a side effect of a problem in another area ... maybe something similar tried as TLDR ...
 
I can easily replace ZeitStopX with millis(), as it was purely used for time watching. i_Hell is not essential, so I can play with it. However, as I do not fully understand your suggestion, I would like to repeat it:

I connect A9 to Vcc and replace i_Hell+=analogRead(A9); with i_Hell+=digitalReadFast(A9); - I am unclear what you meant by * 'expected value'. Could you kindly clarify?

Thanks for your assistance!

And in order to prevent any misunderstanding: i_Hell is not measured inside my timer ISR, just as the last action of Loop().
 
Since i_Hell is not essential, it may not matter.

But the suggestion: if analogRead() was to return 888 as the 'expected value', then use: i_Hell+=digitalReadFast( SomePin ) * 888;

Just so the alteration doesn't result in some other side effect that interferes with the test.
 
OK, if dsb does not work it must be another effect.

What are the types of ZeitStopABC, and how do you calc the difference?
 