Teensy 4.1 Freeze (~1786ms)

Status
Not open for further replies.
a) Frac > 1000? what if #define SHOW_CLOCK_CARAT 1
IntervalTimer clocked100ms;
? it looses usecs. (under special circumstances only - can happen with long time disabled interrupts)
b) uint32_t usec = 1000*smc + frac; -> smc is 32 bit, too and counts endless. so it can be very high and then get multiplied with 1000.
As said, not really an important issue, and very unlikely to be the reason for the freeze.

Maybe the trouble was long interrupt off caused a miss of a whole millis tick at slow F_CPU? micros() would advance and be accurate then missed millis tick would result in backing up a whole millisecond on next TICK update?

It would take a long miss of Millis_Tick to overflow twice - single overflow apparently happens at slow/24 MHz or so as some issues were found (in a thread somewhere) so Paul added that conditional to the original code and that corrected that 'problem' with micros() - at least not jumping ahead of millis() and returning confusing values that was somehow breaking something as posted back then. When millis tick is missed the micros() could be right - but this 'fixes' that - didn't seem to introduce any other issues - except anything based off of micros() [user polling or elapsedMicros] will suffer when millis() gets behind for some rounded down clock ticks when it doesn't have to. But at full 600MHz speed millis never seemed to get behind ... only under 50 Mhz or somewhere did it happen.

... above was thinking outloud ... it only was reported at slow MHz and in looking at the change it seemed odd - but this corruption of micros could cause issues in use relying on micros() for us timing ... the real fix might be "if (frac > 2000) call MillisTick ISR to catch it up"
 
One other totally RANDOM easy to test in sketch define and override the one 'background' task that runs outside loop() and on any delay() or yield():: void yield() {}

Are both sketches compiled default 'Faster'? Does it change going 'Debug'? Many other optimizations still untested or validated for general use AFAIK - and were seen to take a working FASTER sketch into trouble.
Clearly something causing this freeze - GCC or odd MCU setting/use or _isr() conflict/abuse?
> Need to find a way to cause or define it to understand the trigger/cause?
> Need a decent repro case 'others' can run - but with nothing over common between these two except the Specific COUNT observed ????

Let use this thread to find the reason for the freeze.
- indeed - noted a new thread would be proper ... and it might - or finding the prior thread that lead to the micros() change. Writing that post brought out the trouble it could cause as written.
>> Not sure where the rest of this is - but here is mention in thread : Teensyduino-1-49-Beta-3
 
I'm not necessarily pointing to to issue, but I do not understand
Code:
	do {
		__LDREXW(&systick_safe_read);
		smc = systick_millis_count;
		scc = systick_cycle_count;
	} while ( __STREXW(1, &systick_safe_read));
So, we do an exclusive load of systick_save_read (acquiring a lock, right?), but do not do anything with it, the value can be everything
if micros() is called again from an interrupting thread after acquiring the lock what would happen?
Is the value in systick_save_read really of no importance?
 
Hi Walter,
it's not a lock, it more an "interrupt detector".
If an (higher priority)interrupts occurs, the loop runs again, that's all.
 
As it was asked, I was compiling with -O2 and is now -Os

and my yield was
Code:
void yield(void) {}
but is now
Code:
void yield(void) 
{	// USB Serail - Add hack to minimize impact...
	if (yield_active_check_flags & YIELD_CHECK_USB_SERIAL) {
		if (Serial.available()) serialEvent();
		yield_active_check_flags &= ~YIELD_CHECK_USB_SERIAL;
	}
}
as I'm not using eventresponder
but this did not change behavior

My only change the made so far an impact was to use the proper (for T4.1) floating point compiler flag and library, but this may have only implicit changed the issue as the generated code is now different.
I think I said before, I had another coding issue where there were conflicting function definitions, however the compiler did not complain, so I did detect it only lately.
I had no 1.7s freeze for the last week, so I must have something done that removed the potential issue.
 
TEST, program compiled with CPU Speed: 396MHz, Optimize: Faster
Test used the very same program as yesterday afternoon (v149 w/o OLED affecting subs)

Summary, there have been now Freezes of 2033ms duration (at CPU 600MHz they were ~1786ms), located at several new positions

In detail:

Test Start: 16:49
19:53 First Freeze, inside a successful check, if a client had connected. The connection was detected within 10us. The short info, sent by another Arduino Due, simply requested some information. The process was successfully done, then client.close() and client,stop() executed. There are several while(!client.available()) inside this routine, but there has been no further time stamps inside this routine to pin-point the exact loaction.

I assume this Freeze is similar to those observed earlier in the testing sequence, although they appeared when the routine connected as a client to another server.

20:47, Second Freeze, between Zeit[6] and Zeit[7], inside subroutine „processWaterValveControl()” – the first time I saw a Freeze at this location. This routine calls several other routines, which are all unchanged from their successful use inside an Arduino Mega2560. Some of these routines use time comparisons with millis(), some use additionally short delay() s. However, this whole sequence of routines would only run for more than 1ms (but well below 2033ms) if there was a request by a user controlled switch, which wasn’t here the case. There was also no printing to USBSerial from any of these routines during this Freeze. I can easily remove this whole bunch of routines for further tests.

13 Loop() later, the Third Freeze, again at a different (never seen before) location, between Zeit[0] and Zeit[1]. During this time setToogleBits() is running, a routine which reads minute() and hour() of library <TimeLib.h>, and checks if a minute or hour have passed. There is neither a use of millis(), micros() or USBSerial inside my code, but I assume within TimeLib.h

13 Loop() later – again – the Fourth Freeze, during the time a check was done, if a client had connected – just as Freeze 1 above, BUT this time no client was detected, therefore no while was executed inside my code. However, some of these requests, which were executed, might still do this:
Code:
Ethernet.setRetransmissionCount(2) 
Ethernet.setRetransmissionTimeout(50); 
EthernetClient client = server.available();
client.setConnectionTimeout(100);

22:22, Fifth Freeze, between Zeit[33] and Zeit[34], this time inside poll_Switches_and_Hell(false), the routine, which has at present still analogRead replaced by digitalReadFast The Freeze happened between ZeitStopE and ZeitStop, see:
Code:
ZeitStopE=millis();
 if(micros()-t1>100) {Serial.print("dt_us>100(Inputs):"); Serial.println(micros()-t1);}
 ZeitStop=millis();

11 Loop() later – Freeze Six: identical to Freeze 3, above.

13 Loop() later, Freeze Seven: again a new one, this time between Zeit[5] and Zeit[6], inside checkTimedActions(showTimerControl), a routine which makes heavy use of millis(). In addition micros() are read at the start and the end of this routine, reporting a correct delay of 2033386us, whilst Zeit[6]-Zeit[5] = 2033ms report

00:00 Freeze 8: inside check for client, as Freeze 1 = client check

7:32 Freeze 9, identical to Freeze 1 = client check

7:45 Freeze 10, identical to Freeze 1 = client check

TestStop 9:46


SUMMARY:
The length of a Freeze is constant but depends obviously on CPU Speed. Somehow, everything seems to point to internal time keeping, be it micros(), millis(), elapsedMillis(), elapsedMicros(). Of course, I have used millis() and very rarely micros() in my code, at positions Freezes occur, but other Freezes show refernces to libraries, which also might use millis() at such points.

During above reported Freezes, only the Timer ISR (30ms) was active, the external Interrupt had not been called.



Whilst I do not rule out, that those Freezes are caused by my user code, I am becoming doubtful that this is the true case, seeing now Freeze Time dependent on CPU Speed.


My next step:
I will now run a further test with the identical program, but compiled with CPU Speed: 600MHz and now with Optimze: Debug. Furthermore I will prepare another test, which will use a program which has
(1) removed all user requests to micros()
(2) replaced my two ISRs with simple polling inside Loop().
 
Well, I must have misunderstood the purpose of Optimize "Debug", because I cannot upload code, compiled this way. my program compiles successfully, then starts uploading, which is confirmed as "completed" after a very, very short time, but at this stage, Teensy 4.1 USBSerial is no longer listed, or does my program is being executed. I pressed Teensy's button to return to Blink, and tried it a second time. Same result. I need to cancel this intended test until I know, what I am doing wrongly.
 
Test after removing all use of micros(), either replaced with millis() or commented out - otherwise identical to yesterday
Again compiled with CPU Speed: 600MHz, Optimize: Faster

I appreciate, it might be preliminary, as we have seen long periods without Freeze, but so far, I had 5 hours without any sort of Freeze. Maybe the culprit is indeed micros()? Anyway, I thought I let you already know - tomorrow morning, we should know better.
 
The only time keeping is based on millis() reference systick that is a 1KHz interrupt that does:
Code:
extern "C" void systick_isr(void)
{
	systick_cycle_count = ARM_DWT_CYCCNT;
	[B]systick_millis_count++;[/B]
}
The CYCCNT is recorded there to allow micros() to computer the offset to the last systick update. And micros just copies those two values - p#54 to calc us from those ms values.

Interesting if dropping micros() changes anything - as long as replacement doesn't remove code or actual trouble ... though just tracking the time shouldn't ...

Hoe many instances of micros() were there - where were they all located?

Did "DSB" get put in all the _ISR()'s as noted earlier?

The MCU detect of interrupt to redo the p#54 "__LDREXW" seems safe and simple and has generally been working as used since T_4.0 Beta period.
 
I posted on another thread, but I am also having a freeze that I can reproduce very easily.

I have modified SendReceiveString very slightly and if I run this program and then use a simple program like SocketTest to send random strings (I just type a few chars and hit enter) then the teensy dies if I send a empty string (Just hit enter and send an empty string). The Green LED stops flashing and I have to reset using the button on the teensy.

Here is the SocketTest Output and the SerialMonitor output.

SocketTest.jpgSerialMonitor.jpg

You can see the teensy gets the "SDSS" string and thereafter freezes.

Simon
 

Attachments

  • UDPSendReceiveString.ino
    3.7 KB · Views: 56
I want to run this test. I see this in the comments:

Code:
 A Processing sketch is included at the end of file that can be used to send
 and received messages for testing with a computer.

Can you point me to the copy of that Processing program, or whatever else you're using on the PC side to send the UDP packets and receive the replied?
 
/*
Processing sketch to run with this example
=====================================================

// Processing UDP example to send and receive string data from Arduino
// press any key to send the "Hello Arduino" message


import hypermedia.net.*;

UDP udp; // define the UDP object


void setup() {
udp = new UDP( this, 6000 ); // create a new datagram connection on port 6000
//udp.log( true ); // <-- printout the connection activity
udp.listen( true ); // and wait for incoming message
}

void draw()
{
}

void keyPressed() {
String ip = "192.168.1.177"; // the remote IP address
int port = 8888; // the destination port

udp.send("Hello World", ip, port ); // the message to send

}

void receive( byte[] data ) { // <-- default handler
//void receive( byte[] data, String ip, int port ) { // <-- extended handler

for(int i=0; i < data.length; i++)
print(char(data));
println();
}
*/

It's in the original NativeEthernet SendReceiveUDPString sketch. I cut it out as I was not using it, but I used SocketTest as defragster mentioned.
 
I posted on another thread, but I am also having a freeze that I can reproduce very easily.
however this seems a permanent freeze, while OP is a temporary freeze (1.7 s). Let's not diverge discussion.
we need to know the origin of the CPU clock dependent freeze of 2^30 CPU cycles (1.7s at 600 MHz), even if it is randomly and difficult to reproduce.
 
Test after removing all use of micros(), either replaced with millis() or commented out - otherwise identical to yesterday
Again compiled with CPU Speed: 600MHz, Optimize: Faster

I appreciate, it might be preliminary, as we have seen long periods without Freeze, but so far, I had 5 hours without any sort of Freeze. Maybe the culprit is indeed micros()? Anyway, I thought I let you already know - tomorrow morning, we should know better.

Yes, it was preliminary :( running the test over night resulted in 8 Freezes of 1789ms, just as before - I shortly will list their exact locations. However, we can say, removing micros() from user code had no effect.

However, what puzzles me most, and somehow I think (but do not know) that these two facts should give our best clues:
(1) The Freeze duration changed with CPU Speed - how could a user code do this at all
(2) The 30ms Timer ISR is still working - both in expected execution time and frequency - I measured using digitalWrite() inside the code with a connected Arduino Due
(3) when analogRead() was included the sketch froze (in all instances seen - okay, might not be always) exactly at this point in time

I continue to strip down my program to get us more facts. Thanks to everybody for their assistance.
 
.......

Did "DSB" get put in all the _ISR()'s as noted earlier?

No, that isn't done, because I do not understand, what I exactly need to do - please advise.

At present I prepare a program version which replaces both ISRs (the external interrupt is rarely triggered and was definitely not during any of those Freezes, I have seen. Nevertheless, I replace both with polling inside the Loop, ignoring that I might loose some events.
 
however this seems a permanent freeze, while OP is a temporary freeze (1.7 s). Let's not diverge discussion.
we need to know the origin of the CPU clock dependent freeze of 2^30 CPU cycles (1.7s at 600 MHz), even if it is randomly and difficult to reproduce.


Agreed.
So, what do we know:

- The duration is always the same.
- It's dependend on the CPU Clock.
- It seems to happen after an "event" - Udo showed that it happened a few lines later in his code when he removed analoRead.

I'd say we can be sure that it is either an interrupt or a very weired hardware issue.

The CPU Clock depended duration says:
- We can rule out everything that is not clocked by the ARM / IPG Clock. All other PLLs/PFDs/Frequencies are independend of this clock.
- IF the reason is somehow connected to the hardware, it's clock must be this PLL.
- The duration can not have it's reason "outside" the Teensy (like a pin-trigger, power problem etc..). (But its trigger can)


Then, we know:
It must be an extremely rare condition - otherwise we would have seen more reports.


So, what is CPU clock dependend?
- Waiting loops.
- Timers that are clocked/triggered by PLL1
- the ARM cycle counter
- ? Other things?
 
Last edited:
Good summary
Agreed.
So, what do we know:

- The duration is always the same.
- It's depended on the CPU Clock.
- It seems to happen after an "event" - Udos showed that it happened a few lines later in his code when he removed analoRead.

I'd say we can be sure that it is either an interrupt or a very weired hardware issue.

The CPU Clock depended duration says:
- We can rule out everything that is not clocked by the ARM / IPG Clock. All other PLLS/PFD/Frequencies are independend of this clock.
- IF the reason is somehow connected to the hardware, it's clock must be this PLL.
- The duration can not have it's reason be "outside" the Teensy (like a pin-trigger, power problem etc..). (But it's trigger can)


Then, we know:
It must be an extremely rare condition - otherwise we would have seen more reports.


So, what is CPU clock dependend?
- Waiting loops.
- Timers that are clocked/triggered by PLL1
- the ARM Cycle counter
- ? Other things?
 
We know a little more:
- Even if the freeze happens, the Teensy measures its time more or less correctly. So, time measuring and systick is not influenced. Its interrupt is not influenced and keeps running.
-> it can't be something with even higher priority (NMI / faults..)

Is this correct?

Maybe we should scan the core-code to find places where the cycle counter is used?
 
TestResults – Test after removing all micros() from user code (v151)
TestStart: 13:26 March 11

19:58 Freeze 1 – between Zeit[5] and [6] – covering sketch checkTimedActions() – seen before
This sketch includes many checks against millis(), like blinking lights or detecting length of actions – nothing time critical. No further time markers inside -> exact location of Freeze unknown. Zeit(6] – Zeit[5], measured outside this sketch & duration measurement inside this sketch indicate Freeze.

20:55 Freeze 2 - between Z[6] and Z[32] – covering processWaterValveControl() – seen before
(Zeit[7] to Zeit Zeit[31] are only reached at full minute, in this Freeze covering Loop not the case.
Sketch includes checks against millis(), see yesterday

00:05 Freeze 3 – identical to Freeze 2


00:11 Freeze 4 – identical to Freeze 1

21 Loop()s later Freeze 5, identical to Freeze 4

21 Loop()S later Freeze 6, identical to Freeze 4


**********
Whenever there are multiple Freezes inside a minute (=close together), there have have been exactly THREE, and the seem always just 15 to 25 Loops later. Average Freeze-free Loop() time is 22.3us
Why always THREE?

***********

2:47 Freeze 7, identical to Freeze 4

04:03 Freeze 8, identical to Freeze 4

04:41 Freeze 9, identical to Freeze 1!


07:30 Freeze 10, identical to Freeze 1!

21 Loop()s !!!!!! later, Freeze 11, identical to 1!

22 Loop()s later, Freeze 12, identical to 4!



TEST END 07:42

**********
We see again, that Freezes seem to happen at very specific, repetitive places. This has been the case with all different sketch versions. Considering that the Timer ISR runs with a period of 30ms, whilst Loop() times are more random, this seems very puzzling.
 
I do not understand this code...
Code:
    if (last) {
        last->next = (uint32_t)transfer;
        if (USB1_ENDPTPRIME & epmask) goto end;
        //digitalWriteFast(2, HIGH);
        //ret |= 0x01;
[COLOR=#0000ff]        uint32_t status, cyccnt=ARM_DWT_CYCCNT;
        do {
            USB1_USBCMD |= USB_USBCMD_ATDTW;
            status = USB1_ENDPTSTATUS;
        } while (!(USB1_USBCMD & USB_USBCMD_ATDTW) && (ARM_DWT_CYCCNT - cyccnt < 2400));[/COLOR]
        //USB1_USBCMD &= ~USB_USBCMD_ATDTW;
        if (status & epmask) goto end;
        //ret |= 0x02;
    }
It is in usb.c, schedule_transfer().Before the loop, interrupts get disabled.

What does the loop do?
Seems to be some polling? it repeatetly writes a bit, then waits with a timeout?

USB_USBCMD_ATDTW is :
#define USB_USBCMD_ATDTW ((uint32_t)(1<<14))
 
Last edited:
No, that isn't done, because I do not understand, what I exactly need to do - please advise.

At present I prepare a program version which replaces both ISRs (the external interrupt is rarely triggered and was definitely not during any of those Freezes, I have seen. Nevertheless, I replace both with polling inside the Loop, ignoring that I might loose some events.

Line provided before by Frank B can be placed at the end of all the ISR()'s :: asm ("dsb":::"memory");

Good to see there isn't an indication micros() causes the trouble - would have been a stunner.

Good questions Frank B - no answers. Can it be run with USB Serial Disabled? Send all output to Serial1 and use a second Teensy running a SerialPassthrough.ino

opps - didn't hit post ...
Went to the current copies of NativeEthernet and FNET from git and now have net connect with UDPSendReceiveString.ino - though it doesn't accept the connect? That may be a distraction ... couldn't find th eother thread referred to ...

Opps again - finally posted and refresh shows Frank's USB finding ....
 
It it possible that it has something to do with USB?
A print, perhaps?

A typical print out of my program during a 60sec period is this:

Code:
---------------------------------------------------------------------------KNXIP Teensy v151	10:46:00 12.3.2021 Hell:1023 S_Top:0 S_Bot:1

	PCC: 107		addC:0	client.close done	t(PC_ms)): 0


with the second line only appearing, when a client has connected to the Teensy 4.1. In addition, so far, to my recollection, I have not seen a Freeze inside a Loop(), which printed line 1. In actual fact, I believe, most Freezes happened in Loop()s, which had no USBSerial output at all. I will check that in future more closely.

----> at present I continue to prepare and then intend to run a sketch with no user ISR available. After that, depending on further thoughts of you experts, I will modify my ISR and prepare printout using Serial5 (Serial1 and Serial 2 are already in use)
 
Status
Not open for further replies.
Back
Top