Teensy 4.1 Freeze (~1786ms)

Status
Not open for further replies.
....
Went to the current copies of NativeEthernet and FNET from git and now have net connect with UDPSendReceiveString.ino - though it doesn't accept the connect? That may be a distraction ... couldn't find the other thread referred to ....

Just to make certain - in case it might be important: I initially used NativeEthernet as included in Teensyduino 1.8.5, but later replaced it with the latest version 1.0.5 from GitHub. It made no difference to the Freeze.
 
I posted on another thread, but I am also having a freeze that I can reproduce very easily.

I have modified SendReceiveString very slightly. If I run this program and then use a simple program like SocketTest to send random strings (I just type a few chars and hit enter), the Teensy dies as soon as I send an empty string (just hit enter). The green LED stops flashing and I have to reset using the button on the Teensy.

Here is the SocketTest Output and the SerialMonitor output.

(Attachments 24023 and 24024: SocketTest output and Serial Monitor output)

You can see the teensy gets the "SDSS" string and thereafter freezes.

Simon

This permanent freeze is probably not related to the same problem. I had this freeze earlier. https://forum.pjrc.com/threads/66306-Teensy-4-1-freezes-when-receiving-UDP-packets-with-length-0

The ~1786ms freeze is still happening in my code even after getting rid of the permanent freeze.
 
Is it possible that it has something to do with USB?
A print, perhaps?

To be completely honest,
I have always had issues with USB since the T3.2 days.
In my freeze case I thought it might be the PC, so I ran the program from battery with the PC disconnected and wrote the log to uSD. The 1.7s freeze continued to occur.
If it is USB, then it is the Teensy implementation.
Yes, I saw the schedule_transfer(), but understanding USB is for someone younger than me.
 
We know a little more:
- Even when the freeze happens, the Teensy measures its time more or less correctly. So time measurement and SysTick are not affected; the SysTick interrupt keeps running.
-> it can't be something with even higher priority (NMI / faults..)

Is this correct?

Maybe we should scan the core-code to find places where the cycle counter is used?

Yes, that is correct, even micros() gives the correct answer (i.e. consistent with freeze duration)
 
Code:
    } while (!(USB1_USBCMD & USB_USBCMD_ATDTW) && (ARM_DWT_CYCCNT - cyccnt < 2400));


I tried this with godbolt
- if the first condition is false, it does not check the timeout (short-circuit evaluation). It makes no difference that ARM_DWT_CYCCNT is volatile; I had thought it would.
But is this while correct and intended?
Have to read the manual... but USB is not a thing I understand...
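
The short-circuit behaviour can also be checked on a desktop compiler. The following is a minimal host-side sketch, assuming plain variables in place of the real registers; `FakeRegs`, `kTripwireBit`, `read_cyccnt` and `keep_looping` are hypothetical names and not part of the Teensy core.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real registers; nothing here touches hardware.
struct FakeRegs {
    uint32_t usbcmd = 0;        // models USB1_USBCMD
    uint32_t cyccnt = 0;        // models ARM_DWT_CYCCNT
    uint32_t cyccnt_reads = 0;  // how often the timeout operand was evaluated
};

constexpr uint32_t kTripwireBit = 1u << 0;  // arbitrary bit, for this model only

// Counted read of the fake cycle counter.
inline uint32_t read_cyccnt(FakeRegs &r) {
    ++r.cyccnt_reads;
    return r.cyccnt;
}

// Same shape as the while() condition in schedule_transfer().
inline bool keep_looping(FakeRegs &r, uint32_t start) {
    return !(r.usbcmd & kTripwireBit) && (read_cyccnt(r) - start < 2400);
}

// Walks through the three cases of the condition.
inline void demo_short_circuit() {
    FakeRegs r;
    r.usbcmd = kTripwireBit;      // bit still set: left operand is false
    assert(!keep_looping(r, 0));  // loop ends ...
    assert(r.cyccnt_reads == 0);  // ... and the counter was never read

    r.usbcmd = 0;                 // bit cleared: left operand is true
    assert(keep_looping(r, 0));   // within the timeout, so repeat
    assert(r.cyccnt_reads == 1);  // now the timeout operand was evaluated

    r.cyccnt = 2400;              // 2400 cycles elapsed
    assert(!keep_looping(r, 0));  // timeout ends the loop, bit still clear
}
```

So the loop can end in two different ways: with the bit still set, or because 2400 cycles have passed while the bit kept being cleared.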


edit:
ATDTW: Add dTD TripWire - Read/Write. [device mode only]
This bit is used as a semaphore to ensure proper addition of a new dTD to an active (primed) endpoint's
linked list. This bit is set and cleared by software.
This bit would also be cleared by hardware when state machine is hazard region for which adding a dTD
to a primed endpoint may go unrecognized.
What does that mean: "state machine is hazard region"?
So, (if the bit is zero && no timeout) , repeat. Correct?
This seems ok (not knowing what it means)

But it seems to ignore the case when a timeout happened:
Code:
    if (last) {
        last->next = (uint32_t)transfer;
        if (USB1_ENDPTPRIME & epmask) goto end;
        //digitalWriteFast(2, HIGH);
        //ret |= 0x01;
        uint32_t status, cyccnt=ARM_DWT_CYCCNT;
        do {
            USB1_USBCMD |= USB_USBCMD_ATDTW;
            status = USB1_ENDPTSTATUS;
        } while (!(USB1_USBCMD & USB_USBCMD_ATDTW) && (ARM_DWT_CYCCNT - cyccnt < 2400));
        //USB1_USBCMD &= ~USB_USBCMD_ATDTW;
        if (status & epmask) goto end;
        //ret |= 0x02;
    }
    //digitalWriteFast(4, HIGH);
    endpoint->next = (uint32_t)transfer;
    endpoint->status = 0;
    USB1_ENDPTPRIME |= epmask;
    endpoint->first_transfer = transfer;
end:
    endpoint->last_transfer = transfer;
    __enable_irq();

?? what do you think?
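
One way to make the timeout case visible would be to let the loop report why it ended. The following is only a host-side model under stated assumptions, not the Teensy core code and not a proposed fix: `TripwireResult`, `wait_tripwire` and the two callables that replace the hardware are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Why the tripwire loop ended (hypothetical type, for illustration only).
enum class TripwireResult { Acquired, TimedOut };

// Host-side model of the do/while in schedule_transfer().
// try_tripwire: sets the tripwire and returns true if it is still set when read back.
// cycles:       returns the current value of a free-running 32-bit cycle counter.
inline TripwireResult wait_tripwire(const std::function<bool()> &try_tripwire,
                                    const std::function<uint32_t()> &cycles,
                                    uint32_t timeout_cycles = 2400) {
    assert(try_tripwire && cycles);  // both callables must be provided
    const uint32_t start = cycles();
    do {
        if (try_tripwire()) return TripwireResult::Acquired;
        // Unsigned subtraction stays correct across counter wrap-around.
    } while (cycles() - start < timeout_cycles);
    return TripwireResult::TimedOut;
}
```

With a result like this, the caller could treat `TimedOut` separately instead of continuing as if the status read under the tripwire were valid.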

In the meantime, another user reported the 1.7 secs freeze via PM.
 
Test v152: sketch as v151, BUT both the 30ms Timer ISR and the external Int433 ISR are replaced by polling inside the Loop = no USER CODE INTERRUPTS

Whilst I will report my detailed analysis a little later, here is the important result: there are still 1787ms Freezes, and if several happen within the same minute, then there are exactly 3, no more, no less.

Some thoughts:
Considering that the execution time of Loop() is rather short when nothing happens (in the end this is a home automation program), the same conditions repeat over and over. However, the execution time of Loop() isn't exactly constant, because Serial1 reads the 9600baud KNX Bus, and that Bus has periods of less or more activity, so the Loop() execution time will jitter. And, I know this is speculation: if the Freeze needs some timing coincidence, then a large enough jitter breaks that coincidence at once and we see a single event, while a smaller jitter has to add up over a number of Loop()s before the condition is left, which would explain why a repetition always comes approx. 15-20 Loop()s later.

Finally, may I ask whether we understand why the Freeze changed to 2033ms when the CPU speed was set to 396MHz? I had expected to see an increase by a factor of 600/396, but I guess that was naive thinking.
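
For reference, the naive expectation can be worked out. If the freeze were a fixed number of CPU cycles, its duration would scale with 600/396. A small sketch using only the figures quoted in this thread (the function name is hypothetical):

```cpp
// Duration expected at a new clock if the freeze were a fixed cycle count.
constexpr double scaled_freeze_ms(double freeze_ms, double old_mhz, double new_mhz) {
    return freeze_ms * old_mhz / new_mhz;
}

// 1786 ms at 600 MHz would become roughly 2706 ms at 396 MHz ...
constexpr double expected_ms = scaled_freeze_ms(1786.0, 600.0, 396.0);
static_assert(expected_ms > 2706.0 && expected_ms < 2707.0, "naive scaling estimate");

// ... whereas the observed 2033 ms is only about 1.14 times the 600 MHz value.
constexpr double observed_ratio = 2033.0 / 1786.0;
static_assert(observed_ratio > 1.13 && observed_ratio < 1.14, "observed ratio");
```

So the freeze does get longer at the lower clock, but by much less than a pure cycle-count model predicts.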
 
Detailed Review of Test v152 = v151 but no USER ISRs, see above

Note: time stamps like Zt[x] or Zeit[x] are always recorded inside the Loop, i.e. before and after the subroutine mentioned is called, hence the Freeze might have taken place directly before a subroutine was branched to or immediately after leaving it. The Freeze might be related to the stack, which controls such branching, but that is surely a well tested feature and would not explain the constant duration of these Freezes. However, I have also recorded times inside subs, and these recorded Freezes too, i.e. whilst subroutines were running.
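
The bracketing described in this note can be sketched as follows. This is a minimal host-side model, not the actual sketch: `FreezeLog`, `mark`, `Gap` and the threshold value are hypothetical, and on the Teensy the timestamps would come from millis().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One detected gap between two consecutive checkpoints.
struct Gap {
    int from_id;        // checkpoint recorded before the gap
    int to_id;          // checkpoint recorded after the gap
    uint32_t duration;  // elapsed milliseconds between them
};

class FreezeLog {
public:
    explicit FreezeLog(uint32_t threshold_ms) : threshold_ms_(threshold_ms) {
        assert(threshold_ms > 0);  // a zero threshold would flag every checkpoint
    }

    // Record checkpoint `id` at time `now_ms` (e.g. millis() on the Teensy).
    void mark(int id, uint32_t now_ms) {
        if (have_last_) {
            uint32_t dt = now_ms - last_ms_;  // unsigned math survives rollover
            if (dt >= threshold_ms_) gaps_.push_back({last_id_, id, dt});
        }
        last_id_ = id;
        last_ms_ = now_ms;
        have_last_ = true;
    }

    const std::vector<Gap> &gaps() const { return gaps_; }

private:
    uint32_t threshold_ms_;
    bool have_last_ = false;
    int last_id_ = 0;
    uint32_t last_ms_ = 0;
    std::vector<Gap> gaps_;
};
```

A gap reported between checkpoints 3 and 4 only says that the stall happened somewhere between those two calls, which is exactly the limitation described in the note.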

The details below, particularly Freeze 8, make me think that Freezes can occur anywhere, and that seeing Freezes more often at certain locations might simply be a coincidence, caused by the longer execution time of those subroutines.

The list below also shows (again) that there can be very little time or many hours between Freezes, so these Freezes might easily have been overlooked in the past.


TestStart: 12:47

19:17 Freeze 1: an interesting location, as it is inside the routine Int433(), which is no longer executed as an external Interrupt triggered by a connected Arduino Micro Pro. In v152 it is polled inside Loop(), checking Serial2 (115200baud). It includes while(Serial2.available()), but there was nothing available, or fewer than 60 bytes, since otherwise an error would have occurred and been printed out.

20:44 Freeze 2: identical to 1


22:43 Freeze 3: identical to 1

11 Loop()s later, Freeze 4, between Zeit[3] & Zeit[4], a new position! Here merely a KNX value is checked to see if a given motion sensor should be served. If this were the case, a KNX command would be executed together with a Serial.print message indicating this. As no Serial.print was recorded, all the sub did was evaluate

Code:
if (Parameter[45] == 0) { /* this part was not executed, as it includes the Serial.print */ }

17 Loop()s later, Freeze 5, between Zt[1] & Zeit[1] – here Loop() checks if a minute or hour has passed, using <TimeLib.h>. This type of Freeze has been seen before


00:28 Freeze 6, between Z[33] & Z[34], here the subroutine checks two switches and includes the digitalReadFast (which should rather be the analogRead) – a common position

02:40 Freeze 7, identical to 1, as it seems a preferred location – a location which checks Serial2


04:11 Freeze 8, a new and very interesting one. It happens between StartTimeLoop and Zt[0], see here:

Code:
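void loop()
{
    StartTimeLoop = millis();
    digitalWrite(LoopIndicator, HIGH);
    LoopCounter++;
    AnzahlInt433LastLoop = 0;
    Zt[0] = millis();   // Freeze 8 was seen between StartTimeLoop and this time stamp
    // ... (rest of loop() not shown in the post)
}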


05:28 Freeze 9: as Freeze 1

12:11 Freeze 10, firstly, identical to Freeze 5.

Test Stop 13:09


I am now preparing a further version which will write to Serial5, read by an Arduino Due, instead of using USBSerial.

Thanks for everybody's patience!
 
Test v154 = v152 (=yesterday), but USBSerial replaced with Serial5

Version v154 replaced every "Serial." with "Serial5." in v152. Serial5 is set to run at 250000baud and is connected to an Arduino Due, which reads it on its Serial1 RX and passes the data to the Due's Serial output, which is then recorded on a PC. Serial.begin() has been removed; as I did not know whether that is sufficient, I have also removed the physical connection between the PC's USB connector and the Teensy 4.1's USB connector and replaced it with a power-only connection, using a power supply with a USB connector.

Test Start 15:00 (13.March)
Test End 08:15 (14.March) -> Test Time 17h15m

Test Result: not a single FREEZE

Well, we have seen that it can take many hours until a Freeze takes place, but there was never such a long time without Freezes. This seems to me a strong indication that the assumption that the Teensy 4.1's USB implementation is the cause of these Freezes is indeed correct, and hence that the USBSerial implementation needs some correction.

I will now make a further test, which will put back all the features I have removed in the course of the tests reported here, with the exception of USBSerial. This will include both ISRs, analogRead, the SPI driven OLED and longer lasting Ethernet client connections. If USBSerial is the cause, then Freezes should not reappear.

Back to the experts! Their help is now essential and will be highly appreciated.
 
Paul is the one with the hardware to monitor USB in any way and the experience in writing the code.

For that to work would take a repro case to run/examine.
 
The question is if he reads here...?
But it's difficult to know what happens where.
I think we know now that it is somehow connected with usb..
 
No doubt he reads here - his name is one of 60 folks who have read the thread - and that doesn't count any stealth Mega_Admin views.

But without posted repro it couldn't be looked at if there was all the time in the world ...


What is unique about the USB usage in this code? Are there no prints in _ISR()'s - just in loop or loop called func()'s?

Any incoming USB Serial data? Any usage of USB Host?
 
I have been trying since yesterday to create a stand-alone freeze example, but without success. Everything is too deterministic.

Unfortunately my freeze application, which (fortunately) does not freeze anymore,
uses as HW
custom ADC (custom dual port I2S)
custom 6-uSD card (SPI)
SPI is modified to allow selected dedicated mode

I have two types of ISR
fast (750/sec) for acoustic data (I2S driven)
slow (10/sec) for compass (Timer driven)
data log is written once/second to Serial and to SDIO disk
Freeze occurred also when slow ISR (Timer based) was disabled.
All I2S ISRs call micros(); they also called now(), which has been replaced by rtc_get()

Loop checks for incoming data (some sort of menu) but no data is sent
no usage of USB host
PC: Windows 10
 
I don't believe there is anything special about my sketch; I merely saw this rare Freeze once and then tried to pin it down. And as my tests without any user ISR have shown, the Freeze is not related to a user code based ISR. I have also shown, and recorded in this thread, that there is very little USBSerial writing in my sketch, except after a loop() which lasted more than 1700us. I have also shown that the length of the Freeze depends on the selected CPU speed, once more, I think, an indication that this Freeze is not related to any user code.

Nothing is written through USBSerial throughout my overnight tests, but USBSerial remains connected to my PC, running Windows 10 Pro Version 2004. The PC uses an i7-7700 and is also receiving data from several IP cameras, which regularly keeps its CPU load between 50% and 100%. I have no idea if that is of any relevance, considering there is little to no USBSerial user output.

In addition there are several other Arduino Controllers (Mega & Due) connected to this PC, meaning at times there might be several copies of Arduino IDE in use. However, that is not always the case and in any case should not affect a Teensy 4.1, which is not creating user printing.

What else do we need to know? I am more than happy to assist.



I don't understand what is meant by "Any usage of USB Host?", so I guess I don't use it. My original sketch wrote out through Serial and could receive commands through Serial (albeit no commands were issued throughout my overnight test). It reads Serial1 at 9600baud continuously, rarely sends short commands through Serial1, and reads Serial2 at 115200baud from time to time. In addition it includes server functionality and acts as a client, although in my user code this always happens in sequence. I also listed all the libraries I was using; all of them, with the exception of EEPROM, NativeEthernet and TimeLib, were removed for testing without affecting the Freeze.

I am very happy to provide my code, just as I already did with Frank, but I cannot provide it here without some significant rework, as it includes too much sensitive information about my home automation. Finally, as this thread has already shown, I am not the only one who has seen this specific Freeze.

Anyway, I am at present preparing another hardware setup, which might allow me to generate a simpler test sketch.
 
That's the CPU usage, mentioned above

Screenshot 2021-03-14 11.23.10.png
 
Yes, I looked over your code - but it's too long and needs your hardware, so I can't test much..
It's difficult to strip it down to a short example.

Another user reported the freezes with T4 VGA DOOM. Not sure if it's the same freeze.. I'll try to run that.
 
I appreciate that, and therefore I am building a second hardware setup and a shorter sketch, although I don't know yet whether that will allow me to reproduce this Freeze. However, at least I can provide it with the same in- and output. It will take a little time, so please stay patient.
 
I can confirm the freeze with DOOM.
Could not measure so far, but quite possible that it is 1.7 seconds.
It happens repeatedly _before_ the game starts, on the wad-file selection screen.

Doom uses USB, but does not print anything in this state.
It uses the USB-Host, too. But it happens without a USB-Device connected.
It happens without long wait. I thought it was 3 times, but it seems to be more often (just wait longer)

The VGA loses the sync, which is triggered by an interrupt (Priority 128).
The interrupt seems not to be called during this time, which means there is either a higher or same priority interrupt causing this.
 
Interesting:
Code:
// At very slow CPU speeds, the OCRAM just isn't fast enough for
// USB to work reliably.  But the precious/limited DTCM is.  So
// as an ugly workaround, undefine DMAMEM so all buffers which
// would normally be allocated in OCRAM are placed in DTCM.
#if 1 || defined(F_CPU) && F_CPU < 30000000  // "1 || " is the part highlighted in the post
#undef DMAMEM
#define DMAMEM
#endif
(usb_serial.c)
With the read part added, it happens too, but much much later - after the first time, often.
Without, the first freeze happens after a shorter time.


(Side info: I then tried GCC 10. Not only does it produce slower code, it also needs more RAM1. Doom does not compile with GCC 10 :-( RAM usage too high.)
 
Tried GCC 9
NO freeze so far - (waited for 15 minutes, now)

So, what does that mean? It seems to be a condition that no longer exists with GCC 9.
A weird timing issue. Or a strange bug. Something that is handled differently by GCC 5 and 9. It will not be easy to find. But at least for the DOOM case we know exactly where it happens: somewhere in usb serial.
Udo's tests show the same.

Can't do more now.
Good night...zzzz.. have to work tomorrow.
 
my gcc 9:
Code:
C:\Arduino\hardware\tools\arm9\bin>arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain 9-2020-q2-update) 9.3.1 20200408 (release)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
This might be obvious to the experts, but as it wasn't to me, I tested what would happen if I merely left the Teensy 4.1 USB connected to my PC, without Serial.begin() and therefore without writing to USBSerial. All writing was via Serial5 -> Arduino Due -> PC. Program: v154, as in the last test. Result after 18 hours: no Freezes -> Freezes are linked to the usage of USBSerial within a sketch. Connecting it and leaving it connected does not result in Freezes (at least not frequent ones). As I said, this might be expected, but I thought I had better mention it.
 
Serial.begin does not do anything (I do not even use one).
USB_serial is always used if usb_type is set to Serial. You could test using Seremu to see if it is the Serial implementation that freezes.
If you do write to Serial, obviously the write path will be used, which includes some IRQ and while operations (schedule_transfer).
Note, FrankB mentioned that usb_type RawHid does not freeze.
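
As far as I know, Teensyduino passes the Tools > USB Type choice to the compiler as a preprocessor define (for example USB_SERIAL or USB_RAWHID), and the core selects its USB code from that. A small sketch, assuming those macro names; the function name is hypothetical and only two of the possible types are checked:

```cpp
// Reports which USB type this build was compiled for, based on the define
// that the IDE's "USB Type" menu is assumed to pass to the compiler.
constexpr const char *usb_type_name() {
#if defined(USB_SERIAL)
    return "Serial";
#elif defined(USB_RAWHID)
    return "Raw HID";
#else
    return "other / not a Teensy build";
#endif
}
```

Compiled on a PC, neither macro is defined, so the last branch is taken; on a Teensy the result depends on the menu setting at build time.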
 
Thanks for pointing that out! Do I understand this correctly? Depending on the USB Type selected, the IDE will select and compile different code?
And certainly, I will run a test tonight using Raw HID.
 