Teensy MicroMod freeze/hang when using multiple libraries together

Rezo

Hi all,

I've been working on a long-term project that uses the following libraries:
IntervalTimer - drives the GUI library's timing mechanism
LVGL v7/8 - a GUI library
FlexCAN_T4 - CAN bus communication
ILI948x_t4_mm - my own display driver that uses FlexIO & DMA over an 8080 parallel interface
Teensy MicroMod running at 398 MHz

When all four libraries are used at the same time, the Teensy hangs after running for 5-10 seconds.
If I disable the CAN transmission, it works fine.
If I disable the DMA transfer to the screen and use the good old polling method, it works fine.

I am in the process of building a simplified program to demonstrate the behavior, but the real question is: how do I debug this when the Teensy just hangs and stays frozen until it is power-cycled?

There is no crash, so CrashReport is of no use here, and USB communication dies when it freezes, so serial prints can't show me where it stops.

Any suggestions on how to tackle this?
 
I think I posted something about this in a different thread in recent weeks... haven't found it yet.
> I wonder about the impact of "IntervalTimer - used for handling a GUI library timing mechanism" - maybe that was the context of the earlier post?

While looking, I found this note that may apply to this code: "blocking and non blocking(DMA) transfers."
> If that is an option, does the behavior change depending on whether the DMA transfer blocks?

A hard hang without a Fault will be hard to track down. Hopefully the "simplified program to demonstrate the behavior" will either make the problem go away or make it easier to debug.
 
@defragster I believe that was a friend of mine, posting about using the same display library with LVGL and an NRF24 over SPI.
He had the same issue; he also switched from DMA transfers to the standard polling method and the freeze went away.

The question is: with DMA enabled on FlexIO and IntervalTimer both running on interrupts at the same priority, is there any chance the two together cause this freeze? And why does eliminating CAN also stop the issue from appearing? I know these are a lot of questions, but I'm wondering if there is something obvious I'm not taking into account.
 

It's been long enough that I forgot who it was and the details :) But if you recognize the name, it seems the _isr() code was changed to just set a flag and leave the processing to loop()...
FOUND IT ... NRF ... - @Dogbone06: pjrc.com/threads/69605-Teensy-Micromod-freeze-and-timing-issues-LVGL-NRF24
> it was the NRF _isr() causing trouble
-> after that change it worked
> the newest post still shows the active NRF _isr() pin causing grief as the code developed
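The "set a flag in the _isr() and let loop() do the work" pattern can be sketched in portable C++ as below. This is a host-side model, not the code from that thread: `onRadioIrq()` and `pollPendingWork()` are hypothetical names, and `std::atomic` stands in for the `volatile` flag a Teensy sketch would typically use (on hardware, `onRadioIrq()` would be registered with `attachInterrupt()`).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Set by the interrupt handler, consumed by loop().
std::atomic<bool> g_irqPending{false};
std::uint32_t g_itemsProcessed = 0;  // work done outside interrupt context

// Hypothetical ISR: do the minimum and return quickly.
void onRadioIrq() {
    g_irqPending.store(true, std::memory_order_release);
}

// Called from loop(): does the slow work (SPI reads, LVGL updates, ...)
// outside interrupt context. Returns true if work was done.
bool pollPendingWork() {
    // exchange() reads and clears the flag in one step, so an interrupt that
    // fires while the work runs sets it again and is not lost.
    if (!g_irqPending.exchange(false, std::memory_order_acq_rel)) {
        return false;
    }
    ++g_itemsProcessed;  // placeholder for the real processing
    return true;
}
```

Several interrupts that arrive before loop() runs collapse into a single pending flag; if every event matters, a counter or a small queue is needed instead.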

It's easy to believe interrupt activity and DMA could conflict. I'm not sure how CAN runs (with an _isr() or DMA?) or why it would interfere.

The common factor seems to be LVGL and "running 8 bit parallel. Im using David Resniks DMA display library"
-> not sure whether the DMA behavior could be improved, or whether there is some fundamental timing conflict or contention for the memory bus.
 
Yep that's the guy!
CAN runs an ISR for RX, but the issue shows up when transmitting, which just happens in the main loop on a millis() interval.
The DMA code was written mostly with the help of Eric Eason. I believe it is as good as it can be right now, though it might benefit from some optimization. I based most of the logic on NXP's RT1050 8080 FlexIO demo for driving a display.
But I know DMA is conflicting with something, since turning it off eliminates the freeze.
 
Any suggestions on how to tackle this?
This sounds like a good case for a logic analyzer watching the interrupt pins and pins indicating DMA activity, perhaps supplemented by other pins marking entry/exit of other code, to see what leads up to the hang and where to start looking.

That is what I've seen @KurtE do for complex interactions of SPI and USB, etc.
> this is faster than prints and can be viewed in context with the LA capture
> Also, calling Serial.flush() after prints keeps a later hang from blocking transmission of pending messages, which would otherwise give false feedback about where the code was when the 'last print' appeared.

Any chance the Teensy, when doing a Tx, gets a hit on the Rx interrupt?

@tonton81 might know: does Rx trigger the _isr() at the start or the end of a message? And does the device doing the Tx see its own output as 'received'?
> An LA could watch a pin set HIGH before CAN_Tx() and LOW when it completes; that, together with monitoring the Rx_isr pin, might clarify things.
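A minimal sketch of that pin-marker idea, assuming a spare GPIO: a scope guard drives the pin HIGH on entry and LOW on exit (including early returns), so a logic analyzer shows exactly when and how long the CAN Tx section runs. `ScopeMarker`, `recordPin()` and pin 2 are hypothetical; on a Teensy the writer function would wrap `digitalWriteFast()`. Here a function that records levels stands in for the pin so the sketch runs on a host.

```cpp
#include <cassert>
#include <vector>

// Pin writer abstraction: on a Teensy this would call digitalWriteFast(pin, level).
using PinWriter = void (*)(int pin, bool level);

// RAII marker: HIGH while the enclosing scope runs, LOW when it is left.
class ScopeMarker {
public:
    ScopeMarker(PinWriter write, int pin) : write_(write), pin_(pin) {
        write_(pin_, true);
    }
    ~ScopeMarker() { write_(pin_, false); }
    ScopeMarker(const ScopeMarker&) = delete;
    ScopeMarker& operator=(const ScopeMarker&) = delete;

private:
    PinWriter write_;
    int pin_;
};

// Host stand-in for the GPIO: records every level change.
std::vector<bool> g_pinLog;
void recordPin(int /*pin*/, bool level) { g_pinLog.push_back(level); }

// Example: mark a (simulated) CAN transmit.
void sendCanFrame() {
    ScopeMarker mark(recordPin, 2);  // pin 2 is an arbitrary example
    // ... the CAN write call goes here ...
}
```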
 
I'm still trying to build a simpler program that demonstrates the issue, but meanwhile I've tried the following:
1. As the two frame buffers sit in DMAMEM, I added 32-byte alignment attributes to them: same behavior.
2. I tried moving both frame buffers into RAM1: same issue.
This issue has occurred with my library writing in DMA mode + LVGL (v7/v8) + I2C, combined with either FlexCAN or SPI.
If I disable I2C (touch screen), it still freezes.
Is there anything in common between SPI and FlexCAN? Clock sources? Interrupts? Anything?

The logic analyzer would have been a great option, but I don't have any spare pins, as this is on a custom PCB with a MicroMod.
I'll see if I can squeeze anything out with an ATP board.
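Some background on the 32-byte alignment tried in item 1: the Cortex-M7 data cache works in 32-byte lines, and cache maintenance around a DMA transfer always covers whole lines. A buffer that does not start and end on line boundaries shares lines with neighboring variables, so flushing or invalidating it touches those too. The host-side sketch below (hypothetical names `cacheLineSpan` and `ownsItsCacheLines`) computes the range such maintenance actually covers; on a Teensy the buffer would be declared with `DMAMEM` and `__attribute__((aligned(32)))` and flushed with `arm_dcache_flush()` before the DMA reads it.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kCacheLine = 32;  // Cortex-M7 D-cache line size in bytes

struct Span {
    std::uintptr_t start;  // first byte covered by maintenance
    std::uintptr_t end;    // one past the last byte covered
};

// Cache maintenance rounds outward to whole lines: start down, end up.
Span cacheLineSpan(std::uintptr_t addr, std::uintptr_t size) {
    Span s;
    s.start = addr & ~(kCacheLine - 1);
    s.end = (addr + size + kCacheLine - 1) & ~(kCacheLine - 1);
    return s;
}

// True when the buffer owns its cache lines exclusively, so maintenance on it
// cannot spill onto neighbouring data.
bool ownsItsCacheLines(std::uintptr_t addr, std::uintptr_t size) {
    Span s = cacheLineSpan(addr, size);
    return s.start == addr && s.end == addr + size;
}
```

Aligning the start alone is not enough: the buffer size should also be a multiple of 32 bytes for the buffer to own all of its lines.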
 
Given the pin shortage and hard crash :( and :(

Perhaps the FOUR 32-bit words of the 1062's NVRAM could be used to track progress in some way? That memory is slow to write, as it is on a slower clock, perhaps 32 MHz.
Example code is here: 1062-MCU-has-16-bytes-of-NVRAM-on-RTC-unit
> status info written to these 4 DWORDs can be printed after a warm restart, or after a COLD restart if there is an RTC battery.
> some more details on that thread

DMAMEM also survives a WARM restart, so it could be used for tracing while the DMA buffers live in DTCM/RAM1 instead of DMAMEM (the hang happens there too). It writes at 150 MHz, but the data has to be flushed from the CACHE to be present after the restart.
> to set up a private DMAMEM storage area, following how PJRC does it should hopefully be enough: github.com/PaulStoffregen/cores/blob/master/teensy4/CrashReport.cpp#L255
> It picks an address and defines a structure at the top of the file, which is zeroed and flushed in the linked code
An address below that used area should survive a warm restart. The struct could skip the CRC: save data there on entry/exit of suspect functions or code, then print it in setup() on startup.
> Paul added a BreadCrumb tracking feature for the next release of Teensyduino in the above file and github.com/PaulStoffregen/cores/blob/master/teensy4/CrashReport.h

DMAMEM offers more space than the RTC NVRAM. Each has its drawbacks, but if one helps narrow down the point of failure (without adding a new fault path), it could be useful.
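A minimal sketch of the DMAMEM trace struct described above, under assumptions: the fixed RAM2 address is left out (pick one below the area CrashReport uses), and a plain global stands in for it so the code runs on a host. The struct carries a magic value instead of a CRC so setup() can tell a warm restart with valid data from a cold boot with garbage. `PersistentTrace`, `kTraceMagic`, `traceMark()` and `traceValidOnBoot()` are hypothetical names; on hardware each write would be followed by `arm_dcache_flush()` on the struct, which is only noted in a comment here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kTraceMagic = 0x54524331;  // arbitrary marker value

struct PersistentTrace {
    std::uint32_t magic;           // kTraceMagic when contents are meaningful
    std::uint32_t lastCheckpoint;  // id of the most recent checkpoint reached
    std::uint32_t hits[8];         // how often each checkpoint id was reached
};

// On a Teensy this would be a pointer to a fixed RAM2 address.
PersistentTrace g_trace;

// Record that checkpoint `id` was reached (entry/exit of a suspect function).
void traceMark(std::uint32_t id) {
    g_trace.lastCheckpoint = id;
    if (id < 8) ++g_trace.hits[id];
    // On hardware: arm_dcache_flush(&g_trace, sizeof(g_trace));
}

// Call first thing in setup(): returns true (and copies the data out for
// printing) if the previous run left a valid trace, then re-arms it.
bool traceValidOnBoot(PersistentTrace* previous) {
    bool valid = (g_trace.magic == kTraceMagic);
    if (valid) *previous = g_trace;
    std::memset(&g_trace, 0, sizeof(g_trace));
    g_trace.magic = kTraceMagic;
    return valid;
}
```

After a hang and a warm restart, `lastCheckpoint` names the last point reached and `hits` shows how often each point ran, which can hint at a runaway interrupt.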
 
Thanks for the tips! BTW it's not even a hard crash, just a full-on freeze. USB dies out at some point too.

Using a lot of Serial prints, I can confirm that the Teensy freezes as soon as the DMA channel is enabled. But because it freezes, it never triggers my DMA error callback, so even if something is off, I would never know.
 
It would be interesting if you tried something with one or both...

Soft/Hard: does anything keep working in the background? An IntervalTimer interrupt toggling a pin? A watchdog timer? I have seen USB die while an _isr() was still running.
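One way to act on that question: a timer interrupt that keeps running while loop() is stuck can detect the stall itself. The sketch below models this on a host: loop() bumps a progress counter, and a periodic tick (an IntervalTimer callback on a Teensy) checks whether the counter moved; after `kStallTicks` ticks without progress it latches a stall flag, where real code might toggle an LED or save a breadcrumb. All names and the threshold are hypothetical. It only helps if the timer interrupt itself still gets CPU time, which is exactly what it would reveal.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kStallTicks = 5;  // ticks without progress => stalled

std::atomic<std::uint32_t> g_loopProgress{0};  // bumped by loop()
std::uint32_t g_lastSeenProgress = 0;          // only touched by the tick
std::uint32_t g_ticksWithoutProgress = 0;
std::atomic<bool> g_stallDetected{false};

// Call once per pass through loop().
void loopHeartbeat() { g_loopProgress.fetch_add(1, std::memory_order_relaxed); }

// Periodic tick, e.g. an IntervalTimer callback on a Teensy.
void watchdogTick() {
    std::uint32_t now = g_loopProgress.load(std::memory_order_relaxed);
    if (now != g_lastSeenProgress) {
        g_lastSeenProgress = now;
        g_ticksWithoutProgress = 0;
        return;
    }
    if (++g_ticksWithoutProgress >= kStallTicks) {
        g_stallDetected.store(true);  // real code: toggle a pin, save a breadcrumb
    }
}
```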
 
I have a few millis() intervals in the main loop:
1. Tells LVGL how much time has passed
2. Calls LVGL task handler (object drawing etc)
3. CAN transmits using standard mailboxes
4. Custom function that updates LVGL objects on the screen with CAN data
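For reference, millis() intervals like these are usually written as non-blocking checks using unsigned subtraction, which stays correct when the 32-bit millis() counter wraps after about 49.7 days. The sketch below takes the current time as a parameter so it runs on a host; in a sketch it would be called as `if (canTx.due(millis())) { ... }`. `Interval` is a hypothetical helper, not part of LVGL or FlexCAN_T4.

```cpp
#include <cassert>
#include <cstdint>

// Non-blocking periodic task timer for a millis()-style 32-bit clock.
struct Interval {
    std::uint32_t periodMs;
    std::uint32_t lastMs;

    // True when at least periodMs have elapsed since the last firing.
    // Unsigned subtraction keeps this correct across the 32-bit wrap.
    bool due(std::uint32_t nowMs) {
        if (nowMs - lastMs >= periodMs) {
            lastMs = nowMs;
            return true;
        }
        return false;
    }
};
```

Setting `lastMs = nowMs` lets the schedule drift by however late loop() checks; `lastMs += periodMs` keeps a fixed cadence instead, at the cost of firing back-to-back after a long stall.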

Interrupts:
1. Display driver DMA interrupt: when triggered, it calls an LVGL function to signal that LVGL can write the next frame buffer to the screen
2. CAN RX callbacks, though the issue occurs regardless of whether they are actually triggered.

Now, if I don't initialize FlexCAN, don't transmit messages, or disable DMA transfers and push screen data out via for loops, the issue does not occur.
But as mentioned before, my friend had this issue as well, except he was using an NRF receiver over SPI rather than CAN.
 
Is it possible some repeating _isr()/DMA is taking too long to run and consuming all CPU cycles? That would make the Teensy appear dead and stop USB output, as well as lower-priority interrupts and loop() code.

The LA pin toggles noted earlier would show that, if only they were an option. Short of that, a DMAMEM logging trace in the style of CrashReport (but not CrashReport itself), recording what is called, when, and how often, might point it out.
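Without spare pins, a cheap check for the "interrupt consuming all CPU cycles" theory is to count handler entries per time window and latch an alarm when the rate is implausibly high. A host-side sketch; `IrqRateMonitor` and its thresholds are hypothetical, and on a Teensy the timestamp could come from micros() or the ARM_DWT_CYCCNT cycle counter, with the latched result copied into one of the breadcrumb areas discussed above.

```cpp
#include <cassert>
#include <cstdint>

// Counts interrupt-handler entries in fixed windows and latches an alarm when
// one window sees more entries than expected. On hardware, members shared
// between the ISR and loop() should be volatile.
class IrqRateMonitor {
public:
    IrqRateMonitor(std::uint32_t windowUs, std::uint32_t maxPerWindow)
        : windowUs_(windowUs), maxPerWindow_(maxPerWindow) {}

    // Call at the top of the ISR with the current time in microseconds.
    void onEntry(std::uint32_t nowUs) {
        if (nowUs - windowStartUs_ >= windowUs_) {  // wrap-safe: start a new window
            windowStartUs_ = nowUs;
            count_ = 0;
        }
        if (++count_ > maxPerWindow_) storm_ = true;
    }

    bool stormDetected() const { return storm_; }

private:
    std::uint32_t windowUs_;
    std::uint32_t maxPerWindow_;
    std::uint32_t windowStartUs_ = 0;
    std::uint32_t count_ = 0;
    bool storm_ = false;  // latched until reset
};
```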
 
I hacked CrashReport to make a StaticTrace that gives access to both of the previously mentioned 'semi-static' storage areas, RTC and RAM2.

Question: when the 'soft crash' happens, can the Teensy be programmed without a button press? Post #9 says "USB dies out at some point too.", which suggests the CPU is just stuck in a busy loop, perhaps 'overwhelmed' by cascading interrupt behavior as noted.

While making StaticTrace, I found that the RTC storage is actually much faster than RAM2, since it avoids writing to that memory and then waiting for the cache flush.

I'll post the code since it is working and self-documented. YMMV as to how well the available storage serves the purpose at hand; perhaps change bits or values as code enters or exits. There is no provision for reading back what is written, though it could/should be added.

As for SPEED: the RTC writes appear to take only 2 CPU cycles (based on writing the cycle counter 3 times), while the same writes to RAM2 appear to take 85 to 100 cycles. So keeping 4 local uint32_t's, updating them, and writing them with StaticTrace.crumbRTC( {1-4}, VALUE ); would be optimal.
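To put those cycle counts in time units at the 398 MHz clock from the first post, a worked conversion (the helper name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

constexpr double kCpuHz = 398e6;  // CPU clock stated at the start of the thread

// Convert a cycle count to nanoseconds at kCpuHz.
constexpr double cyclesToNs(std::uint32_t cycles) { return cycles * 1e9 / kCpuHz; }

// cyclesToNs(2)   is about 5.0 ns   (RTC register write)
// cyclesToNs(100) is about 251 ns   (upper end of the RAM2 write measurement)
```

Either cost is small in loop() code, but inside a frequently firing ISR the RTC registers are the cheaper place for breadcrumbs.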
Code: (attached to the original post)

This is an INO sketch that builds with the StaticTrace files in the same folder.
 
@defragster thanks for the sketch and all the info.
I've been out the past few weeks with Covid (my 2nd time) and family matters, so I'm trying to find time to get back to this issue and start troubleshooting.
Just before Covid I ran about 10-15 tests. The Teensy hung every single time after the DMA channel was enabled and never triggered the attached interrupt. But it didn't do this on the first DMA transfer; it happened about 7-10 seconds after device startup.
From what I recall, I cannot program the Teensy using the Boot button, but I can test this again to verify.
 
Welcome back! Hope the sketch helps - let me know if it seems to, or there are questions.

If you can put a battery on VBAT, those 4 DWORDs will survive reprogramming and even a restart after power loss.

How you might use them to track state and progress ... ???

One each to count entries for the DMA and the _isr()? Another each using bits to track location within each from entry to exit?
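One possible layout along those lines: word 0 counts DMA-ISR entries, word 1 counts entries of the other _isr(), and words 2 and 3 hold one bit per checkpoint reached inside each handler, reset on entry. If bit 31 (used here as "exit") is missing after a restart, the hang happened inside that handler, and the other bits show how far it got. This is a host sketch: `g_crumbs` stands in for the four RTC general-purpose registers (on the i.MX RT1062 these are the SNVS LPGPR registers; check the exact register names in your core headers), and `crumbEnter()`/`crumbStep()` are hypothetical, not the StaticTrace API from the attachment.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the four 32-bit RTC/SNVS general-purpose registers.
volatile std::uint32_t g_crumbs[4];

constexpr std::uint32_t kDmaEntries = 0;  // count of DMA ISR entries
constexpr std::uint32_t kIsrEntries = 1;  // count of other ISR entries
constexpr std::uint32_t kDmaSteps = 2;    // checkpoint bits inside the DMA ISR
constexpr std::uint32_t kIsrSteps = 3;    // checkpoint bits inside the other ISR
constexpr unsigned kExitBit = 31;         // set as the last action of a handler

// Call at handler entry: bump its counter and reset its checkpoint word.
inline void crumbEnter(std::uint32_t countSlot, std::uint32_t stepSlot) {
    g_crumbs[countSlot] = g_crumbs[countSlot] + 1;
    g_crumbs[stepSlot] = 1u;  // bit 0: entered
}

// Set checkpoint bit n (1..31) in stepSlot.
inline void crumbStep(std::uint32_t stepSlot, unsigned n) {
    g_crumbs[stepSlot] = g_crumbs[stepSlot] | (1u << n);
}
```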
 
@defragster I'm going to call this a big waste of your time, and I apologize up front.
I just felt like I had to do more basic testing before using your suggested troubleshooting code and steps:
1. I was able to flash by pressing the boot button after it froze
2. I thought I had lost USB, but my older test board had a bad USB-C connector, so any movement would kill USB comms
3. I found the issue to be a while loop in my callback - link
I replaced the while loop that handled clearing the FlexIO timer flag with a microsecond delay, and that fixed the issue. The original NXP FlexIO display driver example stated that this specific while loop could pose an issue (and here we can see that it did), and that replacing it with a short delay avoids getting stuck in that loop. I didn't think of it since it had worked all along, but as soon as other hardware peripherals (CAN/SPI) came into play, they must have slowed down or sped up something just enough for that while loop's condition to stay true, so the loop never exited.
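The failure mode generalizes: a `while (!(status & FLAG)) {}` spin waits forever if the flag is missed or never set in the expected window. The fix used here was a fixed microsecond delay. A different option, shown only as an alternative technique and not what was done in this thread, is to keep polling but give up after a bounded time and report the timeout, so the caller can log it or restart the transfer instead of freezing. Host-side sketch; `waitForFlag` and its callbacks are hypothetical, and on a Teensy they would read the FlexIO status register and micros().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Poll readStatus() until (status & mask) != 0 or timeoutUs elapses.
// Returns true if the flag was seen, false on timeout.
bool waitForFlag(const std::function<std::uint32_t()>& readStatus,
                 std::uint32_t mask,
                 const std::function<std::uint32_t()>& nowUs,
                 std::uint32_t timeoutUs) {
    const std::uint32_t start = nowUs();
    while ((readStatus() & mask) == 0) {
        if (nowUs() - start >= timeoutUs) {  // wrap-safe elapsed time
            return false;
        }
    }
    return true;
}
```

In firmware the register read would likely be inlined rather than passed through `std::function`; the callbacks are only there so the sketch runs without hardware.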

So it seems to be working now with no apparent freezing; it ran for 20 minutes until I unplugged it.
Thank you for your time put into assisting!
 
Glad you got it working!

No problem, writing the StaticTrace code was a fun exercise. It may come in handy some time.

Bummer you didn't get to use it to see it work, or whether it needed changes. But in this case, a simple sprinkle of Print/Flush calls in the code would have shown where it was stalling.
 