"watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker ...]"

jaggz

New member
Code:
"watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker ...]"

Is it possible Teensyduino could cause this?
I've experienced this system locking up this CPU twice now, in two days, both times during repeated dev and flashing of a Teensy 4. I work on it for hours at a time, so two occurrences is not that strong a coincidence, but it *has* only happened while I was actively coding and flashing the board. Many programs still function, but Arduino and Teensyduino's little window end up completely locked up (and can't be killed), and then some other software begins to fail. With things starting to freeze I can't shut down cleanly and have to cold-reset.

Syslog does not show the "soft lockup" message; instead it shows this log:

Code:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1444 __queue_work.cold.52+0xc/0x35
Modules linked in: ...  nvidia_drm(POE) ... nvidia_modeset(POE) ...
CPU: 3 PID: 0 Comm: swapper/4 Tainted: P           OE     4.19.0-12-amd64 #1 Debian 4.19.152-1
Hardware name: ASUS All Series/Z97-PRO, BIOS 3503 04/18/2018
RIP: 0010:__queue_work.cold.52+0xc/0x35
Code: b6 ff ff 48 c7 c7 50 64 a5 89 c6 05 2f 01 09 01 01 45 31 ed e8 75 35 04 00 e9 51 ba ff ff 48 c7 c7 f8 e2 a3 89 e8 64 35 04 00 <0f> 0b 48 8b 3b c6 07 00 0f 1f 40 00 e9 37 be ff ff 48 c7 c7 98 64
RSP: 0018:ffff9c0f7ed03d68 EFLAGS: 00010046
RAX: 0000000000000024 RBX: ffff9c0f7ed25900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9c0f7ed166b8 RDI: ffff9c0f7ed166b8
RBP: 0000000000000200 R08: 0000000000001003 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000004
R13: ffff9c0f7e818c00 R14: ffffffff89aeb720 R15: ffff9c0cbcf9d790
FS:  0000000000000000(0000) GS:ffff9c0f7ed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f50c84b1000 CR3: 000000081ea0a006 CR4: 00000000001606e0
Call Trace:
 <IRQ>
 queue_work_on+0x34/0x40
 __usb_hcd_giveback_urb+0x84/0x140 [usbcore]
 xhci_giveback_urb_in_irq.isra.40+0x7d/0xf0 [xhci_hcd]
 xhci_td_cleanup+0xfb/0x160 [xhci_hcd]
 xhci_irq+0x627/0x2330 [xhci_hcd]
 ? rt2800usb_txstatus_timeout.isra.9+0xe0/0xe0 [rt2800usb]
 __handle_irq_event_percpu+0x46/0x190
 handle_irq_event_percpu+0x30/0x80
 handle_irq_event+0x3c/0x5c
 handle_edge_irq+0x97/0x1e0
 handle_irq+0x1f/0x30
 do_IRQ+0x49/0xe0
 common_interrupt+0xf/0xf
 </IRQ>
RIP: 0010:cpuidle_enter_state+0xb9/0x320
Code: e8 dc b3 b0 ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 0e a6 b6 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
RSP: 0018:ffffbdf9c31d3e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
RAX: ffff9c0f7ed220c0 RBX: 00007b6ba08e6b3e RCX: 000000000000001f
RDX: 00007b6ba08e6b3e RSI: 000000002004c001 RDI: 0000000000000000
RBP: ffff9c0f7ed2a310 R08: 0000000000000002 R09: 0000000000021980
R10: 0001ed7ba350115c R11: ffff9c0f7ed210a8 R12: 0000000000000004
R13: ffffffff89cb8978 R14: 0000000000000004 R15: 0000000000000000
 do_idle+0x228/0x270
 cpu_startup_entry+0x6f/0x80
 start_secondary+0x1a4/0x200
 secondary_startup_64+0xa4/0xb0
---[ end trace 1fe9bed41dff0c67 ]---
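A rough way to read the trace above is to tally which module each frame belongs to. The Python sketch below does that for the frames in the <IRQ> section, copied from the log. The regular expression and the function name `modules_in_trace` are illustrative assumptions, not part of any kernel tooling. Frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are counted separately.

```python
import re
from collections import Counter

# Frames copied from the <IRQ> section of the log above.
TRACE = """\
 queue_work_on+0x34/0x40
 __usb_hcd_giveback_urb+0x84/0x140 [usbcore]
 xhci_giveback_urb_in_irq.isra.40+0x7d/0xf0 [xhci_hcd]
 xhci_td_cleanup+0xfb/0x160 [xhci_hcd]
 xhci_irq+0x627/0x2330 [xhci_hcd]
 ? rt2800usb_txstatus_timeout.isra.9+0xe0/0xe0 [rt2800usb]
 __handle_irq_event_percpu+0x46/0x190
 handle_irq_event_percpu+0x30/0x80
 handle_irq_event+0x3c/0x5c
 handle_edge_irq+0x97/0x1e0
 handle_irq+0x1f/0x30
 do_IRQ+0x49/0xe0
 common_interrupt+0xf/0xf
"""

# Hypothetical pattern for one frame: optional "?", symbol+offset/size, optional [module].
FRAME = re.compile(
    r"^\s*(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)

def modules_in_trace(text):
    """Count frames per module; '?' frames go into a separate tally."""
    reliable, unreliable = Counter(), Counter()
    for line in text.splitlines():
        match = FRAME.match(line)
        if not match:
            continue
        uncertain, _symbol, module = match.groups()
        tally = unreliable if uncertain else reliable
        # Frames without a [module] suffix belong to the core kernel image.
        tally[module or "(built-in)"] += 1
    return reliable, unreliable

reliable, unreliable = modules_in_trace(TRACE)
print("reliable frames:  ", dict(reliable))
print("unreliable frames:", dict(unreliable))
```

For this trace the reliable frames fall in `xhci_hcd`, `usbcore`, and built-in interrupt handling, while `rt2800usb` appears only as a `?` frame and no Nvidia module appears at all. That points toward the USB host controller path, though it does not by itself identify a cause.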
 
Sorry, I am probably not much help on this one: I am guessing it is a Linux setup, and I don't know enough of the internals.

But it might help others who may be able to help more to know some additional information, like what type of machine and which OS.
From the syslog in your message I am assuming 64-bit Linux.

Likewise, which versions of Arduino and Teensyduino? Or are you using some other build setup?
 
On Win 10 a build yesterday showed a register dump compile failure.

I did not read (beyond seeing the compile fail and the register spew) or save the compile output; I just hit "F7" in Sublime to recompile, and then it worked.

This is a first AFAIK in the years here. May have just been a disk I/O error?

> but OP error is not during compile
 
Again from the message above:
Code:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1444 __queue_work.cold.52+0xc/0x35
Modules linked in: ...  nvidia_drm(POE) ... nvidia_modeset(POE) ...
CPU: 3 PID: 0 Comm: swapper/4 Tainted: P           OE     4.19.0-12-amd64 #1 Debian 4.19.152-1
I am somewhat sure that it is a form of Linux...
But a Mac might do this as well? macOS is also Unix-like underneath, though it is not Linux.

If it is a Mac, maybe it ran out of memory in the Terminal monitor? The new beta does mention a fix: Serial Monitor fix memory leak on MacOS

But reading the dump information more, it looks like a PC running some form of Linux.
Code:
Hardware name: ASUS All Series/Z97-PRO

Also guessing something to do with display: Modules linked in: ... nvidia_drm(POE) ... nvidia_modeset(POE) ...
 
Details update [sorry]

Okay, one would think I would know better by now.
This is a Debian Stable (Buster) system, using the backports repository to get the latest available updates.
I snipped down the list of modules from the log because it was several lines, each of 1000+ chars. I realize now that leaving the official non-free Nvidia module in there was misleading; my intent was just to inform that I *am* using the "outside" Nvidia GPU drivers (which probably do not relate to the problem).
It is, as shown in the logs, an ASUS Z97-Pro motherboard, with an Intel i7-4790K (4 core, 8 thread) CPU.
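For reference, the `Tainted: P OE` field in the log and the `(POE)` suffix on the Nvidia modules are the kernel's taint letters. The sketch below decodes only the three letters that appear in this log, with meanings as given in the kernel's tainted-kernels documentation; the function name `decode_taint` and the token-based parsing are illustrative assumptions. A taint flag records that such a module was loaded; it does not mean the module caused the warning.

```python
# Only the letters seen in this log are listed; other taint letters exist.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "O": "externally-built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
}

# The "CPU:" line from the log above.
LINE = ("CPU: 3 PID: 0 Comm: swapper/4 Tainted: P           OE     "
        "4.19.0-12-amd64 #1 Debian 4.19.152-1")

def decode_taint(line):
    """Return (letter, meaning) pairs for the taint letters on a 'Tainted:' line."""
    _, found, rest = line.partition("Tainted:")
    if not found:
        return []
    letters = []
    for token in rest.split():
        # Taint letters are uppercase; the kernel release string ends the field.
        if not (token.isalpha() and token.isupper()):
            break
        letters.extend(token)
    return [(c, TAINT_FLAGS.get(c, "not covered in this sketch")) for c in letters]

for letter, meaning in decode_taint(LINE):
    print(f"{letter}: {meaning}")
```

All three flags here are consistent with the out-of-tree Nvidia driver being loaded, which matches the description above.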

Sorry to make you guys have to guess at that. That was irresponsible of me!
 