Loader crashes, Monitor closed, Linux

Status
Not open for further replies.
Sounds great!

Also, for info, there are two directories on Github
https://github.com/boblark/TestLoadT
that are the Arduino outputs, .eep, .hex, .elf, .sym and .lst:
CompiledFiles_0_OK T4.0 with begin() in setup()
CompiledFiles_1_NG T4.0 with begin() in constructor
The latter directory duplicates some of the ZIP, but does not have anything for T3.6.
 
FWIW, and so that Bob doesn't think he's crazy, I am having a very similar issue.
OS: Ubuntu 18.04.03
Kernel: 4.15.0-112-generic
System: Lenovo P50, i7-6820HQ
Teensyduino: 1.53
Arduino: 1.18.12
Affected hardware: Teensy 4.1, custom carrier PCB with Magjack

Observations:
1. Same exact symptoms w.r.t. Arduino IDE and Teensy Loader as originally described by Bob.
2. Very similar kernel log to Bob (posted below).
3. I actively use many types of Teensy's and have for years on this exact system and setup. The issue only occurs with T4.1 for me (I don't have a T4.0).
4. The issue happens with INO/loads of all types, so I'm not sure it's app or library-specific. I am using NativeEthernet/FNET in most/all recipes.
5. The issue happens spuriously, and when it occurs I need to reboot, but after I reboot I can flash that same image successfully. The only way to reproduce the bug seems to be plugging/unplugging the T4.1, and after a random number of times it will occur. I would SWAG the probability of occurrence around 1/20.

Opinion: I'm not a Linux/kernel/USB expert, but this does have the feeling of a kernel ACM bug, and specifically something related to timing/race condition. My uninformed guess is that this has less to do with the T4.1 itself and more to do with the IMXRT USB speed/timing that may pair unfortunately well with a kernel-side bug.

Kernel log is below, beginning of the log shows some successful plug/load/unplugs, exciting stuff at the end.

Code:
[ 1023.058052] usb 1-2: USB disconnect, device number 53
[ 1023.363692] usb 1-2: new high-speed USB device number 54 using xhci_hcd
[ 1023.512704] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1023.512708] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1023.512711] usb 1-2: Product: USB Serial
[ 1023.512713] usb 1-2: Manufacturer: Teensyduino
[ 1023.512715] usb 1-2: SerialNumber: 7652530
[ 1023.513352] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1029.137552] usb 1-2: USB disconnect, device number 54
[ 1029.443664] usb 1-2: new high-speed USB device number 55 using xhci_hcd
[ 1029.592488] usb 1-2: New USB device found, idVendor=16c0, idProduct=0478
[ 1029.592490] usb 1-2: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 1029.592491] usb 1-2: SerialNumber: 000BAD45
[ 1029.593493] hid-generic 0003:16C0:0478.000E: hidraw3: USB HID v1.11 Device [HID 16c0:0478] on usb-0000:00:14.0-2/input0
[ 1030.466703] usb 1-2: USB disconnect, device number 55
[ 1030.771691] usb 1-2: new high-speed USB device number 56 using xhci_hcd
[ 1030.920288] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1030.920291] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1030.920293] usb 1-2: Product: USB Serial
[ 1030.920295] usb 1-2: Manufacturer: Teensyduino
[ 1030.920296] usb 1-2: SerialNumber: 7652530
[ 1030.920910] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1243.292603] usb 1-2: USB disconnect, device number 56
[ 1243.603662] usb 1-2: new high-speed USB device number 57 using xhci_hcd
[ 1243.752574] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1243.752578] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1243.752581] usb 1-2: Product: USB Serial
[ 1243.752583] usb 1-2: Manufacturer: Teensyduino
[ 1243.752585] usb 1-2: SerialNumber: 7652530
[ 1243.753244] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1251.186166] usb 1-2: USB disconnect, device number 57
[ 1251.491877] usb 1-2: new high-speed USB device number 58 using xhci_hcd
[ 1251.640579] usb 1-2: New USB device found, idVendor=16c0, idProduct=0478
[ 1251.640582] usb 1-2: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 1251.640584] usb 1-2: SerialNumber: 000BAD45
[ 1251.641647] hid-generic 0003:16C0:0478.000F: hidraw3: USB HID v1.11 Device [HID 16c0:0478] on usb-0000:00:14.0-2/input0
[ 1252.456342] usb 1-2: USB disconnect, device number 58
[ 1252.767696] usb 1-2: new high-speed USB device number 59 using xhci_hcd
[ 1252.916796] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1252.916800] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1252.916803] usb 1-2: Product: USB Serial
[ 1252.916806] usb 1-2: Manufacturer: Teensyduino
[ 1252.916808] usb 1-2: SerialNumber: 7652530
[ 1252.917544] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1342.511803] usb 1-2: USB disconnect, device number 59
[ 1342.819876] usb 1-2: new high-speed USB device number 60 using xhci_hcd
[ 1342.968695] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1342.968698] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1342.968700] usb 1-2: Product: USB Serial
[ 1342.968701] usb 1-2: Manufacturer: Teensyduino
[ 1342.968703] usb 1-2: SerialNumber: 7652530
[ 1342.969268] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1346.102181] usb 1-2: USB disconnect, device number 60
[ 1346.407887] usb 1-2: new high-speed USB device number 61 using xhci_hcd
[ 1346.556413] usb 1-2: New USB device found, idVendor=16c0, idProduct=0478
[ 1346.556415] usb 1-2: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 1346.556416] usb 1-2: SerialNumber: 000BAD45
[ 1346.557276] hid-generic 0003:16C0:0478.0010: hidraw3: USB HID v1.11 Device [HID 16c0:0478] on usb-0000:00:14.0-2/input0
[ 1347.451222] usb 1-2: USB disconnect, device number 61
[ 1347.759652] usb 1-2: new high-speed USB device number 62 using xhci_hcd
[ 1347.908676] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1347.908679] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1347.908681] usb 1-2: Product: USB Serial
[ 1347.908682] usb 1-2: Manufacturer: Teensyduino
[ 1347.908684] usb 1-2: SerialNumber: 7652530
[ 1347.909290] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1486.663739] usb 1-2: USB disconnect, device number 62
[ 1486.971879] usb 1-2: new high-speed USB device number 63 using xhci_hcd
[ 1487.120579] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1487.120581] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1487.120582] usb 1-2: Product: USB Serial
[ 1487.120583] usb 1-2: Manufacturer: Teensyduino
[ 1487.120584] usb 1-2: SerialNumber: 7652530
[ 1487.121071] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1490.749554] usb 1-2: USB disconnect, device number 63
[ 1491.055671] usb 1-2: new high-speed USB device number 64 using xhci_hcd
[ 1491.204414] usb 1-2: New USB device found, idVendor=16c0, idProduct=0478
[ 1491.204415] usb 1-2: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 1491.204416] usb 1-2: SerialNumber: 000BAD45
[ 1491.205341] hid-generic 0003:16C0:0478.0011: hidraw3: USB HID v1.11 Device [HID 16c0:0478] on usb-0000:00:14.0-2/input0
[ 1492.079530] usb 1-2: USB disconnect, device number 64
[ 1492.387668] usb 1-2: new high-speed USB device number 65 using xhci_hcd
[ 1492.536651] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[ 1492.536656] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1492.536659] usb 1-2: Product: USB Serial
[ 1492.536662] usb 1-2: Manufacturer: Teensyduino
[ 1492.536664] usb 1-2: SerialNumber: 7652530
[ 1492.537436] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[ 1620.372164] cdc_acm 1-2:1.0: failed to set dtr/rts
[ 1620.372549] cdc_acm 1-2:1.0: failed to set dtr/rts
[ 1620.895890] BUG: unable to handle kernel paging request at 0000000403ce40b8
[ 1620.895904] IP: usb_submit_urb+0xe/0x5b0
[ 1620.895907] PGD 0 P4D 0 
[ 1620.895912] Oops: 0000 [#1] SMP PTI
[ 1620.895915] Modules linked in: cdc_acm thunderbolt bnep binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nvidia_uvm(OE) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic arc4 nvidia(POE) intel_wmi_thunderbolt wmi_bmof iwlmvm kvm irqbypass mac80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel uvcvideo videobuf2_vmalloc videobuf2_memops aes_x86_64 snd_seq_midi videobuf2_v4l2 crypto_simd snd_hda_intel glue_helper snd_seq_midi_event videobuf2_core cryptd btusb btrtl snd_hda_codec btbcm iwlwifi btintel snd_hda_core intel_cstate i915 intel_rapl_perf videodev bluetooth snd_hwdep thinkpad_acpi serio_raw joydev input_leds snd_rawmidi rtsx_pci_ms nvram drm_kms_helper mei_me cfg80211 memstick media ecdh_generic
[ 1620.895953]  snd_seq mei ipmi_devintf drm snd_pcm ipmi_msghandler intel_pch_thermal i2c_algo_bit snd_seq_device fb_sys_fops snd_timer syscopyarea sysfillrect wmi sysimgblt snd shpchp mac_hid soundcore video sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid rtsx_pci_sdmmc e1000e psmouse ptp nvme pps_core rtsx_pci ahci nvme_core libahci
[ 1620.895979] CPU: 1 PID: 60 Comm: kworker/1:1 Tainted: P           OE    4.15.0-112-generic #113-Ubuntu
[ 1620.895982] Hardware name: LENOVO 20EN001FUS/20EN001FUS, BIOS N1EET86W (1.59 ) 08/28/2019
[ 1620.895989] Workqueue: events acm_softint [cdc_acm]
[ 1620.895995] RIP: 0010:usb_submit_urb+0xe/0x5b0
[ 1620.895997] RSP: 0018:ffffc03041b1fe20 EFLAGS: 00010206
[ 1620.896000] RAX: 0000000000000000 RBX: ffff9d7bea2dc020 RCX: 0000000000000000
[ 1620.896003] RDX: 0000000001400000 RSI: 0000000001400000 RDI: 0000000403ce4000
[ 1620.896005] RBP: ffffc03041b1fe48 R08: 0000000000000000 R09: ffff9d7c05148230
[ 1620.896007] R10: 0000000000000400 R11: ffff9d7c056ce110 R12: 000000000000000e
[ 1620.896009] R13: 000000000000000e R14: ffff9d7bea2dc020 R15: 0000000000000000
[ 1620.896012] FS:  0000000000000000(0000) GS:ffff9d7c1f440000(0000) knlGS:0000000000000000
[ 1620.896015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1620.896017] CR2: 0000000403ce40b8 CR3: 0000000156a0a005 CR4: 00000000003606e0
[ 1620.896020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1620.896022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1620.896024] Call Trace:
[ 1620.896030]  ? acm_submit_read_urb+0x42/0x90 [cdc_acm]
[ 1620.896035]  acm_softint+0x6b/0xe0 [cdc_acm]
[ 1620.896041]  process_one_work+0x1de/0x420
[ 1620.896046]  worker_thread+0x32/0x410
[ 1620.896049]  kthread+0x121/0x140
[ 1620.896053]  ? process_one_work+0x420/0x420
[ 1620.896057]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1620.896063]  ret_from_fork+0x35/0x40
[ 1620.896066] Code: f3 c3 48 83 c7 10 31 c9 ba 01 00 00 00 be 03 00 00 00 e8 d6 74 99 ff 5d c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 98 05 00 00 <48> 83 bf b8 00 00 00 00 0f 84 8a 05 00 00 55 48 89 e5 41 57 41 
[ 1620.896098] RIP: usb_submit_urb+0xe/0x5b0 RSP: ffffc03041b1fe20
[ 1620.896100] CR2: 0000000403ce40b8
[ 1620.896103] ---[ end trace 3c85607b72e43004 ]---
[ 1631.745491] usb 1-2: USB disconnect, device number 65
 
@buddym - 1.8.12 had a short life for issues it caused and IDE 1.8.13 popped out quickly. Problems may not relate to Ubuntu - or the problem at hand ... but ...

It might be good to upgrade to 1.8.13 and install the TD 1.53 for a better baseline.
 
FWIW, I went to IDE 1.8.13 and TD 1.53 a week or so ago and did not observe any change. My Linux kernel is 5.3.0-62, but again the change from 4.1.15 did not seem to change things.
 
I upgraded to 1.8.13, and was still able to repro the issue. I did a two experiments a bit more carefully in case the data is helpful.

Experiment 1:
1. Fresh reboot, do NOT start Arduino or Teensy loader
2. Plug T4.1 in, check dmesg, wait 5 seconds, unplug T4.1, check dmesg
3. Repeat 20 times for each USB port on my laptop.

Results was normal connects and disconnects for every trial.

Experiment 2:
1. Fresh reboot, start Arduino IDE
2. Plug in T4.1, check dmesg
3. Flash program onto T4.1, check dmesg
4. Unplug T4.1, check dmesg
5. Repeat from step 2

Trials 1 through 10 resulted in normal behavior. On trial 11:
* Plugged in T4.1, normal dmesg kernel log
* Clicked Upload in Arduino IDE
* Arduino IDE stalled for a few seconds, and produced the message:
Code:
Teensy did not respond to a USB-based request to enter program mode.
Please press the PROGRAM MODE BUTTON on your Teensy to upload your sketch.
An error occurred while uploading the sketch

* Checked dmesg log, and saw this:
Code:
[28057.022289] usb 1-2: new high-speed USB device number 44 using xhci_hcd
[28057.171404] usb 1-2: New USB device found, idVendor=16c0, idProduct=0483
[28057.171409] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[28057.171412] usb 1-2: Product: USB Serial
[28057.171415] usb 1-2: Manufacturer: Teensyduino
[28057.171417] usb 1-2: SerialNumber: 7652530
[28057.172260] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[28066.882586] cdc_acm 1-2:1.0: failed to set dtr/rts
[28066.882986] cdc_acm 1-2:1.0: failed to set dtr/rts

* Pressed the Program button on T4.1
* Checked dmesg log, and saw this:
Code:
[28078.934804] usb 1-2: USB disconnect, device number 44
[28078.934894] ------------[ cut here ]------------
[28078.934897] kobject: '' (00000000e6636f70): is not initialized, yet kobject_get() is being called.
[28078.934926] WARNING: CPU: 2 PID: 24784 at /build/linux-WiQfz7/linux-4.15.0/lib/kobject.c:614 kobject_get+0x38/0x50
[28078.934927] Modules linked in: ccm cdc_acm thunderbolt bnep snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl binfmt_misc snd_hda_codec_generic nvidia_uvm(OE) x86_pkg_temp_thermal intel_powerclamp nvidia_drm(POE) coretemp arc4 nvidia_modeset(POE) nls_iso8859_1 nvidia(POE) intel_wmi_thunderbolt wmi_bmof kvm irqbypass crct10dif_pclmul crc32_pclmul iwlmvm ghash_clmulni_intel pcbc mac80211 i915 aesni_intel snd_seq_midi snd_hda_intel snd_seq_midi_event snd_hda_codec aes_x86_64 crypto_simd glue_helper snd_hda_core cryptd uvcvideo snd_hwdep intel_cstate iwlwifi videobuf2_vmalloc btusb videobuf2_memops btrtl thinkpad_acpi intel_rapl_perf videobuf2_v4l2 snd_pcm videobuf2_core btbcm serio_raw rtsx_pci_ms btintel snd_rawmidi joydev input_leds drm_kms_helper videodev nvram memstick cfg80211 media snd_seq bluetooth
[28078.934976]  drm snd_seq_device ipmi_devintf snd_timer ipmi_msghandler i2c_algo_bit fb_sys_fops ecdh_generic mei_me syscopyarea snd sysfillrect mei intel_pch_thermal sysimgblt shpchp soundcore wmi video mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid rtsx_pci_sdmmc e1000e psmouse ptp nvme pps_core rtsx_pci nvme_core ahci libahci
[28078.935006] CPU: 2 PID: 24784 Comm: kworker/2:2 Tainted: P           OE    4.15.0-112-generic #113-Ubuntu
[28078.935008] Hardware name: LENOVO 20EN001FUS/20EN001FUS, BIOS N1EET86W (1.59 ) 08/28/2019
[28078.935015] Workqueue: usb_hub_wq hub_event
[28078.935020] RIP: 0010:kobject_get+0x38/0x50
[28078.935022] RSP: 0018:ffffb3fb04f77a68 EFLAGS: 00010082
[28078.935025] RAX: 0000000000000000 RBX: ffff962a7aa10090 RCX: 0000000000000006
[28078.935027] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff962b5f496490
[28078.935029] RBP: ffffb3fb04f77a70 R08: 000000000000095e R09: 0000000000000004
[28078.935031] R10: ffffb3fb04f77bd8 R11: 0000000000000001 R12: 0000000000000292
[28078.935033] R13: ffff962a7aa10090 R14: 00000000fffffffe R15: ffff96291a52f440
[28078.935036] FS:  0000000000000000(0000) GS:ffff962b5f480000(0000) knlGS:0000000000000000
[28078.935038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28078.935040] CR2: 0000093a13eb8000 CR3: 000000049160a001 CR4: 00000000003606e0
[28078.935042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28078.935044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[28078.935045] Call Trace:
[28078.935053]  get_device+0x17/0x20
[28078.935057]  usb_get_dev+0x1e/0x30
[28078.935062]  usb_hcd_unlink_urb+0x5d/0xf0
[28078.935067]  ? __kernfs_remove.part.12+0x101/0x1f0
[28078.935072]  usb_kill_urb+0x4b/0xc0
[28078.935080]  acm_kill_urbs+0x59/0x70 [cdc_acm]
[28078.935084]  acm_disconnect+0x103/0x200 [cdc_acm]
[28078.935089]  usb_unbind_interface+0x7e/0x270
[28078.935095]  device_release_driver_internal+0x13a/0x220
[28078.935099]  device_release_driver+0x12/0x20
[28078.935103]  bus_remove_device+0xec/0x160
[28078.935107]  device_del+0x169/0x3b0
[28078.935112]  ? usb_remove_ep_devs+0x1f/0x30
[28078.935118]  usb_disable_device+0x9f/0x250
[28078.935122]  usb_disconnect+0xc6/0x270
[28078.935126]  hub_port_connect+0x7d/0xa30
[28078.935131]  hub_event+0x85e/0xb20
[28078.935137]  ? __switch_to_asm+0x35/0x70
[28078.935141]  ? __switch_to_asm+0x41/0x70
[28078.935145]  ? __switch_to_asm+0x35/0x70
[28078.935152]  process_one_work+0x1de/0x420
[28078.935157]  worker_thread+0x32/0x410
[28078.935161]  kthread+0x121/0x140
[28078.935164]  ? process_one_work+0x420/0x420
[28078.935168]  ? kthread_create_worker_on_cpu+0x70/0x70
[28078.935172]  ret_from_fork+0x35/0x40
[28078.935175] Code: 83 ec 08 f6 47 3c 01 74 0c f0 ff 40 38 0f 88 ed 3f 01 00 c9 c3 48 8b 37 48 89 fa 48 89 7d f8 48 c7 c7 c0 66 fa 84 e8 18 fd 6e ff <0f> 0b 48 8b 45 f8 eb d6 f3 c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 
[28078.935214] ---[ end trace 5744e350639f1c87 ]---
[28078.935222] BUG: unable to handle kernel paging request at 000000010995e10f
[28078.935233] IP: unlink1+0x25/0x110
[28078.935236] PGD 0 P4D 0 
[28078.935241] Oops: 0000 [#1] SMP PTI
[28078.935245] Modules linked in: ccm cdc_acm thunderbolt bnep snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl binfmt_misc snd_hda_codec_generic nvidia_uvm(OE) x86_pkg_temp_thermal intel_powerclamp nvidia_drm(POE) coretemp arc4 nvidia_modeset(POE) nls_iso8859_1 nvidia(POE) intel_wmi_thunderbolt wmi_bmof kvm irqbypass crct10dif_pclmul crc32_pclmul iwlmvm ghash_clmulni_intel pcbc mac80211 i915 aesni_intel snd_seq_midi snd_hda_intel snd_seq_midi_event snd_hda_codec aes_x86_64 crypto_simd glue_helper snd_hda_core cryptd uvcvideo snd_hwdep intel_cstate iwlwifi videobuf2_vmalloc btusb videobuf2_memops btrtl thinkpad_acpi intel_rapl_perf videobuf2_v4l2 snd_pcm videobuf2_core btbcm serio_raw rtsx_pci_ms btintel snd_rawmidi joydev input_leds drm_kms_helper videodev nvram memstick cfg80211 media snd_seq bluetooth
[28078.935281]  drm snd_seq_device ipmi_devintf snd_timer ipmi_msghandler i2c_algo_bit fb_sys_fops ecdh_generic mei_me syscopyarea snd sysfillrect mei intel_pch_thermal sysimgblt shpchp soundcore wmi video mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid rtsx_pci_sdmmc e1000e psmouse ptp nvme pps_core rtsx_pci nvme_core ahci libahci
[28078.935305] CPU: 2 PID: 24784 Comm: kworker/2:2 Tainted: P        W  OE    4.15.0-112-generic #113-Ubuntu
[28078.935308] Hardware name: LENOVO 20EN001FUS/20EN001FUS, BIOS N1EET86W (1.59 ) 08/28/2019
[28078.935313] Workqueue: usb_hub_wq hub_event
[28078.935319] RIP: 0010:unlink1+0x25/0x110
[28078.935322] RSP: 0018:ffffb3fb04f77a70 EFLAGS: 00010206
[28078.935326] RAX: ffff962a7aa10090 RBX: ffff962a7aa10040 RCX: 0000000000000006
[28078.935329] RDX: 00000000fffffffe RSI: ffff962a7aa10040 RDI: 000000010995dfff
[28078.935332] RBP: ffffb3fb04f77a98 R08: 000000000000095e R09: 0000000000000004
[28078.935335] R10: ffffb3fb04f77bd8 R11: 0000000000000001 R12: 000000010995dfff
[28078.935338] R13: ffff962a7aa10090 R14: 00000000fffffffe R15: ffff96291a52f440
[28078.935341] FS:  0000000000000000(0000) GS:ffff962b5f480000(0000) knlGS:0000000000000000
[28078.935345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28078.935347] CR2: 000000010995e10f CR3: 000000049160a001 CR4: 00000000003606e0
[28078.935350] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28078.935353] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[28078.935355] Call Trace:
[28078.935362]  usb_hcd_unlink_urb+0x7f/0xf0
[28078.935366]  ? __kernfs_remove.part.12+0x101/0x1f0
[28078.935372]  usb_kill_urb+0x4b/0xc0
[28078.935378]  acm_kill_urbs+0x59/0x70 [cdc_acm]
[28078.935384]  acm_disconnect+0x103/0x200 [cdc_acm]
[28078.935389]  usb_unbind_interface+0x7e/0x270
[28078.935396]  device_release_driver_internal+0x13a/0x220
[28078.935401]  device_release_driver+0x12/0x20
[28078.935406]  bus_remove_device+0xec/0x160
[28078.935411]  device_del+0x169/0x3b0
[28078.935416]  ? usb_remove_ep_devs+0x1f/0x30
[28078.935422]  usb_disable_device+0x9f/0x250
[28078.935428]  usb_disconnect+0xc6/0x270
[28078.935433]  hub_port_connect+0x7d/0xa30
[28078.935439]  hub_event+0x85e/0xb20
[28078.935445]  ? __switch_to_asm+0x35/0x70
[28078.935450]  ? __switch_to_asm+0x41/0x70
[28078.935455]  ? __switch_to_asm+0x35/0x70
[28078.935462]  process_one_work+0x1de/0x420
[28078.935467]  worker_thread+0x32/0x410
[28078.935471]  kthread+0x121/0x140
[28078.935476]  ? process_one_work+0x420/0x420
[28078.935480]  ? kthread_create_worker_on_cpu+0x70/0x70
[28078.935486]  ret_from_fork+0x35/0x40
[28078.935490] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fc 48 89 f3 48 83 ec 08 48 8b 46 48 48 83 78 38 00 74 23 <48> 8b 87 10 01 00 00 48 8b 40 68 e8 2b 67 4c 00 41 89 c5 48 83 
[28078.935543] RIP: unlink1+0x25/0x110 RSP: ffffb3fb04f77a70
[28078.935545] CR2: 000000010995e10f
[28078.935550] ---[ end trace 5744e350639f1c88 ]---

* Unplugged T4.1
*
Code:
/dev/ttyACM0
appears to still exist
* Plugged in T4.1, no additional dmesg log
* Click Upload in Arduino IDE, progress stuck at "Compiling Sketch..." about 10% on the status bar, IDE and Teensy Loader unresponsive when I try to close the windows
* Can only kill via
Code:
sudo pkill -f teensy*
and
Code:
sudo pkill -f java
 
Thanks, buddym for the experiments. I have to admit it helps to see that you also see the issue :)

I have been busy this last week on the Audio Vector Network Analyzer project, adding more capability. This is with T3.6 and involves the usual collection of compile and loads, sometimes quite a few. This was without without any usb issue when, out of the blue, the IDE hung, lost all response and produced an Oops that required a boot of Linux. After the reboot, the same code compiled and ran without a problem. The syslog entries (before reboot) were the same sort of usb activity along with "watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:3985]"

This is not repeatable like the T4.0 hex files, but seems to have a family resemblance.

Bob
 
It's the same kernel use-after-free bug. When the Teensy resets, some of the URB sent from the host to the Teensy need to be canceled. (acm_kill_urbs() in the oops, in this case.)
Like I linked to above, this case is being talked about among the Linux kernel USB stack developers here, but they seem to think the issue does not hit normal users at all, because nobody has forwarded such oopses to the linux-usb mailing list.

Based on Bug #208843, I don't think Teensy is the only one that triggers it; SparkFun Red-V seems to do it too.

It is possible this is related, but the oopses look different.

There are a number of suggested kernel patches that may solve this: this and this in particular.

I apologise for not being able to help with this directly, by contacting the Linux USB devs, sorry. I do know this stuff, but I am only a broken husk of a man (burnout, severe depression).
 
Many thanks for the info, Nominal Animal. The URB race condition description sounds like it. This would seem like the key to getting this under control. Great! No apologies needed, your insights are most valuable!

How might we submit reports? I get little dialog boxes upon "oops" that ask for crash log submission. I do this sometimes, but I have no feeling that they do anything.

OK, assuming the proposed temporary fix would solve the issue, it will still take time to work its way to us. For people like me, working at the Teensy code level, it would really help to have a work-around that might reliably prevent the issue. Paul, might some change in loader timing or maybe the order of USB calls around re-boot time, or whatever, hide this bug? Or anything else that I could do as a reliable work around?
 
As far as I can tell, we hit the kernel bug when Teensy resets, and there is still data begin sent to the Teensy. If we could ensure the reset packet is the last one we try to send, immediately closing the character device after sending it, we might be able to avoid the bug.

However, I happen to have an Ubuntu 18.04 LTS machine and a Teensy 4.0 I can try to reproduce the issue with (I don't have a 3.5 or 3.6), and if I can reproduce the issue, I can build a custom x86-64 Ubuntu kernel to see if they get rid of the race condition. If it does, we can report it to both Ubuntu kernel maintainers and the Linux USB maintainers.

As to reporting kernel bugs directly to kernel developers, the kernelnewbies.org/FoundBug page outlines the general procedure. (Simon Tatham's old How to Report Bugs Effectively is also a good read, for generally reporting bugs.)

In this case, we need the crash report. If you can run dmesg > /tmp/dmesg.txt immediately after, it too will help; you can anonymize (replace identifying information with asterisks or [EDITED OUT] or something similar), but ensure you have data for a second or so before the actual crash, and for a second or so after, because the context is useful. We also need to know which exact kernel you are running; run uname -a for the full name and architecture (although uname -rvmp is sufficient). I do believe you are running Ubuntu kernel linux-image-4.15.0-108-generic on x86-64. (I'm running 5.4.0.42-46~18.04.3 after my next boot; I'll see if I can temporarily downgrade.)

(Next, we should look at the oops report RIP: line, here usb_kill_urb+0x3a/0xc0, and use e.g. elixir.bootlin.com to find the definition of that function, usb_kill_urb, in your particular kernel. It happens to be in drivers/usb/core/urb.c. Then, we download the linux sources, and in the linux directory in it, run perl scripts/get_maintainer.pl -f drivers/usb/core/urb.c to find out who to contact. In this case, doing this to usb_kill_urb tells us to send email to Greg KH, linux-usb, and linux-kernel; and doing this to acm_kill_urbs adds Oliver Neukum (maintainer of the USB ACM driver) to the list. I've included these in the message prototype below.)

Optionally, you should subscribe to the linux-usb mailing list for now, so you don't miss any responses.

Send an email describing the issue, including the dmesg and the crash report as attachments. I would recommend describing the problem something like the following – but please use your own words and understanding. These are technical people, and interested in the work product, so best be concise but ensure you provide enough information. (Here, the dmesg oops is crucial, I believe – but I could obviously be wrong, too, so be prepared for follow-up questions and suggestions.)



To: linux-usb@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Oliver Neukum <oneukum@suse.com>, linux-kernel@vger.kernel.org
Subject: Oops when USB device resets - acm_kill_urbs()/usb_kill_urb()

Hello!

My Ubuntu 18.04 LTS system tends to Oops whenever I upload a new firmware to a Teensy 3.6 microcontroller with a native high speed USB 2.0 interface. This upload process involves the microcontroller rebooting itself, causing the USB device to disconnect. It looks like there are a few URBs in flight when this happens, and an oops happens in acm_kill_urbs()/usb_kill_urb().

I was told that here on linux-usb mailing list, something related has been discussed in the recent weeks, so I was hoping the attached oops and crash report helps. I am not very kernel-savvy, and not sure if I can compiler and run the newest vanilla kernel, but if there is something perhaps simpler I can help you with, I'd be happy to try to.

Best Regards,
MySignature



When dealing with free/open source developers, especially Linux kernel or any GNU project developers, it is important to remember that the core currency in FOSS is time and effort, not money. If you can present the case that here is a problem that hits me, and probably lots of others too, and would like to help fix it – especially trying to provide any information the developers might need to fix this! –, and are willing to try whatever is within your skill range, the response from the developers should be positive.

Unfortunately, sometimes even an obviously correct patch gets completely ignored. (In my case, that triggers my core issues, which is why I'm so hesitant to try and be more proactive here; hits me in a very soft spot, you see.) One should remember that these lists and people deal with a LOT of email, and sometimes things fall through the cracks. In my case, I should have sent those patches to Kernel Janitors instead.
 
Thanks again Nominal Animal, for the insights. As a general thing, I would like to learn more about the control process for kernel related software. I shall study the material you put together. I shall be away for a few days, so it will wait until then.

Also, as last this was left, Paul was going to investigate the crash issue. He has knowledge of both sides of the USB communications in detail. It is probably best to wait for Paul to get a chance to explore this, as that may get the best understanding.

Bob
 
I'm just glad to be of help. I've got another project going on – Teensy 4.0 and 4.1 interactive pinouts in a browser – so I'll concentrate on that for now. If you'd like me to try and find the bug and test the patches, drop me a note (email is best; my address is on my home page). :)
 
Have either of you used TyCommander? Seems Linux users get to self build it : github.com/Koromix/tytools/releases

It integrates a loader replacement. It doesn't push the RTC value to the Teensy {as that is yet unpublished?} - but otherwise performs HEX uploads to a Teensy.

If it fails in the same way that would tell something - if it just works without failing - that could be a temp workaround and be interesting ...
 
defragster, good idea, but a similar, but better result for me. I built the tycmd without a hiccup, and it seems to be working fine. For the good case with T3.6
Code:
$ tycmd upload ~/TL36_TestOutput_float.ino.hex
       upload@3052850-Teensy  Uploading to board '3052850-Teensy' (Teensy 3.6)
       upload@3052850-Teensy  Triggering board reboot
       upload@3052850-Teensy  Firmware: TL36_TestOutput_float.ino.hex
       upload@3052850-Teensy  Flash usage: 24 kiB (2.3%)
       upload@3052850-Teensy  Uploading... 100%
       upload@3052850-Teensy  Sending reset command
bob@bob-desktop:~$
and for T4.0
Code:
$ tycmd upload ~/TL40_TestOutput_float.ino.hex
       upload@6219740-Teensy  Uploading to board '6219740-Teensy' (Teensy 4.0)
       upload@6219740-Teensy  Triggering board reboot
       upload@6219740-Teensy  Firmware: TL40_TestOutput_float.ino.hex
       upload@6219740-Teensy  Flash usage: 29 kiB (1.4%)
       upload@6219740-Teensy  Uploading... 100%
       upload@6219740-Teensy  Sending reset command
       upload@6219740-Teensy  Failed to reset board '6219740-Teensy'
bob@bob-desktop:~$
The Teensy does not appear on the USB. I recover by using the 15-sec reboot to blinking LED.
But thee does not seem to be a system crash. This could be a big benefit. The syslog dialog for the T4.0 case is
Code:
Aug 14 11:30:49 bob-desktop kernel: [11147.798647] usb 3-10: new high-speed USB device number 19 using xhci_hcd
Aug 14 11:30:49 bob-desktop kernel: [11147.947080] usb 3-10: New USB device found, idVendor=16c0, idProduct=0486, bcdDevice= 2.79
Aug 14 11:30:49 bob-desktop kernel: [11147.947086] usb 3-10: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Aug 14 11:30:49 bob-desktop kernel: [11147.947089] usb 3-10: Product: Teensyduino RawHID
Aug 14 11:30:49 bob-desktop kernel: [11147.947092] usb 3-10: Manufacturer: Teensyduino
Aug 14 11:30:49 bob-desktop kernel: [11147.947095] usb 3-10: SerialNumber: 6219740
Aug 14 11:30:49 bob-desktop kernel: [11147.949955] hid-generic 0003:16C0:0486.000D: hiddev1,hidraw3: USB HID v1.11 Device [Teensyduino Teensyduino RawHID] on usb-0000:00:14.0-10/input0
Aug 14 11:30:49 bob-desktop kernel: [11147.950893] hid-generic 0003:16C0:0486.000E: hidraw4: USB HID v1.11 Device [Teensyduino Teensyduino RawHID] on usb-0000:00:14.0-10/input1
Aug 14 11:30:49 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:16C0:0486.000D
Aug 14 11:30:49 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0
Aug 14 11:30:49 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.1/0003:16C0:0486.000E
Aug 14 11:30:49 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.1
Aug 14 11:30:49 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10
Aug 14 11:30:56 bob-desktop kernel: [11154.412349] usb 3-10: USB disconnect, device number 19
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:16C0:0486.000D
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.1/0003:16C0:0486.000E
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.1
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10
Aug 14 11:30:56 bob-desktop kernel: [11154.827202] usb 3-10: new high-speed USB device number 20 using xhci_hcd
Aug 14 11:30:56 bob-desktop kernel: [11154.975583] usb 3-10: New USB device found, idVendor=16c0, idProduct=0478, bcdDevice= 1.05
Aug 14 11:30:56 bob-desktop kernel: [11154.975587] usb 3-10: New USB device strings: Mfr=0, Product=0, SerialNumber=1
Aug 14 11:30:56 bob-desktop kernel: [11154.975589] usb 3-10: SerialNumber: 00097D96
Aug 14 11:30:56 bob-desktop kernel: [11154.977055] hid-generic 0003:16C0:0478.000F: hidraw3: USB HID v1.11 Device [HID 16c0:0478] on usb-0000:00:14.0-10/input0
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:16C0:0478.000F
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0
Aug 14 11:30:56 bob-desktop upowerd[1059]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10
Aug 14 11:30:57 bob-desktop kernel: [11155.372350] usb 3-10: USB disconnect, device number 20
Aug 14 11:30:57 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:16C0:0478.000F
Aug 14 11:30:57 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0
Aug 14 11:30:57 bob-desktop upowerd[1059]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-10
Aug 14 11:30:57 bob-desktop kernel: [11155.751248] usb 3-10: new high-speed USB device number 21 using xhci_hcd
Aug 14 11:31:02 bob-desktop kernel: [11160.955700] usb 3-10: device descriptor read/64, error -110

This is different. What does this mean? For now, I am going to see how things go, using tycmd to replace the regular loader.
 
@Bob Larkin -
Cool - more info! And maybe a short term workaround - though I always prefer to use the TyCommander tool as it integrates a more robust SerMon - especially with multiple Teensy online.
Does it change on T_4.0 with a different build optimization? ... Change to byte order in HEX and resulting code/layout order etc.
 
Well, that did not go as well as we had hoped. I have been working on adding features to the Audio Vector Network Analyzer, https://forum.pjrc.com/threads/41326-Update-Audio-Vector-Network-Analyzer?highlight=AVNAhttp:// . This was using the tycmd loader discussed above. It also uses only Teensy Audio Library audio objects and T3.6 only.

The tycmd proved to not eliminate oops Linux crashes. They seem to occur randomly, perhaps on the average every 20 or 40 uploads, but widely varying. Rebooting the PC always clears the issue and the same hex file will usually load. The system log says the same old stuff about tying up a CPU for 23 seconds. The one thing this does is to allow me to see that , as expected, the problem never is associated with running the compiler/linker.

It does fit the model of a USB related "race condition" causing an action on the USB port when it is not yet available (in non-expert terms).

In short, tycmd moves the problem around, and the hex file that failed 100% with the Teensy loader and T4 does not for tycmd. So, I am going to follow the path suggested by Nominal Animal of reporting the oops to Linux developers.

But a couple of questions here. First, Paul since your Linux setup is similar to mine, could you test th hex file with the T4? And, For those that are doing reasonably intense development using Linux, who else besides me and Nominal Animal sees issues of this sort. Or, maybe more importantly, doesn't see any issues?

Thanks for the assistance, Bob
 
An Update: I have NOT seen a Linux crash in more than a month. This is only anecdotal, but for my system, I believe that Linux kernel updates have stopped the Oops crashes. In summary:

Last September I pursued the very helpful leads of Nominal Animal. I was able to do a compile and install of the Linux kernel, an experience in itself. I hand entered the proposed fix from post #33 above, recompiled and installed. That, unfortunately did not change the Oops crashes. They were caused by something else.

So, I attempted to get a simple, repeatable set of files, and was not doing well. Sometimes things would start to work, and fail with different messages. I was getting nowhere, it seemed, and so I decided to not file a report, lacking repeatable data.

The crashes continued, on and off, last fall. I have been doing a fair bit of T3 and T4 work, and often I could go a while without problems. Then an Oops crash would occur, or often several in a group. Big files seemed less prone to problems than "example" size. Then a few weeks ago, it occurred to me that I have not seen any Linux failures at all for some while. I was in the middle of a project with many hours of compile-link-loads. I have specifically watched for Linux issues, and nothing. Absolutely no Linux problems.

I have kept Ubuntu up-to-date with several kernel updates. Arduino and Teensyduino, I believe, are unchanged.

That is about it. Thanks to all for the help. Bob
 
This probably won't help, but I should mention the udev rules and Teensy Loader were recently updated for problems which came up with Ubuntu 20.04.2.

Turns out 20.04.2 has a broken GTK2 library. Ubuntu 20.04 and 20.04.1 were fine, and even 20.04.2 works if you started with any earlier version and get to 20.04.2 by updates. The GTK2 problem only happens with a fresh install of 20.04.2.

Starting with 1.54-beta7, we're linking Teensy Loader with GTK3. That probably won't make any difference it you're still on Ubuntu 18.04 where GTK2 works well.

The udev rules were also updated. Again, probably not an issue if using Ubuntu 18.04 and ordinary USB serial. The fixes address issues that came up with non-serial modes and on some slower systems like Jetson Nano.
 
Status
Not open for further replies.
Back
Top