Instability in Teensy 4.0

Thundercat

Hi all, I've spent the last year fine-tuning a Teensy 4.0 with a music fader sketch, and after long months of tweaking, found just the right settings for a balanced system. Faders do not output unwanted MIDI, and a button press is as reliable as a rock. Overall, the system has been extremely stable and hardy.

Recently, for unknown reasons, the stability is gone. I'm getting spurious button press detections, and spurious MIDI output when faders are not dragged. In short, the system is an absolute mess, and I have NOT CHANGED THE CODE.

Of course I suspected heat issues, as I had tried overclocking to 816 MHz, so I down-clocked back to 600 MHz. But the system runs in the 50-60 °C range at 600 MHz, and the instability remains (whereas prior to all this, running at 600 MHz had no instability issues). I need at least 600 MHz as there's a screen with graphics that need updating regularly.

I also suspected maybe the button I was using had gone bad but replacing it did nothing; the faders are new and mounted to a PCB. The Teensy is connected directly to a Mac.

I'm in a bit of despair as this goes in a commercial product and have no idea where to start fixing this. I'm watching my business go down the drain because I absolutely cannot ship a system in this kind of disarray.

The code is 30,000 lines long, no chance of posting, but again, the routines that handle the faders and the button presses have not been touched. I can't tell you how hard I worked to fine-tune all the variables just-so, so that the system is both responsive, and stable.

But now it's not. Suddenly, seemingly, and I can't understand why.

Has anyone run into this kind of issue before? The Teensy is mounted to a PCB, as are all faders and buttons, for reliability, and the air temperature here in England is very, very cold, so again, heat isn't the issue.

Thoughts?

Thanks,

Mike
 
Also, lately I've gotten a few "This error should never happen" messages when uploading the sketch; I'm posting sample output below, just in case there's anything here someone can spot. I'm not posting the whole log - just up to where it starts flashing.

Thanks,

Mike

Arduino: 1.8.19 (Mac OS X), TD: 1.57, Board: "Teensy 4.0, MIDI, 600 MHz, Fast, US English"

Memory Usage on Teensy 4.0:
FLASH: code:360424, data:1628768, headers:8216 free for files:34208
RAM1: variables:49856, code:358624, padding:1824 free for local variables:113984
RAM2: variables:21312 free for malloc/new:502976
Error reading Teensy Loader status! (tpc)
This error should never happen (when using Arduino). Please report this to paul@pjrc.com. In Teensy Loader, click Help > Verbose Info, then click Log > Save As and attach the log file to your message. The log info is essential for any hope of figuring out what went wrong here!
Error reading Teensy Loader status! (tpc)
This error should never happen (when using Arduino). Please report this to paul@pjrc.com. In Teensy Loader, click Help > Verbose Info, then click Log > Save As and attach the log file to your message. The log info is essential for any hope of figuring out what went wrong here!


This report would have more information with
"Show verbose output during compilation"
option enabled in File -> Preferences.



17:52:52.177 (ports 5): Begin, version=1.57
17:52:52.177 (ports 5): USB device add callback
17:52:52.177 (ports 5): loc=14100000, vid=16C0, pid=0485, ver=0279, ser=12054840
17:52:52.177 (ports 5): actual serailnum=1205484
17:52:52.177 (ports 5): name: [no_device] (Teensy) MIDI
17:52:52.179 (ports 5): USB device remove callback
17:52:52.180 (ports 5): Serial add callback
17:52:52.189 (ports 5): Serial remove callback
17:52:52.401 (ports 5): HID Manager started
17:52:52.401 (ports 5): HID add callback, vid=16c0, pid=0485, ver=0279, loc=14100000, use=ffc9:4
17:52:52.401 (ports 5): found prior teensy at this loc, age=0.224
17:52:52.401 (ports 5): name: HID=16c0:0485.ffc9.4 (Teensy 4.0) MIDI
17:54:04.923 (post_compile 1): Begin, version=1.57
17:54:05.211 (loader): Teensy Loader 1.57, begin program
17:54:05.654 (loader): File "/Users/Mikey/Documents/Arduino/Fader_Pro_V4.994/Fader_Pro_V4.994.ino.TEENSY40.ehex", 1997824 bytes, and 5536 loader utility
17:54:05.659 (loader): ehex is valid, public key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:05.660 (loader): File "Fader_Pro_V4.994.ino.TEENSY40.ehex". 1997824 bytes
17:54:05.691 (loader): Listening for remote control on port 3149
17:54:05.692 (loader): initialized, showing main window
17:54:05.824 (loader): remote connection 11 opened
17:54:05.825 (post_compile 1): Sending command: comment: Teensyduino 1.57 - MACOSX (teensy_post_compile)
17:54:05.828 (loader): remote cmd from 11: "comment: Teensyduino 1.57 - MACOSX (teensy_post_compile)"
17:54:05.829 (loader): remote cmd from 11: "status"
17:54:05.830 (post_compile 1): Status: 1, 0, 0, 0, 0, 0, /Users/Mikey/Documents/Arduino/Fader_Pro_V4.994/, Fader_Pro_V4.994.ino.TEENSY40.ehex
17:54:05.830 (post_compile 1): Sending command: dir:/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/
17:54:05.830 (loader): remote cmd from 11: "dir:/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/"
17:54:05.831 (post_compile 1): Sending command: file:Fader_Pro_V4.994.ino.hex
17:54:05.832 (loader): remote cmd from 11: "file:Fader_Pro_V4.994.ino.hex"
17:54:06.049 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.hex", 1997824 bytes
17:54:06.304 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.ehex", 1997824 bytes, and 5536 loader utility
17:54:06.309 (loader): ehex is valid, key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:06.310 (loader): File "Fader_Pro_V4.994.ino.hex". 1997824 bytes
17:54:06.321 (loader): remote cmd from 11: "status"
17:54:06.323 (post_compile 1): Status: 1, 0, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.323 (post_compile 1): Sending command: auto:on
17:54:06.323 (loader): remote cmd from 11: "auto:on"
17:54:06.324 (post_compile 1): Disconnect
17:54:06.335 (loader): remote connection 11 closed
17:54:06.418 (post_compile 2): Begin, version=1.57
17:54:06.418 (loader): remote connection 3 opened
17:54:06.418 (post_compile 2): Sending command: comment: Teensyduino 1.57 - MACOSX (teensy_post_compile)
17:54:06.419 (loader): remote cmd from 3: "comment: Teensyduino 1.57 - MACOSX (teensy_post_compile)"
17:54:06.419 (loader): remote cmd from 3: "status"
17:54:06.419 (post_compile 2): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.419 (post_compile 2): Disconnect
17:54:06.431 (post_compile 3): Running teensy_reboot: /Applications/Teensyduino.app/Contents/Java/hardware/teensy/../tools/teensy_reboot
17:54:06.431 (loader): remote connection 3 closed
17:54:06.432 (loader): remote connection 3 opened
17:54:06.433 (loader): remote connection 3 closed
17:54:06.446 (reboot 4): Begin, version=1.57
17:54:06.446 (reboot 4): location = usb:14100000
17:54:06.446 (reboot 4): portprotocol = Teensy
17:54:06.446 (reboot 4): portlabel = HID=16c0:0485.ffc9.4 MIDI
17:54:06.446 (reboot 4): Only location usb:14100000 will be tried
17:54:06.446 (reboot 4): USB device add callback
17:54:06.448 (reboot 4): loc=14100000, vid=16C0, pid=0485, ver=0279, ser=12054840
17:54:06.448 (reboot 4): actual serailnum=1205484
17:54:06.450 (loader): remote connection 3 opened
17:54:06.450 (reboot 4): USB device remove callback
17:54:06.450 (reboot 4): Serial add callback
17:54:06.458 (reboot 4): Serial remove callback
17:54:06.618 (reboot 4): HID Manager started
17:54:06.618 (reboot 4): HID add callback, vid=16c0, pid=0485, ver=0279, loc=14100000, use=ffc9:4
17:54:06.618 (reboot 4): usb scan found 1 devices
17:54:06.618 (reboot 4): found Teensy Loader, version 1.57
17:54:06.618 (reboot 4): Sending command: show:arduino_attempt_reboot
17:54:06.619 (loader): remote cmd from 3: "show:arduino_attempt_reboot"
17:54:06.619 (loader): got request to show arduino rebooting message
17:54:06.620 (reboot 4): Sending command: comment: Teensyduino 1.57 - MACOSX (teensy_reboot)
17:54:06.621 (loader): remote cmd from 3: "comment: Teensyduino 1.57 - MACOSX (teensy_reboot)"
17:54:06.621 (loader): remote cmd from 3: "status"
17:54:06.621 (reboot 4): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.621 (reboot 4): hid_send_feature, device opened
17:54:06.622 (loader): remote cmd from 3: "status"
17:54:06.623 (reboot 4): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.623 (reboot 4): status read, retry 0
17:54:06.723 (loader): remote cmd from 3: "status"
17:54:06.724 (reboot 4): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.724 (reboot 4): status read, retry 1
17:54:06.825 (loader): remote cmd from 3: "status"
17:54:06.825 (reboot 4): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.825 (reboot 4): status read, retry 2
17:54:06.905 (ports 5): HID remove callback
17:54:06.905 (ports 5): HID add callback, vid=16c0, pid=0485, ver=0279, loc=14100000
17:54:06.905 (ports 5): USB device remove callback
17:54:06.905 (ports 5): remove, loc=14100000
17:54:06.905 (ports 5): usb_remove: usb:14100000
17:54:06.905 (ports 5): del device: location=14100000
17:54:06.906 (loader): remote connection 11 opened
17:54:06.907 (ports 5): USB device add callback
17:54:06.927 (loader): remote cmd from 3: "status"
17:54:06.927 (reboot 4): Status: 1, 1, 0, 0, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:06.927 (reboot 4): status read, retry 3
17:54:06.945 (loader): HID/macos: attach callback
17:54:07.028 (loader): remote cmd from 3: "status"
17:54:07.028 (loader): HID/macos: number of devices found = 1
17:54:07.028 (loader): HID/macos: vid=1FC9, pid=0135, page=FF00, usage=0001, ver=1.01
17:54:07.029 (loader): Device came online, code_size = 100
17:54:07.029 (loader): Board is: NXP IMXRT1062 ROM
17:54:07.029 (loader): begin operation
17:54:07.245 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.hex", 1997824 bytes
17:54:07.513 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.ehex", 1997824 bytes, and 5536 loader utility
17:54:07.515 (loader): ehex is valid, key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:07.516 (loader): File "Fader_Pro_V4.994.ino.hex". 1997824 bytes
17:54:07.516 (loader): set background IMG_ONLINE
17:54:07.517 (reboot 4): Status: 1, 1, 1, 1, 0, 8, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:07.549 (loader): HAB locked secure mode
17:54:07.552 (loader): sending ehex loader utility, 5536 bytes
17:54:07.557 (loader): run it..
17:54:07.672 (ports 5): HID remove callback
17:54:07.672 (ports 5): USB device remove callback
17:54:07.672 (ports 5): remove, loc=14100000
17:54:07.672 (loader): HID/macos: detach callback: is currently open device
17:54:07.699 (loader): ehex loader utility sucessfully started
17:54:07.701 (loader): remote cmd from 3: "status"
17:54:07.701 (reboot 4): Status: 1, 1, 1, 1, 0, 11, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:07.701 (loader): HID/macos: status: disconnected
17:54:07.701 (loader): end operation, total time = 0.672 seconds
17:54:07.703 (loader): redraw timer set, image 80 to show for 2000 ms
17:54:07.757 (loader): remote cmd from 3: "status"
17:54:07.757 (loader): HID/macos: number of devices found = 0
17:54:07.757 (loader): HID/macos: no devices found (empty set)
17:54:07.758 (reboot 4): Status: 1, 1, 0, 1, 0, 0, /var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/, Fader_Pro_V4.994.ino.hex
17:54:07.758 (reboot 4): status read, retry 4
17:54:07.758 (reboot 4): Success
17:54:07.758 (reboot 4): Disconnect
17:54:07.771 (loader): remote connection 3 closed
17:54:07.892 (ports 5): USB device add callback
17:54:07.892 (ports 5): loc=14100000, vid=16C0, pid=0478, ver=0107, ser=001264EC
17:54:07.892 (ports 5): actual serailnum=1205484
17:54:07.892 (ports 5): found prior teensy at this loc, age=0.986
17:54:07.892 (ports 5): name: [no_device] (Teensy 4.0) Bootloader
17:54:07.923 (loader): HID/macos: attach callback
17:54:07.923 (ports 5): HID add callback, vid=16c0, pid=0478, ver=0107, loc=14100000, use=ff9c:24
17:54:07.923 (ports 5): found prior teensy at this loc, age=0.031
17:54:07.923 (ports 5): name: HID=16c0:0478.ff9c.24 (Teensy 4.0) Bootloader
17:54:07.941 (loader): HID/macos: ser=001264EC
17:54:07.941 (loader): try to read feature report
17:54:07.942 (loader): got feature report, size = 384
17:54:07.942 (loader): 7393CD01 6031B41E
17:54:07.942 (loader): encryption is required, public key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:07.942 (loader): secure mode is locked: this is Lockable Teensy
17:54:07.942 (loader): Device came online, code_size = 2031616
17:54:07.942 (loader): Board is: Teensy 4.0 (IMXRT1062), version 1.07
17:54:08.180 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.hex", 1997824 bytes
17:54:08.397 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.ehex", 1997824 bytes, and 5536 loader utility
17:54:08.399 (loader): ehex is valid, key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:08.399 (loader): File "Fader_Pro_V4.994.ino.hex". 1997824 bytes, 98% used
17:54:08.401 (loader): File "Fader_Pro_V4.994.ino.hex" opened, but "Fader_Pro_V4.994.ino.ehex" will actually be used
17:54:08.419 (loader): set background IMG_ONLINE
17:54:08.647 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.hex", 1997824 bytes
17:54:08.880 (loader): File "/var/folders/kg/h7ksypnd2cs8fm2_7484py640000gn/T/arduino_build_461442/Fader_Pro_V4.994.ino.ehex", 1997824 bytes, and 5536 loader utility
17:54:08.882 (loader): ehex is valid, key hash: 6031B41E 303B7C97 9E4D450D B0511B03 2DD5AB90 3A1263AF 8EA5DD73 860FAC41
17:54:08.882 (loader): File "Fader_Pro_V4.994.ino.hex". 1997824 bytes, 98% used
17:54:08.884 (loader): File "Fader_Pro_V4.994.ino.hex" opened, but "Fader_Pro_V4.994.ino.ehex" will actually be used
17:54:08.905 (loader): elf appears to be for Teensy 4.0 (IMXRT1062) (2031616 bytes)
17:54:08.905 (loader): elf binary data matches hex file
17:54:08.905 (loader): elf file is for Teensy 4.0 (IMXRT1062)
17:54:08.907 (loader): using encrypted ehex (required - secure mode is locked)
17:54:08.924 (loader): begin operation
17:54:08.932 (loader): flash, block=0, bs=1024, auto=1
17:54:08.933 (loader): flash, block=1, bs=1024, auto=1
17:54:08.934 (loader): flash, block=2, bs=1024, auto=1
17:54:14.112 (loader): flash, block=3, bs=1024, auto=1
17:54:14.113 (loader): flash, block=4, bs=1024, auto=1
17:54:14.132 (loader): flash, block=5, bs=1024, auto=1
...
 
Recently, for unknown reasons, the stability is gone. I'm getting spurious button press detections, and spurious MIDI output when faders are not dragged. In short, the system is an absolute mess, and I have NOT CHANGED THE CODE.

Any chance a GND wire has become disconnected somewhere?
 
Thank-you Paul. I'll go over it with a fine-toothed comb. But it's PCB-based, partly to avoid exactly that kind of problem.

It's 4:44am here in England, so I'll follow up tomorrow.

Thanks for the amazing support.

Mike
 
I'm sorry not to have updated - turns out I had to make a new PCB which I will be receiving in a few days. I'll remake the system and post back here with an update then.

Thanks again Paul for the amazing genius products and the fab support.

Mike
 
When you get a chance, please let us know what the issues were. It helps the rest of us, to debug stuff in the future. It doesn't matter if it was a blunder, or something wasn't as you expected. Info like this is useful for others, even if it is "don't do what I did". Thanks.
 
100%. I've gained so much by reading this and other forums and I'm aware of the value of resolved issues. I'll definitely be posting back, and thanks for the encouragement.

I'm really nervous right now because the sketch I've developed for the entire past year is at stake. I had gotten rid of all the instabilities and the system has been very solid for months and months. It makes no sense that all of a sudden it would be going nuts, without a hardware or software change that would affect stability.

One thing I've also done is in the new PCB boards I've ordered I'm adding 0.1uF capacitors on all analog input pins - but this would not resolve the issue by itself, as I've not used those caps before and the system has been stable.

Thanks clinker8,

Mike
 
By any chance are these hand soldered boards or machine soldered? Either way, might be good to inspect the solder joints. Maybe one or more solder joints developed some cracks? This can happen at connector locations, or anywhere some flexing occurs.

I put about 6 months into software development of an electronic lead screw, and know how you feel. Stuff doesn't just go crazy after working a while, unless something changed. Finding that something can be a challenge!
 
On the original question...

I'm getting spurious button press detections, and spurious MIDI output when faders are not dragged. In short, the system is an absolute mess, and I have NOT CHANGED THE CODE.

What if you *do* change the code? If you try uploading the simplest possible program which only reads the pushbuttons (no display or other stuff), maybe just use Serial.print() rather than MIDI and let it run for hours to let the Arduino Serial Monitor collect any info, are the inputs stable?

At least you could try to get some idea whether the problem is hardware or otherwise electrical interference, or something happening on the software side.
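
One way to structure the stripped-down test described above is to record every change of the raw input together with a timestamp, so that a phantom press shows up as a logged edge while nobody is touching the button. The following is a minimal, hardware-independent sketch in plain C++. The `Edge` and `EdgeLogger` names are hypothetical and not from the original sketch; on a Teensy the samples would presumably come from `digitalRead()` and `millis()`, with each edge printed through `Serial.print()` rather than stored.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original sketch): one recorded change
// of the raw input level.
struct Edge {
    uint32_t timeMs;  // time at which the change was observed
    bool level;       // level after the change
};

// Remembers the last raw level of one input and records every change.
// On the microcontroller itself you would print each edge instead of
// storing it, since a test that runs for hours should not grow a buffer.
class EdgeLogger {
public:
    explicit EdgeLogger(bool idleLevel) : last_(idleLevel) {}

    // Feed one raw sample. Returns true if the level changed.
    bool sample(bool level, uint32_t nowMs) {
        if (level == last_) return false;
        last_ = level;
        edges_.push_back({nowMs, level});
        return true;
    }

    const std::vector<Edge>& edges() const { return edges_; }

private:
    bool last_;
    std::vector<Edge> edges_;
};
```

With the button left untouched, any logged edge points at the hardware or at interference rather than at the 30,000-line application, which is the distinction the test is meant to draw.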
 
Thanks Paul and clinker8. I appreciate your thoughts.

Paul, it's a great idea, and I'll take your suggestion; I'll be able to proceed in a few days when I get updated PCB boards.

The core guts of it is just an ISR button interrupt that is as simple as it gets, actually taken right off one of your pages if memory serves.

Clinker8, I feel you, and I hope your electronic lead screw project is working.

Thanks to both of you, I'll follow up and report back.

Best,

Mike
 
Recently, for unknown reasons, the stability is gone. I'm getting spurious button press detections, and spurious MIDI output when faders are not dragged. In short, the system is an absolute mess, and I have NOT CHANGED THE CODE.

If the code hasn't changed it could well be EMI - something in the local electro-magnetic environment has changed and your circuitry is not robust to EMI. Or as others mentioned a loose connection has developed, probably ground-related, or even a ground-loop has been created somehow.
 
Hi Mark, thanks for the message. I agree in theory that there *must* have been some kind of ground loop or "loose wire," but it's a PCB, and one that had been working well for some time prior. I tried reflowing all solder joints, replacing a main switch, and then erroneously trying to adjust the code (wrong move, as the code was essentially the same).

Shawn said:
Are the button inputs being debounced? You mentioned that you read them via an ISR.

Hi Shawn, yes, I developed my own debouncing routine many months ago that works very well.
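
The debouncing routine itself is not shown in this thread. For readers unfamiliar with the idea, one common approach is to accept a new state only after the raw input has held it for a fixed time. The sketch below illustrates that approach in plain C++; the `Debouncer` name and the hold time passed to it are illustrative assumptions, not the routine used in the product.

```cpp
#include <cstdint>

// Illustrative time-based debouncer (not the routine used in the product).
// A change in the raw input is accepted only after the input has stayed at
// the new level for at least holdMs milliseconds.
class Debouncer {
public:
    Debouncer(bool initial, uint32_t holdMs)
        : stable_(initial), candidate_(initial), since_(0), holdMs_(holdMs) {}

    // Feed one raw sample; returns the debounced level.
    bool update(bool raw, uint32_t nowMs) {
        if (raw != candidate_) {
            // Raw level moved: restart the hold timer for the new level.
            candidate_ = raw;
            since_ = nowMs;
        } else if (candidate_ != stable_ && nowMs - since_ >= holdMs_) {
            // The new level has persisted long enough: accept it.
            stable_ = candidate_;
        }
        return stable_;
    }

private:
    bool stable_;      // last accepted level
    bool candidate_;   // most recent raw level
    uint32_t since_;   // when the raw level last changed
    uint32_t holdMs_;  // required hold time
};
```

A filter like this rejects glitches shorter than the hold time, but it cannot hide an input that floats or a ground that has gone intermittent, so it complements the hardware checks suggested in the thread rather than replacing them.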

As far as the situation now, I have good news/bad news.

The Good News:

I ordered new PCB boards, slightly updated (component connectors moved due to being in the way of other things). I also added 0.1uF capacitors from wiper to ground on all potentiometer input pins, placing the caps close to the input pins. I did not do this to solve the current issue, as I've stated, I haven't needed them for the last year and the system had been working well without them, but I added them just to add some stability in the hardware. Some folks have strong opinions on these caps in the Arduino forum; one well-known member suggests omitting these caps and just using code to handle all inputs and filter out the noise (which I was able to do successfully, but not quite as well as I would have liked).
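
For a rough sense of what a 0.1uF cap from wiper to ground does: together with the potentiometer's source resistance it forms a first-order RC low-pass filter with cutoff f_c = 1 / (2*pi*R*C). The fader resistance is not stated in this thread, so the 10 kΩ figure used in the note below is purely an assumption, and the function names are hypothetical. The source resistance seen at the wiper of a pot wired as a voltage divider peaks at a quarter of its total resistance, at mid-travel.

```cpp
#include <cmath>

// Cutoff frequency (Hz) of a first-order RC low-pass: f_c = 1 / (2*pi*R*C).
double rcCutoffHz(double ohms, double farads) {
    const double pi = std::acos(-1.0);
    return 1.0 / (2.0 * pi * ohms * farads);
}

// Worst-case source resistance seen at the wiper of a potentiometer wired
// as a voltage divider: the two halves in parallel at mid-travel, i.e. R/4.
double wiperSourceOhmsMax(double potOhms) {
    return potOhms / 4.0;
}
```

For an assumed 10 kΩ fader, `rcCutoffHz(wiperSourceOhmsMax(10000.0), 0.1e-6)` comes to about 637 Hz, far above the rate at which a hand moves a fader but low enough to attenuate higher-frequency noise picked up on the wiper line.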

So, other than moving some connectors around and adding the filtering caps, the PCBs are the same.

Code is also the same as when I originally posted, aside from some small bug fixes unrelated to the current issue.

The result is now a rock-solid system, absolutely no jitter, and all the nightmare issues from before, gone. In fact it's never been this steady, and responsive, at the same time. I've wanted to get to this point for literally a year. Many of my software attempts to remove jitter yet retain responsiveness did not work as well as I would have liked. I am using the ResponsiveAnalogRead library of course.
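
The trade-off between removing jitter and staying responsive can be illustrated with a simple dead-band (hysteresis) filter, one common software technique. The sketch below is not the algorithm of the library mentioned above and not the product's code; the `FaderFilter` name, the dead-band width, and the assumption of 10-bit ADC readings (0 to 1023) mapped to 7-bit MIDI values are all illustrative.

```cpp
#include <cstdlib>

// Illustrative dead-band filter for one fader. Assumes 10-bit ADC readings
// (0..1023) that are reduced to 7-bit MIDI values (0..127).
class FaderFilter {
public:
    explicit FaderFilter(int deadband)
        : deadband_(deadband), held_(0), lastMidi_(-1), primed_(false) {}

    // Feed one raw reading. Returns the new MIDI value if one should be
    // sent, or -1 if nothing changed.
    int update(int raw) {
        // After the first reading, ignore movements smaller than the dead
        // band around the last accepted reading.
        if (primed_ && std::abs(raw - held_) < deadband_) return -1;
        primed_ = true;
        held_ = raw;
        int midi = raw >> 3;  // 1024 ADC steps -> 128 MIDI steps
        if (midi == lastMidi_) return -1;
        lastMidi_ = midi;
        return midi;
    }

private:
    int deadband_;  // minimum change in ADC counts that is accepted
    int held_;      // last accepted raw reading
    int lastMidi_;  // last MIDI value sent, -1 before the first one
    bool primed_;   // false until the first reading has been taken
};
```

A wider dead band suppresses more noise but makes small fader movements feel coarse, which is why a software-only fix tends to force a compromise. Reducing the noise before it reaches the ADC, as the added caps are meant to do, lets a narrow dead band stay quiet.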

And it works completely open outside the enclosure, so EMI does not seem to be a factor (thanks for the suggestion though Mark and Paul!).

The Bad News:

While I'm extremely happy about this, from a knowledge standpoint, I am not happy, because I still do not know why the previous board went haywire. The code is the same, apart from some minor bug fixes completely unrelated to the issue. And the previous PCB had been working well for some time, so I cannot attribute the instability to the PCB design or layout. Also no new electronics got turned on or added into the environment to add EMI to the mix.

I can only think that perhaps a solder joint went bad (although I reflowed all solder joints to no avail), or perhaps some static electricity damaged the Teensy I was using, as I had it outside the enclosure much of the time.

In any case, I have mixed feelings reporting back to you all - because I do not feel I've done due diligence to really track down what went wrong, and this is a commercial product so it's important. On the other hand, I don't know how to figure out what went wrong, as all previous attempts to do this failed.

So I just want to thank you all for your kind attention, and I'm so grateful for each one of you and your time. Thanks so much for your practical suggestions and help; the Teensy platform is absolutely amazing, and I'm so happy to be part of your community.

If anything else comes to mind, I'll be sure to report back here. I know that *something* happened with the previous system, and I'm very disappointed not to have been able to track it down, both for the practical reason that I don't want it to happen again or in the field, and also for sharing what went wrong and how to fix it.

Again, thanks everyone.

All my best,

Mike Phillips
 
If you wanted to rule out the Teensy, you could put it into your latest PCB and test to see if things are still stable.
 
Fair enough, but I don't want to risk another Teensy at this point; I'm in England and it costs a fortune to get even one here due to VAT and customs duties through the roof (I only use the lockable ones, which are not available anywhere except for PJRC directly).

If instabilities appear again I will consider it.

Thanks, that was a good suggestion.

Mike
 
While I'm extremely happy about this, from a knowledge standpoint, I am not happy, because I still do not know why the previous board went haywire. The code is the same, apart from some minor bug fixes completely unrelated to the issue. And the previous PCB had been working well for some time, so I cannot attribute the instability to the PCB design or layout. Also no new electronics got turned on or added into the environment to add EMI to the mix.

Could you quantify this? Like literally, what is the quantity of not-stable boards and quantity total boards?

You talked about this being a commercial product. How many have been manufactured? Are they all experiencing problems, or only some percentage of them, or just one unit?

Or is the quantity literally only 1, as in just 1 early prototype (not even a 2nd copy made) intended to become a future commercial product? If this really is a situation of exactly 1 unique prototype, the only existing build in the whole world, mysteriously developing a problem and you're concerned about it becoming an issue in future mass produced products, perhaps now is the time to build some reasonable number of prototypes, like 5 to 25 pieces, and put them into test usage. You'll probably have much more confidence in results that way than continuing testing with only 1 unit.
 
I'm confused, I meant use the one that you had been using when you experienced problems.

Hi BriComp, no I understand what you meant, exactly. What I'm saying is I would need to dismantle the working unit and put in the old Teensy, and I'm concerned this would damage the current working Teensy as it would take desoldering and then desoldering again, which I've had poor results with even though I have a decent desoldering unit. Even though I'm careful it tends to rip out the pads in the Teensy, which happened with the old Teensy that I had ended up removing and resoldering. I should say I think the old Teensy is just done at this point, due to this issue.
 
Hi Paul, I've been selling units for over a year, successfully. But I also continually innovate and update the design and software, so each iteration is yes a new prototype, but not completely new as it's building on previous successful units. I'm always pushing the bar, to add new features and make the units even better, every version.

Early units were all point to point wiring, and recently in the last few months I moved to PCB-based design for higher reliability and ease of updating (old design with point to point wiring was not easy to access the program button for field updates; new PCB-based design has a breakout button accessible without opening the unit).

I agree with you that one unit alone would not be a good idea to send out into the world, yet at this point there have been 22 units shipped out and working well. One unit developed an issue with a fader only being detected moving in a small range (only 0 - 10 or so, instead of 0 - 127). I haven't gotten the unit back yet for repair, but I'm fairly certain it's a solder joint issue, as this problem happened for me when building units when the solder joint wasn't 100%. Which is why I moved to the PCB design; point to point is a challenge (for me) to do soldering well, but PCBs are easy for me.

I appreciate your thoughts and time.
 
I see. I never use Teensys soldered in. I use turned pin low profile plugs and sockets like this and this.
 
Thank-you BriComp for the info and the links.

So you solder the sockets into the Teensy and the board of course, and then plug in the Teensy.

I have a bunch of those I ordered recently because I considered exactly this. My one (perhaps unwarranted) concern is all the ones I found have gold flashing on them, and usually one part is tin and one part is gold. Usually the gold flashing is only around 15 microinches (well under 1 µm), which is incredibly thin, and can wear off with just one insertion. And gold/tin connectors are notorious for having issues over time. I used to work in an electronics store (Radio Shack lol), and the boss always pushed us selling the gold connectors, but in researching them, it's just far better to have tin on tin, and not tin/gold.

In any case, again maybe I'm overthinking, but I'd rather the reliability of a solder joint for units sent out into the world. If I can find sockets that are all tin both parts I am willing to experiment; in your links it looks like one is gold and one is tin.

Thanks,

Mike
 
Phantom problems are a PITA

These kinds of intermittent problems are maddening, are they not? I've been in the same boat more than once over the years. Agreed - until you can isolate and reproduce the problem it could rear its head and bite you later. Others here have made great suggestions. Here are mine.

I always have to remind myself to ask "what is different"? Clearly something. Code changes which shouldn't affect the issue at hand are still something different. Your "you should never see this error message" is great. We do the same thing! Amazing how many times we see that in development when something really goes into the weeds.
0. For any production system we are shipping, we have a test setup running 24/7, going through some stressing of I/O, etc. Currently we have some running three+ years and logging any errors. In development we always have extra systems running tests 24/7 in addition to the ones we use for daily development.
1. The first thing I would check is the power supply. Even if it is a 'known good' one. Ask me how I know this. Use a DVM when it's under load to confirm voltage is stable. Also put a scope on it, triggering in AC mode at high gain looking for ripple or anomalies. In years of customer support, 95% of problems are the power supply provided by the customer. That may be hard to believe but it's true.
2. Run a scope across all the signals, esp the ones acting up - like your pushbutton. Look for marginal voltage swings, bursts of noise, etc. Confirm power and ground/common are really what they should be. Adding the scope probe changes things too! DMM leads more so.
3. Run a simple test, say just reading the pushbutton (which should not detect anything when you're not pressing it), and log and output a message, turn LED on, etc for any phantom actuations. You can use the LED pin to trigger a scope and look back on the trace.
4. Clean the board with spray electronics cleaner or Alconox or other approved electronic cleaner - not dish soap - rinse with isopropanol or distilled water. Dry with a heat gun on low (I keep my hand in the blast so I know it's not too hot) to be sure it's really dry. Repeat the above tests.
5. Inspect under a stereo microscope esp if any SMT devices. You can have a bad solder (or missing solder) joint that makes contact for a long time, until it doesn't.
6. We maintain our own libraries in Github and same for compiler versions so that we can rebuild even years later and know that no changes or 'updates' are in those. Keep the same build settings. Eliminate unintended differences.
7. We never overclock. It's that 24/7 demand. We often derate the clock a bit for the same reason. All our designs use worst case (at least to three-sigma points) design values and are designed to work over wide temp range.
8. Try heating and cooling with heat gun and chilling spray to see if that invokes failures.
9. We work on antistatic mats with a wrist strap or other precautions. Static hits can 'wound' a part and set up incipient failure.
10. Try touching various (low voltage) signals with a finger; if dry, mine are about 500 kohms-ish to ground, which shouldn't invoke failure unless it's a high-impedance circuit.
11. Many other things depending on findings so far. Hang in there!
12. Get another brain to look over what you did. Multiple heads usually are better than one.
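
For item 3, here is a minimal sketch of the counting logic such a test could use. It is only an illustration, not anyone's production code: the name PhantomCounter and its fields are made up, and the pin read and clock are passed in as plain values (on a Teensy they would presumably come from digitalRead() and millis()) so that the logic can also be checked on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the thread. On a Teensy, `pressed` would
// come from something like digitalRead(pin) == LOW on an INPUT_PULLUP pin
// and `nowMs` from millis(); both are passed in so the logic is testable
// on a desktop.
struct PhantomCounter {
    uint32_t events = 0;       // idle -> pressed edges seen so far
    uint32_t lastEventMs = 0;  // timestamp of the most recent edge
    bool wasPressed = false;   // previous sample

    // Feed one sample while nobody is touching the button. Returns true
    // on a new edge: the moment to log a message and raise the LED pin
    // so a scope can trigger on it.
    bool sample(bool pressed, uint32_t nowMs) {
        const bool edge = pressed && !wasPressed;
        wasPressed = pressed;
        if (edge) {
            ++events;
            lastEventMs = nowMs;
        }
        return edge;
    }

    // Milliseconds since the last phantom edge. Unsigned subtraction
    // stays correct across one wrap of a 32-bit millisecond counter.
    uint32_t msSinceLastEvent(uint32_t nowMs) const {
        assert(events > 0);  // only meaningful after at least one edge
        return nowMs - lastEventMs;
    }
};
```

Left running overnight with the button untouched, any non-zero event count says the input itself is picking something up, independent of the other 30,000 lines.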

We had a run of boards (allegedly 100% tested in an ISO qualified facility) made overseas by our assembly firm years ago that, believe it or not, had some bad vias. We sectioned some boards to prove this. It was 16 bad boards out of something like 250. That number was suspicious; it turned out to be one panel that we surmise had plating issues. That was the last time we let them choose the board vendor; now we use one local to us with reliable QC. We always have a watchdog timer as a last resort and log that kind of startup so we know if/when the dog barked.

Our industrial systems are expected to run 24/7 for ten years and many exceed that. We expect execution to be deterministic and often it must be (running 450 HP hydraulic pumps at 10,000 psi for one example). Errors in that system which might suddenly close a servo valve could be catastrophic to the equipment and people near it. We never use malloc/free so we can never get memory fragmentation issues; it's all statically allocated. Some buffers get used in multiple places so we aren't super wasteful of RAM. We have a robust exception handling framework that logs to a uSD card as well as outputting serial (usually no debug device is connected so the uSD is a lifesaver). You can't test in quality - testing can only verify intended operation.
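
To illustrate the "no malloc/free" point, here is one way a statically allocated error log could look. This is a guess at the general technique, not the actual framework described above: ErrorRecord, ErrorLog and their fields are hypothetical names, and writing the records out to a uSD card or serial port is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record layout; the post does not describe the real one.
struct ErrorRecord {
    uint32_t timeMs;  // when it happened (e.g. millis() on a Teensy)
    uint16_t code;    // application-defined error code
};

// Fixed-capacity ring buffer. The storage lives inside the object, so
// nothing is ever taken from the heap and fragmentation cannot occur.
template <std::size_t N>
class ErrorLog {
    static_assert(N > 0, "log needs at least one slot");

public:
    // Store a record, overwriting the oldest one once the log is full.
    void push(uint32_t timeMs, uint16_t code) {
        records_[head_] = ErrorRecord{timeMs, code};
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }

    std::size_t size() const { return count_; }

    // Index 0 is the oldest record still held.
    const ErrorRecord& at(std::size_t i) const {
        assert(i < count_);
        const std::size_t oldest = (head_ + N - count_) % N;
        return records_[(oldest + i) % N];
    }

private:
    std::array<ErrorRecord, N> records_{};
    std::size_t head_ = 0;   // next slot to write
    std::size_t count_ = 0;  // number of valid records, at most N
};
```

Declared as a global or static object (for example `static ErrorLog<64> errorLog;`), the whole log has a fixed size known at link time, which is what makes the RAM budget predictable.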
 
Just, wow! Now I'm exceedingly embarrassed. I'm just one little guy who dreamed up a product variation last year and started selling them. I don't have a scope here (have used one in the past), or a microscope, but your suggestions are beyond excellent.

I was thinking the same thing about overclocking; not worth any risk and from prior reading, every 10°C rise in temp halves component life expectancy.
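
As a rough worked version of that rule of thumb (an approximation only, not a datasheet figure; the function name relativeLife is made up for this sketch):

```cpp
#include <cmath>

// Rule-of-thumb estimate only: relative life expectancy when a part runs
// deltaC degrees Celsius hotter (positive) or cooler (negative) than a
// reference temperature, assuming life halves for every 10 C of rise.
inline double relativeLife(double deltaC) {
    return std::exp2(-deltaC / 10.0);
}
```

Under that assumption, relativeLife(10.0) is one half and relativeLife(20.0) is one quarter, so a part running 20°C hotter would be expected to last about a quarter as long.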

I appreciate your mind-blowing in-depth reply, and I'll take your suggestions on board when/if I can. At some point when the company is bigger than moi, I will hire some programmers and engineers to look things over and improve my likely clunky code and implementations.

Thank-you bboyes, from the bottom of my heart.

Mike
 
Yes it does say Gold plated on the Male-Male parts. I must say that I had not noticed before.
The advantage of the turned pin sockets, besides low insertion and extraction forces, is the fact that they are very low profile.
So the Teensy is not way up in the air!
 
Fair enough, but I don't want to risk another Teensy at this point; I'm in England and it costs a fortune to get even one here due to VAT and customs duties through the roof (I only use the lockable ones, which are not available anywhere except for PJRC directly).

If instabilities appear again I will consider it.

Thanks, that was a good suggestion.

Mike

If buying in the US and shipping to the UK you might consider MyUs.com.
Register with them and you get a US address.
I.e. buy from PJRC giving your US address. Items get shipped to MyUs who then ship them on to you in the UK.
When they ship, they charge UK VAT on top of the shipping charges.
I would suggest that you get a quote from them for a typical parcel and value to the UK.

I have used them in the past, but that was just after Brexit, which caused all sorts of shipping problems.
Although my item from the US was 60% of the UK price, it took a very long time to get here.
Now shipping has been sorted so there should not be that problem.
 