Permanent ADC damage caused by running this code

Status
Not open for further replies.
I take a brand-new Teensy 3.6 out of the package, connect it to the computer with a short USB cable, and attach one of the analog pins and digital ground to a benchtop signal generator (photo). After the code below has run for about 1 week, the ADC is permanently damaged. Power cycling and re-programming do not fix the problem.

The damage manifests itself as "missing codes" in the ADC digital output: a smoothly changing waveform shows up as a blocky, discontinuous mess (see photos). The failure is shared by all pins on adc0; adc1 still appears to work normally.

I've killed more than half a dozen T3.6's trying to figure out what was going on, and have also posted to this thread: https://community.nxp.com/t5/Kineti...DC-damage-despite-input-protection/m-p/453411 I was fairly certain that input protection was the problem, but this test with a signal generator seems to prove otherwise. Does anyone have any idea what could cause such a failure?

The code looks weird because it has been extracted from a much larger program. The purpose of sending USB in chunks is to fill 64-byte packets instead of dribbling it out.

Code:
#include <ADC.h>

#define PIN_IMS_SIG       A8  //  Analog input
#define TX_BUFFER_LEN 750

static int txB_wptr = 1;
static int txB_rptr = 1;
static byte txBuffer[TX_BUFFER_LEN];
IntervalTimer startIMSScan;
IntervalTimer XferDataUSB;
static int samplecount = 0;
static ADC *adc = new ADC(); // adc object

void setup() {
    Serial.begin(9600);
    startIMSScan.begin(startIMSISR, 42000);  // Scan interval in microseconds
    XferDataUSB.begin(XferDataISR, 3000);  //Send and receive data via USB every 3ms  If data is collected at 2 bytes/16us, buffer needs to be >375 bytes
    XferDataUSB.priority(130);
}


void loop() {
 delay(10);
}


void addByteToTx(uint8_t inbyte)
{
  __disable_irq();
  txBuffer[txB_wptr] =  inbyte;  
  txB_wptr = (txB_wptr == TX_BUFFER_LEN) ? 0 : txB_wptr + 1;
  __enable_irq();
}


void process_USB_IO(void)
{
while(txB_wptr != txB_rptr)
  {
    __disable_irq();
    Serial.write(txBuffer[txB_rptr]);
    txB_rptr = (txB_rptr == TX_BUFFER_LEN) ? 0 : txB_rptr + 1;
    __enable_irq();    
  }

//Process incoming USB data (removed from this test code)
}


void XferDataISR()
{
process_USB_IO();
}


void startIMSISR()
{ 
  addByteToTx(0xFF);  //Start of packet
  addByteToTx(0xFF); 

  uint16_t tempint =  0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);
  
  tempint =  0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);
 
  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  uint32_t temp32 = 0x00;
  addByteToTx(0x00);
  addByteToTx((temp32 >> 16) & 0xFF);
  addByteToTx((temp32 >> 8) & 0xFF);
  addByteToTx(temp32 & 0xFF); 
 
 
  __disable_irq();
//  bn_ctrl(BN_OPEN);
  adc_start_continuous();
  delayMicroseconds(50);
 // bn_ctrl(BN_CLOSE);
  __enable_irq();

}

void adc_start_continuous(void)
{
  samplecount = 0;
  adc->adc0->setReference(ADC_REFERENCE::REF_EXT); 
  adc->adc0->setAveraging(4); 
  adc->adc0->setResolution(16); 
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED_16BITS);   //was MED_SPEED
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED); // change the sampling speed 
  
  adc->adc0->enableInterrupts(adc0_isr);
  NVIC_SET_PRIORITY(IRQ_ADC0, 8);
  adc->adc0->startContinuous(PIN_IMS_SIG);
}


void adc0_isr(void) {
    int tmpval =  (uint16_t) adc->adc0->analogReadContinuous()   ;  
   // tmpval = samplecount*50;  
   if (tmpval > 0xFFF0)
    {
      tmpval = 0xFFF0;
    }
    __disable_irq();  
    addByteToTx((tmpval >> 8) & 0xFF);   
    addByteToTx(tmpval & 0xFF);
    __enable_irq();

    if (samplecount++ > 1498)
      {
        adc->adc0->stopContinuous();
      }
 
}



Attachments: the setup.jpg, failed ADC signal.jpg, good ADC signal.jpg
 
Thanks for your quick suggestion -- I was not aware of INPUT_DISABLE. It seemed to have no effect in this case, though. I should add that the above code works perfectly fine for a week, then fails spontaneously, completely on its own! This problem has me perplexed!
 
Do you have a plot of ADC counts versus input voltage? Patterns in that might suggest where the problem lies.
 
A series resistor of a few kilohms is often helpful for protecting inputs.

I would say it's absolutely essential to add protection if connecting directly to an external piece of equipment. Most signal generators are capable of +/-15V or so, and modern ones without a physical pot as output attenuator may output startup and power-down transients capable of blasting CMOS logic out of the water. An external attenuation pad would be one way to gain this security, plus a schottky diode + resistor protection circuit.

This is very unlikely to be anything to do with software.
 
Are your T3.6, the laptop, and signal generator connected continuously while you collect data? If so, I suppose your laptop is connected to its charger while collecting. If that is the case, make sure that there is a really solid ground connection between the laptop, the signal generator, and the Teensy. Many laptop chargers do not have an actual ground connection on the low-voltage DC end. I have seen laptops where the nominal ground on the USB port actually carries 60V of AC (at low current) if there is a bad ground connection on the device to which the USB is connected. You should also make sure that the laptop and signal generator are plugged in to an outlet strip with good transient protection.

Did you try the "Input disable" setting on a new T3.6? It may be too late for that to help if parts of the input circuitry have already been damaged.

In the interest of eliminating a possible source of damage, you could eliminate the signal generator by programming the DAC on the T3.6 to generate a sine wave with an interval timer, and connect that to the ADC input. If you get no failure then, it could be ground spikes from the signal generator.

Even better would be to eliminate the computer connection, and have the teensy write the data to SD. You can then power the T3.6 from either a battery or good bench supply.

I've run solar data collectors and GPS loggers for many hundreds of hours and not seen problems like this. I suspect some 60Hz leakage is getting into your system when there are local AC transients.
 
Thanks for all of the suggestions. The plot of input voltage to ADC counts looks like this (photo attached) when it is in the failed condition. The voltage waveform being supplied is a clean sawtooth. In my original circuit, I have a low-pass filter connected to the analog input pin, consisting of a series resistor, capacitor to ground (4.7k / 1n), and I later added schottky clamping diodes to both ground and vcc, then later schottky diodes to +2.5V and AGND, and connected AGND to DGND to make sure everything was tied together (I know that there is an inductor onboard the T3.6). The failures still occurred in that setup with the same average time-to-failure, same failure mode, etc.

The setup with the signal generator was not touched -- no power ups or disconnections or physical disturbance in the slightest. The signal ground from the generator was connected to digital GND on the T3.6, which is connected to USB GND and the shield on the T3.6. The signal generator shows continuity between the ground prong on the line plug and its signal output ground. The laptop appears to be floating relative to line ground. Additionally, I've experienced this same failure mode with a completely different computer (a desktop) and no external connection to a signal generator, so I don't think line noise or ground loops are the root cause.

I think the DAC->ADC loop would be an interesting test that would eliminate all external sources. I'll give it a shot.

I've used microcontrollers for many years, and have also not experienced a problem like this. My circuit design evolved, and I'm now using an external 24-bit ADC for other reasons, but this problem is intriguing and I'm still spending spare cycles trying to get to the bottom of it.




ADC fail.jpg
 
I finally got around to taking a closer look at the code that you provided in Post #1 and a few things caught my notice:

1. You have been very liberal in sprinkling __disable_irq() and __enable_irq() throughout your code. I'm not sure that it is necessary to do that when you have set the ADC interrupt to very high priority (8). I don't think anything else in the program has a high enough priority to interrupt that ISR except the system tick interrupt.

2. You have a Serial.write() call just after disabling interrupts, and it is called from inside XferDataISR(). I have an uneasy feeling about this: I'm reluctant to call Serial.write() from an ISR, especially with IRQs blocked, as the USB serial handler may need an end-of-DMA-transfer interrupt to complete properly.

3. Your txB_wptr and txB_rptr variables are not declared volatile, although they are accessed from different IRQ handlers. That seems like trouble waiting to happen.

4. If, as your comments suggest, you are collecting two bytes of data every 16 microseconds, disabling interrupts for more than 16 microseconds will cause missed ADC readings. That is one of the things that bothers me about disabling interrupts around the Serial.write() called from XferDataISR().
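To put rough numbers on point 4, here is a small sketch of the timing budget. It takes the figures from Post #1 at face value: the 16 us sample period comes only from the code comment and is an assumption (the real rate depends on the ADC library settings), and the constant and function names are made up for this example.

```cpp
#include <cstdint>

// Figures taken from the code in Post #1. The sample period is an assumption
// based on the comment "2 bytes/16us"; it has not been measured.
constexpr uint32_t kSamplePeriodUs  = 16;
constexpr uint32_t kBytesPerSample  = 2;
constexpr uint32_t kDrainIntervalUs = 3000;   // XferDataUSB interval
constexpr uint32_t kSamplesPerScan  = 1500;   // adc0_isr stops after 1500 conversions
constexpr uint32_t kScanIntervalUs  = 42000;  // startIMSScan interval
constexpr uint32_t kTxBufferLen     = 750;

// Bytes the ADC ISR produces between two runs of the USB transfer ISR.
constexpr uint32_t bytesPerDrainInterval() {
    return kDrainIntervalUs * kBytesPerSample / kSamplePeriodUs;
}

// Time one scan of kSamplesPerScan conversions takes.
constexpr uint32_t scanDurationUs() {
    return kSamplesPerScan * kSamplePeriodUs;
}

// Longest time interrupts can stay masked without losing a conversion result.
constexpr uint32_t maxIrqOffUs() {
    return kSamplePeriodUs;
}

static_assert(bytesPerDrainInterval() == 375, "matches the >375 byte comment");
static_assert(bytesPerDrainInterval() < kTxBufferLen, "buffer holds one drain interval");
static_assert(scanDurationUs() < kScanIntervalUs, "scan finishes before the next one starts");
```

Under these assumptions the 750-byte buffer has about 2x headroom, but only while the transfer ISR keeps up. The delayMicroseconds(50) inside the __disable_irq() section of startIMSISR() is already longer than maxIrqOffUs(), so a few conversions at the start of every scan would be overwritten before they are read.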

In your plot of the sawtooth in post #8, do the benches always occur at the same voltages, or do they move to different voltages after you restart the program? It looks to me as though they occur at regular time intervals. If that is the case, it might be an interaction between the ADC collection ISR and the Serial.write() made with interrupts disabled: at some point Serial.write() takes extra long and blocks the ADC ISR.
 
I should add that I've seen lots of glitches in data collection on both Naval and oceanographic research vessels. During one cruise we found a lot of glitches associated with starting and stopping a large electric winch. Another persistent glitch occurred somewhere between 2AM and 3AM just about every day. It turned out that this one occurred when the engine room switched from motor generator 1 to motor generator 2 to even out the running hours and allow for maintenance. We solved most of those problems by putting an uninterruptible power supply with good filtering between our instruments and the ship's power plugs.
 
Thanks again. Yes, I agree that I have too many __disable_irq() calls. I think I originally added them before changing the ADC IRQ priority to 8. The program has no known glitches during the week it runs before the failure mode occurs, so I didn't worry too much about it, and then the design changed, and the old code was never cleaned up. Also, good to know that Serial.write may need interrupts to be enabled in order to catch the end of DMA, etc. Also, you are right -- I should have declared those pointers to be volatile! My ring buffer has no overflow protection, etc.
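For reference, a minimal sketch of a ring buffer with the fixes discussed above: volatile indices, an overflow check, and a wrap test that keeps the index inside the array. In the posted code the index is compared against TX_BUFFER_LEN only after it has been used, so it can reach 750 and write one byte past the end of txBuffer[750]. The class and method names here are hypothetical, and the sketch assumes exactly one producer and one consumer. The posted program calls addByteToTx() from two ISRs at different priorities, so there the push side would still need a short critical section.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer / single-consumer byte ring buffer.
// One slot is always left empty so that head == tail means "empty".
template <std::size_t N>
class TxRing {
public:
    // Returns false (and drops the byte) when the buffer is full.
    bool push(uint8_t b) {
        std::size_t head = head_;
        std::size_t next = (head + 1 == N) ? 0 : head + 1;
        if (next == tail_) {
            return false;            // full: refuse rather than overwrite unread data
        }
        data_[head] = b;
        head_ = next;                // publish the new index only after storing the byte
        return true;
    }

    // Returns false when the buffer is empty.
    bool pop(uint8_t &out) {
        std::size_t tail = tail_;
        if (tail == head_) {
            return false;
        }
        out = data_[tail];
        tail_ = (tail + 1 == N) ? 0 : tail + 1;
        return true;
    }

    // Number of bytes currently waiting to be read.
    std::size_t size() const {
        std::size_t head = head_;
        std::size_t tail = tail_;
        return (head >= tail) ? head - tail : N - tail + head;
    }

private:
    uint8_t data_[N] = {};
    volatile std::size_t head_ = 0;  // written by the producer only
    volatile std::size_t tail_ = 0;  // written by the consumer only
};
```

A TxRing<750> would hold at most 749 bytes. Counting how often push() returns false would also show whether the USB side ever falls behind during a week-long run.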

The missing codes in the ADC digital output occur at the same exact voltage, not at the same point in time. It's almost as if one of the comparators in the ADC has failed, but knocking out one specific bit cannot explain the failure pattern -- there is interaction between bad bits. When I download new code to the T3.6 that consists only of Serial.println(analogRead(xx)); delay(xx); in a loop, the missing codes are still present at the exact same voltage levels when I sweep the ADC input voltage.
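If the sweep data is captured on the PC, the missing codes can be listed mechanically rather than read off a plot. A minimal sketch, assuming the samples are already available as 16-bit values; the function name findMissingCodes is made up for this example. It only gives meaningful results if the sweep is slow and long enough that a healthy ADC would hit every code in the range.

```cpp
#include <cstdint>
#include <vector>

// Returns every code in [lo, hi] that never appears in `samples`.
std::vector<uint16_t> findMissingCodes(const std::vector<uint16_t> &samples,
                                       uint16_t lo, uint16_t hi) {
    std::vector<bool> seen(65536, false);
    for (uint16_t s : samples) {
        seen[s] = true;
    }
    std::vector<uint16_t> missing;
    // 32-bit loop counter so that hi == 65535 does not wrap around.
    for (uint32_t code = lo; code <= hi; ++code) {
        if (!seen[code]) {
            missing.push_back(static_cast<uint16_t>(code));
        }
    }
    return missing;
}
```

Comparing the output from two separate sweeps, or from before and after re-flashing, would confirm that the gaps sit at fixed codes rather than at fixed points in time.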

I'm going to run the DAC->ADC experiment with the mentioned improvements to the code, and we'll see what happens.
 
I just noticed that in your ADC initialization code you are using
Code:
adc->adc0->setReference(ADC_REFERENCE::REF_EXT);
What is the source of your external reference? Any glitches in that reference are going to show up in your ADC values.

I think you may be right about certain bits having messed up comparators. For many of the bits, the capacitances and charges transferred are very small and even minor changes in the resistances of the multiplexers connecting the various reference capacitances to the input capacitor could mess up the results on whatever bit was being compared at the time Zeus zapped your system with his lightning bolt.

It would be interesting to look at the hex or binary representations of the collected data. It might be possible to track the errors down to one or two stuck bits.
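A quick first screen for stuck bits is to AND and OR all samples from a full-range sweep together. This is only a sketch: the struct and function names are hypothetical, and it finds only bits that are stuck for every sample, so it would not catch the interaction between bits described earlier in the thread.

```cpp
#include <cstdint>
#include <vector>

struct StuckBits {
    uint16_t alwaysZero;  // mask of bits that were 0 in every sample
    uint16_t alwaysOne;   // mask of bits that were 1 in every sample
};

// Meaningful only for a non-empty sweep that covers the full input range;
// with an empty input both masks come back as 0xFFFF.
StuckBits findStuckBits(const std::vector<uint16_t> &samples) {
    uint16_t orAll = 0x0000;
    uint16_t andAll = 0xFFFF;
    for (uint16_t s : samples) {
        orAll |= s;
        andAll &= s;
    }
    return StuckBits{static_cast<uint16_t>(~orAll), andAll};
}
```

On a healthy ADC swept from rail to rail, both masks should be zero.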
 
I have done an extensive amount of testing, and have come to the surprising conclusion that the failure is indeed caused by the code -- not by an external voltage spike. I know, it sounds ridiculous, I thought so too, but I've had an isolated test setup that has been left to run undisturbed for months while I changed just one variable at a time (used the internal DAC as the voltage source, used an external voltage source with different current-limiting resistor values, changed just one line of code, etc). The result is that running analogReadContinuous() as the first line of the ISR (adc0_isr) will cause the failure after about 5-7 days of continuous running. Adding code between the start of the ISR and analogReadContinuous() allows the system to work normally.


Uncommenting the two lines above analogReadContinuous() will cause ADC damage after 5-7 days of continuous running, even with nothing physically connected to the Teensy. I had used these two lines in a previous test when using the internal DAC as an analog signal source.

I've searched through the ADC github https://github.com/pedvide/ADC , and cannot find the actual code for analogReadContinuous -- it just appears to be circular redefinitions at my level of understanding. If anyone can help me understand where that function is actually defined, I'd appreciate it (even for value outside of this weird ADC failure bug). Thanks for reading and commenting, everyone.

Code:
void adc0_isr(void) {
 
    
 //analogWrite(A21, (int)((sin(curphase))*2048.0 + 2048.0));
 //curphase = (curphase >= 6.28) ? 0 : curphase + 0.02;
 
    int tmpval =  (uint16_t) adc->adc0->analogReadContinuous()   ;  
   // tmpval = samplecount*50;  
   if (tmpval > 0xFFF0)
    {
      tmpval = 0xFFF0;
    } 
    addByteToTx((tmpval >> 8) & 0xFF);   
    addByteToTx(tmpval & 0xFF);

    if (samplecount++ > 1498)
      {
        adc->adc0->stopContinuous();
      }
}
 
ADC_Module.h, line 502.
Code:
    int analogReadContinuous() __attribute__((always_inline))
    {
#ifdef ADC_TEENSY_4
        return (int16_t)(int32_t)adc_regs.R0;
#else
        return (int16_t)(int32_t)adc_regs.RA;
#endif
    }
 
A few experiments might prove interesting-----if somewhat costly in Teensies and time:

1. Does the timing of the damage change if you change the sampling speed and conversion speed?
2. Does the timing of the damage change if you reduce the CPU clock rate?
3. Does the timing of the damage change if you switch from continuous running to collecting at a high rate with the ADC timer?
4. Does the timing of the damage change with changes in ADC reference, averaging or input pin?

Those questions could add up to quite a large test matrix and by the time you get through that many tests, you might have to buy your T3.6's wholesale to make sure you can finish the tests before the T3.6 becomes obsolete :rolleyes:
 
Since analogReadContinuous is just a fancy way to read the ADC result register, it can't be the cause of your trouble. The ADC will keep on performing conversions in continuous mode until you stop it even if you don't read the results.

The actual sample rate and ADC clock speed are obfuscated by the library, so I have no idea what they really are. If they are running at a high rate, use the DMAC to collect the data.
 
Power monitor

Do you have any way of monitoring the AC line power for dropouts and glitches?

a) If you are having a line-voltage dropout every 5-7 days, and if your signal generator starts up with a nasty glitch when AC is restored (as opposed to being switched on), that might be the problem. You can test this with a good scope (storage or digital) by manually unplugging and re-plugging the line cord on the generator.
b) Could you be getting a large spike or glitch on the AC line about once every 5-7 days? The only way to test this without a power-line monitor would be to have a surge protector followed by a battery charger feeding a very low-impedance battery, which then runs a high-quality inverter, which in turn runs the Teensy power source and the generator. Then see if you can go WAY longer than the 5-7 days.
c) Have you checked the calibration of your scope and scope probe against a good reference? If not, maybe something is amiss there and you are actually over-driving the Teensy.
 
Perhaps during one cycle the pin becomes a digital output driven high and fights with the signal generator (which has a 50 ohm output).
Try disconnecting the signal generator, placing a 1 kohm load on the pin, and using a scope in Normal or single-shot trigger mode to see whether the voltage rises above 0V during reading.
 
Thanks everyone for reading along and for your suggestions. Here's one last update: I did a test with a brand-new Teensy 3.6 and nothing physically connected to it (photo) other than the USB cable and a 4K7 resistor between the ADC pin and the DAC pin. Run the code below, and the ADC will be permanently damaged after several days of continuous running, as long as a PC program pulls data from the serial port continuously. The chip was not handled during the test, nor exposed to nearby high voltage, microwaves, gamma radiation, etc ;)

The damage is likely caused by a race condition or an unexpected problem with nested interrupts, combined with how data are buffered and sent via USB. My loop buffer is not the best implementation, and it may cause problems in edge cases, but I never suspected anything could cause hardware damage! This is definitely one of the most bizarre bugs that I've encountered in my career. Luckily the project design went in a different direction anyway, and I didn't need to solve this. I only continued blowing up T3.6's to satisfy my curiosity, which was only partially successful, since I still never found the exact smoking gun.

T36.jpg


Code:
#include <ADC.h>
#define PIN_IMS_SIG   A8  
#define TX_BUFFER_LEN 750

volatile static int txB_wptr = 1;
volatile static int txB_rptr = 1;
static byte txBuffer[TX_BUFFER_LEN];
IntervalTimer startIMSScan;
IntervalTimer XferDataUSB;
static int samplecount = 0;
static ADC *adc = new ADC();
static float curphase;

   
void setup() 
 {
    Serial.begin(9600);
    startIMSScan.begin(startIMSISR, 42000);
    XferDataUSB.begin(XferDataISR, 3000);
    XferDataUSB.priority(130);
    analogWriteResolution(12);
 }


void loop() 
 {
  delayMicroseconds(10);
  analogWrite(A21, (int)((sin(curphase))*2048.0 + 2048.0));
  curphase = (curphase >= 6.28) ? 0 : curphase + 0.02; 
 }


void addByteToTx(uint8_t inbyte)
 {
   __disable_irq();
   txBuffer[txB_wptr] =  inbyte;  
   txB_wptr = (txB_wptr == TX_BUFFER_LEN) ? 0 : txB_wptr + 1;
   __enable_irq();
 }


void process_USB_IO(void)
 {
 while(txB_wptr != txB_rptr)
  {
    Serial.write(txBuffer[txB_rptr]);
    txB_rptr = (txB_rptr == TX_BUFFER_LEN) ? 0 : txB_rptr + 1;    
  }
 }


void XferDataISR()
 {
  process_USB_IO();
 }


void startIMSISR()
 { 
  addByteToTx(0xFF);  //Start of packet
  addByteToTx(0xFF); 
  addByteToTx(0xFF); 

  uint16_t tempint =  0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);
  
  tempint =  0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);
 
  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  tempint = 0x00;
  addByteToTx(tempint >> 8);
  addByteToTx(tempint & 0xFF);

  uint32_t temp32 = 0x00;
  addByteToTx(0x00);
  addByteToTx((temp32 >> 16) & 0xFF);
  addByteToTx((temp32 >> 8) & 0xFF);
  addByteToTx(temp32 & 0xFF); 
 
  adc_start_continuous(); 
 }

void adc_start_continuous(void)
 {
  curphase = 0;
  samplecount = 0;
  adc->adc0->setReference(ADC_REFERENCE::REF_EXT); 
  adc->adc0->setAveraging(4); 
  adc->adc0->setResolution(16); 
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED_16BITS);   //was MED_SPEED
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED); // change the sampling speed 
  
  adc->adc0->enableInterrupts(adc0_isr);
  NVIC_SET_PRIORITY(IRQ_ADC0, 8);
  adc->adc0->startContinuous(PIN_IMS_SIG);
 }


void adc0_isr(void) 
 {
   int tmpval =  (uint16_t) adc->adc0->analogReadContinuous();  
   if (tmpval > 0xFFF0)
    {
      tmpval = 0xFFF0;
    } 
    addByteToTx((tmpval >> 8) & 0xFF);   
    addByteToTx(tmpval & 0xFF);

    if (samplecount++ > 1498)
      {
        adc->adc0->stopContinuous();
      }
 }
 
180MHz with "Faster" optimization. Teensyduino 1.53 with Arduino 1.8.13. As a side note, Paul's work and the Teensy community have enabled me to build prototypes amazingly quickly, and I am incredibly thankful for the libraries and hard work that have gone into it. I'm not really looking for specific technical support for this bug -- it's almost certainly a silicon erratum in the Kinetis chip. I just thought it was a crazy nerd-snipe, interesting thing to investigate. There was one other person on the NXP forum (link in first message in this thread) who indicated the problem was solved by reworking the ADC input protection, but I'll bet there was some tiny code change that prevented the bad condition from occurring.
 
It would be interesting to monitor the chip temperature while your code is running. I could easily imagine some area of the chip heating up with certain code.
 
1) Disabling interrupts in addByteToTx() just wastes time since txB_wptr isn't modified anywhere else. Certainly not in another ISR.

2) The more natural (and faster!) way to write that is:

Code:
void addByteToTx(uint8_t inbyte)
 {
   txBuffer[txB_wptr++] =  inbyte;
   if(txB_wptr == TX_BUFFER_LEN)
       txB_wptr = 0;
 }
(Ditto for reading data from the buffer.)


The USB serial routines support writing more than a byte at a time so it would be far faster if you took advantage of that. It would require a bit more work up front of course. Although since you always build a fixed sized packet of data...
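A sketch of what multi-byte writes could look like with the existing circular buffer: the pending data is handed over in at most two contiguous blocks instead of one byte at a time. The function name drainContiguous and the sink callback are hypothetical. On the Teensy the sink would be a call such as Serial.write(ptr, count), and both indices are assumed to stay within 0 .. len-1.

```cpp
#include <cstddef>
#include <cstdint>

// Hands the bytes between rptr and wptr of a circular buffer of length `len`
// to `sink` in at most two contiguous blocks. Returns the new read index.
// Assumes 0 <= rptr, wptr < len.
template <typename Sink>
std::size_t drainContiguous(const uint8_t *buf, std::size_t len,
                            std::size_t rptr, std::size_t wptr, Sink sink) {
    if (rptr > wptr) {
        // Data wraps around: send the part up to the end of the array first.
        sink(buf + rptr, len - rptr);
        rptr = 0;
    }
    if (rptr < wptr) {
        // Send the remaining block that ends just before the write index.
        sink(buf + rptr, wptr - rptr);
        rptr = wptr;
    }
    return rptr;
}
```

The write index should be copied into a local variable once before the call, so that an ADC interrupt arriving in the middle cannot change the amount being sent.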


This matters because the interval timer ISR that starts the USB transfer is blocking, which means that the timing in loop() could get thrown off if that transfer takes any significant amount of time, or if the stuff that happens behind the scenes in loop() takes some time.

I think that I would structure that differently. Use the interval timer to update the DAC output and handle the serial I/O in loop(). That way all of the time critical things happen in an ISR. The serial I/O is much less critical so can be interrupted.

(Also, to reduce jitter, I would compute the next DAC output value and store it in a static variable. Which would get written to the DAC at the beginning of the ISR.)
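A sketch of that last idea: the value computed on the previous tick is written out first, and only then is the next one calculated, so the sin() call no longer sits between the timer interrupt and the DAC update. The struct name and the writeDac callback are hypothetical. On the Teensy the callback would wrap analogWrite(A21, value) with analogWriteResolution(12), and tick() would be called from an IntervalTimer ISR. The amplitude is 2047 rather than 2048 so the result can never exceed the 12-bit maximum of 4095.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sine source for a 12-bit DAC.
struct DacSine {
    float phase = 0.0f;
    uint16_t next = 2048;  // mid-scale, used for the very first tick

    // `writeDac` stands in for the real DAC write.
    template <typename WriteFn>
    void tick(WriteFn writeDac) {
        writeDac(next);    // output first: fixed delay from the start of the ISR

        // Prepare the value for the following tick.
        phase += 0.02f;
        if (phase >= 6.2831853f) {
            phase -= 6.2831853f;
        }
        next = static_cast<uint16_t>(std::sin(phase) * 2047.0f + 2048.0f);
    }
};
```

The phase step of 0.02 is taken from the posted loop(). The actual output frequency then depends on the timer interval chosen for the ISR.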
 