Possible Teensyduino modification to ease debugging...

tician

Member
Been quite a while since I've been active on any forums, but nice to see KurtE is still going strong.


Not really urgent in any sense, and not exclusive to any Arduino variant, and I'm not sure how many more projects I'll be using the T4 for, but it would be nice if the -O0 option were available inside the Arduino settings by default instead of having to modify the settings every time I update the boards/core. gcc is generally pretty nice, but I have had several instances on multiple platforms where essential bits of code ended up being completely optimized out at any level other than -O0, sending me bashing my head against a wall.

Three particular instances come to mind of gcc optimizations breaking things and causing real pain to track down:
1) an initialization/callback issue in the Robotis DARwIn-OP motion system on x86-64
2) timer initialization code on the Adafruit ItsyBitsy-M0 and/or Feather-M4, where a register initialization line was eliminated so the timer would start at a random value and take forever to wrap around
3) yesterday's irritation: a convenience pointer to a 512-byte array for USB output that gets data values loaded out of a pair of DMAMEM buffers/arrays fed by the ADCs

Code:
volatile uint32_t usbhs_write_buffy[128] = {};
char *usbhs_write_buffy_pointer = (char *) usbhs_write_buffy;
versus
Code:
volatile uint32_t usbhs_write_buffy[128] = {};
const uint8_t *usbhs_write_buffy_pointer = (const uint8_t *) usbhs_write_buffy;

producing very different behaviors when used in this bit of code:

Code:
usbhs_write_buffy[2] = (blocktotal<<16) | (++blockiter);
for (uint32_t iter=0; iter<samples_per_block; iter++)
{
  usbhs_write_buffy[3+iter] = (uint32_t) (pbuffer0[(jter*samples_per_block)+iter]<<16) | (pbuffer1[(jter*samples_per_block)+iter]);
}
// Write out the 512 byte packet
Serial.write(usbhs_write_buffy_pointer, 512);

Basically going for ~1 Msps, 12-bit, 2-channel ADC streaming data as fast as possible over the high-speed USB interface (4~5 Mbyte/sec). With the char* version, it works exactly as expected. With the const uint8_t* version, it still builds and sends the 512-byte blocks over USB exactly as expected (incrementing header values and pulling data from the DMAMEM buffers pointed to by pbuffer0 and pbuffer1), but it goes pear-shaped somewhere in actually extracting new data from the ADC+DMA. The DMA thinks it is transferring data: the data transfer loop depends on AnalogBufferDMA interrupts being reactivated by the DMA hardware, and debug prints show AnalogBufferDMA's internal _interrupt_count getting incremented properly. But the data stored in the DMAMEM arrays never actually changes after the first ADC collection run, and the ADC seems to get borked somehow, since the loop stalls completely if the ADC input pins get pulled above 1.2V and resumes after dropping back below ~1.1V.
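For reference, the packing in that loop amounts to the following (a portable sketch with a hypothetical helper name; 3 header words + 125 packed words = 128 uint32_t = 512 bytes per block):

```cpp
#include <cstdint>

// Pack one sample from each ADC channel into a 32-bit word, channel 0 in the
// high half and channel 1 in the low half, exactly as the loop above does.
// A 512-byte block is then 3 header words plus 125 packed words (250 samples).
inline uint32_t pack_samples(uint16_t ch0, uint16_t ch1) {
  return (static_cast<uint32_t>(ch0) << 16) | ch1;
}
```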


So, yeah. That took a full day to track down and really made me wish I had not been so lazy about re-applying the -O0 modification.
 
Well... the const uint8_t* version looks suspicious because the buffer it points to is not const, so anything handed that pointer is entitled to assume it won't be used to modify the contents. Maybe you meant "uint8_t* const usbhs_write_buffy_pointer = ..." (a constant pointer to mutable bytes) instead?

Although from your description of the problem it sounds more like a cache issue (incoming data buffers not being invalidated before reading).
 
...and yes, I am aware the effective number of bits is less than the stated 12 bits at ADC_CONVERSION_SPEED::HIGH_SPEED and ADC_SAMPLING_SPEED::MED_SPEED.

Some ranty/venty backstory...
We have some custom ADC hardware built from a bunch of obsolete parts, with a really irritating high-speed USB software interface that is Windows-only (and tends to glitch out on Intel hardware as well). The hope was that the ADC hardware could operate in an existing collection mode for an experimental instrument with slightly different software and just some settings changed, but the ADC settings and operating modes are barely documented and inconsistently applied. Getting anything remotely working as hoped required far more low-level software and FPGA changes than feared (and that is well outside my wheelhouse: an obsolete FPGA with discontinued software support, and in-house software built on friggin' VisualBasic with some C/C++ and FORTRAN in the DLLs). So I ditched that and returned to a Feather-M4 ADC project, and had the hardware collecting and processing pretty well. But the detector was crap, and the necessary processing ended up requiring way more samples stored in RAM than the M4 could hope to accommodate internally. Full-speed USB is laughably inadequate for getting enough data off the Feather-M4, and I quickly remembered why I had abandoned at least three prior attempts to get any of the various high-speed FTDI chips working correctly with any of our hardware or software. So, overnighted a T4.0 from Digi-Key.

Although the T4.0 has more than enough processing speed and power, even it does not have the RAM required to deal with the data stream required to get useful signal out of the detector. So I am very, very, VERY glad that the T4s have high-speed USB that can dump data out over the regular 'serial' port at ridiculous throughput.
 
The convenience pointer exists because Serial.write() demands const char* or const uint8_t* for buffered writes, and I got tired of trying to keep it happy while using a single 512-byte chunk of RAM accessed as uint32_t, uint16_t, uint8_t, or char arrays depending on need.

That pointer is used solely for the SerialUSB operation to dump data, yet it only caused problems in the ADC+DMA hardware. That is what kept me pulling my hair out and wanting to smash things for so long: I kept looking at and testing everything in the ADC and DMA code but finding nothing.
 
You can pass a uint8_t* to a function that expects const uint8_t*; the const is just a guarantee that the function won't modify the data through that pointer.
It's still undefined to cast away the const and modify memory through a pointer-to-const when the underlying object really is const. So this doesn't look like a compiler bug; it looks like a programming bug.
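A tiny portable illustration of that conversion (byte_sum is a hypothetical stand-in for Serial.write()): passing a uint8_t* where const uint8_t* is expected is an implicit, safe conversion, and the underlying buffer stays writable through its own non-const handles.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for Serial.write(const uint8_t*, size_t): the const parameter is a
// promise that *this function* won't modify the bytes, nothing more. The
// caller's buffer remains writable through any non-const path.
inline uint32_t byte_sum(const uint8_t *buf, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++) sum += buf[i];
  return sum;
}
```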

I kept looking and testing everything in the ADC and DMA code but finding nothing.
And you're sure you were invalidating the DMA buffers (pbuffer0/pbuffer1) after they got filled / before reading them with the CPU? This could easily cause intermittent issues depending on what other code was executing in the meantime, since cache eviction happens naturally in the background of ordinary execution.
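For what it's worth, the invalidate-before-read pattern being asked about looks roughly like this on the T4 (a sketch; needs_dcache_invalidate is a hypothetical helper, and arm_dcache_delete() is the Teensy 4 core's cache-invalidate call, shown in a comment so the fragment stays portable):

```cpp
#include <cstdint>

// RAM2/OCRAM (where DMAMEM buffers live) starts at 0x20200000 on the Teensy 4
// and is cached, so the CPU must discard stale cache lines before reading a
// buffer the DMA just filled.
constexpr uintptr_t RAM2_BASE = 0x20200000u;

inline bool needs_dcache_invalidate(const void *p) {
  return reinterpret_cast<uintptr_t>(p) >= RAM2_BASE;
}

// On-target usage would be something like:
//   volatile uint16_t *pbuffer0 = abdma0.bufferLastISRFilled();
//   if (needs_dcache_invalidate((const void *)pbuffer0))
//     arm_dcache_delete((void *)pbuffer0, sizeof(dma_adc0_buff1));
//   ... now read pbuffer0 with the CPU ...
```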
 
Literally changed the line from
Code:
const uint8_t *usbhs_write_buffy_pointer = (const uint8_t *) usbhs_write_buffy;
to
Code:
char *usbhs_write_buffy_pointer = (char *) usbhs_write_buffy;
with nothing else changed anywhere in my sketch, and it started working correctly. I got so frustrated after nearly a full day debugging this one silly thing that I just started reading from the very top of the sketch, got to that line, and thought "no, it couldn't be that". Then I changed it, recompiled, it started working, and I started swearing.


pbuffer0/pbuffer1 are local pointers used for grabbing values from the AnalogBufferDMA objects (e.g. volatile uint16_t *pbuffer0 = abdma0.bufferLastISRFilled()). The loop() checks for both ABDMA interrupted(), then iterates through the indicated ABDMA buffers in 512-byte increments spitting them out over USB while checking to see if all the 512-byte blocks in that 'collection' have been sent, then finally clears the interrupts (e.g. abdma0.clearInterrupt()).

With the (const uint8_t *) line, the actual data in the ABDMA buffers only gets updated by the ADC the very first time; after that it does something else. The 512-byte data blocks for USB are generated correctly, the pbuffer0/pbuffer1 pointers cycle correctly between their two respective DMAMEM array start locations, and the AnalogBufferDMA interrupt counts increment correctly as long as the analog input stays below 1.2V. But the data in the DMAMEM arrays never gets updated after the first fill, and the ABDMA interrupted() calls stop returning true if the analog input goes above 1.2V. During debugging, the ABDMA buffer size was varied between 125, 2000, 8000, and 20000 samples, with the predictable result that only the length of transmitted ADC data between repetitions changed.

Did not add a timer-based heartbeat on the SerialUSB2 debug console to check whether the entire CPU was stalling or just the ADC+DMA stuff, but a complete CPU stall/halt would probably cause the USB device to disappear/disconnect fairly quickly. It works now, but the problem might have been found a bit earlier if compiler optimization were easier to disable.
 
Still loving the Teensy4 with its high-speed USB, and hoping I can get it working well enough with some supplemental hardware to get rid of these damn 'Bitwise Systems' hardware boards using Cyclone II FPGAs and Cypress EZ-USB FX2LP USB chips. Adding the external ADCs should be no problem, since it is just SPI-with-DMA instead of internal-ADC-with-DMA. The other input system requires >50MHz pulse counting/windowing that I don't know whether the Teensy4 can handle as reliably as an FPGA, although the FreqCount library does look like a promising starting point.
 
hmm.
Code:
//    if ((uint32_t)pbuffer0 >= 0x20200000u)  arm_dcache_delete((void*)pbuffer0, sizeof(dma_adc0_buff1));
//    if ((uint32_t)pbuffer1 >= 0x20200000u)  arm_dcache_delete((void*)pbuffer1, sizeof(dma_adc0_buff1));
If I had actually read that for comprehension: it deletes the cached copy of those two RAM areas, not the actual data at that location, so the CPU is forced to re-read what the DMA actually wrote.

Still quite odd that it would re-use the cache indefinitely despite RAM2 supposedly being repeatedly updated by DMA, all because of that one unrelated pointer used purely to reference a properly updating USB data buffer read by Serial.write(). All writes to the USB data buffer went through the uint32_t array directly; the pointer existed only to give Serial.write() the unchanging start address of the buffer without an explicit type cast in the function call.

I guess forever reusing the cache could be considered one hell of an optimization if RAM2 were not actually being changed by the DMA. But the debug printouts showed the ADC was triggering DMA transfers: the address returned by AnalogBufferDMA was switching between the two DMAMEM buffers every ~125us to ~20ms depending on the buffer size used during debugging. The AnalogBufferDMA buffer size is always some multiple of 125 uint16_t ADC samples, so that 250 uint16_t samples from the two ADCs fill 500 bytes of a 512-byte USB data block.

But the ADC seeming to stall when the analog input is pushed above 1.2V suggests the ADC input was somehow getting misconfigured immediately after that last collection, where the DMA was supposed to return to the first of its two buffers, even though the ADCs were in continuous collection mode with nothing changed after the initial configuration in setup(). Back when I had been pulling a bit too much hair out, I had extracted all the ADC and AnalogBufferDMA stuff into an independent sketch for testing without dumping the data out over USB (which was the entire point of using the DMA), and the DMA continued updating exactly as expected even when the input was pushed above 1.2V and the DMAMEM buffers were never actually touched by the CPU in loop().
 
This has still been bugging me, so just did a bit more experimenting.

The (const uint8_t*) version works fine with "Fastest" but has the "ADC stalls at input >1.2V" issue with "Fast" and "Faster". Actually use the arm_dcache_delete() calls and all of them work. So it was definitely related to optimization level and failure to use arm_dcache_delete(), but I am still really curious how the optimization modes manage to break the ADC input without the arm_dcache_delete() calls.
 
And I encountered the ADC stalling at input >1.0V again, with all optimization settings. It appears to be the enableCompare() and enableCompareRange() checks that are included in the ADC library examples after changing resolution to verify that the ADC is actually behaving correctly. Failing to call disableCompare() on the adc objects after those compare checks is highly likely to cause inconsistent behavior later when using continuous collection and DMA.

Code:
 ///// ADC0 ////
  adc->adc0->setReference(ADC_REFERENCE::REF_3V3);
  adc->adc0->setAveraging(1); // set number of averages
  adc->adc0->setResolution(12); // set bits of resolution
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED); // change the conversion speed
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::MED_SPEED); // change the sampling speed

  // always call the compare functions after changing the resolution!
  adc->adc0->enableCompare(1.0/3.3*adc->adc0->getMaxValue(), 0); // measurement will be ready if value < 1.0V
  adc->adc0->enableCompareRange(1.0*adc->adc0->getMaxValue()/3.3, 2.0*adc->adc0->getMaxValue()/3.3, 0, 1); // ready if value lies out of [1.0,2.0] V

  ////// ADC1 /////
  adc->adc1->setReference(ADC_REFERENCE::REF_3V3);
  adc->adc1->setAveraging(1); // set number of averages
  adc->adc1->setResolution(12); // set bits of resolution
  adc->adc1->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED); // change the conversion speed
  adc->adc1->setSamplingSpeed(ADC_SAMPLING_SPEED::MED_SPEED); // change the sampling speed

  // always call the compare functions after changing the resolution!
  adc->adc1->enableCompare(1.0/3.3*adc->adc1->getMaxValue(), 0); // measurement will be ready if value < 1.0V
  adc->adc1->enableCompareRange(1.0*adc->adc1->getMaxValue()/3.3, 2.0*adc->adc1->getMaxValue()/3.3, 0, 1); // ready if value lies out of [1.0,2.0] V

  // setup a DMA Channel.
  abdma0.init(adc, ADC_0);
  abdma0.userData(0); // one uint32_t for whatever
  abdma1.init(adc, ADC_1);
  abdma1.userData(0); // one uint32_t for whatever

  adc->adc0->startContinuous(mypinADC0);
  adc->adc1->startContinuous(mypinADC1);
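Given that diagnosis, the implied fix is to turn compare back off once the sanity checks are done, before starting continuous collection (a sketch, assuming the ADC library's disableCompare(); not verified here):

```cpp
// After using enableCompare()/enableCompareRange() as a post-configuration
// sanity check, disable compare again so continuous+DMA collection is not
// gated by the compare thresholds (the >1.0V stall described above).
adc->adc0->disableCompare();
adc->adc1->disableCompare();
```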
 