Teensy 4: Global vs local variables speed of execution

Status
Not open for further replies.
> adc->adc1->setAveraging(1);

You can change the 1 to 2 or more for less noise. Also consider posting a schematic to get a lower-noise design; there is more to it than adding a capacitor here and there.
 
Indeed, that code can be tweaked. As noted, the last post used avg(1) and ran 133,386 reads of 10 pins per second.

Changing to this:
Code:
  ///// ADC0 ////
  adc->adc0->setAveraging(2);                                    // set number of averages
  adc->adc0->setResolution(12);                                   // set bits of resolution
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED ); // change the conversion speed
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED );     // change the sampling speed
  ////// ADC1 /////
  adc->adc1->setAveraging(2);                                    // set number of averages
  adc->adc1->setResolution(12);                                   // set bits of resolution
  adc->adc1->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED ); // change the conversion speed
  adc->adc1->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED );     // change the sampling speed

With avg(2), that 133K drops to 50K reads per second.

But I see in the unrolled code that with #define someWork { delayNanoseconds(3500);} it still runs at 49K per second. So increasing the averaging when reading the twin ADCs seems to make each pair read take over 3 µs longer. Dropping the resolution from 12 to 10 bits doesn't run much faster, and discarding those 2 bits should hopefully give a more usable 10 bits.

That is OK as long as all the needed calculations on the prior numbers can be done where 'someWork' is placed in the unrolled loop. The wait on the first read is wasted time, though perhaps the 'repeat once in a while' code could run there. And there is no extra 'someWork' at the end for calculations on the final pairing.

With the above changes to the last post's code (#75 ?), I see this output, where waitCnt now shows the number of CPU cycles used in the read code:
Code:
P#24: 2.77V <P#16: 2.97V <P#25: 2.77V <P#18: 3.19V <P#19: 3.19V <P#20: 0.18V <P#21: 0.20V <P#22: 0.13V <P#23: 0.11V <P#14: 0.11V <
	10 pins read 49362 times per second at us=386300015 [12062 waitCnt]
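The figures above can be turned into a per-read time budget with a little arithmetic. This is plain host-side C++, not Teensy code, and it assumes the Teensy 4 runs at its default 600 MHz, so waitCnt cycles convert to microseconds at 600 cycles per µs. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: Teensy 4.x at its default 600 MHz core clock.
constexpr double kCpuHz = 600e6;

// Microseconds for one full pass over all pins, given passes per second.
double usPerPass(double passesPerSecond) {
  assert(passesPerSecond > 0);
  return 1e6 / passesPerSecond;
}

// Microseconds per synchronized pair (one pin on each ADC).
double usPerPair(double passesPerSecond, int pins) {
  assert(pins > 0 && pins % 2 == 0);  // pins are read two at a time
  return usPerPass(passesPerSecond) / (pins / 2);
}

// Convert an ARM_DWT_CYCCNT difference (such as waitCnt) to microseconds.
double cyclesToUs(std::uint32_t cycles) {
  return cycles / kCpuHz * 1e6;
}
```

With the posted numbers, 49,362 passes per second is about 20.3 µs per pass, or about 4.05 µs per pair, so a 3.5 µs someWork delay fits inside each pair's conversion. The avg(1) rate of 133,386 passes per second works out to about 1.5 µs per pair. A waitCnt of 12,062 cycles is about 20.1 µs, so nearly the whole pass is spent in the read code.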
 
The last code I have here asked for 12-bit ADC reads with 1 avg; not sure if that gives a more stable 10-bit value? It takes longer, but the dual-read improvement offset the longer read time.

Here is the code I last edited. There is an #if to choose the for loop versus the unrolled version, with a cycle count around both methods.

YMMV. Not sure what I poked at last, but here it is. Don't forget to call errCkADC() during debug to test that the pin order in the array works with the ADC each pin gets assigned to, perhaps in the 1-second update "if ( lT >= 1000 ) {":
Code:
#include <ADC.h>
#include <ADC_util.h>

ADC *adc = new ADC(); // adc object

#define PINS 10  // MUST BE EVEN to read on paired ADC's
const uint32_t adc_pins[] = {A10, A2, A11, A4, A5, A6, A7, A8, A9, A0};

void setup()
{
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(9600);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  ///// ADC0 ////
  adc->adc0->setAveraging(1);                                    // set number of averages
  adc->adc0->setResolution(12);                                   // set bits of resolution
  adc->adc0->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED ); // change the conversion speed
  adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED );     // change the sampling speed
  ////// ADC1 /////
  adc->adc1->setAveraging(1);                                    // set number of averages
  adc->adc1->setResolution(12);                                   // set bits of resolution
  adc->adc1->setConversionSpeed(ADC_CONVERSION_SPEED::HIGH_SPEED ); // change the conversion speed
  adc->adc1->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED );     // change the sampling speed

  //Serial.println(" ENTER to START \n");
  //while ( !Serial.available() );

  delay(500);
}

int value = 0;
int pin = 0;

uint32_t lC = 0;
uint32_t lShow = 0;
elapsedMillis lT = 0;
void errCkADC( uint32_t vv );  // forward declaration; must match the definition below
uint32_t priorR[PINS + 1];

uint32_t wc = 0;
void loop()
{
  // delayMicroseconds(32); // Benchmark guess at the time per loop to maintain 25+K reads of 8 pins per second

  lC++;
  if ( lT >= 1000 ) {
    lShow = lC;
    lC = 0;
  }
  uint32_t lastR[PINS + 1];
  lastR[PINS] = micros();
  uint32_t r10c = ARM_DWT_CYCCNT;
#if 0
  for (int i = 0; i < PINS; i += 2)
  { // reads >> 10 pins read 145200 times per second
    // start reads on both ADCs
    adc->startSynchronizedSingleRead(adc_pins[i], adc_pins[i + 1]);
    // wait for both to complete
    {
      // some amount of work on prior value testing could be done here
      // substituting this line for below averages 124 counts waiting for each read
      delayNanoseconds(60); // Benchmark guess at the time per loop to maintain 25+K reads of 8 pins per second
      //while (adc->adc0->isConverting() || adc->adc1->isConverting()) wc++;

      while (adc->adc0->isConverting() || adc->adc1->isConverting());
    }
    // now get both results
    lastR[i] = adc->adc0->readSingle();
    lastR[i + 1] = adc->adc1->readSingle();
  }
#else
  {
    #define someWork //{ delayNanoseconds(50);}
    uint32_t ii=0;
    adc->startSynchronizedSingleRead(A10, A2);
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

    adc->startSynchronizedSingleRead(A11, A4);
    someWork
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

    adc->startSynchronizedSingleRead(A5, A6);
    someWork
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

    adc->startSynchronizedSingleRead(A7, A8);
    someWork
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

    adc->startSynchronizedSingleRead(A9, A0);
    someWork
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();
  }
#endif
  wc += ARM_DWT_CYCCNT - r10c;

  if ( lShow ) {
    errCkADC( 999 );
    for (int i = 0; i < PINS; i++)
    {
      Serial.print("P#");
      Serial.print(adc_pins[i]);
      Serial.print(": ");
      Serial.print(lastR[i] * 3.3f / adc->adc0->getMaxValue(), 2);
      Serial.print("V <");
    }
    Serial.printf("\n\t%d pins read %lu times per second at us=%lu [%lu waitCnt]\n", PINS, lShow, lastR[PINS], wc / lShow );  // uint32_t is unsigned long on ARM, so %lu
    lShow = 0;
    lT = 0;
    wc = 0;
  }
}

void errCkADC( uint32_t vv ) {
  // Print errors, if any.
  if (adc->adc0->fail_flag != ADC_ERROR::CLEAR)
  {
    Serial.print(vv);
    Serial.print("<< ADC0: ");
    Serial.println(getStringADCError(adc->adc0->fail_flag));
  }
  if (adc->adc1->fail_flag != ADC_ERROR::CLEAR)
  {
    Serial.print(vv);
    Serial.print("<< ADC1: ");
    Serial.println(getStringADCError(adc->adc1->fail_flag));
  }
  adc->resetError();
}

Thanks man, I will test this.

I tried up to 4 averages. 4 averages gave me an OK result, but not as good as analogRead. However, I only tested 10 bits.

BTW, the delay you are adding: is that important for the readings to work? I didn't add a delay, just ran my code in between. I'm guessing it doesn't matter, but asking just in case.

I didn't check for errors; I ran my program with monitoring software that prints out telemetry data, so I can easily see what's going on. I will check for errors.
 
> adc->adc1->setAveraging(1);

You can change the 1 to 2 or more for less noise. Also consider posting a schematic to get a lower-noise design; there is more to it than adding a capacitor here and there.

Here is the circuit: https://www.dropbox.com/s/90zhth0m3q2k2hd/piezo circuit.jpg?dl=0

I just used some resistors I had available, the op amp is a TI model that was made for this application.

I bought a resistor and capacitor kit now, so I can experiment a bit more with different values. It worked well enough before, so I didn't bother to play around with it at that point, but it can probably be improved.
 

That someWork delayNanoseconds() was there to measure how much time is available to do "work" without affecting the actual read throughput. With it active, watch the reads per second (and the CPU cycles shown in waitCnt), and increase the delay to find the point where the rate is no longer close to what it is without it.
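The reasoning behind this test can be captured in a simplified timing model. It is a hypothetical sketch, not measured data: it assumes each pair read needs a fixed conversion time, that work placed between startSynchronizedSingleRead() and the busy-wait overlaps that conversion, and that starting and reading back a pair costs a small fixed overhead. The names passUs, convUs, workUs and overheadUs are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model of one unrolled pass (hypothetical, not measured):
// each pair's conversion runs in hardware while the CPU does workUs of
// other work, so a pair costs max(convUs, workUs) plus a fixed overhead
// for the start call and the two readSingle() calls.
double passUs(int pairs, double convUs, double workUs, double overheadUs) {
  assert(pairs > 0 && convUs >= 0 && workUs >= 0 && overheadUs >= 0);
  return pairs * (std::max(convUs, workUs) + overheadUs);
}
```

With 5 pairs and roughly 4 µs of conversion per pair (about what 49K passes per second of 10 pins implies), a 3.5 µs placeholder leaves the pass time unchanged in this model. Once the work exceeds the conversion time, every extra microsecond shows up in the pass time; that crossover is what increasing the delay is meant to find.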

The other 'spare time' for 'global' operations on prior data would be added as noted below:
Code:
#else
  {
    #define someWork //{ delayNanoseconds(50);}
    uint32_t ii=0;
    adc->startSynchronizedSingleRead(A10, A2);
    someWork_Global // There will be 'spare' time here as well, but only for working with PRIOR data, as no new data has been read yet this pass.
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
 
Just coded that in place, where someWork and someWork_Global are just placeholders for timing measurement:
Code:
#define someWork_Global { delayNanoseconds(3500);}  // Delay waiting for first data to arrive - do Global work
#define someWork { delayNanoseconds(3500);}  // stepwise work after 4 of 5 new readings request is pending
    uint32_t ii = 0;
    adc->startSynchronizedSingleRead(A10, A2);
    someWork_Global
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

    adc->startSynchronizedSingleRead(A11, A4);
    someWork
    while (adc->adc0->isConverting() || adc->adc1->isConverting());
    lastR[ii++] = adc->adc0->readSingle();
    lastR[ii++] = adc->adc1->readSingle();

...

With 12-bit reads and avg(2), adding that someWork_Global delay does not change the unrolled loop's runtime:
Code:
P#24: 2.77V <P#16: 2.97V <P#25: 2.77V <P#18: 3.19V <P#19: 3.19V <P#20: 0.18V <P#21: 0.20V <P#22: 0.13V <P#23: 0.11V <P#14: 0.11V <
	10 pins read 49356 times per second at us=291300007 [12065 waitCnt]

The placeholders are easily disabled by commenting out their bodies, which leaves empty macros:
Code:
#define someWork_Global // { delayNanoseconds(3500);}  // Delay waiting for first data to arrive - do Global work
#define someWork // { delayNanoseconds(3500);}  // stepwise work after 4 of 5 new readings request is pending
 
Do add some filtering on the output of the op amps to reduce noise. Try a 1K resistor in series followed by a .1u capacitor to ground. Change capacitor as needed.
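For reference, a series R followed by C to ground is a first-order low-pass with its -3 dB corner at fc = 1/(2πRC). This small host-side C++ helper (the function name is illustrative) shows where the suggested 1K / 0.1 uF values land. It assumes the op amp output drives the filter from a low impedance and that the ADC input does not load it noticeably.

```cpp
#include <cassert>

// -3 dB corner of a first-order RC low-pass: fc = 1 / (2 * pi * R * C).
double rcCutoffHz(double rOhms, double cFarads) {
  assert(rOhms > 0 && cFarads > 0);
  const double kPi = 3.14159265358979323846;
  return 1.0 / (2.0 * kPi * rOhms * cFarads);
}
```

rcCutoffHz(1000.0, 0.1e-6) comes out at about 1,592 Hz. Halving the capacitor doubles the corner frequency, which is the knob the post suggests adjusting.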
 

I really appreciate your effort on this, thanks a lot. I'm going to need a day or so to go through it and understand everything. I have a hard time seeing exactly what is going on, and also which parts are there for testing and which are the actual code.
 
Enjoy ...

These are test code only: the #define someWork and someWork_Global placeholders,
and any use of or reference to waitCnt (wc).

This block, and the variables it uses, are only for measuring and for limiting output to once per second:
Code:
  if ( lT >= 1000 ) {
    errCkADC( 1001 ); // call this to be sure ADC reads are working during initial setup
    lShow = lC;
    lC = 0;
  }

The data goes into priorR[] for reference.

The unrolled loop code has the pin numbers hard-coded, because the array reference took extra cycles. Swap in your own pins, in the proper order. When starting out, make sure to call errCkADC( 1001 ); (added above) to be sure there are no errors in the pin-to-ADC assignments.

The #if 0 code can be removed if you use the #else code. The upper for-loop code can be enabled for comparison, but it doesn't offer clear places to put work without adding a test on each iteration.
 
Do add some filtering on the output of the op amps to reduce noise. Try a 1K resistor in series followed by a .1u capacitor to ground. Change capacitor as needed.

Yes, that was also my plan (I will try it before the op amp and after, and see what works best), but I didn't have any capacitors lying around. I have a bunch of them now and will experiment a bit. I was also thinking of adding those series resistors and capacitors to ground on the op amp power supply. Also, I'm on a breadboard now; it will probably be better when I make a PCB and use SMD capacitors.
 

BTW is it not better to only have the filtering before the op amp to keep the signal as low impedance as possible?
 

sometimes - i just last week had to remove one pole after an op amp and before adc input on t4 to get
a 3-6 db improvement in signal to noise. the t4 seems to spit more charge out the adc input than the t3.6
(for which the filter had been designed originally), r was 1k, c was 0.0033u poly so could not really go
down in r. net result removing that pole and going from 5 pole to 4 pole was actually a net improvement
because the t4 adc input was seeing a much lower impedance looking into the op amp than it had been
looking into 0.0033u in parallel with 1k. not a massive improvement but i'll take it.
 
What do you mean by "array ref took cycles"?

The data goes into priorR[] for reference.

The unrolled loop code has the pin #' refs hard coded as array ref took cycles.
Indexing the adc_pins[] array, as the for (i = 0; ...) code does, added processor cycles once the loop was unrolled; it seemed that was because 'i' was no longer a live variable held in a register.

adc->startSynchronizedSingleRead(adc_pins[i], adc_pins[i + 1]);

versus

adc->startSynchronizedSingleRead(A10, A2);
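One possible way to keep a single pin table while still giving the compiler constant pin numbers in the unrolled code is a constexpr array indexed with compile-time constants. This is an untested sketch, not something measured on a Teensy in this thread: kAdcPins and pinAt are hypothetical names, and the numbers are the Teensy 4.0 pin numbers printed in the sketch's output (A10 = 24, A2 = 16, and so on).

```cpp
#include <cstddef>
#include <cstdint>

// Pin numbers in the same order as adc_pins[] (A10, A2, A11, A4, ... A0),
// taken from the P#.. values printed by the sketch.
constexpr std::uint8_t kAdcPins[] = {24, 16, 25, 18, 19, 20, 21, 22, 23, 14};
constexpr std::size_t kPins = sizeof(kAdcPins) / sizeof(kAdcPins[0]);
static_assert(kPins % 2 == 0, "pins are read in pairs, one per ADC");

// Pin at a compile-time index; an out-of-range index fails to compile.
template <std::size_t I>
constexpr std::uint8_t pinAt() {
  static_assert(I < kPins, "pin index out of range");
  return kAdcPins[I];
}

// On the Teensy, an unrolled step could then read, for example:
//   adc->startSynchronizedSingleRead(pinAt<2>(), pinAt<3>());
// which passes the same constants as writing A11, A4 by hand.
```

Whether this actually matches the hard-coded version's cycle count is worth confirming with the same ARM_DWT_CYCCNT measurement the sketch already uses.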
 
The reference manual says that 4K ohms is low enough source resistance for a T4 ADC input. 1K will be fine, even if an adjustment is needed for higher speed modes. Lower won't make any difference.
 

Good to know, thanks. I'll calculate the exact cutoff frequency I need based on the signal. The sensors pick up peaks of about 500-1000 Hz, but three of them will need to go a bit higher, maybe up to 1500-2000 Hz.
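Working backwards from a target corner frequency, a first-order RC needs C = 1/(2π·R·fc). Here is a minimal host-side sketch, assuming the 1K series resistor suggested earlier in the thread; the function name is illustrative. A first-order filter is already 3 dB down at fc, so the corner is usually placed somewhat above the highest frequency that must pass.

```cpp
#include <cassert>

// Capacitance that puts a first-order RC low-pass corner at fcHz
// for a given series resistor: C = 1 / (2 * pi * R * fc).
double capacitorForCutoff(double rOhms, double fcHz) {
  assert(rOhms > 0 && fcHz > 0);
  const double kPi = 3.14159265358979323846;
  return 1.0 / (2.0 * kPi * rOhms * fcHz);
}
```

With R = 1K, a 1 kHz corner needs about 159 nF and a 2 kHz corner about 80 nF; the nearest standard values (for example 150 nF or 82 nF) would be the practical choices.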

What about using bypass capacitors? Any recommendations there? I'm thinking maybe using them on the power supply of the op amp.
 
adc_pins[] array reference indexing like in the for(i=0... ) code when unrolled added processor cycles - it seemed because 'i' was no longer an active variable stored in a register.

adc->startSynchronizedSingleRead(adc_pins[i], adc_pins[i + 1]);

versus

adc->startSynchronizedSingleRead(A10, A2);


Ah, I understand.

I'm not that familiar with the preprocessor directives, I'm going to have to read about them a bit. I've only seen them used to include header files and whatnot.
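Since the preprocessor came up: #define, #if and #else are handled as text substitution before the compiler proper sees the code. The #if 0 ... #else ... #endif in the sketch works the same way: #if 0 is always false, so only the #else branch is compiled. This small host-side C++ example, using a made-up TIMING_TEST switch, mirrors the someWork placeholders: when the switch is 0, the macro expands to nothing and the placeholder vanishes from the build. runPass() only counts how often the placeholder ran; it does not touch any ADC, and it needs C++14 or later.

```cpp
// Set to 1 to compile the timing placeholder in, 0 to compile it out.
// TIMING_TEST is a hypothetical name used only for this example.
#define TIMING_TEST 1

#if TIMING_TEST
#define someWork { ++workCount; }  // stands in for delayNanoseconds(...)
#else
#define someWork                   // expands to nothing
#endif

// Mimics the unrolled loop: five pairs, with the placeholder after each start.
constexpr int runPass() {
  int workCount = 0;  // the macro refers to this local variable by name
  for (int pair = 0; pair < 5; ++pair) {
    // adc->startSynchronizedSingleRead(...) would go here on the Teensy
    someWork
    // ...then the busy-wait and the two readSingle() calls
  }
  return workCount;
}
```

With TIMING_TEST set to 1, runPass() returns 5; set it to 0 and the same source returns 0, with no runtime test left in the loop.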

I wish I had gone to computer science school or something; it's a steep learning curve when you are trying to learn by yourself reading stuff online.
 
You can try a 10 ohm resistor in series and then a 10 uF capacitor to ground on the line from the power supply to the op amp.
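The suggested 10 ohm / 10 uF supply filter can be sanity-checked like any first-order RC: its corner frequency, how much it attenuates noise well above that corner, and the DC voltage it drops. This host-side C++ sketch makes simplifying assumptions (ideal capacitor with no ESR, constant supply current), and the 5 mA current below is a made-up example, so use the op amp datasheet's figure. The struct and function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Series resistor from the supply, capacitor to ground at the op amp's
// supply pin, treated as an ideal first-order RC low-pass.
struct SupplyFilter {
  double rOhms;
  double cFarads;
};

// -3 dB corner frequency: fc = 1 / (2 * pi * R * C).
double cornerHz(const SupplyFilter& f) {
  assert(f.rOhms > 0 && f.cFarads > 0);
  const double kPi = 3.14159265358979323846;
  return 1.0 / (2.0 * kPi * f.rOhms * f.cFarads);
}

// Gain magnitude at noiseHz: 1 / sqrt(1 + (f / fc)^2).
double gainAt(const SupplyFilter& f, double noiseHz) {
  const double ratio = noiseHz / cornerHz(f);
  return 1.0 / std::sqrt(1.0 + ratio * ratio);
}

// DC drop across the series resistor at a given supply current (V = I * R).
double dropVolts(const SupplyFilter& f, double supplyAmps) {
  return supplyAmps * f.rOhms;
}
```

For SupplyFilter{10.0, 10e-6} the corner is about 1.6 kHz, noise at 160 kHz comes through at roughly 1% (about -40 dB), and a hypothetical 5 mA supply current drops only 50 mV across the 10 ohms.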
 