
Thread: Input capture on teensy 4.1 and registers access time

  1. #1
    Junior Member
    Join Date
    May 2021
    Posts
    2

    Input capture on teensy 4.1 and registers access time

    Hello everyone,

    I am an assistant professor at the University of Angers (France). We are currently trying to use the Teensy 4.1 to count single photons or, more precisely, to count and time-tag the electrical pulses coming from a single-photon counter (i.e. an avalanche photodiode).

    Just for information, from this raw data, we get access to the Brownian motion of nanoparticles in solution and we can measure their size and shape (via Fluctuation Correlation Spectroscopy or Dynamic Light scattering).

    Obviously, electronics for time-tagging do exist (and, for the initiated, we do have a TCSPC card), but they are quite expensive (a few thousand euros). Open-source projects based on FPGAs also exist (like this one, which we have tested in the past), but FPGAs are... hard to use (at least for us, for now).

    The idea is to use a fast microcontroller (the Teensy 4.1) to time-tag the pulses, thoroughly test it and then create an open-source project (and also publish an article because, apparently, that's what I am paid for...).

    Our aim is to time-tag the pulses with a temporal precision of at least, let's say, 50 ns, at a count rate as high as possible (at least 1 million pulses per second).

    To this end, we are using the input capture channels of the 32-bit General Purpose Timer (GPT).

    We are using double buffering to transfer the capture register to RAM, then from RAM to PSRAM, and then from PSRAM to a fast SD card. So far so good, except that the count rate can't exceed 500 kHz without getting wrong capture times.

    Consequently, we have investigated and found an unexpected bottleneck.

    We measured on the oscilloscope the time taken by the capture interrupt (see code below) and found it was 1 µs (which explains the problem at 500 kHz).

    By increasing the clock speed sent to the GPT (PERCLK_CLK_ROOT) from 24 MHz to 300 MHz, we reduced the total interrupt time to around 500 ns. Even the minimal interrupt function (only resetting the interrupt flag) takes 150 ns.

    In other words, reading or writing the registers takes a lot of cycles, and we did not plan for that when we designed our memory-transfer scheme to measure at the highest possible rate.
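To put these durations in perspective, here is a minimal sketch (plain C++ that runs on a PC, not Teensy code) that converts them into CPU cycles and into an upper bound on the sustainable event rate. It assumes the core runs at 600 MHz, and the function names are purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles that elapse during `ns` nanoseconds at a core clock of `cpuHz`.
inline uint32_t cyclesForNs(uint32_t ns, uint32_t cpuHz) {
    assert(cpuHz > 0);
    return static_cast<uint32_t>((static_cast<uint64_t>(ns) * cpuHz) / 1000000000ULL);
}

// Upper bound on events per second if each event costs `isrNs` nanoseconds of
// handler time and nothing else runs. Both capture channels share this budget.
inline uint32_t maxEventRateHz(uint32_t isrNs) {
    assert(isrNs > 0);
    return 1000000000U / isrNs;
}
```

With the figures quoted above, `cyclesForNs(150, 600000000)` gives 90 cycles for the handler that only clears a flag, and `maxEventRateHz(1000)` gives 1,000,000 events per second for the 1 µs handler, a budget that both channels have to share.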

    Now, time for some code.

    Here is the setup of GPT timer number 2 for input capture:

    Code:
    void setupTimerGPT2(void)
    {
      // IOMUX configuration to give capture channels 1 and 2 of GPT2 physical pins
      //GPT capture 1
      IOMUXC_GPT2_IPP_IND_CAPIN1_SELECT_INPUT = 1;  // remap GPIO_AD_B1_03_ALT8 to GPT2 Capture1 (Channel 1)
      IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_03 = 8; // GPT2 Capture1 configuration ALT8 Pin 15
      IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 = 0x13000; //Pulldown & Hyst
    
      //GPT capture 2
      IOMUXC_GPT2_IPP_IND_CAPIN2_SELECT_INPUT = 1;  // remap GPIO_AD_B1_04_ALT8 to GPT2 Capture2 (Channel 2)
      IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_04 = 8; // GPT2 Capture2 configuration ALT8 Pin 40
      IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 = 0x13000; //Pulldown & Hyst
    
    
      // Clock bus configuration
      // #define CCM_CSCMR1_PERCLK_CLK_SEL   ((uint32_t)(1<<6))
      
      // Change the Clock Controller Module in order to use PERCLK_CLK_ROOT for the counter and not OSC@24MHz (default)
      CCM_CSCMR1 &= ~CCM_CSCMR1_PERCLK_CLK_SEL; //
    
      // Change the prescaler between AHB_CLK_ROOT (typically 600 MHz) and PERCLK_CLK_ROOT (default is 4 -> 150 MHz)
      CCM_CBCDR = CCM_CBCDR_IPG_PODF(1);	// NB I can't get 0 (that is to say no prescaler) to work
       
      // Set the CCM Clock Gating Register
      CCM_CCGR0 |= CCM_CCGR0_GPT2_BUS(CCM_CCGR_ON) |
                   CCM_CCGR0_GPT2_SERIAL(CCM_CCGR_ON);  // enable clock
      
    
      //Clear GPT2 registers, namely  CR, PR and SR   
      GPT2_CR = 0;
      GPT2_PR = 0; //No prescaler.
    
      // "Clear" bit flags (ROV, IF1 and IF2) writing one in them
      GPT2_SR = GPT_SR_ROV |            //Clear bit ROV
                GPT_SR_IF1 |            //Clear bit IF1
                GPT_SR_IF2;             //Clear bit IF2
                 
    
      //CR  register of GPT2 (Control Register)
      GPT2_CR = GPT_CR_EN |             //EN = 1 activate TIMER GPT2
                GPT_CR_FRR |            //Free run mode
                GPT_CR_CLKSRC(1) |      //Clock source is Peripheral Clock
                GPT_CR_IM1(1) |         //Capture activated on channel 1 on rising edge only
                GPT_CR_IM2(1);          //Capture activated on channel 2 on rising edge only
    
    
      //IR register of GPT2 (Interruptions)
      GPT2_IR = GPT_IR_ROVIE |          //Interruption on overflow of the 32bits counter
                GPT_IR_IF1IE |          //Interruption on Channel 1 capture 
                GPT_IR_IF2IE;           //Interruption on Channel 2 capture 
    }
    Here is the code to start the capture:

    Code:
    void Start_Capture(void)
    {
      // Clear the interrupt flags 
      GPT2_SR = GPT_SR_ROV |            
                GPT_SR_IF1 |            
                GPT_SR_IF2;             
    
      // Custom variable initialization (not very relevant for this post)
      TimeTagPtr1 = 0;
      TimeTagPtr2 = 0;
      TimeTagNb1 = 0;
      TimeTagNb2 = 0;
      OverflowCount = 0;
      
      //Double buffer management
      halfBuffer1 = false;
      halfBuffer2 = false;
    
    
      //enable  IRQ_GPT2 interruption
      NVIC_ENABLE_IRQ(IRQ_GPT2);
    }
    And here is the code executed in the capture interrupt:

    Code:
    void GPT2capture() {
    
        // For timing on the oscilloscope  
        //digitalWriteFast(DEBUG_BLINK_PIN, HIGH);
    
        //Test the origin of interruption
        
        if (GPT2_SR & GPT_SR_ROV) {
          //32 bits counter overflow 
          GPT2_SR |= GPT_SR_ROV;    //Reset ROV flag
          // Tag the event "overflow" with the special time 0xFFFFFFFF
          PsRamBuffer1[TimeTagPtr1++] = 0xFFFFFFFF;
          PsRamBuffer2[TimeTagPtr2++] = 0xFFFFFFFF;
          // buffer wrap-around
          if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
          if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
        }
        
        if (GPT2_SR & GPT_SR_IF1) {
          //capture on channel 1
          GPT2_SR |= GPT_SR_IF1;   //reset IF1 flag 
          PsRamBuffer1[TimeTagPtr1++] = GPT2_ICR1;   //read and store capture register
          if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
          TimeTagNb1++;
        }
        
        if (GPT2_SR & GPT_SR_IF2) {
          //capture on channel 2
          GPT2_SR |= GPT_SR_IF2;   //reset IF2 flag 
          PsRamBuffer2[TimeTagPtr2++] = GPT2_ICR2;   //read and store capture register
          if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
          TimeTagNb2++;
        }
        
        asm volatile ("dsb");   // wait for clear  memory barrier
    
        // For timing on the oscilloscope
        //digitalWriteFast(DEBUG_BLINK_PIN, LOW);
      }
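One detail of the handler above may matter at high rates. As the setup comments say, the GPT status flags are cleared by writing a one to them. `GPT2_SR |= flag` therefore reads the register and writes back every flag that is currently set, not only the one named, and it costs an extra register read. The following host-side sketch (plain C++ with a simulated register, so the `SIM_` bit values and helper names are assumptions made for illustration only) shows the difference from a direct assignment. It has not been measured on hardware.

```cpp
#include <cstdint>

// Simulated flag bits, for illustration only (on the Teensy, use GPT_SR_IF1 etc.).
const uint32_t SIM_IF1 = 1u << 3;
const uint32_t SIM_IF2 = 1u << 4;

// Minimal model of a write-1-to-clear status register.
struct W1CReg {
    uint32_t flags;
    explicit W1CReg(uint32_t initial) : flags(initial) {}
    uint32_t read() const { return flags; }
    void write(uint32_t v) { flags &= ~v; }  // a 1 clears the bit, a 0 leaves it alone
};

// Equivalent of `SR |= flag`: read, OR, write back. Returns the remaining flags.
inline uint32_t clearWithOrEquals(W1CReg &sr, uint32_t flag) {
    sr.write(sr.read() | flag);
    return sr.read();
}

// Equivalent of `SR = flag`: a single write. Returns the remaining flags.
inline uint32_t clearWithAssign(W1CReg &sr, uint32_t flag) {
    sr.write(flag);
    return sr.read();
}
```

In this model, if both capture flags are pending, `clearWithOrEquals(sr, SIM_IF1)` leaves no flag set, so the channel 2 branch would never see its event, whereas `clearWithAssign(sr, SIM_IF1)` leaves `SIM_IF2` pending.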
    And here is the "minimal" interrupt handler I mentioned earlier, which takes 150 ns:

    Code:
    void GPT2capture() 
    {
    
    GPT2_SR |= GPT_SR_IF1;	//Reset the interrupt flag (assuming that events are detected only on channel 1)
    asm volatile ("dsb");
    }
    As a side note, and this could be a clue, when the interrupt handler is only:

    Code:
    void GPT2capture() 
    {
    
    digitalWriteFast(DEBUG_BLINK_PIN, HIGH);
    asm volatile ("dsb");
    digitalWriteFast(DEBUG_BLINK_PIN, LOW);
    }
    I get a periodic signal at "only" 23 MHz (instead of the 150 MHz that digitalWriteFast toggling can attain).
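For scale, here is a small sketch (plain C++ for a PC, with a hypothetical helper name) that expresses those two frequencies as CPU cycles per period, assuming a 600 MHz core clock:

```cpp
#include <cassert>

// Number of CPU cycles in one period of a signal of frequency `signalHz`.
inline double cyclesPerPeriod(double signalHz, double cpuHz) {
    assert(signalHz > 0.0 && cpuHz > 0.0);
    return cpuHz / signalHz;
}
```

`cyclesPerPeriod(23e6, 600e6)` is about 26 cycles per period, against 4 cycles for a 150 MHz toggle. Since this handler never clears the flag, it is presumably re-entered as soon as it returns, so most of those cycles would be interrupt entry and exit rather than the two pin writes.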


    Finally one specific question :
    - Could you tell me how to reduce the access time to the registers? Or, conversely, could you explain to me why it can't be reduced?

  2. #2
    Senior Member
    Join Date
    Feb 2018
    Location
    Corvallis, OR
    Posts
    349
    Do you need continuous sampling? If not, could you use an intermittent sampling scheme where you collect data for 10 milliseconds, stop and write to the SD card, collect another 10 milliseconds, stop and write to the SD card, and so on?

    If intermittent sampling is acceptable, you might be able to avoid the timer interrupts and other issues by blocking interrupts and going into a tight loop which waits for a rising edge, captures the count in the 600 MHz CPU clock counter, stores it in an array, then loops back to wait for the next rising edge.

    I'll see if I can come up with some example code to verify this algorithm.

  3. #3
    Senior Member
    Join Date
    Feb 2018
    Location
    Corvallis, OR
    Posts
    349
    Here is a sample sketch showing intermittent capture of input pulse timing.

    Code:
    // Fast_Stamp_4
    // Sketch to test fast intermittent time-stamp collection
    // M. Borgerson   May 25, 2010
    
    //================
    const int LEDpin = 13;
    const int inpin1 = 5;
    #define  LEDON digitalWriteFast(LEDpin, HIGH);
    #define  LEDOFF digitalWriteFast(LEDpin, LOW);
    #define  LEDTOGGLE digitalToggleFast(LEDpin);
    //*******************************************************
    
    // use a sample size that is multiple of 512 bytes for efficient SD storage
    #define MAXSAMPLES 10240
    uint32_t timestamps1[MAXSAMPLES];
    uint16_t samplecount;
    
    void setup() {
      //Initialize the digital pin 13 as an output for LED.
      pinMode(LEDpin, OUTPUT);
      pinMode(inpin1, INPUT);
      delay(500);
      Serial.begin(9600);
      Serial.println("\n\nFast Time Stamp Test.");
      Serial.println("Press 's' to collect and display sample,  'h for histogram.");
    
    }
    
    
    //MAIN LOOP routine
    //=================
    void loop() {
      char ch = 0;
    
      if (Serial.available()) {
        ch = Serial.read();
          LEDON
          CollectSample(MAXSAMPLES);
          AdjustSamples(MAXSAMPLES);
          LEDOFF
        if (ch == 's') {
          ShowSamples(200);
        }
        if (ch == 'h') {
          ShowHisto(MAXSAMPLES);
        }
        Serial.println("Press 's' to collect and display sample,  'h for histogram.");
      }
    
    }
    
    // Collect time stamps for rising edges with interrupts off
    void  CollectSample(uint32_t numsamples) {
      uint16_t scount = 0;
      while (digitalReadFast(inpin1)); // wait for pin low to start
      noInterrupts();
      do {
        while (!digitalReadFast(inpin1)); // wait for rising edge
        timestamps1[scount] = ARM_DWT_CYCCNT; // capture CPU cycle count
        scount++;
        while (digitalReadFast(inpin1)); // wait for pin low to loop
      } while (scount < MAXSAMPLES);
      interrupts();
    }
    
    
    // Convert time stamps to CPU clock cycles since first sample.
    // This handles roll over in the ARM CPU cycle counter
    void AdjustSamples(uint32_t numsamples) {
      uint32_t i, firstvalue;
      firstvalue = timestamps1[0];
      for (i = 0; i < numsamples; i++) {
        timestamps1[i] = timestamps1[i] - firstvalue;
      }
    
    }
    
    // Show the timestamps for the first numsamples from array
    void ShowSamples(uint32_t numsamples) {
      uint16_t i;
      Serial.println("CPU counts between rising edges");
      for (i = 0; i < numsamples-1; i++) {
        if ((i % 10) == 0)Serial.printf("\n% 4u: ", i);
        Serial.printf("%6lu ", timestamps1[i+1]- timestamps1[i]);
      }
      Serial.println();
    }
    
    //  For complete data collection, the samples would be written to SD.
    //  For now, just show a histogram to evaluate stability and resolution.
    //  I assume that the maximum input interval is not greater than 32767
    //  CPU clock cycles or about 54.6 microseconds at 600 MHz
    #define HISTOSIZE 32768
    static uint16_t histocounts[HISTOSIZE];   // 64KByte histogram buffer
    void ShowHisto(uint32_t numsamples) {
      uint32_t i, dcount;
      uint16_t outliers = 0;
      uint16_t firstval, lastval;
      // first set all histocounts to zero
      for (i = 0; i < HISTOSIZE; i++) histocounts[i] = 0;
      // Now build the histogram data
      for (i = 1; i < numsamples; i++) {
        dcount = timestamps1[i] - timestamps1[i - 1]; // find difference in time stamps
        if (dcount < HISTOSIZE) {
          histocounts[dcount]++;
        } else {
          outliers++;
        }
      }
      // now display histogram.  Skip zero values in histogram data
      firstval = 0;  lastval = 0;
      for (i = 0; i < HISTOSIZE;  i++) {
        if ((firstval == 0)  && (histocounts[i] != 0)) firstval = i;
        if (histocounts[i] != 0) lastval = i;
      }
      // display firstval and lastval
      Serial.printf("\n First valid count at: %u    Last valid count at %u\n", firstval,lastval);
    
      Serial.println("Histogram clock intervals and samples.");
      for (i = firstval; i <= lastval; i++) { // show only non-zero counts
        if(histocounts[i]) Serial.printf("%u   %u\n", i, histocounts[i]);
      }
      Serial.printf("Outliers:  %u\n", outliers);
      Serial.println();
    }
    The time stamp resolution appears to be about 9 cycles of the 600 MHz clock, or about 15 nanoseconds. It will capture at least 2 million samples/second (the limit of my inexpensive signal generator).

    Here is some sample output:
    Code:
    Press 's' to collect and display sample,  'h for histogram.
    CPU counts between rising edges
    
       0:    502    509    518    509    509    518    509    509    518    509 
      10:    518    509    509    518    509    509    509    518    509    509 
      20:    518    509    509    518    509    509    518    509    509    518 
      30:    509    518    509    509    518    509    509    518    509    509 
      40:    518    509    509    518    509    509    518    509    509    518 
      50:    509    509    518    509    509    518    509    509    518    509 
      60:    509    518    509    509    518    509    509    518    509    509 
      70:    518    509    509    518    509    509    518    509    509    518 
      80:    509    518    509    509    518    509    509    518    509    509 
      90:    518    509    509    518    509    509    518    509    509    518 
     100:    509    509    518    509    509    518    509    509    518    509 
     110:    509    518    509    509    518    509    509    518    509    509 
     120:    518    509    509    518    509    509    518    509    509    518 
     130:    509    509    518    509    509    518    509    509    518    509 
     140:    518    509    509    518    509    509    518    509    509    518 
     150:    509    509    518    509    509    518    509    509    518    509 
     160:    509    518    509    509    518    509    509    518    509    509 
     170:    518    509    509    518    509    509    518    509    509    518 
     180:    509    509    518    509    509    518    509    509    518    509 
     190:    509    518    509    509    518    509    509    518    509 
    Press 's' to collect and display sample,  'h for histogram.
    Press 's' to collect and display sample,  'h for histogram.
    CPU counts between rising edges
    
       0:    510    518    509    509    518    509    509    518    509    509 
      10:    518    509    518    509    509    518    509    509    518    509 
      20:    509    518    509    509    518    509    509    518    509    509 
      30:    518    509    518    509    509    518    509    509    518    509 
      40:    509    518    509    509    518    509    509    518    509    518 
      50:    509    509    518    509    509    518    509    509    518    509 
      60:    509    518    509    518    509    509    518    509    509    518 
      70:    509    509    518    509    509    518    509    509    518    509 
      80:    518    509    509    518    509    509    518    509    509    518 
      90:    509    509    518    509    518    509    509    518    509    509 
     100:    518    509    509    518    509    509    518    509    518    509 
     110:    509    518    509    509    518    509    509    518    509    509 
     120:    518    509    509    518    509    518    509    509    518    509 
     130:    509    518    509    509    518    509    509    518    509    518 
     140:    509    509    518    509    509    518    509    509    518    509 
     150:    509    518    509    518    509    509    518    509    509    518 
     160:    509    509    518    509    509    518    509    509    518    509 
     170:    518    509    509    518    509    509    518    509    509    518 
     180:    509    509    518    509    509    518    509    518    509    509 
     190:    518    509    509    518    509    509    518    509    509 
    Press 's' to collect and display sample,  'h for histogram.
    
     First valid count at: 509    Last valid count at 518
    Histogram clock intervals and samples.
    509   6575
    510   1
    518   3663
    Outliers:  0
    Press 's' to collect and display sample,  'h for histogram.
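As a quick cross-check of the histogram above, this host-side sketch (plain C++, not Teensy code, with illustrative names) converts cycle counts to time and computes the count-weighted mean interval. It assumes that ARM_DWT_CYCCNT runs at the 600 MHz CPU clock.

```cpp
#include <cassert>
#include <cstdint>

// One histogram line: interval in CPU cycles and how many times it was seen.
struct Bin {
    uint32_t cycles;
    uint32_t count;
};

// Convert a number of clock cycles to nanoseconds.
inline double cyclesToNs(double cycles, double clockHz) {
    assert(clockHz > 0.0);
    return cycles * 1e9 / clockHz;
}

// Count-weighted mean interval, in cycles, over `n` histogram bins.
inline double meanCycles(const Bin *bins, int n) {
    assert(bins != nullptr && n > 0);
    uint64_t weighted = 0;
    uint64_t total = 0;
    for (int i = 0; i < n; ++i) {
        weighted += static_cast<uint64_t>(bins[i].cycles) * bins[i].count;
        total += bins[i].count;
    }
    assert(total > 0);
    return static_cast<double>(weighted) / static_cast<double>(total);
}
```

Fed with the three bins printed above (509 x 6575, 510 x 1, 518 x 3663), `meanCycles` returns roughly 512.2 cycles, which corresponds to an input rate of about 1.17 MHz at 600 MHz. `cyclesToNs(9, 600e6)` gives the 15 ns step between the two main bins.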
    Of course, this is collecting only one channel of data. Collecting more channels will increase the time in the fast loop and decrease the timing resolution.

    It may be possible to combine this fast loop algorithm with the input capture of the 300MHz clock to get acceptable resolution without the issues that successive checking of multiple channels in the fast loop will present.

    I think that the problems in the original code with high input rates are caused by the SD card writes. The SD routines do block interrupts for a short time (1 to 2 microseconds) with each write. It takes that long for the SD driver to set up the DMA transfers that accomplish the actual data transfer.

  4. #4
    Senior Member
    Join Date
    Feb 2018
    Location
    Corvallis, OR
    Posts
    349
    I wrote a test program to use the input capture capability on GPT2 and to save the data to SD. The program saves separate files for channels 1 and 2. It seems to work well with one channel at about 900KHz and the other up to about 1.7MHz pulse rates. I also added PWM pulse outputs from pins 5 and 6 for a bit of self-test capability if you jumper the pulse outputs to the input pins.

    I found that running the GPT clock above 150MHz resulted in a lot of bad data. I suspect that running the GPT clock faster than the 150MHz bus clock for the timer registers causes the problems. I also found that counting pulses with longer duty cycles caused some problems, so I set up the PWM outputs for narrow pulses.
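The test program below stores 16-bit differences between successive captures rather than absolute 32-bit counts. Here is a minimal host-side sketch (standard C++, hypothetical function names) of that encoding and of the reconstruction that the Matlab `cumsum` snippet in the program's header comment performs. It assumes a 150 MHz GPT clock for the conversion to microseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode 32-bit capture values as 16-bit differences from the previous capture.
// Unsigned subtraction is modulo 2^32, so a counter rollover needs no special case.
inline std::vector<uint16_t> encodeDeltas(const std::vector<uint32_t> &captures,
                                          uint32_t start) {
    std::vector<uint16_t> out;
    out.reserve(captures.size());
    uint32_t last = start;
    for (uint32_t c : captures) {
        out.push_back(static_cast<uint16_t>(c - last));  // wrong if the gap exceeds 65535 ticks
        last = c;
    }
    return out;
}

// Rebuild monotonically increasing tick counts (a cumulative sum, kept in 64 bits).
inline std::vector<uint64_t> decodeDeltas(const std::vector<uint16_t> &deltas) {
    std::vector<uint64_t> ticks;
    ticks.reserve(deltas.size());
    uint64_t acc = 0;
    for (uint16_t d : deltas) {
        acc += d;
        ticks.push_back(acc);
    }
    return ticks;
}

// Convert GPT ticks to microseconds for a timer clock of `gptHz`.
inline double ticksToMicroseconds(uint64_t ticks, double gptHz) {
    assert(gptHz > 0.0);
    return static_cast<double>(ticks) * 1e6 / gptHz;
}
```

For example, captures at 0xFFFFFFF0, 0x00000010 and 0x00000100 after a start value of 0xFFFFFF00 encode to 240, 32 and 240 ticks even though the counter rolled over in between. Any gap longer than 65535 ticks is silently truncated, which is the limitation the program's header comment describes.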

    Here's the test program:
    Code:
    /**************************************************************************************
       Sketch to test fast  time-stamp collection and storage
       This version uses data from GPT input capture with Teensy 4.1
       and requires the installation of PSRAM to hold the large buffers
       needed for writing to SD Card with data rates up to a combined
       4 MBytes/second.
       The data values saved are 16-bit unsigned integers which represent the difference
       in GPT clock counts  between the current capture and the previous capture.
       This method has two advantages:
       1.  The subtraction of the unsigned 32-bit capture values automatically corrects for
           roll over in the 32-bit counter.
       2.  Storing the 16-bit differences cuts storage time and bandwidth in half.
     
       The scheme also has a disadvantage:  if the time between captures exceeds 65535 clocks
       you will have bad data.  However 65K counts of a 150MHz clock is 0.43 milliSeconds,
       which is not an issue with the high repetition rates I'm using for testing.
       If this limit is a problem, it can be increased by slowing the GPT clock--at the
       expense of reduced timing resolution.
       If average pulse rates are known to be slower than about 250KHz, you could adjust
       the program to save the full 32-bit time differences, but that doubles the required
       storage for the buffers.
     
       If you are a Matlab user, you can generate an array of time stamps in microseconds
       with a few lines of Matlab code:
     
        %  starting with vector tds1 read from file
        tsusec = tds1 / 150;  % convert count difference to microseconds
        timestamp1 = cumsum(tsusec);  % convert to monotonically increasing time stamps
    
    
    // M. Borgerson   May 27, 2010
    ***************************************************************************************/
    #include <SD.h>
    #include <TimeLib.h>
    
    //================
    const int LEDpin = 13;
    const int inpin1 = 15;  // pins that can be connected to GPT capture inputs
    const int inpin2 = 40;
    
    
    const int pwmpin1 = 5; // FLEXPWM2_1_A
    const int pwmpin2 = 6;// FLEXPWM2_2_A
    
    File file1, file2;
    
    #define  LEDON digitalWriteFast(LEDpin, HIGH);
    #define  LEDOFF digitalWriteFast(LEDpin, LOW);
    #define  LEDTOGGLE digitalToggleFast(LEDpin);
    //*******************************************************
    
    // Use a sample size that is multiple of 512 bytes for efficient SD storage.
    // When writing to SD Card, buffers should hold at least  200milliseconds of data.
    // For high sample rates, that means that buffers must be in PSRAM.
    #define MAXSAMPLES 512000   // one half second storage at 1 million samples/second
    uint16_t timestamps1A[MAXSAMPLES]EXTMEM;
    uint16_t timestamps1B[MAXSAMPLES]EXTMEM;
    uint16_t timestamps2A[MAXSAMPLES]EXTMEM;
    uint16_t timestamps2B[MAXSAMPLES]EXTMEM;
    
    
    // set up pointers to buffers for quick change inside IRQ handler
    volatile uint16_t *bpt1[2]  = {&timestamps1A[0], &timestamps1B[0]};
    volatile uint16_t *bpt2[2]  = {&timestamps2A[0], &timestamps2B[0]};
    
    // These are the pointers used in the IRQ handler to store data
    volatile uint16_t *bptr1 = bpt1[0];
    volatile uint16_t *bptr2 = bpt2[0];
    
    volatile uint16_t *outptr1 = NULL;
    volatile uint16_t *outptr2 = NULL;
    
    volatile uint32_t lastcapt1, lastcapt2;
    volatile uint32_t samplecount1, samplecount2;
    volatile uint32_t totalsamples1, totalsamples2;
    
    // set pwm test output frequencies
    uint32_t fc1 = 720000;
    uint32_t fc2 = 900000;
    
    uint32_t unixtime;
    
    void setup() {
      //Initialize the digital pin 13 as an output for LED.
    
      pinMode(LEDpin, OUTPUT);
      pinMode(inpin1, INPUT);
    
      delay(500);
      Serial.begin(9600);
      Serial.println("\n\nFast Time Stamp output Test.");
      Serial.println("Press 's' to collect, 'q' to stop,  'h' for histograms.");
    
    
      if (!StartSDCard()) {
        Serial.println("Can not initialize SD card!");
        while (1) {
          LEDON
          delay(50);
          LEDOFF
          delay(50);
        }
      }
      setupTimerGPT2();
      SetPWM(720000, 900000);
    }
    
    
    //MAIN LOOP routine
    //=================
    void loop() {
      char ch = 0;
      char *fname;
      if (Serial.available()) {
        ch = Serial.read();
    
        switch (ch) {
          case 'c': // sample without writing to SD
            Serial.println("\nCollecting, but not saving.");
            totalsamples1 = 0; totalsamples2 = 0;
            // open the data files
            StartSampling();
            break;
          case 's':
            Serial.println("\nSaving data to SD card.");
            totalsamples1 = 0; totalsamples2 = 0;
            // open the data files
            fname =  GetFileName("TStamp1_", "TS1");
            file1 = SD.open(fname, FILE_WRITE);
            fname =  GetFileName("TStamp2_", "TS1");
            file2 = SD.open(fname, FILE_WRITE);
            StartSampling();
            break;
          case 'v':
            ShowSamples(100, timestamps1A);
            ShowSamples(100, timestamps2A);
            break;
          case 'h':
            ShowHisto(MAXSAMPLES, timestamps1A);
            ShowHisto(MAXSAMPLES, timestamps2A);
            break;
          case 'q':
            file1.close();
            file2.close();
            StopSampling();
            break;
          case 'a':
            SetPWM(900000, 1000000);
            break;
          case 'b':
            SetPWM(600000, 980000);
            break;
          case 'd':
            SD.sdfs.ls( LS_SIZE | LS_DATE |  LS_R);
            Serial.println();
            break;
        }
        Serial.println("Press 's' to collect, 'q' to stop,  'h' for histograms.");
      }
      CheckBuffers();
    
    }
    
    void CheckBuffers(void) {
      if (outptr1 != NULL) { // save buffer 1
        totalsamples1 +=  MAXSAMPLES;
        // write to SD here
        if (file1) {
          file1.write((const void *)outptr1, MAXSAMPLES * 2);
        }
        outptr1 = NULL;
      }
      if (outptr2 != NULL) { // save buffer 2
        totalsamples2 +=  MAXSAMPLES;
        // write to SD here
        if (file2) {
          file2.write((const void *)outptr2, MAXSAMPLES * 2);
        }
        outptr2 = NULL;
      }
    }
    
    void StartSampling(void) {
      bptr1 = bpt1[0];
      bptr2 = bpt2[0];
    
      outptr1 = NULL;
      outptr2 = NULL;
    
      samplecount1 = 0;
      samplecount2 = 0;
      lastcapt1 = 0;  // needed to maintain time coherence between
      lastcapt2 = 0;  // channels 1 and 2
      GPT2_CR |= GPT_CR_EN;  // Reset count and enable counter
      NVIC_ENABLE_IRQ(IRQ_GPT2);
    }
    
    void StopSampling(void) {
      GPT2_CR &=  ~GPT_CR_EN;  // Disable (stop) the counter
      NVIC_DISABLE_IRQ(IRQ_GPT2);
      Serial.println("Sampling halted.");
      Serial.printf("Samples Saved:   Ch1: %lu      Ch2:  %lu\n", totalsamples1, totalsamples2);
    }
    
    char *GetFileName(const char *fbase, const char *ext) {
      static char filename[64];
      time_t nn;
      nn = now();
      uint8_t mo = month(nn);
      uint8_t dd = day(nn);
      uint8_t hh = hour(nn);
      uint8_t mn = minute(nn);
      sprintf(filename, "%s_%02d%02d%02d%02d.%3s", fbase, mo, dd, hh, mn, ext);
      return &filename[0];
    }
    
    
    
    // GPT IRQ Handler
    void GPT_Handler(void) {
      static uint16_t bnum1, bnum2;
      static uint32_t capt1, capt2;
      uint32_t statreg;
    
    
      statreg = GPT2_SR;
      if (statreg & GPT_SR_IF1) { //capture on channel 1
        LEDON
        capt1 = GPT2_ICR1;   //read  counts
        GPT2_SR = GPT_SR_IF1;   //reset IF1 flag
        *bptr1++ = capt1 - lastcapt1;
        lastcapt1 = capt1;
        samplecount1++;
        if (samplecount1 == MAXSAMPLES) { // mark buffer ready and switch
          outptr1 = bpt1[bnum1];  // mark old buffer as full
          bnum1 = (bnum1 ^ 0x01);  // switch between buffers 0 and 1
          bptr1 =  bpt1[bnum1];
          samplecount1 = 0;
        }
        LEDOFF
      }
    
      if (statreg & GPT_SR_IF2) { //capture on channel 2
        LEDON
        capt2 = GPT2_ICR2;   //read  counts
        GPT2_SR = GPT_SR_IF2;   //reset IF2 flag
        *bptr2++ = capt2 - lastcapt2;
        lastcapt2 = capt2;
        samplecount2++;
        if (samplecount2 == MAXSAMPLES) { // mark buffer ready and switch
          outptr2 = bpt2[bnum2];  // mark old buffer as full
          bnum2 = (bnum2 ^ 0x01);
          bptr2 =  bpt2[bnum2];
          samplecount2 = 0;
        }
    
      }
      // asm("DSB"); // Adding this can extend IRQ from 360 to 1350 nanoseconds!
      // perhaps because it triggers a cache flush in EXTMEM
      LEDOFF
    }
    
    //  For complete data collection, the samples would be written to SD.
    //  For now, just show a histogram to evaluate stability and resolution.
    //  I assume that the maximum input interval is not greater than 32767
    //  GPT clock cycles or 0.218 milliseconds
    #define HISTOSIZE 32768
    static uint32_t histocounts[HISTOSIZE];   // 128KByte histogram buffer
    void ShowHisto(uint32_t numsamples, uint16_t tsvals[]) {
      uint32_t i, maxidx, dcount;
      uint32_t countmax;
      uint16_t outliers = 0;
      uint16_t firstval, lastval;
      double frmax, pdmax;
      // first set all histocounts to zero
      for (i = 0; i < HISTOSIZE; i++) histocounts[i] = 0;
      // Now build the histogram data
      countmax = 0;
      maxidx = 0;
      for (i = 1; i < numsamples; i++) {
        dcount = tsvals[i]; // read difference in time stamps
        if (dcount < HISTOSIZE) {
          histocounts[dcount]++;
          if (histocounts[dcount] > countmax) {
            countmax = histocounts[dcount];
            maxidx = dcount;
          }
        } else {
          // Serial.printf("Outlier: %lu\n", dcount);
          outliers++;
        }
      }
      // now display histogram.  Skip zero values in histogram data
      firstval = 0;  lastval = 0;
      for (i = 0; i < HISTOSIZE;  i++) {
        if ((firstval == 0)  && (histocounts[i] != 0)) firstval = i;
        if (histocounts[i] != 0) lastval = i;
      }
      // display firstval and lastval
      //Serial.printf("\nhFirst valid count at: %u    Last valid count at %u\n", firstval, lastval);
    
      Serial.println("Histogram clock intervals and samples.");
      for (i = firstval; i <= lastval; i++) { // show only non-zero counts
        if (histocounts[i]) Serial.printf("%u   %u\n", i, histocounts[i]);
      }
      // now show the frequency and period at maxidx count
      pdmax = (double)maxidx / 150e6;
      frmax = 1.0 / pdmax;
      Serial.printf( "Frequency at max: %8.2f KHz   Period at max:  %5.1f nSec\n\n",
                     frmax / 1000.0, pdmax * 1.0e9);
      //Serial.printf("Outliers:  %u\n", outliers);
      Serial.println();
    }
    
    
    // Show the timestamp intervals for the first numsamples from array
    void ShowSamples(uint32_t numsamples, uint16_t tsvals[]) {
      uint16_t i;
      Serial.println("GPT counts between rising edges");
      for (i = 0; i < numsamples - 1; i++) {
        if ((i % 10) == 0)Serial.printf("\n% 4u: ", i);
        Serial.printf("%6u ", tsvals[i]);  // tsvals[] holds 16-bit values, so use %u
      }
      Serial.println();
    }
    
    // Set the output PWM frequency for pins 5 and 6 for self test
    // The outputs are set for short positive pulses.  
    // The frequencies will not be exact, especially when the
    // requested values are above a few hundred KHz.
    void SetPWM(uint32_t f1, uint32_t f2) {
      fc1 = f1; fc2 = f2;  // Copy value to globals for later use
      analogWriteResolution(8);
      analogWriteFrequency(pwmpin1, f1);
      analogWriteFrequency(pwmpin2, f2);
      analogWrite(pwmpin1, 4);  //low duty cycle
      analogWrite(pwmpin2, 4);
      Serial.printf("\nSet F1 to %lu and F2 to %lu\n", fc1, fc2);
    }
    
    // Set up GPT 2 for input capture on pins 15 and 40 --  Code from Mloum
    void setupTimerGPT2(void) {
      // IOMUX configuration to give capture channels 1 and 2 of GPT2 physical pins
      //GPT capture 1
      IOMUXC_GPT2_IPP_IND_CAPIN1_SELECT_INPUT = 1;  // remap GPIO_AD_B1_03_ALT8 to GPT2 Capture1 (Channel 1)
      IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_03 = 8; // GPT2 Capture1 configuration ALT8 Pin 15
      IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 = 0x13000; //Pulldown & Hyst
    
      //GPT capture 2
      IOMUXC_GPT2_IPP_IND_CAPIN2_SELECT_INPUT = 1;  // remap GPIO_AD_B1_04_ALT8 to GPT2 Capture2 (Channel 2)
      IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_04 = 8; // GPT2 Capture2 configuration ALT8 Pin 40
      IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 = 0x13000; //Pulldown & Hyst
    
      // Clock bus configuration
    #define CCM_CSCMR1_PERCLK_CLK_SEL   ((uint32_t)(1<<6))
    
      // Change the Clock Controller Module in order to use PERCLK_CLK_ROOT for the counter and not OSC@24MHz (default)
      CCM_CSCMR1 &= ~CCM_CSCMR1_PERCLK_CLK_SEL; //
    
      // Change the prescaler between AHB_CLK_ROOT (typically 600 MHz) and PERCLK_CLK_ROOT (default is 4 -> 150 MHz)
      //  CCM_CBCDR = CCM_CBCDR_IPG_PODF(1);  // NB I can't get 0 (that is to say no prescaler) to work
      CCM_CBCDR = CCM_CBCDR_IPG_PODF(3);  // Clocks faster than this (150 MHz) cause intermittent errors.
      // Note that 150MHz clock yields 6.67 nanoSecond resolution
    
      // Set the CCM Clock Gating Register
      CCM_CCGR0 |= CCM_CCGR0_GPT2_BUS(CCM_CCGR_ON) |
                   CCM_CCGR0_GPT2_SERIAL(CCM_CCGR_ON);  // enable clock
    
      //Clear GPT2 registers, namely  CR, PR and SR
      GPT2_CR = 0;
      GPT2_PR = 0; //No prescaler.
    
      // "Clear" bit flags (ROV, IF1 and IF2) writing one in them
      GPT2_SR = GPT_SR_ROV |            //Clear bit ROV
                GPT_SR_IF1 |            //Clear bit IF1
                GPT_SR_IF2;             //Clear bit IF2
    
      //CR  register of GPT2 (Control Register)  Timer is set up, but not enabled
      GPT2_CR = GPT_CR_ENMOD |          //Set count to zero at enable
                GPT_CR_FRR |            //Free run mode
                GPT_CR_CLKSRC(1) |      //Clock source is Peripheral Clock
                GPT_CR_IM1(1) |         //Capture activated on channel 1 on rising edge only
                GPT_CR_IM2(1);          //Capture activated on channel 2 on rising edge only
    
      // clear the interrupt flags
      GPT2_SR = GPT_SR_ROV |
                GPT_SR_IF1 |
                GPT_SR_IF2;
    
      //IR register of GPT2 (Interrupt Register)
      GPT2_IR = GPT_IR_IF1IE |          //Interrupt on channel 1 capture
                GPT_IR_IF2IE;           //Interrupt on channel 2 capture
    
      // set up IRQ_GPT2 interrupts
      attachInterruptVector(IRQ_GPT2, &GPT_Handler );
      NVIC_SET_PRIORITY(IRQ_GPT2, 32); // high priority
      // IRQ is enabled by StartSampling
    }
    
    // Initialize Teensy time from the RTC for SD directories, then start the SD card.
    bool StartSDCard() {
      setSyncProvider(getTeensy3Time);
      if (timeStatus() != timeSet) {
        Serial.println("Unable to sync with the RTC");
      } else {
        Serial.println("RTC has set the system time");
        unixtime = now();
      }
    
      if (!SD.begin(BUILTIN_SDCARD)) {
        Serial.println("\nSD File initialization failed.\n");
        return false;
      } else  Serial.println("initialization done.");
      // set date time callback function for file dates
      SdFile::dateTimeCallback(dateTime);
      return true;
    }
    
    /*
       User provided date time callback function.
       See SdFile::dateTimeCallback() for usage.
    */
    void dateTime(uint16_t* date, uint16_t* time) {
      // use the year(), month() day() etc. functions from timelib
    
      // return date using FAT_DATE macro to format fields
      *date = FAT_DATE(year(), month(), day());
    
      // return time using FAT_TIME macro to format fields
      *time = FAT_TIME(hour(), minute(), second());
    }
    
    /*****************************************************************************
       Read the Teensy RTC and return a time_t (Unix Seconds) value
    
     ******************************************************************************/
    time_t getTeensy3Time() {
      return Teensy3Clock.get();
    }

  5. #5
    Senior Member
    Join Date
    Feb 2018
    Location
    Corvallis, OR
    Posts
    349

    Finally one specific question:
    - Could you tell me how to reduce the access time to the registers? Or, conversely, could you explain to me why it can't be reduced?
    Access time to the registers is slower than normal DTCM memory access because the peripheral bus runs at only 150MHz, whereas the internal bus to DTCM memory runs at the 600MHz CPU clock rate.

    I think that adding the DSB instruction makes the processor stall until all outstanding memory accesses have completed, which slows things down even more.

    In your sample code, you had a DSB instruction at the end of your interrupt handler. If you are writing to the QSPI PSRAM in the handler, that DSB forces the system to wait until the EXTMEM cache is fully written before exiting the interrupt handler. That can greatly increase the duration of the interrupt handler.

    The DSB at the end of the handler is really needed only if your main program is going to read the EXTMEM data very quickly after the return from the interrupt. If you are writing to one buffer in the interrupt and reading from another to write to SD in the main program, you should never be reading the data written in the interrupt until long after it was written---when the file write gets to the end of the buffer.
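
    As a rough illustration of the two-buffer arrangement described above, here is a minimal host-side sketch. The names (PingPong, push, takeReady, kBufLen) are hypothetical and not taken from the code in this thread, and the buffer length is deliberately tiny. On real hardware the hand-off between the ISR and the main loop would need more care than shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical buffer length; real code would size this to suit SD block writes.
constexpr std::size_t kBufLen = 4;

struct PingPong {
  uint32_t buf[2][kBufLen] = {};
  volatile std::size_t count = 0;  // samples stored in the buffer being filled
  volatile int fillIdx = 0;        // buffer the ISR is currently writing to
  volatile int readyIdx = -1;      // full buffer waiting for the main loop, -1 if none

  // Called from the capture ISR: store one timestamp and swap buffers when full.
  void push(uint32_t ts) {
    buf[fillIdx][count] = ts;
    count = count + 1;
    if (count == kBufLen) {
      readyIdx = fillIdx;     // hand the full buffer to the main loop
      fillIdx = 1 - fillIdx;  // keep filling the other one
      count = 0;
    }
  }

  // Called from the main loop: returns a full buffer, or nullptr if none is ready.
  const uint32_t* takeReady() {
    if (readyIdx < 0) return nullptr;
    const uint32_t* p = buf[readyIdx];
    readyIdx = -1;
    return p;
  }
};
```

    The main loop only ever touches a buffer after the ISR has moved on to the other one, which is why the data does not need to be visible immediately after each interrupt returns.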

  6. #6
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    1,055
    A tight loop reading and testing GPIO6 takes about 33 nsec per edge - meeting the 50 ns temporal precision target. This does tie up the processor such that nothing else gets done.

  7. #7
    Junior Member
    Join Date
    May 2021
    Posts
    2
    Thank you mborgerson for your very detailed answers.

    Right now I have to meet a deadline for something else, but we will definitely look into this in detail and give some feedback soon.

  8. #8
    Senior Member
    Join Date
    Feb 2018
    Location
    Corvallis, OR
    Posts
    349
    Quote Originally Posted by jonr View Post
    A tight loop reading and testing GPIO6 takes about 33 nsec per edge - meeting the 50 ns temporal precision target. This does tie up the processor such that nothing else gets done.
    I ran a test of the following collection loop with an input signal of about 2.36MHz:
    Code:
    // Collect time stamps for rising edges
    // Stores at most MAXSAMPLES entries so that timestamps1[] is never overrun
    void CollectSample(uint32_t numsamples) {
      uint32_t scount = 0;
      if (numsamples > MAXSAMPLES) numsamples = MAXSAMPLES;
    
      while (digitalReadFast(inpin1)); // wait for pin low to start
    
      do {
        while (!digitalReadFast(inpin1)); // wait for rising edge
        timestamps1[scount] = ARM_DWT_CYCCNT; // capture CPU cycle count
        scount++;
        while (digitalReadFast(inpin1)); // wait for pin low to loop
    
      } while (scount < numsamples);
    
    }
    The resulting data gave the following histogram for 10240 samples:
    Code:
    Histogram clock intervals and samples.
    244   1
    247   5694
    250   1
    251   2
    252   1
    256   4539
    257   1
    I think that the 9 clock cycle difference indicates the time to make an extra loop when the test just misses an incoming rising edge. Those 9 clock cycles are about 15 nanoseconds at the 600 MHz CPU clock. If you were timing your loop with a scope using a signal bit, your loop would be longer due to the time needed to set the signal bit.
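
    For anyone who wants to reproduce this kind of histogram from the collected timestamps, here is a minimal host-side sketch. It assumes 32-bit timestamps as read from ARM_DWT_CYCCNT; the function name intervalHistogram is made up for this example and is not part of the code posted in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Count how often each interval (in CPU cycles) occurs between successive timestamps.
// Unsigned subtraction stays correct across a single rollover of the 32-bit counter.
std::map<uint32_t, uint32_t> intervalHistogram(const std::vector<uint32_t>& ts) {
  std::map<uint32_t, uint32_t> hist;
  for (std::size_t i = 1; i < ts.size(); i++) {
    uint32_t delta = ts[i] - ts[i - 1];
    hist[delta]++;
  }
  return hist;
}
```

    Because std::map keeps its keys sorted, iterating over the result prints the intervals in ascending order, the same layout as the listing above.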

    I tried looking at the assembly listing to count instructions, but the listing was nearly unintelligible with the standard 'fastest' optimization due to instruction reordering, dual issue, etc. With 'debug' optimization the listing was a bit clearer, but still beyond my skills with ARM assembly. With that optimization, the histogram indicated about 13 clock cycles of spread, or about 21.7 nanoseconds of loop time waiting for the rising edge.

    When I try the same clock as one input to my 2-channel GPT code, I get the following histogram:
    Code:
    Histogram clock intervals and samples.
    167   511999
    Frequency at max:   898.20 KHz   Period at max:  1113.3 nSec
    
    Histogram clock intervals and samples.
    62   1
    63   116320
    64   387783
    65   122
    66   2
    127   3449
    128   4321
    130   1
    Frequency at max:  2343.75 KHz   Period at max:  426.7 nSec
    Note that the input for the 2.3 MHz signal has a significant number of values at twice the sample period. I think that this means the combination of short period, two-channel collection, and SD storage slows things enough that the software can't keep up and misses a pulse about 1.5% of the time (7,771 of the 511,999 intervals fall between 127 and 130 counts). Note also that there is no spread or missed pulses on the 898 kHz signal on channel 1. The lack of spread is probably because the signal is generated with a PWM output on the T4.1 and the PWM and GPT clocks are derived from the same source.
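
    One way to put a number on the missed pulses is to count the intervals that are clearly longer than a single period. The sketch below is only an illustration: the function name longIntervalFraction and the choice of threshold (for example 1.5 times the nominal period in counts) are assumptions, not part of the original program.

```cpp
#include <cstdint>
#include <map>

// Fraction of recorded intervals longer than `threshold` counts, i.e. intervals
// during which at least one edge was probably missed.
double longIntervalFraction(const std::map<uint32_t, uint32_t>& hist, uint32_t threshold) {
  uint64_t total = 0;
  uint64_t longCount = 0;
  for (const auto& kv : hist) {
    total += kv.second;
    if (kv.first > threshold) longCount += kv.second;
  }
  if (total == 0) return 0.0;  // empty histogram: nothing to report
  return static_cast<double>(longCount) / static_cast<double>(total);
}
```

    For the channel 2 histogram above, a threshold of 96 counts separates the group around 64 counts from the group around 128 counts.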

    If I slow the signal generator down to 1.7MHz, the histogram cleans up nicely:
    Code:
    Histogram clock intervals and samples.
    167   511999
    Frequency at max:   898.20 KHz   Period at max:  1113.3 nSec
    
    Histogram clock intervals and samples.
    86   2
    87   330467
    88   181529
    89   1
    Frequency at max:  1724.14 KHz   Period at max:  580.0 nSec
    The uncertainty of the 1.7MHz signal is now down to 1 GPT clock cycle, or 6.67 nanoseconds. The reference manual says that there is an intrinsic uncertainty of 1 count in the capture data, and this experiment seems to confirm that.
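
    The conversion between GPT counts and the period and frequency figures printed with each histogram is simple arithmetic. A small sketch, assuming the 150 MHz GPT clock used in this thread; the helper names countsToNs and countsToKHz are invented for this example.

```cpp
#include <cstdint>

// GPT clock used in this thread: 600 MHz AHB clock divided by 4.
constexpr double kGptClockHz = 150.0e6;

// Duration of an interval of `counts` GPT ticks, in nanoseconds.
double countsToNs(uint32_t counts) {
  return counts * 1.0e9 / kGptClockHz;
}

// Frequency corresponding to an interval of `counts` GPT ticks, in kHz.
// The caller must pass a non-zero count.
double countsToKHz(uint32_t counts) {
  return kGptClockHz / counts / 1000.0;
}
```

    With these, countsToNs(1) is about 6.67 ns, and an interval of 87 counts works out to 580.0 ns and about 1724.14 kHz, which matches the printout above.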

    I think that the speed limit for this method is determined by the interrupt latency and the duration of the ISR in the worst case, which is when two pulses arrive within a few nanoseconds of each other and both are processed in the same call to the ISR. It takes longer to process both inputs, and in the worst case there is additional time required to switch between storage buffer pointers and zero the sample count.
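
    To make the worst case concrete, here is a host-side sketch of a handler that services both capture channels in a single call. It is not the GPT_Handler from my program: MockGpt stands in for the GPT2_SR, GPT2_ICR1 and GPT2_ICR2 registers, and the flag bit positions and all names are assumptions made so the logic can run off-target.

```cpp
#include <cstdint>

// Stand-ins for GPT2_SR, GPT2_ICR1 and GPT2_ICR2 so the logic can run on a host.
struct MockGpt {
  uint32_t sr = 0;
  uint32_t icr1 = 0;
  uint32_t icr2 = 0;
};

// Flag bit positions are assumptions for this sketch, not copied from imxrt.h.
constexpr uint32_t kIF1 = 1u << 3;
constexpr uint32_t kIF2 = 1u << 4;

// Hypothetical per-channel state: last captured value and number of captures.
struct Channel {
  uint32_t last = 0;
  uint32_t n = 0;
};

void captureHandler(MockGpt& gpt, Channel& ch1, Channel& ch2) {
  uint32_t flags = gpt.sr & (kIF1 | kIF2);  // read the status register once
  if (flags & kIF1) { ch1.last = gpt.icr1; ch1.n++; }
  if (flags & kIF2) { ch2.last = gpt.icr2; ch2.n++; }
  // The mock clears the serviced bits directly. On the real GPT the flags are
  // cleared by writing ones to them, as in setupTimerGPT2().
  gpt.sr &= ~flags;
}
```

    When both flags are set on entry, both branches run before the handler returns, so that call takes longer than one that services a single channel.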
