PMT pulse counting using Teensy 3.2

Status
Not open for further replies.
It would be fun (and easy) to test Serial.print() timing over USB. It is very efficient; I'm not sure whether it would be better to skip the conditional and just send each sample, letting USB decide when to send the packet. Either do as indicated and fill one full buffer and send it conditionally, or just gather all the sample data and send it when sampling is done.

A T_3.6 can get just over 20K lines of 35 chars to the PC per second; the T4, with higher-speed USB, gets 100-300K depending on the OS and the PC's ability to receive the messages. Seeing that test, I wondered how much time is consumed by the USB transfers, but didn't attempt to measure it.

There may be a way to use the FTM code: let it count, and read its count to detect a change, if that is possible without interrupting its function?

Making yield() an empty function is handy (better still, just stay in loop() during the test), but that would still allow common interrupts such as the system timer (systick, 1000/sec).

A T_3.6 can make a few million loop() passes per second; on the T4 I've seen over 20 million with minimal code that just counts the passes and watches for a second to expire.

Knowing how long a sample test period lasts - and how many hits are expected between Trigger/reset would help understand the output data rates.
 
Hi, I have very elementary Arduino experience and I need some advice. I would like to build a pendulum clock timer where an optical sensor detects the passage of the pendulum bob and measures its half period with a resolution of at least 1 µs, i.e. a relative resolution of about 1 ppm as an order of magnitude.
The output should be acquired through a standard USB port over long periods (weeks to months) to study the reliability and precision of the clock. The environment is a typical indoor apartment at fairly constant temperature, and I will calibrate the counter independently to 1 ppm precision to account for any drift of the Arduino's clock.
First of all, do you think an Arduino would work for this task? Second, which kind of counter scheme/program should I adopt?
Thanks, oscar
 
Dear all,

Regarding Defragster's comment in post #18, "should stay high a min of 5-10 ns to be detectable": I need to check, but the TTL output from the PM should stay HIGH longer than 10 ns.

Regarding Defragster's comment in post #19, "How long does a sample period run with photon firings?": There are two time scales associated with the acquisition time. One "experiment" lasts on the order of milliseconds (50 ms - 500 ms), but the "experiment" needs to be repeated many times. How many times? Enough to have good statistics (1000-10000 times). After each "experiment", I was thinking of resetting the counter to 0 with a trigger signal, as illustrated in the figure in my post #15.

Regarding Defragster's comment in post #19, "the rate was on 'Clock Ticks'?": Keep in mind that I will not have photons coming at the full rate of the PhotoMultiplier all the time, but occasionally it can happen: there is a probabilistic aspect to the emission of the photon, after all! I want to minimise missed counts as much as possible. I would like not to be limited by the counter itself, as I am already limited by the dead time of the PhotoMultiplier. Currently I have around 5000 counts/s (measured using the existing expensive counter), but as I was saying, they are not equally distributed in time; that would be too easy ;-)

Regarding Defragster's comment in post #19, "if the Trigger/Reset and the TTL_PM signals did a CHANGE only": Sorry, I was not clear enough. I only care about a RISE change; the FALL does not carry any information in this context.

Regarding post #25 of Xfer: from what I read in another of your posts (post #5 of https://forum.pjrc.com/threads/54363-Teensy-3-6-speed-of-DigitalReadFast), you were able to read and store in 41.7 ns. But in that example, you store the status and not the time. Given the master clock speed, I would have expected pin state changes to be detected with better time resolution than that.

Also, I should mention that I already have an Arduino DUE. The Atmel chip on the DUE can do a fair amount, and it will have to do until the Teensy3.6 arrives... I have been reading a bit over the last two days and doing some tests with the DUE. I imagine (I hope?) that what I have learned from the DUE will transfer easily to the Teensy3.6 and Teensy4.0. The prospect of having a time resolution of 2 ns with the Teensy4.0 is just amazing!

Also, I have come across "https://github.com/manitou48/DUEZoo/blob/master/isrperf.txt", which led me to conclude that interrupts should be avoided as much as possible.

The conclusion I have arrived at so far (with help from this forum and the Arduino Due forum) is that the best way to go is to implement code that blocks the microprocessor until a pin state change (LOW --> HIGH) happens, and use an interrupt to reset the clock counter.

In any case, I have the feeling that at least one of my original questions (which I may not have formulated in the best terms at the start of this discussion) has been answered: what time resolution can I expect from the Teensy3.6? I think we all agree that it should be possible to achieve 1/F_CPU, which is really impressive.

The second question is: at what rate is it possible to detect a pin change and save the number of ticks (in units of 1/F_CPU) in a buffer?
I think the answer to this second question is much more difficult and depends strongly on how the whole thing is approached.

Regarding post #26 by Defragster: I have not worried (too much) yet about how to get the data to the PC. Serial should be fast enough to empty the buffer. I can also have a dead time between "experiments" to send data to the PC. There is also DMA to play with if I really have trouble getting the data out (I've never done it before, but from what I have read it seems the way to go in this context).


I would also like to thank everyone again for the huge feedback and effort put into most of the posts!
 
Seems interesting. I'd play with some of the timings more, but … other stuff … too many questions … re the first 3 lines of post #28:

So if the TTL stays high for 10 ns and then low for a similar time, does that mean two photon triggers would be at least 20 ns apart? Is at most ONE expected per triggering of the emitter?

5,000 counts in a 500 ms sample time is 10 per ms.
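For comparison with the loop timings discussed in this thread, the average rate can be expressed as a mean gap in CPU cycles. This is a sketch with illustrative function names, assuming a 256 MHz core clock as in the T_3.6 tests here; since the photons are bunched rather than evenly spaced, it gives only an average, not a worst case.

```cpp
#include <cassert>
#include <cstdint>

// Average pulse rate in counts per millisecond.
double countsPerMs(uint32_t counts, double windowMs) {
    assert(windowMs > 0.0);
    return counts / windowMs;
}

// Mean spacing between pulses, expressed in CPU clock cycles.
// cpuHz is an assumption (e.g. 256e6 for an overclocked T_3.6).
double meanGapCycles(uint32_t counts, double windowMs, double cpuHz) {
    assert(counts > 0);
    double windowSeconds = windowMs / 1000.0;
    return cpuHz * windowSeconds / counts;
}
```

So at 10/ms the average gap is about 25,600 cycles at 256 MHz; the difficulty lies in the bursts, not in the average.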

The note on CHANGE related to how Teensy handles pin-change interrupts: IIRC, each change wakes the attached interrupt code to verify whether the state change requires a call, so fewer transitions are better. Even so, the timing of these pulses is likely to overwhelm interrupt processing, per post #23.

I ran the ReadFast test from @Xfer's linked post on a T_3.6 at 256 MHz and it shows 31.3 ns; on the current beta of the new Teensy it shows "DigitalReadFast took 16.7 ns", so polling reads won't catch it either. Changing that test1() to read 'ARM_DWT_CYCCNT' instead shows 5.0 ns, but that includes no conditional logic.

The tightest loop recording CYCCNT to an array (without complete control, and with some other mods) shows 47 ns per pass on the 256 MHz units and 25 ns on the 600 MHz unit. It just writes the current CYCCNT into the array and only advances the index when the pin reads HIGH. But it is prone to both over- and under-counting: it can read the same HIGH on two passes, and most of each pass is spent not reading the pin, so a pulse can be missed. This flawed loop does show, by taking the difference of two array elements on the 600 MHz unit, that one pass takes 15 cycles, which matches the 25 ns measured.
Code:
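const uint32_t N_CYCLES_TEST0 = 10000;   // example buffer size (assumption)
uint32_t dataBuffer[N_CYCLES_TEST0];

FASTRUN void test0()
{
	// Deliberately crude: CYCCNT is stored on every pass, but the index only
	// advances when pin 2 reads HIGH, so a long HIGH is recorded several times
	for (uint32_t iCycle = 0; iCycle < N_CYCLES_TEST0; iCycle+=digitalReadFast(D2)) // - never ends without 1 on pin 2
		dataBuffer[iCycle] = ARM_DWT_CYCCNT;
}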

This test could be rewritten with an _isr on interrupt to see what it can catch … quite possibly interrupts would be no better or worse … but ...
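The nanosecond figures above are just CYCCNT cycles divided by F_CPU. A small helper (hypothetical name) for that conversion: 15 cycles at 600 MHz is 25 ns, and roughly 12 cycles at 256 MHz would give the 47 ns quoted for those units.

```cpp
#include <cassert>
#include <cstdint>

// Convert a difference of two ARM_DWT_CYCCNT readings to nanoseconds.
// CYCCNT increments once per core clock, so cpuHz is F_CPU.
double cyclesToNs(uint32_t cycles, double cpuHz) {
    assert(cpuHz > 0.0);
    return cycles * 1e9 / cpuHz;
}
```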

Since you ordered a Teensy and want to use it in place of the DUE, it may be that what was done there could be ported to the Teensy; that would be interesting to see. It would be a surprising first to see the DUE do anything faster than a T_3.6 can when the code is properly written for the Teensy's MCU.

Using CYCCNT, only the raw value needs to be recorded, as above. Whatever the time difference is, subtracting two counts gives the right number of cycles even if the counter wraps, because all the numbers are unsigned 32-bit integers:
Code:
 	for (uint32_t iCycle = 0; iCycle < 10; iCycle++)   // needs at least 11 recorded samples
	{
  		Serial.print("CYCCNT sample "); Serial.println(dataBuffer[iCycle]);
  		// Unsigned 32-bit subtraction: correct even across a CYCCNT rollover
  		Serial.print("DIFF "); Serial.print(dataBuffer[iCycle+1] - dataBuffer[iCycle]); Serial.println(" cycles\n");
	}
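The wraparound point can be checked on a PC. A minimal sketch (function name is illustrative): with uint32_t, later minus earlier gives the elapsed cycles even if CYCCNT rolled over in between, provided fewer than 2^32 cycles (about 16.8 s at 256 MHz) separate the two readings.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed cycles between two free-running 32-bit counter readings.
// Unsigned subtraction is modulo 2^32, so a single rollover between
// 'earlier' and 'later' is handled automatically.
uint32_t elapsedCycles(uint32_t earlier, uint32_t later) {
    return later - earlier;
}
```

For example, elapsedCycles(0xFFFFFFF0, 0x00000005) is 21, the 16 cycles up to the wrap plus 5 after it.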

Indeed, the Teensy's CYCCNT increments at F_CPU, but, as noted above, it takes several cycles of surrounding work to read and record that 'current' value.

For your questions 1 & 2, the polling numbers above are my best guess. Running this test against the pseudo code posted earlier would show how it compares.

The question about data transfer only mattered if some sort of real-time updates were required. It seems results can easily be stored and transmitted between test periods. For 10,000 4-byte numbers, the Teensy can push them out over USB in no time, at a minimum of 1 MByte/sec (the receiving end is what slows it down). Or write them to a hard disk or flash drive to be recovered after testing.
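For scale: 10,000 four-byte samples is 40,000 bytes, roughly 40 ms at the 1 MByte/s floor mentioned above. A small helper (hypothetical name) for that arithmetic; 1 MByte/s is taken as the conservative figure from this post, not a measured limit.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds needed to push a block of samples over USB
// at an assumed sustained rate in bytes per second.
double transferMs(uint32_t samples, uint32_t bytesPerSample,
                  double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    double totalBytes = static_cast<double>(samples) * bytesPerSample;
    return totalBytes / bytesPerSecond * 1000.0;
}
```

Even a full buffer of 8-byte records would be a fraction of a second at that rate, small next to a dead time between experiments.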
 
Quoting jofre:
"Regarding post #25 of Xfer: from what I read in another of your posts (post #5 of https://forum.pjrc.com/threads/54363-Teensy-3-6-speed-of-DigitalReadFast), you were able to read and store in 41.7 ns. But in that example, you store the status and not the time. Given the master clock speed, I would have expected pin state changes to be detected with better time resolution than that."

And it probably is better: that test was done in a double-level for() loop, which slows things down. Moreover, there was an AND operator in the innermost loop.
In your case (my sample polling code) I think you would see better timings. :)
 
Hi everyone,

I do not have a Teensy yet, so I have been playing with my DUE. I just wrote a quick update on the Arduino DUE forum; I link it here (https://forum.arduino.cc/index.php?topic=400516.15) if somebody wants to see how far I have got using that other platform. Long story short: using interrupts and systick, I can handle a 1 MHz square wave with 12 ns time resolution (the DUE's 84 MHz clock).

Have a nice weekend!
 
I've had some good success counting pulses using interrupts on the 3.5 and 3.6. The method I'm using is gating the 3.3V output. I'm counting the pulses of a precision energy meter used to calibrate the energy metering on a national grid electricity network.

- Counting on one interrupt on one Teensy: works 100% perfectly.
- Counting two pulse inputs on one Teensy: the two counts are almost the same, when they should be different.
- Two separate Teensys, each counting one pulse input, on the same computer: same results as counting two on one board.
- Two separate Teensys, each counting one pulse input, on different computers: 100% perfect.

Any ideas would be appreciated.
 
@jofre - I did a quick read of the DUE notes: nearly good at reading 1 MHz pulses. From those notes, the Teensy 3.6 could improve on that with interrupts, maybe only by the ratio of clock speeds, 256/84 ~ 3? That may still be an order of magnitude short. I haven't gotten to write my test yet.

Polling isn't any better, because of the logic needed to test the pin and decide whether to record. I suppose that is where DMA would do better: blindly reading/recording during the 50-500 ms test period and then parsing the results after it ends.
 
I got the intended test working; I need to refine it and observe the results. Initial testing is against a PWM-driven frequency, so continuous cycles of 2 to 12 MHz. I need to clean that up to post the best results, then use an outside device to feed in an intermittent data stream like the one here.

@jofre - 5,000 marks in 500 milliseconds, at 10 per ms, is very sparse? How often do they generally appear in groups, and of what size? Can you give an example of the output from your current device? Would that be 500 counts across 50 ms?

The Teensy 3.6 has an accurate counter good up to 65 MHz, so a clean 22 MHz signal can be counted. Polling on a T_3.6 at 256 MHz takes 100 cycles between detection samples, so against a constant 25.6 MHz clock each timestamp period is a bucket of 9, 10 or 11 counts, and that is how they are recorded. Dropping the PWM frequency to 2.5 MHz continuous, it seems to catch them all within 100 cycles. That is about an order of magnitude too low, but depending on how the 10/ms are distributed it might be useful?

I also tried interrupt-based counting: it appears to capture one pulse per period, 144 cycles apart, at a frequency of 1.85 MHz, so this polling method actually works much better.

I can say a 600 MHz Teensy can do a bit better, but that should be posted on that thread once I have cleaner results and code worth posting.
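The 9, 10 or 11 bucket sizes for polling follow from simple arithmetic: pulses per polling pass = signal frequency × cycles per pass / CPU clock. A sketch of that (function name is illustrative), assuming an evenly spaced input like the PWM test signal; real photon arrivals are not evenly spaced.

```cpp
#include <cassert>

// Expected number of input pulses arriving during one polling pass,
// for an evenly spaced input signal. Above 1.0, several pulses share
// one timestamp (a "bucket"); below 1.0, each pass sees at most about one.
double pulsesPerPoll(double signalHz, double pollCycles, double cpuHz) {
    assert(cpuHz > 0.0);
    return signalHz * pollCycles / cpuHz;
}
```

With 100-cycle passes at 256 MHz, 25.6 MHz gives about 10 pulses per pass, while 2.5 MHz gives just under one.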

By the way, I am recording 8 bytes per sample (4 bytes of cycle count and 4 bytes of running counter), 25,500 samples in 5-10 ms, as that is what fits in memory. For short runs this could be cut in half, and 25K samples aren't needed anyway, since so far 5K is the indicated max. A record is only made when the 'input' is high. DMA working at speed would record copious amounts of data: running fast enough to catch each HIGH, it would capture each one multiple times, and the LOW periods in between even more often, since AFAIK it would have to be a continuous stream. That data would then have to be stored and processed to eliminate duplicates and recover the clock periods based on the sample rate, it would seem.
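Each record described above is two 32-bit words. A sketch of that layout (struct and field names are hypothetical) with the memory arithmetic: 25,500 records take 204,000 bytes, which is why that count is about what fits in the T_3.6's 256 KB of RAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One captured sample: CPU cycle timestamp plus running pulse count.
struct Sample {
    uint32_t cycleCount;    // ARM_DWT_CYCCNT at detection
    uint32_t runningCount;  // hardware pulse counter at the same moment
};

static_assert(sizeof(Sample) == 8, "two 32-bit fields, no padding");

// Bytes needed for a capture buffer of n samples.
constexpr std::size_t bufferBytes(std::size_t n) {
    return n * sizeof(Sample);
}
```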
 
So, I've been playing around with a T3.2 (the only Teensy I have on hand right now). I still think @jofre's problem statement is not completely defined. First, if the experiment can really produce even short bursts of pulses spaced only 45 ns apart, then I'd say it's game over -- you're going to miss pulses. Sure, even the T3.2 can COUNT pulses at up to 65 MHz. But there's a big difference between doing that and recording the arrival time of every pulse. The first is done purely in hardware, while the second requires code involvement.

I experimented using an FTM with pre-scaler = 1 (48 MHz on a T3.2). That gives ~21ns timing resolution. The first problem is that the 16-bit FTM counter will rollover every ~1.4ms at this clock rate. So, I extended it to 32-bits by capturing its overflow events and keeping and incrementing the upper 16 bits of a 32-bit variable. That gets you ~89 seconds between rollovers -- seems reasonable. I also experimented with a 64-bit counter giving more than 12,000 years between rollovers -- perhaps a bit excessive.
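The rollover times quoted here follow from 2^bits divided by the tick rate. A small sketch (hypothetical function name), taking 48 MHz as the FTM clock with prescaler 1 on a T3.2 as described above: about 1.37 ms for 16 bits, about 89.5 s for 32 bits, and roughly 12,000 years for 64 bits.

```cpp
#include <cassert>
#include <cmath>

// Seconds until a free-running counter of the given width wraps,
// when it is clocked at tickHz (e.g. 48e6 for the T3.2 FTM here).
double rolloverSeconds(int bits, double tickHz) {
    assert(bits > 0 && bits <= 64 && tickHz > 0.0);
    return std::ldexp(1.0, bits) / tickHz;  // 2^bits ticks per wrap
}
```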

So, using the 32-bit counter, I successfully captured input pulses and stuffed their arrival times into an array. But only up to about a 1 MHz input pulse rate. Beyond that, even the short ISR takes too long. That number can surely be scaled up by the clock rate of the faster T3.6.

The nice thing about using input capture mode on the FTM is that capturing the arrival time is handled by hardware so there's no variable latency period that doing it with an ISR would cause.

Also, I definitely wouldn't try to output the data at the same time it's being captured. First capture, then send to the PC, then repeat. Define a buffer size and fill it, rather than sampling for a fixed amount of time and ending up with a variable number of samples. I think for the probability distributions involved, confidence in the statistics is determined by the number of samples you gather, not the amount of time you spend gathering them.
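A minimal sketch of that fill-then-send approach, assuming a fixed capacity chosen in advance. The CaptureBuffer class is hypothetical and written as plain C++ so it can be tried on a PC; on the Teensy, if add() were called from an ISR, the count would also need to be volatile or otherwise protected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Collect exactly N timestamps, then stop until the block has been sent.
template <std::size_t N>
class CaptureBuffer {
public:
    // Stores a timestamp; returns false (and drops it) once full.
    bool add(uint32_t timestamp) {
        if (count_ >= N) return false;
        data_[count_++] = timestamp;
        return true;
    }
    bool full() const { return count_ == N; }
    std::size_t size() const { return count_; }
    uint32_t operator[](std::size_t i) const {
        assert(i < count_);
        return data_[i];
    }
    void reset() { count_ = 0; }  // call after the data has been sent
private:
    uint32_t data_[N] = {};
    std::size_t count_ = 0;
};
```

The loop would then be: capture until full(), send size() entries to the PC, reset(), repeat.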

The example code below is just the sampling portion. It simply stuffs the timestamps into a circular buffer. It also computes the time delta between samples and prints a message if it falls outside of an expected window. I was testing with a fixed 1 MHz input signal, so this would indicate a sampling problem.

The last thing to consider when testing solutions is to make sure your input signal is asynchronous to the Teensy (I used a different Arduino board). If you generate the signal within the same Teensy, it will be synchronous with the time measurement. This might produce unrealistic results.

Code:
#include "Arduino.h"

void printHex(uint32_t val);

volatile bool incorrectDelta = false;
volatile uint32_t deltaVal;

const uint8_t logBufferSize = 12;
const uint16_t bufferSize = 1 << logBufferSize;
const uint16_t indexMask = bufferSize - 1;
volatile uint32_t sampleBuffer[bufferSize];

void setup() {
  const uint8_t triggerInput = 3;

  Serial.begin(256000);
  delay(1000);
  Serial.println("Starting");
  delay(1000);

  pinMode(triggerInput, INPUT);

  FTM1_SC = 0;
  FTM1_MODE = FTM_MODE_WPDIS;
  FTM1_CNT = 0000;      // Reset counter
  FTM1_MOD = 0xFFFF;       // Terminal count

  *portConfigRegister(triggerInput) = PORT_PCR_MUX(3);  // FTM1 Ch 0 pin - PTA12
  FTM1_C0SC = FTM_CSC_ELSA;   // FTM1 Ch0 capture on rising edge

  FTM1_SC &= ~FTM_SC_TOF;      // Force read-modify-write, clear TOF flag
  FTM1_SC |= FTM_SC_TOIE;     // enable overflow interrupt

  FTM1_C0SC &= ~FTM_CSC_CHF;  // Force read-modify-write, clear CHF flag
  FTM1_C0SC |= FTM_CSC_CHIE;  // enable channel interrupt

  NVIC_SET_PRIORITY(IRQ_FTM1, 0);
  NVIC_ENABLE_IRQ(IRQ_FTM1);        // Enable FTM1 interrupts

  FTM1_SC |= FTM_SC_CLKS(1) | FTM_SC_PS(0); // run from sys clk
}

void loop() {
  static uint32_t oldMicros = micros();
  uint32_t currentMicros;
  uint32_t localCounterVal;

  if (incorrectDelta) {
    currentMicros = micros();
    noInterrupts();               // read the values shared with the ISR atomically
    incorrectDelta = false;
    localCounterVal = deltaVal;
    interrupts();
    Serial.print(currentMicros - oldMicros);
    Serial.print(":  ");
    printHex(localCounterVal);
    Serial.println();
    oldMicros = currentMicros;
  }
}

void ftm1_isr() {
  static uint32_t upperWord = 0;
  static uint32_t lastCounter = 0;
  static uint16_t bufferIndex = 0;
  uint32_t delta;
  uint32_t capturedCounter;
  uint32_t ftmReg;

  ftmReg = FTM1_C0SC;
  if (ftmReg & FTM_CSC_CHF) {
    FTM1_C0SC = ftmReg & (~FTM_CSC_CHF);
    capturedCounter = FTM1_C0V | upperWord;
    if (capturedCounter <= lastCounter) {
      // Counter wrapped but the TOF branch hasn't run yet: apply the overflow here
      capturedCounter += 0x10000;
      upperWord += 0x10000;
      FTM1_SC &= ~FTM_SC_TOF;
    }
    delta = capturedCounter - lastCounter;

    // A 1 MHz input at 48 MHz gives ~48 (0x30) ticks per period; flag anything outside 44..60
    if ((delta < 0x2C) || (delta > 0x3C)) {
      deltaVal = delta;
      incorrectDelta = true;
    }
    lastCounter = capturedCounter;

    sampleBuffer[bufferIndex++] = capturedCounter;
    bufferIndex &= indexMask;

  } else {
    ftmReg = FTM1_SC;
    if (ftmReg & FTM_SC_TOF) {             // test the value already read
      FTM1_SC = ftmReg & (~FTM_SC_TOF);    // write 0 to TOF to clear it
      upperWord += 0x10000;
    }
  }
}

void printHex(uint32_t val) {
  Serial.print("0x ");
  for (int8_t shift = 28; shift >= 0; shift -= 4) {
    uint8_t hexDigit = (val >> shift) & 0xF;
    Serial.print(hexDigit, HEX);
    if ((shift & 0xF) == 0) {
      Serial.print(" ");
    }
  }
}
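The overflow bookkeeping in ftm1_isr() can be checked off-target. This is a hypothetical host-side model (plain C++, no Teensy registers): a bool stands in for the TOF flag, and the capture and overflow branches mirror the ISR. It exercises the case the capturedCounter <= lastCounter test is there for: a capture that lands just after the 16-bit counter wraps, before the TOF branch has run. The rule relies on consecutive captures being less than one 16-bit rollover (~1.4 ms at 48 MHz) apart whenever an overflow is still pending.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the 16-bit -> 32-bit extension in ftm1_isr().
struct Extender {
    uint32_t upperWord = 0;    // software-maintained upper 16 bits
    uint32_t lastCounter = 0;  // previous extended capture
    bool tof = false;          // stands in for the FTM1_SC TOF flag

    // The 16-bit hardware counter wrapped (sets TOF, ISR not yet run).
    void hardwareOverflow() { tof = true; }

    // Capture branch: c0v is the 16-bit FTM1_C0V value.
    uint32_t onCapture(uint16_t c0v) {
        uint32_t captured = c0v | upperWord;
        if (captured <= lastCounter) {  // wrapped, TOF not yet serviced
            captured += 0x10000;
            upperWord += 0x10000;
            tof = false;                // overflow accounted for here
        }
        lastCounter = captured;
        return captured;
    }

    // Overflow branch: only counts a wrap that is still pending.
    void onOverflowIsr() {
        if (tof) {
            tof = false;
            upperWord += 0x10000;
        }
    }
};
```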
 
Indeed, timing each pulse will not be half fast enough unless the pulses are far enough apart (150 to 200 clock cycles?), or unless 'buckets' are good enough, i.e. knowing that 'some #' happened starting at time X.

Interrupts won't do it, and the T_3.6 does maybe only 3 times better with polling. A future T4 can do better than that: I've mocked up a test, not knowing what to expect, and it is probably somewhere around 8 to 12 MHz continuous.

I haven't seen an answer to the post #35 request for an "example of the output"/'data stream' … I've been busy with other things, and the code is still awaiting cleanup and a test against something besides continuous data.

The polling loop looks like this; it won't quit until all samples are counted, which with a continuously toggling pin filled the 25K-sample buffer array in 5-10 ms at high frequency:
Code:
while ( dScount < NUM_SAMPLES - 100 ) {
	while (!digitalReadFast ( PIN_isr ) );      // spin until the pin reads HIGH
	dSample[dScount][0] = ARM_DWT_CYCCNT;       // CPU cycle timestamp
	dSample[dScount][1] = counter_read();       // running hardware pulse count
	dScount++;
}

Which is really similar to the _isr():
Code:
FASTRUN void pulse() {
	dSample[dScount][0] = ARM_DWT_CYCCNT;
	dSample[dScount][1] = counter_read();
	dScount++;
}

Here counter_read() returns the running "FreqCount.h" counter value (or equivalent), and ARM_DWT_CYCCNT is the processor's cycle count.
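Since only the rising edge matters here, and the polling loop above records again if the pin is still HIGH on the next pass, one option is to record only LOW to HIGH transitions. The host-side model below (hypothetical function, with an array of simulated pin reads in place of digitalReadFast()) shows that logic; on the Teensy the equivalent would presumably be to also wait for the pin to go LOW again before waiting for the next HIGH, at the cost of a few more cycles per pulse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 'levels' are successive pin reads, 'stamps' the cycle counter at each
// read. Only LOW->HIGH transitions are recorded, so a pulse seen on
// several consecutive reads is counted once.
std::vector<uint32_t> risingEdgeStamps(const std::vector<bool>& levels,
                                       const std::vector<uint32_t>& stamps) {
    assert(levels.size() == stamps.size());
    std::vector<uint32_t> out;
    bool previous = true;  // a pulse already HIGH at the start is skipped
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (levels[i] && !previous) {
            out.push_back(stamps[i]);
        }
        previous = levels[i];
    }
    return out;
}
```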
 