Hello everyone,
I am an assistant professor at the University of Angers (France). We are currently trying to use the Teensy 4.1 to count single photons, or more precisely to count and time-tag the electrical pulses coming from a single-photon counter (i.e. an avalanche photodiode).
Just for information: from this raw data we get access to the Brownian motion of nanoparticles in solution, and we can measure their size and shape (via Fluctuation Correlation Spectroscopy or Dynamic Light Scattering).
Obviously, electronics for time-tagging do exist (and, for the initiated, we do have a TCSPC card), but they are quite expensive (a few thousand euros). Open-source projects based on FPGAs also exist (like this one, which we have tested in the past), but FPGAs are... hard to use (at least for us, for now).
The idea is to use a fast microcontroller (the Teensy 4.1) to time-tag the pulses, test it thoroughly, and then release it as an open-source project (and also publish an article because, apparently, that's what I am paid for...).
Our aim is to time-tag the pulses with a temporal precision of at least, say, 50 ns, and with a count rate as high as possible (at least 1 million pulses per second).
For this goal, we are using the input-capture channels of the 32-bit General Purpose Timer (GPT).
We use double buffering to transfer the capture register to RAM, then from RAM to PSRAM, and then from PSRAM to a fast SD card. So far so good, except that the count rate cannot exceed 500 kHz without producing wrong capture times.
Consequently, we investigated and found an unexpected bottleneck.
We measured on the oscilloscope the time the capture interrupt takes to execute (see code below) and found it was about 1 µs (which explains the problem at 500 kHz).
By increasing the clock fed to the GPT (PERCLK_CLK_ROOT) from 24 MHz to 300 MHz, we reduced the total interrupt time to around 500 ns. Even a minimal interrupt handler (one that only resets the interrupt flag) takes 150 ns.
In other words, reading or writing the peripheral registers takes many CPU cycles, and we did not anticipate this while designing our memory-transfer scheme to measure at the highest possible rate.
Now, time for some code.
Here is the setup of GPT timer 2, configured for input capture:
Code:
void setupTimerGPT2(void)
{
// IOMUX configuration to get physical access to capture pins 1 and 2 of GPT2
//GPT capture 1
IOMUXC_GPT2_IPP_IND_CAPIN1_SELECT_INPUT = 1; // remap GPIO_AD_B1_03_ALT8 GPT2 Capture1 (Channel 1)
IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_03 = 8; // GPT2 Capture1 configuration ALT8 Pin 15
IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 = 0x13000; //Pulldown & Hyst
//GPT capture 2
IOMUXC_GPT2_IPP_IND_CAPIN2_SELECT_INPUT = 1; // remap GPIO_AD_B1_04_ALT8 to GPT2 Capture2 (Channel 2)
IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_04 = 8; // GPT2 Capture2 configuration ALT8 Pin 40
IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 = 0x13000; //Pulldown & Hyst
//Clock bus configuration
// #define CCM_CSCMR1_PERCLK_CLK_SEL ((uint32_t)(1<<6))
// Change the Clock Controller Module in order to use PERCLK_CLK_ROOT for the counter and not OSC@24MHz (default)
CCM_CSCMR1 &= ~CCM_CSCMR1_PERCLK_CLK_SEL; //
// Change the prescaler between AHB_CLK_ROOT (typically 600 MHz) and PERCLK_CLK_ROOT (default divider is 4 -> 150 MHz)
CCM_CBCDR = (CCM_CBCDR & ~CCM_CBCDR_IPG_PODF(3)) | CCM_CBCDR_IPG_PODF(1); // divide by 2; read-modify-write so the other CBCDR fields are preserved. NB I can't get 0 (that is to say no prescaler) to work
// Set the CCM Clock Gating Register
CCM_CCGR0 |= CCM_CCGR0_GPT2_BUS(CCM_CCGR_ON) |
CCM_CCGR0_GPT2_SERIAL(CCM_CCGR_ON); // enable clock
//Clear GPT2 registers, namely CR, PR and SR
GPT2_CR = 0;
GPT2_PR = 0; //No prescaler.
// "Clear" bit flags (ROV, IF1 and IF2) writing one in them
GPT2_SR = GPT_SR_ROV | //Clear bit ROV
GPT_SR_IF1 | //Clear bit IF1
GPT_SR_IF2; //Clear bit IF2
//CR register of GPT2 (Control Register)
GPT2_CR = GPT_CR_EN | //EN = 1 activate TIMER GPT2
GPT_CR_FRR | //Free run mode
GPT_CR_CLKSRC(1) | //Clock source is Peripheral Clock
GPT_CR_IM1(1) | //Capture activated on channel 1 on rising edge only
GPT_CR_IM2(1); //Capture activated on channel 2 on rising edge only
//IR register of GPT2 (Interruptions)
GPT2_IR = GPT_IR_ROVIE | //Interruption on overflow of the 32bits counter
GPT_IR_IF1IE | //Interruption on Channel 1 capture
GPT_IR_IF2IE; //Interruption on Channel 2 capture
}
Here is the code to start the capture:
Code:
void Start_Capture(void)
{
// Clear the interrupt flags
GPT2_SR = GPT_SR_ROV |
GPT_SR_IF1 |
GPT_SR_IF2;
// Custom variable initialization (not very relevant for this post)
TimeTagPtr1 = 0;
TimeTagPtr2 = 0;
TimeTagNb1 = 0;
TimeTagNb2 = 0;
OverflowCount = 0;
//Double buffer management
halfBuffer1 = false;
halfBuffer2 = false;
//enable IRQ_GPT2 interruption
NVIC_ENABLE_IRQ(IRQ_GPT2);
}
And here is the code executed in the capture interrupt:
Code:
void GPT2capture() {
// For timing on the oscilloscope
//digitalWriteFast(DEBUG_BLINK_PIN, HIGH);
//Identify the interrupt source
if (GPT2_SR & GPT_SR_ROV) {
//32-bit counter overflow
GPT2_SR = GPT_SR_ROV; //Clear ROV flag (plain write: SR is write-1-to-clear, so |= would also wipe any other pending flags)
// Tag the event "overflow" with the special time 0xFFFFFFFF
PsRamBuffer1[TimeTagPtr1++] = 0xFFFFFFFF;
PsRamBuffer2[TimeTagPtr2++] = 0xFFFFFFFF;
// buffer wrap up
if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
}
if (GPT2_SR & GPT_SR_IF1) {
//capture on channel 1
GPT2_SR = GPT_SR_IF1; //clear IF1 flag (write-1-to-clear)
PsRamBuffer1[TimeTagPtr1++] = GPT2_ICR1; //read and store capture register
if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
TimeTagNb1++;
}
if (GPT2_SR & GPT_SR_IF2) {
//capture on channel 2
GPT2_SR = GPT_SR_IF2; //clear IF2 flag (write-1-to-clear)
PsRamBuffer2[TimeTagPtr2++] = GPT2_ICR2; //read and store capture register
if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
TimeTagNb2++;
}
asm volatile ("dsb"); // wait for clear memory barrier
// For timing on the oscilloscope
//digitalWriteFast(DEBUG_BLINK_PIN, LOW);
}
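One direction worth exploring (a sketch only, untested and unmeasured on hardware): read GPT2_SR once into a local variable and clear all pending flags with a single write-1-to-clear store, so the handler performs only two slow peripheral-bus accesses to the status register instead of up to nine (three conditional reads plus three read-modify-writes):

```cpp
// Sketch only: same globals as above, but a single read and a single
// write-1-to-clear of GPT2_SR per interrupt.
void GPT2capture() {
  const uint32_t sr = GPT2_SR; // one read of the status register
  GPT2_SR = sr;                // clear every flag we saw; a pulse arriving
                               // after this write re-sets its flag and
                               // retriggers the interrupt
  if (sr & GPT_SR_ROV) {       // 32-bit counter overflow
    PsRamBuffer1[TimeTagPtr1++] = 0xFFFFFFFF;
    PsRamBuffer2[TimeTagPtr2++] = 0xFFFFFFFF;
    if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
    if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
  }
  if (sr & GPT_SR_IF1) {       // capture on channel 1
    PsRamBuffer1[TimeTagPtr1++] = GPT2_ICR1;
    if (TimeTagPtr1 == buffersize) TimeTagPtr1 = 0;
    TimeTagNb1++;
  }
  if (sr & GPT_SR_IF2) {       // capture on channel 2
    PsRamBuffer2[TimeTagPtr2++] = GPT2_ICR2;
    if (TimeTagPtr2 == buffersize) TimeTagPtr2 = 0;
    TimeTagNb2++;
  }
  asm volatile ("dsb");        // ensure the SR write completes before exit
}
```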
And here is the "minimalistic" interrupt handler I mentioned earlier, which takes 150 ns:
Code:
void GPT2capture()
{
GPT2_SR = GPT_SR_IF1; //Clear the interrupt flag with a plain write-1-to-clear (assuming events are only detected on channel 1)
asm volatile ("dsb");
}
As a side note (and this could be a clue), when the interrupt handler is only:
Code:
void GPT2capture()
{
digitalWriteFast(DEBUG_BLINK_PIN, HIGH);
asm volatile ("dsb");
digitalWriteFast(DEBUG_BLINK_PIN, LOW);
}
I get a periodic signal at "only" 23 MHz (instead of the 150 MHz toggle rate that digitalWriteFast can attain).
Finally, one specific question:
- Could you tell me how to reduce the access time to these registers? Or, conversely, could you explain why it can't be reduced?