counting clock cycles as a fast timer

Dizzixx

Member
Hello all,

I apologize if this question comes off as ignorant, but I am relatively inexperienced with microcontrollers and have only minimal experience with Arduino. I am working on a fast event timing circuit for time-of-arrival measurements during hypervelocity events. I am interested in using a Teensy 3.6: with its 180 MHz processor, if I could count, with some accuracy and consistency, the number of clock cycles that elapse between a trigger and a stop, I should in theory be able to determine the elapsed time.

I realize the Teensy is likely not the best tool for this. Originally I was thinking of using a solution like the ACAM TDC-GPX chip or the TI THS-788; both would be better suited to what I am trying to do, but they take a more significant investment of time and resources. If I can achieve so-so results quickly with the Teensy, that would be great, as it would let me prove out other elements of the system at minimal cost and then reconsider a more fit-for-purpose design.

So with that background out of the way: how accurately can I count the number of cycles between something like a trigger pin going high and a stop pin going high?

(I am being intentionally vague here to leave it open to the best solution, so by all means make wacky suggestions; I am not particularly constrained yet.)

What level of consistency can be achieved?

How consistent is the 180 MHz clock speed? Can I clock faster? If so, will it affect stability/consistency?

Are there steps that can be taken to ensure consistency, i.e. force some kind of reset prior to timing, disable certain functions/routines during timing, disable hardware components, etc.?

I imagine I may have to disable or pare down features in some way while doing the timing. That is fine; during timing I only care to count cycles. Between trigger and stop I expect roughly 1-2 µs to elapse, but I would ideally like to store counts for a number of sequential stops, each using the same trigger.

Something like what was being discussed in this thread: https://forum.pjrc.com/threads/28407-Teensyduino-access-to-counting-cpu-cycles

Thank you in advance for your feedback; any and all help is greatly appreciated.
 
That post should show what you want: Teensyduino-access-to-counting-cpu-cycles. I just used that to see 'times' across periods of no interrupts, where normal clock ticks get counted.

The T_3.6 processor spec/default is 180 MHz - There should be options up to 240 MHz noted as Overclock on the Tools menu for the processor speed before compiling. Generally the T_3.6 seems stable up to 240 MHz - YMMV.

Manitou published RTC code for getting interrupts on the second - it matched closely the running microseconds. You could do that for the cycle count to see, or time some other known event for calibration at the time perhaps.
 
I would use an FTM timer in input capture mode. Take a look at FreqMeasureMulti. The timer runs at F_BUS, which should be 60 MHz at a 180 MHz CPU clock. You can expect the result to be cycle accurate.

If you are using pin interrupts, you will have some triggering jitter and less accuracy.
 
Based on https://forum.pjrc.com/threads/28039-Teensy-3-1-time-accuracy I believe the micro's frequency error is on the order of ppm, where I was previously assuming full percents, which makes it largely negligible for my purposes.

How much trigger jitter can I expect? Right now in my model I am accounting for partial-cycle error (i.e. the trigger falls between cycles) as well as an added pad of 2 cycles on top of that. Does that sound fairly reasonable? Is there additional downtime when an interrupt flags, or something else I am unaware of?

As far as calibration goes, I was thinking of having a 0.5-1 MHz oscillator circuit drive the trigger pins for a fixed number of cycles, then count the number of triggers (or something similar) to compute a fixed offset before using the timing circuit. Repeat multiple times and use the mean offset. The sample duration would probably be 100x longer (~0.2 ms) than the shortest expected event timing.
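
The calibration idea above boils down to averaging the difference between measured counts and the count expected for one period of a known reference. A minimal host-side sketch, assuming the counts have already been collected; the function names and the sample values are hypothetical and not from any Teensy library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: ticks expected for one period of a reference
// oscillator when counted by a timer running at counterHz.
double expectedTicksPerPeriod(double counterHz, double referenceHz) {
    assert(referenceHz > 0.0);
    return counterHz / referenceHz;
}

// Hypothetical helper: mean difference between the measured tick counts
// and the expected count. A stable, nonzero result is a fixed offset that
// can be subtracted from later measurements.
double meanOffsetTicks(const uint32_t* measured, std::size_t n, double expectedTicks) {
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(measured[i]) - expectedTicks;
    }
    return sum / static_cast<double>(n);
}
```

The spread of the individual differences around that mean matters as much as the mean itself, since only the repeatable part of the offset can be calibrated out.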

Well I think I am definitely going to buy a teensy 3.6 and try this out....
 
How much trigger jitter can I expect? Right now in my model I am accounting for partial-cycle error (i.e. the trigger falls between cycles) as well as an added pad of 2 cycles on top of that. Does that sound fairly reasonable?
No. Stating a number is tricky, but it can be quite a few clock cycles.

Even worse, something may disable interrupts. You can't really control this very well, since receiving data via USB or serial ports and all kinds of libraries will potentially disable interrupts. Even with very well behaved library code that will be hundreds of nanoseconds.
 
Interesting link. Thank you for sharing. This is what I was afraid of. 12 cycles would push the error intolerably high. But if it were always offset by a given amount, or offset by a repeatable amount given certain parameters (as https://forum.pjrc.com/threads/26548-Syncing-the-RTC-with-milliseconds leads me to believe may be possible), then I could live with that.

Is there a way to temporarily disable USB/serial connections via software?
 
The interrupt overhead should be consistent - if no USB or SERIAL communications are in use they won't be active.

Is this just a static time measure or does it require a dynamic response? If static you could add a second/third Teensy and compare the answers?

I read the linked ARM notes and was wondering if PJRC set things up for FPU 'lazy stacking':
It is software programmable whether the Cortex-M4 with FPU will stack seventeen floating point registers during interrupt entry. It is generally preferable to use the "lazy stacking" option, which defers this additional stacking until it becomes necessary, and in many cases avoids the need to stack these registers at all.

What is the duration of the event between trigger and stop? Is it easy or expensive to repeat?

Perhaps, before and after the trigger, sitting in a busy loop recording the cycle count could show where it breaks for the stop event? The stop event could read the cycle count and the last value recorded by the busy loop. Then you could detect discontinuities in the counts, which would indicate unwanted interrupts.
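
A minimal sketch of that discontinuity check, assuming the busy loop has already stored successive cycle-counter readings in an array. The name `countGaps` and the threshold are hypothetical; the threshold would be the normal loop period plus a small margin:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts the steps between consecutive cycle-counter readings that are
// larger than maxStep cycles. In a busy loop with a constant period, a
// larger step suggests the loop was interrupted at that point.
std::size_t countGaps(const uint32_t* counts, std::size_t n, uint32_t maxStep) {
    assert(counts != nullptr || n == 0);
    std::size_t gaps = 0;
    for (std::size_t i = 1; i < n; ++i) {
        // Unsigned subtraction stays correct across a 32-bit counter wrap.
        uint32_t step = counts[i] - counts[i - 1];
        if (step > maxStep) {
            ++gaps;
        }
    }
    return gaps;
}
```

A run with zero gaps between trigger and stop gives some confidence that nothing else ran during the measurement.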

At the hypervelocity under study - how much distance is 1/240000000 of a second?
 
It is effectively a static measure with post response readout to a screen and probably write to SD card.

I'll be honest: the ARM notes are interesting and I will read through them at length, but my first quick read just gave me an idea of roughly how much latency could be expected. I do not entirely understand what is meant by lazy versus non-lazy stacking and how it affects me as a user.

Velocities are in the range of 7-8 km/s (so one cycle at 240 MHz corresponds to 29-33 µm of travel), and the distances being measured are in the range of 12.7 mm to 76.2 mm, resulting in event durations (i.e. trigger to stop) ranging from 1.59 to 10.89 microseconds. If I think it's doable, and that is a big if, I would want to cut the minimum spacing in half; doing this makes the spatial error much more significant. I can consistently hold a 0.002" (0.050 mm) tolerance on pin-to-pin spacing; getting lower than that gets expensive and difficult.
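
The figures in this post follow from two divisions, worked through here as a small sketch. At 7-8 km/s one 240 MHz cycle corresponds to roughly 29-33 micrometres of travel, and the 12.7 mm to 76.2 mm spacings give transit times of about 1.59 to 10.89 microseconds. The function names are hypothetical:

```cpp
#include <cassert>

// Distance travelled during one timer tick, in micrometres.
// velocityMps is the front velocity in m/s, clockHz the counter clock in Hz.
double distancePerTickUm(double velocityMps, double clockHz) {
    assert(clockHz > 0.0);
    return velocityMps / clockHz * 1e6;
}

// Time needed to cover a pin spacing, in microseconds.
// spacingMm is the pin-to-pin distance in millimetres.
double transitTimeUs(double spacingMm, double velocityMps) {
    assert(velocityMps > 0.0);
    return (spacingMm * 1e-3) / velocityMps * 1e6;
}
```

For comparison, the 0.050 mm spacing tolerance is 50 micrometres, so at these clock rates a single tick and the mechanical tolerance contribute errors of similar size.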

I would drive the spacing apart, as this significantly reduces both the spatial and temporal error, but the event is driven by high explosives, which are fairly expensive and time consuming to deal with. I may end up doing this anyway...

I view the setup as having the following steps:

1.) Turn on device. Device initializes and waits to achieve some kind of steady state.
2.) Arm the device. Perform internal calibration against a programmable oscillator to check the timing offset near the desired velocity (i.e. from the above example, 7 km/s and 12.7 mm spacing becomes 551 kHz) or as a sweep over some predefined range. Next, arm the charging circuit. Timing may end up being through discharge of an RC circuit. You do not charge the RC circuit until ready, and even then only behind several layers of controls; this would be the last step prior to going into measure mode.
3.) Measure. Go into a standby loop waiting for the trigger (OR alternatively, once armed, start counting and treat the trigger as any other pin, then just difference from the trigger pin; this is more attractive in some ways in that it makes setup more flexible... but I was worried about the long-duration counts having more issues with consistency... and you have to deal with the cycle count resetting).
4.) Once triggered, loop, and if a stop pin goes high, record the cycle number for that stop pin. Continue until all stop pins have gone high once, or until some timeout.
5.) Compute differences and print to LCD.
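
On the concern in step 3 about the cycle count resetting: if the counter is a free-running 32-bit value (as the DWT cycle counter is), taking the difference with unsigned arithmetic gives the right answer across a single wrap, so the counter does not need to be zeroed before each measurement. A minimal sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so the result is correct even if
// the counter wrapped once between the two readings.
uint32_t elapsedCycles(uint32_t startCount, uint32_t stopCount) {
    return stopCount - startCount;
}

// Converts a cycle count to microseconds for a given CPU clock in Hz.
double cyclesToMicroseconds(uint32_t cycles, double cpuHz) {
    assert(cpuHz > 0.0);
    return static_cast<double>(cycles) * 1e6 / cpuHz;
}
```

At 180 MHz a 32-bit counter wraps roughly every 23.9 seconds, so this only breaks down if trigger and stop are further apart than that.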

I'm curious what the most elegant, consistent method of checking the pins during step 4 is. If I, say, check the state of all pins and write HAS_STOPPED along with the current cycle count on each iteration, that is very consistent in that it does exactly the same thing each loop... but I expect the latency to increase significantly. The question is whether the latency would remain more or less constant... Is there a way to streamline this process by handling all the pins of a bus at the same time?
 
You can handle pins by port; some info is here, though I'm sure there are actual examples Google isn't finding yet:
https://forum.pjrc.com/threads/26054-Teensy-3-1-I-O-Ports
Given the depth of RAM on a Teensy, would it work to just dump the port state to a 32-bit array during the sampling period, then post-process after the event to get the actual state change times? Or even use the logic but buffer everything anyway, to double the chances of capturing the event.
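
A minimal sketch of the post-event processing step, assuming the capture loop has already filled an array with one 32-bit port reading per iteration. The name `firstHighSample` is hypothetical; the time of a stop would then be the returned index multiplied by the (measured) loop period:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first sample in which the given port bit is
// high, or -1 if that bit never goes high in the captured window.
long firstHighSample(const uint32_t* samples, std::size_t count, unsigned bit) {
    assert(bit < 32);
    for (std::size_t i = 0; i < count; ++i) {
        if ((samples[i] >> bit) & 1u) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Because every pin of the port lands in the same word, all stop channels on that port share one sampling instant per loop iteration.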
 
Note that in the last beta release, if you write to EEPROM, it has to temporarily drop the clock frequency from 180 MHz while it writes to the EEPROM. So you probably need to account for this if you are counting clock cycles.

The Teensy 3.6 does have an internal real time clock that counts the following:
The time counter consists of a 32-bit seconds counter that increments once every second
and a 16-bit prescaler register that increments once every 32.768 kHz clock cycle.

There are compensation registers that you can use to adjust the counting. Look at pages 1,319 through 1,342 of the fine datasheet (for 3.6)
 
Can you explain the comment on the EEPROM and counting? I am not sure why I would need to write to EEPROM while counting cycles. I can see how it might be nice to have that redundancy if something failed but if it affects timing it would be better to just forgo writing to EEPROM or some other non-volatile memory.

I will look at the compensation stuff when I get a chance. I ordered the Teensy yesterday and a bunch of other stuff so hopefully I will get a chance to play with things this weekend.
 
The input pin interrupts are synchronized to the bus clock, so even if the processor is running at 180 MHz you will not get better than 16.7 ns resolution (1/60 MHz); the same goes for reading the port registers from a tight loop. This makes the FTM timer in capture mode, as suggested by tni in #3, your best bet.
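
To put numbers on that resolution, here is a minimal sketch that converts two captured timer values into elapsed time. It assumes a 16-bit free-running FTM counter clocked directly from F_BUS with no prescaler, and an interval shorter than one counter period; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Elapsed ticks between two captures of a 16-bit free-running counter.
// The cast back to 16 bits makes the result correct across one wrap.
uint16_t elapsedTicks16(uint16_t startCapture, uint16_t stopCapture) {
    return static_cast<uint16_t>(stopCapture - startCapture);
}

// Converts timer ticks to nanoseconds for a given timer clock in Hz.
double ticksToNanoseconds(uint32_t ticks, double busHz) {
    assert(busHz > 0.0);
    return static_cast<double>(ticks) * 1e9 / busHz;
}
```

At 60 MHz one tick is about 16.7 ns and a 16-bit counter spans roughly 1.09 ms, which comfortably covers a 1-11 microsecond event.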
 
@mlu: Interesting info!

FYI - Kinetis.h offers this to bump F_BUS to 120MHz for 240MHz:
#if (F_CPU == 240000000)
#define F_PLL 240000000
#ifndef F_BUS
#define F_BUS 60000000
//#define F_BUS 80000000 // uncomment these to try peripheral overclocking
//#define F_BUS 120000000 // all the usual overclocking caveats apply...
#endif

and at 180:
#elif (F_CPU == 180000000)
#define F_PLL 180000000
#ifndef F_BUS
#define F_BUS 60000000
//#define F_BUS 90000000
 
I was just warning you that if you happen to write to the EEPROM in the middle of the exercise, it has to change the clock speed down to 120 MHz, because EEPROM writing isn't compatible with speeds over 120 MHz. There may be other places that have to change the bus speed temporarily.

For more details, consult the fine datasheet. I know about the EEPROM stuff because in the early beta days I couldn't write to EEPROM unless I dropped the clock speed to 120 MHz or slower. Degragster dug in and fixed it initially, and then Paul picked it up (and possibly rewrote it) for use in beta2 of 1.31.
 
Indeed Paul re-wrote EEPROM speed shift more cleanly/safely - but other than startup reading the serial number - only direct calls to EEPROM write so far use the speed drop from HSRUN to 120 MHz on the fly.

Indeed this wouldn't be the time for EEPROM writes as they take some ms minimal to complete in any case.

@MM - now I see why I abbreviate your name so as not to mistype it :)
 
If I understand correctly, you are saying that it would be possible to achieve a 120 MHz bus speed for the interrupts? Would this possibly be more consistent? If I were able to achieve +/- 2 bus cycle accuracy it might still be tolerable, and I will probably try it both ways for the hell of it.

For the uninitiated what are the usual overclocking caveats?

GremlinWrangler, I like the idea of full capture. If I assume a 64 KB memory overhead for the sketch and other things, this leaves 192 KB for the initial capture, after which I can write to SD and clear the RAM. If I then assume I can pull pin flags using the FTM timer each cycle and write them directly to a preallocated array, and I want to give myself some extra cycles (I used the random number 35 to pad) for leeway, then...

For 6 pins storing the flag as ints (2 bytes each) and an int for cycle count (2 bytes each), I can in theory capture 13709 cycles, or ~76.16 µs (approx. 609 mm @ 8 km/s), at 180 MHz.
For 6 pins storing pin state as 32-bit floats (not sure why?) and an int for cycle count, I can in theory capture 7384 cycles, or ~40.83 µs (approx. 326.64 mm @ 8 km/s), at 180 MHz.
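
One caveat on the sizes above: on the Teensy's 32-bit ARM core an `int` is 4 bytes, not 2. An alternative layout, sketched here as an assumption rather than a recommendation, stores one 32-bit port reading (all pins of the port at once) plus one 32-bit cycle count per sample. The struct and function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One capture record: the whole port in one word, plus a timestamp.
struct CaptureSample {
    uint32_t portState;   // every pin of the port, one bit each
    uint32_t cycleCount;  // cycle counter value when the port was read
};

static_assert(sizeof(CaptureSample) == 8, "two 32-bit fields, no padding");

// Number of samples that fit once reservedBytes are set aside for the
// sketch and everything else.
std::size_t maxSamples(std::size_t totalRamBytes, std::size_t reservedBytes) {
    assert(reservedBytes <= totalRamBytes);
    return (totalRamBytes - reservedBytes) / sizeof(CaptureSample);
}
```

With 256 KB of RAM and 64 KB reserved this gives 24576 samples, independent of how many pins on the port are in use.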

So either would work given my presently envisioned test setup, but I prefer the added duration from using int flags (i.e. triggered is 1, not triggered is 0). I believe this uses some kind of digital logic to perform leading-edge detection as part of the FTM timer, right?

Is it possible to poll for voltage using the FTM setup also? If so, using the float array could also be attractive, as it can provide additional debugging information. But given that I intend to have my own discriminating front end, doing double duty of detecting events and protecting the 3 V logic of the Teensy, I end up with digital pulses anyway, which may make this moot.

Also, if I wanted to increase the amount of available RAM I could screw with USB and serial configurations (I don't see a need at the moment) -- see: https://forum.pjrc.com/threads/29596-Teensy-3-1-taking-too-much-RAM-compare-to-Mega-2560

For others who might stumble on this in the future:
This appears to be a good writeup on using the FTM: http://shawnhymel.com/681/learning-the-teensy-lc-input-capture/

This is another good example: http://www.digitalmisery.com/2013/06/timer-input-capture-on-teensy-3-0/
 
Yes - F_BUS can be pushed to 120 MHz at 240 MHz and to 90 MHz at 180 MHz by swapping the commented lines. I've edited that and run for some time at 240 MHz with F_BUS at 120 MHz, and it worked for me. The main caveat is 'it might not work for you': it is above the design spec for the part, so it may not be stable or reliable, and may not be good for the part long term. Some reports of not working over 180 MHz were noted in the beta, but those were rare and may have been other beta/pre-production issues. I haven't seen any notes about premature failure, but that could be something one might expect long term.

Note - these 32-bit processors are, according to at least one note I saw, more efficient dealing with 32-bit values. I wrote up a pseudo sketch using a circular buffer that overwrites while waiting for the trigger, then goes on to wait for the stop. With the stop coming only 2-12 µs later, it won't get to use a lot of buffer before stopping. I'll let you know if I get to code it up and see it work.

When set as digital for interrupt detection, it seems the values would only be 0 or 1?
 
0 or 1 being more or less the definition of binary digital systems, yeah, it would be. Of course, having said that, I do recognize that even the digital signal is in truth analog with a quick and consistent rise time (hopefully), and that the front end of the Teensy is threshold detecting to discriminate (or maybe I should say, at least I think that is what is going on).

I do not know enough about the framework of the board or the subtleties involved to say I understand exactly the difference between treating an input as digital versus analog especially as it pertains to things on this timescale.

Your comment earlier about 32bit systems perhaps being more suited to 32bit operations could mean that defining 32bit analog pins could have some efficiency advantages somewhere... also I do not know how the hardware or firmware/software latency compares between the digital discriminator and analog ADC, perhaps the ADC is used in both cases and then the threshold is checked after this point so that perhaps the additional step of discrimination might be eliminated if using a purely analog input.

Inquiring minds want to know...
 
I'm not sure of the difference in setup between the Arduino Mega and the Teensy 3.6, but this would lead me to believe that the digital read is likely faster, possibly because it has circuitry more or less like a flip-flop doing the digital discriminating, which is what my low-latency discriminator circuit would be doing in front of it.

http://forum.arduino.cc/index.php?topic=6549.0

And here Speed-of-digitalRead-and-digitalWrite-with-Teensy3-0

Paul discusses digitalWriteFast() and digitalReadFast() giving an example of write speed, he does not explicitly tackle the issue of read speeds but I imagine they are in some ways similar.
 
Digital read will be much faster. See the ADC library for upper limits on what you can do with the ADC: https://forum.pjrc.com/threads/25532-ADC-library-update-now-with-support-for-Teensy-3-1 Maximum sample speed can only be achieved on two pins at a time (I think a MHz or so, which I think is 7 mm of travel for this rig). So if you can condition your inputs for digital values, you get a much better sample rate. Then again, the ADC may give you more options to extract what happened after the event if a signal glitches or something (see the Mars lander crashing due to legs unfolding being mistaken for touchdown).

I will note this sort of application is what FPGAs are for, and even if you tweak the Teensy to the edges you will only be approaching what an FPGA can do. Then again, if you don't have access to an FPGA expert and time is tight, you will get more done faster with a Teensy, even if it's not the perfect hammer for the job.
 
Yes an FPGA would be a better solution but is well outside my expertise, we do not have someone available with that knowledge. I intend to demonstrate feasibility using the teensy, then if people are happy I would use a purpose built event timing chip like the TI or ACAM

TI THS-788 - http://www.ti.com/product/THS788
ACAM TDC-GPX - http://www.acam.de/products/time-to-digital-converters/tdc-gpx/

For my purposes the GPX is a better fit, as it provides up to 8 continuous channels with 81 ps resolution, 200 MHz per channel, and unlimited duration. The TI costs more and only provides four channels; it has better resolution at 13 ps, again at 200 MHz per channel, but can only time 0-7 s. These are both aimed more towards the nuclear community and provide multi-hit capability per channel.

Although, if I were for instance able to achieve desirable performance through a teensy, even if its only 8 channels... then I would still be able to achieve 32 channels for cost of one GPX (which would still require something to communicate with, along with the additional setup of the chip and its asynchronous communication) and it would in many ways be more flexible.

For anyone else who happens to be interested this is the kind of discriminator my front end will use:
http://grapes-3.tifr.res.in/publications/journal/pramana-disc.pdf
 
Got the board and some other things last night. Put the headers on but didn't get a chance to mess with anything.

HOLY CRAP THESE THINGS ARE SMALL! You don't really appreciate it until you hold it in your hands....

Hopefully I should have an update soon.
 
Okay so I got to play around with things a little this weekend.

Code:
#define CPU_RESET_CYCLECOUNTER    do { ARM_DEMCR |= ARM_DEMCR_TRCENA;          \
                                       ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA; \
                                       ARM_DWT_CYCCNT = 0; } while(0)
//Number of cycles to have passed
int cycles;

//pin numbers to set input/output and read values
//PORT D
const int PIN_D00 = 2;
const int PIN_D01 = 14;
const int PIN_D02 = 7;
const int PIN_D03 = 8;
const int PIN_D04 = 6;
const int PIN_D05 = 20;
const int PIN_D06 = 21;
const int PIN_D07 = 5;
const int PINS_D[8] = {PIN_D00,PIN_D01,PIN_D02,PIN_D03,PIN_D04,PIN_D05,PIN_D06,PIN_D07};

//initialize individual pin read values
//PORT D
byte BYTE_D;
int BITS_D[8] = {0,0,0,0,0,0,0,0};

//number of iterations in for loop conparisons
const int numItr = 1000000;

//Turn all interupts off
void interuptsOFF(){
  noInterrupts();//turn off interupts  
  cli();//turn off interupts
  __disable_irq();    
}

//Turn all interupts back on
void interuptsON(){
  interrupts();//turn interupts back on
  sei();//turn interupts back on
  __enable_irq();  
}

//print byte value to serial port for monitoring
void serialPrintByte(byte myByte,int byteLength=8){
  for( int i=byteLength-1; i>=0; i--){
    Serial.print(bitRead(myByte,i));   
  }
  Serial.println("");
}

//print bits value to serial port for monitoring
void serialPrintBits(int myBits[],int bitsLength=8){
  for( int i=bitsLength-1; i>=0; i--){
    Serial.print(myBits[i]);   
  }
  Serial.println("");
}

//Initialization for each port to be digital input
void initializePort_input(const int PINS[]){
  for(int i=0; i<=int(sizeof(PINS)); i++){
    pinMode(PINS[i],INPUT);
  }
}

//Read ports as byte
//PORT D
int read_BYTE_D(){
  CPU_RESET_CYCLECOUNTER;
  BYTE_D = GPIOD_PDIR;
  cycles = ARM_DWT_CYCCNT;
  return cycles;
}

//Read ports as bits
//PORT D
int read_BITS_D(){
  CPU_RESET_CYCLECOUNTER;
  BITS_D[0] = digitalReadFast(PINS_D[0]);
  BITS_D[1] = digitalReadFast(PINS_D[1]);
  BITS_D[2] = digitalReadFast(PINS_D[2]);
  BITS_D[3] = digitalReadFast(PINS_D[3]);
  BITS_D[4] = digitalReadFast(PINS_D[4]);
  BITS_D[5] = digitalReadFast(PINS_D[5]);
  BITS_D[6] = digitalReadFast(PINS_D[6]);
  BITS_D[7] = digitalReadFast(PINS_D[7]);
  cycles = ARM_DWT_CYCCNT;
  return cycles;
}

void setup() {
  // put your setup code here, to run once:
  //initialize each port to read as input
  //PORT D
  initializePort_input(PINS_D);
  //allow serial com to terminate
  while (!Serial);
  delay(100);
}

void loop() {
  //Method 1
  //turn off interupts
  interuptsOFF();
  //read port D and count cycles
  cycles = read_BYTE_D();
  //allow interupts back on
  interuptsON();
  //print data to screen
  Serial.println("---------------");
  Serial.println("METHOD 1: PORT D BYTE READ, FUNCTION CALL");
  Serial.print("cycles: ");
  Serial.println(cycles);
  serialPrintByte(BYTE_D);
  Serial.println("---------------");
  //stall before next read
  delay(1000);

  //Method 2
  //turn off interupts
  interuptsOFF();
  //read port D and count cycles  
  CPU_RESET_CYCLECOUNTER;
  BYTE_D = GPIOD_PDIR;
  cycles = ARM_DWT_CYCCNT;
  //allow interupts back on
  interuptsON();
  //print data to screen
  Serial.println("---------------");
  Serial.println("METHOD 2: PORT D BYTE READ, DIRECT");
  Serial.print("cycles: ");
  Serial.println(cycles);
  serialPrintByte(BYTE_D);
  Serial.println("---------------");
  //stall before next read
  delay(1000);  

  //Method 3
  //turn off interupts
  interuptsOFF();
  //read port D and count cycles
  cycles = read_BITS_D();
  //allow interupts back on
  interuptsON();
  //print data to screen
  Serial.println("---------------");
  Serial.println("METHOD 3: PORT D BIT READ, FUNCTION CALL");
  Serial.print("cycles: ");
  Serial.println(cycles);
  serialPrintBits(BITS_D);
  Serial.println("---------------");
  //stall before next read
  delay(1000);  

  //Method 4
  //turn off interupts
  interuptsOFF();
  //read port D and count cycles
  CPU_RESET_CYCLECOUNTER;
  BITS_D[0] = digitalReadFast(PINS_D[0]);
  BITS_D[1] = digitalReadFast(PINS_D[1]);
  BITS_D[2] = digitalReadFast(PINS_D[2]);
  BITS_D[3] = digitalReadFast(PINS_D[3]);
  BITS_D[4] = digitalReadFast(PINS_D[4]);
  BITS_D[5] = digitalReadFast(PINS_D[5]);
  BITS_D[6] = digitalReadFast(PINS_D[6]);
  BITS_D[7] = digitalReadFast(PINS_D[7]);
  cycles = ARM_DWT_CYCCNT;
  //allow interupts back on
  interuptsON();
  //print data to screen
  Serial.println("---------------");
  Serial.println("METHOD 4: PORT D BIT READ, DIRECT");
  Serial.print("cycles: ");
  Serial.println(cycles);
  serialPrintBits(BITS_D);
  Serial.println("---------------");
  //stall before next read
  delay(1000);  
  
}

I used the above code to check the number of cycles required to do a port read and compared that to the number of cycles needed to perform a digitalReadFast() on the same pins. I also compared wrapping these reads in a function, just to see if that appears to affect things. Earlier I also started comparing for loops to manually repeated queries, but I made changes as I went and lost exactly what that code looked like.

Results:

METHOD 1: PORT D BYTE READ, FUNCTION CALL - 10 cycles
METHOD 2: PORT D BYTE READ, DIRECT - 8 cycles
METHOD 3: PORT D BIT READ, FUNCTION CALL - 56 cycles
METHOD 4: PORT D BIT READ, DIRECT - 49 cycles

Earlier when I was doing it slightly differently I was able to get the direct port (Byte) read down to 6 cycles. So now I am wondering what I can do to further reduce the number of cycles? Currently I am disabling interrupts and to be honest I am not sure the difference between cli(), noInterrupts(), and __disable_irq(), insight would be appreciated. Also if anyone can point out other non-essential things to turn off in order to reduce cycles between reads it would be appreciated. Does it make sense to also/alternatively use atomic blocks? (http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html)

I have not yet looked at how much overhead there is in reading these into an array versus into int and byte variables as I am doing now, but I am planning to try that if I can get the number of cycles here down further.

Is there a good library merging I2C and LCD, as in the Adafruit RGB LCD library (https://learn.adafruit.com/rgb-lcd-shield/using-the-rgb-lcd-shield)? I have this shield and would like to try it.

One other thing, I was checking that I was actually reading the pins as I expected using a dip switch to toggle the pin low or high by either grounding it out or connecting it to 3.3v.

I noticed that several of the higher bits (for example: const int PIN_D05 = 20; const int PIN_D06 = 21; const int PIN_D07 = 5; in the above code) would not change regardless of what I did. I was not using pullup/pulldown resistors. Is it likely this is the cause, or is there something else I need to do to be able to use all the pins of each port? I think I am just going to try INPUT_PULLUP as described here -- http://www.pjrc.com/teensy/td_digital.html and see if that doesn't sort it out.

I tested code with the same format as above for each of the ports and had similar problems on the others as well.
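
One possible cause of the stuck bits, separate from pull-ups: in the sketch above, `sizeof(PINS)` inside `initializePort_input` is applied to a function parameter, which is a pointer rather than an array. On a 32-bit target that evaluates to 4, so the loop `i<=4` calls `pinMode()` only for PINS[0] through PINS[4], and pins 20, 21 and 5 (indices 5 to 7) are never configured as inputs. A minimal sketch of a size-safe alternative, where the helper name `forEachPin` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

const int PINS_D[8] = {2, 14, 7, 8, 6, 20, 21, 5};

// Takes the array by reference so the element count N is deduced at
// compile time, then applies `configure` to every pin in the array.
template <std::size_t N, typename Fn>
void forEachPin(const int (&pins)[N], Fn configure) {
    for (std::size_t i = 0; i < N; ++i) {
        assert(pins[i] >= 0);  // pin numbers are non-negative
        configure(pins[i]);
    }
}
```

On the Teensy this could be called as `forEachPin(PINS_D, [](int pin) { pinMode(pin, INPUT_PULLUP); });`, which would reach all eight pins.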

The port setup for the other ports

Code:
//PORT B
const int PIN_B00 = 16;
const int PIN_B01 = 17;
const int PIN_B02 = 19;
const int PIN_B03 = 18;
const int PIN_B16 = 0;
const int PIN_B17 = 1;
const int PIN_B18 = 32;
const int PIN_B19 = 25;
const int PINS_B[8] = {PIN_B00,PIN_B01,PIN_B02,PIN_B03,PIN_B16,PIN_B17,PIN_B18,PIN_B19};

//PORT C
const int PIN_C00 = 15;
const int PIN_C01 = 22;
const int PIN_C02 = 23;
const int PIN_C03 = 9;
const int PIN_C04 = 10;
const int PIN_C05 = 13;
const int PIN_C06 = 11;
const int PIN_C07 = 12;
const int PIN_C08 = 28;
const int PIN_C09 = 27;
const int PIN_C10 = 29;
const int PIN_C11 = 30;
//const int PINS_C[12] = {PIN_C00,PIN_C01,PIN_C02,PIN_C03,PIN_C04,PIN_C05,PIN_C06,PIN_C07,PIN_C08,PIN_C09,PIN_C10,PIN_C11};
const int PINS_C[12] = {PIN_C00,PIN_C01,PIN_C02,PIN_C03,PIN_C04,PIN_C05,PIN_C06,PIN_C07};

Which leads me to my next question. If the register is only 8bits long how are the extra pins of port C handled? Also how are the non-sequential and or missing bits handled? For instance for port B

Pin Bit
16 PTB00
17 PTB01
19 PTB02
18 PTB03
49 PTB04
50 PTB05

31 PTB10
32 PTB11
0 PTB16
1 PTB17
29 PTB18
30 PTB19
43 PTB20
46 PTB21
44 PTB22
45 PTB23

(Bold indicated a pin on the back side as a pad -- from https://forum.pjrc.com/threads/34808-K66-Beta-Test?p=112462&viewfull=1#post112462)

Of which only 6 of the first 8 bits are present, then 2 of the next 8, then 8 of the last 8. So if I do

BYTE_B= GPIOB_PDIR;

exactly which pins/bits am I getting? And is it possible to read another (different) 8 in the same manner? I think the answer is no, and this is why people recommend using PORT D/C, because they have sets of sequential bits. But D actually has two sets; can I read the second set the same way somehow?

And then there is this:

I have one quick question and one possibly more difficult one. This is my first time using Port Manipulation and normally the pins are set high or low using 8-bits (B11110000) but Teensy 3.1 uses 32 bit right? does that mean you should use something like B00000000111100001111000011110000 ?

from -- https://forum.pjrc.com/threads/1753...PORT-DDR-D-B-registers-vs-ARM-GPIO_PDIR-_PDOR

which leads me to believe I should be able to read 32 bits simultaneously, or is this only true of the 3.1?

Thanks
 
Your cycle measurements are worthless. You are comparing apples with cucumbers. An ideal case port read runs in 1 to 3 clock cycles (depending on instruction pairing, flash waits). All the other overhead depends on what exactly you want to do. I.e. if the port register address is not already loaded that takes additional time. Your Method 1 and 2 are the same, if the function call is inlined. Otherwise, there is an optimization barrier for the function call and the compiler must reload registers. You are not measuring the function call, you are measuring the missed optimizations.

The compiler can reorder some instructions across your cycle timing code. Unless you know exactly what you are doing and look at the disassembled code, this sort of micro-benchmark won't give useful results.

If you want to use this for timing, there is no way you will do better than the FTM timer input capture.

So if I do

BYTE_B= GPIOB_PDIR;

exactly which pins/bits am I getting?
The port register is 32 bits. BYTE_B is an 8-bit variable, so you are storing the lowest 8 bits. So if you use a 32-bit variable (e.g. uint32_t), you will get all of them.
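
Once the whole register is in a 32-bit variable, individual pins or groups of adjacent port bits can be pulled out with a shift and a mask. A minimal sketch with hypothetical helper names; the value passed in would be whatever was read from a register such as GPIOB_PDIR:

```cpp
#include <cassert>
#include <cstdint>

// State of a single port bit (0 to 31) in a 32-bit port reading.
bool portBit(uint32_t portValue, unsigned bit) {
    assert(bit < 32);
    return ((portValue >> bit) & 1u) != 0;
}

// A run of `width` adjacent port bits starting at `lowBit`, shifted down
// so the lowest extracted bit ends up in bit 0 of the result.
uint32_t portField(uint32_t portValue, unsigned lowBit, unsigned width) {
    assert(width >= 1 && lowBit + width <= 32);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (portValue >> lowBit) & mask;
}
```

For example, `portField(value, 16, 4)` would return port bits 16 to 19 as a 4-bit number, which are the bits that an 8-bit variable silently drops.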

Currently I am disabling interrupts and to be honest I am not sure the difference between cli(), noInterrupts(), and __disable_irq(), insight would be appreciated.
It's all exactly the same thing:
Code:
#define cli() __disable_irq()
#define noInterrupts() __disable_irq()
 
@Dizzixx Seems like you are doing something similar to what I am doing, but perhaps a bit faster. I plan on using the FTM to count clock cycles that are controlled by START and STOP events. My plan is to use some logic to gate a clock based on those events and count clocks with the FTM counter. When the STOP event occurs, the program flow would exit its WAIT state, read the FTM value, and proceed. Hopefully I will count events with no variable time/clock states involved. Surely my events are significantly slower than what you are measuring, but with a logic family upgrade I think the time events may be in your ballpark. That is really for you to decide. See the attached file for a schematic. I'm an LTSpice user, but don't know if the rest of the participants here would appreciate an *.ASC file.
 

Attachments

  • Teensy_timer.pdf