Why no substantial speed increase when switching from 24MHz to 72MHz

Hi all,
I am running 9 functions inside an Interrupt Service Routine (ISR) which is triggered 140 times/second (i.e. 140Hz) on a Teensy 3.1. These functions have to do with calculating variables and parameters and don't have anything to do with peripherals.

I am turning pin 0 on and off in order to measure the relative time required to run these functions by observing it on an oscilloscope.

Code:
void timerCallback0() {
  //cli();
  digitalWrite(0, HIGH); //to measure the execution time of these routines with an oscilloscope
  Function1();
  Function2();
  Function3();
  Function4(); 
  Function5();
  Function6();  
  Function7();
  Function8();
  Function9();   
  digitalWrite(0, LOW);
  //sei();
}

My aim is to increase the frequency of the ISR from 140Hz to as much as possible so these functions are called as frequently as possible.
When I set the Teensy to run at 24MHz, I see a duty cycle of 78% on the scope, and when I increase it to 72MHz, the duty cycle is 74%. That is very little improvement, which surprises me; I expected something like 78%/3 = 26%. This is the main reason I am switching from an Arduino UNO running at 16MHz to the Teensy for this application. I was able to run this at 70Hz on the UNO, so for an MCU that is almost 10x faster, a mere 2x improvement is disappointing.

My question is why so little improvement?
I have never run the Teensy overclocked, so a side question: if I overclock to 96MHz or 120MHz, should there be any concerns (aside from the possible increase in heat)? I have heard somewhere that overclocking may affect interrupts. Has anyone overclocked to 96MHz or 120MHz in a serious application?

Any comment is really appreciated.

Dave
 
Function calls in an ISR might be part of the problem, depending on the compiler and the functions. Jumps have to be calculated, the processor registers might have to be freed up, etc. That means there is a chance that your Teensy is more occupied with administrative tasks than with doing efficient calculations.
If I were you, I'd put the function code directly into the ISR and see what happens.
 
The code and constants are stored in flash, and depending on the type of operation you are doing, you may be limited by access to the flash.
The access to the flash for F_CPU 24 to 72 MHz occurs at the same speed (24 MHz) [see mk20dx256.c lines 960 to 1010]; it increases only slightly, to 28.8 MHz, for F_CPU=144 MHz.

Concerning overclocking: all my T3.2 run at 144 MHz and I cannot remember any issues with speed.

When running faster than 72 MHz, there are speed improvements from putting code and constants into RAM.
AFAIK: below 72 MHz access to flash is optimized and can hardly be improved.
 
Thanks, makes sense to me. Regarding overclocking, the term "overclocking" just makes me nervous. I will try to run at 120MHz. How do I "put code and constants into RAM"? I still have lots of RAM available and would like to try it and see the difference.

Update: I have just tried running at 120MHz and the result is exactly the same as 72MHz, the duty cycle is 74%.
 
You should get a speedup of very close to 3x going from 24MHz to 72MHz (4x going to 96MHz). Either some other higher priority code is running (your interrupt can be interrupted) or your functions spend a lot of time waiting on something.

You should test your functions separately. You can easily count/measure clock cycles on Teensy 3.1.
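
A minimal sketch of per-function cycle counting, for illustration only. The helper name `measureCycles` is hypothetical and not from this thread. It takes the counter source as a parameter so the same code can be tried on a PC. On a Teensy 3.x the counter would presumably be the Cortex-M4 DWT cycle counter, which the Teensy core exposes as `ARM_DWT_CYCCNT` (shown in the trailing comment as an assumption about the setup).

```cpp
#include <cstdint>

// Returns the number of counter ticks that elapsed while fn() ran.
// Unsigned 32-bit subtraction keeps the result correct even if the
// counter wraps around once between the two reads.
template <typename ReadCounter, typename Fn>
uint32_t measureCycles(ReadCounter readCounter, Fn fn) {
  const uint32_t start = readCounter();
  fn();
  const uint32_t end = readCounter();
  return end - start;  // difference modulo 2^32
}

// Assumed usage on a Teensy 3.x (not taken from the thread):
//   ARM_DEMCR |= ARM_DEMCR_TRCENA;            // enable the debug/trace block
//   ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;   // start the cycle counter
//   uint32_t cycles = measureCycles([] { return ARM_DWT_CYCCNT; }, Function1);
```

Measuring each of the nine functions this way, one at a time, would show which one dominates the ISR time.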

How to "put code and constants into RAM"? I still have lots of RAM available. Would like to try and see the difference.
Don't bother running at 96MHz or less. The difference will be minimal.
 
Data: remove the word 'const' from the declaration.
Code: add FASTRUN before the function declaration, e.g.:
Code:
FASTRUN void rfft_256(short *xx, short *yy) {......}
If that generates compiler errors, add the following at the top of the file or in a header file:
Code:
#ifndef FASTRUN
	#define FASTRUN __attribute__ ((section(".fastrun")))
#endif
 
Function calls in an isr might be part of the problem, depending on the compiler and the functions. Jumps have to be calculated, the processor registers might have to be freed up, etc. That means that there are chances that your Teensy is more occupied with administrative tasks than with doing efficient calculations.
If I were you, I'd put the function code directly into the ISR and see what happens.
Thanks. I will keep that as the last resort, since there are a total of like more than a thousand lines of code to put in ISR right now.
To keep things in perspective, assuming a worst case function call overhead of 20 clock cycles, you are looking at (10 functions * 20 clock cycles * 140Hz) / 72'000'000 = roughly 1 / 2500 of the available CPU time.

Inlined functions are as good as putting the code directly into the ISR.
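
As a quick check of that estimate, here is a small sketch of the arithmetic. The function name `callOverheadFraction` is made up for illustration, and the 20-cycle figure is the worst-case assumption stated above, not a measured value.

```cpp
// Fraction of CPU time consumed by function-call overhead alone:
// (calls per ISR * cycles per call * ISR rate) / CPU clock rate.
constexpr double callOverheadFraction(double cpuHz, double isrHz,
                                      int callsPerIsr, int cyclesPerCall) {
  return (callsPerIsr * cyclesPerCall * isrHz) / cpuHz;
}

// With the numbers from the post: 10 * 20 * 140 = 28,000 cycles per second
// out of 72,000,000, i.e. a fraction of about 0.0004 (around 0.04%).
constexpr double kOverheadAt72MHz =
    callOverheadFraction(72000000.0, 140.0, 10, 20);
```

Even with generous assumptions, call overhead is several orders of magnitude too small to explain a 74% duty cycle.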
 
You should get a speedup of very close to 3x going from 24MHz to 72MHz (4x going to 96MHz). Either some other higher priority code is running (your interrupt can be interrupted) or your functions spend a lot of time waiting on something.

Don't bother running at 96MHz or less. The difference will be minimal.
Very encouraging. I will try along this direction and update.

Either some other higher priority code is running (your interrupt can be interrupted)
I tried uncommenting the cli() & sei() and it's the same. Also, the duty cycle stays constant at 74%, which suggests there is no nested interrupt?

Don't bother running at 96MHz or less. The difference will be minimal.
I take it that you advise running above 96MHz in any case, like 120MHz or 144MHz?
 
Maybe one of these is doing something that takes a fixed amount of time?

Code:
  Function1();
  Function2();
  Function3();
  Function4(); 
  Function5();
  Function6();  
  Function7();
  Function8();
  Function9();

Maybe one of these has something like delay()? Or is sending lots of serial data, much more than fits in the transmit buffer? Or any number of other things which depend on actual time or the timing of external events?

Are you measuring with that same cheap oscilloscope which caused so much trouble a few days ago?
 
You are right! About the oscilloscope: yes, it's still the same one. I am still saving up for a professional scope, but in this case I strongly believe that the scope is not the culprit.

Following tni's advice, I was able to measure the clock cycles of every function, and one stands out with an execution time more than 160 times that of all the others combined. It turns out that I am reading an analog accelerometer (MMA7361L), and to reduce noise I had to take 10 readings with a delay of 500us each:

Code:
  Acc_RAW=0;
  for (i=0; i<10; i++)
  {
    Acc_RAW += analogRead(ACC_PIN);
    delayMicroseconds(500);
  }

So cause found thanks to all of you. Will look to improve this or look for other accelerometer alternatives.
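
For reference, a rough sketch of why this loop would pin the duty cycle no matter what the CPU clock is. The function names are hypothetical, and the calculation counts only the `delayMicroseconds()` calls; the time taken by `analogRead()` and by the other functions is ignored.

```cpp
// Duty cycle contributed by the fixed delays alone:
// total delay per ISR call (in seconds) multiplied by the ISR rate.
constexpr double fixedDelayDutyCycle(int reads, double delayUs, double isrHz) {
  return (reads * delayUs * 1e-6) * isrHz;
}

// Upper bound on the ISR rate while the delays stay in place:
// the ISR cannot be called more often than once per total delay.
constexpr double maxIsrRateHz(int reads, double delayUs) {
  return 1.0 / (reads * delayUs * 1e-6);
}
```

With 10 reads of 500us at 140Hz, the delays account for 5ms out of every 7.14ms period, about 70%, which is close to the 74% seen on the scope at 72MHz and 120MHz. It also means the ISR rate could not go past roughly 200Hz while the delays stay in, whatever the clock speed.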
 
Next time you post this sort of question, include the complete code (i.e., the "Forum Rule"). It really saves everyone a lot of time. We can help you much better when we don't have to guess what might be in your 9 functions!
 
To add to Paul's comment: especially if you say in the OP
"These functions have to do with calculating variables and parameters and don't have anything to do with peripherals"
(emphasis by me)
 
Not mentioning that the oscilloscope used is known to have serious issues (only 500 kSample/sec acquisition) is also less than fully honest and forthcoming. We really do try to help here. The least you can do is be honest with us and post the full code.
 