Are interrupts slow?

Status
Not open for further replies.

samy

Member
Hi all!

I'm using Teensy 3.1 running at 72MHz to attempt to man-in-the-middle SPI communication between another two chips. Specifically, I have the other MCU's (master) CLK/EN lines hooked up to the Teensy and slave, and have pin 9 on the Teensy hooked up as the MOSI to the slave. I'm trying to have the Teensy send data to the slave, but have the other MCU control the timing. This is a bare bones test as the software will get more advanced shortly. I was thinking of using hardware SPI, however my future intentions are to change bits on the fly from the original MOSI, so I may need to avoid it.

The issue I'm experiencing is that the MCU's SPI clock has a period of 1.04us, and it seems that either interrupts aren't fast enough or I'm doing something else wrong, because I can't get the Teensy to keep up in the most bare-bones configuration. Is this simply the number of clock cycles some of the code is taking? I tested digitalWriteFast() and found that it takes 20ns on my Teensy, which is great, but a small function that performs a digitalWriteFast and a few more things seems to take 1540ns.

Here is an image of a sample of the pins through a logic analyzer and the associated code (a lot of the code is repeated simply to get rid of functions and try to execute things as quickly as possible)

In this image, CLK is the MCU SPI clock (attached to pin 14 of Teensy), RSSI is the DATA_OUT pin of Teensy (pin 9), and EN is the SPI EN pin from the MCU (pin 10 on Teensy). It appears that it takes 200ns from when CLK goes low to hit the interrupt, call clockChange(), and change the bit. Should it be taking 1540
MvHluv1trYwXRmfkMPGx1UAYTNgsBS687HVOZPDkJNg.png

Code:
#define DATA_OUT 9
#define CLK 14
#define EN 10

volatile int outBits;
volatile byte clkstate;
volatile byte enstate;
volatile byte nextBit;

void setup()
{
  pinMode(CLK, INPUT);
  pinMode(EN, INPUT);
  pinMode(DATA_OUT, OUTPUT);

  attachInterrupt(digitalPinToInterrupt(CLK), clockChange, CHANGE);
  attachInterrupt(digitalPinToInterrupt(EN), spiChange, LOW);

  enstate = digitalReadFast(EN);
  clkstate = digitalReadFast(CLK);
}

void loop()
{
}

// EN went low, send the first bit
void spiChange()
{
  enstate = !enstate;

  // if EN goes LOW (data coming in)
  if (enstate == LOW)
  {
    digitalWriteFast(DATA_OUT, nextBit);
    nextBit = outBits & 1;
    outBits >>= 1;
  }
  else
  {
    outBits = 0x5555;
    nextBit = outBits & 1;
    outBits >>= 1; 
  }
}


void clockChange()
{
  // if EN and clock goes low, load in new bit
  if (clkstate == !LOW)
  {
    digitalWriteFast(DATA_OUT, nextBit);
    nextBit = outBits & 1;
    outBits >>= 1;
  }

  clkstate = !clkstate;
}
 
Maybe you could explain why you prefer NOT to use SPI read/write?
 
WMXZ, the ultimate goal is to read in SPI data (MOSI) from the other MCU (the actual master) and adjust bits on the fly depending on what's coming in, and shoot those out. I'd prefer not to buffer and I'd like to make decisions on the fly, so if I see the first MOSI bit as '1', I may want to adjust it to '0', but if the next is '0', I may want to change it to '1'...

If I can do this with the SPI functionality, that would be great, but I'm guessing the internal SPI logic will acquire all 16 bits before handing them to me, at which point it's too late for me to act as the MOSI and send data back out because EN will already have gone HIGH. I need to transmit bits while EN is low and the clock is running.
 
I'd have a look at the Soft SPI implementation that Bill Greiman or Paul have been working on. Then pass the bits along as needed while manipulating the two parts independently. Or wait for Teensy 3++ since that may include two hardware SPI implementations.
 
It's not really the digitalWriteFast that takes time, it's the interrupt overhead. It is not possible to run an interrupt handler that fires at 2MHz.
Since it seems you only send a bit after a low->high transition, you could change the clock interrupt to trigger only on low->high.

You might need to run the transmission from the EN interrupt, make sure the priority is high enough that it is not interrupted, and then read the clock signal in a polling loop. Of course this can mess up some timing interrupts, but if the total transaction time is less than some 100us then it's probably no problem.
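
For a rough sense of the budget behind that 2MHz figure, here is a small host-side calculation. It is only a sketch: it assumes the 1.04us quoted earlier is the full SPI clock period and that the handler fires on both edges (CHANGE). The names are illustrative and not part of any Teensy API.

```cpp
// Rough cycle budget per interrupt. Assumptions (not measured):
//  - 1.04 us is the full SPI clock period
//  - the handler fires on both clock edges (CHANGE)
constexpr double kCpuHz = 72e6;            // Teensy 3.1 at 72 MHz
constexpr double kClkPeriodSec = 1.04e-6;  // assumed full clock period

// CPU cycles available during one full SPI clock period.
constexpr double cyclesPerPeriod() { return kCpuHz * kClkPeriodSec; }

// CPU cycles available between two consecutive clock edges.
constexpr double cyclesPerEdge() { return cyclesPerPeriod() / 2.0; }
```

Under those assumptions that is about 75 cycles per clock period, or about 37 per edge. As far as I know the Cortex-M4 already spends roughly a dozen cycles on exception entry alone, before the shared attachInterrupt dispatcher and your own handler run, so very little is left for real work.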
 
mlu, thanks. This works better but not quite fast enough.
Screenshot 2015-09-22 13.46.01.jpg


Code:
#define DATA_OUT 9
#define CLK 14
#define EN 10

volatile int outBits;
volatile byte nextBit;
volatile byte sent = 1;

void setup()
{
  outBits = 0x5555;
  nextBit = outBits & 1;
  sent = 1;

  pinMode(CLK, INPUT);
  pinMode(EN, INPUT);
  pinMode(DATA_OUT, OUTPUT);

  attachInterrupt(digitalPinToInterrupt(EN), spiChange, LOW);

}

void loop()
{
}

byte clk;

// EN went low, send the first bit
void spiChange()
{
  digitalWriteFast(DATA_OUT, nextBit);

  while (1)
  {
    clk = digitalReadFast(CLK);
    if (sent == 0 && clk == LOW)
    {
      digitalWriteFast(DATA_OUT, nextBit);
      sent = 1;
    }
    else if (sent == 1 && clk == HIGH)
    {
      // load new bits
      sent = 0;
      nextBit = outBits & 1;
      outBits >>= 1; 
    }

    // check for EN going high when done
    else if (!outBits && digitalReadFast(EN) == HIGH)
    {
      outBits = 0x5555;
      nextBit = outBits & 1;
      outBits >>= 1;
      break;
    }
  }
}

I also tried removing interrupts altogether and checking via digitalReadFast(), but things are still too slow. Would the next best bet be to implement this in ASM, or is there perhaps some other functionality of the chip that could help in this case?
 
I'd use FASTRUN and no interrupts (disable them). If ~96 instructions per micro-second isn't enough, then you need hardware assist or a faster CPU.
 
You'll need to use attachInterruptVector() to get direct control of the interrupt, rather than going through the slow multi-pin attachInterrupt handler.

Normally I'd write a very detailed message, but I'm currently occupied with a critical task to get an urgent order shipped out today. Hopefully this tip helps. It's been discussed in prior threads, so search for terms like attachInterruptVector and "latency".

I can write more later if needed...
 
Thanks Paul & jonr!

Paul, no problem, get that order out! I appreciate the tip, I'll investigate and report back.
 
Perhaps (most likely) I don't understand how to use attachInterruptVector?

Even though I have an SPI clock and enable signal firing on pins 14 (clock) and 10 (enable), I never seem to hit the interrupt. I was going for pin 14, so I used IRQ_PORTD with no success. I also tested A/B/C/E just in case I was using the wrong port, but pin 9/DATA_OUT never goes high unless I change back to the commented-out attachInterrupt() line:

Code:
#define DATA_OUT 9
#define CLK 14
#define EN 10

void setup()
{
  pinMode(CLK, INPUT);
  pinMode(DATA_OUT, OUTPUT);
  
  digitalWriteFast(DATA_OUT, LOW);

  // uncommenting the next line DOES allow DATA_OUT to go HIGH
//  attachInterrupt(digitalPinToInterrupt(CLK), clkIrq, CHANGE);
  attachInterruptVector(IRQ_PORTD, clkIrq);
}


void clkIrq()
{
  digitalWriteFast(DATA_OUT, HIGH);
}

void loop()
{
}
 
You can speed up the code in #6 by not using volatile variables inside the loop of the interrupt handler. Copy the volatile variables to local, nonvolatile variables that will probably be kept in registers. If they must be written back to volatile global variables, do that just before exiting the interrupt handler. This will eliminate quite a lot of reading and storing of volatile variables in memory, since the volatile keyword prevents unneeded reads and writes from being optimized away.

Make the clock variable an unsigned integer, and a local variable in the interrupt routine. Since the Teensy 3.1 is 32-bit, some byte operations actually take longer than the corresponding 32-bit versions.

You could also try eliminating the nextBit variable and simply use (outBits & 1) but actually if they are local variables and not volatile then the compiler might optimize the code to this anyway.
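
A minimal host-side sketch of that idea, under stated assumptions: `clkSamples` stands in for successive digitalReadFast(CLK) results and the returned vector for the values that would be written to DATA_OUT. Both are hypothetical stand-ins so the logic can be checked off-target; on the Teensy they would be the actual pin read and write. The function name is made up for illustration.

```cpp
#include <cstdint>
#include <vector>

// Shared state, declared volatile as in the code from the thread.
volatile uint32_t outBits = 0x5555;

// Model of the polling loop: one output bit is produced per falling clock edge.
std::vector<uint32_t> shiftOutOnFallingEdges(const std::vector<uint32_t>& clkSamples) {
    uint32_t bits = outBits;  // single volatile read; work on the local copy
    uint32_t lastClk = 1;     // assumption: the clock idles HIGH
    std::vector<uint32_t> written;

    for (uint32_t clk : clkSamples) {
        if (lastClk != 0 && clk == 0) {    // falling edge detected
            written.push_back(bits & 1u);  // LSB first, as in the thread's code
            bits >>= 1;
        }
        lastClk = clk;
    }

    outBits = bits;           // single volatile write just before returning
    return written;
}
```

With the samples {1, 0, 1, 0, 1, 0, 1, 0} there are four falling edges, so the low four bits of 0x5555 come out as 1, 0, 1, 0 and outBits is left at 0x555. Inside the loop only locals are touched, which is what should let the compiler keep them in registers.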
 
Very interesting! I'm getting close to the speeds I need -- instead of digitalWriteFast and digitalReadFast, I'm now accessing the registers directly, and I've replicated the attachInterrupt function and am using attachInterruptVector to call my own functions. It's working much faster, but still just short of the speed I need. Any way to speed this up? I assume not making it volatile...

I could also change outBits beforehand if there are faster operations; I assume other bitwise operators are the same speed as `& 0x8000`, but I'm not sure.


Code:
...

volatile uint16_t outBits;

// interrupt handler
FASTRUN static void clkIrq(void)
{
  // pin14 falling?
  if (PORTD_ISFR & CORE_PIN14_BITMASK)
  {
    // clock gone high, now spit out next bit
    if (outBits & 0x8000) CORE_PIN9_PORTSET = CORE_PIN9_BITMASK;
    else CORE_PIN9_PORTCLEAR = CORE_PIN9_BITMASK;
    outBits <<= 1;
  }

  PORTD_ISFR = PORTD_ISFR;
}
 
Any way to speed this up?

Have you tried uncommenting the faster overclocking options in boards.txt?

If you try 168 MHz, upgrade to Teensyduino 1.25, and use FASTRUN on pretty much everything. Version 1.25 slows the flash clock for that 168 MHz mode (previously 168 MHz crashed on almost all boards due to the flash memory). Now it might actually work, but the only way to benefit from that speed is running from RAM.

While anything over 72 MHz is technically overclocking, many people have reported 120 MHz is very stable. Freescale coincidentally makes numerous other parts with exactly this same processor on the same 90 nm silicon process which are specified at 120 MHz.

Reports are mixed about 144 MHz. Some people report rare crashes, but long periods of reliable operation. If this is a proof-of-concept project or a tool that doesn't have to run continuously for days or months or years, maybe the faster overclocking speeds can help.

If you do explore 168 MHz, please let us know how it works? Until recently, it pretty much never worked. Now with the flash clock change, I'm hoping to hear more 168 MHz feedback (especially if it really does offer much benefit over 120 and 144). :)
 
Great, thanks Paul!

This is a proof-of-concept so intermittent crashing wouldn't be terrible, I'll explore overclocking past 96MHz. I've been running all functions (except loop, which is empty, and setup) with FASTRUN. I'll report back with how testing goes on overclocking.

I'm not running anything in the main loop -- would it be more efficient to disable interrupts and just wait on the registers in the main loop?

Would it also be more CPU efficient to operate on 32-bit values over 16-bit values?

I'm getting nanoseconds away from this working!
 
Running on Teensy 3.2 with all functions as FASTRUN and all vars as non-volatile uint32_t's, I can't get 168MHz to work at all (Teensy outputs nothing on my logic analyzer), while 144MHz seems to work well. I don't know if it's crashing at 144MHz at all as I don't have logging, but every sample I take with the analyzer looks good, and the SPI man-in-the-middling is working much faster. Some things I was timing at 500ns at 72MHz run at about 330ns at 144MHz.
 
Did you try a simple blink? Is USB in use?

<EDIT: removed erroneous USB info from misplaced historical browsing of the forum>

PaulStoffregen said:
... I'm not aware of any USB specific issues with overclocking.

Long ago, the very first attempts at overclocking didn't support USB because they didn't configure the USB clock divider. That was fixed ages ago, when the faster overclocking was added to the official Teensyduino core library (but commented out in boards.txt).
 
I'm not printing any serial data or using USB for anything other than power + programming -- logic analyzer hooked up to GPIO pins as the Teensy should be listening on some GPIO pins and controlling other pins based off of that input.
 
If you're not running USB or other services that need interrupts, there's not really any reason to run your code from interrupts. Or just do it all inside the EN irq. That will give more headroom even if 144MHz is acceptable. Just my feeling :)
 
I'll try that. Is there a more efficient way to detect a falling edge than storing the last status of the pin in memory? I'm thinking:
Code:
status = digitalReadFast(PIN);
if (status == LOW && lastStatus == HIGH) { /* pin changed from HIGH to LOW, essentially my FALLING interrupt */ }
lastStatus = status;
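
One possible variant of the snippet above, as a sketch rather than a measured improvement: keep the previous sample in a plain (non-volatile) variable and fold the comparison into a single bitwise expression. `FallingEdgeDetector` is a hypothetical helper name, and `now` is assumed to be 0 or 1, as returned by digitalReadFast(PIN).

```cpp
#include <cstdint>

// Falling-edge detector that remembers only the previous pin level.
struct FallingEdgeDetector {
    uint32_t last = 1;  // assumption: the pin starts HIGH

    // Returns true exactly when the level went from 1 to 0 since the last call.
    bool update(uint32_t now) {
        bool fell = (last & ~now & 1u) != 0;  // previous was 1, current is 0
        last = now;
        return fell;
    }
};
```

Whether this beats the two-comparison version depends on what the compiler generates. The main saving probably comes from keeping the detector as a local object in the polling loop, so that `last` can stay in a register instead of being reloaded from memory on each pass.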
 
There are caveats about USB at higher rates ...

Really? I'm not aware of any USB specific issues with overclocking.

Long ago, the very first attempts at overclocking didn't support USB because they didn't configure the USB clock divider. That was fixed ages ago, when the faster overclocking was added to the official Teensyduino core library (but commented out in boards.txt).
 