Maximum interrupt speed on Teensy 4

Status
Not open for further replies.
Hello,
I have a custom protocol (similar to SPI) to decode, using 3 lines:
- ENABLE, HIGH when data is valid
- CLOCK, at 6 MHz
- DATA

I've attached 2 interrupts to the ENABLE and CLOCK inputs, to acquire DATA only when CLOCK and ENABLE are HIGH.
It seems the CLOCK line is too fast, and the interrupt on this pin never fires.

I tried slowing down the CLOCK line to 100 kHz and it works, but I need 6 MHz.
Is there any workaround?

Code:
const int ledPin  = 13;
const int dataPin =  0;
const int clkPin  =  1;
const int gatePin =  2;

char currByte;
int idxBit = 0;

#define DATALEN 35000
char dataBuff[DATALEN];
int dataIdx = 0;

void setup() {
  pinMode(ledPin,  OUTPUT);
  pinMode(dataPin, INPUT);
  pinMode(clkPin,  INPUT_PULLUP);
  pinMode(gatePin, INPUT_PULLUP);
  
  memset(dataBuff, 0, DATALEN);
  
  Serial.begin(20000000);
  Serial.println("START LOGGER v1.0");
  attachInterrupt(digitalPinToInterrupt(gatePin), GateChanged, CHANGE);
}

void GateChanged() {
  int val = digitalRead(gatePin);
  
  if (val == 1) {
    // Begin acquisition
    currByte = 0;
    idxBit = 0;
    dataIdx = 0;
    attachInterrupt(digitalPinToInterrupt(clkPin), DataArrived, RISING);
  } else {
    // End acquisition
    detachInterrupt(digitalPinToInterrupt(clkPin));
  }
}

void DataArrived() {
  int val = digitalRead(dataPin);
  
  if (val == 1) {
    currByte = currByte | (val << idxBit);
  }
  
  idxBit++;

  if (idxBit == 8) {
    //Store byte
    dataBuff[dataIdx++] = currByte;
    idxBit = 0;
    currByte = 0;
  }
}

void loop() {
  if (dataIdx > 0) {
    Serial.print("Data [");
    Serial.print(dataIdx);
    Serial.print("] = ");
    for (int i = 0; i < dataIdx; i++) {
      Serial.print(dataBuff[i], HEX);
      Serial.print(" "); 
    }
    Serial.println("");
    dataIdx = 0;
    delay(2000);
  }
  delay(10);
}
 
Changing things like digitalRead(dataPin) to digitalReadFast(dataPin) is the first step - it will get some cycles back.

I'm assuming the gatePin is not part of the problem and is just a 'slow' toggle? Some of the same applies to it, though.

Not sure how much is gained by changing this:
Code:
  int val = digitalRead(dataPin);
  
  if (val == 1) {
    currByte = currByte | (val << idxBit);
  }

to this:
Code:
  if (digitalReadFast(dataPin)) {
    currByte = currByte | (1 << idxBit);
  }

For the check and reset of dataIdx in loop() to work reliably, dataIdx should be declared volatile.

But loop() will see dataIdx as soon as it reaches 1 or maybe 2, and then set it back to 0 while data may still be arriving. It might be better to set a flag in GateChanged() when it reaches the // End acquisition branch.
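
A minimal sketch of that flag idea, with the pin handling stripped out so the logic can be tried on a PC. The names onGateChange(), onClockRising() and takePacket() are made up for illustration; on the Teensy the first two would be the handlers passed to attachInterrupt(), and takePacket() would be called from loop().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; only the buffer size comes from the sketch above.
constexpr size_t kDataLen = 35000;

volatile uint8_t dataBuff[kDataLen];
volatile size_t  dataIdx = 0;          // written by the handlers, read by the main loop
volatile bool    packetReady = false;  // set only when the gate goes LOW

static uint8_t currByte = 0;
static uint8_t idxBit = 0;

// Gate edge handler: start a fresh packet on HIGH, publish it on LOW.
void onGateChange(bool gateHigh) {
  if (gateHigh) {
    currByte = 0;
    idxBit = 0;
    dataIdx = 0;
    packetReady = false;
  } else {
    packetReady = true;
  }
}

// Clock edge handler: take one bit, LSB first as in the original sketch.
void onClockRising(bool dataHigh) {
  if (dataHigh) {
    currByte |= static_cast<uint8_t>(1u << idxBit);
  }
  if (++idxBit == 8) {
    size_t i = dataIdx;
    if (i < kDataLen) {          // never write past the buffer
      dataBuff[i] = currByte;
      dataIdx = i + 1;
    }
    idxBit = 0;
    currByte = 0;
  }
}

// Main-loop side: returns the packet length once, and only after the gate closed.
size_t takePacket() {
  if (!packetReady) return 0;
  packetReady = false;
  return dataIdx;
}
```

With this shape the main loop never touches dataIdx while a packet is still arriving; it only reads the length after the gate handler has set the flag.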
 
About the gatePin: I have an oscilloscope monitoring the 3 lines. At 6 MHz all the signals are noisier than at 100 kHz, but the timing is correct. I'll attach a photo tomorrow.


I'll try the digitalReadFast() trick.

About dataIdx: in the final code, the transfer and the reset will be done after gatePin goes LOW.
For debugging I've put them in loop().

Another question: if I don't need to process anything in the loop function, is it better to put a long delay in it or to leave it empty?
 
I think the best you can do with attachInterrupt() on T4 is 3.4 MHz; see
https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=206846&viewfull=1#post206846

You could dig into the attachInterrupt() code and copy parts of it, instead using attachInterruptVector() for the whole port. In your function you'll need to clear the interrupt status, as that code does before it calls your function. Eliminating the check for the other pins on that port and the function call overhead should buy you a little more speed.

And yeah, SPI slave mode or FlexIO hardware could do this efficiently. But that's a steep hill to climb since there isn't a lot of example code for using the hardware in those ways.
 
... why don't you use SPI?

Because I cannot choose the protocol. Even though it looks like SPI, there is no answer to the data (as SPI has, with the same data length).
It is a one-direction data stream at 6 MHz; each packet lasts 5 ms, with an interval of 7 ms.
I want to test the sender by emulating the receiver with this Teensy 4. I thought I could achieve this, but I'm not sure anymore.
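
To put those numbers in perspective, here is a small worked calculation. It assumes the Teensy 4 runs at its default 600 MHz and that the clock is active for the whole 5 ms packet; the constant and function names are only illustrative.

```cpp
#include <cstdint>

// Figures from the post above: 6 MHz bit clock, 5 ms packets.
// Assumptions: 600 MHz CPU clock, clock running for the whole packet.
constexpr uint32_t kCpuHz      = 600000000u;
constexpr uint32_t kBitClockHz = 6000000u;
constexpr uint32_t kPacketUs   = 5000u;

// CPU cycles available to handle one bit.
constexpr uint32_t cyclesPerBit(uint32_t cpuHz, uint32_t bitHz) {
  return cpuHz / bitHz;
}

// Bits clocked out during one packet.
constexpr uint32_t bitsPerPacket(uint32_t bitHz, uint32_t packetUs) {
  return static_cast<uint32_t>(static_cast<uint64_t>(bitHz) * packetUs / 1000000u);
}

static_assert(cyclesPerBit(kCpuHz, kBitClockHz) == 100, "100 cycles per bit");
static_assert(bitsPerPacket(kBitClockHz, kPacketUs) == 30000, "30000 bits per packet");
static_assert(bitsPerPacket(kBitClockHz, kPacketUs) / 8 == 3750, "3750 bytes per packet");
```

Under those assumptions each bit leaves roughly 100 CPU cycles for interrupt entry, the handler itself and the return, which gives a feel for why one interrupt per clock edge is tight at this rate.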
 
I see no reason why SPI wouldn't do that.
Worst case, you would have to decode/process the received data somehow afterwards.

What do you mean by "answer"? SPI has no such feature.
 
Sorry for my bad English.
SPI works with shift registers, so if I want to read a stream of 35000 bytes from a peripheral, as I do here, I have to send 8 (or 16) bytes from the master to the slave and get back 8 bytes of its stream, and so on.
This simplified protocol (which I have to read and cannot modify) needs only 3 wires instead of 4, needs no master, and has no latency between multiple reads.
 
... so, the master sends when CS is high, and receives when CS is low? (This might work with just a second SPI and an inverter for CS) or FlexSPI.
 
I believe FlexSPI is master only. Sounds like this use has the external device sending a clock to Teensy, so slave mode would be needed to receive its clock + data. FlexSPI is designed for memory mapping SPI/QSPI memory chips, where Teensy decides to assert CS and CLK and do a read or write based on cache misses that need to access memory. I'm pretty sure there's no way for FlexSPI to respond to events other than memory fetches & stores from the AXI bus inside the chip.

Normal SPI (or "LPSPI" in NXP lingo) might be able to do it in slave mode. I believe it allows any word sizes up to 32 bits, not limited to only 8 or 16 bit words like the SPI hardware in most chips. But if the incoming data length isn't known in advance, guessing which word size to use for the last bits of the message before CS is deasserted could be problematic.

Even though generating a pin change interrupt for every clock is terribly inefficient, if this is a project that just needs to get done the simplest & quickest way, I don't see why that's not a valid way to try (if the interrupt code can be optimized enough). If you have a 600 MHz chip and you need to get a problem solved quickly, why not try the easiest way?

Of course if we had a really flexible SPI slave library or good documentation & tutorials on FlexIO (in my dream world which has infinite hours to make software) then those might be the quickest way to solve the problem. But today we just don't have that, and especially the FlexIO (for special uses other than normal serial protocols) is poorly documented by NXP's reference manual.
 
When the master wants to send data it drives CS (the gate) HIGH; when it has finished it drives CS LOW. It's one-direction communication.
It is not SPI, but it has something in common with it. In this scenario the Teensy would be the slave.
 
With a 1.5 MHz clock on the master, I can receive all data correctly with my original code plus @defragster's modification; with a 3 MHz clock, the received data is no longer valid.
I think that 6 MHz is only a dream.
 
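You could also try polling in a tight loop. Running on the 600 MHz processor this might be quite fast.
Here is some completely untested code to show the idea.

Code:
// Untested; gatePin, clkPin and dataPin are the pin constants from your sketch.
uint8_t prevClk = digitalReadFast(clkPin);
while (digitalReadFast(gatePin) == HIGH)
{
    uint8_t clk = digitalReadFast(clkPin);
    if (clk != prevClk)                 // clock changed
    {
        prevClk = clk;
        if (clk == HIGH)                // assumption: sample on the rising edge only
        {
            uint8_t dataBit = digitalReadFast(dataPin);
            // shift dataBit into the storage here
        }
    }
}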
 
With your hint I successfully tested this bus at 6 MHz, but only with the Teensy CPU set to 912 MHz (overclocked).

This is my new code:

Code:
const int ledPin  = 13;
const int dataPin =  0;
const int clkPin  =  1;
const int gatePin =  2;

char currByte;
byte idxBit = 0;

#define DATALEN 35000
char dataBuff[DATALEN];
int dataIdx = 0;


void setup() {
  pinMode(ledPin,  OUTPUT);
  pinMode(dataPin, INPUT);
  pinMode(clkPin,  INPUT_PULLUP);
  pinMode(gatePin, INPUT_PULLUP);
  
  memset(dataBuff, 0, DATALEN);
  
  Serial.begin(20000000);
  Serial.println("START LOGGER v1.0");
  attachInterrupt(digitalPinToInterrupt(gatePin), BeginRead, RISING);
}

void BeginRead() {
  byte clockVal = 1, prevVal = 1;
  byte gateVal = 1;

  // Begin acquisition
  currByte = 0;
  idxBit = 0;
  dataIdx = 0;
  
  while (gateVal == 1) {
    clockVal = digitalReadFast(clkPin);
    
    if (clockVal == 1 && prevVal == 0) {
      // Rising
      prevVal = 1;
    } else if (clockVal == 0 && prevVal == 1) {
      // Falling
      prevVal = 0;
      DataArrived();
    }

    //if (dataIdx > 4)
      gateVal = digitalReadFast(gatePin);
  }
    
  if (dataIdx > 0) {
    Serial.print("Data [");
    Serial.print(dataIdx);
    Serial.print("] = ");
    for (int i = 0; i < dataIdx; i++) {
      Serial.print(dataBuff[i], HEX);
      Serial.print(" "); 
    }
    Serial.println("");
    dataIdx = 0;
    delay(2000);
  }
}

FASTRUN void DataArrived() {  
  if (digitalReadFast(dataPin) == 1) {
    currByte = currByte | (1 << idxBit);
  }
  
  idxBit++;

  if (idxBit == 8) {
    //Store byte
    dataBuff[dataIdx++] = currByte;
    idxBit = 0;
    currByte = 0;
  }
}

void loop() {

}
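
The sampling rule used in BeginRead() can also be tried off the board. This is only a sketch: the Sample struct and decodeSamples() are invented for illustration, and a vector of recorded pin states stands in for the digitalReadFast() calls. It takes DATA on each falling CLOCK edge while the gate is HIGH and packs the bits LSB first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One polled snapshot of the three lines (hypothetical type).
struct Sample {
  bool gate;
  bool clk;
  bool data;
};

// Decode bytes, LSB first, reading DATA on every falling CLOCK edge
// for as long as GATE stays high.
std::vector<uint8_t> decodeSamples(const std::vector<Sample>& samples) {
  std::vector<uint8_t> out;
  uint8_t currByte = 0;
  uint8_t idxBit = 0;
  bool prevClk = true;  // same starting value as prevVal in the sketch

  for (const Sample& s : samples) {
    if (!s.gate) break;                 // gate dropped: end of packet
    if (prevClk && !s.clk) {            // falling edge
      if (s.data) {
        currByte |= static_cast<uint8_t>(1u << idxBit);
      }
      if (++idxBit == 8) {              // a full byte has been collected
        out.push_back(currByte);
        currByte = 0;
        idxBit = 0;
      }
    }
    prevClk = s.clk;
  }
  return out;
}
```

Feeding it a recorded capture makes it possible to check the edge and bit-order logic separately from the timing on the real pins.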
 
With the Teensy CPU set to 600 MHz, the first bit of the stream is lost.
Starting with idxBit = 1, the data stream is correct except for the first bit.
I think the attachInterrupt routine is not fast enough for the clock being monitored (6 MHz).
 
Code:
if (digitalReadFast(dataPin) == 1) {
  currByte = currByte | (1 << idxBit);
}

Where do you shift the zero?

Delete that FASTRUN - it's not needed. Using inline instead might make it a bit faster.
And disable the interrupts.

Still don't see why this is different from SPI.. but ok. No need for me to know that ;)
 
There is no need to shift in a 0: currByte is initialized to 0, so only the 1 bits have to be set.
I've disabled the interrupt on the CLK line, but the CS (gate) interrupt may still be needed; I can try reading the CS pin in the while loop too, so that maybe I no longer miss the first bit.
This protocol is very similar to SPI, but lighter. If you connect an SPI receiver to this bus, you cannot receive anything, because it is not the same.
This protocol is intended to transfer data from the master to the slave, so there is no need for the MISO pin, and the slave does not have to send the same amount of data back to the master (as in classical SPI, which works with shift registers).
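
A small check of the point about not shifting zeros. Both function names are made up for illustration: one ORs in only the 1 bits, as DataArrived() does, and the other shifts every bit in, zeros included.

```cpp
#include <cstdint>

// OR in only the 1 bits; a 0 bit leaves the zero-initialized byte unchanged.
uint8_t packSkippingZeros(const bool bits[8]) {
  uint8_t b = 0;
  for (int i = 0; i < 8; i++) {
    if (bits[i]) {
      b |= static_cast<uint8_t>(1u << i);
    }
  }
  return b;
}

// Shift every bit into place, zeros included.
uint8_t packEveryBit(const bool bits[8]) {
  uint8_t b = 0;
  for (int i = 0; i < 8; i++) {
    b |= static_cast<uint8_t>((bits[i] ? 1u : 0u) << i);
  }
  return b;
}
```

Both give the same byte as long as the accumulator starts at 0, which is why resetting currByte after each stored byte matters.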
 
Yup, got it.

But SPI does not need MISO. It's optional. No need to connect it.
You can use MISO or MOSI or BOTH with SPI.
 