GPIO_GDIR issue


forbiddenera

Well-known member
Thought I'd make a new thread since this is sort of a separate issue and since my original thread my project might not even be using DMA, heh..

edit: I maybe figured out the GPIO_GDIR part, I think it actually turned out to be a quirk in my testing setup not an issue of manipulating that register.

I'm using GPIO6.16-30 to read a parallel memory address from pins, making a 27c256 EPROM emulator (trying). My other thread has more background information if needed. Reading is fine (access bits 16-30 of GPIO_DR)..

I need to switch either GPIO6.16-31 to input/output rather quickly or alternatively switch 8 pins on say, GPIO2/7 to HI-Z/output.

My target device shares address pins 0-7 with the data pins using a latch. Once I've read the address and OE# has been triggered, I want to switch the pins to output and output the required data. This needs to happen very quickly, thus me wanting to directly access the register to modify the direction in one swoop.

As far as I can tell from the manual, in the GPIOn_GDIR register, 0-31 maps to inputs 0-31 of each GPIO. Thus bits 16-31 should control the directions of those pins.
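A minimal sketch of that bit mapping, runnable off-target. `fakeGdir` is a hypothetical stand-in for GPIO6_GDIR, and `bitRangeMask`, `busToOutput` and `busToInput` are illustrative names, not part of the Teensy core. It only shows how a mask for bits 16-31 is built and how setting or clearing it leaves the other direction bits alone.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering bits lo..hi (inclusive) of a 32-bit register.
// Requires lo <= hi <= 31.
constexpr uint32_t bitRangeMask(unsigned lo, unsigned hi) {
  return (0xFFFFFFFFu >> (31u - hi)) & (0xFFFFFFFFu << lo);
}

// Hypothetical stand-in for GPIO6_GDIR so the sketch can run on a PC.
static uint32_t fakeGdir = 0;

// In GDIR, a 1 bit makes the pin an output and a 0 bit makes it an input.
inline void busToOutput(uint32_t mask) { fakeGdir |= mask; }
inline void busToInput(uint32_t mask)  { fakeGdir &= ~mask; }

static_assert(bitRangeMask(16, 31) == 0xFFFF0000u, "same value as GDIR_MASK");
```

On the real part the same two operations would be applied to GPIO6_GDIR, with the mask computed once rather than inside the ISR.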

Code:
unsigned GDIR_MASK = 0xFFFF0000;

void pinChangeISR() {
  unsigned long ISRstart = ARM_DWT_CYCCNT; // cycle counter at ISR entry, for timing
  if (OEstate == LOW) {
    // GPIO6_GDIR &= ~GDIR_MASK; // clear (set pins to input)
    readAddress = (GPIO6_DR >> 16);
    OEstate = HIGH;
  } else {
    OEstate = LOW;
    // GPIO6_GDIR |= GDIR_MASK; // set (set pins to output)
  }
}

Reading back the register, I get the bits set that I expect, 16-31 set to 1. When this happens, even though I set the address to input first, I get garbage in readAddress.. Not sure if I need to wait or something?

Also, once GPIO_GDIR is set back to zero, bits 21-25, and 27-31 wouldn't respond, IE, if all pins are pulled high I should get 1111 1111 1111 111x back from the register (last bit=don't care), but I get back 1111 0000 1111 0000 instead, even if all of GPIO_GDIR is set to 0's as it originally was..?

Today with testing, I'm getting back 1111 0000 1111 0010 .. which feels even weirder?! I should note that, this is also in the main loop() -- I could see possibly getting garbage back from reading GPIO_DR right after setting back to input in the ISR at least and maybe having to wait - but the main loop() which cycles at 100ms still reads 1111 0000 1111 0010 several loops later, so something isn't getting set right.

I've also noticed that, after uploading, sometimes (not always), if all pins are set to high, I get back the same sort of response until the pins go low and high again - this is *without* triggering the OE# pin / thus triggering the ISR.

Any ideas on what I'm missing? Does simply changing GDIR affect anything in the MUXC? Do I need to set some other stuff? Is there a better way to do this? Again, I'm simply looking to read those pins then quickly turn them to outputs or at least be able to change 8 other pins quickly from output to HI-Z. Ideally I'd like to do both for different projects, either should be possible.
 
Okay, so.. I swear these types of issues only hit me..? I hope not..

So, something I edited out of the above post because I figured it out, bit #30 was 'sticking' .. reseated my eeprom and seems to be okay.

Now, I tried manipulating OE# myself (the ISR trigger) and when I do that, I can see all pins being set.. if I let the eeprom reader toggle OE# then set the address pins all to 1's, then I get the issue..

But the issue *ONLY* happens if the teensy sets the GDIR on the ISR trigger. If I comment out those two lines, or trigger OE# manually, I don't see the issue.

Now I'm confused if it's a T4.1 issue or an issue with the eeprom reader/writer or both together? Maybe the eeprom reader doesn't like (some of?) the address pins being set to output?

After some more experimenting, if I pull OE# high (read address mode) manually but the pins are already set to high, I get the "masked" bits 1111 0000 1111 0000.. if I set the pins high AFTER pulling OE# high, then I get the proper 1111 1111 1111 1111... So, is there some kind of keeper on those other bits or something I'm missing maybe? This doesn't seem to happen if I pull the pins manually high..so definitely possible a weird interaction between the two devices but..totally confused.

Still looking for some insight/direction on setting the pins up.. I am getting these registers figured out but..still having a tough time..I guess setting GDIR maybe was working then?

edit:

So, I'm using some cheap amazon level shifters between the stuff..

I just swapped a bit.. seems like some of these shifters are acting weird.. because swapping the bit, can now see that bit changing.. but.. I also wonder if maybe something on the channel configuration could be affecting things..?

frustrating as some of these shifters had some channels that just didn't even work even.. gonna try swapping things around and see if I have any progress.

edit: not having much.. swapping shifters doesn't seem to help.. seems like only certain channels doing it.. and definitely related to OE#.. so somehow the shifter or eeprom doesn't like the pins going to output??
 
Given my continued interest in whether a Teensy can emulate an EPROM...

Is your code changing the direction controls on the pins that serve as ADDRESS lines? For any addressed hardware component, its address signals are always input to the chip. It would be the DATA lines that transition between hi-Z and being driven by the EPROM chip when its OE# signal goes active.
 
Typically, you'd expect this.

On older devices (such as my target device), to save pin counts, the lower address lines are shared with the data lines. The address is pushed into a latch, latched by the target then the target reads from those same pins.

For a *typical* 27c256 emulation, yes, I should be using other pins for data and setting those to Hi-Z/output.

For my final target though, I should be able to output on the lower address lines, as again, they become the data lines once the lower address pins are latched. I'm hoping to be able to do this for two reasons, first, to save pins on the teensy for other use, secondly, to actually prevent reading back the data from the emulator with a traditional eeprom reader via this quirk of the target - although it does make debugging tougher!

This could potentially explain the weirdness I'm seeing with testing with the eeprom reader/writer (willem45) but might be something else too..

Definitely something weird going on. Tried tying the high side of the shifter of one of the pins having issue and with it tied high, if I set all pins to high/low on the willem, the pin tied high changes as well!

Now I have to double check everything on my breadboard.. possibly something pinned wrong I guess.. :(
 
So, if I take an address pin that's having issues, say A12 and put it just to my scope to watch the high/low..

Then I trigger OE# low, which sets GDIR to output..

Then I pull all address pins high..

then I trigger OE# high.. this is where I get 1111 0000 1111 0000.. but now that bit 12 is just connected to my scope and nothing else, I was expecting it to go high.. not the case..

So, about the only things I can think of are either a weird interaction with the chip reader OR that GPIO1 21-24 and 27-31 maybe have different defaults for drive strength/pull-down/keeper that could be affecting me?
 
For your curiosity and my sanity I just wired up the extra 8 data pins and will try and feed my Willem..give me a few
 
I'm beginning to understand... you are not simulating just a 27c256/27c512, you are simulating a 27c256/27c512 connected to a multiplexed address/data bus. That is a different animal...

Are you planning to use only OE# to select the role of the low 8 bits: address while OE# is high, driven as data when OE# goes low? That should be OK (though most processor buses have setup/hold requirements, which is why they often have an ALE signal separate from OE#).

An EPROM reader/writer thinks it's connected to an EPROM, not a device that simulates an EPROM with attached hardware.

So I'm beginning to believe some things based on your description:
  • The Teensy's 8 high order address lines are connected through a level shifter, since they are always input to the Teensy (and the Teensy is a 3.3V not 5V part).
  • The Teensy's 8 low order address/data lines are connected through a bidirectional level shifter with tri-state capability on the 5V side.
  • Your setup is managing the direction of the level shifter.

The important part here is defining how the level shifter is controlled, to ensure that your Teensy emulation combined with the level shifter meets the timing specs required by the bus that you are attaching to.

A quick observation about OE# going high.... Once it goes high and your code switches the pins to input, the signal is not being driven by any device, either up or down. It will tend to drift (usually toward ground) over time, but that could take multiple microseconds. So it is not surprising that the signal doesn't appear to change immediately after OE# goes high.
 
I DID IT!#@% I've read back a whole 32k chip with the willem eprom reader. No errors, perfect readback.

Now, will it be fast enough for the ECU and can I share address lines with the ecu..will see..! Gotta load some real data into it now before I can try the ECU again.

*does a dance*

:D yay
 
So, on the scope I'm showing about 160ns between OE# going low and data going high with static data, closer to 180ns when using real data.

Also,

Code:
if (digitalReadFast(35) == HIGH) {

seems to take fewer cycles than

Code:
if (GPIO7_DR & OE_MASK) {

any ideas why that might be?

any ideas on things I can do to improve ISR latency and GPIO speed? :D

now that the chip reader can read..I feel like I'm close..just need to meet these timing goals heh..
 
I've tried this:

Code:
unsigned SPEED_MASK      = 0b0000'0000'0000'0000'0000'0000'1100'0001;
..
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_00 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_05 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_06 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_07 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_08 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_09 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_10 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_11 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_12 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_13 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_14 |= SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_00  |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_03 |= SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_00 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_03 |= SPEED_MASK;

to set the GPIO to 200mhz.. doesn't seem to affect my ISR time..

Code:
void setup() {
  attachInterrupt(digitalPinToInterrupt(35), pinChangeISR, CHANGE);
}

void pinChangeISR() {
  if (digitalReadFast(35) == HIGH) {
    GPIO7_GDIR &= ~OUTPUT_MASK; // disable data pins
  } else { // was high now is low
    //GPIO7_DR = (binaryBytes[currentAddress] & ~(OUTPUT_MASK)); // output data to data pins
   GPIO7_DR = ((255 + (255 << 12)) & OUTPUT_MASK); // output just 1's to data pins
    GPIO7_GDIR |= OUTPUT_MASK; // enable data pins
  }
}

Changing the core clock speed with set_arm_clock(), I can get up to about 995 MHz without issues (max ~72°C), which results in about 112 ns between OE# (pin 35) going low and the data pins going high..

any ideas on how to speed this up? interrupt latency? priorities? i was hoping setting the gpio to 200mhz would help..? am I setting IOMUXC_SW_PAD_CTL_PAD_GPIO_* correctly? looks like I am, according to manual, but I didn't see any difference really in slew rate when changing SRE, although a friend said the diff might only be 5-10ns for slew anyway..?
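A small sketch that decodes a pad control value into its fields, to check what SPEED_MASK actually selects. The field positions (SRE in bit 0, DSE in bits 5:3, SPEED in bits 7:6) are an assumption taken from the i.MX RT1060 pad control register layout and should be checked against the reference manual. `PadFields` and `decodePad` are illustrative names, and 0x10B0 is only a hypothetical starting value for the register.

```cpp
#include <cassert>
#include <cstdint>

// Assumed field layout of an IOMUXC_SW_PAD_CTL_PAD_* register.
struct PadFields {
  uint32_t sre;    // bit 0: slew rate (1 = fast)
  uint32_t dse;    // bits 5:3: drive strength
  uint32_t speed;  // bits 7:6: speed setting (3 = highest)
};

// Split a raw pad control value into the three fields above.
constexpr PadFields decodePad(uint32_t value) {
  return PadFields{ value & 1u, (value >> 3) & 7u, (value >> 6) & 3u };
}

// The mask from the post: 0b1100'0001.
constexpr uint32_t SPEED_MASK = 0xC1u;
```

Under those assumptions, OR-ing SPEED_MASK into a register selects the highest speed setting and fast slew while leaving the drive strength field at whatever it already was. A plain assignment of the same constant would instead zero the drive strength field, so the two forms are not interchangeable.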
 
Quick comment about the GPIO registers:
Code:
readAddress = (GPIO6_DR >> 16);

If you are trying to read the actual data on the IO pins, this may or may not work. From the Reference Manual:

"The 32-bit GPIO_DR register stores data that is ready to be driven to the output lines. If the IOMUXC is in GPIO mode and a given GPIO direction bit is set, then the corresponding DR bit is driven to the output. If a given GPIO direction bit is cleared, then a read of GPIO_DR reflects the value of the corresponding signal. Two wait states are required in read access for synchronization."

I often find it better to use the PSR register. If you look at how for example digitalRead and digitalReadFast work,
You will see in core_pins.h:
Code:
		} else if (pin == 35) {
			return (CORE_PIN35_PINREG & CORE_PIN35_BITMASK) ? 1 : 0;
And those two values are defined in the same file (the register may differ depending on which Teensy):
#define CORE_PIN35_PINREG GPIO8_PSR
or
#define CORE_PIN35_PINREG GPIO7_PSR


#define CORE_PIN35_BITMASK (1<<(CORE_PIN35_BIT))
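As a sketch of the same idea applied to the address bus: take one snapshot of the pad status register and mask it down to the address bits. The bit positions (A0..A14 on port bits 16..30) are taken from the first post. `addressFromPsr`, `ADDR_SHIFT` and `ADDR_MASK` are illustrative names, and the function takes the snapshot as a parameter so it can be tried on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Assumed wiring from the thread: A0..A14 on port bits 16..30.
constexpr uint32_t ADDR_SHIFT = 16;
constexpr uint32_t ADDR_MASK  = 0x7FFFu;  // 15 address bits for a 32K x 8 part

// Turn one snapshot of the pad status register into a 15-bit address.
constexpr uint16_t addressFromPsr(uint32_t psrSnapshot) {
  return static_cast<uint16_t>((psrSnapshot >> ADDR_SHIFT) & ADDR_MASK);
}
```

On the Teensy this would be called as `addressFromPsr(GPIO6_PSR)`. The mask matters because a bare `>> 16` also keeps bit 31, which could index past the end of a 32K table if that pin happens to read high.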
 
It seems to read the addresses just fine. Never had an issue with reading addresses just everything else, haha..but I see that note and maybe PSR is better - maybe this explains some interesting things I was seeing in cycle counts, ie that digitalReadFast was actually faster than reading the bits directly for one bit and not much slower for all bits with a stack of dRF() calls.

Now my only issue is trimming ISR latency so I can respond fast enough. Currently I can crap out data 138ns after the line triggers.. I need to shave at *least* 50-60ns off that. Would prefer to have data out in ~50ns. I found your post and this was going to be my next attempt.

@all... I thought I would sort of answer my self in wondering about ISR Speeds... So I mucked up a test program to get a general idea of differences in overhead...

So I did a simple test program to see the differences in ISR overhead. What it shows me is that one can still muck their way through and set up a direct ISR if they want, but I'm not sure it's worth the hassle. I set up one pin with the normal attachInterrupt() and ran a jumper to another pin that I toggle N times... In the ISR I read the pin state and echo it out to another pin. I then look at the delta time between the two pins changing state with a Logic Analyzer. I also used attachInterruptVector on a different IO pin (0), switched that pin back to use GPIO1, and then tried that as well..

Code sort of primitive, but shows some of the delta time differences. Would be better to hook up to external fast ISR generator like an encoder...
Code:
#define IRQ_PIN 0
#define ECHO_PIN 1
#define TRIGGER_PIN 2

#define IRQ2_PIN 3
#define ECHO2_PIN 4
#define TRIGGER2_PIN 5

#define CORE_PIN0_PINREG_SLOW  GPIO1_PSR
#define readFastIRQPin() ((CORE_PIN0_PINREG_SLOW & CORE_PIN0_BITMASK) ? 1 : 0)
uint32_t cycles_per_second = 100;  //
void setup() {
  while (!Serial && millis() < 5000) ;
  Serial.begin(115200);
  pinMode(IRQ_PIN, INPUT);
  pinMode(ECHO_PIN, OUTPUT);
  pinMode(TRIGGER_PIN, OUTPUT);
  digitalWrite(TRIGGER_PIN, LOW);
  Serial.printf("Test IRQ timing:\n    Pins IRQ:%d ECHO:%d TRIGGER: %d\n", IRQ_PIN, ECHO_PIN, TRIGGER_PIN);
  pinMode(IRQ2_PIN, INPUT);
  pinMode(ECHO2_PIN, OUTPUT);
  pinMode(TRIGGER2_PIN, OUTPUT);
  digitalWrite(TRIGGER2_PIN, LOW);
  Serial.printf("    Normal pins: IRQ:%d ECHO:%d TRIGGER: %d\n", IRQ2_PIN, ECHO2_PIN, TRIGGER2_PIN);
  delay(500);

  //---------------------------------------------
  // First lets setup pin 0 to slow mode and direct ISR...
  CCM_CCGR1 |= CCM_CCGR1_GPIO1(CCM_CCGR_ON);
  attachInterruptVector(IRQ_GPIO1_0_15, &pin_isr);
  NVIC_ENABLE_IRQ(IRQ_GPIO1_0_15);
  Serial.println("After Attach"); Serial.flush();
  // I think this will have GPIO1 handle its pin 3
  IOMUXC_GPR_GPR26 = 0xFFFFFFFF;
  Serial.println("After set IOMUXC"); Serial.flush();
  GPIO1_ICR1 = 0x00; // set to 0
  Serial.println("After set ICR1"); Serial.flush();
  GPIO1_GDIR &= ~0x08;  // Make sure set as input in GPIO1
  GPIO1_EDGE_SEL = 0x08; // bit 3: interrupt on either edge (overrides ICR)
  GPIO1_ISR = 0xffff;
  Serial.println("After set ISR1"); Serial.flush();
  GPIO1_IMR = 0x08;
  Serial.println("After set GPIO"); Serial.flush();

  //---------------------------------------------
  // Next setup pin 3 to use attach interrupt in normal mode
  attachInterrupt(IRQ2_PIN, &pin2_isr, CHANGE);
}

volatile uint32_t irq_count = 0;

void pin_isr(void) {
  digitalWriteFast(ECHO_PIN, digitalReadFast(IRQ_PIN));
  irq_count++;
  GPIO1_ISR = 0x08; // clear the IRQ
  asm("dsb");
}

void pin2_isr(void) {
  digitalWriteFast(ECHO2_PIN, digitalReadFast(IRQ2_PIN));
  irq_count++;
}

void loop() {
  // put your main code here, to run repeatedly:
  Serial.printf("Enter cycles per second default(%d):", cycles_per_second);
  while (!Serial.available()) ;
  uint32_t cps = 0;
  int ch;
  while ((ch = Serial.read()) != -1) {
    if ((ch >= '0') && (ch <= '9')) cps = cps * 10 + ch - '0';
  }
  if (cps) {
    cycles_per_second = cps;
  }
  uint32_t delay_per_cycle = 1000000 / (cycles_per_second * 2);

  irq_count = 0;
  elapsedMicros em = 0;
  for (uint32_t i = 0; i < cycles_per_second; i++) {
    digitalWriteFast(TRIGGER_PIN, HIGH);
    delayMicroseconds(delay_per_cycle);
    digitalWriteFast(TRIGGER_PIN, LOW);
    delayMicroseconds(delay_per_cycle);
  }

  uint32_t delta_time = em;
  Serial.printf("\nDirect IRQs processed:  %u dt: %d calc:%d\n", irq_count,
                delta_time, delay_per_cycle * 2 * cycles_per_second);

  // Now do normal way

  irq_count = 0;
  em = 0;
  for (uint32_t i = 0; i < cycles_per_second; i++) {
    digitalWriteFast(TRIGGER2_PIN, HIGH);
    delayMicroseconds(delay_per_cycle);
    digitalWriteFast(TRIGGER2_PIN, LOW);
    delayMicroseconds(delay_per_cycle);
  }

  delta_time = em;
  Serial.printf("Normal IRQs processed:  %u dt: %d calc:%d\n", irq_count,
                delta_time, delay_per_cycle * 2 * cycles_per_second);

}

Note: The differences in elapsed micros times between direct and Normal, also gives an indication of how much more time was taken processing the ISR, as the normal loop is not running when the ISR code is running...
Code:
Test IRQ timing:
    Pins IRQ:0 ECHO:1 TRIGGER: 2
    Normal pins: IRQ:3 ECHO:4 TRIGGER: 5
After Attach
After set IOMUXC
After set ICR1
After set ISR1
After set GPIO
Enter cycles per second default(100):
Direct IRQs processed:  200 dt: 1000004 calc:1000000
Normal IRQs processed:  200 dt: 1000005 calc:1000000
Enter cycles per second default(100):
Direct IRQs processed:  200 dt: 1000004 calc:1000000
Normal IRQs processed:  200 dt: 1000005 calc:1000000
Enter cycles per second default(100):
Direct IRQs processed:  2000 dt: 1000040 calc:1000000
Normal IRQs processed:  2000 dt: 1000043 calc:1000000
Enter cycles per second default(1000):

Could show some Logic Analyzer output, to show some differences, but see differences like:
My direct way the delta time between the IO pin changing and my echo was something like: 55-66ns (looking at 500mhz so ...)
The way using attachInterrupt something like: 180-190ns

So again the simple answer is yes you can process pin change interrupts faster, but probably in majority of cases it aint worth it! As this is a really fast processor!
 
Fastest so far (from INPUT pin triggered low to data outputting as shown on scope) is about 75ns with static data and 100ns with actual data (at 900mhz).. the app note for ISR latency and your testing shows I should be able to do a bit better though.. not sure what I'm missing.. although I'm measuring on a scope - looks like you calculated your timings?

Code:
inline void pinChangeISR() {
  if (digitalReadFast(35) == HIGH) {
    GPIO7_GDIR &= ~OUTPUT_MASK; // disable data pins
  } else { // was high now is low
    GPIO7_DR = ((binaryBytes[GPIO6_PSR >> 16] + (binaryBytes[GPIO6_PSR >> 16] << 12)) & OUTPUT_MASK); // actual data
  // GPIO7_DR = ((255 + (255 << 12)) & OUTPUT_MASK); // static data
   
    GPIO7_GDIR |= OUTPUT_MASK; // enable data pins 
  }
  GPIO2_ISR &= ~0b0000'0000'0000'0000'0001'0000'0000'1000; // clear the IRQ
 asm("dsb");
}

..
in setup()
..
  // set gpio to 200mhz ..?

  const unsigned SPEED_MASK      = 0b0000'0000'0000'0000'0000'0000'1100'0001;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_00 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_05 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_06 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_07 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_08 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_09 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_10 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_11 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_12 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_13 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_14 |= SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_00 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_03 |= SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_00 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_01 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_02 |= SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_03 |= SPEED_MASK;

..

   CCM_CCGR0 |= CCM_CCGR0_GPIO2(CCM_CCGR_ON); // should set gpio 2 clock on..?

  attachInterruptVector(IRQ_GPIO2_16_31, &pinChangeISR);
  NVIC_ENABLE_IRQ(IRQ_GPIO2_16_31); 
  NVIC_SET_PRIORITY(IRQ_GPIO2_16_31, 0);

  IOMUXC_GPR_GPR27 &= ~0b0001'0000'0000'0000'0000'0000'0000'0000;

  //GPIO2_ICR1 = 0x00; // set to 0 (edge_sel overrides)
  //GPIO2_GDIR &=  0b0001'0000'0000'0000'0000'0000'0000'0000; 

  GPIO2_GDIR   &=  ~0b0001'0000'0000'0000'0000'0000'0000'0000; // make sure pin 28 is input

  GPIO2_EDGE_SEL |= 0b0001'0000'0000'0000'0000'0000'0000'0000; // pin 28 edge sel


  //GPIO2_ISR = 0xffff; // why setting status register??

  GPIO2_IMR |= 0b0001'0000'0000'0000'0000'0000'0000'0000; // enable (unmask) the interrupt for bit 28

My goal is under 90ns with data, preferably ~70ns..! I'm pretty close as it is..but..the last bit feels like miles away.
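To put those targets in terms the CPU cares about, here is a minimal arithmetic sketch converting a time budget into whole core cycles. `cyclesInBudget` is an illustrative helper, and 900 MHz is simply the clock mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Whole CPU cycles that fit in a time budget.
// cycles = nanoseconds * MHz / 1000, rounded down.
constexpr uint32_t cyclesInBudget(uint32_t nanoseconds, uint32_t coreMHz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(nanoseconds) * coreMHz) / 1000u);
}

static_assert(cyclesInBudget(90, 900) == 81, "90 ns at 900 MHz");
static_assert(cyclesInBudget(70, 900) == 63, "70 ns at 900 MHz");
```

So at 900 MHz the 90 ns goal is about 81 cycles and the 70 ns goal about 63. That budget has to cover interrupt entry, the register reads, the table lookup, the register writes and the pad delay, and a peripheral register read typically costs far more than one core cycle.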
 
Teensy can emulate an eprom at ~80ns so far. Working in my target and my chip reader.

Thanks to everyones help.. woot
 
Holy cow this is interesting I wish I had less projects :D
I need 120ns or faster for my vintage Atari ST/e. :cool:
 
GPIO reads take longer than writes. If you can eliminate as many reads as possible, it will help. If you know the GPIO7_GDIR isn't changed except by the ISR, you can read it once at startup and compute static values to write. Maybe the same with GPIO2_ISR. Also if you can read and save GPIO6_PSR once instead of reading it twice. Could save 10-20 nsec.
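A sketch of the "read once" suggestion, written so it can run on a PC. The value of OUTPUT_MASK is an assumption inferred from the pad list in the thread (D0..D3 on port bits 0..3, D4..D7 on bits 16..19), and `scatterByte` and `portWordFor` are illustrative names. The register snapshot is passed in as a parameter, so the volatile register is read exactly once by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Assumed: D0..D3 on port bits 0..3, D4..D7 on port bits 16..19.
constexpr uint32_t OUTPUT_MASK = 0x000F000Fu;

// Spread one data byte across the two nibble positions of the port word.
// Equivalent to (b + (b << 12)) & OUTPUT_MASK, since the two terms never overlap.
constexpr uint32_t scatterByte(uint8_t b) {
  return (static_cast<uint32_t>(b) | (static_cast<uint32_t>(b) << 12)) & OUTPUT_MASK;
}

// One register snapshot in, one port word out; `rom` stands in for binaryBytes.
inline uint32_t portWordFor(uint32_t psrSnapshot, const uint8_t* rom) {
  const uint32_t address = (psrSnapshot >> 16) & 0x7FFFu;  // A0..A14
  return scatterByte(rom[address]);
}
```

In the ISR this would look like `GPIO7_DR = portWordFor(GPIO6_PSR, binaryBytes);`. A further option, at the cost of RAM, would be to store the already scattered 32-bit words in the table so the ISR does only a lookup.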
 
Though your ISR is likely already cached given that it is probably being executed frequently in your tests, there are some additional tools that may help with speed.

About two thirds of the way down the Teensy 4.1 web page at https://www.pjrc.com/store/teensy41.html are hints about how to guarantee that code and data are placed in predictably fast regions of memory. Look for "Static allocation keywords", particularly FASTRUN. That section of the page describes how statically declared structures are placed in DTCM (data tightly coupled memory) by default. The FASTRUN allocation keyword places executable code (ISRs!) in ITCM (instruction tightly coupled memory).

ITCM and DTCM are inaccessible to other components (DMA, USB, Ethernet, etc) which means the Cortex-M7 core does not need to arbitrate for access to those regions.

Also, it is possible to set interrupt priority for interrupt sources. In the ARM NVIC, higher priority is associated with lower-numbered priority values. There are several examples in the teensy4 directory (I searched for "NVIC_SET_PRIORITY"). If there's no contention for interrupt service, priority won't matter that much. However, setting a high priority (thus a lower number) decreases the probability of your ISR being delayed by some other interrupt source, such as the SYSTICK timer (which fires at 1 kHz, so it happens often enough to be a possible issue in your application).

It's unnecessary to "inline" an ISR, since ISRs are by definition asynchronous to the normal path of execution.

...And congratulations on the 80 ns! I come from a world where I was Sooo proud of writing an ISR that serviced a synchronous serial device (Intel 8274 MPSC) on an 8 MHz 8088 in 125 usec... :)
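On the priority point, a small sketch of how the 8-bit priority value maps to an effective level. It assumes this chip implements only the upper 4 bits of each NVIC priority byte (16 levels) and that the default priority grouping is in use. Both are assumptions to verify. `PRIORITY_BITS`, `effectiveLevel` and `canPreempt` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: only the top 4 bits of the 8-bit priority byte are implemented.
constexpr unsigned PRIORITY_BITS = 4;

// Level actually seen by the NVIC; 0 is the most urgent.
constexpr uint8_t effectiveLevel(uint8_t priority) {
  return static_cast<uint8_t>(priority >> (8u - PRIORITY_BITS));
}

// An interrupt preempts a running handler only if its level is strictly lower.
constexpr bool canPreempt(uint8_t newcomer, uint8_t running) {
  return effectiveLevel(newcomer) < effectiveLevel(running);
}
```

Under that assumption, values such as 0 and 1 land on the same level, so neither handler can preempt the other. To make one ISR outrank another, the values need to differ by at least 16.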
 
Thanks, I'll definitely take this into account.

I assumed with reading GPIO6_PSR twice within a single operation that the compiler would just optimize that out - but I haven't checked any assembly - I did originally try reading it into a variable and saving it, I forget whether it saved time or not..

As for GPIO7_GDIR, yes, it's only changed in the ISR, at least the bits that apply - however, I'm never reading this register anywhere, just setting it in order to gate the data pins - my particular target requires this as data0-7/address0-7 are shared - address fed to latch, latched, OE# pulled low to enable eprom (teensy) outputs, then data read by target.

Also, GPIO2_ISR isn't read anywhere either - it's written to clear the interrupt flag, if this isn't done, the interrupt loops.

I'm definitely open to trying just about anything to save a few ns though - while I've gotten it to work, this is while doing *nothing* else with the T4.1 at the moment, which sort of defeats the purpose (emulator kinda needs to be able to be updated in real time, at least for my application)
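Since the interrupt status register is write-1-to-clear, the way the clearing value is formed changes which flags get cleared. Below is a tiny PC-side model of such a register. `W1cRegister` and the three helper functions are hypothetical names, not the real peripheral. It compares a plain write, `|=` and `&= ~`.

```cpp
#include <cassert>
#include <cstdint>

// Tiny model of a write-1-to-clear status register (a stand-in for GPIO2_ISR).
struct W1cRegister {
  uint32_t flags = 0;
  uint32_t read() const { return flags; }
  void write(uint32_t value) { flags &= ~value; }  // each 1 written clears that flag
};

// reg = mask;  clears only the flags named in mask, with no read.
inline void clearWithAssign(W1cRegister& r, uint32_t mask) { r.write(mask); }

// reg |= mask;  reads first, so every pending flag is written back as 1 and cleared.
inline void clearWithOr(W1cRegister& r, uint32_t mask) { r.write(r.read() | mask); }

// reg &= ~mask;  clears every pending flag EXCEPT the ones named in mask.
inline void clearWithAndNot(W1cRegister& r, uint32_t mask) { r.write(r.read() & ~mask); }
```

In this model only the plain write clears exactly the intended flag, and it also avoids the register read.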
 
I assumed with reading GPIO6_PSR twice within a single operation that the compiler would just optimize that out
It's declared volatile, which tells the compiler to not optimize it out but read it every time.

As for GPIO7_GDIR, yes, it's only changed in the ISR, at least the bits that apply - however, I'm never reading this register anywhere, just setting it in order to gate the data pins

The &= and |= will do a read/write sequence. A plain = will only do a write.
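A minimal sketch of how the read could be moved out of the ISR: snapshot the direction register once at startup and precompute the two values the ISR will ever need. `DirectionWords` and `precomputeDirection` are illustrative names, and the approach assumes nothing else changes that port's direction bits at run time.

```cpp
#include <cassert>
#include <cstdint>

// The two direction-register values the ISR alternates between.
struct DirectionWords {
  uint32_t asOutput;  // bus pins driven
  uint32_t asInput;   // bus pins released (hi-Z)
};

// Compute both values once from a startup snapshot of the register.
constexpr DirectionWords precomputeDirection(uint32_t gdirAtStartup, uint32_t busMask) {
  return DirectionWords{ gdirAtStartup | busMask, gdirAtStartup & ~busMask };
}
```

With `words = precomputeDirection(GPIO7_GDIR, OUTPUT_MASK)` evaluated in setup(), the ISR can then use `GPIO7_GDIR = words.asOutput;` and `GPIO7_GDIR = words.asInput;`, which are writes only.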
 
I saw FASTRUN somewhere before but couldn't get it to work in my brief testing - but I think I did it wrong (the definition link in my IDE didn't immediately pop up, leading me to think it was undefined, thus I tried to define it again, I just checked again and waited longer for the IDE to figure it out and it is indeed defined) .. definitely will give this a shot.

I wasn't sure if I was making use of the TCM features, figured it might help if I was though.. still getting my head around this MCU but definitely made some progress in this last week..

I have attempted to set the priority, it didn't seem to help with glitching or latency.. this is basically my setup right now..

Changing the GPIO speed & slew rate didn't seem to change anything noticeably either..?

Code:
inline void pinRising() {
    GPIO7_GDIR &= ~OUTPUT_MASK; // disable data pins
    GPIO2_ISR |= 0b0000'0000'0001'0000'0000'0000'0000'0000; // clear the IRQ: "status flags are cleared by writing a 1 to the corresponding bit position"
    asm("dsb");    
}

inline void pinFalling() {
    GPIO7_DR = ((binaryBytes[GPIO6_PSR >> 16] + (binaryBytes[GPIO6_PSR >> 16] << 12)) & OUTPUT_MASK);
    GPIO7_GDIR |= OUTPUT_MASK; // enable data pins 
    GPIO2_ISR |= 0b0001'0000'0000'0000'0000'0000'0000'1000; // clear the IRQ: "status flags are cleared by writing a 1 to the corresponding bit position"
    asm("dsb");    
}

void setup() {
    set_arm_clock(995000000); // 995 MHz
    GPIO7_GDIR |= OUTPUT_MASK; // enable data pins as output

  // try and set GPIO pads to highest speed
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_00 = SPEED_MASK; // address 0-14
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_01 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_02 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_03 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_04 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_05 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_06 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_07 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_08 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_09 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_10 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_11 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_12 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_13 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B1_14 = SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_00 = SPEED_MASK; // data 0-7
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_01 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_02 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_03 = SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_00 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_01 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_02 = SPEED_MASK;
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_03 = SPEED_MASK;

  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_11 = SPEED_MASK; // oe
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B1_12 = SPEED_MASK; // oe
  
  CCM_CCGR0 |= CCM_CCGR0_GPIO2(CCM_CCGR_ON); // should set gpio 2 clock on

  attachInterruptVector(IRQ_GPIO2_16_31, &pinFalling);
  NVIC_ENABLE_IRQ(IRQ_GPIO2_16_31);

  attachInterruptVector(IRQ_GPIO2_0_15, &pinRising);
  NVIC_ENABLE_IRQ(IRQ_GPIO2_0_15); 

  NVIC_SET_PRIORITY(IRQ_GPIO2_16_31, 0);
  NVIC_SET_PRIORITY(IRQ_GPIO2_0_15, 1);

  IOMUXC_GPR_GPR27 &= ~0b0001'0000'0000'0000'0000'1000'0000'0000;
  
  GPIO2_ICR1 =  0b1010'1010'1010'1010'1010'1010'1010'1010; // rising edge for EVERYONE (pins 0-15, 2 bits per pin)
  GPIO2_ICR2 =  0b1111'1111'1111'1111'1111'1111'1111'1111; // falling edge for EVERYONE (pins 16-31, 2 bits per pin)

  GPIO2_IMR |= 0b0001'0000'0000'0000'0000'1000'0000'0000; // unmask (enable) interrupts for 2.28, 2.11
}
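
For reference, a host-side sketch of how the GPIO2_ICR1/GPIO2_ICR2 values above break down. It assumes the usual i.MX RT encoding of two configuration bits per pin (ICR1 for pins 0-15, ICR2 for pins 16-31; 0 = low level, 1 = high level, 2 = rising edge, 3 = falling edge). The helper names are illustrative only and not part of any library.

```cpp
#include <cstdint>

enum class Trigger : uint32_t { LowLevel = 0, HighLevel = 1, RisingEdge = 2, FallingEdge = 3 };

// Each ICR register holds sixteen 2-bit fields; pin n (0-31) uses field n % 16.
constexpr uint32_t icrField(unsigned pin, Trigger t) {
    return static_cast<uint32_t>(t) << ((pin % 16) * 2);
}

constexpr uint32_t icrFieldMask(unsigned pin) {
    return 3u << ((pin % 16) * 2);
}

// Return `icr` with only the field for `pin` changed, other pins untouched.
constexpr uint32_t withTrigger(uint32_t icr, unsigned pin, Trigger t) {
    return (icr & ~icrFieldMask(pin)) | icrField(pin, t);
}
```

With helpers like these, only the two OE pins would need touching, e.g. `GPIO2_ICR1 = withTrigger(GPIO2_ICR1, 11, Trigger::RisingEdge);` and `GPIO2_ICR2 = withTrigger(GPIO2_ICR2, 28, Trigger::FallingEdge);`, instead of setting every pin on the port.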


It's unnecessary to "inline" an ISR, since ISRs are by definition asynchronous to the normal path of execution.

Wasn't sure about this..been a bit confused about inline, seems like many other people are as well, thus not finding the best information on it, also haven't dug too deep into that.

...And congratulations on the 80 ns! I come from a world where I was Sooo proud of writing an ISR that serviced a synchronous serial device (Intel 8274 MPSC) on an 8 MHz 8088 in 125 usec... :)

Thanks! I wouldn't have been able to do it without the help of you guys, PJRC for creating the teensy 4.1 and a guy on my discord who's already made an emulator for this target (but using external memory that the ECU/target read directly, the MCU just updating that ram so just have to make sure you're not writing the same byte it's reading probably).

I was looking at some other stuff in the manual that I thought might be usable for this, i.e. memory controller/DMA/FlexIO stuff. There's an example for implementing an 8080 bus, but I'm not sure how feasible any of those ideas are. It seems like FlexIO is just a shifter setup, which doesn't help for byte-by-byte parallel; if I were clocking in a set number of bits/bytes over parallel, maybe, but I never know what the next address will be. DMA could *maybe* be triggered by OE# somehow, writing the input address to a pointer to the data and triggering another DMA to do the output, but that feels a lot more complicated and crazy. While I originally thought it might be the way to go, I'm glad I was steered away from that: definitely more complicated, and no guarantee the DMA can respond any faster.. but at least DMA wouldn't contend with other interrupts.

As far as using a memory controller, perhaps I could make it write out the bytes tricking it into thinking it's writing to some parallel sram or something. Not even sure this is possible though, but maybe as a 15-bit address 8-bit data NOR..?

https://www.nxp.com/docs/en/application-note/AN12051.pdf

i.MXRT introduces a new IP, SEMC, to external memory interfaces. SEMC is a multi-standard memory controller optimized for both high-performance and low pin-count. SEMC supports multiple external memories (SDRAM, Raw NAND, PNOR, PSRAM and 8080 display) in the same application with shared address and data pins. Table 4 shows the SEMC pin mux usage for different external memories and 8080 display interface

But even if the MCU could do it, the pins I'd need don't seem to be exposed. I guess SEMC pins map to EMC pins in ALT0 - I'd need at least EMC_01 -> EMC_07, EMC_30 -> EMC_36 .. Teensy exposes EMC_04:08,22,24:29,31,32,36,37.. And still, I'd be tricking the SEMC, but it is an interesting idea..?
 
It's declared volatile, which tells the compiler to not optimize it out but read it every time.

Thanks, I admit my c/c++ is a little rusty -- I thought GPIO6_PSR was just a macro to the memory address, not a defined variable..?

The &= and |= will do a read/write sequence. A plain = will only do a write.

Hm, you make a point (again rusty).. For GPIO2_ISR, I could probably overwrite no problem, however, for GPIO7_GDIR, I only want to set certain bits to output without touching the others. At this point, I guess I'm sure nothing else needs to be an output so for now I suppose I can overwrite, but in the future this might not be the case.. any suggestions there?
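
One possible answer, sketched on the host rather than on the Teensy: work out both GDIR values once (data pins as outputs, data pins as inputs) from whatever the other pins need, then do a plain `=` write in each ISR. OUTPUT_MASK's value is an assumption here (data pins on GPIO7 bits 0-3 and 16-19, which is not shown in the post), and `gdirDataOut`/`gdirDataIn` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumed value: data pins on GPIO7 bits 0-3 and 16-19.
constexpr uint32_t OUTPUT_MASK = 0x000F000F;

// Hypothetical helpers: derive both GDIR values from the direction bits
// that every other pin on the port should keep.
constexpr uint32_t gdirDataOut(uint32_t otherDirs) { return otherDirs | OUTPUT_MASK; }
constexpr uint32_t gdirDataIn(uint32_t otherDirs)  { return otherDirs & ~OUTPUT_MASK; }

// On the Teensy this might look like (untested):
//   static uint32_t dirOut, dirIn;
//   setup():      dirOut = gdirDataOut(GPIO7_GDIR); dirIn = gdirDataIn(GPIO7_GDIR);
//   pinFalling(): GPIO7_GDIR = dirOut;   // single write, no read
//   pinRising():  GPIO7_GDIR = dirIn;    // single write, no read
```

The catch is that the cached values go stale if anything else changes a direction bit on the same port later (a pinMode() call, say), so they would need recomputing after any such change.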
 
Holy cow this is interesting I wish I had less projects :D
I need 120ns or faster for my vintage Atari ST/e. :cool:

I know the feeling, I've put other stuff behind this week because I've been obsessing over this. Truth be told, I've wanted to build an emulator for a very long time.

You should definitely be able to get up and running with your Atari and a Teensy. I was definitely thinking that another use could be for gaming systems and the like.
 
Though your ISR is likely already cached given that it is probably being executed frequently in your tests, there are some additional tools that may help with speed.

About two thirds of the way down the Teensy 4.1 web page at https://www.pjrc.com/store/teensy41.html are hints about how to guarantee that code and data are placed in predictably fast regions of memory. Look for "Static allocation keywords", particularly FASTRUN. That section of the page describes how the default memory allocation of statically declared structures is in DTCM (data tightly coupled memory). The FASTRUN allocation keyword places executable code (ISRs!) in ITCM (instruction tightly coupled memory).

FASTRUN didn't seem to make a difference. :(
 
It's declared volatile, which tells the compiler to not optimize it out but read it every time.


Code:
    register uint32_t address = GPIO6_PSR >> 16;
    GPIO7_DR = ((binaryBytes[address] + (binaryBytes[address] << 12)) & OUTPUT_MASK);

vs.

Code:
    GPIO7_DR = ((binaryBytes[GPIO6_PSR >> 16] + (binaryBytes[GPIO6_PSR >> 16] << 12)) & OUTPUT_MASK);

does seem a little bit faster now (down to maybe 60ns?), thought I tested it before with adverse effects but I was also timing using cycle counter at that point..now I'm only trusting the scope :)
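
For anyone following along, here is a host-side sketch of what that one-liner computes once the port has been read a single time. OUTPUT_MASK = 0x000F000F is an assumption (data pins on GPIO7 bits 0-3 and 16-19), the 0x7FFF mask assumes a 32 KB table with bit 31 as a don't-care, and both function names are made up for illustration.

```cpp
#include <cstdint>

// Assumed value: low nibble on GPIO7 bits 0-3, high nibble on bits 16-19.
constexpr uint32_t OUTPUT_MASK = 0x000F000F;

// Spread one data byte across the two nibble groups. The two terms never
// overlap (bits 0-7 vs bits 12-19), so + behaves like | here.
constexpr uint32_t spreadByte(uint8_t b) {
    return (static_cast<uint32_t>(b) + (static_cast<uint32_t>(b) << 12)) & OUTPUT_MASK;
}

// Turn one snapshot of the address port (address on bits 16-30) into a
// 15-bit table index, dropping the don't-care top bit.
constexpr uint32_t addressFromPort(uint32_t portSnapshot) {
    return (portSnapshot >> 16) & 0x7FFF;
}
```

Since spreadByte depends only on the byte, the spread value could also be precomputed per table entry, which would leave the ISR with one port read, one table lookup and one write. Whether that is measurably faster is something only the scope can say.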
 
Is it just me or shouldn't the slew rate seem a bit tighter?

View attachment 25096

25ns/div .. taking like 25ns to change states..? Slope gets slightly steeper without a load on the pin but not much..? I can get it down to about 20ns..?

From https://www.nxp.com/docs/en/nxp/data-sheets/IMXRT1060CEC.pdf page 38/39 - the slowest 'output pad transition time' listed is ~5ns .. input transition time is max 25ns..?

shrug..

Also having a weird issue now.. the ISR hangs unless I clear bits 4 and 28? I should only have to clear 28?

Code:
// in pinFalling

GPIO2_ISR |= 0b0001'0000'0000'0000'0000'0000'0000'1000; // clear the IRQ ("status flags are cleared by writing a 1 to the corresponding bit position")

Code:
// in pinRising

GPIO2_ISR |= 0b0000'0000'0001'0000'0000'0000'0000'0000; // clear the IRQ ("status flags are cleared by writing a 1 to the corresponding bit position")

seems to work for pin 10.. also getting confused because sometimes it seems like the bits count from the left, other times from the right?
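
On the left/right confusion: in a C++ binary literal the rightmost digit is always bit 0, the same numbering the reference manual uses for register bits. A small host-side sketch that decodes the literals used in this thread (the `bit` helper is just an illustrative name):

```cpp
#include <cstdint>

// Mask with only bit n set, counting from the right starting at 0.
constexpr uint32_t bit(unsigned n) { return uint32_t{1} << n; }

// The IMR value from setup(): GPIO2.28 and GPIO2.11.
static_assert(0b0001'0000'0000'0000'0000'1000'0000'0000 == (bit(28) | bit(11)), "IMR mask");

// The literal written in pinFalling: bits 28 and 3.
static_assert(0b0001'0000'0000'0000'0000'0000'0000'1000 == (bit(28) | bit(3)), "pinFalling mask");

// The literal written in pinRising: bit 20, which is bit 11 counted from the left.
static_assert(0b0000'0000'0001'0000'0000'0000'0000'0000 == bit(20), "pinRising mask");
static_assert(bit(31 - 11) == bit(20), "mirror image of bit 11");
```

So if GPIO2.11 is the bit meant in pinRising, the literal as written points at its mirror image. Writing masks as shifts, e.g. `(1u << 11)`, sidesteps the counting entirely.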

what's the best way to clear the ISR? I thought clearing in this state originally was setting to 0, but that caused hangs and in the manual it says "When the active condition has been detected, the corresponding bit remains set until cleared by software. Status flags are cleared by writing a 1 to the corresponding bit position." ..

you'd think setting all to 1 would clear all but this seems to hang as well..? hmm setting it to 0xFFFFFFFF in pinFalling seems okay but in pinRising causes an issue heh
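
A host-side model of a write-1-to-clear register may help explain some of this. It is only a sketch of the behaviour the manual quote describes, with made-up function names, not a description of the real silicon's timing. The point it illustrates: `|=` reads the pending flags and writes them all back, so it clears every pending flag whatever mask is given, while a plain `=` clears only the bits named.

```cpp
#include <cstdint>

constexpr uint32_t bit(unsigned n) { return uint32_t{1} << n; }

// Model of a write-1-to-clear status register: each 1 written clears that flag.
constexpr uint32_t w1cWrite(uint32_t pending, uint32_t written) {
    return pending & ~written;
}

// ISR |= mask  ->  read the pending flags, OR in the mask, write the result back.
constexpr uint32_t afterOrAssign(uint32_t pending, uint32_t mask) {
    return w1cWrite(pending, pending | mask);
}

// ISR = mask  ->  write only the mask.
constexpr uint32_t afterAssign(uint32_t pending, uint32_t mask) {
    return w1cWrite(pending, mask);
}
```

In this model, `|=` with the "wrong" bit still clears the flag that fired, which may be why the code above appears to work, and it can also wipe a flag belonging to the other edge that arrived in the meantime. A plain `GPIO2_ISR = (1u << 28);` in the falling-edge handler and `GPIO2_ISR = (1u << 11);` in the rising-edge handler would touch only the intended flag, assuming those are the right bits for the two OE pins. If the mask misses the pending bit, the flag stays set and the ISR re-enters immediately, which looks like a hang.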

~40ns now @995mhz .. not bad.. 70ns at stock clock.. works with my target at stock clock..

now I gotta see if I can shave off any more (think I'm getting close to the limit) and try and implement the other features that make this usable (like actually modifying the emulated data heh)
 