Teensy 4.1 PSRAM Random Access Latency

nherzing

My question concerns the latency limits of the PSRAM on the Teensy 4.1 for random access.

I was hoping to achieve worst-case sub-microsecond random-access reads of a single byte from the PSRAM. Most of the time my reads are well below that limit, but occasionally a read takes particularly long. My understanding from reading the datasheets and startup.c is that caching, prefetching, and other complicating factors make it non-trivial to tell how long a worst-case random read should take.

My question (for someone who understands this far deeper than I do) is two-fold:

1. Are worst case sub-microsecond tiny random access reads from PSRAM feasible?
2. If so, any suggestions on how one might achieve this? I'm guessing it would require a rewrite of the FlexSPI configuration to optimize short, random reads?

I wrote a little script to get basic timing data on sequential vs random access reads. Hopefully it's helpful to understand the types of reads I'm talking about. You can see that, on average, the random access reads are well below a microsecond but there are outliers in there that are causing issues for my use case (interfacing with an externally clocked system). You can also see that sequential access is an order of magnitude faster than random access on the PSRAM.

I'm a software dev with little hardware experience so sorry if I'm missing something obvious. Happy to provide any additional details that might be helpful. Thanks!

Code:
// Output running locally for 10000 reads:
//Random access
//Fast RAM: 221 us
//PSRAM: 6950 us
//Sequential access
//Fast RAM: 202 us
//PSRAM: 319 us

EXTMEM uint8_t extmem_data[0x10000];
uint8_t data[0x10000];

uint32_t memory_test() {
  while (!Serial) ;
  uint32_t result[10]; // prevent optimizing the loops away

  uint32_t num_iters = 10000;
  uint32_t num_samples = num_iters;  // one random index per timed read
  // malloc() takes a size in bytes, so scale by sizeof(uint32_t)
  uint32_t *idxes = (uint32_t *)malloc(num_samples * sizeof(uint32_t));
  if (!idxes) return 0;

  // fill in random indices to access
  for (uint32_t i = 0; i < num_samples; i++) {
    idxes[i] = random(0x10000);
  }

  Serial.println("Random access");
  elapsedMicros took = 0;
  for (uint32_t i = 0; i < num_iters; i++) {
    result[i % 10] = data[idxes[i]];
  }
  unsigned long res = took;
  Serial.printf("Fast RAM: %lu\n", res);

  took = 0;
  for (uint32_t i = 0; i < num_iters; i++) {
    result[i % 10] = extmem_data[idxes[i]];
  }
  res = took;
  Serial.printf("PSRAM: %lu\n", res);

  Serial.println("Sequential access");
  took = 0;
  for (uint32_t i = 0; i < num_iters; i++) {
    result[i % 10] = data[i];
  }
  res = took;
  Serial.printf("Fast RAM: %lu\n", res);

  took = 0;
  for (uint32_t i = 0; i < num_iters; i++) {
    result[i % 10] = extmem_data[i];
  }
  res = took;
  Serial.printf("PSRAM: %lu\n", res);

  free(idxes);
  return result[5];
}

void setup() {
  memory_test();
}

void loop() {
  // put your main code here, to run repeatedly:

}
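The totals in the output comment can be turned into per-read averages. A small host-side C++ check, where the helper names are hypothetical and 600 MHz is assumed as the core clock:

```cpp
#include <cassert>
#include <cstdint>

// Average time per read in nanoseconds, from a total elapsedMicros reading.
// This includes the loop overhead, not just the memory access itself.
double ns_per_read(uint32_t total_us, uint32_t reads) {
  assert(reads > 0);
  return total_us * 1000.0 / reads;
}

// Convert nanoseconds to CPU cycles at a given core clock in Hz.
double ns_to_cycles(double ns, double cpu_hz) {
  return ns * cpu_hz / 1e9;
}
```

With the numbers above, random PSRAM reads average about 695 ns (roughly 417 cycles at 600 MHz) versus about 22 ns for fast RAM, and sequential PSRAM reads average about 32 ns. An average like this says nothing about the worst case, which is what matters for this use case.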
 
There are tests posted that invalidate the cache, which would let you see the true uncached timing behavior. I believe that was in Paul's PSRAM test sketch, posted on the forum and on his GitHub.

There is a 32KB cache used across the external pad chips (PSRAM or flash) and the upper 256KB of on-chip RAM2 in the 1062. Perhaps it also covers the 'boot' flash where code resides?

The interface to the QSPI PSRAM is controlled by the 1062 processor. It chooses what resides in the cache over time and also how much to read ahead when a byte is requested. The block size or read-ahead helps sequential access, because subsequent bytes will already be local to the 1062. With random access, where only part of each block read is used before the next block is requested, perhaps the next read is delayed until the prior read completes in some fashion? Perhaps the amount of read-ahead is specified when the chip is initialized?
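To make the read-ahead argument concrete: with sequential reads, one line fill serves many following reads, while random reads spread over a large buffer almost always touch a new line. A deliberately simplified host-side C++ sketch that counts cold line fills, assuming 32-byte lines (the Cortex-M7 data-cache line size) and a cache that never evicts. It ignores FlexSPI prefetching and cache associativity, and `count_line_fills` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Simplified model: count how many cache-line fills a sequence of byte reads
// would cause, assuming an initially empty cache that never evicts anything
// (so only "cold" misses are counted).
size_t count_line_fills(const uint32_t* addrs, size_t n, uint32_t line_bytes = 32) {
  assert(line_bytes > 0);
  std::unordered_set<uint32_t> lines;
  size_t fills = 0;
  for (size_t i = 0; i < n; i++) {
    uint32_t line = addrs[i] / line_bytes;   // which line this byte falls in
    if (lines.insert(line).second) fills++;  // first touch of a line => fill
  }
  return fills;
}
```

Reading 256 consecutive bytes costs only 8 fills in this model, whereas 256 random reads over a 64KB buffer (2048 lines) will mostly land on distinct lines and pay nearly one fill each.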

Not having dealt with the low-level init myself, I expect others may have the answer; otherwise, comparing the code called in startup.c against the 1062 Reference Manual should provide it.

Edit: PJRC PSRAM test code: github.com/PaulStoffregen/teensy41_psram_memtest/blob/master/teensy41_psram_memtest.ino (it uses arm_dcache_flush_delete())
 
You should use the cycle counter ARM_DWT_CYCCNT for measuring such short times.

Those tests should be done with interrupts disabled. If you're occasionally seeing a much longer time, it could be due to an interrupt occurring during the measurement.
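Because ARM_DWT_CYCCNT is a free-running 32-bit counter, computing end minus start with unsigned 32-bit arithmetic gives the right answer even if the counter wraps once between the two readings. On the Teensy that is simply `uint32_t dt = ARM_DWT_CYCCNT - start;` taken between cli() and sei(). The arithmetic can be checked on a PC with a small C++ sketch (the function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Cycles elapsed between two readings of a 32-bit cycle counter.
// Unsigned subtraction is modulo 2^32, so one wraparound is handled correctly.
uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
  return end - start;
}

// Seconds until a 32-bit cycle counter wraps at a given CPU clock in Hz.
double wrap_period_seconds(double cpu_hz) {
  assert(cpu_hz > 0);
  return 4294967296.0 / cpu_hz;
}
```

At 600 MHz the counter wraps about every 7.16 seconds, which is irrelevant for sub-microsecond measurements like these.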
 

Good point, Paul. That's twice I've skipped mentioning the really cool ARM_DWT_CYCCNT, since I didn't look at the posted code before giving the generic info.

See this post for an example of its use if needed: pjrc.com/threads/62385-Timer-interrupts-gt-1MHz
On the T_4.1, ARM_DWT_CYCCNT is already running.
 
It will be interesting to see how well your numbers come out.

As mentioned, I believe the same cache is used for more or less everything other than the ITCM/DTCM areas of memory.

It will be good to see whether you can determine when you are getting cache hits and misses.

A secondary note: I do know that DMA out of this region, for example, is slower than DMA from either PROGMEM or DMAMEM.

In another thread, while I was trying to narrow down a bug that turned out to be in the SPI library, I was doing DMA operations to output an image from memory to an RA8876.
I tested output from flash memory, then DMAMEM and PSRAM. I could not output the image from lower memory, as it would not fit.

Here is a logic analyzer capture showing the outputs. I did DMAMEM twice to see whether, after the first output, it would fix the data for the second (it did not).
[Image: logic analyzer capture of the four DMA outputs]


All four output groups shown do the same output; the only difference is the pointer to the data. As you can see, the last one, from PSRAM, took a lot longer to finish.
Note: the image is 243800*2 bytes long.
 
@KurtE, et al.

Just remembered something that may affect PSRAM performance compared to DMAMEM or PROGMEM. Right now the PSRAM clock defaults to 88 MHz in startup.c. However, we tested it at 132 MHz and it seems to work without issue. To change the PSRAM clock, add this to your setup before accessing the PSRAM; it may make a difference:
Code:
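	  // Set the FlexSPI2 (PSRAM) clock to ~132.9 MHz
	  // Gate the FlexSPI2 clock off while changing it (CCM_CCGR_OFF is 0, so "|=" with it would be a no-op)
	  CCM_CCGR7 &= ~CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
	  CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
		  | CCM_CBCMR_FLEXSPI2_PODF(4) | CCM_CBCMR_FLEXSPI2_CLK_SEL(2); // 664.62/5 = ~132.9 MHz
	  CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);  // gate the clock back on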
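The divider arithmetic in this snippet can be checked off-target. A host-side C++ sketch, assuming the CCM_CBCMR field layout used by the clock-printing code later in this thread (CLK_SEL in bits 9:8, PODF in bits 31:29) and its source-clock table of {396, 720, 664.62, 528} MHz; the helper names are hypothetical. With that table, CLK_SEL(2) selects the 664.62 MHz source, so PODF(4) (divide by 5) gives about 132.9 MHz, while the 88 MHz default is reproduced by CLK_SEL(3) with PODF(5), i.e. 528/6. (528 MHz divided by 5 would be 105.6 MHz.)

```cpp
#include <cassert>
#include <cstdint>

// CCM_CBCMR FlexSPI2 fields: CLK_SEL = bits 9:8, PODF (divider minus one) = bits 31:29.
constexpr uint32_t kClkSelShift = 8;
constexpr uint32_t kClkSelMask = 0x3u << kClkSelShift;
constexpr uint32_t kPodfShift = 29;
constexpr uint32_t kPodfMask = 0x7u << kPodfShift;

// Source clocks in MHz for CLK_SEL = 0..3 (table taken from this thread's code).
constexpr double kSourceMhz[4] = {396.0, 720.0, 664.62, 528.0};

// Return cbcmr with the FlexSPI2 fields replaced, mirroring the read-modify-write above.
uint32_t set_flexspi2_fields(uint32_t cbcmr, uint32_t podf, uint32_t clk_sel) {
  cbcmr &= ~(kPodfMask | kClkSelMask);
  return cbcmr | ((podf << kPodfShift) & kPodfMask) |
         ((clk_sel << kClkSelShift) & kClkSelMask);
}

// FlexSPI2 clock in MHz for a given CCM_CBCMR value.
double flexspi2_mhz(uint32_t cbcmr) {
  uint32_t sel = (cbcmr & kClkSelMask) >> kClkSelShift;
  uint32_t podf = (cbcmr & kPodfMask) >> kPodfShift;
  return kSourceMhz[sel] / (podf + 1);
}
```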
 
Thanks everyone for the helpful suggestions. I dug in a bit more to understand the performance of different access patterns and to understand how data gets fetched from PSRAM.

It seems data is fetched in 32 byte bursts from the PSRAM. This lines up with the data sheet for one of the compatible PSRAM chips. You can observe this by looking at timing of sequential reads. Every 32nd read takes about 350 cycles (per ARM_DWT_CYCCNT) but subsequent reads of the next 31 bytes take only 11 cycles.
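These sequential numbers fit a simple average: one read per 32-byte line pays the ~350-cycle fill and the other 31 cost ~11 cycles each. A minimal host-side C++ sketch of that back-of-the-envelope model (a hypothetical helper using the figures reported here, not a model of the actual FlexSPI or cache hardware):

```cpp
#include <cassert>
#include <cstdint>

// Average cycles per byte for sequential reads when one read per line pays
// the line-fill cost and the remaining reads in that line hit the cache.
double avg_seq_cycles(double miss_cycles, double hit_cycles, uint32_t line_bytes = 32) {
  assert(line_bytes > 0);
  return (miss_cycles + (line_bytes - 1) * hit_cycles) / line_bytes;
}
```

That gives about 21.6 cycles per byte on average, against roughly 350 for a random read that misses every time.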

Looking at random access reads, I expected to see worst case 350 cycle reads since it would rarely hit the cache. Instead, I frequently saw up to double that. I don't know what explains this. Any thoughts?

In my quest for the fastest, worst case random access read, I ended up handwriting code to manually use the IP Command interface (Section 27.5.9). This gave me the most consistent timing for random reads at around 330 cycles. Almost all of the time (~290) is spent spin waiting for FLEXSPI_INTR_IPRXWA. There's obviously a correlation between that time and the PSRAM clock. Any suggestions on improving this or speeding it up would be appreciated.

This is sadly still too slow for my application of using the Teensy to emulate a banked memory controller clocked at 1MHz. I may need a different approach.

My full code (the timing code and the manual FlexSPI code) is included below.


Code:
#include <Arduino.h>

EXTMEM uint8_t extmem_data[0x10000];

inline static uint8_t flexspi2_read(uint32_t addr) {
  FLEXSPI2_IPCR0 = addr;
  FLEXSPI2_IPCR1 = FLEXSPI_IPCR1_ISEQID(5);
  
  FLEXSPI2_IPCMD = FLEXSPI_IPCMD_TRG;

  while (!(FLEXSPI2_INTR & FLEXSPI_INTR_IPRXWA)) ;

  uint32_t data = FLEXSPI2_RFDR0;
  FLEXSPI2_INTR = FLEXSPI_INTR_IPCMDDONE | FLEXSPI_INTR_IPRXWA;
  return data;  
}

uint32_t spi_test(int num_iters) {
  uint8_t result[10];
  volatile uint32_t cycles = 0;

  // malloc() takes bytes: allocate one uint32_t index per iteration
  uint32_t *idxes = (uint32_t *)malloc(num_iters * sizeof(uint32_t));
  if (!idxes) return 0;

  // fill in random indices to access
  for (int i = 0; i < num_iters; i++) {
    idxes[i] = random(0x10000);
  }  

  for (int i = 0; i < num_iters; i++) {
    cli();  // no interrupts during the timed read
    cycles = ARM_DWT_CYCCNT;

    result[i % 10] = flexspi2_read(idxes[i]);

    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();  // re-enable interrupts before printing over USB serial
    Serial.printf("%d: (%d) %d\n", i, idxes[i], res);
  }  
  Serial.println("Done!");

  free(idxes);
  return result[5];
}

uint32_t rand_test(uint8_t *data, int num_iters) {
  uint8_t result[10];
  volatile uint32_t cycles = 0;

  // malloc() takes bytes: allocate one uint32_t index per iteration
  uint32_t *idxes = (uint32_t *)malloc(num_iters * sizeof(uint32_t));
  if (!idxes) return 0;

  // fill in random indices to access
  for (int i = 0; i < num_iters; i++) {
    idxes[i] = random(0x10000);
  }  

  for (int i = 0; i < num_iters; i++) {
    cli();  // no interrupts during the timed read
    cycles = ARM_DWT_CYCCNT;

    result[i % 10] = data[idxes[i]];

    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();  // re-enable interrupts before printing over USB serial
    Serial.printf("%d: (%d) %d\n", i, idxes[i], res);
  }  
  Serial.println("Done!");

  free(idxes);
  return result[5];
}

uint32_t seq_test(uint8_t *data, int num_iters) {
  uint8_t result[10];
  volatile uint32_t cycles = 0;

  for (int i = 0; i < num_iters; i++) {
    cli();  // no interrupts during the timed read
    cycles = ARM_DWT_CYCCNT;
    result[i % 10] = data[i];
    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();  // re-enable interrupts before printing over USB serial

    Serial.printf("%d: %d\n", i, res);
  }  
  Serial.println("Done!");

  return result[5];
}

void setup() {
  while (!Serial) ;

  const float clocks[4] = {396.0f, 720.0f, 664.62f, 528.0f};
  const float frequency = clocks[(CCM_CBCMR >> 8) & 3] / (float)(((CCM_CBCMR >> 29) & 7) + 1);
  Serial.printf("CCM_CBCMR=%08X (%.1f MHz)\n", CCM_CBCMR, frequency);

  for (int i = 0; i < 0x10000; i++) {
    extmem_data[i] = i;
  }
  
  Serial.println("Seq Test");
  arm_dcache_flush_delete((void *)extmem_data, 0x10000);  
  seq_test(extmem_data, 0x100);

  Serial.println("Rand Test");
  arm_dcache_flush_delete((void *)extmem_data, 0x10000);  
  rand_test(extmem_data, 0x100);

  Serial.println("SPI TEST");
  arm_dcache_flush_delete((void *)extmem_data, 0x10000);    
  spi_test(0x100);
}

void loop() {}
 

Thanks Mike,

I tried it with the test program that outputs an image using DMA to the RA8876, and it appears to work.
Before the change, that page took about 279 ms to output the image; with the change, it dropped to 222 ms.
Note: the time to output from FLASHMEM and PROGMEM is about 115 ms.
 
@KurtE
Thanks, Kurt. At least it helped performance. I wonder if there is anything else that could speed up the PSRAM?


@nherzing ....

I reset the PSRAM clock to 132 MHz per post #6, and it reduced the cycles from 331 down to 271.

Code:
Seq Test:  ~11 cycles, ~250 cycles on 32-byte boundaries
Rand Test: ~9 to 464 cycles
SPI Test:  max 271 cycles, average roughly 269 (estimated)
 

The first read's 32-byte burst has to complete before the second random read can issue its own 32-byte burst. It seems it should be able to return the requested byte in the middle.

Perhaps the test would be better if the for loop stored each time value in an array parallel to idxes[], with the printing of those arrays moved out to a separate loop afterwards?
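A minimal sketch of that suggestion: record each cycle count in an array (the name `times[]` is hypothetical) parallel to idxes[], then summarize once the timed loop is done. The summary part is plain C++ that can be checked on a PC; the Teensy-side loop appears only as a comment:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// On the Teensy, the timed loop would store rather than print (sketch only):
//   for (int i = 0; i < num_iters; i++) {
//     cli();
//     uint32_t t0 = ARM_DWT_CYCCNT;
//     result[i % 10] = data[idxes[i]];
//     times[i] = ARM_DWT_CYCCNT - t0;   // times[] is a hypothetical array
//     sei();
//   }
//   ...then call summarize(times, num_iters) and print afterwards.

struct TimingSummary {
  uint32_t min;
  uint32_t max;
  double mean;
};

// Summarize per-read cycle counts after measuring, so no printing
// happens while the measurements are being taken.
TimingSummary summarize(const uint32_t* samples, size_t n) {
  assert(n > 0);
  TimingSummary s{samples[0], samples[0], 0.0};
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    if (samples[i] < s.min) s.min = samples[i];
    if (samples[i] > s.max) s.max = samples[i];
    sum += samples[i];
  }
  s.mean = sum / n;
  return s;
}
```

Keeping serial output out of the measured loop also keeps the USB stack from disturbing the cache state between reads.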

Also, allocating the index array with uint32_t *idxes = malloc(num_samples); places it in RAM2 rather than fast low RAM1 (a global array), which isn't helping, as RAM2 is slower.

Oh yeah, and there is the PSRAM clock speed setting: 88 MHz vs. 132 MHz.
 
Holy crap, this is old, but this PSRAM clock change right here saved me so much damn time trying to figure out why my programs were getting the same performance as an ESP32-S3. The ESP32-S3 has the same PSRAM speed as the Teensy default, and clocking the PSRAM to 132 MHz made my program run way faster: with lower memory latency, the processor could actually get the data quicker to work on it.
 
Revisiting the original problem 4+ years later...

Looking at code from a different project, something very obvious occurred to me: why not bitbang the PSRAM instead of using FlexSPI?
I took the code from earlier posts and modified it:
Code:
#include <Arduino.h>
#define PIN_PSRAM_D3        54
#define PIN_PSRAM_D2        50
#define PIN_PSRAM_D1        49
#define PIN_PSRAM_D0        52
#define PIN_PSRAM_CLK       53
#define PIN_PSRAM_CS_n      48
#define PSRAM_RESET_VALUE     0x01000000
#define PSRAM_CLK_HIGH        0x02000000
static void PSRAM_Write_Clk_Cycle(uint8_t d) {
  GPIO9_DR = (d & 0xF0) << 22;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  GPIO9_DR = (d & 0x0F) << 26;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
}
static uint8_t PSRAM_Read_Clk_Cycle() {
  GPIO9_DR_CLEAR = PSRAM_CLK_HIGH;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  uint8_t d = (GPIO9_DR >> 22) & 0xF0;
  GPIO9_DR_CLEAR = PSRAM_CLK_HIGH;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  d |= (GPIO9_DR >> 26) & 0x0F;
  return d;
}
static void PSRAM_Configure() {
  delayMicroseconds(200);
  pinMode(PIN_PSRAM_CLK, OUTPUT);
  pinMode(PIN_PSRAM_CS_n, OUTPUT);
  pinMode(PIN_PSRAM_D3, INPUT);
  pinMode(PIN_PSRAM_D2, INPUT);
  pinMode(PIN_PSRAM_D1, INPUT);
  pinMode(PIN_PSRAM_D0, OUTPUT);
  GPIO9_DR = PSRAM_RESET_VALUE;
  delayMicroseconds(1);
  
  // enter QPI mode - send SPI cmd 0x35
  PSRAM_Write_Clk_Cycle(0x00);
  PSRAM_Write_Clk_Cycle(0x11);
  PSRAM_Write_Clk_Cycle(0x01);
  PSRAM_Write_Clk_Cycle(0x01);
  GPIO9_DR = PSRAM_RESET_VALUE;
  GPIO9_GDIR = 0x3F000000;
}
static uint8_t PSRAM_Read8(const uint8_t* c) {
  // Send Command: Quad Read (slow) 0x0B
  PSRAM_Write_Clk_Cycle(0x0B);
  // Send 24-bit address
  uint32_t offset = (uint32_t)c;
  PSRAM_Write_Clk_Cycle(offset >> 16);
  PSRAM_Write_Clk_Cycle(offset >> 8);
  PSRAM_Write_Clk_Cycle(offset);
  // Four clocks of hi-Z
  GPIO9_GDIR = 0x03000000;
  PSRAM_Read_Clk_Cycle();
  PSRAM_Read_Clk_Cycle();
  uint8_t d = PSRAM_Read_Clk_Cycle();
  // prepare to send next command
  GPIO9_DR = PSRAM_RESET_VALUE;
  GPIO9_GDIR = 0x3F000000;
  return d;
}
static EXTMEM uint8_t extmem_data[0x10000];
static uint32_t cycles_to_ns(uint32_t cycles) {
  return (uint32_t)(1000000000llu * cycles / F_CPU_ACTUAL);
}
uint32_t rand_test(const uint8_t* src, int num_iters) {
  uint8_t result[10];
  uint32_t cycles = 0;
  uint32_t *idxes = (uint32_t *)alloca(num_iters*sizeof(uint32_t));
  // fill in random indices to access
  for (int i = 0; i < num_iters; i++) {
    idxes[i] = random(sizeof(extmem_data));
  }  
  for (int i = 0; i < num_iters; i++) {
    cli();
    cycles = ARM_DWT_CYCCNT;
    result[i % 10] = PSRAM_Read8(src+idxes[i]);
    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();
    Serial.printf("%d(%08X):\t%d cycles", i, idxes[i], res);
    if (result[i % 10] != ((uint8_t)idxes[i] ^ 0x55)) {
      digitalWriteFast(LED_BUILTIN, HIGH);
      Serial.print(" !");
    }
    Serial.println();
  }  
  Serial.println("Done!");
  return result[5];
}
uint32_t seq_test(const uint8_t* src, int num_iters) {
  uint8_t result[10];
  uint32_t cycles = 0;
  for (int i = 0; i < num_iters; i++) {
    cli();
    cycles = ARM_DWT_CYCCNT;
    result[i % 10] = PSRAM_Read8(src+i);
    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();
    Serial.printf("%d: %d cycles", i, res);
    if (result[i % 10] != ((uint8_t)i ^ 0x55)) {
      digitalWriteFast(LED_BUILTIN, HIGH);
      Serial.print(" !");
    }
    Serial.println();
  }  
  Serial.println("Done!");
  return result[5];
}
void setup() {
  size_t i;
  while (!Serial) ;
  for (i = 0; i < sizeof(extmem_data); i++) {
    extmem_data[i] = i ^ 0x55;
  }
  arm_dcache_flush_delete((void *)extmem_data, i);
  PSRAM_Configure();
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWriteFast(LED_BUILTIN, LOW);
  
  Serial.println("Seq Test");
  seq_test(extmem_data, 0x100);
  Serial.println("Rand Test");
  rand_test(extmem_data, 0x100);
}
void loop() {}
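The nibble packing in PSRAM_Write_Clk_Cycle() and PSRAM_Read_Clk_Cycle() can be checked off-target. A host-side C++ sketch, assuming (as the shifts in the code imply) that the four data lines sit on GPIO9_DR bits 26 to 29, with D0 on bit 26, CS_n on bit 24 and CLK on bit 25. It reproduces the shifts, verifies that every byte round-trips through two clock cycles, and confirms that, read as single-bit SPI on D0, the four calls in PSRAM_Configure() clock out 0x35 as its comment says. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Data lines on GPIO9_DR bits 26..29 (assumed from the shifts in the sketch).
// Writing GPIO9_DR with only data bits set also drives CS_n (bit 24) and
// CLK (bit 25) low.
constexpr uint32_t kDataShift = 26;
constexpr uint32_t kDataMask = 0xFu << kDataShift;

// GPIO9_DR value for the first clock of a byte: high nibble on the data lines.
uint32_t high_nibble_word(uint8_t d) { return (uint32_t)(d & 0xF0) << 22; }

// GPIO9_DR value for the second clock of a byte: low nibble on the data lines.
uint32_t low_nibble_word(uint8_t d) { return (uint32_t)(d & 0x0F) << 26; }

// Rebuild a byte from two sampled GPIO9_DR values, as PSRAM_Read_Clk_Cycle() does.
uint8_t byte_from_words(uint32_t first, uint32_t second) {
  uint8_t d = (uint8_t)((first >> 22) & 0xF0);
  d |= (uint8_t)((second >> 26) & 0x0F);
  return d;
}

// In single-bit SPI mode the PSRAM samples only D0 (bit 26, assumed), so each
// PSRAM_Write_Clk_Cycle(d) call clocks out bit 4 of d, then bit 0 of d.
uint8_t spi_byte_from_calls(const uint8_t* calls, int n) {
  uint8_t out = 0;
  for (int i = 0; i < n; i++) {
    uint32_t first = (high_nibble_word(calls[i]) >> kDataShift) & 1u;
    uint32_t second = (low_nibble_word(calls[i]) >> kDataShift) & 1u;
    out = (uint8_t)((out << 1) | first);
    out = (uint8_t)((out << 1) | second);
  }
  return out;
}
```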
The results:
Code:
Seq Test
0: 142 cycles
1: 142 cycles
2: 142 cycles
3: 142 cycles
4: 142 cycles
5: 142 cycles
6: 142 cycles
7: 142 cycles
8: 142 cycles
9: 142 cycles
10: 142 cycles
11: 142 cycles
12: 142 cycles
13: 142 cycles
14: 142 cycles
15: 142 cycles
16: 142 cycles
17: 142 cycles
18: 142 cycles
19: 142 cycles
20: 142 cycles
21: 142 cycles
22: 142 cycles
23: 142 cycles
24: 142 cycles
25: 142 cycles
26: 142 cycles
27: 142 cycles
28: 142 cycles
29: 142 cycles
30: 142 cycles
31: 142 cycles
32: 142 cycles
33: 142 cycles
34: 142 cycles
35: 142 cycles
36: 142 cycles
37: 142 cycles
38: 142 cycles
39: 142 cycles
40: 142 cycles
41: 142 cycles
42: 142 cycles
43: 142 cycles
44: 142 cycles
45: 142 cycles
46: 142 cycles
47: 142 cycles
48: 142 cycles
49: 142 cycles
50: 142 cycles
51: 142 cycles
52: 142 cycles
53: 142 cycles
54: 142 cycles
55: 142 cycles
56: 142 cycles
57: 142 cycles
58: 142 cycles
59: 142 cycles
60: 142 cycles
61: 142 cycles
62: 142 cycles
63: 142 cycles
64: 142 cycles
65: 142 cycles
66: 142 cycles
67: 142 cycles
68: 142 cycles
69: 142 cycles
70: 142 cycles
71: 142 cycles
72: 142 cycles
73: 142 cycles
74: 142 cycles
75: 142 cycles
76: 142 cycles
77: 142 cycles
78: 142 cycles
79: 142 cycles
80: 142 cycles
81: 142 cycles
82: 142 cycles
83: 142 cycles
84: 142 cycles
85: 142 cycles
86: 142 cycles
87: 142 cycles
88: 142 cycles
89: 142 cycles
90: 142 cycles
91: 142 cycles
92: 142 cycles
93: 142 cycles
94: 142 cycles
95: 142 cycles
96: 142 cycles
97: 142 cycles
98: 142 cycles
99: 142 cycles
100: 142 cycles
101: 142 cycles
102: 142 cycles
103: 142 cycles
104: 142 cycles
105: 142 cycles
106: 142 cycles
107: 142 cycles
108: 142 cycles
109: 142 cycles
110: 142 cycles
111: 142 cycles
112: 142 cycles
113: 142 cycles
114: 142 cycles
115: 142 cycles
116: 142 cycles
117: 142 cycles
118: 142 cycles
119: 142 cycles
120: 142 cycles
121: 142 cycles
122: 142 cycles
123: 142 cycles
124: 142 cycles
125: 142 cycles
126: 142 cycles
127: 142 cycles
128: 142 cycles
129: 142 cycles
130: 142 cycles
131: 142 cycles
132: 142 cycles
133: 142 cycles
134: 142 cycles
135: 142 cycles
136: 142 cycles
137: 142 cycles
138: 142 cycles
139: 142 cycles
140: 142 cycles
141: 142 cycles
142: 142 cycles
143: 142 cycles
144: 142 cycles
145: 142 cycles
146: 142 cycles
147: 142 cycles
148: 142 cycles
149: 142 cycles
150: 142 cycles
151: 142 cycles
152: 142 cycles
153: 142 cycles
154: 142 cycles
155: 142 cycles
156: 142 cycles
157: 142 cycles
158: 142 cycles
159: 142 cycles
160: 142 cycles
161: 142 cycles
162: 142 cycles
163: 142 cycles
164: 142 cycles
165: 142 cycles
166: 142 cycles
167: 142 cycles
168: 142 cycles
169: 142 cycles
170: 142 cycles
171: 142 cycles
172: 142 cycles
173: 142 cycles
174: 142 cycles
175: 142 cycles
176: 142 cycles
177: 142 cycles
178: 142 cycles
179: 142 cycles
180: 142 cycles
181: 142 cycles
182: 142 cycles
183: 142 cycles
184: 142 cycles
185: 142 cycles
186: 142 cycles
187: 142 cycles
188: 142 cycles
189: 142 cycles
190: 142 cycles
191: 142 cycles
192: 142 cycles
193: 142 cycles
194: 142 cycles
195: 142 cycles
196: 142 cycles
197: 142 cycles
198: 142 cycles
199: 142 cycles
200: 142 cycles
201: 142 cycles
202: 142 cycles
203: 142 cycles
204: 142 cycles
205: 142 cycles
206: 142 cycles
207: 142 cycles
208: 142 cycles
209: 142 cycles
210: 142 cycles
211: 142 cycles
212: 142 cycles
213: 142 cycles
214: 142 cycles
215: 142 cycles
216: 142 cycles
217: 142 cycles
218: 142 cycles
219: 142 cycles
220: 142 cycles
221: 142 cycles
222: 142 cycles
223: 142 cycles
224: 142 cycles
225: 142 cycles
226: 142 cycles
227: 142 cycles
228: 142 cycles
229: 142 cycles
230: 142 cycles
231: 142 cycles
232: 142 cycles
233: 142 cycles
234: 142 cycles
235: 142 cycles
236: 142 cycles
237: 142 cycles
238: 142 cycles
239: 142 cycles
240: 142 cycles
241: 142 cycles
242: 142 cycles
243: 142 cycles
244: 142 cycles
245: 142 cycles
246: 142 cycles
247: 142 cycles
248: 142 cycles
249: 142 cycles
250: 142 cycles
251: 142 cycles
252: 142 cycles
253: 142 cycles
254: 142 cycles
255: 142 cycles
Done!
Rand Test
0(0000CE42):    142 cycles
1(00005EFB):    142 cycles
2(0000B19F):    142 cycles
3(000056E0):    142 cycles
4(0000A759):    142 cycles
5(0000C881):    142 cycles
6(0000CE5A):    142 cycles
7(00009681):    142 cycles
8(00002166):    142 cycles
9(0000B1A1):    142 cycles
10(0000C30E):   142 cycles
11(0000095C):   142 cycles
12(0000B14E):   142 cycles
13(00009EB4):   142 cycles
14(00005ADA):   142 cycles
15(0000D4EC):   142 cycles
16(0000DE2F):   142 cycles
17(00000FF5):   142 cycles
18(0000A560):   142 cycles
19(00006F7E):   142 cycles
20(0000D08A):   142 cycles
21(00003DED):   142 cycles
22(0000CE3B):   142 cycles
23(000096CA):   142 cycles
24(0000B92B):   142 cycles
25(0000CED5):   142 cycles
26(00002582):   142 cycles
27(00007A69):   142 cycles
28(0000C51D):   142 cycles
29(0000F3C3):   142 cycles
30(0000BE84):   142 cycles
31(00000413):   142 cycles
32(00009A0E):   142 cycles
33(000010A6):   142 cycles
34(0000426A):   142 cycles
35(00004A10):   142 cycles
36(000098B3):   142 cycles
37(0000193C):   142 cycles
38(0000E6B4):   142 cycles
39(0000634E):   142 cycles
40(0000D69A):   142 cycles
41(00002480):   142 cycles
42(0000704F):   142 cycles
43(00007161):   142 cycles
44(0000D706):   142 cycles
45(0000E882):   142 cycles
46(0000B517):   142 cycles
47(000038F4):   142 cycles
48(000034CA):   142 cycles
49(0000E676):   142 cycles
50(00004EBA):   142 cycles
51(0000A484):   142 cycles
52(0000F4E2):   142 cycles
53(000027DC):   142 cycles
54(0000F7C9):   142 cycles
55(0000E166):   142 cycles
56(0000044F):   142 cycles
57(000011A5):   142 cycles
58(00009E07):   142 cycles
59(0000FD8C):   142 cycles
60(00000A8E):   142 cycles
61(00001836):   142 cycles
62(000082D1):   142 cycles
63(00009C97):   142 cycles
64(000094DC):   142 cycles
65(00002C70):   142 cycles
66(00007F1B):   142 cycles
67(0000FE0F):   142 cycles
68(0000C938):   142 cycles
69(0000AE8F):   142 cycles
70(00005AFB):   142 cycles
71(00004B91):   142 cycles
72(00005BB0):   142 cycles
73(0000AE20):   142 cycles
74(0000BBB7):   142 cycles
75(00000E9F):   142 cycles
76(00002790):   142 cycles
77(00007A62):   142 cycles
78(0000D36C):   142 cycles
79(00007081):   142 cycles
80(00005C19):   142 cycles
81(00007D11):   142 cycles
82(000016C4):   142 cycles
83(0000BA32):   142 cycles
84(00003047):   142 cycles
85(00009404):   142 cycles
86(0000A6EF):   142 cycles
87(0000A4FB):   142 cycles
88(0000737D):   142 cycles
89(0000488F):   142 cycles
90(0000B8F2):   142 cycles
91(00004AC3):   142 cycles
92(00005ACE):   142 cycles
93(0000CBCB):   142 cycles
94(0000A5F9):   142 cycles
95(0000A5E2):   142 cycles
96(0000B9E0):   142 cycles
97(000057E3):   142 cycles
98(0000052D):   142 cycles
99(0000CFB7):   142 cycles
100(0000274E):  142 cycles
101(00007EE0):  142 cycles
102(0000B75F):  142 cycles
103(0000F08C):  142 cycles
104(0000856D):  142 cycles
105(0000CB68):  142 cycles
106(00001CD3):  142 cycles
107(00008540):  142 cycles
108(0000481F):  142 cycles
109(0000F2D5):  142 cycles
110(0000AAF4):  142 cycles
111(0000B831):  142 cycles
112(0000D399):  142 cycles
113(0000EABE):  142 cycles
114(0000A174):  142 cycles
115(0000C882):  142 cycles
116(0000D886):  142 cycles
117(000067BF):  142 cycles
118(000056B3):  142 cycles
119(000042B5):  142 cycles
120(00009F07):  142 cycles
121(0000B13E):  142 cycles
122(000089C2):  142 cycles
123(00003752):  142 cycles
124(0000248B):  142 cycles
125(0000447E):  142 cycles
126(0000E870):  142 cycles
127(000023A6):  142 cycles
128(00007502):  142 cycles
129(0000F547):  142 cycles
130(00003D45):  142 cycles
131(000087A0):  142 cycles
132(00003B23):  142 cycles
133(0000AC37):  142 cycles
134(00005317):  142 cycles
135(00002A7E):  142 cycles
136(0000B909):  142 cycles
137(00003926):  142 cycles
138(0000011F):  142 cycles
139(0000C73E):  142 cycles
140(0000BBC7):  142 cycles
141(00003BBB):  142 cycles
142(000081ED):  142 cycles
143(000002F9):  142 cycles
144(000049C5):  142 cycles
145(00003BE5):  142 cycles
146(00004B9B):  142 cycles
147(0000C72D):  142 cycles
148(00006002):  142 cycles
149(00005D81):  142 cycles
150(0000ED0F):  142 cycles
151(00009676):  142 cycles
152(0000583F):  142 cycles
153(0000BD2B):  142 cycles
154(00005635):  142 cycles
155(0000BB90):  142 cycles
156(0000EC7F):  142 cycles
157(0000A15A):  142 cycles
158(0000272F):  142 cycles
159(0000B014):  142 cycles
160(0000113C):  142 cycles
161(0000AB3B):  142 cycles
162(0000C41B):  142 cycles
163(0000D055):  142 cycles
164(00008434):  142 cycles
165(00008A8A):  142 cycles
166(000082C5):  142 cycles
167(0000835B):  142 cycles
168(0000CC50):  142 cycles
169(0000B144):  142 cycles
170(0000078F):  142 cycles
171(0000739E):  142 cycles
172(00008F4C):  142 cycles
173(0000033D):  142 cycles
174(0000A18A):  142 cycles
175(0000A89F):  142 cycles
176(00006B96):  142 cycles
177(00007485):  142 cycles
178(00000015):  142 cycles
179(00006980):  142 cycles
180(00008993):  142 cycles
181(000036FE):  142 cycles
182(000092E6):  142 cycles
183(000054C1):  142 cycles
184(00008849):  142 cycles
185(00007B2E):  142 cycles
186(00003E95):  142 cycles
187(0000C48B):  142 cycles
188(000091F6):  142 cycles
189(0000E400):  142 cycles
190(0000E324):  142 cycles
191(00007AE1):  142 cycles
192(000071B4):  142 cycles
193(000000F3):  142 cycles
194(00006580):  142 cycles
195(0000E81B):  142 cycles
196(00007B03):  142 cycles
197(00001FDD):  142 cycles
198(000002EE):  142 cycles
199(0000922D):  142 cycles
200(0000D02A):  142 cycles
201(00007BE5):  142 cycles
202(00002AA4):  142 cycles
203(0000B679):  142 cycles
204(0000E55F):  142 cycles
205(0000EDDA):  142 cycles
206(0000A80F):  142 cycles
207(00008A0C):  142 cycles
208(00001FE7):  142 cycles
209(0000B3B5):  142 cycles
210(00003CB8):  142 cycles
211(00008B42):  142 cycles
212(0000D9B2):  142 cycles
213(00004A06):  142 cycles
214(0000D851):  142 cycles
215(0000C0C5):  142 cycles
216(00000140):  142 cycles
217(00002312):  142 cycles
218(0000757C):  142 cycles
219(0000517D):  142 cycles
220(00001AA7):  142 cycles
221(000002CF):  142 cycles
222(0000A3F3):  142 cycles
223(0000BB1B):  142 cycles
224(00000853):  142 cycles
225(00009119):  142 cycles
226(0000365E):  142 cycles
227(00007CAF):  142 cycles
228(0000F5FC):  142 cycles
229(000088E7):  142 cycles
230(0000FB26):  142 cycles
231(0000BA3B):  142 cycles
232(0000AD86):  142 cycles
233(00005B05):  142 cycles
234(0000B9AA):  142 cycles
235(0000725F):  142 cycles
236(0000DDAB):  142 cycles
237(00003928):  142 cycles
238(0000AFA0):  142 cycles
239(00005334):  142 cycles
240(0000B0B5):  142 cycles
241(000048AB):  142 cycles
242(0000EF83):  142 cycles
243(0000B32A):  142 cycles
244(0000BBAC):  142 cycles
245(00005070):  142 cycles
246(0000F972):  142 cycles
247(0000B9A6):  142 cycles
248(00004167):  142 cycles
249(0000F917):  142 cycles
250(000083C7):  142 cycles
251(0000B701):  142 cycles
252(0000B158):  142 cycles
253(00000963):  142 cycles
254(00006F45):  142 cycles
255(000038BA):  142 cycles
Done!
That's a fixed 142 cycles for any location. At 600 MHz that's equivalent to a fixed access time of 236.667 nanoseconds.
It also works at 720 MHz, which means the access time would be 197.22 nanoseconds, over 5 times shorter than the OP's target of 1 microsecond.
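To compare access times across clock speeds, the sketch's cycles_to_ns() can be run on a PC with the clock passed in as a parameter. A minimal sketch (the extra parameter is a hypothetical change, not part of the posted code); note that it truncates, so 142 cycles give 236 ns at 600 MHz and 197 ns at 720 MHz:

```cpp
#include <cassert>
#include <cstdint>

// Host-side version of cycles_to_ns() from the sketch above, with the CPU clock
// passed in rather than read from F_CPU_ACTUAL. Integer division truncates.
uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz) {
  assert(cpu_hz > 0);
  return (uint32_t)(1000000000ull * cycles / cpu_hz);
}
```

Assuming the earlier FlexSPI IP-command measurements were taken at 600 MHz, the 271-cycle read there works out to about 451 ns, nearly twice the bit-banged figure.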
 
Revisiting the original problem 4+ years later...

Looking at code from a different project, something very obvious occurred to me: why not bitbang the PSRAM instead of using FlexSPI?
I took the code from earlier posts and modified it:
Code:
#include <Arduino.h>
#define PIN_PSRAM_D3        54
#define PIN_PSRAM_D2        50
#define PIN_PSRAM_D1        49
#define PIN_PSRAM_D0        52
#define PIN_PSRAM_CLK       53
#define PIN_PSRAM_CS_n      48
#define PSRAM_RESET_VALUE     0x01000000
#define PSRAM_CLK_HIGH        0x02000000
static void PSRAM_Write_Clk_Cycle(uint8_t d) {
  GPIO9_DR = (d & 0xF0) << 22;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  GPIO9_DR = (d & 0x0F) << 26;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
}
static uint8_t PSRAM_Read_Clk_Cycle() {
  GPIO9_DR_CLEAR = PSRAM_CLK_HIGH;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  uint8_t d = (GPIO9_DR >> 22) & 0xF0;
  GPIO9_DR_CLEAR = PSRAM_CLK_HIGH;
  GPIO9_DR_SET = PSRAM_CLK_HIGH;
  d |= (GPIO9_DR >> 26) & 0x0F;
  return d;
}
static void PSRAM_Configure() {
  delayMicroseconds(200);
  pinMode(PIN_PSRAM_CLK, OUTPUT);
  pinMode(PIN_PSRAM_CS_n, OUTPUT);
  pinMode(PIN_PSRAM_D3, INPUT);
  pinMode(PIN_PSRAM_D2, INPUT);
  pinMode(PIN_PSRAM_D1, INPUT);
  pinMode(PIN_PSRAM_D0, OUTPUT);
  GPIO9_DR = PSRAM_RESET_VALUE;
  delayMicroseconds(1);
  
  // enter QPI mode - send SPI cmd 0x35
  PSRAM_Write_Clk_Cycle(0x00);
  PSRAM_Write_Clk_Cycle(0x11);
  PSRAM_Write_Clk_Cycle(0x01);
  PSRAM_Write_Clk_Cycle(0x01);
  GPIO9_DR = PSRAM_RESET_VALUE;
  GPIO9_GDIR = 0x3F000000;
}
static uint8_t PSRAM_Read8(const uint8_t* c) {
  // Send Command: Quad Read (slow) 0x0B
  PSRAM_Write_Clk_Cycle(0x0B);
  // Send 24-bit address
  uint32_t offset = (uint32_t)c;
  PSRAM_Write_Clk_Cycle(offset >> 16);
  PSRAM_Write_Clk_Cycle(offset >> 8);
  PSRAM_Write_Clk_Cycle(offset);
  // Four clocks of hi-Z
  GPIO9_GDIR = 0x03000000;
  PSRAM_Read_Clk_Cycle();
  PSRAM_Read_Clk_Cycle();
  uint8_t d = PSRAM_Read_Clk_Cycle();
  // prepare to send next command
  GPIO9_DR = PSRAM_RESET_VALUE;
  GPIO9_GDIR = 0x3F000000;
  return d;
}
static EXTMEM uint8_t extmem_data[0x10000];
static uint32_t cycles_to_ns(uint32_t cycles) {
  return (uint32_t)(1000000000llu * cycles / F_CPU_ACTUAL);
}
uint32_t rand_test(const uint8_t* src, int num_iters) {
  uint8_t result[10];
  uint32_t cycles = 0;
  uint32_t *idxes = (uint32_t *)alloca(num_iters*sizeof(uint32_t));
  // fill in random indices to access
  for (int i = 0; i < num_iters; i++) {
    idxes[i] = random(sizeof(extmem_data));
  }  
  for (int i = 0; i < num_iters; i++) {
    cli()
    cycles = ARM_DWT_CYCCNT;
    result[i % 10] = PSRAM_Read8(src+idxes[i]);
    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();
    Serial.printf("%d(%08X):\t%d cycles", i, idxes[i], res);
    if (result[i % 10] != ((uint8_t)idxes[i] ^ 0x55)) {
      digitalWriteFast(LED_BUILTIN, HIGH);
      Serial.print(" !");
    }
    Serial.println();
  }  
  Serial.println("Done!");
  return result[5];
}
uint32_t seq_test(const uint8_t* src, int num_iters) {
  uint8_t result[10];
  uint32_t cycles = 0;
  for (int i = 0; i < num_iters; i++) {
    cli();
    cycles = ARM_DWT_CYCCNT;
    result[i % 10] = PSRAM_Read8(src+i);
    uint32_t res = ARM_DWT_CYCCNT - cycles;
    sei();
    Serial.printf("%d: %u cycles", i, (unsigned)res);
    if (result[i % 10] != ((uint8_t)i ^ 0x55)) {
      digitalWriteFast(LED_BUILTIN, HIGH);
      Serial.print(" !");
    }
    Serial.println();
  }  
  Serial.println("Done!");
  return result[5];
}
void setup() {
  size_t i;
  while (!Serial) ;
  for (i = 0; i < sizeof(extmem_data); i++) {
    extmem_data[i] = i ^ 0x55;
  }
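  // Push the test pattern out of the data cache into PSRAM before
  // PSRAM_Configure() takes the pins away from FlexSPI
  arm_dcache_flush_delete((void *)extmem_data, i);
  PSRAM_Configure();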
  pinMode(LED_BUILTIN,OUTPUT);
  digitalWriteFast(LED_BUILTIN, LOW);
  
  Serial.println("Seq Test");
  seq_test(extmem_data, 0x100);
  Serial.println("Rand Test");
  rand_test(extmem_data, 0x100);
}
void loop() {}
The results:
Code:
Seq Test
0: 142 cycles
1: 142 cycles
2: 142 cycles
3: 142 cycles
4: 142 cycles
5: 142 cycles
6: 142 cycles
7: 142 cycles
8: 142 cycles
9: 142 cycles
10: 142 cycles
11: 142 cycles
12: 142 cycles
13: 142 cycles
14: 142 cycles
15: 142 cycles
16: 142 cycles
17: 142 cycles
18: 142 cycles
19: 142 cycles
20: 142 cycles
21: 142 cycles
22: 142 cycles
23: 142 cycles
24: 142 cycles
25: 142 cycles
26: 142 cycles
27: 142 cycles
28: 142 cycles
29: 142 cycles
30: 142 cycles
31: 142 cycles
32: 142 cycles
33: 142 cycles
34: 142 cycles
35: 142 cycles
36: 142 cycles
37: 142 cycles
38: 142 cycles
39: 142 cycles
40: 142 cycles
41: 142 cycles
42: 142 cycles
43: 142 cycles
44: 142 cycles
45: 142 cycles
46: 142 cycles
47: 142 cycles
48: 142 cycles
49: 142 cycles
50: 142 cycles
51: 142 cycles
52: 142 cycles
53: 142 cycles
54: 142 cycles
55: 142 cycles
56: 142 cycles
57: 142 cycles
58: 142 cycles
59: 142 cycles
60: 142 cycles
61: 142 cycles
62: 142 cycles
63: 142 cycles
64: 142 cycles
65: 142 cycles
66: 142 cycles
67: 142 cycles
68: 142 cycles
69: 142 cycles
70: 142 cycles
71: 142 cycles
72: 142 cycles
73: 142 cycles
74: 142 cycles
75: 142 cycles
76: 142 cycles
77: 142 cycles
78: 142 cycles
79: 142 cycles
80: 142 cycles
81: 142 cycles
82: 142 cycles
83: 142 cycles
84: 142 cycles
85: 142 cycles
86: 142 cycles
87: 142 cycles
88: 142 cycles
89: 142 cycles
90: 142 cycles
91: 142 cycles
92: 142 cycles
93: 142 cycles
94: 142 cycles
95: 142 cycles
96: 142 cycles
97: 142 cycles
98: 142 cycles
99: 142 cycles
100: 142 cycles
101: 142 cycles
102: 142 cycles
103: 142 cycles
104: 142 cycles
105: 142 cycles
106: 142 cycles
107: 142 cycles
108: 142 cycles
109: 142 cycles
110: 142 cycles
111: 142 cycles
112: 142 cycles
113: 142 cycles
114: 142 cycles
115: 142 cycles
116: 142 cycles
117: 142 cycles
118: 142 cycles
119: 142 cycles
120: 142 cycles
121: 142 cycles
122: 142 cycles
123: 142 cycles
124: 142 cycles
125: 142 cycles
126: 142 cycles
127: 142 cycles
128: 142 cycles
129: 142 cycles
130: 142 cycles
131: 142 cycles
132: 142 cycles
133: 142 cycles
134: 142 cycles
135: 142 cycles
136: 142 cycles
137: 142 cycles
138: 142 cycles
139: 142 cycles
140: 142 cycles
141: 142 cycles
142: 142 cycles
143: 142 cycles
144: 142 cycles
145: 142 cycles
146: 142 cycles
147: 142 cycles
148: 142 cycles
149: 142 cycles
150: 142 cycles
151: 142 cycles
152: 142 cycles
153: 142 cycles
154: 142 cycles
155: 142 cycles
156: 142 cycles
157: 142 cycles
158: 142 cycles
159: 142 cycles
160: 142 cycles
161: 142 cycles
162: 142 cycles
163: 142 cycles
164: 142 cycles
165: 142 cycles
166: 142 cycles
167: 142 cycles
168: 142 cycles
169: 142 cycles
170: 142 cycles
171: 142 cycles
172: 142 cycles
173: 142 cycles
174: 142 cycles
175: 142 cycles
176: 142 cycles
177: 142 cycles
178: 142 cycles
179: 142 cycles
180: 142 cycles
181: 142 cycles
182: 142 cycles
183: 142 cycles
184: 142 cycles
185: 142 cycles
186: 142 cycles
187: 142 cycles
188: 142 cycles
189: 142 cycles
190: 142 cycles
191: 142 cycles
192: 142 cycles
193: 142 cycles
194: 142 cycles
195: 142 cycles
196: 142 cycles
197: 142 cycles
198: 142 cycles
199: 142 cycles
200: 142 cycles
201: 142 cycles
202: 142 cycles
203: 142 cycles
204: 142 cycles
205: 142 cycles
206: 142 cycles
207: 142 cycles
208: 142 cycles
209: 142 cycles
210: 142 cycles
211: 142 cycles
212: 142 cycles
213: 142 cycles
214: 142 cycles
215: 142 cycles
216: 142 cycles
217: 142 cycles
218: 142 cycles
219: 142 cycles
220: 142 cycles
221: 142 cycles
222: 142 cycles
223: 142 cycles
224: 142 cycles
225: 142 cycles
226: 142 cycles
227: 142 cycles
228: 142 cycles
229: 142 cycles
230: 142 cycles
231: 142 cycles
232: 142 cycles
233: 142 cycles
234: 142 cycles
235: 142 cycles
236: 142 cycles
237: 142 cycles
238: 142 cycles
239: 142 cycles
240: 142 cycles
241: 142 cycles
242: 142 cycles
243: 142 cycles
244: 142 cycles
245: 142 cycles
246: 142 cycles
247: 142 cycles
248: 142 cycles
249: 142 cycles
250: 142 cycles
251: 142 cycles
252: 142 cycles
253: 142 cycles
254: 142 cycles
255: 142 cycles
Done!
Rand Test
0(0000CE42):    142 cycles
1(00005EFB):    142 cycles
2(0000B19F):    142 cycles
3(000056E0):    142 cycles
4(0000A759):    142 cycles
5(0000C881):    142 cycles
6(0000CE5A):    142 cycles
7(00009681):    142 cycles
8(00002166):    142 cycles
9(0000B1A1):    142 cycles
10(0000C30E):   142 cycles
11(0000095C):   142 cycles
12(0000B14E):   142 cycles
13(00009EB4):   142 cycles
14(00005ADA):   142 cycles
15(0000D4EC):   142 cycles
16(0000DE2F):   142 cycles
17(00000FF5):   142 cycles
18(0000A560):   142 cycles
19(00006F7E):   142 cycles
20(0000D08A):   142 cycles
21(00003DED):   142 cycles
22(0000CE3B):   142 cycles
23(000096CA):   142 cycles
24(0000B92B):   142 cycles
25(0000CED5):   142 cycles
26(00002582):   142 cycles
27(00007A69):   142 cycles
28(0000C51D):   142 cycles
29(0000F3C3):   142 cycles
30(0000BE84):   142 cycles
31(00000413):   142 cycles
32(00009A0E):   142 cycles
33(000010A6):   142 cycles
34(0000426A):   142 cycles
35(00004A10):   142 cycles
36(000098B3):   142 cycles
37(0000193C):   142 cycles
38(0000E6B4):   142 cycles
39(0000634E):   142 cycles
40(0000D69A):   142 cycles
41(00002480):   142 cycles
42(0000704F):   142 cycles
43(00007161):   142 cycles
44(0000D706):   142 cycles
45(0000E882):   142 cycles
46(0000B517):   142 cycles
47(000038F4):   142 cycles
48(000034CA):   142 cycles
49(0000E676):   142 cycles
50(00004EBA):   142 cycles
51(0000A484):   142 cycles
52(0000F4E2):   142 cycles
53(000027DC):   142 cycles
54(0000F7C9):   142 cycles
55(0000E166):   142 cycles
56(0000044F):   142 cycles
57(000011A5):   142 cycles
58(00009E07):   142 cycles
59(0000FD8C):   142 cycles
60(00000A8E):   142 cycles
61(00001836):   142 cycles
62(000082D1):   142 cycles
63(00009C97):   142 cycles
64(000094DC):   142 cycles
65(00002C70):   142 cycles
66(00007F1B):   142 cycles
67(0000FE0F):   142 cycles
68(0000C938):   142 cycles
69(0000AE8F):   142 cycles
70(00005AFB):   142 cycles
71(00004B91):   142 cycles
72(00005BB0):   142 cycles
73(0000AE20):   142 cycles
74(0000BBB7):   142 cycles
75(00000E9F):   142 cycles
76(00002790):   142 cycles
77(00007A62):   142 cycles
78(0000D36C):   142 cycles
79(00007081):   142 cycles
80(00005C19):   142 cycles
81(00007D11):   142 cycles
82(000016C4):   142 cycles
83(0000BA32):   142 cycles
84(00003047):   142 cycles
85(00009404):   142 cycles
86(0000A6EF):   142 cycles
87(0000A4FB):   142 cycles
88(0000737D):   142 cycles
89(0000488F):   142 cycles
90(0000B8F2):   142 cycles
91(00004AC3):   142 cycles
92(00005ACE):   142 cycles
93(0000CBCB):   142 cycles
94(0000A5F9):   142 cycles
95(0000A5E2):   142 cycles
96(0000B9E0):   142 cycles
97(000057E3):   142 cycles
98(0000052D):   142 cycles
99(0000CFB7):   142 cycles
100(0000274E):  142 cycles
101(00007EE0):  142 cycles
102(0000B75F):  142 cycles
103(0000F08C):  142 cycles
104(0000856D):  142 cycles
105(0000CB68):  142 cycles
106(00001CD3):  142 cycles
107(00008540):  142 cycles
108(0000481F):  142 cycles
109(0000F2D5):  142 cycles
110(0000AAF4):  142 cycles
111(0000B831):  142 cycles
112(0000D399):  142 cycles
113(0000EABE):  142 cycles
114(0000A174):  142 cycles
115(0000C882):  142 cycles
116(0000D886):  142 cycles
117(000067BF):  142 cycles
118(000056B3):  142 cycles
119(000042B5):  142 cycles
120(00009F07):  142 cycles
121(0000B13E):  142 cycles
122(000089C2):  142 cycles
123(00003752):  142 cycles
124(0000248B):  142 cycles
125(0000447E):  142 cycles
126(0000E870):  142 cycles
127(000023A6):  142 cycles
128(00007502):  142 cycles
129(0000F547):  142 cycles
130(00003D45):  142 cycles
131(000087A0):  142 cycles
132(00003B23):  142 cycles
133(0000AC37):  142 cycles
134(00005317):  142 cycles
135(00002A7E):  142 cycles
136(0000B909):  142 cycles
137(00003926):  142 cycles
138(0000011F):  142 cycles
139(0000C73E):  142 cycles
140(0000BBC7):  142 cycles
141(00003BBB):  142 cycles
142(000081ED):  142 cycles
143(000002F9):  142 cycles
144(000049C5):  142 cycles
145(00003BE5):  142 cycles
146(00004B9B):  142 cycles
147(0000C72D):  142 cycles
148(00006002):  142 cycles
149(00005D81):  142 cycles
150(0000ED0F):  142 cycles
151(00009676):  142 cycles
152(0000583F):  142 cycles
153(0000BD2B):  142 cycles
154(00005635):  142 cycles
155(0000BB90):  142 cycles
156(0000EC7F):  142 cycles
157(0000A15A):  142 cycles
158(0000272F):  142 cycles
159(0000B014):  142 cycles
160(0000113C):  142 cycles
161(0000AB3B):  142 cycles
162(0000C41B):  142 cycles
163(0000D055):  142 cycles
164(00008434):  142 cycles
165(00008A8A):  142 cycles
166(000082C5):  142 cycles
167(0000835B):  142 cycles
168(0000CC50):  142 cycles
169(0000B144):  142 cycles
170(0000078F):  142 cycles
171(0000739E):  142 cycles
172(00008F4C):  142 cycles
173(0000033D):  142 cycles
174(0000A18A):  142 cycles
175(0000A89F):  142 cycles
176(00006B96):  142 cycles
177(00007485):  142 cycles
178(00000015):  142 cycles
179(00006980):  142 cycles
180(00008993):  142 cycles
181(000036FE):  142 cycles
182(000092E6):  142 cycles
183(000054C1):  142 cycles
184(00008849):  142 cycles
185(00007B2E):  142 cycles
186(00003E95):  142 cycles
187(0000C48B):  142 cycles
188(000091F6):  142 cycles
189(0000E400):  142 cycles
190(0000E324):  142 cycles
191(00007AE1):  142 cycles
192(000071B4):  142 cycles
193(000000F3):  142 cycles
194(00006580):  142 cycles
195(0000E81B):  142 cycles
196(00007B03):  142 cycles
197(00001FDD):  142 cycles
198(000002EE):  142 cycles
199(0000922D):  142 cycles
200(0000D02A):  142 cycles
201(00007BE5):  142 cycles
202(00002AA4):  142 cycles
203(0000B679):  142 cycles
204(0000E55F):  142 cycles
205(0000EDDA):  142 cycles
206(0000A80F):  142 cycles
207(00008A0C):  142 cycles
208(00001FE7):  142 cycles
209(0000B3B5):  142 cycles
210(00003CB8):  142 cycles
211(00008B42):  142 cycles
212(0000D9B2):  142 cycles
213(00004A06):  142 cycles
214(0000D851):  142 cycles
215(0000C0C5):  142 cycles
216(00000140):  142 cycles
217(00002312):  142 cycles
218(0000757C):  142 cycles
219(0000517D):  142 cycles
220(00001AA7):  142 cycles
221(000002CF):  142 cycles
222(0000A3F3):  142 cycles
223(0000BB1B):  142 cycles
224(00000853):  142 cycles
225(00009119):  142 cycles
226(0000365E):  142 cycles
227(00007CAF):  142 cycles
228(0000F5FC):  142 cycles
229(000088E7):  142 cycles
230(0000FB26):  142 cycles
231(0000BA3B):  142 cycles
232(0000AD86):  142 cycles
233(00005B05):  142 cycles
234(0000B9AA):  142 cycles
235(0000725F):  142 cycles
236(0000DDAB):  142 cycles
237(00003928):  142 cycles
238(0000AFA0):  142 cycles
239(00005334):  142 cycles
240(0000B0B5):  142 cycles
241(000048AB):  142 cycles
242(0000EF83):  142 cycles
243(0000B32A):  142 cycles
244(0000BBAC):  142 cycles
245(00005070):  142 cycles
246(0000F972):  142 cycles
247(0000B9A6):  142 cycles
248(00004167):  142 cycles
249(0000F917):  142 cycles
250(000083C7):  142 cycles
251(0000B701):  142 cycles
252(0000B158):  142 cycles
253(00000963):  142 cycles
254(00006F45):  142 cycles
255(000038BA):  142 cycles
Done!
That's a fixed 142 cycles for any location. At 600MHz that's equivalent to a fixed access time of 236.667 nanoseconds.
It also works at 720MHz, where the same 142 cycles give an access time of 197.22 nanoseconds, more than 5 times shorter than the OP's 1 microsecond target.

That's an impressive bit of code!

How does that compare to sequential QSPI reads?
 
Sequential reading is much simpler/faster because the address only needs to be sent at the beginning. Then, as long as you keep clocking the PSRAM (two instructions per clock), it will keep delivering nibbles from sequential locations.

What really piques my interest with this is that a) it can use any pins, at the same speed providing they're on the same GPIO port and b) it would be possible to access multiple PSRAM chips in parallel to increase the bitwidth. Technically you can do that with FlexSPI too but the pins aren't available on the Teensy.
 
So that would allow using PSRAM with the T4.0 too, or do I misunderstand the flexibility?
 
Only in the same way as any other SPI device. You have to manually call functions to read and write to it rather than it being memory-mapped.
 
You have to manually call functions to read and write to it rather than it being memory-mapped.
That wouldn't be a problem for a fast data-logging program that uses a producer/consumer queue to buffer uSD card latencies.
 
Just remembered something that may affect PSRAM performance compared to DMAMEM or PROGMEM. Right now the PSRAM clock defaults to 88 MHz in startup.c. However, we tested it at 132 MHz and it seems to work without issue.
Thx for (re)posting @mjs513.
Just ran this (132.9 MHz) with the PJRC PSRAM test and times are faster:
AFTER at 132: test ran for 25.52 seconds
BEFORE at 88: test ran for 36.43 seconds
57 write/read passes over 8MB, 44 of which need to calculate the next number on both write and read.
 
That wouldn't be a problem for a fast data-logging program that uses a producer/consumer queue to buffer uSD card latencies.
OK, I missed that these were read tests; using this for buffering would require both read and write access.
 
You can write using this method as well (writing is faster than reading because there are no delay cycles required), I just didn't include the code for it in the example.
You can't use this method at the same time as FlexSPI access because the pins have to be assigned as either GPIOs or FlexSPI. It would also cause problems with cache coherency.
 
missed that these were read tests
Same here. Post #18 was an attempt to test native (FlexSPI) access alongside the bit-bang code. I didn't find a write function, but ran the test as posted, and then found native access returning bad data even for the fixed-pattern write coded in the PJRC example.

My current conversion of the PJRC test sketch so far, with the faster clock and a BitBang ReadTest added:
 
Crosspost ...
You can write using this method as well (writing is faster than reading because there are no delay cycles required), I just didn't include the code for it in the example.
You can't use this method at the same time as FlexSPI access because the pins have to be assigned as either GPIOs or FlexSPI. It would also cause problems with cache coherency.
'delay cycles'... That explains why the docs and performance results show writes being 4X(?) faster.
I wondered about the cache being undercut on return, that and the pin changes made by PSRAM_Configure().

Can you post code for write?
 