T4.1 castellated memory extension and memory plug-in

KurtE

Thought I would mention that I have been playing around with a second-version castellated breakout board. It brings out the memory pins on the underside as well as the center pins (ON/OFF, Program), in a way that makes it possible to plug into a breadboard.

They are/were quick and dirty.

It solders onto the bottom of the T4.1 like:
IMG_1304.jpg
IMG_1303.jpg

I then had a quick and dirty small board that can hold PSRAM and FLASH with the same pins as below:
IMG_1302.jpg

While I was getting ready to solder that memory board, I figured out that I sort of screwed up: it plugs into the extension with the memory chips pointing down toward the T4.1...

Sort of like:
IMG_1301.jpg

Sorry for quick and dirty photo, sort of out of focus...

I had a FLASH chip on it as well, but sort of screwed up soldering that one (which is why I chose a part that I have a few of)... It is also set up with a Logic Analyzer plugged in to it.
So far it is not working very well :(

I did verify with the HiLow Test that all of the IO pins are getting to the right places.

But when I first tried running the MTP test program it did try to create two new logical disks, but showed them with no free space :confused:

So did a quick and dirty program:
Code:
extern "C" {
  extern uint8_t external_psram_size;
}
EXTMEM uint32_t extmem_array[8192*1024/4];
uint32_t array_max;
void setup() {
  while (!Serial && millis() < 5000);
  Serial.begin(115200);
  delay(250);
  Serial.printf("External PSRAM size: %u\n", external_psram_size );

  array_max = (external_psram_size * 1024*1024) / sizeof(uint32_t);
  Serial.printf("Array Max: %u\n", array_max); Serial.flush();
}
uint32_t loop_counter = 0;
void loop() {
  loop_counter++;
  digitalToggleFast(13);
  Serial.printf("Starting pass: %u\n", loop_counter); Serial.flush();
  for (uint32_t i = 0; i < array_max; i++) {
    if ((i & 0xffff) == 0xffff) Serial.print(".");
    extmem_array[i] = loop_counter + i;
  }

  for (uint32_t i = 0; i < array_max; i++) {
    // Note: the last value printed is loop_counter, not the expected value (loop_counter + i)
    if (extmem_array[i] != (loop_counter+i)) Serial.printf("Fail(%x) %x != %x\n", i, extmem_array[i], loop_counter);
  }
}
The program runs fine on another T4.1 with 16MB of PSRAM.

But when run on this one...
Code:
External PSRAM size: 8
Array Max: 2097152
Starting pass: 1
................................Fail(3b) 9203003c != 1
Fail(3c) a20340c1 != 1
Fail(3d) 40c1 != 1
Fail(3e) 0 != 1
Fail(3f) 0 != 1
Fail(70) 77777777 != 1
Fail(71) 77777777 != 1
Fail(72) 77777777 != 1
Fail(73) 77777777 != 1
Fail(74) 77777777 != 1
...

I am getting tons of errors.

I have hooked it up to the Logic Analyzer using a user-supplied Quad SPI analyzer, which may need some work. It knows very little, so you have to tell it whether you are in single, dual or quad mode.

It does detect the memory:
screenshot.jpg

It looks like it wrote the first couple of Quad SPI packets out OK:
screenshot2.jpg

But then on the first page holding index 3c, where it errors, the Quad analyzer did not see the right command and gives a strange error:
screenshot.jpg
So this may not turn out to be a great test setup...

May try lowering the FlexSPI2 SPI speed to see if it is happier.
 
Thanks frank,

Yes - I started off without the logic Analyzer...

I have also tried slowing down the FlexSPI, but so far not much luck.

I need to double check wiring again.
 
It would be interesting to see scope images of the signal. Did you populate the capacitors? Maybe the impedance of the chip's power supply is too high without them.
 
Thanks,

I have since added two caps: a 0.1uF decoupling cap and another 10uF to make sure there is power available...

I have also been playing some with the Quad Analyzer plug-in, as I was seeing some Quad commands 0xEB, which I believe is quad read...
But then there were some 0x38 commands which the analyzer was not processing. I have since tried to add those, as I believe 0x38 is quad write. The quad reads have a delay byte but the quad writes don't...
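
For anyone extending an analyzer plug-in the same way, here is a minimal host-side sketch of the framing difference described above. The opcode meanings (0xEB quad read, 0x38 quad write) are only as believed in this post. The struct, the function names and the 24-bit address sent on four lines are assumptions for illustration, and the number of wait clocks after a read is a parameter that should come from the PSRAM datasheet, not from this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical description of how one quad-mode transaction is laid out.
struct QspiFrame {
  const char *name;       // human-readable name, "unknown" if not handled
  bool is_write;          // true for writes, false for reads/unknown
  unsigned addr_clocks;   // assumed 24-bit address on 4 lines = 6 clocks
  unsigned dummy_clocks;  // wait clocks between address and data
};

// Classify an opcode; read_dummy_clocks is supplied by the caller (datasheet value).
QspiFrame classifyQuadCommand(uint8_t opcode, unsigned read_dummy_clocks) {
  switch (opcode) {
    case 0xEB: return {"quad read", false, 6, read_dummy_clocks};
    case 0x38: return {"quad write", true, 6, 0};  // no wait clocks on writes
    default:   return {"unknown", false, 0, 0};
  }
}

// Clocks from the end of the opcode until the first data nibble.
unsigned clocksBeforeData(const QspiFrame &f) {
  return f.addr_clocks + f.dummy_clocks;
}
```

The point of returning a frame description rather than decoding inline is that an unhandled opcode shows up as "unknown" instead of being silently decoded with the wrong layout.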

But I am still seeing strange (semi-consistent) data.

Current code:
Code:
extern "C" {
  extern uint8_t external_psram_size;
}
EXTMEM uint32_t extmem_array[8192 * 1024 / 4];
uint32_t array_max;
void setup() {
  while (!Serial && millis() < 5000);
  Serial.begin(115200);
  delay(250);
  Serial.printf("External PSRAM size: %u\n", external_psram_size );

  array_max = (external_psram_size * 1024 * 1024) / sizeof(uint32_t);
  Serial.printf("Array Max: %u\n", array_max); Serial.flush();
  pinMode(13, OUTPUT);
  pinMode(0, OUTPUT);
}
uint32_t loop_counter = 0;
void loop() {
  loop_counter++;
  digitalToggleFast(13);
  Serial.printf("Starting pass: %u\n", loop_counter); Serial.flush();
  for (uint32_t i = 0; i < array_max; i++) {
    if ((i & 0xffff) == 0xffff) Serial.print(".");
    extmem_array[i] = loop_counter + i;
  }
  Serial.println();
  uint32_t last_error_index = 0xffffffff; // sentinel: not currently inside a run of errors
  uint32_t count_errors = 0;
  for (uint32_t i = 0; i < array_max; i++) {
    if (extmem_array[i] != (loop_counter + i)) {
      if (i != (last_error_index + 1)) {
        digitalWriteFast(0, HIGH);
        Serial.printf("    Fail(%x) %x != %x\n", i, extmem_array[i], loop_counter + i);
      }
      last_error_index = i;
      count_errors++;
    } else if (last_error_index != 0xffffffff) {
      digitalWriteFast(0, LOW);
      Serial.printf("    Last Error Index: %x\n", last_error_index);
      last_error_index = 0xffffffff;
    }
  }
  if (count_errors) Serial.printf("Total Error: %u\n", count_errors);
  delay(10);
}
So I am iterating over the 8MB with uint32_t values... First I loop through and write all of the values and then I come back and read them all to verify I get the expected values...
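
The run-collapsing part of that verify loop can be sketched on the host like this. It is a sketch only: the struct and function names are hypothetical, and it uses an explicit flag instead of the 0xffffffff sentinel. With the sentinel, `last_error_index + 1` wraps to 0, so a failure at index 0 would not be printed, and a run that reaches the end of the array never gets its "Last Error Index" line.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of consecutive mismatching indices, inclusive at both ends.
struct ErrorRun {
  size_t first;
  size_t last;
};

// Compare mem[i] against (seed + i) and collapse consecutive failures into runs.
std::vector<ErrorRun> findErrorRuns(const uint32_t *mem, size_t count, uint32_t seed) {
  std::vector<ErrorRun> runs;
  bool in_run = false;
  for (size_t i = 0; i < count; i++) {
    bool bad = mem[i] != seed + static_cast<uint32_t>(i);
    if (bad && !in_run) {
      runs.push_back({i, i});  // start of a new run
      in_run = true;
    } else if (bad) {
      runs.back().last = i;    // extend the current run
    } else {
      in_run = false;          // a good word closes the run
    }
  }
  return runs;
}
```

On the Teensy the same function could be pointed at the EXTMEM array, with one print per run in place of the vector.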

It is sort of interesting that the first set of operations that go out to the PSRAM is a bunch of reads:
screenshot.jpg
Again I should probably try touching up the soldering and the like... Maybe not a great connection...
Only after maybe 10-15 clusters do I see one that I think is a write operation:

screenshot2.jpg

Note: the last operation was to read from the block 0x7FFFE0 which looks like the end of the 8MB block... And it looked like it read all 0s...

Again probably something I am doing wrong and probably not worth going much farther with this.
 
Question to self and others (especially those who have done more with FlexSPI)...

I tried resoldering some of the pin connections yesterday, and I also retried with my original castellated extended T4.1 board with the other single chip with PSRAM, and still had similar failures.

Note: I tried this at a slower FlexSPI2 speed, maybe around 66 MHz, and it is still failing.

So wondering more about FlexSPI2 configuration settings:

I believe most of the IO pins are configured with strong drive and the like.
Not sure about the DQS pin: IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_23 = 0x110F9; // keeper, strong drive, max speed, hyst
As far as I know that pin is not exposed anywhere, and the only reference I see is to set the drive speed and set it to mode 8...

I am also wondering about the CS timing?
Code:
	FLEXSPI2_INTEN = 0;
	FLEXSPI2_FLSHA1CR0 = 0x2000; // 8 MByte
	FLEXSPI2_FLSHA1CR1 = FLEXSPI_FLSHCR1_CSINTERVAL(2)
		| FLEXSPI_FLSHCR1_TCSH(3) | FLEXSPI_FLSHCR1_TCSS(3);
	FLEXSPI2_FLSHA1CR2 = FLEXSPI_FLSHCR2_AWRSEQID(6) | FLEXSPI_FLSHCR2_AWRSEQNUM(0)
		| FLEXSPI_FLSHCR2_ARDSEQID(5) | FLEXSPI_FLSHCR2_ARDSEQNUM(0);
But I think I am just throwing darts!
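
As a small aside on the `0x2000; // 8 MByte` line: as I read the reference manual, the size field in FLSHA1CR0 is in KByte units, which is where 0x2000 comes from. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// Assumption: FLSHxCR0 holds the device size in KByte units.
constexpr uint32_t flshcr0ForMegabytes(uint32_t megabytes) {
  return megabytes * 1024u;
}

// Inverse: size in bytes described by a FLSHxCR0 value.
constexpr uint32_t bytesFromFlshcr0(uint32_t reg) {
  return reg * 1024u;
}
```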
 
Morning @KurtE
Was just going through the thread and your new message popped up. Way back when we did PSRAM as a standalone library, after Paul gave us the starting code and Frank B expanded on it, we were using the following, which is the same as you posted.
Code:
	  FLEXSPI2_INTEN = 0;
....
	  FLEXSPI2_FLSHA1CR1 = FLEXSPI_FLSHCR1_CSINTERVAL(2)
		| FLEXSPI_FLSHCR1_TCSH(3) | FLEXSPI_FLSHCR1_TCSS(3);
	  FLEXSPI2_FLSHA1CR2 = FLEXSPI_FLSHCR2_AWRSEQID(6) | FLEXSPI_FLSHCR2_AWRSEQNUM(0)
		| FLEXSPI_FLSHCR2_ARDSEQID(5) | FLEXSPI_FLSHCR2_ARDSEQNUM(0);

OK, now here's the interesting thing for the EMC_23 pin. You have:
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_23 = 0x110F9; // keeper, strong drive, max speed, hyst
but I was using:
Code:
IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_23 = 0x10E1; // keeper, medium drive, max speed
I can't answer whether it is exposed or not, or whether it is needed for internal use. But anyway it made me look at the other settings, which are different from the ones we were using way back when:
Code:
	  // initialize pins
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_22 = 0xB0E1; // 100K pullup, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_23 = 0x10E1; // keeper, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_24 = 0xB0E1; // 100K pullup, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_25 = 0x00E1; // medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_26 = 0x70E1; // 47K pullup, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_27 = 0x70E1; // 47K pullup, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_28 = 0x70E1; // 47K pullup, medium drive, max speed
	  IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_29 = 0x70E1; // 47K pullup, medium drive, max speed
It has been a while since I went through the manual, so I don't know the differences off the top of my head. I think these were changed from the very original ones as well. But you could give it a try just out of curiosity.
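
To compare the two sets of pad values without going back to the manual each time, here is a minimal decoder sketch. The bit positions are my reading of the i.MX RT1060 SW_PAD_CTL layout and should be treated as an assumption, although they do reproduce the comments on the values quoted above. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed field layout of an i.MX RT1060 SW_PAD_CTL register.
struct PadCtl {
  bool fast_slew;     // SRE, bit 0
  unsigned drive;     // DSE, bits 5:3 (0 = output off, 7 = strongest)
  unsigned speed;     // SPEED, bits 7:6 (3 = max)
  bool open_drain;    // ODE, bit 11
  bool pull_keep_en;  // PKE, bit 12
  bool pull_select;   // PUE, bit 13 (false = keeper, true = pull resistor)
  unsigned pull;      // PUS, bits 15:14 (0 = 100K down, 1 = 47K up, 2 = 100K up, 3 = 22K up)
  bool hysteresis;    // HYS, bit 16
};

// Split a raw pad control value into its fields.
PadCtl decodePadCtl(uint32_t v) {
  PadCtl p;
  p.fast_slew    = (v >> 0) & 1u;
  p.drive        = (v >> 3) & 7u;
  p.speed        = (v >> 6) & 3u;
  p.open_drain   = (v >> 11) & 1u;
  p.pull_keep_en = (v >> 12) & 1u;
  p.pull_select  = (v >> 13) & 1u;
  p.pull         = (v >> 14) & 3u;
  p.hysteresis   = (v >> 16) & 1u;
  return p;
}
```

Under this reading, 0x110F9 and 0x10E1 differ in drive strength (7 versus 4) and in hysteresis, while both select the keeper.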
 
Is even the first pass always wrong?
You can try the slowest speed possible. If it still fails, there is something different happening.
Could you post the layout of your boards?

p.s. I'm running your code here - 3000 passes now, no errors.
 
Thanks guys,

I have also run it on a few other boards where the memory chip(s) are directly soldered to bottom of T4.1 and no problems.

If you have DipTrace, the design files for the boards are in what I put up earlier on GitHub: https://github.com/KurtE/Teensy3.1-Breakout-Boards/tree/master/T41 Castellated-Memory-breakout

The castellated board looks like:
screenshot.jpg

The dual memory one looks like:
screenshot2.jpg

The single memory chip one, for which I have a slightly updated design with better markings, looked like:
screenshot3.jpg
 
Quick update. Looks like I have a short between pins 49 and 51 on the castellated board...

Not sure why it did not show up earlier. At least the HiloTest is showing it when going to 3.3v, will check next if GND as well...
 
Well, glad you found a potential issue, but not good that it is a short on the castellated board. Hope you can get it fixed.
 
I think I have it fixed but not sure it helped. Will play more tomorrow. I am not sure how likely it will be to get it to work. But fingers crossed.
 
So it is a soldering issue not a problem with the board itself - other than the close proximity of the metal bits?

Good luck with the 'fix'
 
Thanks, I was able to get the short to not be a short... And the HiLowTest showed I had the right signals... This time I checked both Hi and Low...
Still does not work right on this board nor my earlier one where I have a single memory chip plugged into breadboard with maybe 4" jumpers and it also fails...

I am probably pretty close to punt time, as this is mainly only for testing purposes. But I thought it might be a good time to go through some of the stuff in the configure_external_ram() function again and get a better idea of the settings.

I already slowed the clock down some:
Code:
	CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
		| CCM_CBCMR_FLEXSPI2_PODF(7) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3); // 528 MHz / 8 = 66 MHz (was PODF(5), 88 MHz)
	CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
I wondered about the pin drive strengths, but so far they are strong drive and High speed...
I have been wondering about:
Code:
	IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_23 = 0x110F9; // keeper, strong drive, max speed, hyst
	IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_23 = 8 | 0x10; // ALT1 = FLEXSPI2_A_DQS
As far as I know that pin is not exposed by the T4.1... But I don't even know enough here to be dangerous :D
That is for example with:
Code:
	FLEXSPI2_MCR0 = (FLEXSPI2_MCR0 & ~(FLEXSPI_MCR0_AHBGRANTWAIT_MASK
		 | FLEXSPI_MCR0_IPGRANTWAIT_MASK | FLEXSPI_MCR0_SCKFREERUNEN
		 | FLEXSPI_MCR0_COMBINATIONEN | FLEXSPI_MCR0_DOZEEN
		 | FLEXSPI_MCR0_HSEN | FLEXSPI_MCR0_ATDFEN | FLEXSPI_MCR0_ARDFEN
		 | FLEXSPI_MCR0_RXCLKSRC_MASK | FLEXSPI_MCR0_SWRESET))
		| FLEXSPI_MCR0_AHBGRANTWAIT(0xFF) | FLEXSPI_MCR0_IPGRANTWAIT(0xFF)
		| FLEXSPI_MCR0_RXCLKSRC(1) | FLEXSPI_MCR0_MDIS; // RXCLKSRC(1) is the setting in question

Again, what does the setting RXCLKSRC(1) do?
01b - Dummy Read strobe generated by FlexSPI Controller and loopback from DQS pad.
But again what DQS Pad?

Many more of these registers and fields to go through!

Just to get a basic understanding of things like:
How these registers and others control the mapping from the address used in your program to the address on the RAM chip... And what controls when things are read from the chip and written to the chip. Yes, I know that some of this is controlled by the settings:
Code:
	FLEXSPI2_FLSHA1CR0 = 0x2000; // 8 MByte
	FLEXSPI2_FLSHA1CR1 = FLEXSPI_FLSHCR1_CSINTERVAL(3)
		| FLEXSPI_FLSHCR1_TCSH(5) | FLEXSPI_FLSHCR1_TCSS(5);
	FLEXSPI2_FLSHA1CR2 = FLEXSPI_FLSHCR2_AWRSEQID(6) | FLEXSPI_FLSHCR2_AWRSEQNUM(0)
		| FLEXSPI_FLSHCR2_ARDSEQID(5) | FLEXSPI_FLSHCR2_ARDSEQNUM(0);

But it was again interesting that the example sketch I did started off writing values to the full 8MB chip, and yet the first several QSPI operations were to read in stuff from the chip. Why? Maybe to reload the cache?

Again lots to learn. But not sure how far to take it as it is mostly for my own understanding.
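
One possible explanation for reads showing up before writes, offered as an assumption and not something confirmed in this thread: if the EXTMEM region is cached write-back with write-allocate, a store that misses the data cache first pulls the whole 32-byte line in from the PSRAM, and the dirty line only goes back out when it is evicted. A toy direct-mapped model shows the resulting bus pattern. The class name and cache geometry are hypothetical and are not the Cortex-M7's real organization.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kLineBytes = 32;           // assumed cache line size
constexpr uint32_t kEmptyTag = 0xFFFFFFFFu;   // marks a slot that holds no line

// Toy direct-mapped, write-back, write-allocate cache. Only counts bus traffic.
class ToyCache {
public:
  explicit ToyCache(uint32_t lines) : tag_(lines, kEmptyTag), dirty_(lines, false) {}

  // Model a store to byte address addr.
  void store(uint32_t addr) {
    uint32_t line = addr / kLineBytes;
    uint32_t slot = line % static_cast<uint32_t>(tag_.size());
    if (tag_[slot] != line) {                                      // miss
      if (tag_[slot] != kEmptyTag && dirty_[slot]) writebacks++;   // evict dirty victim
      fills++;                                                     // read the line first
      tag_[slot] = line;
    }
    dirty_[slot] = true;
  }

  uint32_t fills = 0;       // line reads from external memory
  uint32_t writebacks = 0;  // line writes to external memory

private:
  std::vector<uint32_t> tag_;
  std::vector<bool> dirty_;
};
```

In this model, writing sequential words through a 16-line cache produces 16 line reads and no writes at all until the cache is full. That at least has the same shape as a burst of read clusters followed later by the first write.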
 
KurtE said:
But it was again interesting that the example sketch I did started off writing values to the full 8MB chip, and yet the first several QSPI operations were to read in stuff from the chip. Why? Maybe to reload the cache?
When I was looking at that yesterday it didn't make a lot of sense to me. Looked like in the configure function it tested the JEDEC (so that should be 2 reads) for each position of the ram. But then it just called off sm_malloc functions. I really couldn't find where the read/writes were coming from for the PSRAM on startup or ...
 
Hi again, looking at my one with breadboard...
Ran slightly different sketch:
Code:
extern "C" {
  extern uint8_t external_psram_size;
}
void setup() {
  while (!Serial && millis() < 5000);
  Serial.begin(115200);
  delay(250);
  Serial.printf("External PSRAM size: %u\n", external_psram_size );

  pinMode(13, OUTPUT);
  pinMode(0, OUTPUT);
}
uint32_t loop_count = 0;
void loop() {
  // Yes I am leaking...
  for (uint8_t i = 0; i < 5; i++) {
    digitalWriteFast(0, HIGH);
    loop_count++;
    uint8_t *p = (uint8_t *)extmem_malloc(1024);  // extmem_malloc() returns void *
    uint8_t *pp = p;
    for (uint16_t j = 0; j < 1024; j++) {
      *pp++ = (j & 0xff);
    }
    digitalWriteFast(0, LOW);
    digitalToggleFast(13);
    delay(2);
    uint16_t count_errors = 0;
    pp = p;
    digitalWriteFast(0, HIGH);
    for (uint16_t j = 0; j < 1024; j++) {
      if (*pp++ != (j & 0xff)) {
        count_errors++;
      }
    }
    digitalWriteFast(0, LOW);
    delay(3);
    Serial.printf("Loop:%u addr:%x errors:%u start: %x %x %x %x\n", loop_count,
                  (uint32_t)p, count_errors, p[0], p[1], p[2], p[3]);
  }
  Serial.println("Hit any key to repeat");
  while (Serial.read() == -1) ;
  while (Serial.read() != -1) ;
}
So it allocates one K at a time and fills it with 0,1,2...


It can run once sort of from power up on this machine
Code:
External PSRAM size: 8
Loop:1 addr:7000000c errors:0 start: 0 1 2 3
Loop:2 addr:7000042c errors:0 start: 0 1 2 3
Loop:3 addr:7000084c errors:0 start: 0 1 2 3
Loop:4 addr:70000c6c errors:0 start: 0 1 2 3
Loop:5 addr:7000108c errors:0 start: 0 1 2 3
Hit any key to repeat
I will try increasing the size...
There is a bunch of code that runs at the start there that looks like it reads lots of data. It is sort of interesting that I am seeing some small blips on the CS pin. They could be just slightly lower, but the analyzer is seeing them as well. So I will look at its configuration as well.
screenshot.jpg
Only when you get to the smaller section at the right above do you see where it is actually doing what the code is asking for. I still need to figure out why most of the operations, I believe, are reads.
 
Quick update: to the above.

I believe that large section of QSPI operations in the above logic analyzer output is due to the code in startup.c which is zeroing out memory:
That is:
Code:
		sm_set_pool(&extmem_smalloc_pool, &_extram_end,
			external_psram_size * 0x100000 -
			((uint32_t)&_extram_end - (uint32_t)&_extram_start),
			1, NULL);
That 1 is a flag to zero memory... So it then does: memset(spool->pool, 0, spool->pool_size);
So in this case it is zeroing about 8MB of memory...
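
The size argument in that call is just the whole PSRAM minus whatever was placed in EXTMEM statically. A minimal sketch of the arithmetic, with a hypothetical function name and the static size passed in as a plain number instead of the linker symbols:

```cpp
#include <cstdint>

// Bytes handed to the EXTMEM heap: PSRAM size minus statically placed EXTMEM data.
// static_extmem_bytes stands in for (&_extram_end - &_extram_start).
constexpr uint32_t extmemPoolBytes(uint32_t psram_mbytes, uint32_t static_extmem_bytes) {
  return psram_mbytes * 0x100000u - static_extmem_bytes;
}
```

With 8 MB detected and no EXTMEM variables, this gives the full 8388608 bytes to zero. A sketch that declares an 8 MB EXTMEM array, like the earlier test, would leave 0 bytes for the pool by the same arithmetic.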
 
Saw that as well. Got lost after that as I couldn't visualize how sm_malloc was interfacing with PSRAM/FLEXSPI commands for read/write? Just didn't see the linkage between the two
 
Me too! Again maybe there are some wires that are not making good contact or??? But with my older extended board have been running the test again:
My current hacked up version:
Code:
extern "C" {
  extern uint8_t external_psram_size;
}
void setup() {
  while (!Serial && millis() < 5000);
  Serial.begin(115200);
  delay(250);
  Serial.printf("External PSRAM size: %u\n", external_psram_size );

  pinMode(13, OUTPUT);
  pinMode(0, OUTPUT);
}
#define BLOCK_SIZE 8192
uint32_t loop_count = 0;
void loop() {
  // Yes I am leaking...
  for (uint8_t i = 0; i < 5; i++) {
    digitalWriteFast(0, HIGH);
    loop_count++;
    uint8_t *p = (uint8_t *)extmem_malloc(BLOCK_SIZE);  // extmem_malloc() returns void *
    if (!p) {
      Serial.println("Ran out of memory");
      break;
    }
    uint8_t *pp = p;
    for (uint16_t j = 0; j < BLOCK_SIZE; j++) {
      *pp++ = (j & 0xff);
    }
    digitalWriteFast(0, LOW);
    digitalToggleFast(13);
    delay(2);
    uint16_t count_errors = 0;
    pp = p;
    digitalWriteFast(0, HIGH);
    for (uint16_t j = 0; j < BLOCK_SIZE; j++) {
      if (*pp++ != (j & 0xff)) {
        count_errors++;
      }
    }
    digitalWriteFast(0, LOW);
    delay(3);
    Serial.printf("Loop:%u addr:%x errors:%u start: %x %x %x %x\n", loop_count,
                  (uint32_t)p, count_errors, p[0], p[1], p[2], p[3]);
  }
  Serial.println("Hit any key to repeat");
  while (Serial.read() == -1) ;
  while (Serial.read() != -1) ;
}
Sometimes if I hit CR a few times it will hang/crash; sometimes it fails already on the first set of 5 allocations.
Like this run:
Code:
External PSRAM size: 8
Loop:1 addr:7000000c errors:0 start: 0 1 2 3
Loop:2 addr:70002028 errors:0 start: 0 1 2 3
Loop:3 addr:70004044 errors:0 start: 0 1 2 3
Note currently on this board doing a reset and it will not see the PSRAM... But power off and on does get it..

Again we have two main blobs:
screenshot.jpg
The first blob takes about 426ms

If we look at the second blob, which is the stuff of the actual sketch:
You see a few short areas where it did the first few 8k allocations, and then it looked like it did tons of stuff for the 4th one and then it just stopped. It did not return to the main code.
screenshot2.jpg

Again it could just be noisy lines or...
The last valid stuff looked like a quad write (0x38) to address 0x20c0 with values 98 99 9a...

The other interesting part about this hang/crash is if you look toward the end here:
screenshot3.jpg
Most of that activity on FLEXSPI2 happens without the CS pin asserted. That is, you see about 1.4ms of lots of stuff going on, and none of it to our CS pin...?

Note: I should mention that neither of the extended boards passes the teensy41_psram_memtest sketch.
They fail pretty quickly, so it is probably time to punt, unless I can figure out either a) hardware not hooked up properly or b) someplace where the timing cannot handle the delay:
Code:
EXTMEM Memory Test, 8 Mbyte
 CCM_CBCMR=F5AE8304 (66.0 MHz)
testing with fixed pattern 5A698421
 Error at 700000A0, read BEAAE8CE but expected 5A698421
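
For reference, the 66.0 MHz in that output can be reproduced from the CCM_CBCMR value. This is a minimal sketch with a hypothetical function name. The field positions (FLEXSPI2_CLK_SEL in bits 9:8, FLEXSPI2_PODF in bits 31:29) and the table of source clocks are assumptions on my part, though they do agree with the value printed above.

```cpp
#include <cstdint>

// FlexSPI2 root clock in MHz from a CCM_CBCMR value.
// Assumed source clocks (MHz) for CLK_SEL 0..3: 396, 720, 664.62, 528.
double flexspi2MHz(uint32_t cbcmr) {
  static const double source_mhz[4] = {396.0, 720.0, 664.62, 528.0};
  uint32_t sel = (cbcmr >> 8) & 0x3u;    // clock source select
  uint32_t podf = (cbcmr >> 29) & 0x7u;  // divider field, divides by podf + 1
  return source_mhz[sel] / static_cast<double>(podf + 1);
}
```

So PODF(7) with CLK_SEL(3) is 528 / 8 = 66 MHz, and PODF(5) would be 528 / 6 = 88 MHz.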
 
p#19 test loop code works here for 8 MB repeatedly on T_4.1 with QSPI NAND and a PSRAM.

Must be some wiring or connectivity oddity or the chip compromised?

I can confirm the LEAKING:
Loop:1077 addr:20273228 errors:0 start: 0 1 2 3
Loop:1078 addr:20275230 errors:0 start: 0 1 2 3
Loop:1079 addr:20277238 errors:0 start: 0 1 2 3
Loop:1080 addr:20279240 errors:0 start: 0 1 2 3
Loop:1081 addr:2027b248 errors:0 start: 0 1 2 3
Ran out of memory
Ran out of memory
Ran out of memory
Ran out of memory
Ran out of memory

With this code edit:
Code:
...
  delay(100);  // added
  return;      // added: skip the keypress wait so loop() repeats
  Serial.println("Hit any key to repeat");
...
 
Thanks guys,

I tried the old stuff, but it fails, but then I did not disable the new stuff first. Might try it again with the start in startup.c bypassed.

Again maybe just punt as you have most memory working well.. So maybe not much need to easily test.

Or could do a more compact setup, although then it becomes completely dedicated to one chip as well...

Something like:
screenshot.jpg

Or could maybe see if room for small breakout pads and have breakout that plugs directly in... Although not sure that would work much better than current one...
 
@KurtE
Totally at a loss at this point. Something definitely odd is going on. For my Flash QSPI tests I used a kludged setup where I soldered wires from the Flash pad to a set of headers for the flash chip, and it seems to be working fine. So I would have thought a castellated board would perform a lot better.
 
@mjs513 - Me too. I have probably done something stupid. I may try again. First I may try some mix and match of components.

That is, take the Dual one (currently on PSRAM), plug it into a breadboard and see if it works with SPI?

Or maybe set aside and try again later.
 