Teensy 4.1 Beta Test

Hi @KurtE, thanks for the pinout card (#94). I think I've found an error.

You've got digital pins 40 & 41 on the same GPIO 1.21.

[Attached image: T4.1-Cardlike.jpg]

I believe 40 is GPIO1.20 and 41 is GPIO1.21.
 
I've updated the pinout card on msg #3. No substantial changes, just formatting and color.

Still composing the info for the back side. Hope to have it up in the next day or two. Everything is running slow here, as Robin & I keep struggling to run PJRC without employees (due to Oregon's COVID-19 shelter-in-place order).
 
Hi @Paul,

I am going through and doing a quick review of your card versus my card/tables...

Found I had issues with my PWM markings, which I have corrected on mine.
But I think your SPI markings for 38-41 got shifted by a couple of pins; should they be on 36-39?
That is, I believe 39 is the alternate for MISO1, not 41.


My currently updated one....
[Attached image: T4.1-Cardlike.jpg]
 
But I think your SPI markings for 38-41 got shifted by a couple of pins; should they be on 36-39?

Yes, you're right. I've updated the card on msg #3.

It was also incorrect on the earlier not-color version, which we printed and included with the 23 pre-production boards that shipped over the last several days. Hopefully if anyone using those boards tries to access SPI on those pins, they'll see this message or notice it's been fixed on the copy on msg #3.
 
Hmmm, you have the flash/psram pads underneath the Teensy 4.1 as being alternates for SPI2 (i.e. MISO2, MOSI2, SCLK2) which is the SPI bus used for the micro-SD card. Does this mean that if you were to want to have a sketch that reads from both the SD card and the attached QSPI memory that you would have to keep switching the pins between the two?
 
Does this mean that if you were to want to have a sketch that reads from both the SD card and the attached QSPI memory that you would have to keep switching the pins between the two?

Nope, there's no conflict here.

The SD card is accessed using SDIO (native 4 bit data + cmd). SPI2 isn't used for the SD card. But if no SD card is present, those pins can be used as GPIO or SPI2 or Serial5... that is, if you're able to solder tiny wires to the exposed parts of those SMT pads on the SD socket.

SPI2 isn't used for the QSPI chips either (even though those pins could be used for SPI2 if memory chips aren't installed). The QSPI memory is accessed with FlexSPI2, which is completely different and separate from all 3 of the normal SPI ports. It's also separate from FlexSPI1, which is used for the main program memory. The FlexSPI ports are designed to map memory chips directly into the ARM's address space. FlexSPI is much faster than regular SPI. It's triggered automatically when the processor uses its bus to access memory to fill or flush 32-byte L1 cache lines. The FlexSPI ports also feature another layer of caching between the processor's L1 cache and the actual QSPI memory, so the bus isn't tied up while the QSPI memory transfer happens. FlexSPI is a completely different type of peripheral than the regular SPI ports.
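To make the cache-line point concrete: because FlexSPI transfers are triggered by L1 cache fills and flushes, they always move whole 32-byte lines, and the line an address belongs to is simply the address with its low 5 bits cleared. A small sketch (the helper names are hypothetical, not from the Teensy core):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kLineBytes = 32;  // L1 cache line size filled/flushed via FlexSPI

// Start address of the 32-byte cache line containing addr.
constexpr uint32_t cache_line_base(uint32_t addr) {
    return addr & ~(kLineBytes - 1);
}

// Number of cache lines touched by an access of len bytes (len > 0) starting at addr.
constexpr uint32_t cache_lines_touched(uint32_t addr, uint32_t len) {
    return (cache_line_base(addr + len - 1) - cache_line_base(addr)) / kLineBytes + 1;
}
```

So a 4-byte read from PSRAM that misses the cache still fetches 32 bytes, and one that straddles a line boundary fetches two lines.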

This chip is loaded with so much incredibly powerful hardware that it's hard to keep track of so much capability....
 
The QSPI memory is accessed with FlexSPI2, which is completely different and separate from all 3 of the normal SPI ports. It's also separate from FlexSPI1, which is used for the main program memory. The FlexSPI ports are designed to map memory chips directly into the ARM's address space. They are very fast and feature another layer of caching between the processor's L1 cache and the actual QSPI memory. FlexSPI is a completely different type of peripheral than the regular SPI ports.

Does this mean one can access the additional RAM chip directly via memory access?
 
Nope, there's no conflict here.

The SD card is accessed using SDIO (native 4 bit data + cmd). SPI2 isn't used for the SD card. But if no SD card is present, those pins can be used as GPIO or SPI2 or Serial5... that is, if you're able to solder tiny wires to the exposed parts of those SMT pads on the SD socket.

Or you use something like this: https://www.sparkfun.com/products/9419

Thanks for the information. And I suspect we will find subtle caching bugs in the future as more people use the memories.
 
And I suspect we will find subtle caching bugs in the future as more people use the memories.

For sure :) We already found a small problem with DMA when the buffer does not align to a cache line and the rest of that line is in use by other things.
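A sketch of the usual remedy for that DMA problem: give each DMA buffer whole 32-byte cache lines to itself, so that flushing or invalidating its lines (the Teensy 4 core provides arm_dcache_flush(), arm_dcache_delete() and arm_dcache_flush_delete() for this) cannot clobber or discard neighbouring variables. The helper names and the 100-byte size are hypothetical; the code only shows the alignment arithmetic and runs on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kCacheLine = 32;  // Cortex-M7 data cache line size in bytes

// Round a byte count up to a whole number of cache lines.
constexpr size_t round_up_to_line(size_t n) {
    return (n + kCacheLine - 1) & ~(kCacheLine - 1);
}

// True if [addr, addr + len) starts on a line boundary and covers whole lines,
// i.e. cache maintenance on it cannot touch any other variable.
constexpr bool owns_whole_lines(uintptr_t addr, size_t len) {
    return (addr % kCacheLine) == 0 && (len % kCacheLine) == 0;
}

// Aligned and padded: no unrelated data can share this buffer's cache lines.
alignas(32) static uint8_t dma_buffer[round_up_to_line(100)];
```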
 
Does this mean, one can access the additional RAM chip directly via memory access?

Yes. If you solder the PSRAM chip to the bottom side and run the init code (which will soon become part of the core library's startup code), you get 8 megabytes of memory at 0x70000000 to 0x707FFFFF.
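The window Paul gives works out as follows; a trivial range check like this (hypothetical helper, constants taken from the paragraph above) can be handy when asserting that a pointer really landed in PSRAM:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPsramBase = 0x70000000u;
constexpr uint32_t kPsramSize = 8u * 1024u * 1024u;          // 8 MB = 0x800000 bytes
constexpr uint32_t kPsramLast = kPsramBase + kPsramSize - 1;  // last byte of the window

static_assert(kPsramLast == 0x707FFFFFu, "8 MB window ends at 0x707FFFFF");

// True if addr lies inside the 8 MB PSRAM window.
constexpr bool in_psram(uint32_t addr) {
    return addr >= kPsramBase && addr <= kPsramLast;
}
```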

You can use EXTMEM on variables to cause them to be allocated in that memory. It's already baked into the default linker script and core lib headers.
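A minimal sketch of that usage. Assumptions: the 480 x 320 16-bit buffer is only an example size, and the empty #define fallback exists solely so the snippet compiles off-target (on a Teensy 4.1 the core headers already define EXTMEM). Since initialized data can't be placed in EXTMEM, the buffer has no initializer and is cleared explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fallback only for compiling on a PC; the Teensy 4.1 core supplies the real EXTMEM.
#ifndef EXTMEM
#define EXTMEM
#endif

// Placed in PSRAM on a Teensy 4.1. No initializer: its contents start out unknown.
EXTMEM uint16_t frame_buffer[480 * 320];

// Clear the buffer explicitly, e.g. from setup().
void clear_frame_buffer() {
    std::memset(frame_buffer, 0, sizeof(frame_buffer));
}
```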

I'm planning to add a malloc_extmem() function which will allocate on a heap in that memory, but will automatically fall back to DMAMEM if the PSRAM chip isn't present.
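One possible shape for that fallback behaviour, as a hypothetical illustration only: psram_present, psram_pool, psram_alloc(), from_psram_pool() and malloc_extmem_sketch() are invented names, a 4 KB static array stands in for the PSRAM, and the real malloc_extmem() may work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for illustration (not the Teensy core's API).
static bool psram_present = false;            // would come from startup detection
alignas(32) static uint8_t psram_pool[4096];  // stands in for the 8 MB PSRAM window
static size_t psram_used = 0;

// Bump-allocate 32-byte-aligned blocks from the stand-in pool; never frees.
static void *psram_alloc(size_t size) {
    size_t rounded = (size + 31) & ~static_cast<size_t>(31);
    if (!psram_present || rounded > sizeof(psram_pool) - psram_used) return nullptr;
    void *p = psram_pool + psram_used;
    psram_used += rounded;
    return p;
}

// True if p points into the stand-in PSRAM pool.
bool from_psram_pool(const void *p) {
    uintptr_t a = reinterpret_cast<uintptr_t>(p);
    uintptr_t lo = reinterpret_cast<uintptr_t>(psram_pool);
    return a >= lo && a < lo + sizeof(psram_pool);
}

// Prefer PSRAM; fall back to the ordinary heap (on Teensy 4.1 that heap
// lives in DMAMEM/RAM2) when PSRAM is missing or full.
void *malloc_extmem_sketch(size_t size) {
    void *p = psram_alloc(size);
    return p ? p : std::malloc(size);
}
```

With this design a PSRAM pointer must never be passed to free(); a real implementation would need its own matching free routine that checks where the block came from.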
 
As Paul notes, a malloc_extmem() will be added, but the first-pass testing of direct access looked like this: github.com/PaulStoffregen/teensy41_extram/blob/master/extRAM_t4/examples/flashtest4/flashtest4.ino#L431

Does this mean, one can access the additional RAM chip directly via memory access?
I tested 8-bit and 32-bit pointer access, writing the value '24' and verifying it by reading it back. That was interleaved with a full pass using the value '42' to verify that all 8 MB of RAM were read and written. The test also starts on an odd byte boundary, and there was no crash on any boundary with 4-byte access:
Code:
elapsedMicros my_us;  // microsecond timer used for the timings below

uint32_t *ptrERAM_32 = (uint32_t *)0x70000001l;  // Set to ERAM, deliberately starting on an odd (unaligned) address
const uint32_t  sizeofERAM_32 = 0x7FFFFF / sizeof( ptrERAM_32[0] ); // size of free RAM in uint32_t units

void check24() {
  uint32_t ii;
  uint32_t jj = 0, kk = 0;  // jj = mismatches found, kk = length of the current good run
  Serial.print("\n    ERAM ========== memory map ===== array* ======== check24() : WRITE !!!!\n");
  Serial.printf("\t\tERAM length 0x%X element size of %u\n", (unsigned)sizeofERAM_32, (unsigned)sizeof( ptrERAM_32[0] ));
  my_us = 0;
  for ( ii = 0; ii < sizeofERAM_32; ii++ ) {
    ptrERAM_32[ii] = 24;
  }
  Serial.printf( "\t took %lu elapsed us\n", (uint32_t)my_us );
  Serial.print("    ERAM ============================ check24() : COMPARE !!!!\n");
  my_us = 0;
  for ( ii = 0; ii < sizeofERAM_32; ii++ ) {
    if ( 24 != ptrERAM_32[ii] ) {
      if ( kk != 0 ) {
        Serial.printf( "\t+++ Good Run of %u {bad @ %u}\n", (unsigned)kk, (unsigned)ii );
        kk = 0;
      }
      if ( jj < 100 )  // print at most the first 100 mismatches
        Serial.printf( "%3u=%8u\n", (unsigned)ii, (unsigned)ptrERAM_32[ii] );
      jj++;
    }
    else {
      kk++;
    }
  }
  Serial.printf( "\t took %lu elapsed us\n", (uint32_t)my_us );
  if ( 0 == jj )
    Serial.printf( "Good, " );
  else
    Serial.printf( "Failed to find 24 in ERAM %u Times", (unsigned)jj );
  Serial.printf( "\tFound 24 in ERAM %X Times\n", (unsigned)(sizeofERAM_32 - jj) );
}

@mjs513 has just cleaned up the GitHub code, and that current version isn't shown here, but the full 8 MB run was fast.
Here is one set of results from an earlier run, for 2M uint32_t writes and reads:
ERAM ========== memory map ================== check24() : WRITE !!!!
ERAM length 0x1FFFFF element size of 4
took 226904 elapsed us
ERAM ============================ check24() : COMPARE !!!!
took 207897 elapsed us
Good, Found 24 in ERAM 1FFFFF Times
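Converting those timings into throughput: 0x1FFFFF elements of 4 bytes is 8,388,604 bytes, and bytes per microsecond equals (decimal) megabytes per second. A tiny helper (name hypothetical) gives roughly 37 MB/s for the write pass and 40 MB/s for the compare pass:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes per microsecond == decimal MB/s. Returns 0 for a zero elapsed time.
double mb_per_s(uint64_t bytes, uint64_t elapsed_us) {
    return elapsed_us ? static_cast<double>(bytes) / static_cast<double>(elapsed_us) : 0.0;
}
```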
 
Yes. If you solder the PSRAM chip to the bottom side and run the init code (which will soon become part of the core library's startup code), you get 8 megabytes of memory at 0x70000000 to 0x707FFFFF.
Great, now I only need to receive my T4.1 beta, which still seems to be in transit.
 
@defragster - how does EXTMEM access compare to normal or DMAMEM access in terms of speed?

Also, can EXTMEM arrays be initialized unlike DMAMEM ones?
 
@easone - So far, initialized data cannot be placed in the EXTMEM section.

Speed - My guess is that it is slower, as it uses 4-bit SPI communication rather than memory inside the chip. BUT, like DMAMEM, it is cached, so the speed may really depend on how often the cache is hit versus needing to go out to the real memory.

And then there are all the other interesting caveats. For example, I have a version of our ILI9488_t3 library that can use EXTMEM for the frame buffer.
Doing this allowed me to use a 32-bit value per pixel instead of 16 bits. With that, I can store the pixels, which go out as 18-bit colors (using 24 bits of SPI data), in a format that lets me do DMA updates of the screen directly: I start a single DMA operation that transfers 32 bits at a time, while the SPI register is configured to output only 24 bits of each word. So I can update the whole screen without having to process interrupts and convert the frame buffer into another set of buffers one chunk at a time... So which one is faster? Again... it depends :D
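A sketch of the pixel packing this idea relies on. Assumptions, not taken from the ILI9488_t3 code: the display's 18-bit mode expects each color channel in the top 6 bits of its own byte, and the SPI is set up so the low 24 bits of each 32-bit word are what go out on the wire. For scale, a 480 x 320 frame at 4 bytes per pixel is 614,400 bytes, more than the Teensy 4.1's 512 KB RAM1, which is where PSRAM helps.

```cpp
#include <cassert>
#include <cstdint>

// Expand an RGB565 pixel to three 6-bit channels, each MSB-aligned in its byte,
// stored in the low 24 bits of a uint32_t (R in bits 23..18, G in 15..10, B in 7..2).
uint32_t rgb565_to_rgb666_word(uint16_t c) {
    uint32_t r5 = (c >> 11) & 0x1F;
    uint32_t g6 = (c >> 5) & 0x3F;
    uint32_t b5 = c & 0x1F;
    uint32_t r6 = (r5 << 1) | (r5 >> 4);  // replicate the top bit so 0x1F maps to 0x3F
    uint32_t b6 = (b5 << 1) | (b5 >> 4);
    return (r6 << 18) | (g6 << 10) | (b6 << 2);
}
```

Doing this conversion once, when a pixel is drawn, is what lets a single DMA transfer push the frame buffer out unchanged.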
 
The function @defragster referenced has been incorporated into flashtest6.ino in the extRAM_t4 library examples directory:
Code:
#if 0
uint32_t *ptrERAM_32 = (uint32_t *)0x70000001l;  // Set to ERAM
const uint32_t  sizeofERAM_32 = 0x7FFFFF / sizeof( ptrERAM_32[0] ); // size of free RAM in uint32_t units
#else
uint8_t *ptrERAM_32 = (uint8_t *)0x70000001l;  // Set to ERAM
const uint32_t  sizeofERAM_32 = 0x7FFFFE / sizeof( ptrERAM_32[0] ); // size of free RAM in uint8_t units
#endif
void check24() {
  uint32_t ii;
  uint32_t jj = 0, kk = 0;
  Serial.print("\n    ERAM ========== memory map ===== array* ======== check24() : WRITE !!!!\n");
  Serial.printf("\t\tERAM length 0x%X element size of %u\n", (unsigned)sizeofERAM_32, (unsigned)sizeof( ptrERAM_32[0] ));
  my_us = 0;
  for ( ii = 0; ii < sizeofERAM_32; ii++ ) {
    ptrERAM_32[ii] = 24;
  }
  Serial.printf( "\t took %lu elapsed us\n", (uint32_t)my_us );
  Serial.print("    ERAM ============================ check24() : COMPARE !!!!\n");
  my_us = 0;
  for ( ii = 0; ii < sizeofERAM_32; ii++ ) {
    if ( 24 != ptrERAM_32[ii] ) {
      if ( kk != 0 ) {
        Serial.printf( "\t+++ Good Run of %u {bad @ %u}\n", kk, ii );
        kk = 0;
      }
      if ( jj < 100 )
        Serial.printf( "%3u=%8u\n", ii, ptrERAM_32[ii] );
      jj++;
    }
    else {
      kk++;
    }
  }
  Serial.printf( "\t took %lu elapsed us\n", (uint32_t)my_us );
  if ( 0 == jj )
    Serial.printf( "Good, " );
  else
    Serial.printf( "Failed to find 24 in ERAM %u Times", (unsigned)jj );
  Serial.printf( "\tFound 24 in ERAM %X Times\n", (unsigned)(sizeofERAM_32 - jj) );
}
Just change the #if 0 to #if 1 to test with uint32_t accesses.
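The same 8-bit/32-bit switch can also be expressed as a template over the element type, so both widths share one code path. A sketch, assuming the caller passes the region (on the Teensy something like (volatile uint32_t *)0x70000000; the test here uses ordinary arrays). The element count comes from bytes / sizeof(T), which is easy to get wrong by writing sizeof(pointer) instead of sizeof(element). Unaligned 32-bit access worked on the Cortex-M7 in the test above, but it is undefined behaviour in portable C++, so this sketch leaves alignment to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Write `pattern` to every element of the region, then read it all back.
// Returns the number of mismatching elements (0 means the pass was good).
template <typename T>
size_t fill_and_verify(volatile T *base, size_t bytes, T pattern) {
    const size_t count = bytes / sizeof(T);  // element count, not pointer size
    for (size_t i = 0; i < count; ++i) base[i] = pattern;
    size_t bad = 0;
    for (size_t i = 0; i < count; ++i)
        if (base[i] != pattern) ++bad;
    return bad;
}
```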
 
Yes, there it is in 'flashtest6' under the SPIFFS area: github.com/PaulStoffregen/teensy41_extram/blob/master/extRAM_SPIFFS_t4/examples/flashtest6/flashtest6.ino#L304

BTW - FrankB ported a working copy of (a subset of) SPIFFS (the SPI Flash File System, as used on ESPs), and in that code tree it works well on both QSPI chips, FLASH and PSRAM.
 
@defragster - how does EXTMEM access compare to normal or DMAMEM access in terms of speed?

I'm not sure how meaningful this is, but here are times (us) for 32 KB buffer (aligned 32) memcpy() from various memories to DTCM on T4.1
Code:
        from DTCM 15 us
        from OCRAM 55 us
        from PROGMEM 558 us
        from 0x70000000 ERAM 1115 us @49.5MHz  780 us @133MHz   EXTMEM
        from 0x71000000 EFLASH 1905 us @49.5MHz  780 us @133MHz
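The benchmark code behind that table wasn't posted. As an assumption about method only, timing a single 32 KB memcpy() looks like this on a host; on the Teensy one would use micros() or an elapsedMicros instead of std::chrono, and point src at DTCM, OCRAM, PROGMEM, ERAM or EFLASH in turn. Because 32 KB fits in the data cache, a first (cold) copy and a repeated copy can give quite different numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Time one memcpy() of len bytes from src to dst; returns elapsed microseconds.
double time_memcpy_us(void *dst, const void *src, size_t len) {
    const auto t0 = std::chrono::steady_clock::now();
    std::memcpy(dst, src, len);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```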
 
As for RAM2(OCRAM), the question "How fast is it?" is not as easy as it seems :)
Accesses to both additional (optional) chips are cached by the same cache as RAM2. So the answer is: it depends.
It depends on the locality of the accessed data. If you don't need more than 32 kB at a given time, it will fit into the cache.
In this case, it is about as fast as RAM1 (First access may be slower).
If you access more, then.. you have to measure the speed for every individual case, and there is no general answer.
The cache uses write-back.
(Edit: Not sure if it uses the same read-ahead as the instruction-flash-chip?)
Edit: yes it does.
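The "fits in 32 kB or not" effect can be illustrated with a toy cache model. Assumptions: the geometry (32 KB, 4-way set-associative, 32-byte lines) is the Cortex-M7 data cache on the i.MX RT1062, replacement is modelled as LRU for simplicity, and timing, read-ahead and write-back are ignored. Sweeping a working set twice, one access per line, the second pass hits every time at 16 KB or 32 KB and never at 64 KB, because a sequential sweep larger than the cache evicts each line before it is reused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy 32 KB, 4-way set-associative cache with 32-byte lines and LRU replacement.
class ToyCache {
public:
    static constexpr uint32_t kLine = 32;
    static constexpr uint32_t kWays = 4;
    static constexpr uint32_t kSize = 32 * 1024;
    static constexpr uint32_t kSets = kSize / (kLine * kWays);  // 256 sets

    ToyCache() : tags_(kSets * kWays, 0xFFFFFFFFu), stamp_(kSets * kWays, 0) {}

    // Returns true on a hit; on a miss, refills the least recently used way.
    bool access(uint32_t addr) {
        const uint32_t line = addr / kLine;
        const uint32_t set = line % kSets;
        const uint32_t tag = line / kSets;  // never 0xFFFFFFFF for 32-bit addresses
        const size_t first = static_cast<size_t>(set) * kWays;
        ++clock_;
        size_t victim = first;
        for (size_t w = first; w < first + kWays; ++w) {
            if (tags_[w] == tag) { stamp_[w] = clock_; return true; }
            if (stamp_[w] < stamp_[victim]) victim = w;  // empty ways have stamp 0
        }
        tags_[victim] = tag;
        stamp_[victim] = clock_;
        return false;
    }

private:
    std::vector<uint32_t> tags_;   // 0xFFFFFFFF marks an empty way
    std::vector<uint64_t> stamp_;  // last-use time, 0 = never used
    uint64_t clock_ = 0;
};

// Touch one address per cache line across `bytes`, twice; return the
// fraction of second-pass accesses that hit.
double second_pass_hit_rate(uint32_t base, uint32_t bytes) {
    ToyCache cache;
    for (uint32_t off = 0; off < bytes; off += ToyCache::kLine) cache.access(base + off);
    uint32_t hits = 0, total = 0;
    for (uint32_t off = 0; off < bytes; off += ToyCache::kLine, ++total)
        if (cache.access(base + off)) ++hits;
    return total ? static_cast<double>(hits) / total : 0.0;
}
```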
 
Looking at @manitou's table for ERAM: there are 256 chunks of 32 KB in 8 MB, and multiplying that 780 us memcpy() time by 256 nearly matches the check24() time of 207897 us for the for() loop in p#164.
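That back-of-envelope comparison, written out (MB and KB taken as binary units):

```latex
\frac{8\,\mathrm{MB}}{32\,\mathrm{KB}} = \frac{8 \cdot 1024\,\mathrm{KB}}{32\,\mathrm{KB}} = 256,
\qquad
256 \times 780\,\mu\mathrm{s} = 199{,}680\,\mu\mathrm{s} \approx 0.96 \times 207{,}897\,\mu\mathrm{s}
```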

Indeed, the on-chip 32 KB data cache is used for RAM2 (OCRAM) and also for ERAM and EFLASH on QSPI, so localized access will benefit.

Re: "subtle caching bugs in the future as more people use the memories":
> Hopefully the path has been blazed and tested well enough between direct access for tests (24 and 42, plus 'alpha string writes'), functional SPIFFS [ERAM and EFLASH], and the display usage of ERAM (where the only issue is the required cache purge for DMA access).
> All the memories are accessed through ordinary 32-bit pointers, and the 1062 MCU handles all the access and cache fill/selection.
> After the FlexSPI is initialized for the QSPI devices, the MCU handles all the I/O, as Paul noted in p#158; there is no software interface needed to read/write the PSRAM or to READ the FLASH *{post #180}

The FLASH chips on the early beta were the standard Winbond NOR type - getting larger NAND type Flash working is TBD. Flash write does have a driver/system/PJRC call to coordinate.
 
So to be sure I understand things.

There are two flash pads, one of which is large enough for my W25Q128JVSIQ chips and is between pins 35-37 and 30-28. Given this is flash memory, it should persist between reboots. Is this correct? One way to use this seems to be with a SPIFFS filesystem mounted on that memory. I imagine using it with a file system abstraction is the best way to use this memory, rather than as just an 8MB area of persistent memory (though I could imagine use cases for it as RAW memory).

The other flash pad that is next to the back end of the Teensy 4.1 is for normal volatile flash (between pins 33-35 and 32-30). Is this right? For volatile memory, I suspect it is best used as normal RAM memory, hence using the EXTMEM macro to control static allocation.

Do the two pads have different fixed addresses, or, if both chips are soldered in, is the second chip just after the first?

For the volatile memory, is it cleared before the sketch starts, or would we need to clear it?

My search skills don't seem to be up to par, and I can't find either of the chips Paul mentioned for the PSRAM in one of the early posts. Is there a better search term to use to buy a handful of these chips? So far, I haven't found anything on eBay, Digi-Key, or Amazon. I'm probably just using the wrong term, but it would be helpful if somebody could point out some links to buy the chip.
 
@MichaelMeissner
Easy answer first:
1. I got the chips that Paul mentions from AliExpress:
ESPRESSIF ESP-PSRAM64H: https://www.aliexpress.com/item/4000242457828.html?spm=a2g0s.9042311.0.0.40f64c4dx51k2V
IPS6404LSQ (IPS6404L-SQ-SPN, 3.3V, SOP8, 64Mbit SQPI PSRAM): doesn't seem to be available anymore; searching for it brings me back to the ESPRESSIF link.

For the volatile memory, is it cleared before the sketch starts, or would we need to clear it?
You will have to initialize it when you first run it, unless Paul incorporates that into the auto-detect routine. The library is set up so you can do an init.

I suspect it is best used as normal RAM memory, hence using the EXTMEM macro to control static allocation.
Since we didn't have that macro available, we set up the examples in the lib so you can use SPIFFS on either or both of the chips, or write directly to the volatile memory. We also set up functions that let you write structured data, like the FRAM-i2c library. So you have a few choices.
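The structured-data idea can be sketched as put/get helpers over a byte region, in the spirit of Arduino's EEPROM.put()/get(). region_put(), region_get() and SensorLog are hypothetical names, not the library's actual functions; on the Teensy, region would point into the PSRAM, while here it is any buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record type used for illustration.
struct SensorLog {
    uint32_t timestamp;
    float value;
};

// Copy a trivially copyable value into region[offset..]; false if it would not fit.
template <typename T>
bool region_put(uint8_t *region, size_t region_size, size_t offset, const T &value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    if (offset > region_size || sizeof(T) > region_size - offset) return false;
    std::memcpy(region + offset, &value, sizeof(T));
    return true;
}

// Copy a value back out of region[offset..]; false if it would read past the end.
template <typename T>
bool region_get(const uint8_t *region, size_t region_size, size_t offset, T &value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    if (offset > region_size || sizeof(T) > region_size - offset) return false;
    std::memcpy(&value, region + offset, sizeof(T));
    return true;
}
```

Using memcpy() rather than casting the address to a struct pointer keeps the copy free of alignment and aliasing problems, whatever offset is chosen.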

There are two flash pads, one of which is large enough for my W25Q128JVSIQ chips and is between pins 35-37 and 30-28. Given this is flash memory, it should persist between reboots.
You know I never really tried it that way but you would think. Will have to try it.

EDIT: Just tried it yep it persists - was curious.
 
So to be sure I understand things.

There are two flash pads, one of which is large enough for my W25Q128JVSIQ chips and is between pins 35-37 and 30-28. Given this is flash memory, it should persist between reboots. Is this correct? One way to use this seems to be with a SPIFFS filesystem mounted on that memory. I imagine using it with a file system abstraction is the best way to use this memory, rather than as just an 8MB area of persistent memory (though I could imagine use cases for it as RAW memory).

Frank has implemented SPIFFS on the external FLASH; see https://github.com/PaulStoffregen/teensy41_extram. I even built a TFTP server using SPIFFS as part of Ethernet testing; see the last table in post #108.
 