Teensy 4.1 Beta Test

That's the expected result if you have one PSRAM chip. To get the last test to print the same number, you need to have 2 PSRAM chips installed.
 
Ah ok - so it looks like all is well. I'll probably have to remove the PSRAM initialization code from the lib and see what happens. Right now both are still compatible with 1 flash and 1 PSRAM chip. Still have to figure out how to handle addressing if you have 2 flash chips installed.
 
Also may have to change how the ILI9488_t3 library detects whether it should use extram or not.

I've added a C namespace variable to the startup code, so software can check how much PSRAM is actually installed.

https://github.com/PaulStoffregen/cores/commit/a41500ec8eba072749d2e6c1bed48081dda8d39b

Here's the test program updated to print the PSRAM size.

Code:
EXTMEM volatile int myint;
  
extern "C" uint8_t external_psram_size;

void setup()
{
        while (!Serial) ;
        Serial.println("External RAM test");

        Serial.printf("PSRAM size is %d Mbyte\n", external_psram_size);
        
        myint = 123456;
        Serial.printf("myint = %u\n", myint); // reads from cache
        arm_dcache_flush_delete((void *)&myint, sizeof(myint)); // write cache to memory
        Serial.printf("myint = %u\n", myint); // reads from actual memory
        
        volatile int *pint = (volatile int *)0x707FDCB4; // near end of first 8MB chip
        *pint = 12345678;
        Serial.printf("*pint = %u\n", *pint); // reads from cache
        arm_dcache_flush_delete((void *)pint, sizeof(int)); // write cache to memory
        Serial.printf("*pint = %u\n", *pint); // reads from actual memory
        
        pint = (volatile int *)0x70CFDCB4; // near middle of second 8MB chip
        *pint = 6541230;
        Serial.printf("*pint = %u\n", *pint); // reads from cache
        arm_dcache_flush_delete((void *)pint, sizeof(int)); // write cache to memory
        Serial.printf("*pint = %u\n", *pint); // reads from actual memory
}

void loop()
{
}
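
The last pair of reads only matches when a second chip actually backs that address. A minimal sketch of a bounds check that code could run before touching a fixed address, assuming PSRAM is mapped from 0x70000000 (as the addresses in the test sketch suggest). The names `kExtmemBase` and `extmem_range_ok` are made up for illustration, and `psram_mbytes` stands in for `external_psram_size`:

```cpp
#include <cstdint>

// Assumption: external PSRAM starts at 0x70000000, as the test addresses suggest.
constexpr uint32_t kExtmemBase = 0x70000000u;

// Returns true if [addr, addr + len) lies entirely inside the installed PSRAM.
// psram_mbytes is the size in megabytes (the value external_psram_size reports).
bool extmem_range_ok(uint32_t addr, uint32_t len, uint8_t psram_mbytes)
{
    const uint64_t end_of_ram =
        (uint64_t)kExtmemBase + (uint64_t)psram_mbytes * 1024u * 1024u;
    if (addr < kExtmemBase) return false;
    return (uint64_t)addr + len <= end_of_ram;
}
```

With one 8 MB chip, 0x707FDCB4 passes and 0x70CFDCB4 does not, which lines up with the second address only working when 2 chips are installed.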
 
Thanks Paul. I think there is one more change you can make at this point. Currently you have the PSRAM clock at 49.5 MHz:
Code:
	// turn on clock  (TODO: increase clock speed later, slow & cautious for first release)
	CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
		| CCM_CBCMR_FLEXSPI2_PODF(7) | CCM_CBCMR_FLEXSPI2_CLK_SEL(0); // 49.5 MHz

The lib has it set up to run at 132 MHz. We have been using this for quite a while with no problems for both the flash and the PSRAM, so maybe change to:
Code:
	  CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
			| CCM_CBCMR_FLEXSPI2_PODF(4) | CCM_CBCMR_FLEXSPI2_CLK_SEL(2); // 528/5 = 132 MHz
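
For anyone checking the numbers: the divider is PODF + 1, so the resulting clock is just the selected root clock divided by that. A small sketch of the arithmetic follows. The root frequencies are my assumptions (396 MHz for CLK_SEL(0), roughly 664.62 MHz for CLK_SEL(2)) and should be checked against the i.MX RT1060 reference manual. Note that 528/5 works out to 105.6 MHz, so the 132 MHz figure presumably comes from the higher root clock rather than from 528 MHz.

```cpp
#include <cstdint>

// FlexSPI2 clock in kHz: the selected root clock divided by (PODF + 1).
constexpr uint32_t flexspi2_clock_khz(uint32_t root_khz, uint32_t podf)
{
    return root_khz / (podf + 1u);
}

// Assumed root clocks in kHz; verify against the reference manual.
constexpr uint32_t kRootSel0_kHz = 396000; // assumption for CLK_SEL(0)
constexpr uint32_t kRootSel2_kHz = 664620; // assumption for CLK_SEL(2)

// PODF(7) with the 396 MHz root gives the 49.5 MHz in the startup code comment.
static_assert(flexspi2_clock_khz(kRootSel0_kHz, 7) == 49500, "396 MHz / 8");
// PODF(4) with the assumed ~664.62 MHz root gives roughly 132.9 MHz.
static_assert(flexspi2_clock_khz(kRootSel2_kHz, 4) == 132924, "664.62 MHz / 5");
```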

Just tested with the PSRAM code removed from lib and just using the initialization from startup - it makes a noticeable difference!
 
Just downloaded latest startup.c and ran the test sketch:
Code:
External RAM test

PSRAM size is 8 Mbyte
myint = 123456
myint = 123456
*pint = 12345678
*pint = 12345678
*pint = 6541230
*pint = 4294967295
I did make the clock change I mentioned so it works no problem.

I did notice something though: is EXTMEM new? Guess so, but I just saw your notes:
Code:
		// TODO: zero uninitialized EXTMEM variables
		// TODO: copy from flash to initialize EXTMEM variables
		// TODO: set up for malloc_extmem()
 
The motivation for these changes is the pattern of orders we're seeing people place. It's pretty clear many people are buying 2 PSRAM chips to pair with each Teensy 4.1.

Does this actually mean I can place two RAM chips on the Teensy 4.1 instead of 1 RAM and 1 flash?
 
Guess so but just saw your notes:

Yup. 1.52 needs to release ASAP, so all that stuff is going to be left until 1.53.


Does this actually mean I can place two RAM chips on the Teensy 4.1 instead of 1 RAM and 1 flash?

Yes. You can use 2 RAM chips for 16 MB.


Currently you have the PSRAM clock at 49.5 MHz

Yes, I know, but for this release I want to leave the clock at this most conservative setting. We just shipped a lot of hardware to distributors and end users, so I'm feeling overly cautious right now.

My general plan is to release 1.53 in about 4 to 6 weeks. There's so much that's been put on hold, like the hardware serial fifo latency issue. I want to merge & implement as much as possible over the next few weeks. So we won't be on 1.52 for the normal ~3 months.
 
@PaulStoffregen
Removed the PSRAM initialization from the library and now FLASH is not working. Going to have to sort that out.

Got it

EDIT: Due to get a couple T4.1's on Friday so I can install 2 PSRAMs and give it a test.
 
Removed the PSRAM initialization from the library and now FLASH is not working. Going to have to sort that out.

Are you initializing all the FlexSPI2 LUTs you're using? The startup code is only programming the LUTs used for the RAM commands.

I'll take a look at the code this afternoon if it's still unsolved.
 
PJRC Order with fresh pair of T_4.1's to be delivered today (Wed) not Friday! Glad only paid for cheap shipping :)

Alternate Magjacks $2.07 versus prior $2.55

Native Eth NTP sample ran overnight - still responding to ping - most <1ms - some as 12 or 20 or 9 or 4 or 5 ms. Group of 20 pings :: Minimum = 0ms, Maximum = 12ms, Average = 1ms

@defragster - typically you use the accelerometer just to get the roll and pitch angles. Did see a tech note on getting yaw https://www.nxp.com/docs/en/application-note/AN3461.pdf never really tried it with just an accel.

Did not see that. I must have something on hand I collected with 6 DOF or better.

@mjs513 - will have to rewrite the SDIO log sample now for two PSRAMs, if that is why people bought 2 and they weren't just assuming they'd break one or getting a second in advance. Will have to see why the SD Fat Beta code refused to write 1 full MB per call(), or have it do it in batches, and see if there is any coherency issue. Assumed that would work - just didn't expect it would be the first use.

Yes, going to be a busy couple of days.

TD 1.52 b5 downloaded ...
 
Thanks Paul, but my issue is more that I want to compile the library differently depending on whether there is external memory. The current hack in the ILI9488_t3 header file is:
Code:
#if __has_include(<extRAM_t4.h>) && defined(ARDUINO_TEENSY41)
//#include <extRAM_t4.h>
#define ENABLE_EXT_DMA_UPDATES  // This is only valid for those T4.1 which have external memory. 
//#pragma message "ILI9488_t3h -  extRAM_T4 enabled EXT DMA frame buffer"
#endif
So it detected this and all of the library code changed to use 4 bytes instead of 2, DMA update code completely different...

So again looks like for libraries like this, need a different approach, which I have been thinking of doing anyway... But more work.
 
Will we have the SPIFFS stuff in 1.52 or will it be still a separate library?

I am wondering for 1.53, whether we need a class that sits between the audio readers (i.e. RAW, WAV, etc.) and the actual filesystem (SD, SD-Fat Beta, SPIFFS, SerialFlash, etc.). That way, you can easily slide in new filesystems without having to make new versions of the readers.
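
To make the idea concrete, here is a rough sketch of what such an in-between class could look like. Everything here (`ByteSource`, `MemorySource`, `sum_raw_samples`) is a hypothetical name for illustration, not part of the Audio library or any Teensy filesystem library. The memory-backed source just stands in for an SD, SPIFFS, or SerialFlash backend.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical interface the readers would code against.
class ByteSource {
public:
    virtual ~ByteSource() = default;
    // Copies up to len bytes into dst and returns the number actually copied.
    virtual size_t read(void *dst, size_t len) = 0;
};

// Memory-backed stand-in for a real filesystem backend.
class MemorySource : public ByteSource {
public:
    MemorySource(const uint8_t *data, size_t size) : data_(data), size_(size) {}
    size_t read(void *dst, size_t len) override {
        const size_t remaining = size_ - pos_;
        const size_t n = (len < remaining) ? len : remaining;
        std::memcpy(dst, data_ + pos_, n);
        pos_ += n;
        return n;
    }
private:
    const uint8_t *data_;
    size_t size_;
    size_t pos_ = 0;
};

// A "reader" that only knows about ByteSource: it sums 16-bit
// little-endian samples until the source runs out of complete samples.
int32_t sum_raw_samples(ByteSource &src)
{
    int32_t total = 0;
    uint8_t pair[2];
    while (src.read(pair, sizeof(pair)) == sizeof(pair)) {
        total += (int16_t)(uint16_t)(pair[0] | (pair[1] << 8));
    }
    return total;
}
```

A new filesystem would then only need its own `ByteSource` subclass, and the RAW/WAV readers would stay untouched.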

I also recall there were places in SD.h that if you edited the source, it could add speed-ups. If we could clone the library so that it has those speedups, it might be useful for those people who don't have 20 different Teensy release directories installed. I tend to now hate any installed s/w that forces you to edit the source to get the defaults (like Adafruit used to do), instead of using different constructors.

If need be, we can add weak references to allow the user better control. But this is all 1.53 stuff, not 1.52.

Of course if we do changes to the readers probably need to update the GUI to reflect these changes.
 
Any benchmark of external flash and PSRAM, by the way?

Here are some times for memcpy() for 32KB from various memories (data is aligned(32) but NOT cached) to DTCM.
Code:
        source                 time for 32 KB memcpy() to DTCM
        DTCM                   15 us
        OCRAM                  55 us
        PROGMEM                558 us
        ERAM   (0x70000000)    1115 us @ 49.5 MHz,  780 us @ 132 MHz (about 336 Mbit/s)
        EFLASH (0x71000000)    1905 us @ 49.5 MHz,  780 us @ 132 MHz
With the QSPI clock at 132 MHz, the maximum possible 4-bit wide transfer rate is 528 Mbit/s ;) "guaranteed not to exceed". The PSRAM data sheet says the max SPI CLK is 133 MHz (84 MHz max if a burst crosses a page boundary).
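
The arithmetic behind those two figures, as a small sketch. It assumes 32 KB means 32768 bytes; the function names are made up for illustration.

```cpp
#include <cstdint>

// Effective throughput in Mbit/s for `bytes` copied in `microseconds`.
// Bits per microsecond is numerically the same as Mbit/s.
constexpr double throughput_mbit_s(uint32_t bytes, uint32_t microseconds)
{
    return (bytes * 8.0) / microseconds;
}

// Upper bound for a bus that moves `bus_bits` per clock at `clock_mhz`.
constexpr double bus_limit_mbit_s(uint32_t clock_mhz, uint32_t bus_bits)
{
    return static_cast<double>(clock_mhz) * bus_bits;
}
```

`throughput_mbit_s(32768, 780)` comes out at about 336, and `bus_limit_mbit_s(132, 4)` is 528, so the uncached copy reaches roughly 64% of the theoretical limit.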

If data is cached, things can be faster. For the follow-on memcpy() where the source data has been cached: OCRAM 15 us, PROGMEM 381 us, ERAM 31 us, and EFLASH 506 us. WEIRD: I'm afraid I don't understand the cache, at least for the ERAM. If I add my memcpy tests to Paul's ERAM test sketch and run the two timed memcpy's before Paul's tests, with a dcache delete before the first memcpy, then I see the times reported above (780 us, then 31 us). BUT if I run the memcpy tests after Paul's tests, the 2nd memcpy takes 532 us??

SPIFFS file system on external Flash is faster than onboard microSD with SD lib doing file read(). See last table in this post
 
AFAIK - SPIFFS and eFlash access are going to be external. The startup activation of PSRAM was on Paul's list to enable; since it is new in B5, there won't be further integration or testing before TD 1.52 releases for the arrival of the new T_4.1's.

A future SD.h interface layer would be cool, as it could be 10X faster.
 
@Paul - I know it is probably too late for this release of Teensyduino, but I still wonder if it would make sense to add a simple heap allocator that can take in some indicator of which heap, maybe flags...
Maybe something like Windows HeapAlloc? Obviously quick and dirty. Maybe have something like:
myMem = HeapAlloc(HEAP_EXTMEM, 0, count_bytes);

And don't have any implementation for free or the like. That way things like libraries that wish to allocate buffers can simply call this if they know there is external memory, and grab a chunk without having to resort to some fixed address and hoping no one else will need that address.

Again probably too late for this release.
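
A minimal sketch of the "allocate only, never free" idea, as a plain bump allocator over one region. `BumpHeap` is a hypothetical name, not an existing Teensyduino API, and the 32-byte default alignment is my assumption (chosen to match the data cache line size so buffers do not share lines).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator: hands out chunks from one region, never frees.
class BumpHeap {
public:
    BumpHeap(void *base, size_t size)
        : next_(reinterpret_cast<uintptr_t>(base)),
          end_(reinterpret_cast<uintptr_t>(base) + size) {}

    // Returns an aligned block of count_bytes, or nullptr if the region
    // cannot hold it. align must be a power of two.
    void *alloc(size_t count_bytes, size_t align = 32) {
        const uintptr_t start = (next_ + (align - 1)) & ~(uintptr_t)(align - 1);
        if (start > end_ || count_bytes > end_ - start) return nullptr;
        next_ = start + count_bytes;
        return reinterpret_cast<void *>(start);
    }

private:
    uintptr_t next_; // first unallocated address
    uintptr_t end_;  // one past the end of the region
};
```

On a T4.1 the region would be whatever part of the external RAM is not already taken by EXTMEM variables; how to find that boundary is left open here. A library would then call something like `heap.alloc(count_bytes)` and check for nullptr instead of picking a fixed address.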

Also, going back to the earlier point that others may want to use the bottom pads for other capabilities: I wonder if your init code should be under some #if. Not sure of a good name, but it could be like:
Code:
FLASHMEM void configure_external_ram()
{
#if ! defined(TEENSY41_EXCLUDE_EXTERNAL_RAM)
	// initialize pins
	IOMUXC_SW_PAD_CTL_PAD_GPIO_EMC_22 = 0xB0E1; // 100K pullup, medium drive, max speed
...
Then maybe we hack up ways to handle it... For example, add a menu item for this in boards.txt, or maybe define a new board type which is a copy of Teensy 4.1, identical but with a different name like Teensy 4.1 (No Ext mem), which then passes in this flag. Although that might cause other issues.
 
Yep, I expect SPIFFS and eFlash to be externally set up. It would be nice if the library shipped with Teensyduino 1.52, but if it doesn't, I can use the external library.

I have a horrid mix of library files (where I do git/svn updates as I think about them). I need to switch to using the library package manager for the released files.

Similarly, it would be nice if there was some way to figure out if 1 or 2 PSRAM chips are loaded. I'm sure it would be useful for display library buffers. There may be a way, I just haven't delved into 1.52-beta5 yet.
 
In theory the _malloc_r interface in newlib allows you to create a base pointer for doing malloc like allocations:

Code:
2.23 malloc, realloc, free—manage memory
Synopsis

#include <stdlib.h>
void *malloc(size_t nbytes);
void *realloc(void *aptr, size_t nbytes);
void *reallocf(void *aptr, size_t nbytes);
void free(void *aptr);

void *memalign(size_t align, size_t nbytes);

size_t malloc_usable_size(void *aptr);

void *_malloc_r(void *reent, size_t nbytes);
void *_realloc_r(void *reent, 
    void *aptr, size_t nbytes);
void *_reallocf_r(void *reent, 
    void *aptr, size_t nbytes);
void _free_r(void *reent, void *aptr);

void *_memalign_r(void *reent,
    size_t align, size_t nbytes);

size_t _malloc_usable_size_r(void *reent, void *aptr);

Description
These functions manage a pool of system memory.

Use malloc to request allocation of an object with at least nbytes bytes of storage available. If the space is available, malloc returns a pointer to a newly allocated block as its result.

If you already have a block of storage allocated by malloc, but you no longer need all the space allocated to it, you can make it smaller by calling realloc with both the object pointer and the new desired size as arguments. realloc guarantees that the contents of the smaller object match the beginning of the original object.

Similarly, if you need more space for an object, use realloc to request the larger size; again, realloc guarantees that the beginning of the new, larger object matches the contents of the original object.

When you no longer need an object originally allocated by malloc or realloc (or the related function calloc), return it to the memory storage pool by calling free with the address of the object as the argument. You can also use realloc for this purpose by calling it with 0 as the nbytes argument.

The reallocf function behaves just like realloc except if the function is required to allocate new storage and this fails. In this case reallocf will free the original object passed in whereas realloc will not.

The memalign function returns a block of size nbytes aligned to a align boundary. The align argument must be a power of two.

The malloc_usable_size function takes a pointer to a block allocated by malloc. It returns the amount of space that is available in the block. This may or may not be more than the size requested from malloc, due to alignment or minimum size constraints.

The alternate functions _malloc_r, _realloc_r, _reallocf_r, _free_r, _memalign_r, and _malloc_usable_size_r are reentrant versions. The extra argument reent is a pointer to a reentrancy structure.

If you have multiple threads of execution which may call any of these routines, or if any of these routines may be called reentrantly, then you must provide implementations of the __malloc_lock and __malloc_unlock functions for your system. See the documentation for those functions.

These functions operate by calling the function _sbrk_r or sbrk, which allocates space. You may need to provide one of these functions for your system. _sbrk_r is called with a positive value to allocate more space, and with a negative value to release previously allocated space if it is no longer required. See Stubs.


Returns
malloc returns a pointer to the newly allocated space, if successful; otherwise it returns NULL. If your application needs to generate empty objects, you may use malloc(0) for this purpose.

realloc returns a pointer to the new block of memory, or NULL if a new block could not be allocated. NULL is also the result when you use ‘realloc(aptr,0)’ (which has the same effect as ‘free(aptr)’). You should always check the result of realloc; successful reallocation is not guaranteed even when you request a smaller object.

free does not return a result.

memalign returns a pointer to the newly allocated space.

malloc_usable_size returns the usable size.
 
Installed TD 1.52 B5 - that went well. I deleted ALL of Teensy\AVR first so it would be overwritten in the existing IDE 1.8.12.

Just downloaded and installed the new core for Teensy4 and was running some tests as well. Just using your quick and dirty test sketch I get the following:
Code:
External RAM test

myint = 123456
myint = 123456
*pint = 12345678
*pint = 12345678
*pint = 6541230
*pint = 4294967295
Not sure about that last value which should be a repeat of 6541230

EDIT: Just going through the lib as well seems that even though you have the psram initialization it still works with the initialization that is in the library.

I ran that too. With 1 PSRAM chip that is expected - the write lands in the cache, but the cache flush/delete in the code voids that value when the address is not backed by a chip.

The TWO PSRAMS are now CONSECUTIVE/Contiguous in memory space - thus the MOVE for eFlash noted.

Then I went to IIRC the most recent PSRAM (aka SD log test) - added 2nd #ifdef to remove extRAM_t4.h lib usage.

So that relies on PJRC eRAM init before setup() :: IT WORKS
> It is slower ( Paul: understood need to prevent massive new issues on new hardware+broad feature )
> Exact same fail at :: skipped 9 2757, 55140, 70090000, 1 kk==16384 ????
-- that shows it persists with PJRC init and was not caused by speed.
-- Paul showed the way to arm_dcache_flush_delete() - I'll do that and update.
 
Are you initializing all the FlexSPI2 LUTs you're using? The startup code is only programming the LUTs used for the RAM commands.

I'll take a look at the code this afternoon if it's still unsolved.

Ok, figured out what is causing the issue on initialization. Sorry it took so long - too many interruptions, and I wanted to double-check a few things. The two FLSHxCR0 lines marked with "<--" below seem to be causing the flash not to initialize. If I change 0x2000 to 0x4000, flash initializes with no problem.
Code:
	FLEXSPI2_FLSHA1CR0 = 0x2000; // 8 MByte  <--
	FLEXSPI2_FLSHA1CR1 = FLEXSPI_FLSHCR1_CSINTERVAL(2)
		| FLEXSPI_FLSHCR1_TCSH(3) | FLEXSPI_FLSHCR1_TCSS(3);
	FLEXSPI2_FLSHA1CR2 = FLEXSPI_FLSHCR2_AWRSEQID(6) | FLEXSPI_FLSHCR2_AWRSEQNUM(0)
		| FLEXSPI_FLSHCR2_ARDSEQID(5) | FLEXSPI_FLSHCR2_ARDSEQNUM(0);

	FLEXSPI2_FLSHA2CR0 = 0x2000; // 8 MByte  <--
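
Reading the 0x2000 value together with its "8 MByte" comment, the register looks like a device size in KBytes. That is my assumption (check the FlexSPI chapter of the reference manual), but under it the two values work out as in this small sketch. The helper name is made up for illustration.

```cpp
#include <cstdint>

// Assumption: FLSHxCR0 holds the device size in KBytes, so 0x2000 means
// 8192 KB = 8 MByte, matching the comment in the startup code.
constexpr uint32_t flshcr0_for_mbytes(uint32_t mbytes)
{
    return mbytes * 1024u;
}

static_assert(flshcr0_for_mbytes(8) == 0x2000, "8 MByte");
static_assert(flshcr0_for_mbytes(16) == 0x4000, "16 MByte");
```

So under that reading, changing 0x2000 to 0x4000 tells the controller the device on that chip select is 16 MByte rather than 8.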

Also just doubled check with USPS looks like will get the T4.1s today.
 
Yes, the main flash runs at a low speed, too (T4 + T4.1)
Not sure if that can be changed at runtime.

Yep just gave it a test - but I turned the clock off first then back on:
Code:
	  CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
		  | CCM_CBCMR_FLEXSPI2_PODF(4) | CCM_CBCMR_FLEXSPI2_CLK_SEL(2); // 528/5 = 132 MHz
	  CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
So I added it back into the library, but you can add it to the sketch if you don't want to use the library. Remember that it also affects flash.
 