LittleFS port to Teensy/SPIFlash

Okay - my scan of the doc didn't miss the read ID command info ...
looking for more data sheet info took me to Digi-Key. Cypress makes the same 8-DIP part at 25 MHz for $2.70 less - forgot this ROHM unit was $7.10 - and it has JEDEC ID info.

I broke one of the 2Mb's out - DOH - forgot the size of 0.10" pins compared to what we've been doing. Had figured the long legs would be a problem ... but so far apart.

Will have to commit it to a perf board - and map 6 edge pins to the ones I used before - then see if it works the same to return an ID.
 
Tell me about it - soldering those 0.10" pins is a challenge. About as small as I want to go with a soldering iron! But getting better at it :)

@KurtE - @defragster - @Paul
Getting ready to push an update to the NAND library:
  • Deleted deviceErase since lowlevelFormat is working without issue
  • Deleted current wait functions and reverted back to using @Paul's wait function used in the rest of the library
  • Fixed logic errors for the M02 code
  • Cleaned up .h file for unused functions.

Did some regression testing for the N01, N02 and M02 chips of course.

EDIT: Update has been pushed. Still can't get the M02 to work faster than 55 MHz in SPI mode.

EDIT2: Just as a reminder
 
Sounds like it is coming along.

Will have to try out the current stuff.

Was curious about my setup with the QSPI being brought out and why I can not get the chip to read. So also hooked it up to Logic Analyzer... But currently no QSPI analyzer, but there is a user created version for this... So trying to build... But needed to update VS, then needed MFC, still missing some header file... Maybe I should have just used their prebuilt version ;)
 
Afternoon @KurtE
Maybe a bad connection? I have been testing some of the chips, including the M02, on a breakout I put together for QSPI. Curious: which chip are you using? M02?

To be honest... would try the prebuilt version first for qspi. Mainly because I am lazy.

Other than ECC, not sure what's next with memory.
 
@KurtE - @defragster
Just tested the 512 Mbit, 1 Gbit (N01) and 2 Gbit (N02) chips on the memory board with a T3.5. Running a couple of tests from LFSIntegrity gave no errors and everything seemed operational. Just another sanity check.
 
Almost Afternoon - Sorry I was outside doing some other stuff.

I was able to get the SPIFlash Analyzer to build and setup pointer to built version. Need to see if I have the pins defined right for which ones are D2 versus D3...

I am not sure if it is working now or not, but here is a screen shot showing the first communications to the CHIP with both SPI and the SPI Flash analyzers turned on.

screenshot.jpg
 
@KurtE
Looks like it is working from what I can see, for the JEDEC read anyway.

MOSI is showing the command 0x9F, which is the read JEDEC ID command, and then 3 bytes are read from the MISO line: 0xEF, 0x40 and 0x18. That is correct for the W25Q128JV:
Code:
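	uint8_t buf[4] = {0x9F, 0, 0, 0};	// command byte plus 3 dummy bytes to clock out the ID
	port->beginTransaction(SPICONFIG);
	digitalWrite(pin, LOW);			// assert chip select
	port->transfer(buf, 4);			// in-place transfer: buf[1..3] now hold the ID bytes
	digitalWrite(pin, HIGH);		// release chip select
	port->endTransaction();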
Now QSPI is going to be another challenge :)
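
For reference, a minimal sketch of picking the three ID bytes out of that transfer buffer. The names JedecId, parseJedec and capacityBytes are made up for illustration and are not part of the library. The capacity decoding assumes the third byte is log2 of the size in bytes, which matches 0x18 for a 128 Mbit part but is not a universal rule (the NAND device ID 0xEFAA21 seen later in this thread does not follow it):

```cpp
#include <cstdint>

// Fields of the 3-byte answer to the 0x9F (read JEDEC ID) command.
struct JedecId {
  uint8_t manufacturer;  // 0xEF in the capture above (Winbond)
  uint8_t memoryType;    // 0x40 in the capture above
  uint8_t capacityCode;  // 0x18 in the capture above
};

// buf is the 4-byte buffer after the in-place SPI transfer. buf[0] was
// clocked in while the command byte went out, so it carries no ID data.
inline JedecId parseJedec(const uint8_t buf[4]) {
  return JedecId{buf[1], buf[2], buf[3]};
}

// ASSUMPTION: capacityCode is log2(size in bytes). Returns 0 when the code
// cannot be a valid shift for a 32-bit size, i.e. the convention does not apply.
inline uint32_t capacityBytes(const JedecId& id) {
  if (id.capacityCode >= 32) return 0;
  return uint32_t{1} << id.capacityCode;
}
```

With the bytes from the capture, capacityBytes() gives 16,777,216 bytes, i.e. 16 MiB or 128 Mbit, which agrees with the W25Q128JV part name.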
 
Yep - It is complaining about both my PSRAM and the QPI... So maybe need to see if I have something cross wired or the like:
The next couple outputs look like still in SPI:
screenshot2.jpg

And then it does a Quad Read operation:
View attachment 23006
But neither chip appears to want to talk over QSPI right now... Maybe need to check out my connections again.
screenshot3.jpg
 
A couple of things. Both PSRAM and QPI use a mix of 2-pin IO (IO0 and IO1) and 4-pin IO (IO0, IO1, IO2 and IO3) commands. For instance, JEDEC and read/write register only use IO0/IO1, while READs/WRITEs use IO0-IO3. So what you are seeing makes perfect sense.

Wondering whether, when you brought out the lines, you got the CS lines connected correctly or crossed?
 
Thanks, the one analyzer is supposed to understand the SPI, Dual SPI, and Quad SPI of these chips.
Or in Particular the type of chip I selected was WINBOND.
screenshot2.jpg
screenshot.jpg

Put two images in as could not see whole options.

Note: my 7 channels shown are in pin number order 48-54 Which I believe are:
(48-CS for PSRAM, 49-D1, 50-D2, 51-CS for Flash, 52-D0, 53-SCK, 54-D3 )
Which I need to ring out again.

It might also be the overhead of extra wires and the like. I notice a difference in the init for the PSRAM depending on whether both chips are plugged in or not.
Wonder if I should try a slower SPI speed?
 
Good luck getting the LA to see QSPI, Kurt. Had thought about suggesting that after logging off last night, to see if there is some long pause or anomaly that explains why QSPI perf is not any faster relative to SPI than it is.

Doing the _Program summary in response to FrankB's note doubled my wonder. Something isn't right with the results of that test ( like the QSPI data rates observed ) and I'm not seeing how the little 32KB cache could be accounting for the difference. There are I/O series in the LFSintegrity test that individually/repeatedly exceed 32KB - but then the next thing is a wholly different 'file' stored in a whole new place. And there is no transfer speed diff between large and small as indicated in this one set of I/O:

Code:
...
:: /S_file.txt  PRO_DISK +++ Add [sz 0 add 74752] @KB/sec 464.36 {462.02}  ++ S   Verify /S_file.txt 74752B  @KB/sec 2785.82 
:: /T_file.txt  PRO_DISK +++ Add [sz 0 add 78848] @KB/sec 466.49 {464.04}  ++ T   Verify /T_file.txt 78848B  @KB/sec 2792.17 
:: /U_file.txt  PRO_DISK +++ Add [sz 0 add 82944] @KB/sec 457.76 {455.33}  ++ U   Verify /U_file.txt 82944B  @KB/sec 2787.38 
:: /V_file.txt  PRO_DISK +++ Add [sz 0 add 87040] @KB/sec 464.96 {462.36}  ++ V   Verify /V_file.txt 87040B  @KB/sec 2782.61 
:: /W_file.txt  PRO_DISK +++ Add [sz 0 add 91136] @KB/sec 460.45 {457.83}  ++ W   Verify /W_file.txt 91136B  @KB/sec 2772.45 
:: /X_file.txt  PRO_DISK +++ Add [sz 0 add 95232] @KB/sec 464.77 {462.02}  ++ X   Verify /X_file.txt 95232B  @KB/sec 2789.05 
:: /Y_file.txt  PRO_DISK +++ Add [sz 0 add 99328] @KB/sec 467.30 {464.45}  ++ Y   Verify /Y_file.txt 99328B  @KB/sec 2786.67 
:: /Z_file.txt  PRO_DISK +++ Add [sz 0 add 103424] @KB/sec 472.37 {469.37}  ++ Z   Verify /Z_file.txt 103424B  @KB/sec 2771.20 
:: /A_file.txt  PRO_DISK +++ Add [sz 0 add 41984] @KB/sec 465.70 {458.11}  ++ A   Verify /A_file.txt 41984B  @KB/sec 2758.84 
:: /B_file.txt  PRO_DISK +++ Add [sz 5120 add 5120] @KB/sec 378.50 {347.76}  ++ B   Verify /B_file.txt 10240B  @KB/sec 2529.02 
:: /C_file.txt  PRO_DISK +++ Add [sz 9216 add 9216] @KB/sec 419.48 {397.91}  ++ C   Verify /C_file.txt 18432B  @KB/sec 2674.41 
:: /D_file.txt  PRO_DISK +++ Add [sz 13312 add 13312] @KB/sec 365.71 {354.19}  ++ D   Verify /D_file.txt 26624B  @KB/sec 2734.87 
:: /E_file.txt  PRO_DISK +++ Add [sz 17408 add 17408] @KB/sec 443.08 {433.84}  ++ E   Verify /E_file.txt 34816B  @KB/sec 2748.99 
:: /F_file.txt  PRO_DISK +++ Add [sz 21504 add 21504] @KB/sec 434.70 {427.34}  ++ F   Verify /F_file.txt 43008B  @KB/sec 2757.45 
:: /G_file.txt  PRO_DISK +++ Add [sz 25600 add 25600] @KB/sec 448.83 {442.11}  ++ G   Verify /G_file.txt 51200B  @KB/sec 2763.98 
:: /H_file.txt  PRO_DISK +++ Add [sz 29696 add 29696] @KB/sec 447.75 {441.89}  ++ H   Verify /H_file.txt 59392B  @KB/sec 2783.52 
:: /I_file.txt  PRO_DISK +++ Add [sz 33792 add 33792] @KB/sec 464.10 {458.42}  ++ I   Verify /I_file.txt 67584B  @KB/sec 2767.68 
:: /J_file.txt  PRO_DISK +++ Add [sz 37888 add 37888] @KB/sec 463.77 {458.60}  ++ J   Verify /J_file.txt 75776B  @KB/sec 2778.53 
:: /K_file.txt  PRO_DISK +++ Add [sz 41984 add 41984] @KB/sec 451.15 {446.63}  ++ K   Verify /K_file.txt 83968B  @KB/sec 2781.59 
:: /L_file.txt  PRO_DISK +++ Add [sz 46080 add 46080] @KB/sec 448.00 {443.84}  ++ L   Verify /L_file.txt 92160B  @KB/sec 2784.71 
:: /M_file.txt  PRO_DISK +++ Add [sz 50176 add 50176] @KB/sec 456.43 {452.37}  ++ M   Verify /M_file.txt 100352B  @KB/sec 2776.99 
:: /N_file.txt  PRO_DISK +++ Add [sz 54272 add 54272] @KB/sec 450.88 {447.13}  ++ N   Verify /N_file.txt 108544B  @KB/sec 2782.39 
...

Files of 0 size get over 32KB written - then it re-hits prior files and extends them with larger than 32KB writes.
This is a simple ROOTONLY test and the {braced} numbers show there is little overhead from the LFS reducing the KB/sec writes (not like when files are in subdirs).
Maybe there is a good reason with inherent MCU limitation? or maybe there is some tuning needed?

Looking at the similar lines in this prior page post #798 shows this 'disk' on _Program is only as fast as SPI NAND and not as fast as QSPI NAND to Write (_program write has extra overhead to function), though faster on Read. Being the PCB _Program Flash it isn't going to be open to LA viewing and write has added overhead with dropping interrupts - but maybe looking at the QSPI chips will show something? So not suggesting _Program compare is relevant Apples to Apples - it is just still on SerMon screen and shows I/O performed in the iterations.
 
@KurtE
Depending on what sketch you are using you might have different speeds set up. If I remember right, PSRAM/QPI is running at 88 MHz, and if you are running SPI from LittleFS it's using 30 MHz. Are you testing the FLASH on the T4.1 in QSPI mode and one using SPI on a breakout board? I've been testing some of the chips on that breakout board I wired up using wirewrap wire to the FLASH QSPI pins - probably about 2-3 in long. For SPI I have a couple of different wirings, but the wires tend to be long.

@defragster - hoping that maybe with Kurt looking at the LA with LittleFS he may see something funny going on with my coding when he hits the NAND chips. Don't remember the timings on the QSPI for the 128 chip though. Now that you mention it, the file sizes you are showing are all less than 1 block (128K), which is the specified block size. The prog size is 2048B. Wondering if, since it's all in 1 block, the timing is all the same regardless of the file size? Just flow of thoughts here.
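
To put numbers on the block-size thought, a rough sketch of the page and block arithmetic using the 128K block and 2048B prog size quoted here. The helper names are invented for illustration, and this only counts raw data bytes. LittleFS adds its own metadata and decides where things land, so treat it as a lower bound rather than the actual layout:

```cpp
#include <cstdint>

constexpr uint32_t kBlockSize = 128 * 1024;  // erase block size quoted in the post
constexpr uint32_t kProgSize  = 2048;        // program (page) size quoted in the post

// Integer division rounded up.
constexpr uint32_t ceilDiv(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Pages in one erase block.
constexpr uint32_t pagesPerBlock() { return kBlockSize / kProgSize; }

// Minimum pages / blocks needed to hold `bytes` of raw file data.
constexpr uint32_t pagesFor(uint32_t bytes)  { return ceilDiv(bytes, kProgSize); }
constexpr uint32_t blocksFor(uint32_t bytes) { return ceilDiv(bytes, kBlockSize); }
```

The largest verify in the listing above is 108544 bytes, which is 53 pages and still fits in a single 64-page block, so none of those files needs to cross a block boundary on data size alone.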
 
Maybe switch the cache off to see if there is a difference?
There is also a read-ahead that influences things, but I would have to read the manual to see if it is part of the cache or not.

For deleting a block in flash the wait time may be the biggest part.
Good night - it is midnight here :)
 
Just ran a quick test using a N01 Flash:
Total time to erase the 1 Gbit is:
Code:
 Done Formatting Low Level in 4240999 us.
or 4.2 seconds for 1024 blocks of 128K. Looks like total wait time for each block erase is about 1,057us or 1.1 ms. Does that help?
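
A quick worked version of that arithmetic, only using the numbers in the log line and the 1024 x 128K geometry (the constant and function names are made up for the example). Dividing the whole format time by the block count gives the average elapsed time per block, which includes command overhead as well as the busy wait, so it is not the same thing as the per-erase wait figure quoted above:

```cpp
#include <cstdint>

constexpr uint64_t kFormatMicros = 4240999;  // "Done Formatting Low Level in 4240999 us."
constexpr uint32_t kBlocks       = 1024;     // blocks erased
constexpr uint32_t kBlockKiB     = 128;      // size of each block in KiB

// Total capacity covered by the format, in MiB.
constexpr uint32_t totalMiB() { return kBlocks * kBlockKiB / 1024; }

// Average elapsed time per block over the whole format, in microseconds.
constexpr double microsPerBlock() {
  return static_cast<double>(kFormatMicros) / kBlocks;
}

// Effective erase rate over the whole device, in MiB per second.
constexpr double eraseMiBPerSec() {
  return totalMiB() / (static_cast<double>(kFormatMicros) / 1e6);
}
```

That works out to 128 MiB (1 Gbit) in total, a bit over 4.1 ms elapsed per block on average, and roughly 30 MiB/s of erase throughput.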
 
Also - this post from another thread ... with FrankB's reply - WRT to p#736 - there is overhead using the QSPI hardware - where a shared bus will impose waits ... and the inherent MCU limit suggested there? Seems saw or did a PSRAM transfer rate MB/sec - though don't recall what that was.

I have managed to get it working but the cpu usage is very high.
The cpu usage using program memory is ~35%, while when using psram, it is ~52%

Yes, the bus to the PSRAM and the addressing is slowing things down.
For the Beta, the timing was very conservative, too. Don't know if that is still the case.
 
Help with? I've seen that the NANDs format much faster.
 
@defragster
Bad wording since I didn't really explain. Wondering why the M02 can only reliably work at 55Mhz when using SPI as opposed to 75Mhz say.
 
Okay, thought I missed something.
That is interesting that chip can't clock to spec. Doing the multistep process in p#719 {with 'F'ormat} does the 55 MHz clock result in slower KB's/sec?

Will be interesting when/if KurtE can get LA reliably connected, maybe he can see what the clock rate and transfers really are for a clue? Amazing how reliable the others are, even if pushing the clock doesn't seem to help.

Would be interesting to have a CPU% figure during this? Not sure how that would work as there isn't a central loop during file I/O. Not sure if a timer _isr would miss clocks when busy as an indication? Maybe a replacement yield() call counting calls/sec?
 
just for ref:

Code:
QSPI_NAND bigFile() write :: Big write KBytes per second 2730.20 
   ... and :: Big read&compare KBytes per second 2440.80 

QSPI_RAM bigFile() write :: Big write KBytes per second 4238.77 
   ... and :: Big read&compare KBytes per second 2517.31

Looks like I might change the 's' delete 2MB files to just do a block READ without verify of the 1,000 2048 blocks used to write it to get best speed for ref.

Also with a yield(){ CNT++ } a 10ms intervalTimer interrupts and tracks the calls to that yield() and prints when non-zero on the second ( 100 hits and added MAX tracking of those 100 samples ).
Code:
That is showing 100K to 600K calls per second during QSPI_NAND activity
>>  yps=589897 [mx=660400]
> That is a useless open ended measure for CPU% - but shows how often yield() is called during disk I/O ( except RAM )
> Took loop() yield() calls out of the loop with while(1) loopX()
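
For anyone wanting to try the same idea, a minimal sketch of the bookkeeping, written as plain C++ so it can be checked off-target. YieldRateTracker is a hypothetical helper, not the code used for the numbers above. It assumes onYield() is called from a replacement yield() and onTick() from a 10 ms timer, and that the printed max is the busiest 10 ms sample scaled up to a per-second rate (a guess, since the printing code was not posted). On real hardware the shared counter would also need to be volatile and read safely from the interrupt:

```cpp
#include <cstdint>

// Hypothetical helper: counts yield() calls per 10 ms sample and reports
// once per second (every 100 samples).
class YieldRateTracker {
 public:
  static constexpr uint32_t kTicksPerSecond = 100;  // 10 ms timer period

  // Call from the replacement yield().
  void onYield() { ++sampleCount_; }

  // Call from the periodic timer. Returns true when a full second of
  // samples has been folded in and perSecond()/maxPerSecond() are fresh.
  bool onTick() {
    total_ += sampleCount_;
    if (sampleCount_ > maxSample_) maxSample_ = sampleCount_;
    sampleCount_ = 0;
    if (++ticks_ < kTicksPerSecond) return false;
    lastPerSecond_ = total_;
    // ASSUMPTION: report the busiest sample as a calls-per-second rate.
    lastMaxPerSecond_ = maxSample_ * kTicksPerSecond;
    total_ = 0;
    maxSample_ = 0;
    ticks_ = 0;
    return true;
  }

  uint32_t perSecond() const { return lastPerSecond_; }
  uint32_t maxPerSecond() const { return lastMaxPerSecond_; }

 private:
  uint32_t sampleCount_ = 0;       // calls seen in the current 10 ms sample
  uint32_t total_ = 0;             // calls seen in the current second
  uint32_t maxSample_ = 0;         // busiest 10 ms sample in the current second
  uint32_t ticks_ = 0;             // samples folded in so far
  uint32_t lastPerSecond_ = 0;     // last completed one-second total
  uint32_t lastMaxPerSecond_ = 0;  // last completed scaled max
};
```

As noted, this stays an open-ended measure rather than a CPU% figure. It only shows how often yield() gets a chance to run while disk I/O is in progress.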
 
@defragster
I was reading Paul's post in the beta5 thread and remembered that I put together a benchmark sketch based off of the SDFat benchmark sketch. So decided to run it on the N01 chip:

Code:
LittleFS Test
Device ID: 0xEFAA21
attempting to mount existing media
started
FILE_SIZE_MB = 5
BUF_SIZE = 2048 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2889.69,2269,600,708
2891.36,2269,600,708
2886.36,2269,600,709
2894.71,2268,600,707
2896.39,2269,600,707
2916.67,1795,600,702
2914.97,1795,600,702
2918.37,1795,600,701
2873.08,8194,600,712
2531.22,8495,652,809

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
18936.24,903,101,107
18936.24,903,101,107
18936.24,903,101,107
19008.24,903,101,108
18936.24,903,101,107
18936.24,903,101,108
18936.24,903,101,107
18936.24,903,101,107
18936.24,903,101,107
18936.24,903,101,108

Done
Not sure of its correctness, but the interesting thing is the latency deltas between reads and writes. In a sense, I guess it does make sense.

Here's the sketch if you are interested:
Code:
#include "Streaming.h"
#define cout Serial

//#include <LittleFS.h>
#include <LittleFS_NAND.h>

//LittleFS_QSPIFlash myfs;
//LittleFS_RAM myfs;
//LittleFS_Program myfs;
//LittleFS_SPIFlash myfs;
LittleFS_QPINAND myfs;

File file, file1;


// Set SKIP_FIRST_LATENCY true if the first read/write to the SD can
// be avoided by writing a file header or reading the first record.
const bool SKIP_FIRST_LATENCY = true;

// Size of read/write.
const size_t BUF_SIZE = 2048;

// File size in MB where MB = 1,000,000 bytes.
const uint32_t FILE_SIZE_MB = 5;

// Write pass count.
const uint8_t WRITE_COUNT = 10;

// Read pass count.
const uint8_t READ_COUNT = 10;
//==============================================================================
// End of configuration constants.
//------------------------------------------------------------------------------
// File size in bytes.
const uint32_t FILE_SIZE = 1000000UL*FILE_SIZE_MB;

// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3)/4];
uint8_t* buf = (uint8_t*)buf32;

void setup() {
  //pinMode(13, OUTPUT);
  pinMode(10, OUTPUT);
  //digitalWrite(13, HIGH);
  while (!Serial) ; // wait
  Serial.println("LittleFS Test"); delay(5);
  delay(10);
  //if(!myfs.begin(buf, sizeof(buf))){
  //if (!myfs.begin(3000000)) {
  if(!myfs.begin()){
  //if(!myfs.begin(10)){
    Serial.println("Error starting spidisk");
    while (1) ;
  }
  myfs.quickFormat();
  Serial.println("started");

  float s;
  uint32_t t;
  uint32_t maxLatency;
  uint32_t minLatency;
  uint32_t totalLatency;
  bool skipLatency;
  myfs.remove("bench.dat");
  //for(uint8_t cnt=0; cnt < 10; cnt++) {
    // open or create file - truncate existing file.
    file = myfs.open("bench.dat", FILE_WRITE);

    // fill buf with known data
    if (BUF_SIZE > 1) {
    for (size_t i = 0; i < (BUF_SIZE - 2); i++) {
      buf[i] = 'A' + (i % 26);
    }
    buf[BUF_SIZE-2] = '\r';
    }
    buf[BUF_SIZE-1] = '\n';

    cout << F("FILE_SIZE_MB = ") << FILE_SIZE_MB << endl;
    cout << F("BUF_SIZE = ") << BUF_SIZE << F(" bytes\n");
    cout << F("Starting write test, please wait.") << endl << endl;

    // do write test
    uint32_t n = FILE_SIZE/BUF_SIZE;
    cout <<F("write speed and latency") << endl;
    cout << F("speed,max,min,avg") << endl;
    cout << F("KB/Sec,usec,usec,usec") << endl;
    for (uint8_t nTest = 0; nTest < WRITE_COUNT; nTest++) {
    file.seek(0);

    maxLatency = 0;
    minLatency = 9999999;
    totalLatency = 0;
    skipLatency = SKIP_FIRST_LATENCY;
    t = millis();
    for (uint32_t i = 0; i < n; i++) {
      uint32_t m = micros();
      if (file.write(buf, BUF_SIZE) != BUF_SIZE) {
      Serial.println("write failed");
      }
      m = micros() - m;
      totalLatency += m;
      if (skipLatency) {
      // Wait until first write to SD, not just a copy to the cache.
      skipLatency = file.position() < 512;
      } else {
      if (maxLatency < m) {
        maxLatency = m;
      }
      if (minLatency > m) {
        minLatency = m;
      }
      }
    }

    t = millis() - t;
    s = file.size();
    cout << s/t <<',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency/n << endl;
    }
    cout << endl << F("Starting read test, please wait.") << endl;
    cout << endl <<F("read speed and latency") << endl;
    cout << F("speed,max,min,avg") << endl;
    cout << F("KB/Sec,usec,usec,usec") << endl;

    // do read test
    for (uint8_t nTest = 0; nTest < READ_COUNT; nTest++) {
      file.seek(0);
      maxLatency = 0;
      minLatency = 9999999;
      totalLatency = 0;
      skipLatency = SKIP_FIRST_LATENCY;
      t = millis();
      for (uint32_t i = 0; i < n; i++) {
        buf[BUF_SIZE-1] = 0;
        uint32_t m = micros();
        int32_t nr = file.read(buf, BUF_SIZE);
        if (nr != BUF_SIZE) {
          Serial.println("read failed");
        }
        m = micros() - m;
        totalLatency += m;
        if (buf[BUF_SIZE-1] != '\n') {
          Serial.println("data check error");
        }
        if (skipLatency) {
          skipLatency = false;
        } else {
          if (maxLatency < m) {
            maxLatency = m;
          }
          if (minLatency > m) {
            minLatency = m;
          }
        }
      }
    
    s = file.size();
    
    
    t = millis() - t;
    cout << s/t <<',' << maxLatency << ',' << minLatency;
    cout << ',' << totalLatency/n << endl;
    }
    cout << endl << F("Done") << endl;
    file.close();
  //}
}

void loop() {}
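
A note on how the KB/Sec column comes out of the sketch, as a small standalone check. The sketch prints s/t with s in bytes and t in milliseconds, so the figure is bytes per millisecond, i.e. KB/sec with KB = 1000 bytes. The function names below are made up for the example, and it assumes file.size() returns exactly n * BUF_SIZE after a pass:

```cpp
#include <cstdint>

constexpr uint32_t kFileSize = 1000000UL * 5;  // FILE_SIZE in the sketch (5 MB)
constexpr uint32_t kBufSize  = 2048;           // BUF_SIZE in the sketch

// Number of read/write calls per pass (n in the sketch, integer division).
constexpr uint32_t transfersPerPass() { return kFileSize / kBufSize; }

// Bytes actually moved per pass. ASSUMPTION: this equals file.size().
constexpr uint32_t bytesPerPass() { return transfersPerPass() * kBufSize; }

// Bytes per millisecond, which is numerically KB/sec for KB = 1000 bytes.
constexpr double kbPerSec(uint32_t bytes, uint32_t elapsedMs) {
  return static_cast<double>(bytes) / elapsedMs;
}
```

Under that assumption each pass moves 2441 buffers, or 4,999,168 bytes. The two read figures in the output then correspond to passes of 264 ms (18936.24) and 263 ms (19008.24), so the read column is quantized by the 1 ms resolution of millis() rather than showing real variation.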
 
Out of curiosity I ran the same bench sketch on the N01, N02 and the W25Q512JV on the SPI memory board:
Code:
SPINAND SEQ BENCH TESTS
FILE_SIZE_MB = 5
BUF_SIZE = 2048 bytes
---------------------------------------
N02 write/read speed and latency
speed,max,min,avg
CLK, KB/Sec,usec,usec,usec
@55Mhz  WRITE:  1169.94,31533,1265,1750
@55Mhz   READ:  4511.88,3827,426,453
@75Mhz  WRITE:  1305.95,30350,1225,1568
@75Mhz   READ:  4637.45,3723,413,441

N01 write/read speed and latency
speed,max,min,avg
CLK, KB/Sec,usec,usec,usec
@55Mhz  WRITE:  1441.93,5015,1295,1420
@55Mhz   READ:  4350.89,3971,442,470
@75Mhz  WRITE:  1173.79,31773,1304,1744
@75Mhz   READ:  4483.56,3853,428,456


=========================================

SPI SEQ BENCH TESTS
FILE_SIZE_MB = 5
BUF_SIZE = 2048 bytes
---------------------------------------
512MB write/read speed and latency
speed,max,min,avg
CLK, KB/Sec,usec,usec,usec
@30Mhz  WRITE:  82.77,106796,5540,24744
@30Mhz   READ:  2157.60,1779,618,949
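
To see how much the higher clock actually buys, a small sketch that compares the 75 MHz and 55 MHz figures from the table above against the clock ratio. The struct and constant names are invented for the example, and the numbers are copied from the output as posted:

```cpp
// One row pair from the table: throughput in KB/sec at each SPI clock.
struct Result {
  double kbPerSecAt55;
  double kbPerSecAt75;
};

// Throughput ratio going from 55 MHz to 75 MHz (1.0 means no change).
constexpr double speedup(const Result& r) {
  return r.kbPerSecAt75 / r.kbPerSecAt55;
}

// What a purely clock-bound transfer would gain.
constexpr double kClockRatio = 75.0 / 55.0;

constexpr Result kN02Write{1169.94, 1305.95};
constexpr Result kN02Read{4511.88, 4637.45};
constexpr Result kN01Write{1441.93, 1173.79};
constexpr Result kN01Read{4350.89, 4483.56};
```

The clock goes up by about 36%, but reads only gain about 3% on both chips, N02 writes gain about 12%, and the N01 write figure at 75 MHz is lower than at 55 MHz (that run also shows a much larger max latency). That suggests something other than the raw SPI clock dominates these runs, though a single run per setting is not much to go on.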
 
Sounds like good progress! I think I will put the QSPI board on hold. I did have some luck yesterday when I lowered the QSPI speed. I may double check wiring later.

Will switch back to another test board.

Will also try out that LA analyzer to see how well it does show the Winbond stuff.
 