The state of SPI Flash support

Status
Not open for further replies.

hoho

Well-known member
Hi,

What is the largest (storage-wise) SPI flash chip known to work properly with Teensy as of today, one that is still available and not obsolete?
I have found several posts where Paul mentions NAND chips and says he lacks the time to check them out.

I need about 1GB (1 gigabyte) of fast read-only storage. I have about 800MB of audio samples and need really fast, reliable access to them (including simultaneous playback of up to 6-8 samples at a time). I currently have them on an SD card in RAW format, and I open all the files (SD.open()) during startup so that there is no overhead for that operation later. To play a sample, I just file.read() the data from the existing instance (after a file.seek(0) if the sample has been played before). Having them on the SD card works nicely almost always; they just play. Except for the moments when it doesn't work right away and some unfortunate delay or interruption happens. I am trying this on a Teensy 3.6 with the built-in SD card reader.

In the SerialFlash library code there is a mention of a 256MB chip, but it's obsolete and not available anymore. Maybe someone other than Paul has made some progress with NAND chips? I am also considering a USB stick in USB host mode, but the whole USB host thing would make it significantly more complex and hence less reliable (and I'm not really sure it would do the job).

Any other options for up to 1GB of fast read-only storage? I also plan to use a T4 in the final build; I'm playing with the 3.6 for now just because it has a built-in SD card reader.

Thanks.
 
Hi,

Be careful - the size of flash chips is normally given in Mbits. So, if you want a single chip capable of storing 1GB, look for one that's at least 8Gb capacity. That will limit you to NAND flash - with lower speeds than NOR flash. Also, note that larger memory chips require longer addresses and that will slow things down. NAND flash is not suitable for random access, which you will need if you want to play multiple samples simultaneously - it's okay for an MP3 player, for example, that only plays one track at a time.

The largest NOR flash chips that would be easy to solder are 1Gb or 2Gb. I would like to know if the W25Q01JV (https://www.winbond.com/resource-files/W25Q01JV_DTR%20RevB%2011132019.pdf) would work with the Teensy Audio library - can anyone tell me?

I don't think there's a simple way to do what you want. One option would be to have more than one NOR flash chip, but you would need to figure out the hardware and the code for that.

Best of luck with it.
 
I learned about Mbits some time ago, yeah, it was disappointing.

But given it's SPI, I suppose I would be OK with an array of, say, up to 4 chips; I can spare 4 pins for chip select in this project...
The Memoryboard project (https://github.com/FrankBoesing/memoryboard) seems to have some logic for multiple chips already.
Four 2Gbit chips (if they are not madly expensive) sound completely normal to me. Seven or eight 1Gbit chips start to feel a little insane, but still not overly insane if they are proven to work...

Thanks! And thanks for the NAND not suitable for random access info, I didn't know that.
 
Hi,

You may find that the only SPI 2Gb NOR flash chips are BGA - so you may need to go the insane route!
 
Why does it always come down to insanity... :)
As far as I can find, the Cypress S70FL01GSAGMFI013 seems to be the only 1Gb option available to order in Europe.
Not cheap enough to order it blindly though...
There is also a similar Micron MT25QL01GBBB8ESF from DigiKey, but with customs it would be much worse.

Anyone with S70FL01GSAGMFI013 experience by any chance? :)
 
And if I read the datasheet correctly, this chip consists of two parts and requires two chip select pins.
I could probably reduce the sample set to 500MB, which would require 4 chips and 8 chip select pins...
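
The chip count above can be checked with a little arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes binary units (1 Mbit = 2^20 bits) and two separately selected dies per package, as described for this chip.

```cpp
#include <cstdint>

// Datasheet densities are given in bits; divide by 8 to get bytes.
// ASSUMPTION: 1 Mbit = 1024 * 1024 bits.
constexpr std::uint64_t mbitToBytes(std::uint64_t mbit) {
    return mbit * 1024ULL * 1024ULL / 8ULL;
}

// Number of chips needed to hold totalBytes, rounded up.
constexpr std::uint32_t chipsNeeded(std::uint64_t totalBytes, std::uint64_t chipMbit) {
    return static_cast<std::uint32_t>(
        (totalBytes + mbitToBytes(chipMbit) - 1) / mbitToBytes(chipMbit));
}

// Chip select pins, given how many separately selected dies each package has.
constexpr std::uint32_t chipSelectPins(std::uint32_t chips, std::uint32_t diesPerChip) {
    return chips * diesPerChip;
}
```

With these, a 1Gbit (1024 Mbit) part holds 128MB, so 500MB needs 4 chips and, at two dies per chip, 8 chip select pins. The full 800MB would need 7 chips.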
 
Until you sort out the SPI flash situation, I suggest that you use the SdFat 2.0 beta and format your SD card as exFAT. If you then write your sound files as contiguous files, the playback will not require any FAT access after the file is open. That might help with delays in playback, since a normal FAT file access requires reading two sectors each time the playback moves to a new cluster in the FAT.
 
Thanks. I will keep that in mind.
I've ordered a couple of S70FL01GSAGMFI013 chips to try. If I manage to get them working, I'll order more; otherwise I'll try other approaches.
In theory they seem pretty simple. Despite the SOIC-16 package, only a few of those 16 pins are really needed in my case, and the voltage seems to be right...
 
I've started a project with the 1 Gbit W25N01GVZEIG NAND chip. Still very early but it doesn't seem too terrible to deal with, it has some features to allow you to remap bad sections and other quality of life stuff.

The interface is pretty much like talking to a NOR chip except it has a local cache so that everything takes two steps.
 
If using SPI flash turns out to be too much of a problem, you might consider this approach:

Create a one GByte pre-allocated contiguous EXFat file and treat it like a 1GB flash.
Once you have your various sound files stuffed into that large file, with a starting address for each separate sound file, you can write an interface much like you will need for the SPI flash:

SetReadAddress(uint32_t rdaddr) becomes gbFile.seek(rdaddr) instead of spiFlashSetAddress(rdaddr)
ReadMData(uint16_t *buffptr, uint16_t len) becomes gbFile.read(buffptr, len) instead of spiFlashRead(buffptr, len);

If you're reading from multiple files at (sort of) the same time, you have to set the file position before each read, just as you would with SPI flash.
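
The per-sample bookkeeping this implies might look like the sketch below. It is not from any library: Voice, ReadRequest, nextRead and afterRead are hypothetical names, and each playing sample simply remembers its own position so that the right seek address can be computed before every read.

```cpp
#include <cstdint>

// One playing sample inside the big pre-allocated file (hypothetical layout).
struct Voice {
    std::uint32_t start;   // byte offset of the sample within the 1GB file
    std::uint32_t length;  // sample length in bytes
    std::uint32_t pos;     // bytes already read from this sample
};

struct ReadRequest {
    std::uint32_t address; // position to seek to before reading
    std::uint32_t count;   // bytes to read (0 once the sample is finished)
};

// Where the next chunk for this voice lives, clipped to the end of the sample.
constexpr ReadRequest nextRead(const Voice& v, std::uint32_t chunk) {
    return ReadRequest{ v.start + v.pos,
                        (v.length - v.pos < chunk) ? (v.length - v.pos) : chunk };
}

// The voice's state after count bytes have been read.
constexpr Voice afterRead(const Voice& v, std::uint32_t count) {
    return Voice{ v.start, v.length, v.pos + count };
}
```

For each active voice the loop would then do gbFile.seek(req.address), gbFile.read(buffptr, req.count) and replace the voice with afterRead(voice, req.count). The same two structs would work unchanged with the SPI flash calls.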

I wrote a test program to see how fast a T3.6 can do this type of data reading. The program creates a 1GB file, then either reads 40MB as a contiguous set of 4KB buffers, or it reads the same 40MB with a seek before each buffer read.

Here are the results:


EXFat Benchmark test
SDIO initialization done.
Type is exFAT

Reading 10240 sequential buffers of 4096 bytes
Buffer read times: average = 208.10 usec Maximum = 691 usec >500uSec: 2

Reading 10240 random buffers of 4096 bytes
Buffer read times: average = 626.92 usec Maximum = 1523 usec >999uSec: 11

Writing 1GB Benchmark file.
Pre-Allocation succeeded.
Buffer write times: average = 327.15 usec Maximum = 42749 usec >999uSec: 11513

You can see that adding the seek() before each buffer read about triples the transfer time. Still, the maximum time to read a 4KB buffer is only 1.5 milliseconds and the average is about 627 microseconds.
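
To put those times in terms of playback, here is a rough calculation. The read times are the ones measured above; the audio format is purely an assumption for illustration (8 voices of 44.1 kHz, 16-bit, mono raw samples), since the actual sample format was not stated.

```cpp
#include <cstdint>

// Sustained throughput implied by a per-buffer read time.
constexpr double bytesPerSecond(double bufferBytes, double microsPerBuffer) {
    return bufferBytes * 1.0e6 / microsPerBuffer;
}

// Data rate needed by a number of simultaneous raw PCM streams.
constexpr std::uint32_t requiredBytesPerSecond(std::uint32_t voices,
                                               std::uint32_t sampleRateHz,
                                               std::uint32_t bytesPerSample,
                                               std::uint32_t channels) {
    return voices * sampleRateHz * bytesPerSample * channels;
}

// Measured: seek + 4096-byte read, average and maximum times in usec.
constexpr double kAvgRandom   = bytesPerSecond(4096.0, 626.92);
constexpr double kWorstRandom = bytesPerSecond(4096.0, 1523.0);

// ASSUMPTION: 8 voices, 44.1 kHz, 2 bytes per sample, 1 channel.
constexpr std::uint32_t kNeeded = requiredBytesPerSecond(8, 44100, 2, 1);
```

Under that assumption the playback needs about 0.7 MB/s, while the random reads average about 6.5 MB/s and would still deliver about 2.7 MB/s if every single read took the 1523 usec maximum. Stereo samples or a higher sample rate would scale the requirement accordingly.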

Writing to the file takes longer---with much greater maximum times. This is due to the time needed to erase internal flash blocks and handle whatever wear-leveling algorithm is in place. The tests were done on a Samsung EVO 16GB microSD with the T3.6 at 180MHz.

On a SanDisk 8GB card formatted as FAT32, the sequential read was about the same, but the random read average went up to 1370 microseconds, probably because of the need to traverse the FAT to find the cluster and sector for the seek(). exFAT doesn't do this for a contiguous file: it simply divides the seek position by the sector size and adds the offset of the beginning of the file.
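
That seek calculation for a contiguous file can be sketched as follows. This is not SdFat's actual code, just an illustration of the arithmetic; it assumes 512-byte sectors, and firstSector stands for wherever the file's data happens to begin on the card.

```cpp
#include <cstdint>

constexpr std::uint32_t kSectorSize = 512;  // ASSUMPTION: standard SD sector size

struct SectorPos {
    std::uint32_t sector;  // absolute sector number on the card
    std::uint32_t offset;  // byte offset inside that sector
};

// For a contiguous file the sector follows directly from the file position,
// with no FAT chain to walk.
constexpr SectorPos locate(std::uint32_t firstSector, std::uint64_t filePos) {
    return SectorPos{ firstSector + static_cast<std::uint32_t>(filePos / kSectorSize),
                      static_cast<std::uint32_t>(filePos % kSectorSize) };
}
```

For example, with the file starting at sector 32768, position 4096 lands at the start of sector 32776, and position 1000 lands 488 bytes into sector 32769.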

Here is the test code I used to collect the data:
Code:
/**************************************************************************
*  Test EXFat random access
*  M. Borgerson  4/9/2020
*********************************************************************************/


#include "SdFat.h"
#include "sdios.h"
#include "FreeStack.h"
#include "ExFatlib/ExFatLib.h"
#include <time.h>
#include <TimeLib.h>
SdExFat sd;
ExFile BenchFile;


#define SD_CONFIG SdioConfig(FIFO_SDIO)

// SDCARD_SS_PIN is defined for the built-in SD on some boards.
#ifndef SDCARD_SS_PIN
const uint8_t SD_CS_PIN = SS;
#else  // SDCARD_SS_PIN
// Assume built-in SD is used.
const uint8_t SD_CS_PIN = SDCARD_SS_PIN;
#endif  // SDCARD_SS_PIN



/*****************************************************************************
   Read the Teensy RTC and return a time_t (Unix Seconds) value

 ******************************************************************************/
time_t getTeensy3Time() {
  return Teensy3Clock.get();
}


void setup() {
  // put your setup code here, to run once:

  Serial.begin(9600);
  while (!Serial) {}  // wait for the USB serial monitor to connect
  Serial.println("\nEXFat Benchmark test\n");
  if (!sd.begin(SD_CONFIG)) {
    Serial.println("\nSDIO Card initialization failed.\n");
  } else  Serial.println("SDIO initialization done.");

  if (sd.fatType() == FAT_TYPE_EXFAT) {
    Serial.println("Type is exFAT");
  } else {
    Serial.printf("Type is FAT%d\n", int16_t(sd.fatType()));
  }
  // set date time callback function  so file gets a good date
  SdFile::dateTimeCallback(dateTime);
  setSyncProvider(getTeensy3Time);
}

char fname[] = "btest.dat";
void loop() {
  // put your main code here, to run repeatedly:
  char ch;
  if (Serial.available()) {
    ch = Serial.read();
    if (ch == 'w')  WriteBenchFile(fname);
    if (ch == 's')  SeqReadBenchFile(fname);
    if (ch == 'r')  RandReadBenchFile(fname);
    if (ch == 'd')  sd.ls(LS_SIZE | LS_DATE | LS_R);
  }
}

#define RBUFFSIZE 4096
// Write a 1GB contiguous file for benchmark testing
// each 100MB takes about 24 seconds on T3.6 at 180MHz
// data in the file will be random data from stack area
void WriteBenchFile(char *filename) {
  uint64_t alloclength;
  uint32_t i, num;
  ExFile benchFile;
  uint32_t startmicro, dmicro, maxmicro, blockmax, mbytes, gt999us;
  float msum;
  unsigned char benchbuff[RBUFFSIZE];
  alloclength = 1024ULL * 1024ULL * 1024ULL; // 1GB allocation length, computed in 64-bit
  Serial.printf("\n\nWriting 1GB Benchmark file.\n");
  // Open the file
  if (!benchFile.open(filename,  O_RDWR | O_CREAT | O_TRUNC)) {
    Serial.printf("Unable to open <%s> for writing.", filename);
    return;
  }

  if (!benchFile.preAllocate(alloclength)) {
    Serial.println("Pre-Allocation failed.");
    benchFile.close();  // don't leave the file open on failure
    return;
  } else {
    Serial.println("Pre-Allocation succeeded.");
  }

  // now write the data in blocks of RBUFFSIZE --   262144 blocks
  // send out a message every 100MB or 25600 blocks
  blockmax = 262144;
  mbytes = 0;
  maxmicro = 0; 
  msum = 0.0;
  gt999us = 0;
  for(i=0; i<blockmax; i++){
    startmicro = micros();
    benchFile.write(benchbuff, RBUFFSIZE);
    dmicro = micros() - startmicro;
    if(dmicro > maxmicro) maxmicro = dmicro;
    if(dmicro > 999) gt999us++;
    msum+= dmicro;
    if((i%25600) == 0){
      Serial.printf("%lu MBytes\n",mbytes);
      mbytes+= 100;
    }
  }


  benchFile.close();
  Serial.printf("Buffer write times: average = %4.2f usec   Maximum = %lu usec    >999uSec: %lu\n",
                              msum/blockmax, maxmicro, gt999us);
}


// Read 40 MBytes from benchFile and keep track of max and average read time

#define BUFFSTOREAD 10240   // 40MByte read
void SeqReadBenchFile(const char *filename) {
  uint16_t idx, numread;
  ExFile benchFile;
  uint8_t rbuffer[RBUFFSIZE];
  uint32_t startmicro, dmicro, maxmicro, gt500us;
  float microsum;

  if (!benchFile.open(filename, O_READ)) {
    Serial.printf("\nCould not open <%s> for reading.", filename);
    return;
  }
  // BUFFSTOREAD and RBUFFSIZE are plain int constants, so print them with %d
  Serial.printf("\n\nReading %d sequential buffers of %d bytes\n",BUFFSTOREAD, RBUFFSIZE);
  maxmicro = 0; 
  microsum = 0.0;
  gt500us = 0;
  for(idx = 0; idx<BUFFSTOREAD; idx++){
    startmicro = micros();
    numread = benchFile.read(&rbuffer, RBUFFSIZE);
    dmicro = micros()-startmicro;
    if(dmicro > maxmicro) maxmicro = dmicro;
    if(dmicro > 500) gt500us++;
    microsum += dmicro;

  }
  Serial.printf("Buffer read times: average = %4.2f usec   Maximum = %lu usec    >500uSec: %lu\n",
                              microsum/BUFFSTOREAD, maxmicro, gt500us);
  
  benchFile.close();
  Serial.println();

}

void RandReadBenchFile(const char *filename) {

  uint16_t idx, numread;
  ExFile benchFile;
  uint8_t rbuffer[RBUFFSIZE];
  uint32_t startmicro, dmicro, maxmicro, gt500us, fpos;
  float microsum;

  if (!benchFile.open(filename, O_READ)) {
    Serial.printf("\nCould not open <%s> for reading.", filename);
    return;
  }
  // BUFFSTOREAD and RBUFFSIZE are plain int constants, so print them with %d
  Serial.printf("\n\nReading %d random buffers of %d bytes\n",BUFFSTOREAD, RBUFFSIZE);
  maxmicro = 0; 
  microsum = 0.0;
  gt500us = 0;
  for(idx = 0; idx<BUFFSTOREAD; idx++){
    startmicro = micros();
    fpos = random(0,10000)*RBUFFSIZE;
    benchFile.seek(fpos);
    numread = benchFile.read(&rbuffer, RBUFFSIZE);
    dmicro = micros()-startmicro;
    if(dmicro > maxmicro) maxmicro = dmicro;
    if(dmicro > 999) gt500us++;  // despite its name, this counter tracks reads over 999 usec here
    microsum += dmicro;

  }
  Serial.printf("Buffer read times: average = %4.2f usec   Maximum = %lu usec    >999uSec: %lu\n",
                              microsum/BUFFSTOREAD, maxmicro, gt500us);
  
  benchFile.close();
  Serial.println();

}

//------------------------------------------------------------------------------
/*
   User provided date time callback function.
   See SdFile::dateTimeCallback() for usage.
*/
void dateTime(uint16_t* date, uint16_t* time) {
  // use the year(), month() day() etc. functions from timelib

  // return date using FAT_DATE macro to format fields
  *date = FAT_DATE(year(), month(), day());

  // return time using FAT_TIME macro to format fields
  *time = FAT_TIME(hour(), minute(), second());
}
 
After several WTF moments, I've managed to get this chip working.

That chip is actually mentioned here https://github.com/PaulStoffregen/S...3457044ced436a401092076218de2a4/tests.txt#L22 as «Pass1/2». And with the SerialFlash library that was exactly my experience, only one half was working.

The thing is that the SerialFlash library is designed as a singleton for one chip only. If I remove static from all the methods and move some global static variables inside the class, it's possible to create several SerialFlashChip instances and call begin() on each instance with its own chip select pin.

And I don't actually need files. I've written a script which goes through all my samples and joins them into a set of <64MB blobs. Then I write these blobs to the respective chips. The script also generates C header files in which each sample has a record of its chip and address. When I need to play a sample, I just start reading with SerialFlashChip.read() from the corresponding instance and address.
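
For anyone wanting to do something similar, the placement logic can be sketched like this. It is not the actual script, only a guess at its core step: SampleRecord, kBlobLimit and placeAfter are hypothetical names, and the 64MB limit assumes one blob per 512Mbit half of the chip.

```cpp
#include <cstdint>

// ASSUMPTION: one blob per 512 Mbit die, i.e. at most 64MB per chip select.
constexpr std::uint32_t kBlobLimit = 64u * 1024u * 1024u;

// One entry of the generated header: where a sample lives.
struct SampleRecord {
    std::uint8_t  chip;     // index of the SerialFlashChip instance / chip select
    std::uint32_t address;  // byte address inside that blob
    std::uint32_t length;   // sample length in bytes
};

// Place a sample right after the previous one, or at address 0 of the
// next chip if it would not fit in the current blob.
constexpr SampleRecord placeAfter(const SampleRecord& prev, std::uint32_t length) {
    return (prev.address + prev.length + length <= kBlobLimit)
        ? SampleRecord{ prev.chip, prev.address + prev.length, length }
        : SampleRecord{ static_cast<std::uint8_t>(prev.chip + 1), 0u, length };
}
```

Running placeAfter over the sample list in order gives the table for the header file. At playback time a record then tells you which instance to read from and at which address.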
 