Section type conflict and other memory error

Thanks to all your explanations, I finally understand how all this works!
Now I have code that compiles, creates a file to store a large array, and reads it back afterwards.

Code:
/*
  Test LittleFS write, read
 */
#include <LittleFS.h>
LittleFS_QPINAND myfs;  // QSPI NAND Flash attached to SPI
#include "onnx__Conv_110.h"
const int N = 100000;
const int total = 921600;
DMAMEM float data[N] = { 0 };
float last = 0;
elapsedMicros chrono;

void setup() {
  Serial.begin(9600);
  while (!Serial) {
    // wait for serial port to connect.
  }
  Serial.print("Initializing SPI FLASH...");
  if (!myfs.begin()) {
    Serial.printf("Error starting %s\n", "SPI FLASH");
    while (1)
      ;
  }

  // uncomment only the first time
  // Serial.println("Formatting SPI FLASH...");
  // myfs.lowLevelFormat('.');

  Serial.println("Searching file...");
  File dir = myfs.open("/");
  bool found = false;
  while (true) {
    File entry = dir.openNextFile();
    if (!entry) break;
    if (strcmp(entry.name(), "onnx__Conv_110.h") == 0) {
      found = true;
      Serial.println("File exists!");
      Serial.printf("Size: %d\n", entry.size());
      break;
    }
    entry.close();
  }
  if (!found) {
    Serial.println("Creating file...");
    chrono = 0;
    File dataFile = myfs.open("onnx__Conv_110.h", FILE_WRITE);
    // if the file is available, write to it:
    if (dataFile) {
      for (int j = 0; j < total; j++) {
        dataFile.println(onnx__Conv_110_output_0[j], 8);
        if (j % 100000 == 0) Serial.printf("i = %d data = %f\n", j, onnx__Conv_110_output_0[j]);
      }
      Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
    } else Serial.println("error opening onnx__Conv_110.h");
    Serial.printf("Size: %d\n", dataFile.size());
    dataFile.close();
    delay(200);
  }
  last = onnx__Conv_110_output_0[N - 1];

  Serial.println("Now reading data...");
  chrono = 0;
  File file2 = myfs.open("onnx__Conv_110.h", FILE_READ);
  char buffer[15];  // 8 digits after .
  int index = 0;
  int i = 0;
  if (file2) {
    while (file2.available()) {
      buffer[index++] = file2.read();
      if (buffer[index - 1] == '\r') {
        buffer[index - 1] = '\0';  // NULL terminate the array
        data[i++] = atof(buffer);
        index = 0;
        file2.read();
        if (i % 10000 == 0) Serial.println(i);
      }
      if (i >= N) break;  // stop once data[] is full (i > N would allow a write to data[N])
    }
    Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
  } else Serial.println("error opening file");
  file2.close();

  Serial.printf("Checking data number %d...\nData written: %f\n", N, last);
  Serial.printf("Data read   : %f\n", data[N - 1]);
}
void loop() {
  // nope
}

The output is correct:
Code:
Initializing SPI FLASH...NAND Flash Memory Size =  265289728 bytes / 253 Mbyte / 2 Gbit
Flash initialized.
Searching file...
File exists!
Size: 11546225
Now reading data...
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Execution time: 520.251 ms
Checking data number 100000...
Data written: -0.021341
Data read   : -0.021341

It takes 520 ms to read 100 000 float values (53 ms for 10 000 and 2.7 ms for 100).
  • If I set the PSRAM frequency to 133 MHz, the time reduces to 509 ms for 100 000 data.
  • Using compilation options 'Fastest with LTO' reduces again to 504 ms
  • Overclocking the CPU to 860MHz reduces down to 370 ms.
Code:
Initializing SPI FLASH...NAND Flash Memory Size =  265289728 bytes / 253 Mbyte / 2 Gbit
Flash initialized.
Searching file...
File exists!
Size: 11546225
Now reading data...
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Execution time: 370.558 ms
Checking data number 100000...
Data written: -0.021341
Data read   : -0.021341

The array data is stored in PSRAM (using DMAMEM), which is 8 MB. It should be able to store float arrays up to 2000000 values. But when I increase the value of N (size of the array) to 300 000 I get this error message:

Code:
c:/users/fa125436/appdata/local/arduino15/packages/teensy/tools/teensy-compile/11.3.1/arm/bin/../lib/gcc/arm-none-eabi/11.3.1/../../../../arm-none-eabi/bin/ld.exe: C:\Users\fa125436\AppData\Local\arduino\sketches\BE3F2F28DD8D15A2286C6FF9565BEE79/Teensy_test_LittleFS_Flash.ino.elf section `.bss.dma' will not fit in region `RAM'
c:/users/fa125436/appdata/local/arduino15/packages/teensy/tools/teensy-compile/11.3.1/arm/bin/../lib/gcc/arm-none-eabi/11.3.1/../../../../arm-none-eabi/bin/ld.exe: region `RAM' overflowed by 688128 bytes
collect2.exe: error: ld returned 1 exit status

Why does it not use the PSRAM?

Another question: isn't it possible to read and write float numbers directly as groups of 4 bytes? The resulting file would be smaller in the Flash memory.
 
Code:
const int N = 100000;
const int total = 921600;
DMAMEM float data[N] = { 0 };

If you set N=300000
then data will try to allocate 300000*4=1200000 bytes, which does not fit into the 512K bytes (524288) of RAM2...
1200000-524288=675712
Which is pretty close to the overflow mentioned: 688128. The remaining 12416 bytes are presumably other data already placed in that region.
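The same arithmetic can be checked at compile time. This is only a sketch: the region sizes are assumptions for a Teensy 4.1 with one 8 MB PSRAM chip, and the names (kRam2Bytes, kPsramBytes, floatArrayBytes, fitsIn, overflowBytes) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes: RAM2 (where DMAMEM variables go) is 512 KiB,
// and the PSRAM chip (where EXTMEM variables go) is 8 MiB.
constexpr std::size_t kRam2Bytes  = 512u * 1024u;
constexpr std::size_t kPsramBytes = 8u * 1024u * 1024u;

// Bytes needed by an array of `count` floats.
constexpr std::size_t floatArrayBytes(std::size_t count) {
    return count * sizeof(float);
}

// True when an array of `count` floats fits in a region of `regionBytes`.
constexpr bool fitsIn(std::size_t count, std::size_t regionBytes) {
    return floatArrayBytes(count) <= regionBytes;
}

// Bytes by which the array exceeds the region; only valid when it does not fit.
std::size_t overflowBytes(std::size_t count, std::size_t regionBytes) {
    assert(!fitsIn(count, regionBytes));
    return floatArrayBytes(count) - regionBytes;
}

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");
static_assert(fitsIn(100000, kRam2Bytes), "N = 100000 fits in RAM2");
static_assert(!fitsIn(300000, kRam2Bytes), "N = 300000 does not fit in RAM2");
static_assert(fitsIn(300000, kPsramBytes), "N = 300000 fits in PSRAM");
```

Here overflowBytes(300000, kRam2Bytes) gives 675712 for the array alone, a little less than the 688128 the linker reports for the whole section.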

Why does it not use PSRAM... Because you told it to use DMAMEM.
Note: neither of these sections (DMAMEM or EXTMEM) lets the compiler preset the values that go into those variables, so the = { 0 } initializer has no effect.
 
Why does it not use PSRAM... Because you told it to use DMAMEM.

Use EXTMEM for PSRAM.

Your write loop writes total values (921600) and then your read loop reads N values (100000). Is that what you intended?

Why do you #include the file that you read/write within the program?
 
Why does it not use PSRAM... Because you told it to use DMAMEM.

Use EXTMEM for PSRAM.
Oops, I mixed up EXTMEM and DMAMEM! My fault.

Your write loop writes total values (921600) and then your read loop reads N values (100000). Is that what you intended?

Why do you #include the file that you read/write within the program?
Yes, I'm just testing, but obviously in the end I'll load the entire array.
This array is generated by another program and is in the form of an .h file that I can #include.
I can also convert this file to another format, for example binary. But what next? Can I upload such a file directly to the external Flash of my Teensy? And how can I read the values?
 
Can I upload directly such a file to the external Flash of my Teensy? And how can I read the values?

There is no way that I know of to upload data directly to your QSPI flash. Can you please tell us what you are trying to do? I thought you were trying to build a data logger, but if you need to load data into your flash, that doesn't sound like a data logger. What is in these files that you want to load into flash, and what do you want to do with them?
 
First: I changed to EXTMEM and it works perfectly, reading 300 000 float data in 1760 ms, and more than 5 seconds for the whole 900 000 data array.

What I want to do: port a neural network (NN) onto the Teensy. The network can be quite large and the parameters (weights and biases) are provided in the form of .h files, in float or int8 format.
But as I said earlier, I can change this format to whatever is better.

My Teensy has 8MB PSRAM and 256 MB Flash : I expected this external Flash can be used to store large files. But the maximum size of an array I can use for computing will be 8 MB. This seems the main limitation of my NN application.

But storing float values in ASCII format, using
Code:
dataFile.println(onnx__Conv_110_output_0[j], 8);
doesn't seem optimum to me. That's why I'm looking for alternative solutions.
 
Where is this being stored that a 4 byte float cannot be stored directly?
Sorry, I don't understand what you mean.

If I understand well, the dataFile.println writes the values as ASCII code using (in my case) 8 digits after the dot.
When I read the data back from the file, it is also read as ASCII characters, which I then transform to a float using atof.

A float is coded on 4 bytes. But writing it as x.xxxxxxxx + \r\n requires 12 bytes, so the file size could be reduced by a factor of 3.
But for this, I would need some sort of dataFile.write(float) or dataFile.write(bytes, 4) instruction and the same for reading.

I think it is possible in ESP32 using the Preferences library for emulating EEPROM. That's why I wonder if it is also possible in the Teensy.
 
If you want to store data in binary format then write it to the file in binary format.

file.write( dataPtr , length ) and the equivalent read function should work for reading binary data. This would allow you to read/write a single value or a whole array of values in one go.

This also has the advantage that if you want the n'th value in a file you can use file.seek(n*sizeof(float)) to jump to the correct point in the file.
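As a rough illustration of the binary approach, here is a sketch that runs on a PC. It uses standard C++ file streams as a stand-in for the LittleFS File object, so the calls differ from file.write() / file.seek() on the Teensy; the function names writeFloats and readNthFloat are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write `values` to `path` as raw 4-byte floats (no text conversion).
bool writeFloats(const std::string& path, const std::vector<float>& values) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(float)));
    return static_cast<bool>(out);
}

// Read the n-th float by seeking to byte offset n * sizeof(float).
// Returns false if the file cannot be opened or is too short.
bool readNthFloat(const std::string& path, std::size_t n, float& result) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(static_cast<std::streamoff>(n * sizeof(float)));
    in.read(reinterpret_cast<char*>(&result), sizeof(float));
    return in.gcount() == static_cast<std::streamsize>(sizeof(float));
}
```

Each value takes exactly sizeof(float) bytes in the file, so the n-th value always starts at byte n * sizeof(float), which is what makes the seek possible.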
 
Thanks a lot, that's what I was looking for.
I changed the code and it's much simpler and faster.

Code:
/*
  Test LittleFS write, read
 */
#include <LittleFS.h>
LittleFS_QPINAND myfs;  // QSPI NAND Flash attached to SPI
#include "onnx__Conv_110.h"
const int total = 921600;
const int N = total;
EXTMEM float data[N] = { 0 };
float last = 0;
elapsedMicros chrono;
void setup() {
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    // wait for serial port to connect.
  }
  // set clock speed for PSRAM to 132 MHz
  CCM_CBCMR &= ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK);
  CCM_CBCMR |= (CCM_CBCMR_FLEXSPI2_PODF(3) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3));

  Serial.print("Initializing SPI FLASH...");
  // see if the Flash is present and can be initialized:
  if (!myfs.begin()) {
    Serial.printf("Error starting %s\n", "SPI FLASH");
    while (1)
      ;
  }
  Serial.printf("NAND Flash Memory Size =  %d bytes / ", myfs.totalSize());
  Serial.printf("%d Mbyte / ", myfs.totalSize() / 1048576);
  Serial.printf("%d Gbit\n", myfs.totalSize() * 8 / 1000000000);
  Serial.println("Flash initialized.");
  Serial.println("Formatting SPI FLASH...");
  myfs.lowLevelFormat('.');
  Serial.println("Searching file...");
  File dir = myfs.open("/");
  bool found = false;
  while (true) {
    File entry = dir.openNextFile();
    if (!entry) break;
    if (strcmp(entry.name(), "onnx__Conv_110.h") == 0) {
      found = true;
      Serial.println("File exists!");
      Serial.printf("Size: %d\n", entry.size());
      break;
    }
    entry.close();
  }
  if (!found) {
    Serial.println("Creating file...");
    chrono = 0;
    File dataFile = myfs.open("onnx__Conv_110.h", FILE_WRITE);
    // if the file is available, write to it:
    if (dataFile) {
      dataFile.write(onnx__Conv_110_output_0, 4 * total);
      Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
    } else Serial.println("error opening onnx__Conv_110.h");
    Serial.printf("Size: %d (%d)\n", dataFile.size(), total *4);
    dataFile.close();
    delay(200);
  }
  last = onnx__Conv_110_output_0[N - 1];
  Serial.println("Now reading data...");
  chrono = 0;
  File file2 = myfs.open("onnx__Conv_110.h", FILE_READ);
  if (file2) {
    for (int i = 0; i < N; i++) {
      uint8_t databytes[4];
      file2.readBytes(databytes, 4);
      memcpy(&data[i], &databytes, 4);
    }
    Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
  } else Serial.println("error opening file");
  file2.close();

  Serial.printf("Checking data number %d...\nData written: %f\n", N, last);
  Serial.printf("Data read   : %f\n", data[N - 1]);
}
void loop() {
  // put your main code here, to run repeatedly:
}

For the entire array of 921600 floats, the write time is 1069 ms and the read time is 395 ms (with a PSRAM speed of 133 MHz) instead of more than 5000 ms!
If I read the data in groups of 16 * 4 bytes, the read time drops to 227 ms. For groups of 128 * 4 bytes, the time is 225 ms: not really an improvement.
Now if I compile with the 'Fastest with LTO' option and overclock to 816 MHz, the read time is 220 ms.
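Reading in groups can be sketched as follows. This is a PC-side illustration only: std::ifstream stands in for the LittleFS File, the staging buffer plays the role of a small buffer in internal RAM, and readInChunks is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read up to `count` floats from a binary file into `dest`, `chunkFloats` at a time.
// Returns the number of floats actually read.
std::size_t readInChunks(const std::string& path, float* dest,
                         std::size_t count, std::size_t chunkFloats) {
    assert(dest != nullptr);
    std::ifstream in(path, std::ios::binary);
    if (!in || chunkFloats == 0) return 0;

    std::vector<float> chunk(chunkFloats);  // small staging buffer
    std::size_t done = 0;
    while (done < count) {
        const std::size_t want = std::min(chunkFloats, count - done);
        in.read(reinterpret_cast<char*>(chunk.data()),
                static_cast<std::streamsize>(want * sizeof(float)));
        const std::size_t got =
            static_cast<std::size_t>(in.gcount()) / sizeof(float);
        std::copy(chunk.begin(), chunk.begin() + got, dest + done);
        done += got;
        if (got < want) break;  // short read: end of file or error
    }
    return done;
}
```

Returning the number of values actually read makes a short read visible to the caller, instead of leaving part of the destination array silently unfilled.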
 
I'm not sure how much internal buffering LittleFS does, but if possible, writing in blocks that match the FLASH sector size will probably be the most efficient option.
Since NAND flash is block-based for reads as well as writes, there may also be a benefit to reading in blocks equal to the sector size. However, I'd expect the file system to buffer each page, so that may not make much difference. Random reads will, however, be slower than sequential reads.

Overclocking will have only minimal impact on the speed. The SPI bus speed to the flash is probably going to be the main limiting factor for reads, and that will be constant for all CPU speeds.
You could try increasing the clock speed for the SPI bus. It looks like it's set to 30MHz at the start of LittleFS.cpp, but I'm not sure if that is the speed used for QSPI connected devices or how much headroom there is before things stop working.
 
Your goal is to read the entire array from your LittleFS file into your PSRAM array, right? Have you tried reading the entire array with one call to readBytes(), similar to the way you write?

Code:
file2.readBytes( data, total*4 );
 
Code:
file2.readBytes( data, total*4 );
That's what I tried in the first place, but readBytes requires a byte array and data is a float array.
So I'd need a very large bytes buffer array (size 4*total bytes) to store the content of the file, and then memcpy it to data.

How can I change the clock speed for the SPI bus ?
 
very large bytes buffer array (size 4*total bytes) to store the content of the file
Correct - the indicated number of bytes will transfer - it doesn't care what may be in them. Just specify a size that the destination has room for.
If you get a compiler warning, just add a (byte*) cast to make it happy.

YMMV (depends on the chip in use as soldered) - but here is a post on a fast PSRAM clock:
 
Thanks, I did this in post 26.
Oh, you mean that the PSRAM and the additional Flash use the same SPI bus?

Correct - indicated number of bytes will transfer - it doesn't care what may be in them.
If the free PSRAM memory is larger than twice the size of the float array, then I can dynamically assign a temporary buffer and use it to store the content of the file before memcpy-ing it to my float array.
I'll try that to see if it is faster.
 
I'll try that to see if it is faster.
Here is what I did to read the file:

C++:
if (file2) {
    Serial.println(1);  // to trace execution
    char *databytes = (char*) extmem_malloc(4 * N);
    Serial.println(2);
    file2.readBytes(databytes, 4 * N); // freeze if N = total (921600)
    Serial.println(3);
    memcpy(&data, &databytes, 4 * N);  // crash if N smaller (10000)
    Serial.println(4);
    free(databytes);
    Serial.println(5);
  }

And global variables are:
C++:
const int total = 921600;
const int N = 10000;
EXTMEM float data[N] = { 0 };

When I try to read the entire file in one shot, i.e. N = 921600, the code freezes at the readBytes instruction.
If I reduce this value, for example 10000, then the code crashes after memcpy.

Did I do something wrong?
 
The pointer returned by the new malloc isn't checked for NULL? :: (char*) extmem_malloc(4 * N);

EXTMEM is allocating space already, and it will not be {0} initialized :: EXTMEM float data[N] = { 0 };
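If a temporary buffer is really wanted, the pattern would look roughly like the sketch below. It runs on a PC: std::malloc stands in for extmem_malloc (assumed here to return a null pointer on failure in the same way), a plain memcpy stands in for file2.readBytes(), and copyViaTempBuffer is a made-up name. Note that the posted code passes &databytes to memcpy, which is the address of the pointer variable rather than the buffer it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copy `count` floats from `source` into `dest` through a temporary heap buffer.
bool copyViaTempBuffer(float* dest, const float* source, std::size_t count) {
    assert(dest != nullptr && source != nullptr);
    if (count == 0) return true;

    const std::size_t bytes = count * sizeof(float);
    char* temp = static_cast<char*>(std::malloc(bytes));
    if (temp == nullptr) return false;  // allocation failed: leave dest untouched

    std::memcpy(temp, source, bytes);   // stand-in for file2.readBytes(temp, bytes)

    // Pass the pointers themselves. &temp would be the address of the pointer
    // variable, which holds only sizeof(char*) valid bytes, not the buffer.
    std::memcpy(dest, temp, bytes);

    std::free(temp);
    return true;
}
```

On the Teensy, memory obtained from extmem_malloc would normally be released with the matching extmem_free rather than free; treat that as something to confirm against the core you are using.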
 
You don’t need two arrays or memcpy(). You can read directly into data[] like this:

Code:
file2.readBytes( (char*)data, total*4 );

Some of your sketches have N and total the same. Others not.
 
Some of your sketches have N and total the same. Others not.
Yes, sorry for the confusion, this is just a test code, I'm learning to use the external Flash.

You don’t need two arrays or memcpy(). You can read directly into data[] like this:

Code:
file2.readBytes( (char*)data, total*4 );
Thanks a lot, this is exactly what I was looking for! I wouldn't have thought about this alone...
However, it does not seem robust. It works for low values of N, but not for higher values.

Here is the code:
C++:
/*
  Test LittleFS write, read
 */
#include <LittleFS.h>
LittleFS_QPINAND myfs;  // QSPI NAND Flash attached to SPI
#include "onnx__Conv_110.h"
const int total = 921600;
const int N = 30000;
EXTMEM float data[N];
float last = 0;
elapsedMicros chrono;
void setup() {
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    // wait for serial port to connect.
  }
  // set clock speed for PSRAM to 132 MHz
  CCM_CBCMR &= ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK);
  CCM_CBCMR |= (CCM_CBCMR_FLEXSPI2_PODF(3) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3));
  Serial.print("Initializing SPI FLASH...");
  // see if the Flash is present and can be initialized:
  if (!myfs.begin()) {
    Serial.printf("Error starting %s\n", "SPI FLASH");
    while (1)
      ;
  }
  Serial.println("Flash initialized.");

  // Uncomment only the first time
  // Serial.println("Formatting SPI FLASH...");
  // myfs.lowLevelFormat('.');

  Serial.println("Searching file...");
  File dir = myfs.open("/");
  bool found = false;
  while (true) {
    File entry = dir.openNextFile();
    if (!entry) break;
    if (strcmp(entry.name(), "onnx__Conv_110.h") == 0) {
      found = true;
      Serial.printf("File exists! Size: %d\n", entry.size());
      break;
    }
    entry.close();
  }
  if (!found) {
    Serial.println("Creating file...");
    chrono = 0;
    File dataFile = myfs.open("onnx__Conv_110.h", FILE_WRITE);
    if (dataFile) {
      dataFile.write(onnx__Conv_110_output_0, 4 * total);
      Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
    } else Serial.println("error opening onnx__Conv_110.h");
    Serial.printf("Size: %d (%d)\n", dataFile.size(), total * 4);
    dataFile.close();
    delay(200);
  }
  last = onnx__Conv_110_output_0[N - 1];
  Serial.println("Now reading data...");
  chrono = 0;
  File file2 = myfs.open("onnx__Conv_110.h", FILE_READ);
  if (file2) {
    file2.readBytes((char*)data, N * 4);
    Serial.printf("Execution time: %.3f ms\n", (float)chrono / 1000.0);
  } else Serial.println("error opening file");
  file2.close();

  Serial.printf("Checking data number %d...\nData written: %f\n", N, last);
  Serial.printf("Data read   : %f\n", data[N - 1]);
}
void loop() {
  // nope
}
It works with N = 30000:
Initializing SPI FLASH...NAND Flash Memory Size = 265289728 bytes / 253 Mbyte / 2 Gbit
Flash initialized.
Searching file...
File exists! Size: 3686400
Now reading data...
Execution time: 8.793 ms
Checking data number 30000...
Data written: -0.005433
Data read : -0.005433
but not when N=100000: the code freezes
Initializing SPI FLASH...NAND Flash Memory Size = 265289728 bytes / 253 Mbyte / 2 Gbit
Flash initialized.
Searching file...
File exists! Size: 3686400
Now reading data...
I thought it might be related to the increase of the SPI bus speed, but if I comment out the lines
C++:
// set clock speed for PSRAM to 132 MHz
  CCM_CBCMR &= ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK);
  CCM_CBCMR |= (CCM_CBCMR_FLEXSPI2_PODF(3) | CCM_CBCMR_FLEXSPI2_CLK_SEL(3));
the result is the same.

So far, the only code that worked is the one from message 35: it reads the file by pieces into a buffer in RAM1 - I suppose - and copies it to the PSRAM. But trying to read directly from the Flash to the PSRAM doesn't reach the end of the file.

Is there any kind of conflict for the use of the SPI bus that creates this problem?
Or is there a problem with the .h file?

C++:
#include <stdint.h>

PROGMEM const float onnx__Conv_110_output_0[921600] =
{
    0.029072318226099014, -0.010079068131744862, -0.01411939226090908, 0.010565056465566158, -0.005434311926364899,
    -0.001253387425094843, 0.03692169860005379, 0.04304002597928047, -0.020667336881160736, -0.05344244837760925,
...
    0.011018412187695503, -0.0016959584318101406, -0.038232747465372086, -0.016288094222545624, -0.004808211233466864
};
 
So this means that the speed could be increased as well?
If the speed change is applied in that test sketch - it suggests it works at that speed for the tests performed.
It just writes a series of fixed and repeatable random numbers across the PSRAM - it is not exhaustive of all values everywhere.

But having it work does indicate the PSRAM is healthy and properly attached. I had to ask, since the reports of hangs/restarts otherwise leave open why that might be.

PSRAM has had a significant amount of other testing and use since its release, though this use case (BULK reading from FLASH/PROGMEM (?) and WRITING to PSRAM) may not have been done before.

The same test was modified for testing the SDRAM (if that thread was seen?), and some of the tests in the series worked while others tended to fail as the access speed was adjusted, and, for that chip, when alternate caps were used.
So this means that the speed could be increased as well?
QSPI speed is independent of SPI or other devices. However (IIRC) changing it for ONE device affects the other QSPI chip as well, and a FLASH chip on the second QSPI may not be capable of that speed?
 
C++:
if (file2) {
    Serial.println(1);  // to trace execution
    char *databytes = (char*) extmem_malloc(4 * N);
    Serial.println(2);
    file2.readBytes(databytes, 4 * N); // freeze if N = total (921600)
    Serial.println(3);
    memcpy(&data, &databytes, 4 * N);  // crash if N smaller (10000)
    Serial.println(4);
    free(databytes);
    Serial.println(5);
  }

And global variables are:
C++:
const int total = 921600;
const int N = 10000;
EXTMEM float data[N] = { 0 };

Did I do something wrong?
You did something pointless.

Why not read directly into the final buffer using
Code:
file2.readBytes((char*)data, 4 * N)

Also I'd recommend doing something like:

Diff:
#define ArrayType float

const int arrayLen = 10000;
EXTMEM ArrayType data[arrayLen];
 
void readFunction(){
...
file.readBytes((char*)data,arrayLen*sizeof(ArrayType));
...
}

This way it's easier to change the data type in the future if you need to, the code doesn't make any assumptions as to the size of the data type used. And N is a very generic name for a global variable, for the sake of a few extra key presses give it a more meaningful name.
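A slightly more C++-flavoured variant of the same idea, as a sketch that compiles on a PC: a type alias replaces the #define, and the byte count comes from sizeof on the array itself. The names weights, weightsBytes and byteOffsetOf are only illustrative, and on a Teensy the array declaration would carry EXTMEM.

```cpp
#include <cassert>
#include <cstddef>

using ArrayType = float;                 // change only this line to switch element type
constexpr std::size_t arrayLen = 10000;  // number of elements, not bytes
ArrayType weights[arrayLen];             // on a Teensy this would be declared with EXTMEM

// Total byte count, derived from the array itself.
constexpr std::size_t weightsBytes() { return sizeof(weights); }

// Byte offset of element `index` in a binary file holding the array.
std::size_t byteOffsetOf(std::size_t index) {
    assert(index < arrayLen);
    return index * sizeof(ArrayType);
}
```

With this layout the read call would become something like file.readBytes((char*)weights, weightsBytes()), with no hard-coded 4 anywhere.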
 
That's what I did in message number 45, but it seems to freeze for large values of N (or arrayLen).
Please read message 45, as it seems you are answering message 42...
 