Fastest Way to Log to SD Card

Power_Broker

I'm using the built-in SD interface on my T4.1 for datalogging and want the lowest latency possible. I'm using SdFat.h and have successfully tested the following sketch:

Code:
#include "SdFat.h"

#ifdef SDCARD_SS_PIN
const uint8_t SD_CS_PIN = SDCARD_SS_PIN;
#endif // SDCARD_SS_PIN
#define SPI_CLOCK SD_SCK_MHZ(50)        // only used by SdSpiConfig(); ignored with SDIO
#define SD_CONFIG SdioConfig(FIFO_SDIO) // T4.1 built-in slot, 4-bit SDIO

SdFs sd;
FsFile newFile;

uint32_t prevTime = 0;  // micros() timestamps
uint32_t curTime  = 0;
double diff       = 0;  // time per iteration, in microseconds
double period     = 0;  // same, in seconds
double hz         = 0;  // iterations per second

long i = 0;

void setup()
{
  Serial.begin(2000000);
  while(!Serial);

  if (!sd.begin(SD_CONFIG))
  {
    Serial.println("SD initialization failed\n");
    while (1);
  }
  Serial.println("SD initialization succeeded\n");

  newFile = sd.open("log.txt", FILE_WRITE);
  if (!newFile)
  {
    Serial.println("Opening log.txt failed");
    while (1);
  }

  Serial.println("Setup done");
}

void loop()
{
  prevTime = micros();

  newFile.println(i);
  i++;

  // Every 100 lines, force the data onto the card by closing and reopening.
  if (!(i % 100))
  {
    newFile.flush();
    newFile.close();
    newFile = sd.open("log.txt", FILE_WRITE);
  }

  curTime = micros();

  diff   = curTime - prevTime;  // unsigned subtraction survives micros() rollover
  period = diff / 1000000.0;
  hz     = 1.0 / period;

  Serial.print("diff: ");   Serial.println(diff);
  Serial.print("period: "); Serial.println(period);
  Serial.print("hz: ");     Serial.println(hz);
  Serial.println();
}

I realize this is likely not the most efficient way to write data to the SD card, but what is?

  • Should I batch the data inside a buffer and write the entire buffer at once?
  • What should the buffer size be?
  • Also, can I set the `SPI_CLOCK` value over 50MHz?
  • When periodically saving the data, should I use both `flush()` and `close()` or just `close()`? (I'm streaming data and the sketch won't be able to tell when power will be cut or the card removed)

Thanks!
 
  • Should I batch the data inside a buffer and write the entire buffer at once?
  • What should the buffer size be?

Yes. Writing in 4K or larger chunks is the most effective thing you can do to increase performance.
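A minimal sketch of the batching idea, assuming records are appended as raw bytes and that the sink you pass in performs the actual file write. The class name `ChunkBatcher` is hypothetical and is not part of SdFat; the sink is a `std::function` only so the logic can be tried off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>

// Collects small records and passes them on only in whole CHUNK-byte pieces.
// On a Teensy the sink would call file.write(); here it is a std::function
// so the logic can be exercised anywhere (assumption, not SdFat API).
template <size_t CHUNK>
class ChunkBatcher {
 public:
  using Sink = std::function<void(const uint8_t*, size_t)>;
  explicit ChunkBatcher(Sink sink) : sink_(std::move(sink)) {}

  // Copies len bytes in, emitting a full chunk each time the buffer fills.
  void append(const void* data, size_t len) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    while (len > 0) {
      size_t n = CHUNK - used_;
      if (n > len) n = len;
      std::memcpy(buf_ + used_, p, n);
      used_ += n;
      p += n;
      len -= n;
      if (used_ == CHUNK) {
        sink_(buf_, CHUNK);
        used_ = 0;
      }
    }
  }

  // Writes out any partial chunk, e.g. just before closing the file.
  void finish() {
    if (used_ > 0) {
      sink_(buf_, used_);
      used_ = 0;
    }
  }

  size_t pending() const { return used_; }

 private:
  Sink sink_;
  uint8_t buf_[CHUNK];
  size_t used_ = 0;
};
```

On the device the sink would be something like `[&](const uint8_t* p, size_t n) { newFile.write(p, n); }`, with `finish()` called before `close()`.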


Also, can I set the `SPI_CLOCK` value over 50MHz?

Probably not. Even if it does happen to work, you should not rely on it, as SD cards have a max spec of 25 MHz and Teensy's LPSPI peripheral has a max spec of 30 MHz.

But you should try DEDICATED_SPI for a nice optimization if no other chips are connected to the SPI pins. See File > Examples > SD > SdFat_Usage for the proper syntax (your code probably isn't using the SdFat access properly).
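For a card wired to the SPI bus, the configuration might look like the fragment below. It uses SdFat's `SdSpiConfig(csPin, options, maxSck)` with `DEDICATED_SPI`; the CS pin value 10 is a placeholder assumption, and the 25 MHz clock simply matches the card spec mentioned above. Note that with the T4.1 built-in slot and `SdioConfig(FIFO_SDIO)`, as in the original sketch, `SPI_CLOCK` is never used.

```cpp
// Config fragment (Arduino/Teensy, SdFat 2.x). SD card alone on the SPI bus.
#include "SdFat.h"

const uint8_t SD_CS_PIN = 10;   // hypothetical: use your board's CS pin
#define SPI_CLOCK SD_SCK_MHZ(25)  // stay within the 25 MHz card spec
#define SD_CONFIG SdSpiConfig(SD_CS_PIN, DEDICATED_SPI, SPI_CLOCK)

SdFs sd;

// In setup():
//   if (!sd.begin(SD_CONFIG)) { /* handle failure */ }
```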
 
When periodically saving the data, should I use both `flush()` and `close()` or just `close()`? (I'm streaming data and the sketch won't be able to tell when power will be cut or the card removed)
As far as I can see there is NO EXAMPLE using flush.
 

I really don't understand the differences between all of the different file types, file systems, cards, cardinfo, etc., but File.flush() eventually calls sync() on one of the underlying SdFat file classes. There are examples that call sync(), such as SdFat\examples\bench.ino. If you call File.flush(), you're just calling sync() through this hierarchy.

from SD\src\SDFile.h, in class SDFile
Code:
  virtual void flush() { sdfatfile.flush(); }

from SdFat\src\FsFile.h, in class FsBaseFile
Code:
  void flush() {sync();}

and in the same class
Code:
  bool sync() {
    return m_fFile ? m_fFile->sync() :
           m_xFile ? m_xFile->sync() : false;
  }

m_fFile and m_xFile are pointers to FatFile and ExFatFile respectively, and you can see their actual implementations of sync() in FatFile.cpp and ExFatFile.cpp
 
I also do logging to an SD card with a Teensy 4.1 (well, a Teensy MicroMod, but it's the same chip as the 4.1 and also supports SDIO like you're using). I use a 16 KB ring buffer to write into and then batch out in 512-byte increments, as I believe this is the size of a full sector write on an SD card. I always call flush on every sector write.

If you call flush, the data will be written to the card. If you don't, it'll get written whenever the system wants to, which will very likely lead to data loss and/or corruption. Calling flush is obviously a lot slower, but it means you can pretty much count on the data actually being there when you go to read it. When writing a known quantity you can flush at the end. When streaming (in my case, debugging data and informational logs while the program is running), you can never know when the user will pull the plug.

It's tempting to dream up a scheme where you monitor incoming power and try to flush immediately upon external power loss, before the 3.3 V rail on your board dies. I don't think that's a viable idea for an SD card: it will take too long, draw too much power, and just isn't likely to succeed. So you probably have to plan for power failure at any moment and flush as often as you can tolerate.
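A sketch of the ring-buffer-plus-flush pattern described above, assuming a 16 KB buffer and a file type that has `write()` and `flush()` (SdFat's `FsFile` has both). `ByteRing` and `drainSectors` are hypothetical names; this ring is not interrupt-safe, it only illustrates the drain-and-flush logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal FIFO byte ring buffer (not interrupt-safe).
template <size_t CAP>
class ByteRing {
 public:
  size_t used() const { return count_; }
  size_t freeSpace() const { return CAP - count_; }

  // Returns false (and stores nothing) if the data does not fit.
  bool push(const uint8_t* p, size_t n) {
    if (n > freeSpace()) return false;
    for (size_t k = 0; k < n; ++k) {
      buf_[head_] = p[k];
      head_ = (head_ + 1) % CAP;
    }
    count_ += n;
    return true;
  }

  // Copies n bytes out in FIFO order; caller guarantees n <= used().
  void pop(uint8_t* out, size_t n) {
    assert(n <= count_);
    for (size_t k = 0; k < n; ++k) {
      out[k] = buf_[tail_];
      tail_ = (tail_ + 1) % CAP;
    }
    count_ -= n;
  }

 private:
  uint8_t buf_[CAP];
  size_t head_ = 0, tail_ = 0, count_ = 0;
};

constexpr size_t SECTOR = 512;

// Writes every complete 512-byte sector and flushes after each one.
// File is any type with write(const uint8_t*, size_t) and flush().
template <size_t CAP, class File>
size_t drainSectors(ByteRing<CAP>& ring, File& file) {
  uint8_t sector[SECTOR];
  size_t written = 0;
  while (ring.used() >= SECTOR) {
    ring.pop(sector, SECTOR);
    file.write(sector, SECTOR);
    file.flush();  // commit this sector before taking the next one
    ++written;
  }
  return written;
}
```

Leftover bytes smaller than a sector stay in the ring until more data arrives.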
 
One thing to keep in mind is that SD card write times can be very variable. Most writes will be relatively fast, but some will stall for tens or sometimes hundreds of milliseconds.

If you have data coming in at a constant rate and you want to ensure you write it all to the SD card, then you need a lot of buffer space. It is not enough simply to keep up on average: you need to buffer enough not to overflow during a worst-case write, and then be able to run fast enough to catch up. You also need to be able to receive the data that arrives during one of those long writes; depending on the rate and the hardware interface, that normally means using interrupts or DMA.
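The two requirements in the paragraph above can be put into numbers: the buffer must hold everything that arrives during the longest stall, and afterwards the card's spare bandwidth sets how quickly the backlog drains. A rough sketch; the function names and the example rates are hypothetical, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Bytes that arrive during one worst-case write stall; the buffer must
// hold at least this much (plus whatever is already queued).
constexpr uint64_t bufferBytesFor(uint64_t bytesPerSecond,
                                  uint64_t worstStallMs) {
  return bytesPerSecond * worstStallMs / 1000;
}

// Milliseconds to drain a backlog once the card is writing again, given
// its sustained write rate and the incoming data rate. Only possible if
// the card is faster than the data source.
double catchUpMs(double backlogBytes, double cardBytesPerSec,
                 double incomingBytesPerSec) {
  assert(cardBytesPerSec > incomingBytesPerSec);
  return 1000.0 * backlogBytes / (cardBytesPerSec - incomingBytesPerSec);
}
```

For example, 1 MB/s of incoming data and a 40 ms stall need at least 40,000 bytes of buffer; with a card sustaining 20 MB/s, that backlog clears in roughly 2.1 ms.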

The method I've used in the past is to define a circular buffer large enough to hold ~300 ms worth of data. Data is received and written into the buffer in an interrupt; that way data is still logged if the SD card takes a while. The background loop checks whether at least 512 bytes of data are waiting and, if so, writes 512 bytes to the SD card in a single block.
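A sketch of that arrangement, assuming exactly one producer (the interrupt) and one consumer (`loop()`). Because each side only advances its own index, no lock is needed. The use of `std::atomic` for the indices, and its availability in your Teensy toolchain, are assumptions; `SpscRing` and `serviceLog` are hypothetical names. CAP would be sized for ~300 ms of data, rounded up to a power of two.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer byte ring. head_ and tail_ are
// free-running counters; CAP is a power of two so they wrap with a mask.
template <size_t CAP>
class SpscRing {
  static_assert((CAP & (CAP - 1)) == 0, "CAP must be a power of two");

 public:
  // Called from the interrupt. Returns false if the data would overflow.
  bool put(const uint8_t* p, size_t n) {
    size_t head = head_.load(std::memory_order_relaxed);
    size_t tail = tail_.load(std::memory_order_acquire);
    if (CAP - (head - tail) < n) return false;  // overflow: count or flag it
    for (size_t k = 0; k < n; ++k) buf_[(head + k) & (CAP - 1)] = p[k];
    head_.store(head + n, std::memory_order_release);
    return true;
  }

  // Bytes waiting; called from the consumer side.
  size_t available() const {
    return head_.load(std::memory_order_acquire) -
           tail_.load(std::memory_order_relaxed);
  }

  // Called from loop(): copies out exactly n bytes if that many are waiting.
  bool take(uint8_t* out, size_t n) {
    size_t tail = tail_.load(std::memory_order_relaxed);
    size_t head = head_.load(std::memory_order_acquire);
    if (head - tail < n) return false;
    for (size_t k = 0; k < n; ++k) out[k] = buf_[(tail + k) & (CAP - 1)];
    tail_.store(tail + n, std::memory_order_release);
    return true;
  }

 private:
  uint8_t buf_[CAP];
  std::atomic<size_t> head_{0}, tail_{0};
};

// Background-loop step: write one 512-byte block if at least that much is
// waiting. writeBlock stands in for file.write() (assumption).
template <size_t CAP, class WriteFn>
bool serviceLog(SpscRing<CAP>& ring, WriteFn writeBlock) {
  uint8_t block[512];
  if (!ring.take(block, sizeof block)) return false;
  writeBlock(block, sizeof block);
  return true;
}
```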
The log filename is generated as sprintf(filename,"Log%04d.log",logIndex++); every so many MB of data the log file is closed, the name regenerated, and the next file opened. This has a slight performance hit but limits the amount of data lost if things die unexpectedly, without the need to constantly flush or close and reopen the file. On startup it finds the lowest value of logIndex whose file doesn't already exist and starts from there; this slows down startup but prevents data from being overwritten.
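The startup search and name generation could be sketched like this. `makeLogName` and `firstFreeLogIndex` are hypothetical helpers, and the `exists` callback is an assumption standing in for `sd.exists(name)` on the device so the logic can run off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <functional>

// Formats "LogNNNN.log". Returns false if the name did not fit in out.
bool makeLogName(char* out, size_t outSize, int index) {
  int n = std::snprintf(out, outSize, "Log%04d.log", index);
  return n > 0 && static_cast<size_t>(n) < outSize;
}

// Lowest index whose file does not exist yet, or -1 if all are taken.
int firstFreeLogIndex(const std::function<bool(const char*)>& exists,
                      int maxIndex = 9999) {
  char name[16];
  for (int k = 0; k <= maxIndex; ++k) {
    makeLogName(name, sizeof name, k);
    if (!exists(name)) return k;
  }
  return -1;
}
```

Rotation then amounts to closing the current file after the chosen number of MB and opening the name for the next index.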

Data should ideally be logged in a binary format. If you must log it as text, the binary-to-text conversion should happen in the background loop, not the interrupt. This means a second buffer that stores the converted data until there is enough for a 512-byte write. If your text records don't fit exactly into 512-byte blocks, make sure that buffer is large enough to hold the extra, and queue any overflow as the start of the next write block.
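A sketch of that second buffer: it has some slack past the 512-byte boundary, so a text line that crosses the boundary is kept whole and its tail becomes the start of the next block. The `Sample` record, `TextBlockWriter`, and `MAX_LINE = 32` are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical binary record as captured in the interrupt.
struct Sample {
  uint32_t micros;
  int16_t value;
};

// Converts samples to text in loop() and hands out only full 512-byte
// blocks; overflow past the boundary is carried into the next block.
class TextBlockWriter {
 public:
  static constexpr size_t BLOCK = 512;
  static constexpr size_t MAX_LINE = 32;  // assumed > longest formatted line
  using Sink = std::function<void(const char*, size_t)>;

  explicit TextBlockWriter(Sink sink) : sink_(std::move(sink)) {}

  void add(const Sample& s) {
    // Invariant: used_ < BLOCK here, so MAX_LINE bytes always fit.
    int n = std::snprintf(buf_ + used_, MAX_LINE, "%lu,%d\n",
                          static_cast<unsigned long>(s.micros),
                          static_cast<int>(s.value));
    if (n < 0) return;
    used_ += static_cast<size_t>(n) < MAX_LINE ? static_cast<size_t>(n)
                                               : MAX_LINE - 1;
    if (used_ >= BLOCK) {
      sink_(buf_, BLOCK);                               // one full block
      std::memmove(buf_, buf_ + BLOCK, used_ - BLOCK);  // carry overflow
      used_ -= BLOCK;
    }
  }

  size_t pending() const { return used_; }

 private:
  Sink sink_;
  char buf_[BLOCK + MAX_LINE];
  size_t used_ = 0;
};
```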
 
Run this example: ...\hardware\teensy\avr\libraries\SdFat\examples\bench\bench.ino

You can adjust buffer size.

Just now, on a 'used' card with data on it, 512-byte reads and writes were as fast as 4 KB and 16 KB ones ... YMMV

See:
Code:
...
// Set PRE_ALLOCATE true to pre-allocate file clusters.
const bool PRE_ALLOCATE = true;

// Set SKIP_FIRST_LATENCY true if the first read/write to the SD can
// be avoided by writing a file header or reading the first record.
const bool SKIP_FIRST_LATENCY = true;

// Size of read/write.
const size_t BUF_SIZE = 512;
...

Results at 512B:
Code:
Use a freshly formatted SD for best performance.

Type any character to start
FreeStack: 449160
Type is exFAT
Card size: 128.18 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B
OEM ID: SM
Product: GD2S5
Version: 3.0
Serial number: 0XD1679B33
Manufacturing date: 8/2015

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
16833.94,55066,22,29
14244.10,53565,22,34

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
22725.82,843,22,22
22829.59,23,22,22
 
SdFat's file.write(), for both Fat and ExFat files, always writes 512 bytes at a time, so it makes sense that buffer size doesn't matter much in bench.ino. Smaller buffer sizes are almost as fast, so I think SdFat is buffering/caching and still only writing 512 bytes at a time.

SdFat example TeensySdioLogger shows how to use file.isBusy() to avoid blocking in file.write() during the long busy times. By testing file.isBusy() and always writing 512 bytes at a time, max latency in file.write() can be reduced from ~40 ms to ~5 us. This does not increase the total data rate, but it does provide a way to minimize time blocked in file.write(). As @AndyA and @CollinK mention above, for data logging at a sustained rate you must have a data buffer large enough for however much data is gathered during the longest busy time of ~40 ms. I've had good success allocating 50 KB of buffer space for each 1 MB/s of data rate. The highest rate I've been able to sustain is 8 MB/s (400 KB buffer), limited by RAM.
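The gating logic can be sketched on its own: write a single sector only when a full sector is buffered and the card reports it is not busy, otherwise return immediately. TeensySdioLogger does this with SdFat's `RingBuf` class; in the sketch below the `Buffer` interface (`bytesUsed()`, `read()`) is hypothetical, and `File` is assumed to provide `isBusy()` and `write()` as `FsFile` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One non-blocking step for loop(). Returns true if a sector was written.
template <class Buffer, class File>
bool writeSectorIfReady(Buffer& buf, File& file) {
  constexpr size_t SECTOR = 512;
  // Skip if less than a sector is waiting or the card is still busy.
  if (buf.bytesUsed() < SECTOR || file.isBusy()) return false;
  uint8_t sector[SECTOR];
  buf.read(sector, SECTOR);
  return file.write(sector, SECTOR) == SECTOR;
}
```

Calling this every pass through `loop()` means the buffer absorbs the busy periods instead of `file.write()` blocking through them.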
 
Thanks. I almost deleted the post before sending it because the numbers were so similar. There was once a sketch that showed larger improvements up to 16 KB and smaller ones below that; maybe it was the SD card in use, which is why I added the YMMV.

That sketch also does:
Code:
// Set PRE_ALLOCATE true to pre-allocate file clusters.
const bool PRE_ALLOCATE = true;
 
Both bench.ino and TeensySdioLogger.ino call preAllocate(). SdFat has evolved and improved a great deal, so learned wisdom from a few years ago may need an update. Bill is up to 2.2.1 now, with more big changes and improvements since version 2.1.2, which is what TD (Teensyduino) currently ships, so I look forward to the next update.
 
Here's what I got by changing bench.ino to loop through various buffer sizes. The results vary from run to run, so think of this as 20-21 MB/s write, all the way down to 8-byte writes, and 22-23 MB/s read, declining below 512. The max latencies vary from run-to-run, with ~40 ms showing up at any buf size. Note that for buffers < 512 the minimum times can be 0. This reflects fast read/write from the SdFat buffer, as opposed to buffers >= 512, which always require a physical transfer.
Code:
FreeStack: 431272
Type is FAT32
Card size: 31.91 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SD32G
Version: 8.5
Serial number: 0X6919A91C
Manufacturing date: 10/2014

FILE_SIZE_MB = 10
bufsize in bytes, speeds in KB/s, max/min/avg in usec
buf_size   write     max     min     avg    read     max     min     avg
--------   -----   -----   -----   -----   -----   -----   -----   -----
   16384   21487   14599     717     761   23046     711     710     710
    8192   20846   19987     358     391   23046     356     355     355
    4096   19636   38315     179     208   23046     178     177     177
    2048   21269   16456      89      96   22945      90      88      89
    1024   21098   19420      44      48   22895      45      44      44
     512   19747   40493      22      25   22795      23      22      22
     256   20972   20379       0      12   22029      23       0      11
     128   21226   12458       0       5   21487      23       0       5
      64   20088   40703       0       3   20440      23       0       3
      32   21269   17155       0       1   18526      23       0       1
      16   20888   18568       0       0   15863      23       0       0
       8   19973   15635       0       0   12039      23       0       0
       4   12179   14554       0       0    8362      23       0       0
       2    6294   22398       0       0    5010      23       0       0
       1    3348   40823       0       0    2860      23       0       0
 
The previous post was for T4.1. Here is the same data, with the same card, for T3.5
Code:
FILE_SIZE_MB = 10
bufsize in bytes, speeds in KB/s, max/min/avg in usec
buf_size   write     max     min     avg    read     max     min     avg
--------   -----   -----   -----   -----   -----   -----   -----   -----
   16384   16565   18006     931     987   16644     985     982     983
    8192   16461   22573     464     495   16591     495     492     493
    4096   15044   41825     231     270   16487     250     247     247
    2048   16565   22187     114     122   16257     127     125     125
    1024   16750   15451      56      60   15863      66      63      63
     512   16333   16146      27      30   15330      35      32      32
     256   16487   16788       5      14   11848      39       5      20
     128   15087   19658       3       7   10200      38       4      11
      64   10968   15074       3       5    7968      37       3       7
      32    7277   13681       2       3    5693      36       3       4
      16    4235   18147       2       3    3571      36       2       3
       8    2252   40473       2       3    2003      36       2       3
       4    1215   16157       2       2    1079      36       2       3
       2     603   13607       2       2     551      36       2       3
       1     308   36546       2       2     280      36       2       2
 
It should be interesting to see how the Teensy MicroMod compares with the T4.1 and T3.5. When connected to the SparkFun Data Logging Carrier Board, the MM uses an SPI interface rather than the 4-bit dedicated SDIO interface, so I expect it to be much slower than Teensies that use SDIO. Post #5 says that the MicroMod has an SDIO interface, which it does; however, the pins for the SDIO interface are not used on the Data Logging Carrier Board. I suspect the SPI interface to the SD card is used to maintain compatibility with other MicroMod processor boards that do not have a full SDIO implementation.
 