Fast Data Logger

Good info, thanks. So if the card is fully formatted, will the block erase be skipped? I should test that.

What oceanographic company? I used to work at SeaLink which got bought up by EG&G. We mostly did releases.
 
I ran another test at 4.5 Msamples/sec, but I first did a full format with the SD Association's formatting program. Not encouraging (I didn't 'pretty' this one up). This seems to indicate a large latency could happen early in a logging session, so even short runs may not be immune.

[attached graph: bytes of buffer needed vs. sample index]
 
So, 7 MB at 4.5 MB/s means over 1.5 seconds? I never see anything like that. I ran my program at 10 MB/s for 30 min and stored 18 GB with max busy time 38 ms and max buffer used below 400 KB.
 
Oops - I should have expanded that. It's 4.5 mega-samples per second with each sample being 4 bytes, so 18 MB/s.
 
The y axis is bytes of buffer needed, like in the other graphs, but not labeled here. The x axis is samples / 8100. I had to bin the data since Excel only allows about 1e6 rows.
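For reference, the binning step described above can be sketched as below. This assumes max-per-bin is the reduction used (the sensible choice when the quantity of interest is worst-case buffer depth); `binMax` is an invented helper name, not from the logger code, and 8100 is just whatever keeps the point count under Excel's ~1,048,576-row limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduce a long series of per-sample buffer depths to one point per bin,
// keeping each bin's maximum so worst-case spikes survive the downsampling.
std::vector<uint32_t> binMax(const std::vector<uint32_t>& samples,
                             size_t bin = 8100) {
  std::vector<uint32_t> out;
  for (size_t i = 0; i < samples.size(); i += bin) {
    size_t end = std::min(i + bin, samples.size());
    out.push_back(*std::max_element(samples.begin() + i, samples.begin() + end));
  }
  return out;
}
```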
 
But the Y axis goes up to 8 MB. How can you need that much buffer? Should it be 800,000 instead of 8,000,000?
And that is the problem. It's saying that, in order to accommodate these rare delays, one would need 8 MB of buffer.

I just ran bench.ino with a setting of 15000 Mbytes to ensure we would hit at least one of these delays:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
22084.16,69634,22,23 ~ 1,500,000 bytes of backlog (22 MB/s * 0.069 s)
22062.88,153961,22,23 ~ 3,400,000 bytes

When run with the default 50 Mbytes, the latencies are much smaller:

KB/Sec,usec,usec,usec
22094.51,7621,22,23 ~ 167,000 bytes
21968.31,9686,22,23 ~ 213,092 bytes

One must run a long test to see this problem.
 
I've done many long tests, and I don't see anything like what you're saying. My results say you can do it with an 800K buffer.
 
I believe you said you were running at 10 MB/s? I ran at 6 with no problems, but failed at 14. I'd love to be wrong here.
If you run a card using bench.ino for 15000 MB, do you get the long latencies? I think I understand what that means: no data is being stored during that time, which implies buffer = time * sample rate.
Perhaps you are handling the SdFat interface better than I am. I modeled my code after bench.ino and it seemed to be fast, but perhaps there are some tricks I don't know.
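The buffer = time * sample rate relationship is simple arithmetic, but the magnitudes are worth seeing. A minimal sketch, using the latency figures from the bench.ino output quoted earlier in the thread as illustrations (not a guarantee for any particular card):

```cpp
#include <cstdint>

// Bytes of buffering needed to ride out a card stall of latency_us
// microseconds while data keeps arriving at rate_bps bytes per second.
constexpr uint64_t bufferBytesNeeded(uint64_t rate_bps, uint64_t latency_us) {
  return rate_bps * latency_us / 1000000ULL;
}

// A 154 ms stall at 18 MB/s needs roughly 2.9 MB of buffer:
static_assert(bufferBytesNeeded(18ULL * 1024 * 1024, 153961) > 2800000ULL, "");
// The same stall at 10 MB/s still needs over 1.5 MB:
static_assert(bufferBytesNeeded(10ULL * 1024 * 1024, 153961) > 1500000ULL, "");
```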
 
I've been doing all of my testing with the sketch that I posted to this thread. This morning I made an update to support file sizes > 32 bits, and I'll post that. It actually buffers data and keeps track of max buffer usage, so I'm quite sure of the results. The run I did this morning was 1800 seconds @ 10 MB/s = 18 GB, and the max busy time was 38 ms with max buffer < 400K. Did you make updates to bench.ino to support file sizes > 32 bits?
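One concrete thing to check when adapting a bench-style sketch: any intermediate value holding rate x seconds must already be 64-bit, or it wraps silently. A minimal illustration, using the 30-minute, 10 MB/s figures from the run above:

```cpp
#include <cstdint>

// 10 MB/s for 1800 s is 18,874,368,000 bytes -- more than fits in 32 bits.
constexpr uint64_t rate_bps = 10ULL * 1024 * 1024;  // 10 MB/s
constexpr uint64_t log_secs = 1800;                 // 30 minutes

constexpr uint64_t size64 = rate_bps * log_secs;              // correct
constexpr uint32_t size32 = (uint32_t)(rate_bps * log_secs);  // wraps mod 2^32

static_assert(size64 == 18874368000ULL, "full size needs 64 bits");
static_assert(size32 == 1694498816UL, "a 32-bit variable silently wraps");
```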
 
I appreciate your persistence in helping me figure this out.
What changes did you make for file size > 32 bits? I did make the file size variable 64 bits. Maybe I've missed something.
 
(A very) long time ago, on a Rabbit 3000 processor, I removed the FAT layer to increase write speed. No big gains, but it worked. At that time there was not much RAM for buffers, no DMA, an 8-bit Z80-type CPU at 54 MHz, ....
And then I discovered that writing a file in a directory was what caused the low speeds, hiccups, and delays. The speed of writing a file at the root of the file system was close to the low-level write speed.
 
Ha! I remember the Rabbit - nice processor as I recall.
I did the same on the data logger I designed 20-some years ago. The FAT overhead was too much, plus, I think, it still had a 2^32 (or 31?) max file size. I reserved a couple of sectors to store metadata, then just did sector writes for the data. This was on SCSI drives.
Might be worth looking into that for this. The SD seemed fast enough, even with the FAT overhead. And it might be (unless I have a coding error) that the long delay is inherent in the media, possibly due to sector erase times, which would not be due to FAT.
Although FAT is convenient, I'd love to just have straight-out, 8-32 bit wide, low-level access storage. But the SD card wins on size and weight. Being removable and non-volatile are bonuses.
 
Early FAT libraries could be bad. I remember one that scanned the FAT from the beginning when it needed to allocate a new cluster. Twice.

For speed I do not use any file system. There are still periodic delays.

As for delays to erase, once a long time ago I ran across information on how Sandisk handled this. Sectors would be in groups that had spare sectors. The extras would be erased. When you wrote data, it went to one of those erased sectors. The replaced sector would then get erased and added to the pool of erased sectors.

It seems likely that it is handled differently now.
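As a toy model only (the real controller firmware is proprietary, and the scheme above is secondhand), the spare-pool idea can be sketched as a logical-to-physical mapping plus a queue of pre-erased physical sectors. `SectorGroup` and its members are invented names for illustration:

```cpp
#include <cstdint>
#include <deque>
#include <map>

// Toy model of the described scheme: a group of physical sectors with a few
// pre-erased spares. A logical write goes straight to a spare; the physical
// sector it replaces is erased in the background and returned to the pool,
// so the host rarely waits on an erase.
struct SectorGroup {
  std::map<uint32_t, uint32_t> logicalToPhysical;  // current mapping
  std::deque<uint32_t> erasedPool;                 // pre-erased spares

  // Write a logical sector: consume a spare, recycle the old physical sector.
  // (Sketch assumes a spare is always available.)
  uint32_t write(uint32_t logical) {
    uint32_t phys = erasedPool.front();
    erasedPool.pop_front();
    auto it = logicalToPhysical.find(logical);
    if (it != logicalToPhysical.end())
      erasedPool.push_back(it->second);  // old copy gets erased, re-pooled
    logicalToPhysical[logical] = phys;
    return phys;
  }
};
```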

Another detail is ACMD23: "Setting a number of write blocks to be pre-erased (ACMD23) will make a following Multiple Block Write operation faster compared to the same operation without preceding ACMD23."
 
All good info. May have to get into the guts of this.

I did a run with a fully formatted card. Still had the delays, which, if due to sector erases, means the card doesn't check whether a sector is pre-erased. I get it: most people will not fully format a drive (it takes an hour), and it's safer, and easier, to assume it needs erasing.
 
I really doubt the formatting program trims (the technical term for telling the card a sector is unused by the filesystem) all the sectors. Doing so requires low-level access to the SD card registers which most operating systems simply aren't likely to provide. In other words the Teensy can do it, but don't expect a desktop PC program to "pre-erase" any part of a card.
 
Here is my sketch modified to allow file size > 32 bits. My card is 128 GB Sandisk Extreme Plus, and I use SdFat example SDFormatter to do the formatting (option 'F').

Last night I did a run of 2 hours @ 10 MB/s = 72 GB, and still got max busy delay of 40 ms and did not overflow the buffer. Max buffer usage was 424960 out of 436906, so I'm confident that with about 720K, it could run for one hour @ 18 MB/s. This test had the buffer statically allocated in RAM1 (TCM). I plan to try two more things. Buffer of ~500K in RAM2, which I think will support 12 MB/s, and then try using on-the-fly compression. A compression ratio of 1.5x on 12 MB/s would achieve 18 MB/s.

Would you mind doing the following:
  • run SdFat example SDFormatter and do an erase/format?
  • run the program below with no changes (other than duration) and report your results?
  • post a not-too-big file with some real data?
Code:
Feb  8 2026  21:53:22
Teensy 4.1
Teensyduino version 160
SdFat version 2.1.2
Type any character to begin
Log for 7200 seconds at 10.00 MB/s (1024 bytes per interrupt)
Pre-allocated file 75497472000 bytes
RingBuf 436906 bytes
Start dataTimer (period = 97 us)
...
Stop dataTimer
147455999 writes in 7151.616 s (687.624 s writing to file = 9.615 %)
File is full
fileSize     = 75497471488 before sync()
rb.bytesUsed =        512 before sync()
fileSize     = 75497472000 after sync()
rb.bytesUsed =          0 after sync()
rbMaxUsed    =     424960
avg write us =       4.66
min write us =       4.58
max write us =       7.74
avg busy  us =      18.26
min busy  us =      15.21
max busy  us =   40270.10
file.close()


Code:
// Test Teensy SDIO with write busy in a data logger demo.
//
// The driver writes to the uSDHC controller's FIFO then returns while the
// controller writes the data to the SD.  The first sector puts the controller
// in write mode and takes about 11 usec on a Teensy 4.1. About 5 usec is
// required to write a sector when the controller is in write mode.

#include <SdFat.h>
#include <RingBuf.h>

//******************************************************************************
// configuration constants
//******************************************************************************
#define SD_CONFIG    (SdioConfig(FIFO_SDIO))        // use Teensy SDIO
#define LOGGING_TIME_S    (480ULL)            // s
#define DATA_RATE_BPS    ((10)*(1024*1024))        // bytes/s
#define LOG_FILE_SIZE    (DATA_RATE_BPS * LOGGING_TIME_S)// total bytes
#define RING_BUF_SIZE    (DATA_RATE_BPS / 24)        // 42-ms buffer at BPS
#define BUFLEN        (1024/*256*/)            // bytes per write
#define C2US(c)        ((c)*(1E6/F_CPU))        // CPU cycles to us

//******************************************************************************
// global variables
//******************************************************************************
IntervalTimer dataTimer;        // IntervalTimer for ISR-level writes
RingBuf<FsFile,RING_BUF_SIZE> rb;       // ISR --> RingBuf --> loop --> SD file
SdFs     sd;                // SdFat type
FsFile   file;                // SdFat file type
size_t   rbMaxUsed = 0;            // RingBuf max bytes (useful diagnostic)
char     buf[BUFLEN];            // test buffer
uint32_t error;                // RingBuf/file error code

//******************************************************************************
// IntervalTimer callback -- write BUFLEN bytes to RingBuf
//******************************************************************************
void dataTimerCallback( void )
{
#if (SD_FAT_VERSION == 20102)        // #if SdFat 2.1.2
  rb.memcpyIn( buf, BUFLEN );        //   write to RingBuf via rb.memcpyIn()
#elif (SD_FAT_VERSION >= 20202)        // #elif SdFat >= 2.2.2
  rb.beginISR();            //   begin interrupt access
  rb.write( buf, BUFLEN );        //   write to RingBuf via rb.write()
  rb.endISR();                //   end interrupt access
#endif                    // #endif
  if (rb.getWriteError())        // if write error occurred
    error = 1;                //   set global error code
} 

//******************************************************************************
// setup()
//******************************************************************************
void setup()
{
  Serial.begin(9600);
  while (!Serial && millis() < 3000) {}

  Serial.printf( "%s  %s\n", __DATE__, __TIME__ );
  Serial.printf( "Teensy %s\n",
#if defined(ARDUINO_TEENSY35)
  "3.5" );
#elif defined(ARDUINO_TEENSY41)
  "4.1" );
#endif
  Serial.printf( "Teensyduino version %1lu\n", TEENSYDUINO );
  Serial.printf( "SdFat version %s\n", SD_FAT_VERSION_STR );

  // Initialize the SD.
  if (!sd.begin(SD_CONFIG)) {
    sd.initErrorHalt(&Serial);
  }
 
  // these 2 lines are necessary to enable cycle counting on T3.x
  #if (defined(KINETISL) || defined(KINETISK))        // if Teensy LC or 3.x
  ARM_DEMCR    |= ARM_DEMCR_TRCENA;            //   enable debug/trace
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;        //   enable cycle counter
  #endif
}

//******************************************************************************
// loop()  open file, preAllocate, init RingBuf, log data, print results/stats
//******************************************************************************
void loop()
{
  while (Serial.available()) { Serial.read(); }
  Serial.println( "Type any character to begin" );
  while (!Serial.available()) {}
 
  Serial.printf( "Log for %1lu seconds at %1.2f MB/s (%1lu bytes per interrupt)\n",
        LOGGING_TIME_S, (float)(DATA_RATE_BPS/(1024*1024.0)), BUFLEN );
  Serial.printf( "Pre-allocated file %1llu bytes\n", LOG_FILE_SIZE );
  Serial.printf( "RingBuf %1lu bytes\n", RING_BUF_SIZE );

  // Open or create file - truncate existing file.
  if (!file.open( "logfile.txt", O_RDWR | O_CREAT | O_TRUNC )) {
    Serial.println( "open failed\n" );
  }
  // File must be pre-allocated to avoid huge delays searching for free clusters.
  else if (!file.preAllocate( LOG_FILE_SIZE )) {
     Serial.println( "preAllocate failed\n" );
     file.close();
  }
  // init the file and RingBuf
  else {
    rb.begin(&file);
  }
 
  // Init data buffer with random data
  randomSeed( micros() );
  for (int i=0; i<BUFLEN; i++)
    buf[i] = 0x30+random( 10 );
  buf[BUFLEN-1] = '\n';
 
  uint32_t timer_period_us = 1E6 * BUFLEN / DATA_RATE_BPS;
  Serial.printf( "Start dataTimer (period = %1lu us)\n", timer_period_us );
  dataTimer.begin( dataTimerCallback, timer_period_us );

  uint32_t count = 0, start_ms = millis(), start_busy = 0;
  bool busy = false;
  error = 0;
  rbMaxUsed = 0;
  uint64_t sum_busy=0, sum_write=0;
  uint32_t min_busy=0xFFFFFFFF, max_busy=0;
  uint32_t min_write=0xFFFFFFFF, max_write=0;
  uint32_t seconds = 0;
  elapsedMillis ms = 0;
  while (error == 0 && millis() - start_ms < LOGGING_TIME_S*1000) {
      
    if (ms >= 1000) {
      ms -= 1000;
      if ((seconds%60)==0)
        Serial.printf( "min %03lu ", seconds/60 );
      seconds++;
      Serial.print( "." );
      if ((seconds%60)==0)
        Serial.println();
    }
    
    // number of bytes in RingBuf
    size_t n = rb.bytesUsed();
    if (n > rbMaxUsed) {
      rbMaxUsed = n;
    }

    // bytes in RingBuf now will fit, but any more will exceed file size
    if ((n + file.curPosition()) > (LOG_FILE_SIZE - BUFLEN/*_MAX*/)) {
      error = 2; // file full
    }

    // write one sector (512 bytes) from RingBuf to file
    // Not busy only allows one sector before possible busy wait
    if (file.isBusy()) {
      if (!busy) {
        busy = true;
        start_busy = ARM_DWT_CYCCNT;
      }
    }
    else if (busy) {
      busy = false;
      uint32_t busy_cyc = ARM_DWT_CYCCNT - start_busy;
      sum_busy += busy_cyc;
      if (busy_cyc < min_busy) min_busy = busy_cyc;
      if (busy_cyc > max_busy) max_busy = busy_cyc;
    }
    if (n >= 512 && !busy) {
      uint32_t start_write = ARM_DWT_CYCCNT;
      if (512 != rb.writeOut(512)) {
        error = 1; // write error
      }
      uint32_t write_cyc = ARM_DWT_CYCCNT - start_write;
      sum_write += write_cyc;
      if (count > 0) {
        if (write_cyc < min_write) min_write = write_cyc;
        if (write_cyc > max_write) max_write = write_cyc;
      }
      count++;
    }
  }
 
  uint32_t duration_ms = millis() - start_ms;
  Serial.printf( "\nStop dataTimer\n" );
  dataTimer.end();
 
  double duration_s = duration_ms/1000.0;
  double write_s = C2US(sum_write)/1E6;
  double write_percent = 100*(write_s/duration_s);
  Serial.printf( "%1lu writes in %1.3lf s (%1.3lf s writing to file = %1.3lf %c)\n",
        count, duration_s, write_s, write_percent, '%' );
  switch (error) {
    case 0:   Serial.printf( "No error\n" );            break;
    case 1:   Serial.printf( "Not enough space in RingBuf\n" );    break;
    case 2:   Serial.printf( "File is full\n" );        break;
    case 3:   Serial.printf( "Write from RingBuf failed\n" );    break;
    default:  Serial.printf( "Undefined error %1lu\n", error );    break;
  }

  // write any remaining RingBuf data to file
  Serial.printf( "fileSize     = %10llu before sync()\n", (uint64_t)file.fileSize() );
  Serial.printf( "rb.bytesUsed = %10llu before sync()\n", (uint64_t)rb.bytesUsed() );
  rb.sync();
 
  // file and buffer stats
  Serial.printf( "fileSize     = %10llu after sync()\n", (uint64_t)file.fileSize() );
  Serial.printf( "rb.bytesUsed = %10llu after sync()\n", (uint64_t)rb.bytesUsed() );
  Serial.printf( "rbMaxUsed    = %10llu\n", (uint64_t)rbMaxUsed );
  Serial.printf( "avg write us = %10.2lf\n", C2US(sum_write)/count );
  Serial.printf( "min write us = %10.2lf\n", C2US(min_write) );
  Serial.printf( "max write us = %10.2lf\n", C2US(max_write) );
  Serial.printf( "avg busy  us = %10.2lf\n", C2US(sum_busy)/count );
  Serial.printf( "min busy  us = %10.2lf\n", C2US(min_busy) );
  Serial.printf( "max busy  us = %10.2lf\n", C2US(max_busy) );
 
  // print first N line(s) of file.
  int lines=0;
  if (lines > 0) {
    Serial.printf( "First %1d line(s) of file\n", lines );
    file.truncate();
    file.rewind();
  }
  for (int n=0; n < lines && file.available();) {
    int c = file.read();
    if (c < 0) break;
    Serial.write(c);
    if (c == '\n') n++;
  }
 
  // close file
  file.close();
  Serial.printf( "file.close()\n\n" );
}
 
Thanks - will do.

And to jmarsh:
I use the formatter provided by the SD Association. It takes an hour for a 256 GB card, so I hope it is doing some deep-level cleanup. I can't access that part of their site right now, so I can't check the details.
 
From what I remember, the PC file system only sees sectors from 0 to LAST. It never handles replacement sectors due to wear leveling or defects; that is all transparent to the PC. The same goes for HDDs and flash drives: the device controller handles it. When the disk becomes slower and slower, it is time to replace it before the final crash. Defective-sector lists and tables are accessible with specialized software that connects at a low level.
 
OK, this is weird. I ran your code with a fully formatted card. I meant to change
#define LOGGING_TIME_S (480ULL)
to 1800 for a half hour, but I mistakenly used 180 (3 minutes). Here is what I got: max busy us = 158505.48. Full output below.
Something is weird.

Feb 9 2026 10:20:51
Teensy 4.1
Teensyduino version 159
SdFat version 2.1.2
Type any character to begin
Log for 180 seconds at 10.00 MB/s (1024 bytes per interrupt)
Pre-allocated file 1887436800 bytes
RingBuf 436906 bytes
Start dataTimer (period = 97 us)
min 000 ............................................................
min 001 ............................................................
min 002 ...........................................................
Stop dataTimer
3686397 writes in 179.140 s (17.214 s writing to file = 9.609 %)
File is full
fileSize = 1887435264 before sync()
rb.bytesUsed = 682 before sync()
fileSize = 1887435946 after sync()
rb.bytesUsed = 0 after sync()
rbMaxUsed = 436906
avg write us = 4.67
min write us = 4.58
max write us = 6.99
avg busy us = 18.26
min busy us = 15.23
max busy us = 158505.48
file.close()
 
Strange. I get max busy time of ~40 ms with both my 32 GB Sandisk Ultra and 128 GB Extreme Plus.

EDIT: I'm using default T4.1 build settings "Faster" and 600 MHz. I'm using TD 1.60b5, but I only recently updated from 1.59 and I've always gotten max 40 ms with 1.59, too.
  • Is your hardware a standard T4.1 soldered or socketed to a custom PCB?
  • When you say fully formatted, do you mean via SDFormatter, or some other method?
  • Can you run SDFormatter and SdInfo and post their output regarding your card?
 
"It's not what you don't know, it's what you know that ain't so"

Well, this is embarrassing. I went to run SdInfo and it would not compile; there were several missing definitions. But I had updated the SdFat library, and I had pulled SdInfo.ino from that library. Then, looking at the verbose compile info, I realized it was using source from the Teensy install, not Arduino/libraries. I pulled that out of the Teensy library directory and recompiled SdInfo successfully. Then I ran your program for 30 minutes and got:

max busy us = 19548.53

I'll try it on my board next, but this all makes sense.
 
I'm aware of how a flash translation layer works.
But I can practically guarantee you the software is not "connecting to the card at a low level".
If you plug a USB SD card reader into a PC, the PC doesn't even know that it's an SD card - it looks just like any other mass storage (i.e. block based) device and there's no way for the PC to issue SD specific commands. So all the formatter is doing is filling sectors with 0xFF bytes (or zeroes) rather than trimming them.
 
max busy us = 19548.53

That's good news. Just to be clear, I'm using the Teensy-specific version of SdFat. You said previously that you managed to create a 950K buffer across RAM1 and RAM2. If your max busy stays below 40 ms, you should be able to get to 18 MB/s, or maybe even 20.
 