SdFs - a New SD Library for FAT16/FAT32/exFAT

Status
Not open for further replies.

Bill Greiman

Well-known member
I have posted an early version of a new SD library that supports FAT16/FAT32/exFAT on SD/SDHC/SDXC cards on GitHub.

I am calling it SdFs during development and it is located here.

I have not decided whether it will replace SdFat or remain a separate library. The goals of providing full options on an Uno while achieving maximum performance on ARM boards have resulted in more complexity than I like.

Please try the SdFs examples. All SdFs classes are documented in Doxygen HTML located in SdFs/doc.

You will certainly find bugs. I have been using Arduino 1.8.3 for development. I have done minimal testing with Arduino Uno, Due, Zero, Teensy 3.6 and eBay STM32F103C boards.

I would appreciate your comments.
 
Thanks, will try it
 
I updated GitHub with a few changes to improve compatibility with the Teensy version of SD.h.

I now recognize the symbol BUILTIN_SDCARD for Teensy 3.5/3.6. My goal is to make conversion from SD.h to SdFs.h as simple as possible. I ran two old tests of SD.h with this change.

Code:
#define USE_SDFS 
#ifdef USE_SDFS
#include "SdFs.h"
SdFat SD;
#else  // USE_SDFS
#include <SD.h>
#endif  // USE_SDFS

The first test is a simple binary write test, see the attached benchSD.ino file.

Here are the results of 100-byte writes on Teensy 3.6 using the standard SD.h library.
Code:
File size 5MB
Buffer size 100 bytes
Starting write test.  Please wait up to a minute
Write 487.61 KB/sec
Maximum latency: 18948 usec, Minimum Latency: 4 usec, Avg Latency: 204 usec

Starting read test.  Please wait up to a minute
Read 1803.10 KB/sec
Maximum latency: 1388 usec, Minimum Latency: 4 usec, Avg Latency: 55 usec

Here are the results with SdFs.h.
Code:
File size 5MB
Buffer size 100 bytes
Starting write test.  Please wait up to a minute
Write 12987.01 KB/sec
Maximum latency: 7513 usec, Minimum Latency: 1 usec, Avg Latency: 7 usec

Starting read test.  Please wait up to a minute
Read 13586.96 KB/sec
Maximum latency: 986 usec, Minimum Latency: 1 usec, Avg Latency: 7 usec

I also ran a simple print test, see the attached PrintBenchmarkSD.ino file. Most of the CPU time for print is in formatting but there is still some improvement.

Here are the results with the standard SD.h library.
Code:
Test of println(uint16_t)
Time 0.33 sec
File size 128.89 KB
Write 395.37 KB/sec
Maximum latency: 5609 usec, Minimum Latency: 2 usec, Avg Latency: 15 usec

Test of println(double)
Time 0.57 sec
File size 149.00 KB
Write 262.32 KB/sec
Maximum latency: 5670 usec, Minimum Latency: 10 usec, Avg Latency: 27 usec

Here are results for SdFs.h.
Code:
Test of println(uint16_t)
Time 0.19 sec
File size 128.89 KB
Write 671.30 KB/sec
Maximum latency: 749 usec, Minimum Latency: 4 usec, Avg Latency: 9 usec

Test of println(double)
Time 0.37 sec
File size 149.00 KB
Write 408.22 KB/sec
Maximum latency: 786 usec, Minimum Latency: 14 usec, Avg Latency: 17 usec

The gain for print is small but SdFs.h gives a large improvement for binary I/O.

Another advantage is that you can add SDXC/exFAT support with this change.
Code:
#define USE_SDFS 
#ifdef USE_SDFS
#include "SdFs.h"
SdFs SD;
FsFile file;
#else  // USE_SDFS
#include <SD.h>
File file;
#endif  // USE_SDFS

exFAT can have one file that fills the entire SD. You can pre-allocate space for a file and trim unused space when you close the file. You can have cluster sizes up to 32MiB.
 

Attachments

  • benchSD.ino (3.6 KB)
  • PrintBenchmarkSD.ino (2.8 KB)

benchSD runs OK, but PrintBenchmarkSD (and SDFormatter) give me this error:

Arduino: 1.8.1 (Windows 7), TD: 1.36, Board: "Teensy 3.6, Serial, 180 MHz, Fast, US English"

In file included from C:\Users\jh01497\Documents\Arduino\libraries\SdFs-master\src/SdCard/SdSpiCard.h:30:0,
from C:\Users\jh01497\Documents\Arduino\libraries\SdFs-master\src/SdCard/SdCard.h:23,
from C:\Users\jh01497\Documents\Arduino\libraries\SdFs-master\src/SdFs.h:27,
from C:\Users\jh01497\Documents\Arduino\libraries\SdFs-master\examples\SdFormatter\SdFormatter.ino:13:
c:\users\jh01497\documents\arduino\libraries\sdfs-master\src\spidriver\sdspidriver.h: In member function 'void SdSpiLibDriver::begin(SdSpiConfig)':
c:\users\jh01497\documents\arduino\libraries\sdfs-master\src\spidriver\sdspidriver.h:118:13: error: cannot convert 'SPI1Class*' to 'SPIClass*' in assignment
m_spi = &SDCARD_SPI;
^
Error compiling for board Teensy 3.6.
 
My logger program works fine with SdFs with DMA_SDIO.
OK, at 2.4 MB/s the RAM in the Teensy 3.6 is not big enough to handle all possible latencies, so from time to time I have a buffer overrun, but that is not a problem of the FS.
Adaptation was very easy.
 
Looks like you are using an older version of Teensyduino. I am using Teensyduino, Version 1.37 with Arduino IDE 1.8.3.

I don't understand why benchSD works while other examples fail. I tried an older version of Teensyduino and all fail.

I searched Teensyduino 1.37 and older versions and found that SPI and SPI1 have different types in the older versions.
In 1.37:
Code:
SPIClass SPI((uintptr_t)&KINETISK_SPI0, (uintptr_t)&SPIClass::spi0_hardware);
SPIClass SPI1((uintptr_t)&KINETISK_SPI1, (uintptr_t)&SPIClass::spi1_hardware);
SPIClass SPI2((uintptr_t)&KINETISK_SPI2, (uintptr_t)&SPIClass::spi2_hardware);

In 1.36, ouch:
Code:
SPIClass SPI;
SPI1Class SPI1;

I use SPI1 to provide SPI access to the Teensy 3.5/3.6 built-in SD. You get more consistent write latencies with SPI so even though SDIO is faster, you can build simple data loggers that get fewer overruns with SPI.

You can make examples that don't use SPI on the built-in card work by editing SdFs/src/FsConfig.h and commenting out this line, at about line 180.
Code:
//#define SDCARD_SPI      SPI1

I may not do a fix since I really need other mods that will be in the current beta version of the Teensyduino SPI library.

I may disable SPI on the built-in card for old versions of Teensyduino.
 
Hi Bill, you may want to edit enableGPIO (https://github.com/greiman/SdFs/blob/master/src/SdCard/SdioTeensy.cpp#L252) a bit.
It disables the pins completely (sets "0", high-Z floating), but the interface needs some pullups. I had some problems with this behavior in Paul's library: the card init sometimes did not work (about 1 in 10 times, sometimes more often, sometimes less). Since it is wrong to disable the pins anyway, and switching to "input-pullup" instead does not hurt, I'd change it.
 
Sounds like a good idea. SD cards can lock up with noise. If they ever go into SPI mode, you must cycle power to use SDIO. To put a card into SPI mode, you send CMD0 with CS/DAT3 low. After that you must have CS low to send a command, so you are stuck. The same is true for some other modes: you must cycle power to leave the mode.
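
For reference, the CMD0 frame mentioned above is six bytes: a command byte, a four-byte argument, and a CRC7 followed by a stop bit. The host-side C++ sketch below is not taken from the SdFs source, and the names sdCrc7 and sdCommandFrame are made up for illustration. It only shows how the bytes are formed. Whether the card treats CMD0 as a request for SPI mode depends on the level of CS/DAT3 while the frame is sent.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// CRC7 used on SD command frames: polynomial x^7 + x^3 + 1 (0x09), start value 0.
uint8_t sdCrc7(const uint8_t* data, size_t n) {
  uint8_t crc = 0;
  for (size_t i = 0; i < n; i++) {
    uint8_t d = data[i];
    for (int bit = 0; bit < 8; bit++) {
      crc <<= 1;
      if ((d & 0x80) ^ (crc & 0x80)) {
        crc ^= 0x09;
      }
      d <<= 1;
    }
  }
  return crc & 0x7F;
}

// Build the 6-byte frame for a command index and a 32-bit argument.
std::array<uint8_t, 6> sdCommandFrame(uint8_t index, uint32_t arg) {
  std::array<uint8_t, 6> f{};
  f[0] = static_cast<uint8_t>(0x40 | (index & 0x3F));  // start bit 0, host bit 1, index
  f[1] = static_cast<uint8_t>(arg >> 24);
  f[2] = static_cast<uint8_t>(arg >> 16);
  f[3] = static_cast<uint8_t>(arg >> 8);
  f[4] = static_cast<uint8_t>(arg);
  f[5] = static_cast<uint8_t>((sdCrc7(f.data(), 5) << 1) | 1);  // CRC7 plus stop bit
  return f;
}
```

For CMD0 with a zero argument this gives the familiar byte sequence 0x40 0x00 0x00 0x00 0x00 0x95.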

I looked at "12.5.1 Pin Control Register n (PORTx_PCRn)" in the datasheet. I plan to do this:
Code:
static void enableGPIO(bool enable) {
  const uint32_t PORT_PUP = PORT_PCR_PE | PORT_PCR_PS;
  const uint32_t PORT_CLK = PORT_PCR_MUX(4) | PORT_PCR_DSE;
  const uint32_t PORT_CMD_DATA = PORT_CLK | PORT_PUP;

  PORTE_PCR0 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D1
  PORTE_PCR1 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D0
  PORTE_PCR2 = enable ? PORT_CLK      : PORT_PUP;  // SDHC_CLK
  PORTE_PCR3 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_CMD
  PORTE_PCR4 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D3
  PORTE_PCR5 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D2
}

Is this OK?
 
Looks good! But I'm not sure whether adding PORT_PCR_MUX(1) wouldn't be better? (MUX(0) is ADC or disabled.)
 
I've just been checking this out, but something I use seems to be missing, without any mention in the source: chdir(). I had a look through the various headers but could find no mention of, or method for, setting the current working directory.
Also, I noticed FsFile::seekcur() as opposed to seekCur(); not sure if this is intended or not.
 
Thanks for finding the typo in seekCur(), I will fix it today.

chdir() is missing because my first version had a design problem, not just a bug so I deleted it. I think I have an idea how to fix it and it is my highest priority. I appreciate knowing it is used.

My to-do list has a few other high priority items. exFAT is suffering performance problems. I need to add a separate cache for the bitmap and maybe one for the FAT.

The bitmap cache buys lots of performance when writes are not on sector boundaries and space has not been pre-allocated. The bitmap cache allows 4096 clusters per sector to be allocated without access to the SD. Clusters can be as big as 32MiB. Even with 128KiB clusters, it represents 512MiB.
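
The numbers in that paragraph follow from one bit per cluster in a 512-byte bitmap sector. A small host-side C++ sketch of the arithmetic (the function names are illustrative and not part of SdFs):

```cpp
#include <cstdint>

constexpr uint64_t kSectorBytes = 512;
constexpr uint64_t kKiB = 1024;
constexpr uint64_t kMiB = 1024 * kKiB;
constexpr uint64_t kGiB = 1024 * kMiB;

// Clusters tracked by one cached sector of the exFAT allocation bitmap
// (one bit per cluster).
constexpr uint64_t clustersPerBitmapSector() { return kSectorBytes * 8; }

// File space that one cached bitmap sector can allocate without touching the SD.
constexpr uint64_t bytesPerBitmapSector(uint64_t clusterBytes) {
  return clustersPerBitmapSector() * clusterBytes;
}

static_assert(clustersPerBitmapSector() == 4096, "512 bytes * 8 bits");
static_assert(bytesPerBitmapSector(128 * kKiB) == 512 * kMiB, "128 KiB clusters");
static_assert(bytesPerBitmapSector(32 * kMiB) == 128 * kGiB, "32 MiB clusters");
```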
 
I decided to use MUX(1).

Code:
static void enableGPIO(bool enable) {
  const uint32_t PORT_CLK = PORT_PCR_MUX(4) | PORT_PCR_DSE;
  const uint32_t PORT_CMD_DATA = PORT_CLK   | PORT_PCR_PE | PORT_PCR_PS;
  const uint32_t PORT_PUP = PORT_PCR_MUX(1) | PORT_PCR_PE | PORT_PCR_PS;
  
  PORTE_PCR0 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D1
  PORTE_PCR1 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D0
  PORTE_PCR2 = enable ? PORT_CLK      : PORT_PUP;  // SDHC_CLK
  PORTE_PCR3 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_CMD
  PORTE_PCR4 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D3
  PORTE_PCR5 = enable ? PORT_CMD_DATA : PORT_PUP;  // SDHC_D2
}
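
To see what actually lands in each PORTE_PCRn register, here is a host-side C++ sketch that recomputes the three values. The bit positions are my reading of the PCR register layout (PS bit 0, PE bit 1, DSE bit 6, MUX bits 10 to 8) and should be checked against kinetis.h; the constant names are deliberately different from the real macros to make clear this is not the core header.

```cpp
#include <cstdint>

// Assumed PORTx_PCRn bit layout; verify against kinetis.h before relying on it.
constexpr uint32_t PCR_PS = 1u << 0;   // pull select: 1 = pull-up
constexpr uint32_t PCR_PE = 1u << 1;   // pull enable
constexpr uint32_t PCR_DSE = 1u << 6;  // high drive strength
constexpr uint32_t pcrMux(uint32_t n) { return (n & 7u) << 8; }

constexpr uint32_t kPortClk = pcrMux(4) | PCR_DSE;             // SDHC function, no pull
constexpr uint32_t kPortCmdData = kPortClk | PCR_PE | PCR_PS;  // SDHC function, pull-up
constexpr uint32_t kPortPup = pcrMux(1) | PCR_PE | PCR_PS;     // MUX(1), pull-up

// Value written for one pin, mirroring the ternaries in enableGPIO().
constexpr uint32_t pinConfig(bool enable, bool isClock) {
  return enable ? (isClock ? kPortClk : kPortCmdData) : kPortPup;
}
```

With these assumptions a disabled pin is written as 0x103 rather than 0, so it is never left floating.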

Also added a fix for mkdir() and updated GitHub. I have chdir() implemented but need to do more testing.
 
I added a dedicated cache for the exFAT allocation bitmap. It is enabled for ARM processors. I also think I improved the exFAT search algorithm for free clusters.

These are dangerous areas since a bug will cause major file system corruption.

If you need performance with exFAT, pre-allocation of clusters to make a contiguous file is always best. Allocation with the bitmap is fast and you can trim excess space when you close the file.

With exFAT, files are either contiguous and don't use the FAT or non-contiguous and use the FAT. If the SD is fragmented, a FAT chain must be created when fragmentation causes a file to be non-contiguous. This can cause a long latency while the FAT chain is constructed.

I fixed the relative path problem and added an example, DirectoryFunctions.ino, to illustrate use of chdir(), mkdir(), rmdir(), and open() with relative paths.
 
Here are some test results that show advantages of using exFAT.

The huge variation in write times for FAT16/FAT32 makes it difficult to write simple data loggers that run at more than about 5-10 samples per second.

exFAT can improve the situation a great deal by pre-allocating a file. You can then check whether the SD is busy to ensure that one sector will be written in less than a known time.

Here is a loop from a test program that I attached.

Code:
while (file.size() < FILE_SIZE) {
   if (waitBusy) {
     while (sd.card()->isBusy()) {}
   }
   uint32_t m = micros();
   if (file.write(buf, sizeof(buf)) != sizeof(buf)) {
     error("write failed");
   }
   m = micros() - m;
   if (m < minMicros) {
     minMicros = m;
   }
   if (m > maxMicros) {
     maxMicros = m;
   }
}

I ran this program on a Teensy 3.6 board using the built-in SD with SPI and a quality Samsung microSD formatted exFAT.

I used SPI since errata with SDHC prevent valid busy tests in SDIO mode. See below for SDIO timing with exFAT.

I pre-allocated a 100 MiB file.

Here is the result with waitBusy true.

Code:
Starting write of 100 MiB.
minMicros: 157
maxMicros: 161
32.79 Seconds
3197.85 KB/sec

So the variation in write time is only about four micros if the card is not busy.

If I run it with waitBusy false, I typically get something like this.

Code:
Starting write of 100 MiB.
minMicros: 158
maxMicros: 1524
32.61 Seconds
3215.90 KB/sec

This means you can allocate a small FIFO, read data from sensors into the FIFO at the beginning of a loop and, if the SD is not busy, write up to 512 bytes to the card.
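
A rough host-side C++ sketch of that loop structure follows. It is not the attached logger: FakeCard, ByteFifo and logOnce are hypothetical names, and FakeCard only stands in for the sd.card()->isBusy() and file.write() calls used in the test loop above so the logic can be run on a PC. It writes exactly one full 512-byte sector whenever one is queued and the card is idle.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the SD card. After each write it reports busy
// for the next three polls.
struct FakeCard {
  int busyPolls = 0;
  size_t bytesWritten = 0;
  bool isBusy() {
    if (busyPolls > 0) { busyPolls--; return true; }
    return false;
  }
  size_t write(const uint8_t*, size_t n) {
    bytesWritten += n;
    busyPolls = 3;
    return n;
  }
};

// Small byte FIFO; N must cover the data that arrives while the card is busy.
template <size_t N>
struct ByteFifo {
  uint8_t buf[N];
  size_t head = 0;
  size_t count = 0;
  bool push(const uint8_t* p, size_t n) {
    if (n > N - count) return false;  // overrun
    for (size_t i = 0; i < n; i++) buf[(head + count + i) % N] = p[i];
    count += n;
    return true;
  }
  size_t pop(uint8_t* p, size_t n) {
    if (n > count) n = count;
    for (size_t i = 0; i < n; i++) p[i] = buf[(head + i) % N];
    head = (head + n) % N;
    count -= n;
    return n;
  }
};

// One pass of the loop: queue the sample first, then write a sector if the
// card is not busy. Returns false on FIFO overrun.
template <size_t N>
bool logOnce(ByteFifo<N>& fifo, FakeCard& card, const uint8_t* sample, size_t sampleBytes) {
  if (!fifo.push(sample, sampleBytes)) return false;
  if (fifo.count >= 512 && !card.isBusy()) {
    uint8_t sector[512];
    fifo.pop(sector, sizeof(sector));
    card.write(sector, sizeof(sector));
  }
  return true;
}
```

Sampling before the busy check is the point of the design: the sensor read is never delayed by the card, and the card only costs time when it is known to be ready.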

I wrote an exFAT based logger and ran it on Teensy 3.6. I attached the program and it is an example in the above library. See the logData() function.

I am reading four ADC values every 200 microseconds. That’s a total of 20,000 ADC values/sec.

Here is the output from the logger. Each FIFO entry is eight bytes and holds four ADC values.

Code:
FreeStack: 254992
1024 FIFO entries will be used.
 
ExFatLogger00.bin
preAllocated: 1024 MiB
 
FreeStack: 246792
Type any character to stop

Log time: 1213.68 Seconds
File size: 48547800 bytes
totalOverrun: 0
maxFifoCount: 4
maxLogMicros: 30
maxWriteMicros: 162
Log interval: 200 micros
maxDelta: 1 micros

The max time to read the four ADC values was 30 microseconds and the maximum time to write the SD was 162 microseconds, so it is just possible to do both every 200 microseconds. Only four FIFO entries were used. The jitter in start time for reading the ADCs is about one microsecond.
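
The budget can be checked with a few lines of host-side C++ using only the numbers printed by the logger. The constant and function names are illustrative. Adding the two maxima gives an upper bound for one pass, since the worst read and the worst write need not fall in the same pass.

```cpp
#include <cstdint>

constexpr uint32_t kLogIntervalMicros = 200;  // "Log interval"
constexpr uint32_t kMaxLogMicros = 30;        // "maxLogMicros"
constexpr uint32_t kMaxWriteMicros = 162;     // "maxWriteMicros"
constexpr uint32_t kBytesPerEntry = 8;        // one FIFO entry, four ADC values
constexpr uint32_t kValuesPerEntry = 4;

// Upper bound for one pass: worst ADC read plus worst SD write.
constexpr uint32_t worstCaseLoopMicros() { return kMaxLogMicros + kMaxWriteMicros; }
constexpr uint32_t slackMicros() { return kLogIntervalMicros - worstCaseLoopMicros(); }

constexpr uint32_t entriesPerSecond() { return 1000000 / kLogIntervalMicros; }
constexpr uint32_t valuesPerSecond() { return entriesPerSecond() * kValuesPerEntry; }
constexpr uint32_t bytesPerSecond() { return entriesPerSecond() * kBytesPerEntry; }

static_assert(worstCaseLoopMicros() == 192, "30 + 162");
static_assert(slackMicros() == 8, "only 8 microseconds to spare");
static_assert(valuesPerSecond() == 20000, "matches 20,000 ADC values/sec");
static_assert(bytesPerSecond() == 40000, "data rate written to the file");
```

At 40,000 bytes/sec, the reported file size of 48,547,800 bytes corresponds to about 1213.7 seconds, which agrees with the printed log time.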

You could log data reliably at much higher rates if an interrupt routine were used to put data into the FIFO.

Here are results for pre-allocated files and SDIO using the above test with waitBusy false.

Code:
SDIO with a quality Samsung card.

Starting write of 1024 MiB.
minMicros: 26
maxMicros: 8700
57.10 Seconds
18803.93 KB/sec


Starting write of 1024 MiB.
minMicros: 26
maxMicros: 5696
57.07 Seconds
18815.13 KB/sec


Starting write of 1024 MiB.
minMicros: 26
maxMicros: 8343
57.08 Seconds
18809.86 KB/sec


Starting write of 1024 MiB.
minMicros: 26
maxMicros: 8243
57.06 Seconds
18817.44 KB/sec

You should be able to reliably log data at over 10MB/sec with a large FIFO and SDIO. It would be possible to provide over 20 ms of FIFO on Teensy 3.6 at 10 MB/sec.
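
As a sanity check on the FIFO sizing, here is the arithmetic as a host-side C++ sketch. The function name is illustrative, and the 256 KiB figure is the total RAM of the Teensy 3.6, from which the stack and all other variables still have to be taken.

```cpp
#include <cstdint>

// FIFO bytes needed to ride out a card stall of latencyMicros at a given rate.
constexpr uint64_t fifoBytesFor(uint64_t bytesPerSecond, uint64_t latencyMicros) {
  return bytesPerSecond * latencyMicros / 1000000;
}

constexpr uint64_t kTeensy36RamBytes = 256 * 1024;  // total RAM, not all usable

static_assert(fifoBytesFor(10000000, 20000) == 200000, "20 ms at 10 MB/sec");
static_assert(fifoBytesFor(10000000, 20000) < kTeensy36RamBytes, "fits in RAM");
static_assert(fifoBytesFor(10000000, 8700) == 87000, "worst maxMicros seen above");
```

The worst stall in the four SDIO runs above (8700 microseconds) would need about 87 KB at 10 MB/sec, so a 200 KB FIFO leaves a margin of more than a factor of two.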

I plan to do more tests with SDIO and see if any optimization is possible. Too bad the SDHC controller has so many errata.
 

Attachments

  • CardBusyTest.ino (2.1 KB)
  • ExFatLogger.ino (15.5 KB)
  • ExFatLogger.h (279 bytes)

Great work, Bill. I'm trying to get as much as possible of a 4MB/sec (32Mbit/sec, 4bytes/sample) stream onto SD. Currently using your SdFatEx, I can reliably sustain (up to 4GB FAT32 file limit) something like 128KB/sec (about 3.2% of full throughput) and of course quite a bit more in short bursts into a pre-erased SanDisk 32GB Extreme UHS grade 3. Your SdFs looks like it will move me in the right direction.

I am an engineer, but not an embedded system expert as you put it in a previous message. It is sometimes bewildering trying to keep up with the various programming paradigms (Arduino/C/assembly, etc.). I try not to ask too many stupid questions, but here's one: If it turns out that SPI instead of SDIO is the best way to "talk" to SD exFAT, do I have to allocate a physical pin on Teensy 3.6? I am using SDIO and most of the digital I/O pins already, don't really want to have to give one up.
 
With Teensy 3.6 I use the built-in SD which is connected to SPI1, actually the second port, so you still would have SPI.

If you can capture the data in an interrupt routine while the SD is busy, SDIO would be best.

Another option is an RTOS. I may experiment with this option. I recently ported a new version of ChibiOS/RT to Teensy; it does a context switch in about 650 nanoseconds, so an interrupt could trigger a high-priority thread to read sensors.

An RTOS is not a magic solution. Here is an interesting quote.
What an RTOS is

An RTOS is an operating system whose internal processes are guaranteed to be compliant with (hard or soft) realtime requirements. The fundamental qualities of an RTOS are:

* Predictability. It is the quality of being predictable in the scheduling behavior.

* Deterministic. It is the quality of being able to consistently produce the same results under the same conditions.

RTOS are often confused with “fast” operating systems. While efficiency is a positive attribute of an RTOS, efficiency alone does not qualifies an OS as RTOS but it could separate a good RTOS from a not so good one.

What an RTOS is not

An RTOS is not a magic wand, your system will not be “realtime” just because you are using an RTOS, what matters is your system design. The RTOS itself is just a toolbox that offers you the required tools for creating a realtime system, you can use the tools correctly or in the wrong way.

Another option is DMA. I used DMA on STM32 with a "circular buffer"; STM32 really just uses two buffers and alternates between the two. I use two 512-byte buffers and copy them to a FIFO to be written in a lower priority thread. The STM32 ADC allows a sequence of up to 18 channels to be loaded into the ADC's sequencer. The STM32F7 line, Cortex M7, can reach 7.2 MSPS in Interleaved mode.
 
Is that a Kinetis / Teensy issue or a general issue with the native SD mode (does it work on some other hardware)?

It's a Kinetis problem. It's not simple, there are lots of errata. Working around them is like being in a maze.

See "Mask Set Errata for Mask 0N65N". Here are SDHC errata.
Table 1. Errata and Information Summary
Erratum ID  Erratum Title
e3981       SDHC: ADMA fails when data length in the last descriptor is less or equal to 4 bytes
e3982       SDHC: ADMA transfer error when the block size is not a multiple of four
e4624       SDHC: AutoCMD12 and R1b polling problem
e3977       SDHC: Does not support Infinite Block Transfer Mode
e4627       SDHC: Erroneous CMD CRC error and CMD Index error may occur on sending new CMD during data transfer
e3984       SDHC: eSDHC misses SDIO interrupt when CINT is disabled
e3983       SDHC: Problem when ADMA2 last descriptor is LINK or NOP
e3978       SDHC: Software can not clear DMA interrupt status bit after read operation

The key ones that stop me are "Does not support Infinite Block Transfer Mode", "AutoCMD12 and R1b polling problem", "Software can not clear DMA interrupt status bit after read operation". "Erroneous CMD CRC error and CMD Index error may occur on sending new CMD during data transfer" can also be a problem.

It's not an issue with SD cards. There are problems with lots of other Cortex M chips.

Modern high-end SD cards have 512 KiB RUs (Record Units) and 4 MiB AUs (Allocation Units), so the problems only became apparent recently.

I plan on trying the new STM32H7 M7 chips soon. The 400MHz version will soon be available on a Nucleo board for $23 and a 600 MHz may follow. Maybe a new series will have old problems fixed. New problems are also possible.

Edit: The new STM32H7 has an entirely new SD controller. It supports 1.8V modes with SDR104 mode having a bus clock of 208 MHz and a transfer rate of 104 MB/sec.
 
I am writing to a circular buffer from a fast (per 4-byte sample) IRQ "honed" to run in <1usec. From there I copy to a transfer buffer which I then hand off to SdFat. At the outset of this project I mistook the benchmarks (~20MB/sec) as sustainable rates (my "bad"). Fortunately nobody is standing on one foot and holding their breath for this project :) Yes, I can make the buffer(s) larger but Teensy3.6 doesn't have enough RAM to bridge hundreds of milliseconds when the SD takes a little time out for housekeeping.

I do recall you extolling the virtues of an RTOS such as ChibiOS in another thread - I may explore that avenue but I'm an old dog that doesn't learn new tricks so fast. I certainly don't want to spend a lot of time figuring-out how to use an RTOS unless I am sure it will solve the problem and I can implement (port?) the rest of my project without too much fuss.

I also note some interesting techniques that I haven't really understood yet, such as preallocation of large spaces (exFAT only?). That sounds like a perfect solution for me.

It looks like advances in SD hardware and your file system work may lead to improved throughput in my little project - I'll just be patient and watch for a while...
 
My best SD card can do about 5MB/s sustained on Teensy 3.6 (220kB buffers) with a contiguous, pre-allocated, pre-erased file. It sounds like Bill's Samsung card might do 18MB/s with the same code.

At least with the cards I tested, pre-erasing is important (or there are more house-keeping pauses).

If you perform your data acquisition in an interrupt and all your main / loop() code does is writing to the SD, an RTOS won't help you with performance.
 
Thanks, TNI. I am using a Sandisk Extreme 32GB UHS3, and for my throughput measurements I format/erase using Bill's SDFormatter - it's >100x the speed of the "official" one from Tuxera (on my PC, via USB SD card reader).

Your point about RTOS sounds right to me, but I'm no expert at this. Although in part my code deals with a hardware interface I developed (not proud of it, just the way I got this to work) I've stripped away hopefully not too much code so the method to my madness is apparent:
Code:
// for special hardware with daisy-chained AD7768s
// (single 32Mb/sec - 1Msample/sec - bit stream)
// Arduino 1.8.3/Teensyduino 1.37
// uses W.Greiman's SdFat (mid 2017)

#include "SdFat.h"

const unsigned int IBUFSIZ = 8192;  // 8Kbyte ring buffer
byte inbuf[IBUFSIZ];  // ring buffer
volatile unsigned int wloc; // ring buffer next write location
unsigned int rloc; // ring buffer last read location
unsigned int bufBytes = 0; // # bytes in buffer
const unsigned int TBUFSIZ = 512; // transfer buffer size (bytes)
byte tbuf[TBUFSIZ];   // transfer buffer (because I don't know better)

// My method of transferring data from the A/Ds into ring buffer
// and from there into transfer buffer before passing to SDFat
// is probably naive and inefficient.

int nChan = 7;  // #channels to capture (1 thru nChan)
volatile int chNum = 0; // channel number
unsigned long nSamp = 65536; // #samples per channel
volatile unsigned long sNum;  // scan number

SdFatSdioEX sdEx;
File file;
char filename[23];  //yyyymmddHHMMSSrrnn.dat (rr:coded SR, nn:#ch)

//-----------------------------------------------------------------------------
void setup()
{
	...
  NVIC_SET_PRIORITY(IRQ_PORTC, 0); // highest priority
  ...
}

//-----------------------------------------------------------------------------
void acquire()
{
  unsigned int nwrit;
    
  sprintf(filename,"%d%02d%02d%02d%02d%02d%02x%02d.dat",year(),month(),day(),hour(),minute(),second(),srKHZ[srNDX]-1,nChan);
  if (!file.open(filename, O_RDWR | O_CREAT))
  {
    sdEx.errorHalt("open failed");
  }

  wloc = 0;
  rloc = 0;
  sNum = 0;
  chNum = 0;
  attachInterrupt(SCintPin,getSamp,RISING);
  digitalWriteFast(pinTableC[6],true);  // arm DRDY trigger
  while(sNum<nSamp)
  {
// a scan will never add more than 128 bytes (32ch arbitrary limit):
    bufBytes = (wloc-rloc)%IBUFSIZ;
    if(bufBytes>TBUFSIZ)
    {
        for(unsigned int j=0;j<TBUFSIZ;j++) tbuf[j] = inbuf[(rloc+j)%IBUFSIZ];
        nwrit = file.write(tbuf, TBUFSIZ);
        rloc = (rloc+TBUFSIZ)%IBUFSIZ;
    }
  }
  detachInterrupt(SCintPin);
  digitalWriteFast(pinTableC[6],false);
  
// clean up and close file
  bufBytes = (wloc-rloc)%IBUFSIZ;
  if(bufBytes>0)
  {
    for(unsigned int j=0;j<bufBytes;j++) tbuf[j] = inbuf[(rloc+j)%IBUFSIZ];
    nwrit = file.write(tbuf, bufBytes);
    rloc = (rloc+bufBytes)%IBUFSIZ;
  }
  file.close();
}

//-----------------------------------------------------------------------------
void getSamp()
// get next sample (4 bytes) into circular buffer inbuf[]
{
    uint8_t tmp = GPIOC_PDIR & 0x70;  // keep whatever is in C4-C6
    unsigned int tloc = wloc;   // global variable accesses take too long
  
// nops are precision delays to allow for '595 enable time
    GPIOC_PDOR = tmp | 1;     // OE for MS byte shift register
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    inbuf[tloc] = GPIOD_PDIR & 0xFF;  // the AD7768(s) flags byte
    
// reverse order because MATLAB bit24 wants lsByte first:
    GPIOC_PDOR = tmp | 2;  
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    inbuf[tloc+3] = GPIOD_PDIR & 0xFF;
   
    GPIOC_PDOR = tmp | 4;
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    inbuf[tloc+2] = GPIOD_PDIR & 0xFF;

    GPIOC_PDOR = tmp | 8;    // OE for LS byte shift register 
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    inbuf[tloc+1] = GPIOD_PDIR & 0xFF;

    chNum = (chNum+1)%nChan;
    if(chNum==0)
    {
      __asm__ volatile ("nop");
      digitalWriteFast(pinTableC[6],false); // reset DRDY trigger
      digitalWriteFast(pinTableC[6],true);
      sNum++;         // update sample number
    }
    GPIOC_PDOR = tmp; // clear the OEs
    wloc = (tloc+4)%IBUFSIZ;
}
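
One possible simplification of the copy from the ring buffer into the transfer buffer, offered as a host-side C++ sketch rather than a drop-in fix: because the ring size is a power of two, the index arithmetic can use a mask, and the byte-by-byte loop with a modulo per byte can become at most two memcpy calls. The names ringAvailable and ringRead are made up for illustration. On the Teensy the write index is changed by the ISR, so it should be read once into a local before calling these.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kRingSize = 8192;  // same size as inbuf[] in the post
static_assert((kRingSize & (kRingSize - 1)) == 0, "ring size must be a power of two");

// Bytes waiting between the read and write indexes; still correct after the
// write index has wrapped past the end of the ring.
inline size_t ringAvailable(size_t wloc, size_t rloc) {
  return (wloc - rloc) & (kRingSize - 1);
}

// Copy n bytes starting at rloc into dst with at most two memcpy calls.
// Returns the new read index.
inline size_t ringRead(const uint8_t* ring, size_t rloc, uint8_t* dst, size_t n) {
  size_t first = kRingSize - rloc;  // bytes before the end of the ring
  if (first > n) first = n;
  std::memcpy(dst, ring + rloc, first);
  std::memcpy(dst + first, ring, n - first);  // wrapped part, may be zero bytes
  return (rloc + n) & (kRingSize - 1);
}
```

As in the posted code, a completely full ring looks the same as an empty one (the difference is 0), so an overrun is silent unless it is counted separately.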
 
My experience is also that ZERO buffer overruns require buffers that can handle SD access delays of at least 120 ms.
Most (SanDisk) SD cards (60G/128G) I have tested had, very regularly (about every 16 MByte), an SD delay of over 110 ms. Different disks behaved slightly differently.
To measure disk performance, writes to disk were raw and sequential: directly to the first partition, no file system. Pre-erasing had only a minor impact (a few ms for the largest latency). Write buffer size was 16 or 32 kB, with similar results.
[Image: sanDisk2.jpg]
The regularity seen may be because of a (nearly) fresh 64G disk. Only latencies > 15 ms are plotted.
I assume that used disks will have a different latency pattern (see also the pattern at the beginning of the picture).
 
Modern SD cards are extremely complex. Buying a UHS3 card won't help since you will never use more than SD V2.0 features.

Many cards expect a filesystem layout specified by the SD standard. Performance in the FAT area of a 32GB card is different than the data area.

RU (Record Units) are huge, 512KiB, on high end cards so if you do smaller writes, the card will likely need to move data which will exhaust the erased free AU (Allocation Unit) list and you will get a big delay. AUs are also large, 4MiB on high end cards.

So modern cards aren't designed for how we use them on micro-controllers. You must experiment to see which cards work best for how you are using them.

I can write huge files with a max latency of about 10 ms at 10 MB/sec with the FIFO version of SDIO. The problem is that Teensy 3.6 errata limit this to 0xFFFF blocks; then I must restart the write, which is why there is the 10 ms or more latency. I try to make the restart line up on an RU boundary.
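
One way such an alignment could be computed is sketched below in host-side C++. This is not taken from the SdFs source; sectorsUntilRestart is a hypothetical helper, and it assumes sector numbers are counted from an RU-aligned origin and that the RU is 512 KiB as stated above.

```cpp
#include <cstdint>

constexpr uint64_t kSectorBytes = 512;
constexpr uint64_t kMaxBlocksPerTransfer = 0xFFFF;           // limit quoted above
constexpr uint64_t kRuBytes = 512 * 1024;                    // RU size quoted above
constexpr uint64_t kSectorsPerRu = kRuBytes / kSectorBytes;  // 1024 sectors

// Number of sectors to write before restarting the transfer so that the next
// transfer begins on an RU boundary. Never exceeds the 0xFFFF block limit.
inline uint32_t sectorsUntilRestart(uint64_t startSector) {
  uint64_t furthest = startSector + kMaxBlocksPerTransfer;
  uint64_t alignedEnd = furthest - (furthest % kSectorsPerRu);  // round down to RU
  return static_cast<uint32_t>(alignedEnd - startSector);
}
```

Starting on a boundary, this gives 64,512 sectors (63 RUs, about 33 MB) per transfer, so at 10 MB/sec a restart would happen roughly every 3.3 seconds.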

See this for tests I ran on cards at a sustained 10 MB/sec.

I used the program from the above link to write a 9,083,144,192 byte test file. You need to add 2^33, 8,589,934,592, to the byte count since I cast file size to uint32_t to print it.

I modified the program to pre-allocate a 16 GiB file.

Code:
FIFO_DIM = 400
FreeStack: 49595
Type any character to stop

166 maxFifoCount
493209600 bytes
887.03 seconds
10.24 MB/sec

The program used a max of 166 sector buffers or 84,992 bytes of buffer. This means the max latency was about 8.5 ms.
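
The conversion from FIFO depth to latency, as a host-side C++ sketch with illustrative names. It assumes nothing was written to the card while the backlog built up, so it is an estimate of the longest stall rather than a measurement.

```cpp
#include <cstdint>

constexpr uint32_t kSectorBytes = 512;

// Bytes held in the FIFO when it reached its deepest point.
constexpr uint32_t bufferedBytes(uint32_t sectors) { return sectors * kSectorBytes; }

// Stall time in milliseconds needed for that much data to accumulate.
constexpr double stallMillis(uint32_t sectors, double bytesPerSecond) {
  return bufferedBytes(sectors) * 1000.0 / bytesPerSecond;
}

static_assert(bufferedBytes(166) == 84992, "166 sector buffers");
```

At a round 10 MB/sec this gives about 8.5 ms; at the measured 10.24 MB/sec it is about 8.3 ms.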

Here is the Windows 10 version of the file size (GB is really GiB):
Code:
Size:         8.45 GB (9,083,144,192 bytes)

Size on Disk: 8.45 GB (9,083,158,528 bytes)
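
The difference between the 493209600 bytes printed by the test program and the size Windows reports is exactly what a cast to uint32_t produces. A host-side C++ sketch with illustrative names:

```cpp
#include <cstdint>

// What a 64-bit file size looks like after being cast to uint32_t for printing.
constexpr uint32_t truncatedSize(uint64_t size) { return static_cast<uint32_t>(size); }

// Recover the real size when the number of 2^32 wraps is known.
constexpr uint64_t restoreSize(uint32_t printed, uint32_t wraps) {
  return (static_cast<uint64_t>(wraps) << 32) + printed;
}

static_assert(truncatedSize(9083144192ULL) == 493209600u, "value the program printed");
static_assert(restoreSize(493209600u, 2) == 9083144192ULL, "2 * 2^32 = 2^33 added back");
```

Printing the size as a 64-bit value would avoid the correction altogether.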

I used a Samsung 64GB Pro Select card.


The problem with my FIFO SDIO is that it uses lots of CPU so you can't use it with a program that uses a high rate ISR to capture data.

In summary, SDHC errata prevent effective use of high-end cards with DMA, so you are unlikely to get high performance from programs like the one heppjs posted.

Edit: SanDisk and Samsung cards with Pro in their name usually have MLC flash, which stores two bits in a cell. Other consumer cards have TLC flash, which stores three bits in a cell. MLC cards have much better latency properties than TLC cards.
 
I can hardly get the referenced .ino file working.

I had to increase the timer interval from 50 to 200 micros.

Code:
Type any character to begin

FIFO_DIM = 400
FreeStack: 49563
Type any character to stop

232 maxFifoCount
73154048 bytes
28.58 seconds
2.56 MB/sec

Type any character to run test again

Type any character to stop

Overrun ERROR!!
400 maxFifoCount
5668864 bytes
2.29 seconds
2.47 MB/sec

Type any character to run test again

Type any character to stop

Overrun ERROR!!
400 maxFifoCount
132350464 bytes
51.89 seconds
2.55 MB/sec

Type any character to run test again

As I said it depends on the uSD.
 
Are you using the latest version of SdFs? If so, you really proved "it depends on the uSD".

All of my Samsung Pro, Pro+ 64GB uSD cards work with the test you used.

Here is a real surprise. I did a test with DMA SDIO using a Samsung Pro Select 64GB card.

I simulated the case where data is captured in a high rate non DMA ISR. I assumed 10.24 MB/sec rate. I used a FIFO with six 32KiB entries.

The test program is attached.

Here is my result:

Code:
FIFO_DIM = 6
FreeStack: 59575
Type any character to stop

4 maxFifoCount
8689254400 bytes
848.56 seconds
10.24 MB/sec
2052831653 yieldCalls

Note that almost any test will use at least two FIFO entries, one for the ISR and one being written to the SD. You could allocate seven 32KiB entries on Teensy 3.6. My simulation doesn't really have one allocated in the ISR so with seven it should be safe.

There were two billion yield calls from the DMA busy loop in the driver, so there is CPU time for the ISR. With clever programming or an RTOS, you could use much of the CPU time burned in the DMA wait loop for other purposes.
 

Attachments

  • Teensy36FifoLogger.ino (4.3 KB)