Teensy 3.6 microSD pins?

Status
Not open for further replies.

Bill Greiman

Well-known member
Could you provide details for the on-board microSD?

I have just tested SdFat on Teensy 3.6 using an external socket on pins 10, 11, 12, and 13.

I used SdFat-beta which has a class, SdFatEX, that optimizes multi-block transfers.

SdFatEX keeps the SD selected with SD_CS low unless you force it to be released.

Here are results for 512 byte reads and writes.
File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3238.13,9271,155,157
3204.92,15792,155,159

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3293.60,2529,154,155
3297.94,1346,154,154

Here are results with 50 byte reads and writes. These go through the cache.
File size 5 MB
Buffer size 50 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2891.84,9417,1,17
2875.22,15833,1,17

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2936.00,3474,1,16
2941.18,1292,1,16

Here is the performance for the standard version of SdFat with 50 byte reads and writes. Writes for SdFatEX are about six times faster and reads are more than twice as fast.

File size 5 MB
Buffer size 50 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
459.73,25878,1,108
451.51,25492,1,110

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1327.32,3321,1,37
1327.67,1391,1,37

True DMA or SDIO would be great. An STM32F411 can do over 5 MB/sec for 512 byte transfers with 50 MHz DMA SPI.
 
Bill,

There are others who know a whole lot more about this than I do. Earlier I thought about starting to play with some of this, took a quick look through chapter 60 of the PDF, and decided it was probably best left to those who have done more SD work than I have. It also sounded like Paul wanted to look at it soon (that was about a month ago).

I know that the IO pins are logically pins PTE0-5 and I believe for SDCard usage you need ALT4 functionality.

I am pretty sure before you can touch any of the registers associated with the SDCARD you need to enable access.
SIM_SCGC3 |= SIM_SCGC3_SDHC;

I know that you need to choose a clock to use with the SD card, and you probably need to set up those options in the SIM_SOPT2 register. It looks like there are two sets of options. One is an option set that many subsystems use, such as LPUART, which I looked at to get Serial6; I know the system already chooses different options for this depending on CPU speed. But then it looks like you have a couple of other clock options as well.

That is about as far as I got...
 
Is there an SDHC driver? I would like to see the API.

Most SDIO drivers for Cortex-M chips don't work well with new SD cards. They usually have an API that reads/writes n blocks starting at a given block number.

If you want to write a large file fast, you need to use huge multi-block writes.

In SdFatEX I have a cache for a FAT block so I can do a multi-block write of up to 8192 blocks or 4 MB.

For good performance with modern SD cards the API should be a call to start a multi-block transfer, a call to send a block, and a call to terminate the transfer.
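A minimal sketch of the three-call shape described above (start a transfer, send blocks, terminate). `MockCard` and `streamBlocks` are hypothetical names and the card is simulated in memory; this is not the SdFat or SDHC driver API, only an illustration of the call pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kBlockSize = 512;  // SD transfer unit in bytes

// Hypothetical in-memory stand-in for a card driver; not a real driver API.
class MockCard {
 public:
  explicit MockCard(uint32_t blockCount)
      : data_(static_cast<size_t>(blockCount) * kBlockSize, 0) {}

  // Begin one multi-block write at firstBlock (a single start command).
  bool writeStart(uint32_t firstBlock) {
    if (active_ || firstBlock >= blockCount()) return false;
    active_ = true;
    next_ = firstBlock;
    ++startCommands_;
    return true;
  }

  // Send the next 512-byte block of the open transfer.
  bool writeData(const uint8_t* src) {
    assert(src != nullptr);
    if (!active_ || next_ >= blockCount()) return false;
    std::memcpy(&data_[static_cast<size_t>(next_) * kBlockSize], src, kBlockSize);
    ++next_;
    return true;
  }

  // Terminate the transfer (a single stop command).
  bool writeStop() {
    if (!active_) return false;
    active_ = false;
    ++stopCommands_;
    return true;
  }

  uint32_t blockCount() const {
    return static_cast<uint32_t>(data_.size() / kBlockSize);
  }
  uint8_t byteAt(uint32_t block, uint32_t offset) const {
    return data_[static_cast<size_t>(block) * kBlockSize + offset];
  }
  unsigned startCommands() const { return startCommands_; }
  unsigned stopCommands() const { return stopCommands_; }

 private:
  std::vector<uint8_t> data_;
  bool active_ = false;
  uint32_t next_ = 0;
  unsigned startCommands_ = 0;
  unsigned stopCommands_ = 0;
};

// Stream `count` consecutive blocks from buf using one start and one stop.
bool streamBlocks(MockCard& card, uint32_t firstBlock, const uint8_t* buf,
                  uint32_t count) {
  if (!card.writeStart(firstBlock)) return false;
  for (uint32_t i = 0; i < count; ++i) {
    if (!card.writeData(buf + static_cast<size_t>(i) * kBlockSize)) {
      card.writeStop();
      return false;
    }
  }
  return card.writeStop();
}
```

The point of the shape is that the number of start and stop commands stays at one each no matter how many blocks are sent in between, which a "write n blocks at block number b" call cannot offer once n is bounded by the caller's buffer.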

I see a factor of ten performance difference on STM32 between single block and streaming multi-block writes with 50 MHz SPI.

I expect even more when I have an SDIO driver that allows large streaming writes, maybe a factor of 40 with a write speed of 20 MB/sec.
 
Looks like a typical implementation. I don't see a way to stream blocks.

I suspect file system write using this driver will be slow unless the writes are very large.

I did this port because it was easy, it supported exFAT, it provided all file handling for general use, and I could do it in a few days. Unfortunately, I lost my K66 soon after porting.

For my own application, which requires fastest write-only, I have a different approach: only root directory, open only consecutive files, only with exFAT32, no FAT table only bitmap write, only multiple cluster writes. For the moment I have only a SPI based version, no SDIO.

As you may know, K66 SDHC features internal DMA, so there is no need to use regular DMA channels.
I tried to prepare some background ISR driven read/write extensions, but as I said, I have no K66 anymore, so development stopped.
No attempt was made to speed up (cache) short reads/writes.

Also, I knew that once you got your T3.6, you would be better qualified than I am to provide filing support.
 
Even with exFAT, the ability to do very large multiple block transfers is key to high speed.

I did two tests of your FatFS port. I wrote a 5 MB file with this loop.

Code:
   uint32_t m = micros();
   for (int n = 0; n < FILE_SIZE/BUFFSIZE; n++) {
     rc = f_write(&fil, buffer, BUFFSIZE, &wr);
     if (rc) die(rc);
   }
   m = micros() - m;

For 50 byte BUFFSIZE the 5 MB write takes 9.97 seconds or 502 KB/sec. Much slower than the 2,875 KB/sec I get with SPI and maximum multi-block writes.

For 4,096 byte BUFFSIZE the time for writing 5 MB is 2.85 seconds or 1,750 KB/sec. That's still slower than the 2,875 KB/sec I get for 50 byte writes with SPI.
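As a quick check of the arithmetic, the sketch below converts elapsed time to KB/sec. It assumes "5 MB" means 5,000,000 bytes and 1 KB means 1000 bytes, which is an assumption about the units; `kbPerSec` and `kFileBytes` are illustrative names. Under those assumptions the two timings work out to roughly 501.5 and 1,754 KB/sec, in line with the rounded figures quoted above.

```cpp
#include <cassert>
#include <cstdint>

// "5 MB" taken as decimal megabytes (assumption).
constexpr uint64_t kFileBytes = 5ULL * 1000 * 1000;

// Throughput in KB/sec, taking 1 KB = 1000 bytes (assumption).
double kbPerSec(uint64_t bytes, double seconds) {
  assert(seconds > 0.0);
  return static_cast<double>(bytes) / seconds / 1000.0;
}
```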

My SPI driver is not DMA, just a FIFO driver. With a DMA SPI driver on STM32 I get almost 5 MB/sec.

I really would like to see an SDIO driver that allowed arbitrarily long multiple block writes.

If you look at section 4.13 of the SD Physical Layer Simplified Specification V 4.10, you will see that modern SD cards have an AU size of 4 MB and an RU size of 512 KB.

The 512 byte transfer size is not an internal block size. Cards manage buffers and flash in much larger sizes and require huge multi-block transfers to achieve very high performance.

Edit: Here is the key statement from the SD spec.

Application Notes:
Performance may increase when larger data is written by one multiple write command. Therefore, the host may use larger RU sizes and transfer multiple RUs with one multiple-write command.

As I stated above, an RU is 512 KB on a high end card.
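To put the RU and AU sizes in terms of 512-byte blocks, here is a small sketch. It assumes binary units (512 KB = 512 × 1024 bytes, 4 MB = 4 × 1024 × 1024 bytes); `blocksIn`, `kRuBytes`, and `kAuBytes` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSectorBytes = 512;             // SD transfer block
constexpr uint64_t kRuBytes = 512ULL * 1024;       // RU, assuming binary KB
constexpr uint64_t kAuBytes = 4ULL * 1024 * 1024;  // AU, assuming binary MB

// Number of 512-byte blocks in a unit of the given size.
uint32_t blocksIn(uint64_t unitBytes) {
  assert(unitBytes % kSectorBytes == 0);
  return static_cast<uint32_t>(unitBytes / kSectorBytes);
}
```

Under these assumptions one RU is 1024 blocks and one AU is 8192 blocks, so a single-block write touches only a tiny fraction of the unit the card manages internally.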
 
You are the expert, so I will not argue with you about Chan's approach to implementing (ex)FAT(32).
I only adapted an existing SDIO driver.
Also, I assume you have read the SDHC section in the K66 reference manual, so you know what the K66 is capable of and what it is not.

For myself, I need DMA not for speed but to free the CPU for more important work, like FFTs, while logging continues in the background.
Objective is to fill up the largest available SD card as fast as possible.

I don't care about filing per se; it is only a convenient bridge to be able to read SD cards with a PC without special software.

I look forward to testing your K66 SDIO FAT implementation.
 
I really would like to see an SDIO driver that allowed arbitrarily long multiple block writes.

I'm pretty sure you'll beat me to it, but if not, I'm going to work on this in a couple months. At this moment I've got a list of extremely urgent things which must get done to avoid delaying the Kickstarter rewards.
 
Objective is to fill up the largest available SD card as fast as possible.

I am curious, what is your data source and what are the rates?

I mostly use ChibiOS for fast logging. It's really nice since the RTOS handles sleeping while the DMA SD transfers take place. It is easy to log from a single ADC at over a million samples per second.

ChibiOS is tickless so there is little extra overhead and a context switch on a 200 MHz cpu takes under 1/2 microsecond. You just use the OS provided queues and synchronization stuff.

I will be implementing exFAT, not so much for the FAT problem but the 4GB max file size. I am looking at master slave or interleaved ADCs and at 6-8 MB/sec, 4GB lasts about 10 minutes.

With FAT32 I just create a 4GB contiguous file and stream blocks to it with raw writes. I have a call in SdFat to create the file. It takes a fraction of a second since only 1,000 FAT blocks need to be updated. I truncate the file if not all the space is used.
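The two numbers above (about 1,000 FAT blocks, about 10 minutes per file) can be sanity-checked with a short sketch. It assumes 32 KB clusters, which is an assumption and not something stated in the post, and uses the 32-bit FAT32 file size limit; the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kMaxFat32File = 0xFFFFFFFFULL;  // FAT32 size field is 32 bits
constexpr uint32_t kFatEntryBytes = 4;             // one FAT32 entry per cluster
constexpr uint32_t kBlockBytes = 512;

// 512-byte FAT blocks needed to hold the cluster chain of a contiguous file.
uint32_t fatBlocksForFile(uint64_t fileBytes, uint32_t clusterBytes) {
  assert(clusterBytes > 0);
  uint64_t clusters = (fileBytes + clusterBytes - 1) / clusterBytes;
  uint64_t fatBytes = clusters * kFatEntryBytes;
  return static_cast<uint32_t>((fatBytes + kBlockBytes - 1) / kBlockBytes);
}

// Minutes needed to fill fileBytes at a steady rate in bytes per second.
double minutesToFill(uint64_t fileBytes, double bytesPerSec) {
  assert(bytesPerSec > 0.0);
  return static_cast<double>(fileBytes) / bytesPerSec / 60.0;
}
```

With 32 KB clusters the chain occupies 1024 FAT blocks, and at 6 to 8 MB/sec a maximum-size file fills in roughly 9 to 12 minutes, consistent with the estimates in the post.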

This idea for fast logging dates back to the early 1980s with the VxWorks RTOS.
 
If you just need a few huge contiguous files, you don't really need a file system. The GPT partition table format allows 128 partitions. So you can effectively have 128 pseudo-files. For 'file' identification you can use GUIDs (16 bytes) or partition labels (36 UTF-16 characters).

You could also create a FAT file system in one of the partitions.

Using the raw partitions works very well under Linux. You have command line tools to list the partitions, their GUIDs and labels and you get automatically created device files to read and write them. (Using Windows is quite messy, there doesn't seem to be a decent way access the partitions directly, you need to use raw access to the whole disk.)
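For reference, a sketch of the GPT partition entry layout that the pseudo-file idea relies on: two 16-byte GUIDs, first and last LBA, attributes, and a 36-unit UTF-16 name, 128 bytes in total. The struct and helper names are illustrative, and the sketch assumes a little-endian host so that the in-memory fields match the on-disk byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One GPT partition entry as laid out on disk (little-endian host assumed).
struct GptEntry {
  uint8_t typeGuid[16];    // partition type GUID
  uint8_t uniqueGuid[16];  // per-partition GUID, usable as a "file" id
  uint64_t firstLba;       // first block of the partition
  uint64_t lastLba;        // last block, inclusive
  uint64_t attributes;
  uint16_t name[36];       // partition label, UTF-16 code units
};
static_assert(sizeof(GptEntry) == 128, "GPT entry must be 128 bytes");

// Store an ASCII label as UTF-16 code units, zero-padded.
// Returns false if the label is longer than 36 characters.
bool setAsciiLabel(GptEntry& e, const char* label) {
  assert(label != nullptr);
  size_t len = 0;
  while (label[len] != '\0') ++len;
  if (len > 36) return false;
  for (size_t i = 0; i < 36; ++i) {
    e.name[i] = (i < len)
                    ? static_cast<uint16_t>(static_cast<unsigned char>(label[i]))
                    : 0;
  }
  return true;
}

// Size of the pseudo-file in 512-byte blocks (lastLba is inclusive).
uint64_t sizeInBlocks(const GptEntry& e) { return e.lastLba - e.firstLba + 1; }
```

A logger could then treat `firstLba` to `lastLba` as the contiguous block range of one pseudo-file and use the label or unique GUID to identify it.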
 
Bill,
The actual application is 4 hydrophones sampled at 350 kHz with 16-bit resolution.
At the same time, 2 x 4 channel decimation filters run for event detection.
(If someone is really interested: 3-D localization of harbor porpoises in the presence of low-frequency anthropogenic noise (ships, pile drivers, etc.).)

Concerning processing: the T3.2 is 99% CPU loaded, so I need a T3.6 (but I could use 2 T3.2s).
Concerning data archiving: T3.2 SPI is far too slow and there is too little RAM for buffering. Initial tests with SDIO are promising and the T3.6 has more RAM, but I need a T3.6 for more testing.

It takes a fraction of a second since only 1,000 FAT blocks need to be updated
Well, if you want archiving without dropping data, you need some buffer, which we do not have on the T3.2.
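A rough sketch of the numbers behind this: the raw data rate for 4 channels at 350 kHz and 16 bits, and how many bytes arrive while the card stalls. The stall of 25,878 usec is borrowed from the largest write latency in the SPI benchmark at the top of the thread purely as an illustrative value; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Raw acquisition rate in bytes per second.
uint64_t bytesPerSecond(uint32_t channels, uint32_t sampleRateHz,
                        uint32_t bytesPerSample) {
  assert(channels > 0 && sampleRateHz > 0 && bytesPerSample > 0);
  return static_cast<uint64_t>(channels) * sampleRateHz * bytesPerSample;
}

// Bytes that arrive while the card is busy for stallUsec microseconds,
// rounded up to a whole byte.
uint64_t bytesDuringStall(uint64_t rateBytesPerSec, uint32_t stallUsec) {
  return (rateBytesPerSec * stallUsec + 999999) / 1000000;
}
```

The rate comes to 2.8 MB/sec, and a single stall of that length would need about 72 KB of buffering, which is more than the 64 KB of RAM on a Teensy 3.2.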

I have used different systems that write directly to flash. I have also written directly to uSD (as long as one honors the MBR and the start of the partition, a PC allows access to it), but I wanted a logging program where I can access the individual files with, say, Matlab without additional data conversion.
exFAT helps a lot as it simplifies filing.

OT: In the end I may end up with resurrecting an old WORM file system that I wrote in the early 90's that considered the media as tape while writing, but allowing random read access afterwards.
 
First, Paul, I am going to try to do an SDHC driver that will allow arbitrarily long multi-block transfers. At first glance the K66 SDHC looks good.

I have really been burned with SDIO on some chips.

On fast logging.

I agree that exFAT is best but using multiple FAT32 files is almost as good.

To write fast you must write large multi-block chunks. I just use a freshly formatted card and create a number of large contiguous files.

These files form a contiguous range of blocks of any size.

I do a start multi-block transfer at the start of this range and just write blocks. Finally I do an end transfer. I have written 20GB as a single multi-block write. You need a relatively small buffer pool since the latency variation for each block transfer is very small.
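One way to picture the small buffer pool is a fixed ring of 512-byte blocks sitting between the sampler and the card writer. The sketch below is a single-context illustration only: `BlockFifo` is a hypothetical name, and a real logger that fills the pool from an ISR would need interrupt-safe access, which is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kBlockBytes = 512;

// Hypothetical fixed pool of N blocks used as a FIFO (not interrupt-safe).
template <size_t N>
class BlockFifo {
 public:
  // Copy one block in; returns false (an overrun) when the pool is full.
  bool push(const uint8_t* src) {
    assert(src != nullptr);
    if (count_ == N) return false;
    std::memcpy(pool_[head_].data(), src, kBlockBytes);
    head_ = (head_ + 1) % N;
    ++count_;
    return true;
  }

  // Oldest queued block, or nullptr when the pool is empty.
  const uint8_t* front() const {
    return count_ ? pool_[tail_].data() : nullptr;
  }

  // Release the oldest block after it has been sent to the card.
  void pop() {
    if (count_) {
      tail_ = (tail_ + 1) % N;
      --count_;
    }
  }

  size_t size() const { return count_; }

 private:
  std::array<std::array<uint8_t, kBlockBytes>, N> pool_{};
  size_t head_ = 0;
  size_t tail_ = 0;
  size_t count_ = 0;
};
```

The writer loop would call `front()`, send that block inside the open multi-block transfer, then `pop()`. Because per-block latency varies little in this mode, N only has to cover the worst gap between two block writes.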

Once again, this is how modern SD cards are designed to be written.

On the PC you just concatenate these files to a single large file, that's just a copy and you want to copy the data from the SD anyhow.
 
I spent the morning hacking WMXZ's SDHC driver to work with SdFat. Here are first results.

First I tried the SD.h ReadWrite example. Here are the two mods required to run the example.

Code:
//#include <SD.h>
//#include <SPI.h>
#include "SdFat.h"
SdFatSdio SD;

// skip to second mod

//  if (!SD.begin(chipSelect)) {
  if (!SD.begin()) {

The output.

Initializing SD card...initialization done.
Writing to test.txt...done.
test.txt:
testing 1, 2, 3.

Next I ran the SdFat bench example.

Here is the result with 512 byte writes/reads.

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
513.47,25991,863,996
513.79,24640,861,995

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1927.40,2182,252,265
1930.38,1387,251,264

Sadly SdFatEX on the SPI port wins with its optimized multi-block transfers. See above post.

Now for 32 KB writes/reads.
FreeStack: 222664
Type is FAT32
File size 5 MB
Buffer size 32768 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
7981.95,12152,3618,4086
7918.50,12169,3616,4124

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
8427.64,4163,3259,3892
8399.22,4162,3472,3898

Done

That's the fastest result for any board using the Arduino IDE. I love the FreeStack number even with the 32KB buffer!
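For reading the bench output, the speed column is roughly the buffer size divided by the average latency per call. The sketch below shows that relation, assuming 1 KB = 1000 bytes; `impliedKbPerSec` is an illustrative name and not part of the bench example.

```cpp
#include <cassert>
#include <cstdint>

// KB/sec implied by moving bufBytes per call when each call averages avgUsec.
// Bytes per microsecond equals MB/sec, so multiply by 1000 for KB/sec.
double impliedKbPerSec(uint32_t bufBytes, uint32_t avgUsec) {
  assert(avgUsec > 0);
  return static_cast<double>(bufBytes) / avgUsec * 1000.0;
}
```

For the 512 byte run (996 usec average) this gives about 514 KB/sec, and for the 32 KB run (4086 usec average) about 8,020 KB/sec, both within one percent of the reported speeds. The large buffer wins because the per-call overhead is spread over 64 times as many bytes.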

SDIO with the ChibiOS/RT driver on STM32F411 is faster but it runs at 50 MHz.

I think I saw K66 code somewhere for switching to SDIO high speed mode. It probably won't help writes a lot without extended multi-block transfers.

Edit:
I checked write performance for uSDFS and the write speed is identical to SdFat. For large writes, speed is determined by the driver.
 
Here's the next step in speed. I am starting to rewrite the SDHC driver and now put the SD in "High Speed Mode".

Here are the performance results.

Samsung PRO+ 32GB
File size 5 MB
Buffer size 32768 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
12705.96,9386,2089,2567
12420.79,9406,2108,2627

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
15139.02,4224,1739,2162
15093.14,3998,2115,2168

SanDisk Extreme Pro 32GB
File size 5 MB
Buffer size 32768 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
11972.92,5780,1645,2716
11830.73,6632,1660,2744

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
18244.46,2067,1696,1794
18311.53,2067,1763,1794
 