Thread: Teensy 3.6 microSD pins?

  1. #1
    Senior Member
    Join Date
    Nov 2012
    Posts
    271

    Teensy 3.6 microSD pins?

    Could you provide details for the on-board microSD?

    I have just tested SdFat on Teensy 3.6 using an external socket on pins 10, 11, 12, and 13.

    I used SdFat-beta which has a class, SdFatEX, that optimizes multi-block transfers.

    SdFatEX keeps the SD selected with SD_CS low unless you force it to be released.

    Here are results for 512 byte reads and writes.
    File size 5 MB
    Buffer size 512 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    3238.13,9271,155,157
    3204.92,15792,155,159

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    3293.60,2529,154,155
    3297.94,1346,154,154
    Here are results with 50 byte reads and writes. These go through the cache.
    File size 5 MB
    Buffer size 50 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    2891.84,9417,1,17
    2875.22,15833,1,17

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    2936.00,3474,1,16
    2941.18,1292,1,16
    Here is the performance for the standard version of SdFat with 50 byte reads and writes. Writes for SdFatEX are about six times faster and reads are more than twice as fast.


    File size 5 MB
    Buffer size 50 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    459.73,25878,1,108
    451.51,25492,1,110

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    1327.32,3321,1,37
    1327.67,1391,1,37
    True DMA or SDIO would be great. An STM32F411 can do over 5 MB/sec for 512 byte transfers with 50 MHz DMA SPI.
    Last edited by Bill Greiman; 09-07-2016 at 01:42 AM.

  2. #2
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    5,416
    Bill,

    There are others who know a whole lot more about this than I do. Earlier I thought about starting to play with some of this and took a quick look through chapter 60 of the PDF, but decided it was probably best left to those who have done more with SD than I have. It also sounded like Paul wanted to look at it soon (that was about a month ago).

    I know that the IO pins are logically pins PTE0-5 and I believe for SDCard usage you need ALT4 functionality.

    I am pretty sure that before you can touch any of the registers associated with the SD card you need to enable access:
    SIM_SCGC3 |= SIM_SCGC3_SDHC;

    I know that you need to choose a clock to use with the SD card, which probably means setting options in the SIM_SOPT2 register. It looks like there are two sets of options. One is an option set shared by many subsystems, such as the LPUART, which I looked at to get Serial6 working; the system already chooses different settings for it depending on CPU speed. Beyond that, it looks like there are a couple of other clock options as well.

    That is about as far as I got...

  3. #3
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    Is there an SDHC driver? I would like to see the API.

    Most SDIO drivers for Cortex-M chips don't work well with new SD cards. They usually have an API that reads/writes n blocks starting at a given block number.

    If you want to write a large file fast, you need to use huge multi-block writes.

    In SdFatEX I have a cache for a FAT block so I can do a multi-block write of up to 8192 blocks or 4 MB.

    For good performance with modern SD cards the API should be a call to start a multi-block transfer, a call to send a block, and a call to terminate the transfer.
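    The three-call shape described above can be sketched as follows. This is only an illustration against an in-memory stand-in: the class `MockStreamCard`, the helper `streamBlocks`, and the method names are assumptions made for this example, not the API of any existing SDHC driver.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-in for an SD card. It only models the
// start / send-block / stop call sequence, not real card timing.
class MockStreamCard {
 public:
  static constexpr uint32_t kBlockSize = 512;

  explicit MockStreamCard(uint32_t blockCount)
      : data_(static_cast<size_t>(blockCount) * kBlockSize, 0) {}

  // Open a multi-block write at firstBlock (WRITE_MULTIPLE_BLOCK, CMD25,
  // on a real card). Fails if a transfer is already open.
  bool writeStart(uint32_t firstBlock) {
    if (streaming_ || firstBlock >= blockCount()) return false;
    next_ = firstBlock;
    streaming_ = true;
    return true;
  }

  // Send one 512-byte block. No block number is passed: the transfer
  // simply continues at the next consecutive block.
  bool writeData(const uint8_t* src) {
    if (!streaming_ || next_ >= blockCount()) return false;
    std::memcpy(&data_[static_cast<size_t>(next_) * kBlockSize], src, kBlockSize);
    ++next_;
    return true;
  }

  // Close the transfer (a stop token in SPI mode, CMD12 in SD bus mode).
  bool writeStop() {
    if (!streaming_) return false;
    streaming_ = false;
    ++transfers_;
    return true;
  }

  uint32_t blockCount() const { return static_cast<uint32_t>(data_.size() / kBlockSize); }
  uint32_t transfers() const { return transfers_; }
  uint8_t byteAt(size_t offset) const { return data_[offset]; }

 private:
  std::vector<uint8_t> data_;
  uint32_t next_ = 0;
  uint32_t transfers_ = 0;
  bool streaming_ = false;
};

// Stream `count` consecutive blocks from `src` as ONE multi-block transfer.
inline bool streamBlocks(MockStreamCard& card, uint32_t firstBlock,
                         const uint8_t* src, uint32_t count) {
  if (!card.writeStart(firstBlock)) return false;
  for (uint32_t i = 0; i < count; ++i) {
    if (!card.writeData(src + static_cast<size_t>(i) * MockStreamCard::kBlockSize)) {
      card.writeStop();  // close the transfer even when a block is rejected
      return false;
    }
  }
  return card.writeStop();
}
```

    The point of the split is that the caller, not the driver, decides how long a transfer stays open, so a logger can keep feeding blocks for as long as the target range is contiguous.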

    I see a factor of ten performance difference on STM32 between single block and streaming multi-block writes with 50 MHz SPI.

    I expect even more when I have an SDIO driver that allows large streaming writes, maybe a factor of 40 with a write speed of 20 MB/sec.

  4. #4
    Senior Member
    Join Date
    Oct 2013
    Location
    Rogersville MO
    Posts
    253
    You might check out https://github.com/WMXZ-EU/uSDFS

  5. #5
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    Quote Originally Posted by cartere View Post
    Looks like a typical implementation. I don't see a way to stream blocks.

    I suspect file system write using this driver will be slow unless the writes are very large.

  6. #6
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,330
    Quote Originally Posted by Bill Greiman View Post
    Looks like a typical implementation. I don't see a way to stream blocks.

    I suspect file system write using this driver will be slow unless the writes are very large.
    I did this port, as it was easy, supported exFAT, it provided all file handling for general use and I could do it in a few days. Unfortunately, I lost my K66 soon after porting.

    For my own application, which requires the fastest possible write-only logging, I have a different approach: only the root directory, open only consecutive files, only exFAT, no FAT table (only bitmap writes), and only multiple-cluster writes. For the moment I have only an SPI-based version, no SDIO.

    As you may know, the K66 SDHC features internal DMA, so there is no need to use regular DMA channels.
    I tried to prepare some background ISR-driven read/write extensions, but as I said, I no longer have a K66, so development stopped.
    No attempt was made to speed up (cache) short reads/writes.

    Also I knew that when you get your T3.6, you are more qualified than me to provide filing support.

  7. #7
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    Quote Originally Posted by WMXZ View Post
    For my own application, which requires the fastest possible write-only logging, I have a different approach: only the root directory, open only consecutive files, only exFAT, no FAT table (only bitmap writes), and only multiple-cluster writes. For the moment I have only an SPI-based version, no SDIO.
    Even with exFAT, the ability to do very large multiple block transfers is key to high speed.

    I did two tests of your FatFS port. I wrote a 5 MB file with this loop.

    Code:
           
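       // Time how long it takes to write FILE_SIZE bytes in BUFFSIZE chunks through FatFS.
       uint32_t m = micros();
       for (int n = 0; n < FILE_SIZE/BUFFSIZE; n++) {
         rc = f_write(&fil, buffer, BUFFSIZE, &wr);  // wr receives the number of bytes written
         if (rc) die(rc);                            // stop on any FatFS error code
       }
       m = micros() - m;                             // elapsed time in microseconds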
    For 50 byte BUFFSIZE the 5 MB write takes 9.97 seconds or 502 KB/sec. Much slower than the 2,875 KB/sec I get with SPI and maximum multi-block writes.

    For 4,096 byte BUFFSIZE the time for writing 5 MB is 2.85 seconds or 1,750 KB/sec. That's still slower than the 2,875 KB/sec I get for 50 byte writes with SPI.
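    As a quick check of the two figures above, here is the arithmetic as a small helper. It assumes that "5 MB" means 5,000,000 bytes and that KB means 1000 bytes; that assumption is mine, chosen because it reproduces the quoted numbers. The function name `kbPerSec` is made up for this example.

```cpp
#include <cstdint>

// Throughput in KB/sec, with KB taken as 1000 bytes (assumption).
constexpr double kbPerSec(uint64_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds / 1000.0;
}

// 5 MB file taken as 5,000,000 bytes (assumption).
constexpr uint64_t kFileBytes = 5000000;

// 50 byte writes took 9.97 s, 4,096 byte writes took 2.85 s.
constexpr double kSmallWrites = kbPerSec(kFileBytes, 9.97);  // about 501.5 KB/sec
constexpr double kLargeWrites = kbPerSec(kFileBytes, 2.85);  // about 1754.4 KB/sec
```

    Both results round to the 502 KB/sec and roughly 1,750 KB/sec quoted in the post.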

    My SPI driver is not DMA, just a FIFO driver. With a DMA SPI driver on STM32 I get almost 5 MB/sec.

    I really would like to see an SDIO driver that allowed arbitrarily long multiple block writes.

    If you look at section 4.13 of the SD Physical Layer Simplified Specification V 4.10, you will see that modern SD cards have an AU size of 4 MB and an RU size of 512 KB.

    The 512 byte transfer size is not an internal block size. Cards manage buffers and flash in much larger sizes and require huge multi-block transfers to achieve very high performance.

    Edit: Here is the key statement from the SD spec.

    Application Notes:
    Performance may increase when larger data is written by one multiple write command. Therefore, the host may use larger RU sizes and transfer multiple RUs with one multiple-write command.
    As I stated above, an RU is 512 KB on a high end card.
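    To put those sizes in terms of 512-byte transfers, here is a small worked calculation. It treats the quoted 512 KB and 4 MB as binary units (KiB and MiB), which is an assumption, and the constant and function names are invented for the example.

```cpp
#include <cstdint>

constexpr uint32_t kBlockBytes = 512;              // size of one SD data block
constexpr uint32_t kRuBytes    = 512 * 1024;       // RU size from the post, taken as 512 KiB
constexpr uint32_t kAuBytes    = 4 * 1024 * 1024;  // AU size from the post, taken as 4 MiB

constexpr uint32_t blocksPerRu() { return kRuBytes / kBlockBytes; }
constexpr uint32_t blocksPerAu() { return kAuBytes / kBlockBytes; }
constexpr uint32_t rusPerAu()    { return kAuBytes / kRuBytes; }

static_assert(blocksPerRu() == 1024, "one RU spans 1024 blocks");
static_assert(blocksPerAu() == 8192, "one AU spans 8192 blocks");
static_assert(rusPerAu() == 8, "one AU holds 8 RUs");
```

    In other words, a single-block write touches about a thousandth of one RU, which is why the spec note recommends sending whole RUs in one multiple-write command.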
    Last edited by Bill Greiman; 08-24-2016 at 12:18 PM.

  8. #8
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,330
    Quote Originally Posted by Bill Greiman View Post
    Even with exFAT, the ability to do very large multiple block transfers is key to high speed.

    I did two tests of your FatFS port. I wrote a 5 MB file with this loop.

    Code:
           
       uint32_t m = micros();
       for (int n = 0; n < FILE_SIZE/BUFFSIZE; n++) {
         rc = f_write(&fil, buffer, BUFFSIZE, &wr);
         if (rc) die(rc);
       }
       m = micros() - m;
    For 50 byte BUFFSIZE the 5 MB write takes 9.97 seconds or 502 KB/sec. Much slower than the 2,875 KB/sec I get with SPI and maximum multi-block writes.

    For 4,096 byte BUFFSIZE the time for writing 5 MB is 2.85 seconds or 1,750 KB/sec. That's still slower than the 2,875 KB/sec I get for 50 byte writes with SPI.

    My SPI driver is not DMA, just a FIFO driver. With a DMA SPI driver on STM32 I get almost 5 MB/sec.

    I really would like to see an SDIO driver that allowed arbitrarily long multiple block writes.

    If you look at section 4.13 of the SD Physical Layer Simplified Specification V 4.10, you will see that modern SD cards have an AU size of 4 MB and an RU size of 512 KB.

    The 512 byte transfer size is not an internal block size. Cards manage buffers and flash in much larger sizes and require huge multi-block transfers to achieve very high performance.

    Edit: Here is the key statement from the SD spec.



    As I stated above, an RU is 512 KB on a high end card.
    You are the expert, so I will not argue with you about Chan's approach to implementing (ex)FAT(32).
    I only adapted an existing SDIO driver.
    Also, I assume you have read the SDHC section in the K66 reference manual, so you know what the K66 is capable of and what it is not.

    For my part, I need DMA not for speed but to free the CPU for more important work, like FFTs, while logging continues in the background.
    Objective is to fill up the largest available SD card as fast as possible.

    I don't care about filing per se; it is only a convenient bridge that lets me read SD cards on a PC without special software.

    I look forward to testing your K66 SDIO FAT implementation.

  9. #9
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,555
    Quote Originally Posted by Bill Greiman View Post
    I really would like to see an SDIO driver that allowed arbitrarily long multiple block writes.
    I'm pretty sure you'll beat me to it, but if not, I'm going to work on this in a couple months. At this moment I've got a list of extremely urgent things which must get done to avoid delaying the Kickstarter rewards.

  10. #10
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    Objective is to fill up the largest available SD card as fast as possible.
    I am curious, what is your data source and what are the rates?

    I mostly use ChibiOS for fast logging. It's really nice since the RTOS handles sleeping while the DMA SD transfers take place. It is easy to log from a single ADC at over a million samples per second.

    ChibiOS is tickless so there is little extra overhead and a context switch on a 200 MHz cpu takes under 1/2 microsecond. You just use the OS provided queues and synchronization stuff.

    I will be implementing exFAT, not so much for the FAT problem but the 4GB max file size. I am looking at master slave or interleaved ADCs and at 6-8 MB/sec, 4GB lasts about 10 minutes.

    With FAT32 I just create a 4GB contiguous file and stream blocks to it with raw writes. I have a call in SdFat to create the file. It takes a fraction of a second since only 1,000 FAT blocks need to be updated. I truncate the file if not all the space is used.
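    The "about 10 minutes" and "only 1,000 FAT blocks" figures can be reproduced with a little arithmetic. The sketch below assumes 32 KiB clusters, which the post does not state, and treats 4GB as 4 GiB; all names are invented for the example.

```cpp
#include <cstdint>

constexpr uint64_t kFileBytes     = 4ULL * 1024 * 1024 * 1024;  // 4 GiB; FAT32 files top out one byte below this
constexpr uint32_t kSectorBytes   = 512;
constexpr uint32_t kClusterBytes  = 32 * 1024;  // ASSUMPTION: 32 KiB clusters
constexpr uint32_t kFatEntryBytes = 4;          // FAT32 uses 4-byte FAT entries

// Clusters needed to hold the whole file.
constexpr uint64_t clustersInFile() { return kFileBytes / kClusterBytes; }

// FAT entries that fit in one 512-byte FAT sector.
constexpr uint32_t entriesPerFatSector() { return kSectorBytes / kFatEntryBytes; }

// FAT sectors that must be written to allocate the contiguous file.
constexpr uint64_t fatSectorsToUpdate() { return clustersInFile() / entriesPerFatSector(); }

// Data sectors described by a single FAT sector.
constexpr uint32_t dataSectorsPerFatSector() {
  return entriesPerFatSector() * (kClusterBytes / kSectorBytes);
}

// Minutes until the file is full at a given input rate in bytes per second.
constexpr double minutesToFill(double bytesPerSec) {
  return static_cast<double>(kFileBytes) / bytesPerSec / 60.0;
}
```

    Under these assumptions the file needs 1024 FAT sectors, and it fills in roughly 12 minutes at 6 MB/sec or 9 minutes at 8 MB/sec. One FAT sector then covers 8192 data sectors (4 MiB), which would match the 8192-block limit mentioned earlier in the thread, though that link is my inference.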

    This idea for fast logging dates back to the early 1980s with the VxWorks RTOS.
    Last edited by Bill Greiman; 08-24-2016 at 02:16 PM.

  11. #11
    Senior Member
    Join Date
    Jan 2013
    Posts
    843
    If you just need a few huge contiguous files, you don't really need a file system. The GPT partition table format allows 128 partitions. So you can effectively have 128 pseudo-files. For 'file' identification you can use GUIDs (16 bytes) or partition labels (36 UTF-16 characters).

    You could also create a FAT file system in one of the partitions.

    Using the raw partitions works very well under Linux. You have command line tools to list the partitions, their GUIDs and labels, and you get automatically created device files to read and write them. (Using Windows is quite messy; there doesn't seem to be a decent way to access the partitions directly, so you need to use raw access to the whole disk.)
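    For reference, the partition entry that carries those GUIDs and the label looks roughly like this. The struct follows the common 128-byte GPT entry layout; the type name, field names and the `setAsciiLabel` helper are my own, and the helper assumes a little-endian host so that the 16-bit units land in memory as UTF-16LE.

```cpp
#include <cstddef>
#include <cstdint>

// One GPT partition entry in its usual 128-byte form.
struct GptPartitionEntry {
  uint8_t  typeGuid[16];    // partition type GUID
  uint8_t  uniqueGuid[16];  // per-partition GUID, usable as a "file" ID
  uint64_t firstLba;        // first block of the partition
  uint64_t lastLba;         // last block of the partition (inclusive)
  uint64_t attributes;
  uint16_t name[36];        // label, 36 UTF-16 code units
};

static_assert(sizeof(GptPartitionEntry) == 128, "entry must be 128 bytes");
static_assert(offsetof(GptPartitionEntry, name) == 56, "label starts at byte 56");

// A table of 128 such entries occupies 16 KiB, i.e. 32 blocks of 512 bytes.
constexpr size_t kEntryCount      = 128;
constexpr size_t kEntryArrayBytes = kEntryCount * sizeof(GptPartitionEntry);

// Store an ASCII label, zero-padding the rest of the field (hypothetical helper).
inline void setAsciiLabel(GptPartitionEntry& e, const char* label) {
  size_t i = 0;
  for (; i < 36 && label[i] != '\0'; ++i) {
    e.name[i] = static_cast<uint8_t>(label[i]);
  }
  for (; i < 36; ++i) {
    e.name[i] = 0;
  }
}
```

    A logger using this scheme would only need to read `firstLba` and `lastLba` for the chosen entry and then stream blocks into that range.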

  12. #12
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,330
    Quote Originally Posted by Bill Greiman View Post
    I am curious, what is your data source and what are the rates?

    I mostly use ChibiOS for fast logging. It's really nice since the RTOS handles sleeping while the DMA SD transfers take place. It is easy to log from a single ADC at over a million samples per second.

    ChibiOS is tickless so there is little extra overhead and a context switch on a 200 MHz cpu takes under 1/2 microsecond. You just use the OS provided queues and synchronization stuff.

    I will be implementing exFAT, not so much for the FAT problem but the 4GB max file size. I am looking at master slave or interleaved ADCs and at 6-8 MB/sec, 4GB lasts about 10 minutes.

    With FAT32 I just create a 4GB contiguous file and stream blocks to it with raw writes. I have a call in SdFat to create the file. It takes a fraction of a second since only 1,000 FAT blocks need to be updated. I truncate the file if not all the space is used.

    This idea for fast logging dates back to the early 1980s with the VxWorks RTOS.
    Bill,
    The actual application is 4 hydrophones sampled at 350 kHz with 16-bit resolution,
    with a 2 x 4 channel decimation filter running at the same time for event detection.
    (If anyone is really interested: 3-D localization of harbor porpoises in the presence of low-frequency anthropogenic noise such as ships and pile drivers.)
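    For scale, the raw data rate implied by those numbers works out as below. The sketch counts only the raw samples (4 channels, 350 kHz, 2 bytes each) and ignores any decimated or event data, which is an assumption; the names are invented for the example.

```cpp
#include <cstdint>

constexpr uint32_t kChannels       = 4;       // hydrophones
constexpr uint32_t kSampleRateHz   = 350000;  // per channel
constexpr uint32_t kBytesPerSample = 2;       // 16-bit samples

// Sustained rate the SD card has to absorb, in bytes per second.
constexpr uint32_t bytesPerSecond() {
  return kChannels * kSampleRateHz * kBytesPerSample;
}

// Milliseconds of input that a RAM buffer of the given size can hold
// while the card is busy.
constexpr double bufferMillis(uint32_t bufferBytes) {
  return 1000.0 * bufferBytes / bytesPerSecond();
}

static_assert(bytesPerSecond() == 2800000, "2.8 MB/sec of raw samples");
```

    So the card must sustain 2.8 MB/sec, and a 32 KiB buffer, for example, covers a little under 12 ms of card latency.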

    Concerning processing: the T3.2 is 99% CPU loaded, so I need a T3.6 (though I could use two T3.2s).
    Concerning data archiving: T3.2 SPI is far too slow and there is too little RAM for buffering. Initial tests with SDIO are promising and the T3.6 has more RAM, but I need a T3.6 for more testing.

    It takes a fraction of a second since only 1,000 FAT blocks need to be updated
    Well, if you want archiving without dropping data, you need some buffer, which we do not have on the T3.2.

    Quote Originally Posted by tni View Post
    If you just need a few huge contiguous files, you don't really need a file system. The GPT partition table format allows 128 partitions. So you can effectively have 128 pseudo-files. For 'file' identification you can use GUIDs (16 bytes) or partition labels (36 UTF-16 characters).

    You could also create a FAT file system in one of the partitions.

    Using the raw partitions works very well under Linux. You have command line tools to list the partitions, their GUIDs and labels, and you get automatically created device files to read and write them. (Using Windows is quite messy; there doesn't seem to be a decent way to access the partitions directly, so you need to use raw access to the whole disk.)
    I have used different systems that write directly to flash, and I have also written directly to uSD (as long as one honors the MBR and the start of the partition, a PC allows access to it). But I wanted a logging program whose individual files I can access with, say, Matlab, without additional data conversion.
    exFAT helps a lot as it simplifies filing.

    OT: In the end I may end up with resurrecting an old WORM file system that I wrote in the early 90's that considered the media as tape while writing, but allowing random read access afterwards.

  13. #13
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    First, Paul, I am going to try to do an SDHC driver that will allow arbitrarily long multi-block transfers. At first glance the K66 SDHC looks good.

    I have really been burned with SDIO on some chips.

    On fast logging.

    I agree that exFAT is best but using multiple FAT32 files is almost as good.

    To write fast you must write large multi-block chunks. I just use a freshly formatted card and create a number of large contiguous files.

    These files form a contiguous range of blocks of any size.

    I do a start multi-block transfer at the start of this range and just write blocks. Finally I do an end transfer. I have written 20GB as a single multi-block write. You need a relatively small buffer pool since the latency variation for each block transfer is very small.
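    A small buffer pool of the kind mentioned above could look like the sketch below: the sampling side pushes filled 512-byte blocks, and the SD side drains them into the open multi-block transfer. The class `BlockPool` and its methods are hypothetical. The version shown is single-threaded; code that shares it with an interrupt handler would need to protect the counters, which is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fixed-size FIFO of 512-byte blocks (hypothetical example, not from SdFat).
template <size_t N>
class BlockPool {
 public:
  static constexpr size_t kBlockSize = 512;

  // Producer side: copy one filled block in.
  // Returns false when the pool is full, meaning data would be dropped.
  bool push(const uint8_t* src) {
    if (count_ == N) return false;
    std::memcpy(blocks_[head_], src, kBlockSize);
    head_ = (head_ + 1) % N;
    ++count_;
    return true;
  }

  // Consumer side: oldest queued block, or nullptr when the pool is empty.
  const uint8_t* front() const { return count_ ? blocks_[tail_] : nullptr; }

  // Release the oldest block once it has been sent to the card.
  void pop() {
    if (count_) {
      tail_ = (tail_ + 1) % N;
      --count_;
    }
  }

  size_t size() const { return count_; }

 private:
  uint8_t blocks_[N][kBlockSize];
  size_t head_ = 0;   // next slot to fill
  size_t tail_ = 0;   // oldest filled slot
  size_t count_ = 0;  // number of filled slots
};
```

    Because per-block latency varies little inside one long transfer, N only has to cover the occasional worst-case stall rather than the average write time.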

    Once again, this is how modern SD cards are designed to be written.

    On the PC you just concatenate these files into a single large file. That's just a copy, and you want to copy the data off the SD anyhow.

  14. #14
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    I spent the morning hacking WMXZ's SDHC driver to work with SdFat. Here are first results.

    First I tried the SD.h ReadWrite example. Here are the two mods required to run the example.

    Code:
    //#include <SD.h>
    //#include <SPI.h>
    #include "SdFat.h"
    SdFatSdio SD;
    
    // skip to second mod
    
    //  if (!SD.begin(chipSelect)) {
      if (!SD.begin()) {
    The output.

    Initializing SD card...initialization done.
    Writing to test.txt...done.
    test.txt:
    testing 1, 2, 3.
    Next I ran the SdFat bench example.

    Here is the result with 512 byte writes/reads.

    File size 5 MB
    Buffer size 512 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    513.47,25991,863,996
    513.79,24640,861,995

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    1927.40,2182,252,265
    1930.38,1387,251,264
    Sadly SdFatEX on the SPI port wins with its optimized multi-block transfers. See above post.

    Now for 32 KB writes/reads.
    FreeStack: 222664
    Type is FAT32
    File size 5 MB
    Buffer size 32768 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    7981.95,12152,3618,4086
    7918.50,12169,3616,4124

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    8427.64,4163,3259,3892
    8399.22,4162,3472,3898

    Done
    That's the fastest result for any board using the Arduino IDE. I love the FreeStack number even with the 32KB buffer!

    SDIO with the ChibiOS/RT driver on STM32F411 is faster but it runs at 50 MHz.

    I think I saw K66 code somewhere for switching to SDIO high speed mode. It probably won't help writes much without extended multi-block transfers.

    Edit:
    I checked write performance for uSDFS and the write speed is identical to SdFat. For large writes, speed is determined by the driver.
    Last edited by Bill Greiman; 08-25-2016 at 07:14 PM.

  15. #15
    Senior Member
    Join Date
    Nov 2012
    Posts
    271
    Here's the next step in speed. I am starting to rewrite the SDHC driver and now put the SD in "High Speed Mode".

    Here are the performance results.

    Samsung PRO+ 32GB
    File size 5 MB
    Buffer size 32768 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    12705.96,9386,2089,2567
    12420.79,9406,2108,2627

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    15139.02,4224,1739,2162
    15093.14,3998,2115,2168

    SanDisk Extreme Pro 32GB
    File size 5 MB
    Buffer size 32768 bytes
    Starting write test, please wait.

    write speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    11972.92,5780,1645,2716
    11830.73,6632,1660,2744

    Starting read test, please wait.

    read speed and latency
    speed,max,min,avg
    KB/Sec,usec,usec,usec
    18244.46,2067,1696,1794
    18311.53,2067,1763,1794
