SD card questions

I got a T4.1 for Christmas (yay! should have posted about it though) and I've been messing with the SD card slot. I want some guidance on how to write code with the SdFat library.

  1. Which ways can I properly initialize the SD card for T4.1? Would overclocking (the microcontroller) make any difference? Right now I do it kind of like sd.begin(SdioConfig(FIFO_SDIO))
  2. How would I get the maximum amount of speed (both reading and writing) out of my SD card (as it is)?
  3. How could I perform raw reads/writes directly to the SD card (no files)? (mainly to gain extra SD card space)
  4. How could I execute a program (binary format of course) stored on the SD card? (I know that unlike AVR, ARM probably allows for this)
 
Quick answers:

Recommend starting with File > Examples > SD > SdFat_Usage. This gives compatibility with programs written for SD.h but you can access all the SdFat functions if you really want.

Overclocking SDIO isn't really an option unless you dive into the SdFat driver code. Definitely not a good way to get started. The SDIO hardware is quite complicated.

Performance varies wildly depending on read and write size. If you always read and write in 4096 byte blocks, you'll get much better performance with most cards than using very small access. Extremely large access, as modern cameras do, gives the card's full performance. But unless you have the PSRAM chip, buffers that large probably aren't practical.
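
One way to act on the 4096 byte advice when your data arrives in small pieces is to collect it in RAM and only hand full blocks to the card. The sketch below is just an illustration of that idea, not code from SdFat: `BlockWriter`, `Sink` and `kBlockSize` are names made up for this example, and the sink is whatever function ends up doing the real write.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: gathers small writes and passes them on in 4096-byte blocks.
class BlockWriter {
 public:
  static constexpr size_t kBlockSize = 4096;
  // Called with each block; ctx is an opaque pointer owned by the caller.
  using Sink = void (*)(const uint8_t* block, size_t len, void* ctx);

  BlockWriter(Sink sink, void* ctx) : sink_(sink), ctx_(ctx) {
    assert(sink != nullptr);
  }

  // Copies data into the buffer, emitting a block every time it fills up.
  void write(const void* data, size_t len) {
    const uint8_t* src = static_cast<const uint8_t*>(data);
    while (len > 0) {
      size_t n = kBlockSize - used_;
      if (n > len) n = len;
      memcpy(buf_ + used_, src, n);
      used_ += n;
      src += n;
      len -= n;
      if (used_ == kBlockSize) {
        sink_(buf_, kBlockSize, ctx_);
        used_ = 0;
      }
    }
  }

  // Emits whatever is left as one final, shorter block.
  void flush() {
    if (used_ > 0) {
      sink_(buf_, used_, ctx_);
      used_ = 0;
    }
  }

 private:
  Sink sink_;
  void* ctx_;
  uint8_t buf_[kBlockSize];
  size_t used_ = 0;
};
```

On a Teensy the sink would presumably wrap a call such as `file.write(block, len)` on an open file, with `flush()` called before closing. The buffer lives inside the object, so it costs 4K of RAM per writer.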

Raw sector access can be done by creating a C++ class which inherits the block device class from SdFat. Some programming required. Probably also not a good way to get started. Modern cards are so large and FAT overhead is so small that you'll get very little percentage-wise by going to a lot of trouble to do direct non-filesystem access to the media.

With FAT64 (aka "exFAT") cards, which is pretty much any card larger than 32GB that hasn't been reformatted to FAT32, you can create large contiguously allocated files. That gives you the advantages of direct media access while remaining compatible with putting the card into a PC.

Creating programs to run dynamically loaded into RAM requires diving down a deep rabbit hole of messing with linker scripts. It is theoretically possible and it has been discussed many times on this forum. As far as I know, nobody has really made it work in a satisfying way.
 
Any possible examples of code for my questions? (the ones which can easily be answered)

Recommend starting with File > Examples > SD > SdFat_Usage. This gives compatibility with programs written for SD.h but you can access all the SdFat functions if you really want.
I want to use the SdFat library, not SD.

Overclocking SDIO isn't really an option unless you dive into the SdFat driver code. Definitely not a good way to get started. The SDIO hardware is quite complicated.
I meant the MCU, not SDIO.
 
The code included for SdFat is the full source library. Only some glue edits for PJRC inclusion and use as SD.h replacement.

If not used through SD interfaces the features of SdFat are there 'natively' and the library examples for SdFat will work when not mixed with SD.h style usage.

The PJRC SD.h interface gives the best read and write throughput the card in use can provide, nearing 20 MB/sec with that code: writes seem to land under 20 and reads can be over 20.

There are posts showing the perf example with buffer size edits, where a 4K buffer, as Paul notes, approaches the maximum the hardware can achieve with the card.
Like: Maximum-average-throughput-to-SD-card-on-T4-1

> Not sure Processor OC helps or changes anything - could test with Perf Sketch or other.
> Not sure if the SdFat library presents examples of RAW usage as desired. It has a number of examples and documentation (Doxygen ZIP included in the library), as it is a well-developed library.
github.com/greiman/SdFat
"SdFat/examples/examplesV1/RawWrite/RawWrite.ino"
Code:
 * This program illustrates raw write functions in SdFat that
 * can be used for high speed data logging.

DOC:: ...\teensy\hardware\avr\0.58.3\libraries\SdFat\doc\html\index.html
Code:
<h3>Replace the content of the html folder by unzipping html.zip.</h3>
<h3>I have zipped the documentation since Doxygen changes every file each time it runs.</h3>
<h3>This makes viewing changes on GitHub difficult.</h3>
<p> </p>
 
Recommend starting with File > Examples > SD > SdFat_Usage. This gives compatibility with programs written for SD.h but you can access all the SdFat functions if you really want.
I tried putting the SdFat_Usage begin() functions into my code and they don't work. sdInitErrorHalt shows SdError: 0x1, 0xFF. So those don't work.

Again, I don't want to use anything in SD.h.
 
Since Teensyduino 1.54, everything is SdFat. The SD.h library for Teensy is just inline functions that actually use SdFat.

I'm guessing you probably saw advice online that SdFat is efficient and the old SD library is slow or lacks features. This is true for other boards. It was true for Teensy with version 1.53 and earlier. But starting with 1.54 we got rid of the old SD library and replaced it with a thin compatibility SD.h that just uses SdFat. With Teensy you're not losing anything by using the simpler SD.h.

So please, do yourself a favor, at least while trying to get things working, and go with the simple SD library examples, if only to confirm your hardware is working.
 
Since Teensyduino 1.54, everything is SdFat. The SD.h library for Teensy is just inline functions that actually use SdFat.
Oh... I was confused, sorry

I'm guessing you probably saw advice online that SdFat is efficient and the old SD library is slow or lacks features. This is true for other boards
Yup, I did before I got my T4.1. Never found anything about SD being the same as SdFat.
I guess I'll have to switch libraries now.
 
Also, about #4: Could I read the code into a standard array or malloc() array, and then use __attribute__ ((section (".ramfunc")))? Or is there any other way users have proposed?
 
How big is the code on the SD card? Known to be small or large or unknown? Is it a self standing snippet() that doesn't call anything to be linked?

Code can't execute from RAM2 (per the .LD file - OLD??). In RAM1 (variables:6944, code:34176, padding:31360), the padding is whatever is left of the last 32KB block, so it varies with code size.

I've posted code locating that leftover ITCM region - if you put stuff there it could be called?

Not seen others sideloading code to run having/solving that problem.

Maybe the .LD changed? (for lockable Teensy?) Seems RAM2 can 'x':Execute now? And RAM1:DTCM seems like it can? Misreading this?
Code:
MEMORY
{
	ITCM (rwx):  ORIGIN = 0x00000000, LENGTH = 512K
	DTCM (rwx):  ORIGIN = 0x20000000, LENGTH = 512K
	RAM (rwx):   ORIGIN = 0x20200000, LENGTH = 512K
	FLASH (rwx): ORIGIN = 0x60000000, LENGTH = 7936K
	ERAM (rwx):  ORIGIN = 0x70000000, LENGTH = 16384K
}

Was post #4's point (SD.h ~= SdFat) wrong, or just not read?
 
Creating programs to run dynamically loaded into RAM requires diving down a deep rabbit hole of messing with linker scripts.

It seems this works... maybe? I hope it's not causing undefined behavior.
Code:
void setup(){
  Serial.begin(9600);
  char a[]="\xfe\xff\xff\xea";
  ((void(*)(void))a)();
  Serial.println("this shouldn't be reached");
}

void loop(){}
 
You can find an example here:

https://github.com/PaulStoffregen/c...cdd222992d1f788ab66a341/teensy3/eeprom.c#L133

Scroll up to line 116 for the array initialization, and scroll down to line 382 for the actual asm code.

I get it 😉

I did figure out it runs in Thumb mode (that's kind of why I was confused on my last post along with the endianness).

So couldn't I just read() x number of bytes into an array/vector/malloc() and then execute it like that? Or are there other possible ways?
 
The answer is yes, you can. But a few important caveats exist.

The easy part is the array must be 16 bit aligned. That's why it's an array of uint16_t in the core library, not an array of char as in msg #10.
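
To make the alignment point concrete, here is a small sketch of the two address rules that apply to Thumb code: the buffer address must be even, and the address you branch to must have bit 0 set so the core stays in Thumb state (the linked eeprom.c code does the same thing with `| 1`). The helper names are invented for this example. The array holds the Thumb encoding of "b ." (branch to self); the bytes in msg #10 appear to be the ARM-mode encoding of that instruction, which a Cortex-M core cannot run.

```cpp
#include <stdint.h>

// True when the address is a multiple of 2, as Thumb instructions require.
inline bool isHalfwordAligned(uintptr_t addr) {
  return (addr & 1u) == 0;
}

// Cortex-M cores only execute Thumb code, so a branch target needs bit 0 set.
inline uintptr_t thumbEntry(uintptr_t addr) {
  return addr | 1u;
}

// Thumb encoding of "b ." stored as uint16_t so the compiler aligns it for us.
static const uint16_t kSpinForever[] = { 0xE7FE };
```

This only covers the address arithmetic. Actually jumping into a data buffer still depends on the MPU setup and the position independence issues described in this post.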

By default the MPU is configured to disallow execution from memory normally used for data. You'll probably need to reconfigure the MPU, but that's also pretty simple.

The harder part is code you put into the array must be position independent, because you don't necessarily know the absolute address of your buffer. For simple functions with only small size local variables, this works pretty well. But if you compose a larger program, how to handle global and static variables becomes an issue. Likewise if you will need malloc or C++ new.

An even harder (perhaps impossible) issue is security. You are implicitly trusting the code you've read from that SD card by allowing it to run on hardware without a MMU or other strong security measures. You could try to use the MPU and ARM root/user modes and maybe even trust zone, but those features are relatively weak. If the code on that SD card can't be fully trusted (any opportunity for an attacker to modify it) this sort of loading and running code is almost certain to lead to security vulnerabilities. Maybe that's not a concern for your use, but really needs to be mentioned for anyone who later finds this conversation by search. This sort of thing is a serious security risk.
 
The address will (probably) change as you make any edits to your program. All the files you've created on SD cards that are position dependent code will need to be rebuilt every time the buffer address changes. Maybe that's fine for your application?
 
Now I realize the challenge that comes with loading programs from an SD card... It's basically forcing you to make an executable format from scratch and that isn't easy lol.
 
If you really want to do this, maybe you're now ready for that linker scripts rabbit hole (as mentioned in the last paragraph of msg #2).

To avoid the need for a dynamic loading process that re-writes all the absolute addresses, you'll probably want to edit Teensy 4.1's linker script. Or maybe you could just reuse the existing hab_log section from line 70. The idea is to create a big memory buffer that will always be located at a fixed known location in memory. Today the only thing that uses hab_log is the LockSecureMode sketch. To see it, you need Arduino 1.8.x (not yet available on Arduino 2.0.x due to issue #58) and click Tools > Teensy 4 Security. The last button in Teensy 4 Security opens the LockSecureMode sketch. Look for the comment "do not delete this line", though that probably won't be an issue once your program has code which reads bytes from the SD card and puts them into your known-location buffer.

Then of course you'll also need to create a linker script for compiling programs which use that buffer. Presumably you'll dedicate part of the buffer for code, part for static/global variables, and part for the stack. Or maybe your programs will just use the existing stack? Presumably you'll create a special section similar to "ivt" that tells your program where the loaded code's entry point is? Or maybe you'll have a special section at the beginning and just 1 function in the compiled program which uses it, so you can just jump to the beginning? So many choices...
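
As one possible shape for the "special section that tells your program where the entry point is" idea, here is a rough sketch of a tiny header placed in front of the code in the file. Everything in it is hypothetical and invented for illustration: the `LoadHeader` layout, the `kMagic` value and `parseHeader` are not part of Teensyduino or any existing format. It only checks that a file looks sane before anything is copied into the fixed buffer.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// HYPOTHETICAL file layout: this header, followed by codeSize bytes of code.
struct LoadHeader {
  uint32_t magic;        // marks the file as one of ours
  uint32_t entryOffset;  // byte offset of the entry point within the code
  uint32_t codeSize;     // number of code bytes after the header
};

constexpr uint32_t kMagic = 0x314C4454;  // arbitrary value chosen for this sketch

// Returns true and fills *out when the image fits a buffer of bufSize bytes.
bool parseHeader(const uint8_t* image, size_t imageLen, size_t bufSize,
                 LoadHeader* out) {
  if (imageLen < sizeof(LoadHeader)) return false;
  LoadHeader h;
  memcpy(&h, image, sizeof(h));  // copy instead of casting, image may be unaligned
  if (h.magic != kMagic) return false;
  if (h.codeSize == 0 || h.codeSize > bufSize) return false;
  if (imageLen - sizeof(LoadHeader) < h.codeSize) return false;
  if (h.entryOffset >= h.codeSize) return false;
  if (h.entryOffset & 1u) return false;  // Thumb code must start on an even offset
  *out = h;
  return true;
}
```

A check like this catches truncated or mismatched files, but it does nothing about the trust problem raised earlier in the thread: a file that passes can still contain anything.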

Anyway, it is theoretically possible to craft linker scripts that give you known fixed addresses. This question has come up a number of times and so far (as far as I know) nobody has actually put in the effort and shared any working results. If you do, I hope you'll share it?
 
As you said, that definitely sounds challenging. I don't write linker scripts so I can't try.

Also I've wanted to talk about #1 and #2 again. I tried using the SdFat_Usage sketch and the only begin() functions which work are the two using SdioConfig(). If I try to use SdSpiConfig() in any way I get "initialization failed!". I set my CS pin correctly, to BUILTIN_SDCARD. Why isn't SdSpiConfig() working for me?
 
I set my CS pin correctly, to BUILTIN_SDCARD. Why isn't SdSpiConfig() working for me?

No, that's not correct for SdFat.

Setting the CS pin to BUILTIN_SDCARD is only for the SD library SD.begin(). It is a SD library convention which does not apply to SdFat.

When you use SdFat functions, either SdFat only or Teensy's SD library but with SD.sdfs to access the underlying SdFat library, you must use SdioConfig to access the SDIO hardware or SdSpiConfig to access SPI hardware. SdFat's SdSpiConfig doesn't have any concept of a special constant BUILTIN_SDCARD.

SdSpiConfig certainly should work if you connect a SD card to the SPI pins, as the audio shield does.

Now for some guesswork... maybe you're asking this question because you want a way to use SDIO with a different clock frequency? In msg #2, I said "Overclocking SDIO isn't really an option unless you dive into the SdFat driver code." No API exists to control the SDIO clock frequency, like is done with SPI clock frequency. The only way is to edit the driver-level code within SdFat.

And as I also mentioned in msg #2, writing your code to transfer large block sizes has a large impact on real world performance. SD card command latency almost always plays a larger factor than raw transfer rate for SDIO or sometimes even SPI.
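
A back-of-the-envelope model shows why latency dominates small transfers. The function below is only a simplification for illustration: it assumes each access pays one fixed command latency plus the time to move the bytes at the raw rate. The numbers used in the note after it (200 us latency, 20 MB/sec raw rate) are made-up assumptions, not measurements of any card.

```cpp
#include <assert.h>

// Effective throughput in MB/sec for one access of `bytes` bytes.
// Units: 1 MB/sec is treated as 1 byte per microsecond.
double effectiveMBps(double bytes, double latencyUs, double rawMBps) {
  assert(bytes > 0 && latencyUs >= 0 && rawMBps > 0);
  double transferUs = bytes / rawMBps;      // time spent actually moving data
  return bytes / (latencyUs + transferUs);  // latency is paid once per access
}
```

With those assumed numbers, 512 byte accesses come out around 2.3 MB/sec, 4096 byte accesses around 10.1 MB/sec, and 32768 byte accesses around 17.8 MB/sec. Real cards will differ, but the trend is the point: the raw rate is the same in all three cases.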
 
I was asking about the best way to initialize the SD card and how I could get max performance out of it. I also had asked if overclocking the Teensy itself (not the SDIO) would make any difference.
 
That reply describes the proper setup. Both methods resolve to SdFat code, but getting to SDIO is unique, and the provided examples should make it clear. Once set up, they are using the max-performance code, as long as they are fed properly sized buffers to maximize transaction efficiency: 4K, or perhaps larger, will show improvement up to some larger size. Though the nature of SD cards is that they can cause long waits on occasion as they do internal housekeeping. Preallocation can put that off to some degree; see also the note in post #4 about RAW access, which was perhaps in response to the OP's question.

... see p#4 ... Also, as far as OC:
> Not sure Processor OC helps or changes anything - could test with Perf Sketch or other.
> The SD card has its own process and needs to account for non-volatile flash writes and management given the provided interface, so it will only go so fast. Speed seems to approach 20 MB/sec for writes and maybe 25 MB/sec for reads.
 
Trying again to answer these questions as clearly as I can.


I was asking about the best way to initialize the SD card

For SDIO used on Teensy 4.1's built in SD socket, there is only 1 way to initialize. Unlike SPI where you can choose various options, you don't get any configuration choices with SDIO.

The best way is the only way!


and how I could get max performance out of it.

Read and write in large blocks, at least 4K.


I also had asked if overclocking the Teensy itself (not the SDIO) would make any difference.

For the raw SDIO speed, no difference. The SDIO hardware runs in its own clock domain which remains consistent regardless of the CPU clock.

For overall application performance, faster CPU might help. Really depends on your code.

But just in case the answer hasn't been clearly stated, the main thing you can do that affects SD card performance is reading and writing in large blocks.
 