Teensyduino File System Integration, including MTP and MSC

Thought I would take a minor diversion from this part of the problem and look at the SPI.transfer stuff mentioned on the issue. Here is the send() method of the SPI wrapper class.

Paul, I am going to convert it to the newer SPI.transfer(buf, retbuf, cnt) version to see if that helps.

This is in: sdfat\src\SpiDriver\SdSpiTeensy3.cpp

Code:
void SdSpiArduinoDriver::send(const uint8_t* buf , size_t count) {
#if USE_BLOCK_TRANSFER
  uint32_t tmp[128];
  if (0 < count && count <= 512) {
    memcpy(tmp, buf, count);
    m_spi->transfer(tmp, count);
    return;
  }
#endif  // USE_BLOCK_TRANSFER
  for (size_t i = 0; i < count; i++) {
    m_spi->transfer(buf[i]);
  }
}
#endif  // defined(SD_USE_CUSTOM_SPI) && defined(__arm__) &&defined(CORE_TEENSY)

Will switch to:
Code:
void SdSpiArduinoDriver::send(const uint8_t* buf , size_t count) {
#if USE_BLOCK_TRANSFER
  if (0 < count && count <= 512) {
    m_spi->transfer(buf, nullptr, count);  // send straight from buf; received bytes discarded
    return;
  }
#endif  // USE_BLOCK_TRANSFER
  for (size_t i = 0; i < count; i++) {
    m_spi->transfer(buf[i]);
  }
}
#endif  // defined(SD_USE_CUSTOM_SPI) && defined(__arm__) &&defined(CORE_TEENSY)

Ditto for the changes to receive:
Code:
  memset(buf, 0XFF, count);
  m_spi->transfer(buf, count);
converts to:
Code:
  m_spi->setTransferWriteFill(0xFF);  // byte clocked out while receiving
  m_spi->transfer(nullptr, buf, count);
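For anyone not familiar with the three-argument form: as we understand the Teensy SPI library, transfer(buf, retbuf, count) sends from buf (or the byte set by setTransferWriteFill() when buf is nullptr) and stores the received bytes in retbuf (or discards them when retbuf is nullptr). The sketch below is a hypothetical host-side stand-in (FakeSpi, sendBlock and receiveBlock are made-up names, not SPIClass) that models only those buffer/nullptr rules, to show why the memcpy() and memset() copies are no longer needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SPIClass: models only the buffer/nullptr
// semantics of transfer(buf, retbuf, count), not real SPI timing.
class FakeSpi {
 public:
  std::vector<uint8_t> sent;   // every byte clocked out on MOSI
  std::vector<uint8_t> reply;  // bytes the "device" returns on MISO
  size_t replyPos = 0;

  void setTransferWriteFill(uint8_t ch) { fill_ = ch; }

  void transfer(const void* buf, void* retbuf, size_t count) {
    const uint8_t* tx = static_cast<const uint8_t*>(buf);
    uint8_t* rx = static_cast<uint8_t*>(retbuf);
    for (size_t i = 0; i < count; i++) {
      uint8_t out = tx ? tx[i] : fill_;  // nullptr TX: send the fill byte
      sent.push_back(out);
      uint8_t in = replyPos < reply.size() ? reply[replyPos++] : 0xFF;
      if (rx) rx[i] = in;                // nullptr RX: discard what came in
    }
  }

 private:
  uint8_t fill_ = 0;  // assumed default fill value for this model
};

// send(): the caller's buffer goes out directly, no temporary copy.
void sendBlock(FakeSpi& spi, const uint8_t* buf, size_t count) {
  spi.transfer(buf, nullptr, count);
}

// receive(): 0xFF is clocked out while the reply lands in buf, replacing
// the old memset(buf, 0xFF, count) followed by an in-place transfer.
void receiveBlock(FakeSpi& spi, uint8_t* buf, size_t count) {
  spi.setTransferWriteFill(0xFF);
  spi.transfer(nullptr, buf, count);
}
```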

I doubt it will make a big difference in performance, but...


Note: also related to the above, I mentioned:
Code:
bool begin(uint8_t csPin = 10, uint32_t maxSpeed=SD_SCK_MHZ(16), uint8_t opt=0) {
Maybe it should also take which SPI port to use, since that is an option on that object...
 
I doubt it will make a big difference in performance, but...

Tried it here. No significant change.

But here is a change which makes SHARED_SPI format almost as fast as DEDICATED_SPI :)

Code:
bool FatFormatter::initFatDir(uint8_t fatType, uint32_t sectorCount) {
  size_t n;
  memset(m_secBuf, 0, BYTES_PER_SECTOR);
  writeMsg("Writing FAT ");
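  // Try a temporary 16K (32-sector) zero buffer so each writeSectors()
  // call covers many sectors; otherwise fall back to one sector at a time.
  uint8_t *bigbuf = (uint8_t *)malloc(BYTES_PER_SECTOR * 32);
  if (bigbuf) {
    memset(bigbuf, 0, BYTES_PER_SECTOR * 32);
    uint32_t dotcount = 0;
    uint32_t i = 1;
    while (i < sectorCount) {
      // Write up to the next 32-sector boundary, but never past sectorCount.
      uint32_t len = 32 - (i % 32);
      if (len > sectorCount - i) {
        len = sectorCount - i;
      }
      if (!m_dev->writeSectors(m_fatStart + i, bigbuf, len)) {
        free(bigbuf);  // don't leak the buffer on a write error
        return false;
      }
      i += len;
      dotcount += len;
      if (dotcount >= sectorCount/32) {
        dotcount -= sectorCount/32;
        writeMsg(".");
      }
    }
    free(bigbuf);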
  } else {
    for (uint32_t i = 1; i < sectorCount; i++) {
      if (!m_dev->writeSector(m_fatStart + i, m_secBuf)) {
        return false;
      }
      if ((i%(sectorCount/32)) == 0) {
        writeMsg(".");
      }
    }
  }
  writeMsg("\r\n");
  // Allocate reserved clusters and root for FAT32.
  m_secBuf[0] = 0XF8;
  n = fatType == 16 ? 4 : 12;
  for (size_t i = 1; i < n; i++) {
    m_secBuf[i] = 0XFF;
  }
  return m_dev->writeSector(m_fatStart, m_secBuf) &&
         m_dev->writeSector(m_fatStart + m_fatSize, m_secBuf);
}

Only downside is we temporarily allocate 16K on the heap.

It might also slightly slow DEDICATED_SPI and SDIO, but probably not by much.
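The chunk arithmetic in that loop is easy to get wrong at the end of the FAT. Here is a host-side sketch of it (planChunks is a hypothetical helper, not part of SdFat) that returns the (start, length) pairs the loop writes, relative to m_fatStart. It includes a clamp so the last chunk stops at sectorCount; without it, the final write rounds up to the next 32-sector boundary whenever sectorCount is not a multiple of 32.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: (firstSector, length) pairs covering [first, sectorCount).
// Chunks end on multiples of `chunk`, and the last one is clamped so that
// no write runs past sectorCount.
std::vector<std::pair<uint32_t, uint32_t>> planChunks(uint32_t first,
                                                      uint32_t sectorCount,
                                                      uint32_t chunk) {
  std::vector<std::pair<uint32_t, uint32_t>> plan;
  uint32_t i = first;
  while (i < sectorCount) {
    uint32_t len = chunk - (i % chunk);                // run to next boundary
    if (len > sectorCount - i) len = sectorCount - i;  // clamp the tail
    plan.emplace_back(i, len);
    i += len;
  }
  return plan;
}
```

For example, planChunks(1, 70, 32) gives (1, 31), (32, 32), (64, 6): the first write is 31 sectors because sector 0 of the FAT is written separately afterwards.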

I'm sure with more work we could do this without any extra memory and with no impact on the other hardware cases. But that would require adding a special zeroSectors() function to BlockDeviceInterface, with a default implementation that just calls writeSector() in a loop, plus an implementation inside SdSpiCard which looks like writeSectors() but just writes zeros to every sector rather than using a buffer in memory. That's a lot of changes throughout SdFat....
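A minimal sketch of the zeroSectors() idea, assuming a trimmed-down interface: BlockDevice, RamDevice and this zeroSectors() signature are hypothetical, not SdFat's actual BlockDeviceInterface. The default just loops over writeSector() with one static 512-byte zero sector; a driver like SdSpiCard could override it with a multi-block write that clocks out zeros with no buffer at all.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, trimmed-down stand-in for a block device interface.
class BlockDevice {
 public:
  virtual ~BlockDevice() = default;
  virtual bool writeSector(uint32_t sector, const uint8_t* src) = 0;

  // Hypothetical zeroSectors(): the default falls back to one writeSector()
  // per sector, reusing a single 512-byte zero buffer. Drivers that can
  // stream zeros directly would override this.
  virtual bool zeroSectors(uint32_t sector, uint32_t count) {
    static const uint8_t zeros[512] = {0};
    for (uint32_t n = 0; n < count; n++) {
      if (!writeSector(sector + n, zeros)) return false;
    }
    return true;
  }
};

// In-memory device used only to exercise the default implementation.
class RamDevice : public BlockDevice {
 public:
  static const uint32_t kSectors = 8;
  uint8_t data[kSectors][512];
  uint32_t writes = 0;

  RamDevice() { memset(data, 0xAA, sizeof(data)); }

  bool writeSector(uint32_t sector, const uint8_t* src) override {
    if (sector >= kSectors) return false;
    memcpy(data[sector], src, 512);
    writes++;
    return true;
  }
};
```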
 
But here is a change which makes SHARED_SPI format almost as fast as DEDICATED_SPI

Re-ran @KurtE's simple test sketch to get the format timings after I made the changes to initFatDir.

Using the Audio shield, I am seeing something a couple of seconds slower:
Code:
Format Done
Format complete 5753
or 4.3 seconds on a 32GB SanDisk Ultra card.

Now swapping to the Extreme card on the Audio shield, just to compare:
Code:
Format done
Format complete 11391
or 2.4 seconds, or about the same. Looks like we have to be careful with the cards we use for testing :)

Using our modified SD formatter code, which switches to dedicated before formatting and back to shared afterwards:
Code:
Format done
Format complete 2364

This is at 16 MHz.
 
Ok, let's try to do the switch from SHARED_SPI to DEDICATED_SPI. I see Bill just recommended that on the github issue.

I'll try to work it into SD.cpp later today. There's also a lingering bug where mediaPresent() isn't using the full config when restarting the library after a new card is inserted. First I need to dig a little deeper into how Bill is actually implementing those config settings....
 
Ran with Paul's code on a 32GB Samsung card...
Code:
Initializing SD card...Free Cluster Count: 976978 dt: 7167
2021-01-01 00:00         11 File0
2021-01-01 00:00         11 File1
2021-01-01 00:00         11 File2
2021-01-01 00:00         11 File3
2021-01-01 00:00         11 File4
Press any key to reformat disk
Writing FAT ................................
Format Done
Format complete 7055
done!
The sketch created a few files... Probably need to add something to SdFat for setting the proper date... But the format took 7 seconds at 16 MHz with SHARED_SPI, which is a lot better.
 
Ok, let's try to do the switch from SHARED_SPI to DEDICATED_SPI. I see Bill just recommended that on the github issue.

I'll try to work it into SD.cpp later today. There's also a lingering bug where mediaPresent() isn't using the full config when restarting the library after a new card is inserted. First I need to dig a little deeper into how Bill is actually implementing those config settings....

You might want to see the question I just posted to Bill on Github. I was doing the same thing with getting the config settings this morning.
 
Ok, let's try to do the switch from SHARED_SPI to DEDICATED_SPI. I see Bill just recommended that on the github issue.

I'll try to work it into SD.cpp later today. There's also a lingering bug where mediaPresent() isn't using the full config when restarting the library after a new card is inserted. First I need to dig a little deeper into how Bill is actually implementing those config settings....

Paul, it will be interesting. What is totally unclear to me is whether DEDICATED still sets/clears the CS pin while the SD is in use, or just sets it low at begin and assumes it has full use of SPI at all times...

From his posts, it was also unclear whether he thinks we should run it all of the time or just for things like format and getting the free cluster count...

So it will be interesting to see, for example, if you can try it on something like the Audio board with flash connected and see if both work.
 
I was doing the same thing with getting the config settings this morning.

Looks like we can't easily read the config back out of SdFat, unless we patch the code. It doesn't actually store the config, but rather uses it to initialize driver objects which are then accessed through a base class, so we don't have (easy) access to the info.
 
That was what I was noticing as well, which is why I suggested maybe we should change SD.begin()

to be more like:
Code:
bool begin(uint8_t csPin = 10, uint32_t maxSpeed=SD_SCK_MHZ(16), uint8_t opt=0, SPIClass &spi = SPI) {  // a parameter after defaulted ones needs a default too

And then change all of the examples that currently go directly to the underlying sdfs object, so that they recommend calling through the SD.begin() method instead, as this allows us to cache the data...
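Roughly what caching the config in the wrapper could look like. This is a hypothetical sketch: SdWrapper, CachedSdConfig and FakeSpiPort are made-up names, and restart() only stands in for re-running the real sdfs.begin(SdSpiConfig(...)) with the saved values, for example from mediaPresent() after a new card is inserted.

```cpp
#include <cstdint>

// Hypothetical stand-in for SPIClass, so the sketch compiles on a PC.
struct FakeSpiPort {
  int id;
};

// Everything begin() was given, kept so a later restart can reuse it.
struct CachedSdConfig {
  uint8_t csPin = 10;
  uint32_t maxSpeed = 16000000;  // 16 MHz, i.e. SD_SCK_MHZ(16)
  uint8_t opt = 0;
  FakeSpiPort* spi = nullptr;
};

class SdWrapper {
 public:
  bool begin(uint8_t csPin, uint32_t maxSpeed, uint8_t opt, FakeSpiPort& spi) {
    cfg_.csPin = csPin;
    cfg_.maxSpeed = maxSpeed;
    cfg_.opt = opt;
    cfg_.spi = &spi;
    return restart();
  }

  // Placeholder for re-initializing the card with the cached settings.
  bool restart() {
    restarts_++;
    return cfg_.spi != nullptr;
  }

  const CachedSdConfig& config() const { return cfg_; }
  int restarts() const { return restarts_; }

 private:
  CachedSdConfig cfg_;
  int restarts_ = 0;
};
```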
 
Quick update on DEDICATED... Note: I hacked my format sketch a bit to create a few files, do an ls, then wait for you to hit Enter... then it formats and does an ls again...

All without the SD library.
Code:
// List files, format, list files
#include "SdFat.h"
#include "sdios.h"
#include <TimeLib.h>

const int chipSelect = 10; // BUILTIN_SDCARD;
#define SPI_SPEED SD_SCK_MHZ(16)  // adjust to sd card 
SdFs sd;
FsFile file;
char filename[10];
void setup()
{
  // set the Time library to use Teensy 3.0's RTC to keep time
  setSyncProvider(getTeensy3Time);
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect.
  }
  // Deselect the other SPI device on pin 6 (e.g. the flash chip on the Audio shield).
  pinMode(6, OUTPUT);
  digitalWriteFast(6, HIGH);
  pinMode(chipSelect, OUTPUT);
  digitalWriteFast(chipSelect, HIGH);
  // Set callback
  FsDateTime::setCallback(dateTime);

  Serial.printf("Now: %u/%u/%u %u:%02u:%02u\n", month(), day(), year(), hour(), minute(), second());

  Serial.print("Initializing SD card...");
  if (!sd.begin(SdSpiConfig(chipSelect, DEDICATED_SPI, SPI_SPEED))) {
  //if (!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {
    //if (!sd.begin(chipSelect)) {
    Serial.println("initialization failed!");
    return;
  }
  // lets create a few files
  for (uint8_t i = 0; i < 5; i++) {
    sprintf(filename, "File%u", i);
    file.open(filename, O_RDWR | O_CREAT | O_TRUNC);
    file.println("Some text");
    file.close();
  }
  // Lets print out free cluster count:
  delay(10);
  elapsedMillis emFormat = 0;
  Serial.printf("Free Cluster Count: %u dt: ", sd.freeClusterCount());
  Serial.println(emFormat);
  sd.ls(LS_DATE | LS_SIZE);

  Serial.println("Press any key to reformat disk");
  while (Serial.read() == -1);
  while (Serial.read() != -1) ;
  emFormat = 0;
  sd.format(&Serial);
  Serial.printf("Format complete %u\n", (uint32_t)emFormat);
  sd.ls(LS_R | LS_DATE | LS_SIZE);
  Serial.println("done!");
}

void loop()
{
  // nothing happens after setup finishes.
}
//------------------------------------------------------------------------------
// Call back for file timestamps.  Only called for file create and sync().
void dateTime(uint16_t* date, uint16_t* time, uint8_t* ms10) {

  // Return date using FS_DATE macro to format fields.
  *date = FS_DATE(year(), month(), day());

  // Return time using FS_TIME macro to format fields.
  *time = FS_TIME(hour(), minute(), second());

  // Return low time bits in units of 10 ms.
  *ms10 = second() & 1 ? 100 : 0;
}
//------------------------------------------------------------------------------
time_t getTeensy3Time()
{
  return Teensy3Clock.get();
}
Put in the Date/Time code as well.
So here is a test run on a 32GB card, external SPI, DEDICATED_SPI:
Code:
Now: 10/9/2021 7:57:05
Initializing SD card...Free Cluster Count: 976986 dt: 2132
2021-10-09 07:57         11 File0
2021-10-09 07:57         11 File1
2021-10-09 07:57         11 File2
2021-10-09 07:57         11 File3
2021-10-09 07:57         11 File4
Press any key to reformat disk
Writing FAT ................................
Format Done
Format complete 4355
done!
But the issue is that the CS pin is not working properly in this case. It toggles correctly for some of the run, but not all of it.
screenshot.jpg

The gap between the parts is where it was waiting for me to type something to start the format...

Another interesting thing: at some of the places where I see CS change state, it does not look like the SPI transfer has completely finished. For example, just before I query how much space is used, I put in a delay(10),
which you can see here:
screenshot2.jpg
You can see the CS pin went high, and then it looks like another byte was transferred. Again, not sure what the rules are here? Maybe your new format code is missing some call it expects, to say "I am done"...
 
Looks like we can't easily read the config back out of SdFat, unless we patch the code. It doesn't actually store the config, but rather uses it to initialize driver objects which are then accessed through a base class, so we don't have (easy) access to the info.

That's what I was finding this morning as well, which is why I asked Bill. As you said, the only way to do it is to patch the code to make it accessible. But Bill also mentioned that in beta 2.1.1 he is redoing shared and dedicated, so I'm not sure we want to go down that route.

I may have to play with the mscFormatter code to see what we need to implement it. It is faster, but I'm still not 100% sure why. But it does work.
 
Did a little playing with the free cluster count, on the big external card...
Thought I would play with it in case DEDICATED turns into another rabbit hole :D

Example sketch:
Code:
// Compare SdFat's freeClusterCount() with a multi-sector-read prototype
#include "SdFat.h"
#include "sdios.h"

const int chipSelect = 10; // BUILTIN_SDCARD;
#define SPI_SPEED SD_SCK_MHZ(16)  // adjust to sd card 
SdFat32 sd;
FatFile file;
char filename[10];
void setup()
{
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect.
  }
  // Deselect the other SPI device on pin 6 (e.g. the flash chip on the Audio shield).
  pinMode(6, OUTPUT);
  digitalWriteFast(6, HIGH);
  pinMode(chipSelect, OUTPUT);
  digitalWriteFast(chipSelect, HIGH);
  Serial.print("Initializing SD card...");
  //if (!sd.begin(SdSpiConfig(chipSelect, DEDICATED_SPI, SPI_SPEED))) {
  if (!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {
    //if (!sd.begin(chipSelect)) {
    Serial.println("initialization failed!");
    return;
  }
}

void loop()
{
  elapsedMillis em = 0;
  Serial.printf("Free Cluster Count: %u dt: ", sd.freeClusterCount());
  Serial.println(em);

  em = 0;
  Serial.printf("Prototype Free Clusters  %d dt: ", TestFreeClusterCount());
  Serial.println(em);

  Serial.println("Press any key to run again");
  while (Serial.read() == -1) ;
  while (Serial.read() != -1) ;
}

#define CCSECTORS_PER_READ 8
// Aligned so the buffer can safely be read as uint16_t/uint32_t FAT entries.
alignas(4) uint8_t fcc_buffer[CCSECTORS_PER_READ * 512];

int32_t TestFreeClusterCount() {
  uint32_t free_count = 0;
  uint32_t first_sector = sd.fatStartSector();
  uint32_t sectors_left = sd.sectorsPerFat();
  SdCard *card = sd.card();

  uint32_t clusters_per_sector;
  switch (sd.fatType()) {
    default: return -1; // not one we handle.
    case FAT_TYPE_FAT16: clusters_per_sector = 512 / 2; break;
    case FAT_TYPE_FAT32: clusters_per_sector = 512 / 4; break;
  }

  int32_t clusters_to_do = sd.clusterCount() + 2;

  while (sectors_left) {
    uint32_t sectors_to_read = (sectors_left < CCSECTORS_PER_READ) ? sectors_left : CCSECTORS_PER_READ;
    if (!card->readSectors(first_sector, fcc_buffer, sectors_to_read)) {
      Serial.printf("Failed to read sectors: %u cnt: %u\n", first_sector, sectors_to_read);
      return -1;
    }
    // now lets process the data that we read in.
    uint16_t cnt = clusters_per_sector * sectors_to_read;
    if (cnt > clusters_to_do) cnt = clusters_to_do;
    clusters_to_do -= cnt; // update count here...

    if (clusters_per_sector == 512 / 2) {
      // fat16
      uint16_t *fat16 = (uint16_t *)fcc_buffer;
      while (cnt-- ) {
        if (*fat16++ == 0) free_count++;
      }
    } else {
      // fat32: only the low 28 bits of an entry hold the cluster value
      uint32_t *fat32 = (uint32_t *)fcc_buffer;
      while (cnt-- ) {
        if ((*fat32++ & 0x0FFFFFFF) == 0) free_count++;
      }
    }

    // update the count of sectors left to read and
    // the starting sector for the next read
    sectors_left -= sectors_to_read;
    first_sector += sectors_to_read;
  }
  return free_count;
}

As you can see, with this card, getting the free count at 16 MHz takes nearly 10 seconds. But this run, reading 8 sectors at a time, is much faster:
Code:
Initializing SD card...Free Cluster Count: 976991 dt: 9458
Prototype Free Clusters  976991 dt: 2751
Press any key to run again
When I ran with just 4 sectors at a time, it took 3.3 seconds.

Note: some runs of the standard SdFat version (1 sector per call) were as low as about 7.2 seconds.

With 32 sectors per read, it took about 2.3 seconds.

Thoughts?
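A quick sanity check on those numbers, as a sketch under stated assumptions: on this nearly empty card the total cluster count is close to the free count of 976991, and we assume the FAT is sized exactly to the cluster count (the real sectorsPerFat() may be slightly larger). fatSectorsFor and readCallsFor are hypothetical helpers that count commands only; they are not a timing model.

```cpp
#include <cstdint>

// Sectors needed for a FAT with `clusters` data clusters; entries 0 and 1
// are reserved, hence the +2. bytesPerEntry is 2 for FAT16, 4 for FAT32.
uint32_t fatSectorsFor(uint32_t clusters, uint32_t bytesPerEntry) {
  uint64_t bytes = (uint64_t)(clusters + 2) * bytesPerEntry;
  return (uint32_t)((bytes + 511) / 512);  // round up to whole sectors
}

// readSectors() calls needed to walk `sectors` FAT sectors, `perRead` at a time.
uint32_t readCallsFor(uint32_t sectors, uint32_t perRead) {
  return (sectors + perRead - 1) / perRead;
}
```

That gives about 7633 FAT sectors, so 7633 commands at 1 sector per read, 1909 at 4, 955 at 8 and 239 at 32. At 16 MHz SPI (2 MB/s) the raw 7633 x 512 bytes alone need roughly 2 seconds, so the 2.3 seconds measured at 32 sectors per read is already close to that floor.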
 
Thoughts?
Just tried running your sketch on the Audio shield SD card and got this:
Code:
Initializing SD card...Free Cluster Count: 973583 dt: 5105
Prototype Free Clusters  973583 dt: 2480
Press any key to run again

Free Cluster Count: 973583 dt: 0
Prototype Free Clusters  973583 dt: 2485
Press any key to run again
This is with the SanDisk Ultra card. I have some smaller 8GB and 16GB cards coming today to test with.

I would say it's looking good.
 
Are you really sure you want this? Usually this means no updates anymore, no bug fixes, no new features... :-(
 
@All - I think I am finally getting caught up with you guys. I now have MTP_T4 up and running on my Linux machine :) The only thing not working is the SD8 external SPI device, as expected. Still waiting for my Teensy MicroMod and ATP carrier boards. It's been 10 days now, with several delivery dates that keep going back to pending :(
 
Sounds like we're going to end up with Teensyduino using a pretty substantial fork of Bill's original SdFat code...
Some of these changes may not be overly substantial, like cherry-picking fixes for the features/issues that are really hurting us. Those should not make it overly difficult to pick up new things later.

However, then there are things like multiple files, async... that could require drastic changes...
It would be really nice if File() knew the type of media and interface. Sorry for repeating myself.
I sort of agree, ditto for FS... But I'm not sure how, for example:
Obviously we could have each of our known file systems set a value, where maybe SD returns 0, LittleFS returns 1, ... But within each of these, for example if I have an SD object, I don't have any easy, documented way to know whether it is on SDIO or SPI or...

I was thinking we could maybe introduce runtime support for dynamic casting.... (Just kidding)

So is there an easy way, for example, to cast a File pointer to an SDFile pointer... especially when classes might have multiple inheritance?
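One common way to get a safe downcast without RTTI is a virtual "downcast hook" on the base class. Here is a sketch with hypothetical class names (FileImpl, SdFileImpl, LfsFileImpl and LogSink are not the real Teensy File classes): the base returns nullptr and only the SD implementation overrides it. Because the override returns `this` from inside SdFileImpl, the compiler applies the correct pointer adjustment even when FileImpl is not the first base class.

```cpp
class SdFileImpl;  // forward declaration for the hook's return type

class FileImpl {
 public:
  virtual ~FileImpl() = default;
  // Downcast hook: nullptr unless the object really is an SdFileImpl.
  virtual SdFileImpl* asSdFile() { return nullptr; }
};

// A second, unrelated base so FileImpl is not at offset 0.
class LogSink {
 public:
  virtual ~LogSink() = default;
  int lines = 0;
};

class SdFileImpl : public LogSink, public FileImpl {
 public:
  SdFileImpl* asSdFile() override { return this; }
  int sdOnly() const { return 42; }  // stands in for SD-specific API
};

class LfsFileImpl : public FileImpl {};
```

The cost is one virtual function per target type, and no typeinfo data is needed, so it also works with -fno-rtti.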
 
Are you really sure you want this?

No, I'm not so sure, for exactly the reasons you mentioned.

But if Bill isn't willing to improve performance in the many ways we need, or if a major redesign will bring those sorts of changes in the distant future, then we're faced with a very difficult choice, aren't we?



.... or 2.4 seconds, or about the same. Looks like we have to be careful with the cards we use for testing :)

Ok, here's a 2nd attempt at bringing format() with SHARED_SPI up to speed. You should get pretty similar speed to DEDICATED_SPI.

https://github.com/PaulStoffregen/SdFat/commit/76b080497d4c0dc47454a04f60c07261151ce16a

This is on a "writeSectorsSame" branch of my fork on GitHub. Just grabbing the whole library is simpler than updating 4 files.
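We have not reproduced what that commit does; the following is only our own sketch of what a "write the same sector many times" primitive could look like at the SD SPI protocol level. It assumes CS is already low and CMD25 (WRITE_MULTIPLE_BLOCK) has been sent and accepted. Bus, MockCard and this writeSectorsSame() signature are hypothetical. The token values (0xFC per block, 0xFD to stop, data response xxx00101 for "accepted") are from SD SPI mode. The point is that one 512-byte buffer is streamed for every block, so no 16K buffer is needed.

```cpp
#include <cstdint>
#include <vector>

// SPI-mode tokens from the SD physical layer specification.
const uint8_t WRITE_MULTIPLE_TOKEN = 0xFC;  // starts each block of CMD25
const uint8_t STOP_TRAN_TOKEN = 0xFD;       // ends a CMD25 transfer
const uint8_t DATA_RES_MASK = 0x1F;
const uint8_t DATA_RES_ACCEPTED = 0x05;

// Byte-at-a-time full-duplex bus (hypothetical; real code would use SPIClass).
struct Bus {
  virtual ~Bus() = default;
  virtual uint8_t xfer(uint8_t out) = 0;
};

// The card holds MISO low while busy; bounded so a dead card can't hang us.
static bool waitNotBusy(Bus& bus) {
  for (uint32_t t = 0; t < 100000; t++) {
    if (bus.xfer(0xFF) == 0xFF) return true;
  }
  return false;
}

// Streams the same 512-byte `src` for `count` blocks, then stops.
bool writeSectorsSame(Bus& bus, const uint8_t* src, uint32_t count) {
  for (uint32_t n = 0; n < count; n++) {
    bus.xfer(WRITE_MULTIPLE_TOKEN);
    for (int i = 0; i < 512; i++) bus.xfer(src[i]);  // same data every block
    bus.xfer(0xFF);  // CRC bytes, ignored unless CRC checking is enabled
    bus.xfer(0xFF);
    uint8_t resp = bus.xfer(0xFF);
    if ((resp & DATA_RES_MASK) != DATA_RES_ACCEPTED) return false;
    if (!waitNotBusy(bus)) return false;
  }
  bus.xfer(STOP_TRAN_TOKEN);
  bus.xfer(0xFF);  // the card may send one more byte before signalling busy
  return waitNotBusy(bus);
}

// Test double: records each data block and always answers "accepted".
struct MockCard : Bus {
  std::vector<std::vector<uint8_t>> blocks;
  int remaining = 0;         // data + CRC bytes still expected in this block
  bool respondNext = false;  // next byte is the data response
  bool sawStop = false;

  uint8_t xfer(uint8_t out) override {
    if (respondNext) {
      respondNext = false;
      return DATA_RES_ACCEPTED;
    }
    if (remaining > 0) {
      if (remaining > 2) blocks.back().push_back(out);  // skip the 2 CRC bytes
      if (--remaining == 0) respondNext = true;
      return 0xFF;
    }
    if (out == WRITE_MULTIPLE_TOKEN) {
      blocks.emplace_back();
      remaining = 514;
    } else if (out == STOP_TRAN_TOKEN) {
      sawStop = true;
    }
    return 0xFF;
  }
};
```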

One downside is the dots no longer print during the lengthy write operation. At least right now I'm not feeling so sad about missing that.
 
I sort of agree, ditto for FS... But I'm not sure how, for example:
The implementations know it in most cases, i.e. LittleFS knows whether it works on QSPI RAM, QSPI flash, etc.
Maybe the SD wrapper can just store whether it works with SPI. SD.begin knows the CS pin number (or "builtin"). Or just "unknown" if there is *really* no other way. SD is a special case.

A global enum would be perfect. Can be a single enum.
SD_SDIO, SD_SPI, LITTLEFS_QSPI_RAM, LITTLEFS_QSPI_FLASH, LITTLEFS_QSPI_PSRAM, LITTLEFS_SPI_.... etc
Especially because the program may need to know it for cache handling, disabling interrupts, mutexes, etc...
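A sketch of that suggestion, assuming a virtual query on the file system base class: MediaType, FileSystem and SDLike are hypothetical names, and 254 is only an assumed stand-in for the BUILTIN_SDCARD value. SD can answer from what begin() was given: the builtin slot means SDIO, any other chip select pin means SPI.

```cpp
#include <cstdint>

// Hypothetical global enum; names follow the list above.
enum class MediaType : uint8_t {
  Unknown,
  SD_SDIO,
  SD_SPI,
  LITTLEFS_QSPI_RAM,
  LITTLEFS_QSPI_FLASH,
  LITTLEFS_QSPI_PSRAM,
};

// Hypothetical base class hook: every FS reports what it sits on.
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual MediaType mediaType() const { return MediaType::Unknown; }
};

class SDLike : public FileSystem {
 public:
  static constexpr uint8_t kBuiltin = 254;  // assumed BUILTIN_SDCARD value

  void begin(uint8_t csPin) {
    csPin_ = csPin;
    started_ = true;
  }

  MediaType mediaType() const override {
    if (!started_) return MediaType::Unknown;
    return csPin_ == kBuiltin ? MediaType::SD_SDIO : MediaType::SD_SPI;
  }

 private:
  uint8_t csPin_ = 0;
  bool started_ = false;
};
```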

P.S. As long as the RTTI data is stored in flash, it wouldn't hurt if it were large. RTTI would be even better.
So... you all say it is not possible because it's crazy large. How big is it? How much additional memory does it need?
 
Just tested it with a test program on PC (FastCRC example):

1.) The size of the executable does _not_ change if I don't use runtime type info.
2.) #include <typeinfo> adds nothing either.
3.) Printing a line (with the include) with the class name adds ~2KB. ( printf("%s \n",typeid(CRC32).name()); )
4.) Printing a second class name adds ~200 bytes.
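For reference, this is the kind of RTTI use being measured, as a small self-contained sketch. The CRC32 here is a hypothetical polymorphic stand-in, not FastCRC's class, and RTTI must be enabled (no -fno-rtti). typeid on a reference to a polymorphic base reports the dynamic type; the name string is compiler-specific (GCC returns a mangled name such as "5CRC32").

```cpp
#include <cstdio>
#include <typeinfo>

// Polymorphic stand-ins; this CRC32 is not FastCRC's class.
struct Base {
  virtual ~Base() = default;
};
struct CRC32 : Base {};
struct Other : Base {};

// Name of the dynamic type behind a base reference (compiler-specific text).
const char* dynamicName(const Base& b) { return typeid(b).name(); }

// Exact-type check; unlike dynamic_cast it does not match subclasses.
bool isCRC32(const Base& b) { return typeid(b) == typeid(CRC32); }

void printName(const Base& b) { std::printf("%s\n", dynamicName(b)); }
```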

On PC.

I don't see an incredible increase in size?!? Who said this, and where does this info come from?
And if there were a way to put it into flash, great.

You were talking about "16KB caches"...
 
Tried that on Teensy; it seems the .ld file needs some tweaking. It does not link:
Code:
/hardware/tools/arm10/bin/../lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+dp/hard\libgcc.a(pr-support.o):(.ARM.exidx+0x2c): relocation truncated to fit: R_ARM_PREL31 against `.ARM.extab'
So... I can't say anything about memory on Teensy.
 
As you can see, with this card, getting the free count at 16 MHz takes nearly 10 seconds.
....
With 32 sectors per read, it took about 2.3 seconds.

If we're going to end up with a significant fork of SdFat, we might as well use malloc(16384) for the initial free cluster count. Maybe we'll use less on Teensy 3.2.
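A hypothetical sketch of that approach (freeClusterCount32 and its reader callback are our own names, not SdFat API): ask malloc() for 16 KB, halve on failure down to a single sector, then count FAT32 entries whose low 28 bits are zero. Falling back automatically is one way to use less where RAM is tight without a per-board setting. It assumes a little-endian CPU, which covers Teensy and typical PCs.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// readSectors(sector, dst, n) reads n sectors; entries = clusterCount + 2,
// since FAT entries 0 and 1 are reserved.
template <typename Reader>
int32_t freeClusterCount32(Reader&& readSectors, uint32_t fatStart,
                           uint32_t fatSectors, uint32_t entries) {
  // Ask for 16 KB first; halve on failure, down to a single sector.
  uint32_t bufSectors = 32;
  uint8_t* buf = nullptr;
  while (bufSectors > 0) {
    buf = static_cast<uint8_t*>(malloc(bufSectors * 512));
    if (buf) break;
    bufSectors /= 2;
  }
  if (!buf) return -1;

  uint32_t freeCount = 0;
  uint32_t s = 0;
  while (s < fatSectors && entries > 0) {
    uint32_t n = fatSectors - s;
    if (n > bufSectors) n = bufSectors;
    if (!readSectors(fatStart + s, buf, n)) {
      free(buf);
      return -1;
    }
    uint32_t cnt = n * 128;  // 128 four-byte entries per 512-byte sector
    if (cnt > entries) cnt = entries;
    for (uint32_t i = 0; i < cnt; i++) {
      uint32_t e;
      memcpy(&e, buf + 4 * i, sizeof(e));      // alignment-safe load
      if ((e & 0x0FFFFFFF) == 0) freeCount++;  // top 4 bits are reserved
    }
    entries -= cnt;
    s += n;
  }
  free(buf);
  return static_cast<int32_t>(freeCount);
}
```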

Likewise, almost all SdFat usage for reading files via the cache could be expected to improve if we just increase from 1 data sector to 8 or more.
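To illustrate why a bigger cache helps sequential reads, here is a standalone read-ahead sketch. It is hypothetical and does not model SdFat's actual cache classes: ReadAheadCache holds kSectors consecutive sectors aligned to a kSectors boundary, so reading sectors in order costs one device command per kSectors sectors instead of one per sector. It assumes the device size is a multiple of kSectors; real code would clamp the last read.

```cpp
#include <cstdint>
#include <cstring>

template <typename Device, uint32_t kSectors = 8>
class ReadAheadCache {
 public:
  explicit ReadAheadCache(Device& dev) : dev_(dev) {}

  // Returns a pointer to the 512 bytes of `sector`, or nullptr on error.
  const uint8_t* get(uint32_t sector) {
    uint32_t base = sector - (sector % kSectors);
    if (!valid_ || base != base_) {
      if (!dev_.readSectors(base, buf_, kSectors)) {
        valid_ = false;
        return nullptr;
      }
      base_ = base;
      valid_ = true;
    }
    return buf_ + (sector - base_) * 512;
  }

  // Call when a cached sector is written through another path.
  void invalidate() { valid_ = false; }

 private:
  Device& dev_;
  uint8_t buf_[kSectors * 512];
  uint32_t base_ = 0;
  bool valid_ = false;
};

// Test device: sector s is filled with byte (s & 0xFF); counts commands.
struct CountingDevice {
  uint32_t sectors = 64;
  uint32_t calls = 0;
  bool readSectors(uint32_t s, uint8_t* dst, uint32_t n) {
    if (s + n > sectors) return false;
    calls++;
    for (uint32_t i = 0; i < n; i++) memset(dst + i * 512, (s + i) & 0xFF, 512);
    return true;
  }
};
```

Reading sectors 0 through 15 in order issues two device reads instead of sixteen; the trade-off is kSectors x 512 bytes of RAM and the need to invalidate on writes.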
 
OK, RTTI works flawlessly on Teensy. I tweaked the .ld file and it works great; I can print class names of objects.
However, it seems the Teensy size output needs a tweak, too?? I can't see any size difference??
 
OK, RTTI works flawlessly on Teensy. I tweaked the .ld file and it works great; I can print class names of objects.
However, it seems the Teensy size output needs a tweak, too?? I can't see any size difference??

I have not used RTTI since way back on bigger machines, more than 20 years ago... so I'm a bit rusty.
 