Teensyduino File System Integration, including MTP and MSC

Good morning all

USB stuff: most of the history for PFsLib and partition formatting can be found in this thread: "Many TLAs: MTP MSC FS SD SDFat LittleFS UsbMSCFat to work with each other 8)". You will see all the gory details of getting the exFAT/FAT formatter working for partitions.

For the volume, additional info was needed from SdFat, primarily partition sizes, cluster counts, etc.

One of the things we did was create a wrapper for the exFAT/FAT formatter:
Code:
formatter(PFsVolume &partVol, uint8_t fat_type=0, bool dump_drive=false, bool g_exfat_dump_changed_sectors=false, Stream &Serialx=Serial);
which would probably fit nicely into the FS class wrapper.

If you look at MSCFormatter.ino you will see how we integrated it all. Just a couple of notes:
1. myusb.Task() is from USBHost.
2. The callback that we set up for USB_MSC has the essentials for checking whether a new drive has been inserted or removed, as well as for formatting.

I'm sure you all saw all of this, but this is also so I remember......
 
Yes good morning all (at least in my time zone)

As Mike and I have mentioned, a lot of the stuff in the PFsLib code has to do with partitions. We went through and had sketches do things like take an SD card, delete the partitions, create new ones, and format them...

But there is another interesting thing with the formatting, and I am not sure why yet: I took my test sketch, which was showing the file bleed-through after format, and simply timed the format.
I can post the whole file again, but it boils down to:
Code:
// List files, format, list files
#include <SD.h>
const int chipSelect = 10; // BUILTIN_SDCARD;

void setup()
{
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect.
  }
  if (!SD.begin(chipSelect)) {
    Serial.println("initialization failed!");
    return;
  }
  elapsedMillis emFormat = 0;
  SD.format(0,'*', Serial);
  Serial.printf("Format complete %u\n", (uint32_t)emFormat);
}

void loop()
{
  // nothing happens after setup finishes.
}
And the output included: Format complete 48265

That is in milliseconds, so about 48 seconds. This was on an external SPI 32GB microSD.

I then ran the same setup with the MSCFormatter example sketch we did as part of the MSC stuff.

I put an elapsedMillis around the format call and ran it on the same drive. Our code currently does a format, prints out some details, and returns in under 11 seconds.

Code:
Cards up to 2 GiB (GiB = 2^30 bytes) will be formated FAT16.
Cards larger than 2 GiB and up to 32 GiB will be formatted
FAT32. Cards larger than 32 GiB will be formatted exFAT.

Commands:
  f <partition> [16|32|ex] - to format
  v <partition> <label> - to change volume label
  d <partition> - to dump first sectors
  p <partition> - print Partition info
  l <partition> - to do ls command on that partition
  c -  toggle on/off format show changed data
  *** Danger Zone ***
  N <USB Device> start_addr <length> - Add a new partition to a disk
  R <USB Device> - Setup initial MBR and format disk *sledgehammer*
  X <partition> [d <usb device> - Delete a partition
Waiting up to 5 seconds for a USB drive 

Initialize SDIO SD card...
msc # Partition Table
	part,boot,bgnCHS[3],type,endCHS[3],start,length
FAT32:	1,0,0x82,0x3,0x0,0xC,0xFE,0xFF,0xFF,8192,31108096
pt_#0:	2,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
pt_#0:	3,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
pt_#0:	4,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
drive s Partition 0 valid

Initialize SPI SD card...
msc # Partition Table
	part,boot,bgnCHS[3],type,endCHS[3],start,length
FAT32:	1,0,0x82,0x3,0x0,0xC,0xFE,0xFF,0xFF,8192,62543872
pt_#0:	2,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
pt_#0:	3,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
pt_#0:	4,0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0,0
drive g Partition 0 valid

***** Partition List *****
0(10:1):>> Fat32:  Partition Total Size:15923150848 Used:32768 time us: 87726
1(11:1):>> Fat32:  Partition Total Size:32014073856 Used:32768 time us: 5605261
done...
Enter command:

 **** Start format partition 1 ****

PFsFatFormatter::format................Sector Count: 62543872, Sectors/MB: 2048
Partition Capacity (MB): 30539
Fat Type: 32
    m_dataStart:24576
    m_sectorsPerCluster:64
    m_relativeSectors:8192

format makeFAT32
 MAKEFAT32
m_dataStart: 16384, m_fatSize: 7633, r: 23467
m_dataStart: 24576, m_fatSize: 7632, r: 23465
    m_part: 0
    m_sectorCount: 62543872
    m_dataStart: 24576
    m_sectorsPerCluster: 64
    nc: 976864
    m_fatSize: 7632
partType: 12, m_relativeSectors: 8192, fatStart: 9312, fatDatastart: 32768, totalSectors: 62543872
PFsFatFormatter::initFatDir(32, 15328)
Writing FAT ..................................
free clusters after format: 976991
free clusters after begin on partVol: 976984
Format Done
Elapsed Time format: 10638
Press any key to run again

So why the big difference in speed?
One part of this is that our MSCFormatter sketch is initializing SPI at 33 MHz:
Code:
if(!sdSPI.begin(SdSpiConfig(SD_SPI_CS, SHARED_SPI, SPI_SPEED))) {
whereas SD.begin(10) defaults to 16 MHz.

So I would understand if SD.format() took on the order of 20 seconds.
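
Rough sanity check on that number, assuming the format were limited purely by the SPI clock (so time scales inversely with clock speed). The helper name is made up for illustration:

```cpp
#include <cstdint>

// Hypothetical helper: scale a measured time from one SPI clock to another,
// assuming the operation is purely clock-bound (an assumption, not a fact).
constexpr uint32_t scaleByClock(uint32_t measuredMs, uint32_t measuredMHz,
                                uint32_t targetMHz) {
  // Use 64-bit math for the intermediate product to avoid overflow.
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(measuredMs) * measuredMHz) / targetMHz);
}

// The MSCFormatter run took 10638 ms at 33 MHz; scale that to 16 MHz.
constexpr uint32_t expectedAt16MHz = scaleByClock(10638, 33, 16);
```

That puts a clock-bound 16 MHz run at about 22 seconds, so the 48 seconds measured has to come from something other than the bus speed.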

Probably need to investigate what the Fatlib formatter is doing differently.
 
Quick follow-up to the previous post, to show that it is not our wrapper in the SD library...
Code:
// List files, format, list files
#include "SdFat.h"
#include "sdios.h"
const int chipSelect = 10; // BUILTIN_SDCARD;
SdFat sd;

void setup()
{
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect.
  }

  Serial.print("Initializing SD card...");

  if (!sd.begin(chipSelect)) {
    Serial.println("initialization failed!");
    return;
  }
  Serial.println("Press any key to reformat disk");
  while (Serial.read() == -1);
  while (Serial.read() != -1) ;
  elapsedMillis emFormat = 0;
  sd.format(&Serial);
  Serial.printf("Format complete %u\n", (uint32_t)emFormat);

  Serial.println("done!");
}

void loop()
{
  // nothing happens after setup finishes.
}
Code:
Initializing SD card...Press any key to reformat disk
Writing FAT ................................
Format Done
Format complete 45632
done!

Will update this after changing begin to 33 MHz to see how much difference it makes.

EDIT: Changed to:
Code:
#define SPI_SPEED SD_SCK_MHZ(33)  // adjust to sd card 
  if(!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {

Did not help, actually this run got worse:
Format complete 48847
 
But this is interesting:
I run it with: if(!sd.begin(SdSpiConfig(chipSelect, SHARED_SPI, SPI_SPEED))) {
And it takes this run 46+ seconds.

But now switch it to:
Code:
  pinMode(chipSelect, OUTPUT);
  digitalWriteFast(chipSelect, LOW);
  if(!sd.begin(SdSpiConfig(chipSelect, DEDICATED_SPI, SPI_SPEED))) {
And:
Format complete 2980


That is < 3 seconds!
 
Reran Kurt's format timing test on a SanDisk 32GB Ultra SD card and got similar results:
Code:
SD
Shared 16MHz = 23710
Shared 33MHz = 40367

Dedicated 16MHz = 4381
Dedicated 33MHz = 2550

mscFormatter
Format complete = 7594
 
My understanding of DEDICATED_SPI is it makes use of a write multiple sectors command. But to do this, it must leave CS asserted low between individual sectors, because the library is designed around minimal RAM use. So if you have a flash chip with LittleFS or a sensor or any other SPI chip, and it tries to make use of SPI at the wrong time while SdFat is leaving CS asserted, all that non-SD communication could get written to the SD card and get the card state out of sync with SdFat. That's why it's called DEDICATED_SPI.

Writing multiple sectors is much faster because the underlying media has a large block size. When SHARED_SPI writes just one 512-byte sector, the SD card has to do a read-modify-write operation on a huge block. Doing that over and over for each 512-byte sector is slow.
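
A toy model of that difference, purely for illustration. The 64-sectors-per-block figure used below is an assumption (real cards do not document their internal block size), and the function name is hypothetical:

```cpp
#include <cstdint>

// Toy model only: counts how many times the card would have to rewrite an
// internal flash block. Real card controllers are smarter than this.
constexpr uint32_t blockRewrites(uint32_t sectors, uint32_t sectorsPerBlock,
                                 bool multiSectorWrite) {
  if (!multiSectorWrite) {
    // Each single-sector write triggers its own read-modify-write.
    return sectors;
  }
  // A streamed multi-sector write can fill each block once (rounded up).
  return (sectors + sectorsPerBlock - 1) / sectorsPerBlock;
}
```

For a 7632-sector FAT like the one in the format log earlier in the thread, and the assumed 64 sectors per block, this model gives 7632 block rewrites one sector at a time versus 120 when streamed.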
 
Hi Paul,
FYI I did raise an issue over on the SdFat library (https://github.com/greiman/SdFat/issues/329), which I know you know about as you have posted on it :D

I totally understand, but to me there is a real question in Shared versus Dedicated...

When I make a call like from the test case:
Code:
  Serial.printf("Free Cluster Count: %u dt: ", sd.freeClusterCount());
  Serial.println(emFormat);

freeClusterCount() is not going to return until it has read in all of the sectors associated with the FAT in order to count the free clusters.
And we are not running in a multitasking environment, and even if we were, we would probably not want other tasks to work on the same SPI bus until we were done...

So I am wondering if there is some happy medium?
Code:
Initializing SD card...Free Cluster Count: 976991 dt: 5596
Format complete 46467
Free cluster count takes about 5.6 seconds and format > 46 seconds.

With dedicated:
Code:
Initializing SD card...Free Cluster Count: 976991 dt: 1165
Format complete 2952

1.1 second and < 3 seconds...

So again wondering if there is a way to say: For this operation work dedicated...

I was actually shocked when I saw the differences in timing.
 
Paul,
Is some of this best to discuss here, email, or Github issue or ???

For example, it looks like some of the difference in speed is in the SPI timing gap between bytes.

Here is SHARED:
screenshot.jpg

versus dedicated:
screenshot2.jpg

Note: these are from reading in the FAT sectors.

Then there are larger gaps, although it is hard to see where it is taking 5 times longer...
 
Paul - Kurt

While we are talking about SdFat with SHARED_SPI vs DEDICATED_SPI: I am curious why the MSC formatter using PFsLib shows a significant performance improvement over SdFat.

And before you ask about SPI and initialization, we have
Code:
SdFs sdSPI;

and when we do the begin for the SD Card:
Code:
sdSPI.begin(SdSpiConfig(SD_SPI_CS, SHARED_SPI, SPI_SPEED))

So I am not sure why we are seeing roughly 8 seconds for the MSC FAT formatter vs 40+ seconds for the SdFat formatter.

Not trying to be contrary but trying to understand...
 
I'm a little concerned we may get bogged down in performance optimization rather than focusing on the goal of MTP integration. Yes, I know Windows doesn't allow much time, but even with optimizations we can't depend on every card to always be fast. We have to properly handle the case of slow hardware.
 

I totally hear you, but at times I am not sure what to do inside of MTP. For example, take formatStore, which is called by the appropriate message from the host, and process it in a straightforward way like this.
Warning: I edited out some of the attempted workaround stuff, so this may not compile...

Code:
uint32_t MTPD::formatStore(int len, int id, uint32_t storage, uint32_t p2) {
  printf(" MTPD::formatStore called\n");
  uint32_t store = Storage2Store(storage);
  uint8_t format_status = storage_->formatStore(store, p2);

  if (format_status) {
    storage_->ResetIndex(); // maybe should add a less of sledge hammer here.
    return MTP_RESPONSE_OK;
  }

  return MTP_RESPONSE_OPERATION_NOT_SUPPORTED; // 0x2005
}
And let's say a format normally completes within a reasonable time, like less than 10 seconds. Then Windows shows a something-is-happening status (a bar that partly fills and cycles), and when we return a status of MTP_RESPONSE_OK, that dialog goes away and is replaced by format successful, at which time Windows sends a request for the updated object list...

Now if we take 40+ seconds, Windows will error out, I hear a beep, and MTP is toast until the Teensy is rebooted...

With the callback code version, I had hacks in place: I call the callback function and ask it if it thinks it can format itself in a reasonable time. The SD version, I believe, always said NOPE.
So I instantly returned OK to Windows, and then Windows gave you the all-is-clear dialog. In the meantime I started up the real format. Right after returning the OK status, I start an interval timer that reads messages sent by Windows, like the request for the updated list of objects. This timer code would return a status of I am busy... Without that, Windows would again time out on that message and MTP was toast.

With the FS-only approach I don't have that callback to the storage to ask if it is going to be fast enough... I could do some real hack like: if the total size of the storage is > X (like 8GB), assume NO and see if that works.

But in cases like this I am not sure if it is best to spend time understanding why it is so sssslllow and see if that can be easily resolved, or to spend time working through avoidance and recovery code...

Hard to say.
 
@mjs513 @Paul and all:

Just merged in some changes from my WIP into Main.

So the format code sort of works like before when we had callbacks, but instead just decides if the storage is over 8GB in size then do it with a callback. Else do it inline.

I played around with both T4.x and T3.6.

Warning I may have broken some case of building without FS Only mode.
 
but instead just decides if the storage is over 8GB in size then do it with a callback.

We need to eventually eliminate all non-FS access to storage for MTP to be brought into the core library. We can add more APIs in FS if needed, but relying on any media specific callback class means this can't be part of the core library.
 
Sorry, bad choice of words...
Everything I am compiling now is FS only. And yes, I will be moving all of the non-FS-only mode code out, maybe creating an archive of it, and then removing it from master.

Again I am not liking this code, but what it is doing is: in MTPTeeensy.cpp, the method
formatStore starts off like:
Code:
uint32_t MTPD::formatStore(uint32_t storage, uint32_t p2, bool post_process) {
  printf(" MTPD::formatStore called post:%u\n", post_process);
  uint32_t store = Storage2Store(storage);
  // see if we should bail early.
  if (!post_process) {
    // lets guess if we can do inline.
    uint64_t totalsize = storage_->totalSize(store);
    if (totalsize > (8l*1024*1024)) {
      printf(">>> Storage size %lu - defer format\n", totalsize);
      return 0;  // I know a hack...
    }  
  } else {
So it is calling through storage to the FS to get the size, and then I arbitrarily choose a cutoff... which I see is not large enough: I meant 8GB, not 8MB. Then it bails...
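
For reference, a sketch of the cutoff as intended (8GB rather than 8MB). The names are hypothetical. Note that simply adding another *1024 to the 8l expression would overflow, since long is 32 bits on these ARM boards, so the constant is written as unsigned long long:

```cpp
#include <cstdint>

// Assumed cutoff from the post: defer the format for anything over 8 GB.
// The ULL suffix keeps the arithmetic 64-bit even where long is 32 bits.
constexpr uint64_t kDeferFormatThreshold = 8ULL * 1024 * 1024 * 1024;

// Hypothetical helper; totalSize would come from storage_->totalSize(store).
constexpr bool shouldDeferFormat(uint64_t totalSize) {
  return totalSize > kDeferFormatThreshold;
}
```

Both the 16GB and the 32GB partitions from the earlier partition list would be deferred with this cutoff.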

So in the main loop code we have:
Code:
        case 0x100F: // FormatStore
          return_code = formatStore(p1, p2, false);
          if (return_code == 0) {
            do_defered_format = true;
            return_code = MTP_RESPONSE_OK;
          }
          break;

And then after we send a response back to the host, we then do:
Code:
    if (return_code) {
      CONTAINER->type = 3;
      CONTAINER->len = len;
      CONTAINER->op = return_code;
      CONTAINER->transaction_id = id;
      CONTAINER->params[0] = p1;
#if DEBUG > 1
      printContainer();
#endif
      usb_tx(MTP_TX_ENDPOINT, receive_buffer);
      receive_buffer = 0;
    } else {
      usb_free(receive_buffer);
    }

    if (do_defered_format) {
      printf("### Starting deferred format\n");
      formatStore(p1, p2, true);
    }

Which will hang MTP out to dry for however long it takes for the format to complete. It starts up an interval timer to check for messages and echo the busy status, which works for some things.
But there is no user interface feedback to let the user know a format is happening in the background, nor when it completes. I have not found any method within MTP to do so.

Later maybe we will have another method to fine-tune when we need to do this. For example, if getting the number of used or free clusters takes more than X time, we save away that we are probably working with a real real real slow device.
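
A minimal sketch of that idea, with hypothetical names and an arbitrary limit. The caller would time the first free-cluster count (for example with elapsedMillis) and pass the result in:

```cpp
#include <cstdint>

// Hypothetical tracker: remembers whether a probe operation (for example the
// first free-cluster count) took longer than a chosen limit.
class SlowStoreDetector {
public:
  explicit SlowStoreDetector(uint32_t limitMs) : limitMs_(limitMs) {}

  // Record how long the probe took; once marked slow, the store stays slow.
  void recordProbe(uint32_t elapsedMs) {
    if (elapsedMs > limitMs_) slow_ = true;
  }

  bool isSlow() const { return slow_; }

private:
  uint32_t limitMs_;
  bool slow_ = false;
};
```

With a 2000 ms limit (picked arbitrarily here), the 1165 ms dedicated-mode count above would not flag the card, while the 5596 ms shared-mode count would.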
 
I have some ideas about speeding up the shared SPI case, but it will result in a pretty substantial fork from Bill's original code. Not sure how I feel about that yet...
 

Ok, I have been pretty silent during this conversation, but I am going to put my foot in my mouth with this post (PS: forgive me if I sound cranky, not feeling well tonight).

Just ran @KurtE's latest changes on a 32GB FAT32 SD card and a 512GB exFAT SD card.
1. Ran MTP_test and then did a format on the SPI 32GB SD card. It took over a minute to format. When it completed I did a refresh and lost the drive on the Audio shield. Then I tried to access the SD card on SDIO, got the beep, and could not do anything else.
2. While formatting you can't do anything; if you try to do anything else with MTP it just hangs.
3. Formatting an exFAT card on SPI seems to work better, and MTP is still operational after formatting. But I am not sure it will still work if you go over 512GB; that's my largest card :)

Right now, unless you have a small card on SPI, I am not sure formatting is going to work in MTP unless it is formatted as exFAT. Just my two cents.

As for pulling most of Bill's code: that is pretty much what we were doing with PFsLib, sort of, but with partition support. Formatting using the MSC formatting code seems to work a lot faster even with slow SD cards (32GB in maybe 10-12 seconds) using SHARED_SPI. So maybe for now leave FAT formatting alone until we bring in PFsLib (since we are going to do that anyway for MSC) and then switch the formatting over to PFsLib; it might work better.

Think Kurt alluded to some of this in his earlier posts.

Might be another way as well that Kurt and I are talking about.
 
Maybe we need to rethink the FS format() API? Perhaps we need a more complex incremental API, and matching incremental format code in each library offering formatting?
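
To make the incremental idea concrete, here is one possible shape for it. This is only a sketch: no such API exists in FS today, and the class and method names are invented. The point is that each call does a bounded amount of work, so the MTP loop could answer the host between steps:

```cpp
#include <cstdint>

// Hypothetical incremental formatter: does a bounded chunk of work per call
// so the caller can service USB/MTP traffic in between. Not an existing API.
class IncrementalFormatter {
public:
  IncrementalFormatter(uint32_t totalSectors, uint32_t sectorsPerStep)
      : total_(totalSectors), step_(sectorsPerStep ? sectorsPerStep : 1) {}

  // Performs one step; returns true while more work remains.
  bool step() {
    uint32_t remaining = total_ - done_;
    uint32_t n = remaining < step_ ? remaining : step_;
    // A real implementation would write n sectors of FAT/metadata here.
    done_ += n;
    return done_ < total_;
  }

  // Progress in percent, for status reporting.
  uint32_t percentDone() const {
    if (total_ == 0) return 100;
    return static_cast<uint32_t>((static_cast<uint64_t>(done_) * 100) / total_);
  }

private:
  uint32_t total_;
  uint32_t step_;
  uint32_t done_ = 0;
};
```

A caller would loop on step(), checking for host messages and reporting busy (or progress) between calls, instead of blocking inside a single format() for the whole duration.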
 

Maybe, although before that I still want to do a few experiments... read that as hacks :D

Which might include one or more of the following:

a) Subclass the SDClass object and override the format method (or simply edit the current version).

b) In the format method, could try:
before the format do a sdfat.begin(... DEDICATED_SPI), do the format call, and then afterwards do begin(... SHARED_SPI)...
Note: only when using SPI...

c) Have it call our PFS class format, which was also faster...

Then we can decide if any of these hacks work well and if there are ways to clean them up...
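
Option b) could be wrapped in a small scope guard so the switch back to shared mode cannot be forgotten. This sketch uses a stand-in FakeSd type and SpiMode enum (both invented here) in place of the real sd.begin(SdSpiConfig(...)) calls, and whether re-running begin() around a format is safe on real hardware is exactly what still needs testing:

```cpp
#include <vector>

// Stand-ins for illustration only; real code would call
// sd.begin(SdSpiConfig(cs, DEDICATED_SPI, speed)) and the SHARED_SPI variant.
enum class SpiMode { Shared, Dedicated };

struct FakeSd {
  std::vector<SpiMode> history;  // records every begin() call
  bool begin(SpiMode m) { history.push_back(m); return true; }
  bool format() { return true; }  // pretend format
};

// Scope guard: switch to dedicated mode on entry and back to shared on exit,
// even if the format path returns early.
template <typename Card>
class ScopedDedicatedSpi {
public:
  explicit ScopedDedicatedSpi(Card &card)
      : card_(card), ok_(card.begin(SpiMode::Dedicated)) {}
  ~ScopedDedicatedSpi() { card_.begin(SpiMode::Shared); }
  bool ok() const { return ok_; }

private:
  Card &card_;
  bool ok_;
};

template <typename Card>
bool formatDedicated(Card &card) {
  ScopedDedicatedSpi<Card> guard(card);
  return guard.ok() && card.format();
}
```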

But sort of done for today!
 
Morning all - this morning I think the first thing I am going to try is to remove all of the non-FS version of the code.
First thing I did was create a new branch, archive_event_class_code,
so we won't lose it if parts of it come in handy.

For example, I will experiment with some of the code I had in the SD wrapper class after this.
 
Quick update: Pass 1 done.
Removed all of the Storage_SD defines and code.
Also removed Callback code.

I believe I updated all of the examples to have the right storage class.
I also updated most of the storage classes to not use what was the LittleFS callback class and files.

Many of the examples build with this including: simple, my version of Simple for T4 with Audio, MTP-Test, MTP-Logger.
Some others specific to LittleFS as well

Some others that use the MSC callback as well as my combined SD class still need some work or removal...

Now back to playing... Probably testing and experimenting with slow SD...
 
I thought you might like that...
SDMTPClass, I left as I thought I might use it as an experiment like:
Code:
class SDMTPClass : public SDClass {
public:
	uint64_t usedSize();
	bool format(int type=0, char progressChar=0, Print& pr=Serial);

And experiment with code similar to what Mike posted on the SdFat issue,
where if the SD object is running on SPI, we first switch to DEDICATED SPI mode, then do the format, then switch back to SHARED mode.

Likewise, I was going to do the same for usedSize on the first time it is called... Subsequent calls can just go through as the Fat object will cache it.

But as I am trying to ask on the issue: take the current SDClass::format code
Code:
bool SDClass::format(int type, char progressChar, Print& pr)
{
	SdCard *card = sdfs.card();
	if (!card) return false; // no SD card
	uint32_t sectors = card->sectorCount();
	if (sectors <= 12288) return false; // card too small
	uint8_t *buf = (uint8_t *)malloc(512);
	if (!buf) return false; // unable to allocate memory
	bool ret;
	if (sectors > 67108864) {
#ifdef __arm__
		ExFatFormatter exFatFormatter;
		ret = exFatFormatter.format(card, buf, &pr);
#else
		ret = false;
#endif
	} else {
		FatFormatter fatFormatter;
		ret = fatFormatter.format(card, buf, &pr);
	}
	free(buf);
	if (ret) {
		// TODO: Is begin() really necessary?  Is a quicker way possible?
		begin(cspin);
	}
	return ret;
}
How does this work: begin(cspin);

If the user never called SD.begin();
But instead did something like: ok = SD.sdfs.begin(SdSpiConfig(chipSelect, SHARED_SPI, SD_SCK_MHZ(24)));

The member variable: uint8_t cspin = 255;
Will never have been updated.

That is why I am asking on the issue how I can get this information from, in this case, the sdfs object.

What I wonder at times is whether it would make sense in the SD library to expand the begin method:
Code:
	bool begin(uint8_t csPin = 10) {

to allow the additional information to be passed in, something like:
Code:
	bool begin(uint8_t csPin = 10, uint32_t maxSpeed=SD_SCK_MHZ(16), uint8_t opt=0) {
Where we could save away that data as part of the SD object, and use it for cases like this. Also change the examples to
use this instead of doing the begin to sdfs.
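
A rough sketch of what saving the begin() parameters might look like. The class and struct names are hypothetical, 16000000 stands in for SD_SCK_MHZ(16), and the real version would pass these values on to sdfs.begin(SdSpiConfig(...)):

```cpp
#include <cstdint>

// Hypothetical record of what begin() was given, so a later re-begin
// (for example after a format) can reuse it.
struct SavedSdConfig {
  uint8_t csPin = 255;      // 255 means begin() was never called
  uint32_t maxSpeedHz = 0;
  uint8_t opt = 0;
  bool valid() const { return csPin != 255; }
};

class SdBeginRecorder {
public:
  bool begin(uint8_t csPin = 10, uint32_t maxSpeedHz = 16000000,
             uint8_t opt = 0) {
    cfg_.csPin = csPin;
    cfg_.maxSpeedHz = maxSpeedHz;
    cfg_.opt = opt;
    return true;  // real code would return the result of the underlying begin
  }

  const SavedSdConfig &config() const { return cfg_; }

private:
  SavedSdConfig cfg_;
};
```

With something like this, format() could check config().valid() before attempting its re-begin instead of relying on cspin having been set.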

And in cases like this maybe we for example don't support format or card present... Not sure how you are now detecting SPI versus builtin... Although maybe you don't need to as you are maybe simply talking to card...

As for on our own: it is interesting that Bill mentioned they are using the older style SPI.transfer(buffer, count), which causes you to have to copy or fill buffers...
Maybe we should look at fixing it to use the other transfer method and see how much that helps.
 
@Paul/@mjs513 I pushed up first version hack wrapper class of SD object. Where for both Format and usedSize I try to go into DEDICATED mode...

Not working yet, but in case you are interested. Note: I pushed up one test program with it in it.
 