Teensyduino File System Integration, including MTP and MSC

Oh, *that* board. Seems like I would have built one for myself, but can't find it now. Fortunately I do have spare parts.

View attachment 26818

Can you remind me which chip is on each CS pin? It would be nice to build another one the same way.

@defragster I ordered a new set where I extended out the pads, to see if my thumbs will have any success soldering it then... :D
screenshot3.jpg

As for pins and which chip is on each... As you mentioned, it is nice at times to have the same ones on the same pins...

But I am now playing around with different boards, and for example trying to avoid pin 6 so that I can also plug in an Audio board with memory or a Prop Shield...

So in at least one sketch up in our MTP stuff, I have the SPI and QSPI sort of abstracted out with pretty simple wrapper classes, like:
Code:
#include <LittleFS.h>  // for LittleFS_SPIFlash / LittleFS_SPIFram / LittleFS_SPINAND

//----------------------------------------------------------------------------
// Experiment with LittleFS_SPI wrapper
//----------------------------------------------------------------------------
class lfs_spi {
public:
  lfs_spi(uint8_t pin) : csPin(pin) {}
  bool begin();
  inline LittleFS * fs() { return plfs;}
  inline const char * displayName() {return display_name;}
  // You have full access to internals.
  uint8_t csPin;
  LittleFS_SPIFlash flash;   // NOR flash
  LittleFS_SPIFram fram;     // FRAM
  LittleFS_SPINAND nand;     // NAND flash
  LittleFS *plfs = nullptr;  // points at whichever of the above begin() succeeded on
  char display_name[10];     // e.g. "Flash_10"
};
bool lfs_spi::begin() {
  //Serial.printf("Try SPI Pin %u", csPin);
  if (flash.begin(csPin)) {
    //Serial.println(" *** Flash ***");
    sprintf(display_name, (const char *)F("Flash_%u"), csPin);
    plfs = &flash;
    return true;
  } else if (fram.begin(csPin)) {
    //Serial.println(" *** Fram ***");
    sprintf(display_name, (const char *)F("Fram_%u"), csPin);
    plfs = &fram;
    return true;
  } else if (nand.begin(csPin)) {
    //Serial.println(" *** Nand ***");
    sprintf(display_name, (const char *)F("NAND_%u"), csPin);
    plfs = &nand;
    return true;
  }
  //Serial.println(" ### Failed ###");
  return false;
}
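
For example, a quick sketch of how the wrapper gets used (pin numbers here are hypothetical, and the addFilesystem() call is just a placeholder for however the MTP storage actually gets registered):
Code:
// Probe a few candidate CS pins and report whatever answers.
lfs_spi lfs3(3), lfs4(4), lfs5(5);
lfs_spi *disks[] = {&lfs3, &lfs4, &lfs5};

void setup() {
  Serial.begin(115200);
  for (auto *disk : disks) {
    if (disk->begin()) {
      Serial.printf("Found %s on CS pin %u\n", disk->displayName(), disk->csPin);
      // storage.addFilesystem(*disk->fs(), disk->displayName()); // placeholder
    }
  }
}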

Tried some more complex wrappers, but decided (at least for me) to go with something like the above (KISS) or to build a super class that handles all three of them natively.

Not sure if it makes sense to include wrapper classes like this somewhere?
 
Ok, back to sflash5. Normally for testing I have been using the MTP_test_integrity sketch so I can get access to all devices. But I just did a hack to the LFS_MTP_SPI_SIMPLE_DATALOGGER sketch to focus on sflash5.

I ran the code from msg #640, but I'm not sure what to do to see the problem. I copied LargeIndexedTestfile.txt onto the sflash5 volume using Windows 10. Then I unplugged and reconnected it, and copied it to a different folder. Then I compared the 2 files. They're identical.

But connecting it to a Linux machine, reading the file fails. Is that the issue?

Or do I run the datalogger stuff?
 
One of the main issues we run into is copying some larger files to it. After a fresh low-level format they often work. But if you then delete a few files or the like and then try to upload a large file, the writes go really slow and the host gets tired and bails on us. You will see the zero-length packet, and now we detect it and bail with an incomplete-transfer status.

We print out statistics like max time, average and the last 10…

I have another branch with an experiment to read packets on an interval timer, to try to keep the host happier.
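
Roughly the shape of that experiment (just a sketch; usb_mtp_available()/usb_mtp_recv() are my stand-ins for whatever the branch actually calls, and the FIFO is simplified):
Code:
// Poll USB on a timer so packets keep draining while a slow write is in progress.
// Stand-ins for the core's USB MTP calls (hypothetical signatures):
extern "C" int usb_mtp_available(void);
extern "C" int usb_mtp_recv(void *buf, uint32_t timeout);

static IntervalTimer usbReadTimer;
static uint8_t fifo[4][512];                // tiny packet FIFO; write path consumes fifo[tail]
static volatile uint8_t head = 0, tail = 0;

static void pollUSB() {
  if (((head + 1) & 3) == tail) return;     // FIFO full, skip this tick
  if (usb_mtp_available()) {                // anything waiting from the host?
    usb_mtp_recv(fifo[head], 0);            // pull it in so the host sees progress
    head = (head + 1) & 3;
  }
}

// started at the top of SendObject():
//   usbReadTimer.begin(pollUSB, 15000);    // poll every 15ms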
 
As a follow-on to @KurtE's comments, what I am typically seeing from the sketch is:
Code:
LP:41500 CMD: 100d(SEND_OBJECT)l: 12 T:29
MTPD::SendObject: len:13617870

MTPD::SendObject *** USB Read 0 bytes ***

MTPD::SendObject *** USB Read 0 bytes ***
>> Updating timestamps
Size of mdt = 1609459200
>> File Close
 # USB Packets: 461 total: 0 avg ms: 0 max: 0
 # Write: 116 total:3056 avg ms: 26 max: 54(18)
  >> >=100ms: 0 50:12 40:46 30:0 20:0 10:0
  >> Last write times
  1 6 47 6 45 6 46 6 43 6 44 5 51 5 44 5 48 6 52 5 51 5 46 6 45 6 44 6 50 6 47 6
>>>Total Time: 5064150
46565 RESP:2007(RSP:INCOMPLETE_TRANSFER)l: 12 T:29

As Kurt said, after a low-level format copying large files seems to work, but once you delete them and then try to copy the large files again you get incomplete transfers.
 
As an example of this: if I select sflash5 by doing a '21', which selects drive 1 (sflash5), and do a low-level format:
Code:
Dump Storage list(2)
store:0 storage:10001 name:RAM fs:20004bb8
store:1 storage:20001 name:sflash5 fs:20004ce8

Drive # 1 Selected

*** Erase/Format started ***
..................................
Completed, sending device reset event
*MTPD::send_Event(400b)
I can successfully transfer 2 large index files
Code:
SendObjectInfo: 131073 4294967295 20003d60: ST:0 F:3000 0 SZ:22015988 3000 0 0 0 0 0 0 0 0 0 0 : LargeIndexedTestfileOrig.txt
Read DateTime string: 20211208T232413.0
>> date/Time: 61b13e9d 12/8/2021 23:24:13
Created: 61b13e9d
Read DateTime string: 20211114T020456.0
>> date/Time: 61906ec8 11/14/2021 2:4:56
Modified: 61906ec8
18147 RESP:2001(RSP:OK)l: 24 T:9 : 20001 ffffffff 14
LP:18147 CMD: 100d(SEND_OBJECT)l: 12 T:a
MTPD::SendObject: len:22015988
>> Updating timestamps
Size of mdt = 1636855496
>> File Close
 # USB Packets: 43000 total: 62 avg ms: 0 max: 1
 # Write: 10750 total:73577 avg ms: 6 max: 167(8173)
  >> >=100ms: 1 50:1 40:0 30:0 20:0 10:0
  >> Last write times
  6 7 6 7 6 7 6 7 6 7 6 7 7 6 8 6 7 6 7 6 7 6 7 6 7 6 8 5 8 5 8 5
>>>Total Time: 73667574
SendObject() returned true

=================================
SendObjectInfo: 131073 4294967295 20003d60: ST:0 F:3000 0 SZ:22015988 3000 0 0 0 0 0 0 0 0 0 0 : LargeIndexedTestfileOrig1.txt
Read DateTime string: 20211208T232530.0
>> date/Time: 61b13eea 12/8/2021 23:25:30
Created: 61b13eea
Read DateTime string: 20211114T020456.0
>> date/Time: 61906ec8 11/14/2021 2:4:56
Modified: 61906ec8
94904 RESP:2001(RSP:OK)l: 24 T:20 : 20001 ffffffff 15
LP:94905 CMD: 100d(SEND_OBJECT)l: 12 T:21
MTPD::SendObject: len:22015988
>> Updating timestamps
Size of mdt = 1636855496
>> File Close
 # USB Packets: 43000 total: 76 avg ms: 0 max: 1
 # Write: 10750 total:74197 avg ms: 6 max: 407(9686)
  >> >=100ms: 3 50:0 40:0 30:0 20:0 10:0
  >> Last write times
  7 8 7 8 6 8 7 8 7 8 6 8 8 7 9 7 7 7 9 6 8 7 8 7 7 7 8 7 7 6 8 7
>>>Total Time: 74302201
SendObject() returned true
Note the times.

Now if I delete them and try to copy them again, one seems to succeed and the second fails with 0-byte packet reads.
Code:
Size of mdt = 1636855496
>> File Close
 # USB Packets: 43000 total: 55 avg ms: 0 max: 1
 # Write: 10750 total:75856 avg ms: 7 max: 146(7111)
  >> >=100ms: 1 50:1 40:0 30:0 20:0 10:0
  >> Last write times
  7 7 6 8 6 8 6 8 6 8 6 8 7 6 9 6 7 6 9 6 7 6 8 6 8 6 8 6 7 6 8 6
>>>Total Time: 75937270
SendObject() returned true
and
Code:
LP:1776064 CMD: 100d(SEND_OBJECT)l: 12 T:40
MTPD::SendObject: len:22015988

MTPD::SendObject *** USB Read 0 bytes ***

MTPD::SendObject *** USB Read 0 bytes ***
>> Updating timestamps
Size of mdt = 1636855496
>> File Close
 # USB Packets: 2529 total: 5 avg ms: 0 max: 1
 # Write: 633 total:7912 avg ms: 12 max: 261(448)
  >> >=100ms: 1 50:1 40:91 30:0 20:0 10:0
  >> Last write times
  1 7 44 7 44 6 45 6 45 6 45 6 46 6 42 6 42 6 45 6 45 6 45 6 43 6 43 7 46 6 45 6
>>>Total Time: 9926840
1785991 RESP:2007(RSP:INCOMPLETE_TRANSFER)l: 12 T:40
You can also do a format from Windows Explorer and get the same thing, and it happens quicker.
 
Got curious, and you know what happens. Anyway, I put the 512Mb chip on my QSPI board

IMG-0556.png

and exactly the same thing is happening for it as described for SPI in the previous posts.

EDIT: I did verify that transferring the 21MB index file to the 512 chip on a T3.6 does work, even though it takes forever.
 
KurtE said:
One of the main issues we run into is copying some larger files to it. After a fresh low-level format they often work. But if you then delete a few files or the like and then try to upload a large file, the writes go really slow and the host gets tired and bails on us. You will see the zero-length packet, and now we detect it and bail with an incomplete-transfer status.

We print out statistics like max time, average and the last 10…

I have another branch with an experiment to read packets on an interval timer, to try to keep the host happier.

Missed these posts while getting image of the memory board ...

So LittleFS is formatting as it goes and taking too long? Or the internal block finding of lfs_fs_traverse() thrashes for a long time.

That is where background formatUnused() could help 'garbage collecting'. Though with the limited implementation/intervention it:
> doesn't know where the lfs 'write head' is and starts at block 0, so it may not be acting on the next blocks to be used.
-> perhaps a clue would be the last block lfs asked to 'format' before use.
> doesn't track the blocks in use, and when they're 'dirty and unused' can't do garbage collection to a known point

The lookahead_size 'cache' reserve for 'lfs' usage holds a bitmap of the blocks used. If that cache is 'big enough' to cover the 'blocks of the media at hand' - it would never need to discard and discover where the next usable block is.
> walking the lfs_fs_traverse() dir tree block usage takes a long time. When it is done for formatUnused() usage IIRC it can be over 20ms on larger/slower media that isn't empty ( i.e. the many folders/files created in Integrity testing on these FLASH units )
-> This would add huge time overhead when this "brute force traversal" is in progress, way more than formatting some number of blocks.

Simple fix might be to assure the space provided to lfs will hold enough bits for all blocks on the given media. For 32K blocks that would be 4KB of RAM.
> seems that "config.lookahead_size = info->progsize;" is now a 'fixed' progsize == 256:
Code:
PROGMEM static const struct chipinfo {
	uint8_t id[3];
	uint8_t addrbits;	// number of address bits, 24 or 32
	uint16_t progsize;	// page size for programming, in bytes
	uint32_t erasesize;	// sector size for erasing, in bytes
	uint32_t chipsize;	// total number of bytes in the chip
	uint32_t progtime;	// maximum microseconds to wait for page programming
	uint32_t erasetime;	// maximum microseconds to wait for sector erase
} known_chips[] = {
	{{0xEF, 0x40, 0x15}, 24, 256, 4096, 2097152, 3000, 400000}, // Winbond W25Q16JV*IQ/W25Q16FV

A comment previously added in the code shows it should be = config.block_count/8.

Looking at the code (not memory from way back) - this comment is there:
Code:
	config.cache_size = info->progsize;
	config.lookahead_size = info->progsize;
	// config.lookahead_size = config.block_count/8;
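
For concreteness, a sketch of that sizing arithmetic (my reading of the lfs config above, untested):
Code:
// Cover every block of a W25Q512 with the lookahead bitmap:
// 64MB chip / 4KB erase sectors = 16384 blocks; 16384 / 8 = 2048 bytes.
uint32_t lookaheadBytesFor(uint32_t chipsize, uint32_t erasesize) {
  uint32_t block_count = chipsize / erasesize;  // 67108864 / 4096 = 16384 blocks
  return block_count / 8;                       // 2048 bytes, vs. the fixed 256 today
}
// e.g. config.lookahead_size = lookaheadBytesFor(info->chipsize, info->erasesize);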

from DESIGN.md:
Code:
The block allocator in littlefs is a compromise between a disk-sized bitmap and
a brute force traversal. Instead of a bitmap the size of storage, we keep track
of a small, fixed-size bitmap called the lookahead buffer. During block
allocation, we take blocks from the lookahead buffer. If the lookahead buffer
is empty, we scan the filesystem for more free blocks, populating our lookahead
buffer. In each scan we use an increasing offset, circling the storage as
blocks are allocated.

Though it may be that lfs would always reuse the first abandoned blocks and never 'wear level', if "circling the storage" was the scheme to assure that, instead of keeping a 'write head' start location at the last space used from the lookahead.

<edit> Seeing I missed more than was seen without a refresh before posting. Sample board already soldered.
 
defragster said:
Simple fix might be to assure the space provided to lfs will hold enough bits for all blocks on the given media. For 32K blocks that would be 4KB of RAM.
> seems that "config.lookahead_size = info->progsize;" is now a 'fixed' progsize == 256:

That is one of the things I tried to no avail. Some of the things I tried on the LFS side:
1. Adjust cache size
2. Adjust lookahead size
3. Adjust cache and lookahead size
4. Adjust erase size to 32K from 4K - that got a little bit farther but still failed with the same symptoms
5. Specifically putting the chip into 4-byte mode (not really needed but had to check).

Now of course it could be a combination of these, not using the right setting, or something else. But since it works on the T3.6, it makes me think it's something outside of LFS.
 
Mike said:
That is one of the things I tried to no avail. Some of the things I tried on the LFS side:
1. Adjust cache size
2. Adjust lookahead size
3. Adjust cache and lookahead size
4. Adjust erase size to 32K from 4K - that got a little bit farther but still failed with the same symptoms
5. Specifically putting the chip into 4-byte mode (not really needed but had to check).

Now of course it could be a combination of these, not using the right setting, or something else. But since it works on the T3.6, it makes me think it's something outside of LFS.

Odd - that post is a repeat of the past reading when doing the formatUnused. Not sure all listed was tried here. And not sure now what WIP updates make up a current system, with intermittent attention here.

Wonder where, when, and how much time is lost in the lfs calls? Might give an idea of the depth/nature of the time sink, i.e. large lumps (dir traverse) or many small repeated delays (formatting).

One quick test would be, rather than a low-level format, do a complete myfs.formatUnused( 0, 0 ); run and see if it ever fails on the next restart after that, given the same starting image that broke. For a file of that size it may just be the collective formatting time - not helped by the search for free space.

I saw large traverse times with multiple dirs/files - but it may be just as bad for single large files if it has to follow the alloc chain to ID the blocks used when the files are 20MB, and I can't say that was differentiated in the times reported.
 
defragster said:
Odd - that post is a repeat of the past reading when doing the formatUnused. Not sure all listed was tried here. And not sure now what WIP updates make up a current system, with intermittent attention here.
Er - yes, all was tried and showed the same behavior where the USB packet transfer got stuck.

defragster said:
One quick test would be, rather than a low-level format, do a complete myfs.formatUnused( 0, 0 ); run and see if it ever fails on the next restart after that, given the same starting image that broke. For a file of that size it may just be the collective formatting time - not helped by the search for free space.
The image didn't break, because I can still transfer small files as well as open files. If the dir pair (0/1) was corrupted we wouldn't be able to do that, and mounting would become an issue.

defragster said:
One quick test would be, rather than a low-level format, do a complete myfs.formatUnused( 0, 0 ); run and see if it ever fails on the next restart after that, given the same starting image that broke. For a file of that size it may just be the collective formatting time - not helped by the search for free space.
Seems to be causing issues with MTP - on completion it loses the drive and hangs the sketch :(
 
Good Morning all,

Lots of posts since I last responded. Luckily most of the things I would mention, Mike has already mentioned :D But I will mention a few, plus a few more background / "I don't understand" type comments.

Paul (and others): Some basic understandings of USB stuff, like:

Bulk Transfers:

a) T4 will try to use high speed (512-byte packets); T3.x will use full speed (64-byte packets). Note: if the T4 connects at full speed it will also use 64-byte packets. Need a setup to force the T4 to full speed to see if there are any differences? Note: I did add some code to the cores and MTP code to not always assume 512 bytes, but I probably missed some.

b) How big is the bulk transfer? On a T4.x on Windows, Wireshark made it look like it was receiving 127 packets followed by 1, then 127, 1, ... for SendObject, whereas Ubuntu looked like 32... I have not looked at whether that is different on a T3.6 with 64-byte packets. Also, I don't understand: is that the host saying "I want these sizes"? Does the Teensy have any say? Is this a real thing, or only how Wireshark interprets/reports the data?

MTP Timeouts: (some could be under bulk... as well)

a) SendObject - what criteria does MTP use to fail/timeout? I don't think it is the total time of the operation, as Mike mentioned he was able to receive the large file on T3.x even though it took forever. Instead I think it is not seeing enough progress in receiving the data, though I am not sure exactly what that means. I have experimented a few different ways:
1) Make the T4 act like a T3... which is: receive a packet, write the data out, receive the next... There is a version in the code (//#define T4_USE_SIMPLE_SEND_OBJECT) which still times out and runs really slow.
2) Do reads in the background while a write operation is in progress (in the branch throwing_darts). Start up an interval timer that runs maybe every 15ms, and if there is room in the second output buffer, it reads in the next packet... so as to try to keep it busy... But this was still timing out. I was experimenting with a larger FIFO buffer, where maybe we try to read up to 10K of data ahead... but that brings up other issues. ...

But again, how does it decide to fail? For example, with the Ubuntu code for playing with RawHID 512 and trying to make a version that uses bulk transfers, the host code might call usb_bulk_write(hid->usb, hid->ep_out, buf, len, timeout);

So for example, if I am going to send a 3MB file, is it likely that they are calling something like this, in the case of Ubuntu with a 16KB buffer (32*512), and to avoid MTP failing we need to make enough progress to read in each 16K chunk? Is Windows worse, as it uses bigger transfers? Again, is there any way to negotiate this? (See the host-side sketch after this list.)

b) Responding to commands, timeouts - if we don't respond to a message fast enough, MTP will abort on the PC and require something like a reboot of the Teensy.

1) SendObject - again like before, but if, for example, we did a large cache to read everything in and then continued to write the data out, MTP will fail if we don't send the response packet within what it thinks is a reasonable time.

2) Startup - there is code in place now that starts an interval timer to respond to a few packets, and then responds with "we are busy". That is why sometimes the MTP disk will show up on the PC but be empty unless you do a refresh... otherwise it would just simply fail.

3) CopyObject/MoveObject - if asked to copy a file from one of our storages to another and that takes too long, MTP will quit... I have not done anything on this one other than verify it is an issue. For example, if we copy a 3MB bitmap to a LittleFS and have to erase each 4K, with each erase taking an average of 50ms, then this operation (750 erases) might easily take over 30 seconds. For sure MTP would time out.

So again we probably need an interval timer such that, after a certain time, it responds back to the host (probably saying OK), then responds to query messages with "we are busy", and then when the copy is done, sends an event saying the object has changed...

...
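
To make the bulk-write question from (a) concrete, this is roughly the host-side shape I am imagining (a sketch only, in libusb-0.1 style; the chunk size and timeout here are guesses, not values from any actual MTP host):
Code:
#include <usb.h>  /* libusb-0.1 */

/* Send a file in 16KB chunks (32 x 512-byte packets). If the device cannot
   consume one chunk within 'timeout' ms, the call fails and the host bails. */
int send_file(usb_dev_handle *dev, int ep_out, char *data, long filesize) {
  const int chunk = 16 * 1024, timeout = 5000;
  for (long sent = 0; sent < filesize; ) {
    int len = (filesize - sent < chunk) ? (int)(filesize - sent) : chunk;
    int n = usb_bulk_write(dev, ep_out, data + sent, len, timeout);
    if (n < 0) return -1;  /* timeout: host reports an incomplete transfer */
    sent += n;
  }
  return 0;
}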

Background/House Keeping - @defragster formatUnused...

That was the reason, in the post about possible FS changes, for suggesting maybe some method (idle, loop, ...) that a sketch could call when it has nothing else to do, where maybe each time it is called, LittleFS might erase one sector if it knows it has dirty ones...
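
Something like this is the shape I have in mind (just a sketch; it assumes myfs is a mounted LittleFS, MTP.loop() stands in for however the sketch services MTP, and whether formatUnused(blockCnt, blockStart) erases at most blockCnt blocks per call is my assumption):
Code:
// Hypothetical idle-time housekeeping: erase one dirty/unused block per pass.
elapsedMillis sinceActivity;

void loop() {
  MTP.loop();                  // stand-in for servicing MTP
  if (sinceActivity > 250) {   // nothing else going on, do a little housekeeping
    myfs.formatUnused(1, 0);   // assumed: erase at most 1 unused block per call
    sinceActivity = 0;
  }
}
// (a real version would also reset sinceActivity whenever MTP traffic arrives)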

Probably enough for this post ;) Sorry for the book!

EDIT: The bulk transfers on Windows for SendObject alternated 261659 bytes (511 512-byte packets plus overhead, I think) and 539...
 
I can reproduce the problem and I'm looking at the USB analyzer now. Indeed it seems Windows simply becomes impatient that we're consuming the data too slowly and gives up.

The fundamental problem seems to be that the raw media erase + write speed is just too slow and Windows becomes impatient.

Erasing blocks in advance can only work if we're always able to do so before Windows sends a large file. Maybe we can get that to work in some cases if we recognize idle times without communication and use those moments to pre-erase any free blocks LittleFS has left filled with old data. But I don't see how that can work in all possible cases, like a DeleteObject request followed by a large SendObject write.

It seems the only 2 possible solutions are to get Windows to be more patient, or to speed up our sustained erase+write performance.
 
When Windows does give up, we may be suffering a rough experience because we're not implementing various control transfers (endpoint 0) which Windows sends. They're all getting answered with a stall response.
 
Mike said:
4. Adjust erase size to 32K from 4K - that got a little bit farther but still failed with the same symptoms

When you tried 32K erase, did you do it as a single erase 32K block command (0x52) or as a set of 8 erase 4K sector commands (0x20 or 0x21)?

According to the datasheet, we should usually get 120ms for 32K erase, which is a lot faster than suffering 50ms 8 times. Erasing 64K at a time should be nearly twice as fast as 32K, but again only if done using the 0xD8 or 0xDC erase 64K block command.
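
For reference, the single-command version looks roughly like this (a sketch using the standard Winbond command set: 0x06 write-enable, 0x52 erase-32K, 0x05 read status; note a 512Mbit part would need 4-byte addressing or the extended address register, which this 24-bit version glosses over):
Code:
#include <SPI.h>

// Erase one 32KB block with a single 0x52 command (24-bit address shown).
void erase32K(uint8_t csPin, uint32_t addr) {
  SPI.beginTransaction(SPISettings(30000000, MSBFIRST, SPI_MODE0));
  digitalWrite(csPin, LOW); SPI.transfer(0x06); digitalWrite(csPin, HIGH); // WREN first
  digitalWrite(csPin, LOW);
  SPI.transfer(0x52);                 // 32K block erase
  SPI.transfer((addr >> 16) & 0xFF);
  SPI.transfer((addr >> 8) & 0xFF);
  SPI.transfer(addr & 0xFF);
  digitalWrite(csPin, HIGH);
  SPI.endTransaction();
  // ...then poll the BUSY bit of status register 1 (cmd 0x05) until it clears.
}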

screenshot.png
 
On the matter of soothing Windows to be more patient, is there anything we can put in the GetStorageInfo response to tell Windows this storage isn't so speedy?

Or can we send events on endpoint 4 to keep Windows occupied and prevent it from getting bored by the slow speed?
 
Paul said:
I can reproduce the problem and I'm looking at the USB analyzer now. Indeed it seems Windows simply becomes impatient that we're consuming the data too slowly and gives up.
...
It seems the only 2 possible solutions are to get Windows to be more patient, or to speed up our sustained erase+write performance.

The other ideas I have/had include:

A) Doctor, Doctor, it hurts when I hit my thumb with a hammer :D - Answer: don't do that :D - i.e. put a warning label somewhere that these chips have issues with large transfers...

B) Try to reduce the frequency of this: again with the idea of background erase operations. But in addition to this, can we put dirty blocks at the end of the free list instead of where maybe they are now, at the beginning? That is, if I have a 64MB flash chip and I copy two 3MB files to it, they should take up < 10% of the flash... So if I delete them and try again, there is still 90% of the flash with non-dirty blocks... Again, not sure how hard this is.

c) BIG BUFFER - if on a T4.1 that has an 8MB PSRAM that is not otherwise used, then use as big a buffer as possible and receive the data into the PSRAM, and then if we feel like it might time out, again do as I mentioned and respond OK, then continue to write out the rest of the data to the file (rough sketch after this list)... The writeback could be done a few different ways; that is, who is on the interval timer, the writeback or reporting busy? ... On other T4s, again maybe allow for a good-size buffer in DMAMEM... configured by the sketch...

d) Cheat if possible... For example, if you also have an SD card on it with enough room, maybe allow the file to be redirected to the SD and then, when completed, a background task moves the file to the slow chip...
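
A rough shape of the big-buffer idea from (c), assuming a T4.1 with PSRAM (EXTMEM is the core's PSRAM section; usb_mtp_recv() is just a stand-in for the core's actual read call):
Code:
// Stage the whole incoming object in PSRAM first, then drain it to the slow
// flash afterwards while the host gets its OK response early.
EXTMEM static uint8_t stage[4 * 1024 * 1024];   // 4MB staging buffer in PSRAM

// Stand-in for the core's USB MTP read (hypothetical signature):
extern "C" int usb_mtp_recv(void *buf, uint32_t timeout);

bool receiveToStage(uint32_t len) {
  if (len > sizeof(stage)) return false;        // too big: fall back to direct writes
  for (uint32_t got = 0; got < len; ) {
    int n = usb_mtp_recv(stage + got, 100);     // pull packets as fast as they come
    if (n <= 0) return false;                   // host gave up on us
    got += n;
  }
  return true;                                  // now drain stage[] out to the file
}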


But not sure which of these is worth doing
 
Paul said:
I can reproduce the problem and I'm looking at the USB analyzer now. Indeed it seems Windows simply becomes impatient that we're consuming the data too slowly and gives up.

The fundamental problem seems to be that the raw media erase + write speed is just too slow and Windows becomes impatient.

Erasing blocks in advance can only work if we're always able to do so before Windows sends a large file. Maybe we can get that to work in some cases if we recognize idle times without communication and use those moments to pre-erase any free blocks LittleFS has left filled with old data. But I don't see how that can work in all possible cases, like a DeleteObject request followed by a large SendObject write.

It seems the only 2 possible solutions are to get Windows to be more patient, or to speed up our sustained erase+write performance.

The W25Q512 chip seems to be the only chip suffering from this malady when using MTP - just using LFS to read and write large files does seem to work. And it's curious that transferring large files on the T3.6 works, albeit very slowly.

I did try one experiment this morning where, instead of using a 4K sector erase, we used a 32K block erase (wouldn't recommend it, since small files would suffer) along with increasing the disk buffer in MTP. It kind of sort of worked, but too many other issues cropped up. Going through the RM, I don't think there is going to be an easy fix for this with the 512 chip that I can see. Unless someone has a better idea.

The one possibility is, if we can get the start block and total size, we might be able to erase the region on file open, before the transfer starts. But I'm not sure how you would do that, and not sure we want to go that route, or even if it would work. In dumping LFS data I did notice that when the file is opened it does a traverse, I think of the dir pair, to find a starting block for the file. We could then try to use the formatUnused(blockstart, size) function to erase all the blocks associated with the file before grabbing data from USB. Again, not sure how to do this from a practical standpoint, and it does seem a lot of work to implement for 1 chip. These are just my ramblings.
 
KurtE said:
But not sure which of these is worth doing
Yep. One other point: if I am going to solder a large package like the 512 chip, I might as well go with the N01G instead. Besides, on Mouser the 512 is about $6.50 and the N01 is about $3.50, so....
 
Paul said:
When you tried 32K erase, did you do it as a single erase 32K block command (0x52) or as a set of 8 erase 4K sector commands (0x20 or 0x21)?

According to the datasheet, we should usually get 120ms for 32K erase, which is a lot faster than suffering 50ms 8 times. Erasing 64K at a time should be nearly twice as fast as 32K, but again only if done using the 0xD8 or 0xDC erase 64K block command.

View attachment 26834

I used the 52h command :)
These were the settings I tried:
Code:
{{0xEF, 0x70, 0x20}, 32, 256, 32*1024, 67108864, 3500, 400000}
Note: I left the 400000 as the max erase time, since 120ms = 120000 microseconds, I think.

Then in erase I set:
Code:
	const uint8_t cmd = (addrbits == 24) ? 0x20 : 0x52; // 4K sector erase, or 32K block erase for the 32-bit-address chip

Then I loaded our test sketch and it would hang immediately. No CrashReport, or even a print of the file date and time (yes, I added a Serial flush).

EDIT: One question I was unclear about: do we have to throw it into 4-byte address mode? Didn't think so, but...
 