USBHost Mass Storage problem

mborgerson

I'm exploring the USBHost MassStorageDriver to learn more about USB Host functionality. My ultimate goal is to write code that connects a USB webcam to the host port and receives uncompressed image frames for storage on the SD card of a T4.1. I started by testing transfer times for reading and writing data to a reasonably fast SanDisk 128GB USB3 thumb drive. I soon discovered that the mass storage driver hangs when it is asked to write data from EXTMEM to the thumb drive.

Here's the minimalist code to show the issue:
C++:
// Minimalist demo to demonstrate problem in
// USB Host Mass storage writing.  Files don't write to
// thumb drive if source data is in EXTMEM.
// Compiled with TeensyDuino 1.58 and Arduino IDE 2.0.4
// M. Borgerson   11/23/2023

#include "SD.h"
#include <USBHost_t36.h>
#include "TimeLib.h"

USBHost myusb;
USBHub hub1(myusb);
// For now, use a file on a USB Thumb drive and the Mass Storage Driver
// in place of USB Camera, in the hope that a VGA frame from the camera
// will transfer at least as fast as a file from the thumb drive.
USBDrive mbDrive1(myusb);
USBFilesystem usbFS(myusb);

#define BUFFSIZE 131072   // 128KB buffer

uint8_t DTBuffer[BUFFSIZE];  // no specifier--ends up in DTCM
uint8_t DMBuffer[BUFFSIZE] DMAMEM; 
uint8_t EXBuffer[BUFFSIZE] EXTMEM; 

const int pinLED = 5;

#define LEDON digitalWriteFast(pinLED, HIGH);
#define LEDOFF digitalWriteFast(pinLED, LOW);
#define LEDTOGGLE digitalToggleFast(pinLED);



void setup() {
  // Wait for USB Serial
  // while (!Serial) {
  //   yield();
  // }
  delay(1000);
  int32_t wtime;
  pinMode(pinLED, OUTPUT);
  Serial.println("\n\nUSB Mass storage test");
  if (CrashReport) {
    Serial.print(CrashReport);
    Serial.println("Press any key to continue");
    while (Serial.read() != -1)
      ;
    while (Serial.read() == -1)
      ;
  }
  memset(DTBuffer,0xDD, BUFFSIZE);
  memset(DMBuffer,0xDE, BUFFSIZE);
  memset(EXBuffer,0xDF, BUFFSIZE); 
  // Start USBHost_t36, HUB(s) and USB devices.
  myusb.begin();

  Serial.println("Waiting for USB Filesystem");

  while (!usbFS) {
    myusb.Task();
  }
  Serial.println("USB Filesystem ready");


  Serial.println("Writing buffer in DTCM to USBHost Mass Storage . . .");
  wtime = USBWrite(&DTBuffer[0], sizeof(DTBuffer));
  Serial.printf(" took %ld uSeconds\n",wtime);
  delay(10); myusb.Task();
 
  Serial.println("Writing buffer in DMAMEM to USBHost Mass Storage . . .");
  wtime = USBWrite(&DMBuffer[0], sizeof(DMBuffer)); // !!! Sometimes it works !!!
  Serial.printf(" took %ld uSeconds\n",wtime);
  delay(10); myusb.Task();


  Serial.println("Writing buffer in EXTMEM to USBHost Mass Storage . . . ");
  wtime = USBWrite(&EXBuffer[0], sizeof(EXBuffer));    // !!!! Doesn't finish  !!!!
  Serial.printf(" took %ld uSeconds\n",wtime);
}



void loop() {

   myusb.Task();
}



// Write Buffer to USB file, keeping track of timing
int32_t USBWrite(uint8_t *bptr, uint32_t buffsize) {
  elapsedMicros utotal;
  int32_t totaltime;  // signed so that -1 can flag a failed open
  File testFile;
  if (usbFS.exists("testfile.dat")) usbFS.remove("testfile.dat");
  testFile = usbFS.open("testfile.dat", FILE_WRITE);

  if (testFile) {  // If file is open, do the test
    utotal = 0;
    testFile.write(bptr, buffsize);
    totaltime = utotal;
    testFile.close();
  } else {
    Serial.println("Could not open testFile!");
    totaltime = -1;
  }
  return totaltime;
}

And here's the output:


USB Mass storage test
Waiting for USB Filesystem
USB Filesystem ready
Writing buffer in DTCM to USBHost Mass Storage . . .
took 9250 uSeconds
Writing buffer in DMAMEM to USBHost Mass Storage . . .
took 7375 uSeconds
Writing buffer in EXTMEM to USBHost Mass Storage . . .

Note that no time is printed for the write from EXTMEM: the program hangs, but does not crash.
 
I can confirm it fails to return, even after some random edits.

Moving from compile-time allocation to extmem_malloc() also fails, but it works with a standard malloc(), which allocates from DMAMEM (RAM2):
Code:
//  EXBuffer = (uint8_t*)extmem_malloc(BUFFSIZE); // also fails to complete
  EXBuffer = (uint8_t*)malloc(BUFFSIZE);
 
Basic back-of-a-napkin maths says it simply can't work: USB 2.0 runs at 480Mbps, while the PSRAM is quad SPI at 88MHz. Even with zero overhead that gives a maximum of 4x88 = 352Mbps, which is not fast enough to keep up with the EHCI controller.
The freeze is probably caused by a transaction failing with a data buffer underrun. That error is non-fatal, so the transaction is retried over and over with the same result.
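To make the napkin maths concrete, here is a small host-side C++ sketch of the same arithmetic. The constants are the figures quoted above. Treating the PSRAM as single data rate with no command or address overhead is an assumption, so the PSRAM number is a ceiling, not a measurement.

```cpp
#include <cstdio>

// Napkin-maths comparison of raw signalling rates. Ignores USB protocol
// overhead and PSRAM command/address cycles, so the PSRAM number is a ceiling.
constexpr double kUsbHighSpeedMbps = 480.0;  // USB 2.0 high speed
constexpr double kPsramClockMHz = 88.0;      // PSRAM clock quoted in the thread
constexpr int kPsramDataLines = 4;           // quad SPI: 4 data lines

// Peak PSRAM rate in Mbit/s, assuming one bit per data line per clock (SDR).
constexpr double psramPeakMbps(int lines, double clockMHz) {
  return lines * clockMHz;
}

// Mbit/s to MB/s, with 1 MB = 10^6 bytes.
constexpr double mbpsToMBps(double mbps) { return mbps / 8.0; }

// Compile-time check of the conclusion: even the ceiling is below USB HS.
static_assert(psramPeakMbps(kPsramDataLines, kPsramClockMHz) < kUsbHighSpeedMbps,
              "PSRAM peak should be below the USB high-speed rate");

void printBandwidthComparison() {
  const double psram = psramPeakMbps(kPsramDataLines, kPsramClockMHz);
  std::printf("PSRAM peak: %.0f Mbit/s (%.0f MB/s)\n", psram, mbpsToMBps(psram));
  std::printf("USB 2.0 HS: %.0f Mbit/s (%.0f MB/s)\n", kUsbHighSpeedMbps,
              mbpsToMBps(kUsbHighSpeedMbps));
}
```

Calling printBandwidthComparison() prints 352 Mbit/s (44 MB/s) for the PSRAM ceiling against 480 Mbit/s (60 MB/s) for USB.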
 
Re "Basic back-of-a-napkin" maths, here's a quick test for this. With BUFFSIZE set to 16KB, shouldn't the buffer be retained in the 32KB data cache?
C:
  memset(EXBuffer, 0xDF, BUFFSIZE);
  // Read the whole buffer back so that it should be resident in the data cache
  int jj = 0;
  for (int ii = 0; ii < BUFFSIZE; ii++) jj += EXBuffer[ii];
  Serial.println("Writing buffer in EXTMEM to USBHost Mass Storage . . . ");
  if (BUFFSIZE == jj / 0xDF)
    Serial.println("\tEXTMEM is 0xDF . . . ");
  wtime = USBWrite(EXBuffer, BUFFSIZE);    // !!!! Doesn't finish  !!!!
  Serial.printf(" took %ld uSeconds\n", wtime);

I moved the EXBuffer write to be the first of the three in the posted code. Here is the output: it prints "EXTMEM is 0xDF" (unless a super-smart compiler is involved) and then nothing. But shouldn't the MCU be holding this memset() write in cache?
USB Mass storage test
Waiting for USB Filesystem
USB Filesystem ready
Writing buffer in EXTMEM to USBHost Mass Storage . . .
EXTMEM is 0xDF . . .
 
Re post #4: I'm wondering whether DMA is in use and the cache gets flushed at full speed, though it shouldn't touch this area?

Or do the other buffers consume more of the cache, so that the 16KB half of the cache gets swapped out? Reduced to 4KB, it still fails to complete:
USB Mass storage test
Waiting for USB Filesystem
USB Filesystem ready
Writing buffer in EXTMEM to USBHost Mass Storage . . .
EXTMEM is 0xDF . . . BUFFSIZE is =4096

At 512 bytes it still fails to complete, but at 256 bytes (a single transfer) it completes:
...
Writing buffer in EXTMEM to USBHost Mass Storage . . .
EXTMEM is 0xDF . . . BUFFSIZE is =256
took 2004 uSeconds

Writing buffer in DTCM to USBHost Mass Storage . . .
took 1 uSeconds

Writing buffer in DMAMEM to USBHost Mass Storage . . .
took 1 uSeconds
It completes, though more slowly. The timing seems off, though, when the order is restored with PSRAM third:
Writing buffer in DTCM to USBHost Mass Storage . . .
took 1 uSeconds
Writing buffer in DMAMEM to USBHost Mass Storage . . .
took 1 uSeconds

Writing buffer in EXTMEM to USBHost Mass Storage . . .
EXTMEM is 0xDF . . . BUFFSIZE is =256
took 4 uSeconds
 
The EHCI controller is a bus master; it can only access memory directly and can't use the CPU's cache.
It has two separate buffers for transmitting and receiving (1KB each), but there are also settings for how it fills them, e.g. how much of the buffer must be full before a transaction starts, and what burst size is used to fill it. From the looks of it, the default settings are a pretty bad fit for PSRAM (and possibly for FlexSPI in general, which would include constant data read directly from Flash).
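As a rough illustration of why the fill threshold matters, here is a toy model. It is not the actual EHCI or i.MX RT FIFO logic: the TX FIFO is pre-filled to a threshold, then drained by the bus while the memory source refills it. The 512-byte packet (the high-speed bulk maximum), the 60 bytes/µs bus rate (480Mbps raw) and the source rates are illustrative assumptions.

```cpp
#include <cassert>

// Toy model only: before a packet starts, the TX FIFO holds `thresholdBytes`.
// During the packet the bus drains it at `busBytesPerUs` while the memory
// source refills it at `srcBytesPerUs`. Returns true if the FIFO would run
// dry (underrun) before the whole packet has been sent.
bool wouldUnderrun(double packetBytes, double thresholdBytes,
                   double busBytesPerUs, double srcBytesPerUs) {
  assert(packetBytes > 0 && busBytesPerUs > 0 && srcBytesPerUs >= 0);
  if (thresholdBytes >= packetBytes) return false;   // whole packet already buffered
  if (srcBytesPerUs >= busBytesPerUs) return false;  // source keeps up with the bus
  const double packetTimeUs = packetBytes / busBytesPerUs;
  // FIFO level at time t is threshold + (src - bus) * t, which keeps falling
  // while src < bus, so the lowest point is at the end of the packet.
  const double minLevel =
      thresholdBytes + (srcBytesPerUs - busBytesPerUs) * packetTimeUs;
  return minLevel < 0.0;
}
```

With a 512-byte packet, a 60 bytes/µs bus and a source that peaks at 44 bytes/µs (the 352Mbps ceiling), the FIFO needs roughly 137 bytes pre-filled to avoid running dry. Real PSRAM throughput with overhead is lower, which raises that figure.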
 
Further testing has revealed that the problems with using EXTMEM as a source or destination for USBHost mass storage transfers are limited to transfers FROM EXTMEM to the USB drive. Transfers TO EXTMEM from the USB drive seem to proceed without problems.

My testing procedure was:

1. Transfer a set of 10 contiguous VGA data frames to a thumb drive (SanDisk 128GB USB3 drive) from an SD card file written by a Teensy test application. The transfer was carried out on a PC with an SD-card reader.

2. With the T4.1 powered off, reinsert the SD card and connect the thumb drive to the host port.

3. Using a new test application, verify that the EXTMEM buffer contains whatever is in the chip at power-up. Verification showed 3071962 errors. A perfect miss would give 3072000 errors (10 frames of 307200 pixels), but some positions in EXTMEM happened to match the expected values.

4. Use the test application to read 10 VGA frames from the thumb drive into EXTMEM.

5. Use the verify command to check the EXTMEM buffer. Result: no errors.

6. Write the EXTMEM frame buffer to SD card to check transfer timing.
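For reference, a verify step like the one in steps 3 and 5 might look like this sketch. The actual pattern used by SimUSBCam.ino isn't shown in the thread, so the rule here (every 16-bit pixel in line y holds the value y) is a hypothetical stand-in, and errors are counted per pixel. Counted that way, 10 VGA frames give at most 10 x 307200 = 3072000 errors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL test pattern: pixel value == line number within its frame.
constexpr size_t kWidth = 640;   // VGA
constexpr size_t kHeight = 480;
constexpr size_t kPixelsPerFrame = kWidth * kHeight;  // 307200

// Fill `frames` contiguous frames with the expected pattern.
void fillPattern(uint16_t *buf, size_t frames) {
  assert(buf != nullptr);
  for (size_t f = 0; f < frames; f++)
    for (size_t y = 0; y < kHeight; y++)
      for (size_t x = 0; x < kWidth; x++)
        buf[f * kPixelsPerFrame + y * kWidth + x] = static_cast<uint16_t>(y);
}

// Count pixels that do not match the expected pattern.
uint32_t countPatternErrors(const uint16_t *buf, size_t frames) {
  assert(buf != nullptr);
  uint32_t errors = 0;
  for (size_t f = 0; f < frames; f++)
    for (size_t y = 0; y < kHeight; y++)
      for (size_t x = 0; x < kWidth; x++)
        if (buf[f * kPixelsPerFrame + y * kWidth + x] != static_cast<uint16_t>(y))
          errors++;
  return errors;
}
```

With an all-zero buffer, line 0 of each frame happens to match, the same kind of partial match that kept the power-up count just below the maximum.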

The Teensy output from these steps was:

Code:
USB Mass storage simulating USB UVC camera data path.
Starting SD initialization.
initialization done.
Type is exFAT
SD File System ready.
Waiting for USB Filesystem
USB Filesystem ready
Enter a command character:
 'r' to read 10 frames from USB thumb drive to buffer in EXTMEM
 's' to write buffer of 10 frames in EXTMEM to SD file
 'v' to verify expected data pattern in EXTMEM


Checking EXTMEM frame buffers for expected line data.
Found 3071962 errors in 10 contiguous VGA buffers


Reading data from USB Disk to EXTMEM Frame Buffer
Average transfer time:  35987 uSeconds/frame for 17.073 MB/Second
Maximum frame transfer Time:  38121 uSeconds at 0


Checking EXTMEM frame buffers for expected line data.
Found 0 errors in 10 contiguous VGA buffers


Writing data from EXTMEM Frame Buffer to testFile.dat on SD Card
Average transfer time:  29544 uSeconds/frame for 20.796 MB/Second
Maximum frame transfer Time:  40703 uSeconds at 0

This shows that transferring from a USB mass storage device to EXTMEM works nicely, at about 17MB/second. That would allow a theoretical rate of about 28 uncompressed VGA YUV or RGB565 frames per second, for about a third of a second until EXTMEM was full. This is good news. I hope that bulk transfer of VGA frames from a UVC camera on demand will proceed at a similar rate.
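The reported rates can be reproduced from the per-frame times. This sketch assumes a 640 x 480 frame at 2 bytes per pixel (YUV422 or RGB565) and 1 MB = 10^6 bytes, which matches the printed MB/Second values.

```cpp
#include <cassert>
#include <cstdint>

// VGA frame at 2 bytes/pixel (YUV422 or RGB565).
constexpr uint32_t kFrameBytes = 640u * 480u * 2u;  // 614400 bytes

// Throughput in MB/s for one frame moved in `usPerFrame` microseconds.
// Bytes per microsecond is numerically equal to MB (10^6 bytes) per second.
double mbPerSecond(double usPerFrame) {
  assert(usPerFrame > 0.0);
  return kFrameBytes / usPerFrame;
}

// Sustainable frame rate if every frame takes `usPerFrame` microseconds.
double framesPerSecond(double usPerFrame) {
  assert(usPerFrame > 0.0);
  return 1.0e6 / usPerFrame;
}
```

mbPerSecond(35987) gives about 17.07 and mbPerSecond(29544) about 20.80, matching the output above. framesPerSecond(35987) is about 27.8, so 10 frames take roughly 0.36 s.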

The test application is attached. I will monitor this thread to keep up with the analysis of the EXTMEM problem. However, further discussion of the USB camera project will move to the General Discussion or Project Guidance forums.
 

Attachment: SimUSBCam.ino (6.7 KB)
I think I see a flaw with this testing. USB mass storage devices use bulk endpoints, which transfer data whenever the host controller requests it. The host can stop and start the data transfer as it needs to, depending on how full or empty the receiving buffer is. FlexSPI also has a buffer, which means writing to PSRAM is faster than reading (the bus transfer can finish before the SPI transfer does), but if you start mixing reads and writes at the same time, performance will be greatly reduced.
A camera will use isochronous endpoints, which are time-sensitive: if the host is not ready to receive data at the appropriate time, the data is dropped and permanently lost.
(The USBHost library for Teensy also currently lacks support for isochronous endpoints.)
 
@mborgerson: I played with this today and came up with a not-so-glorious workaround. I broke the 128KB buffer into smaller chunks, trying sizes of 16KB, 32KB and 64KB. I created another transfer buffer in DTCM, memcpy'd each chunk into it from EXTMEM, and then wrote that buffer to the USB drive. Obviously you take a memory hit and a slight speed hit.
My results with an exFAT-formatted 32GB SanDisk DUAL thumb drive:
Code:
USB Mass storage test
Available EXTMEM: 8 MB
Waiting for USB Filesystem
USB Filesystem ready
Writing buffer in DTCM to USBHost Mass Storage . . .
 took 7874 uSeconds
Writing buffer in DMAMEM to USBHost Mass Storage . . .
 took 7874 uSeconds
Writing buffer in EXTMEM to USBHost Mass Storage . . .
 took 12499 uSeconds

This was with a chunk size of 65536. It should give you an idea of the PSRAM read speed at 88MHz. The ratios are pretty close to the same for all three chunk sizes.
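One way to read a PSRAM figure out of these timings, as a sketch: if the only difference between the DTCM and EXTMEM runs is where the memcpy reads from, the extra time is the extra copy cost. This treats the DTCM-to-DTCM copy in the first run as negligible, so the result is an optimistic estimate of the PSRAM copy rate, not a measurement.

```cpp
#include <cassert>

// ASSUMES the USB write time is identical in both runs and that the copy out
// of DTCM costs ~nothing, so all of the extra time is the copy out of PSRAM.
// Bytes per microsecond is numerically MB/s (1 MB = 10^6 bytes).
double extraCopyRateMBps(double bytes, double usDirect, double usViaCopy) {
  assert(bytes > 0.0 && usViaCopy > usDirect);
  return bytes / (usViaCopy - usDirect);
}
```

extraCopyRateMBps(131072, 7874, 12499) comes out at about 28.3 MB/s.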
The code:


C++:
// Minimalist demo to demonstrate problem in
// USB Host Mass storage writing.  Files don't write to
// thumb drive if source data is in EXTMEM.
// Compiled with TeensyDuino 1.58 and Arduino IDE 2.0.4
// M. Borgerson   11/23/2023

#include "SD.h"
#include <USBHost_t36.h>
#include "TimeLib.h"

USBHost myusb;
USBHub hub1(myusb);
// For now, use a file on a USB Thumb drive and the Mass Storage Driver
// in place of USB Camera, in the hope that a VGA frame from the camera
// will transfer at least as fast as a file from the thumb drive.
USBDrive mbDrive1(myusb);
USBFilesystem usbFS(myusb);

#define BUFFSIZE 131072   // 128KB buffer
#define CHUNKSIZE 65536   // 64KB  xferBuf size

uint8_t DTBuffer[BUFFSIZE];  // no specifier--ends up in DTCM
uint8_t DMBuffer[BUFFSIZE] DMAMEM;
uint8_t EXBuffer[BUFFSIZE] EXTMEM;
uint8_t xferBuf[CHUNKSIZE];  // no specifier, so it ends up in DTCM like DTBuffer

const int pinLED = 5;

#define LEDON digitalWriteFast(pinLED, HIGH);
#define LEDOFF digitalWriteFast(pinLED, LOW);
#define LEDTOGGLE digitalToggleFast(pinLED);

extern uint8_t external_psram_size;

void setup() {
  // Wait for USB Serial
  while (!Serial) {
     yield();
  }
  delay(1000);
  int32_t wtime;
  pinMode(pinLED, OUTPUT);
  Serial.println("\n\nUSB Mass storage test");

  Serial.print("Available EXTMEM: ");
  Serial.print(external_psram_size,DEC);
  Serial.println(" MB");

  if (CrashReport) {
    Serial.print(CrashReport);
    Serial.println("Press any key to continue");
    while (Serial.read() != -1)
      ;
    while (Serial.read() == -1)
      ;
  }
  memset(DTBuffer,0xDD, BUFFSIZE);
  memset(DMBuffer,0xDE, BUFFSIZE);
  memset(EXBuffer,0xDF, BUFFSIZE);
  // Start USBHost_t36, HUB(s) and USB devices.
  myusb.begin();

  Serial.println("Waiting for USB Filesystem");

  while (!usbFS) {
    myusb.Task();
  }
  Serial.println("USB Filesystem ready");


  Serial.println("Writing buffer in DTCM to USBHost Mass Storage . . .");
  wtime = USBWrite(&DTBuffer[0], sizeof(DTBuffer));
  Serial.printf(" took %ld uSeconds\n",wtime);
  delay(10); myusb.Task();
 
  Serial.println("Writing buffer in DMAMEM to USBHost Mass Storage . . .");
  wtime = USBWrite(&DMBuffer[0], sizeof(DMBuffer)); // !!! Sometimes it works !!!
  Serial.printf(" took %ld uSeconds\n",wtime);
  delay(10); myusb.Task();

  Serial.println("Writing buffer in EXTMEM to USBHost Mass Storage . . . ");
  wtime = USBWrite(&EXBuffer[0], sizeof(EXBuffer));    // completes now, via the DTCM xferBuf
  Serial.printf(" took %ld uSeconds\n",wtime);
}



void loop() {

   myusb.Task();
}



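// Write Buffer to USB file through the DTCM transfer buffer, keeping track of timing
int32_t USBWrite(uint8_t *bptr, uint32_t buffsize) {
  elapsedMicros utotal;
  int32_t totaltime;  // signed so that -1 can flag a failed open
  File testFile;
  if (usbFS.exists("testfile.dat")) usbFS.remove("testfile.dat");
  testFile = usbFS.open("testfile.dat", FILE_WRITE);
  if (testFile) {  // If file is open, do the test
    utotal = 0;

    // Copy each chunk into xferBuf (DTCM) and write it from there, so the USB
    // controller never reads the source buffer directly. The last chunk may
    // be shorter than CHUNKSIZE, so no trailing data is dropped.
    while (buffsize > 0) {
      uint32_t n = (buffsize < CHUNKSIZE) ? buffsize : CHUNKSIZE;
      memcpy(xferBuf, bptr, n);
      testFile.write(&xferBuf[0], n);
      bptr += n;
      buffsize -= n;
    }

    // Direct write, which hangs when bptr is in EXTMEM:
    // testFile.write(bptr, buffsize);
    totaltime = utotal;
    testFile.close();
  } else {
    Serial.println("Could not open testFile!");
    totaltime = -1;
  }
  return totaltime;
}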

I checked the file size and it matched the size of data written to it.
Hope this helps a little...
 
Re "memcpy size_t chunks to from EXTMEM":
That sounds like a usable solution. PJRC allocates the USB buffers in RAM2/DMAMEM, and that works, drawing from that less commonly used memory. In the example code above, the non-EXTMEM writes from DTCM/RAM1 and DMAMEM/RAM2 showed the same completion times. Was the copy done selectively, only when the incoming *ptrBuf was outside normal RAM1/RAM2?
RAM2: variables:12672 free for malloc/new:511616 // PJRC USB buffer space
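On the question of copying selectively, here is a hedged sketch that only bounces through a RAM buffer when the source pointer falls in a FlexSPI window. The address ranges are assumed from the Teensy 4.x memory map (program flash at 0x60000000, EXTMEM at 0x70000000). WriteFn, writeViaBounce and writeSelective are hypothetical names, with WriteFn standing in for File::write() so the logic can be tested without USBHost_t36.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// ASSUMED Teensy 4.x map: FlexSPI flash at 0x60000000, FlexSPI2 EXTMEM at
// 0x70000000. DTCM (0x20000000) and RAM2/DMAMEM (0x20200000) lie below.
bool isFlexSpiAddress(const void *p) {
  const uintptr_t a = reinterpret_cast<uintptr_t>(p);
  return a >= 0x60000000u && a < 0x80000000u;
}

// Hypothetical sink standing in for File::write(); returns bytes written.
using WriteFn = size_t (*)(const uint8_t *data, size_t len, void *ctx);

// Copy `src` through `bounceBuf` in chunks of at most `bounceSize` bytes,
// handing each chunk to `writeFn`. Stops early on a short write.
size_t writeViaBounce(const uint8_t *src, size_t len, uint8_t *bounceBuf,
                      size_t bounceSize, WriteFn writeFn, void *ctx) {
  assert(bounceBuf != nullptr && bounceSize > 0);
  size_t done = 0;
  while (done < len) {
    const size_t n = (len - done < bounceSize) ? (len - done) : bounceSize;
    std::memcpy(bounceBuf, src + done, n);
    const size_t w = writeFn(bounceBuf, n, ctx);
    done += w;
    if (w != n) break;  // report the short write to the caller
  }
  return done;
}

// Only pay for the extra copy when the source is in a FlexSPI region;
// RAM1/RAM2 buffers are handed to the writer directly.
size_t writeSelective(const uint8_t *src, size_t len, uint8_t *bounceBuf,
                      size_t bounceSize, WriteFn writeFn, void *ctx) {
  if (isFlexSpiAddress(src))
    return writeViaBounce(src, len, bounceBuf, bounceSize, writeFn, ctx);
  return writeFn(src, len, ctx);
}
```

On a Teensy, writeFn would wrap testFile.write(), and bounceBuf would be a DTCM or DMAMEM array like the xferBuf used above.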

Re "The EHCI controller is a bus master; it can only access memory directly and can't use the CPU's cache":
So that would be the DMA-type exception noted in post #5, which "seemed" to be the case when the 256-byte test worked, taking longer but not TOO long.
 