Update on Boson Camera project.

mborgerson

I've managed to modify the USBHost serial connection to communicate with the Boson camera, and I can now control the camera through its USB Serial interface. This port communicates using a SLIP protocol that requires byte stuffing and unstuffing, which means reading and writing one byte at a time. As a result, it takes about 6.3 seconds to upload a 655KB uncompressed 640x512 16-bit image (roughly 104KB/second). Not exactly blazing speed--a bit like drinking beer through a cocktail straw. It takes a long time, but you get the same result.

Communicating with the Boson requires a modified host Serial interface. Left to its own devices, the USBHost_t36 library doesn't recognize the camera and treats it as a CDC ACM device. When it tries to send the default ACM control packets, something goes wrong and the program crashes. With the modified serial.cpp code, the Boson is recognized as a serial port that requires no control packets, and the interface can talk to the camera.
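To illustrate why stuffing forces byte-at-a-time handling, here is a minimal sketch of SLIP-style encoding and decoding. It assumes the classic RFC 1055 marker values; the Boson's own framing bytes may well differ, and the function names are made up for this example rather than taken from the modified serial.cpp.

```cpp
#include <cstdint>
#include <vector>

// RFC 1055 SLIP markers. Assumption: the Boson may use different values.
constexpr uint8_t kEnd = 0xC0, kEsc = 0xDB, kEscEnd = 0xDC, kEscEsc = 0xDD;

// Wrap a payload in frame markers, escaping any byte that collides with them.
std::vector<uint8_t> slipEncode(const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> out;
    out.push_back(kEnd);
    for (uint8_t b : payload) {
        if (b == kEnd)      { out.push_back(kEsc); out.push_back(kEscEnd); }
        else if (b == kEsc) { out.push_back(kEsc); out.push_back(kEscEsc); }
        else                { out.push_back(b); }
    }
    out.push_back(kEnd);
    return out;
}

// Undo the stuffing. Every byte has to be inspected individually.
std::vector<uint8_t> slipDecode(const std::vector<uint8_t>& frame) {
    std::vector<uint8_t> out;
    bool escaped = false;
    for (uint8_t b : frame) {
        if (escaped) {
            out.push_back(b == kEscEnd ? kEnd : b == kEscEsc ? kEsc : b);
            escaped = false;
        } else if (b == kEsc) {
            escaped = true;           // next byte is a substituted value
        } else if (b != kEnd) {
            out.push_back(b);         // frame markers themselves are dropped
        }
    }
    return out;
}
```

Because any payload byte may expand to two bytes on the wire, the encoded length is not known in advance, which is one reason a block copy cannot replace the per-byte loop.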

I've started experimenting with methods for faster uploads using the Boson's dedicated bulk input endpoint. That has brought to light some issues with the Teensy Host serial interface that will be the subject of another thread in the Tech Support forum.

Here's a sample image. It's 9AM on Sunday morning, so the cold beverage is just ice water--to add a bit of contrast. The uploaded frame is a series of 16-bit integers, each holding the temperature in kelvins multiplied by 100. The temperatures were converted to degrees Celsius and the image was plotted in MATLAB.

Icewater.jpg
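The pixel conversion described above is a one-liner. This is only a sketch of the arithmetic (the original conversion was done in MATLAB), and the function name is hypothetical.

```cpp
#include <cstdint>

// Convert one raw pixel, stored as kelvins * 100, to degrees Celsius.
constexpr double centiKelvinToCelsius(uint16_t raw) {
    return raw / 100.0 - 273.15;
}
```

For example, a raw value of 29815 corresponds to 298.15 K, or about 25 degrees Celsius.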
 
As part of my project to write a USB host driver for the FLIR Boson camera, I needed to develop the code to accept the bulk data stream from the camera. With its current configuration, the Boson, at a 20FPS frame rate, sends a 491KB frame every 50 milliseconds. That's about 9.8MB/second. My early test code, which requested 512-byte packets as fast as I could process them, only managed about 4MB/second. Coincidentally, that's about as fast as I could collect packets from another T4.1 over USB Serial.
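The 491KB and 9.8MB/second figures can be checked with a little arithmetic. The plane sizes below are an assumption inferred from the transfer log later in this thread (a one-byte-per-pixel Y plane followed by two quarter-size chrominance planes); they are not taken from FLIR documentation.

```cpp
#include <cstdint>

constexpr uint32_t kWidth = 640, kHeight = 512;        // resolution quoted in the post
constexpr uint32_t kLumaBytes = kWidth * kHeight;       // Y plane: 327,680 bytes
constexpr uint32_t kChromaBytes = kLumaBytes / 4;       // assumed quarter-size plane: 81,920 bytes
constexpr uint32_t kFrameBytes = kLumaBytes + 2 * kChromaBytes;
constexpr uint32_t kFramesPerSecond = 20;

// Sustained payload rate the host has to keep up with.
constexpr uint32_t bytesPerSecond() { return kFrameBytes * kFramesPerSecond; }
```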

Just before Christmas, I realized that there's transfers, and there's TRANSFERS! A lot of the source code I looked at defined the maximum size of a transfer on a high-speed bulk endpoint as 512 bytes. My examination of the USBHost Serial driver seemed to confirm that 512-byte limit, as it never asked for more than that in a single call to queue_Data_Transfer(pipe, buffer, size, driver). On the other hand, the EHCI hardware specs define the maximum transfer size as 20KB for page-aligned (4KB) buffers and 16KB for buffers with any alignment. It was like a strobe light had gone off! (People my age might refer to a flash bulb going off--but no one has seen that outside the movies in decades!)
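The 20KB and 16KB limits follow from how an EHCI transfer descriptor (qTD) addresses memory: it carries five buffer pointers, each covering one 4KB page, and only the first may start partway into its page. A small sketch of that calculation, with illustrative names that do not come from USBHost_t36:

```cpp
#include <cstdint>

constexpr uint32_t kPageSize = 4096;   // each qTD buffer pointer covers one 4KB page
constexpr uint32_t kPagesPerQtd = 5;   // a qTD holds five buffer pointers

// Largest transfer a single qTD can describe for a buffer starting at addr.
constexpr uint32_t maxQtdBytes(uint32_t addr) {
    return kPagesPerQtd * kPageSize - (addr % kPageSize);
}
```

A page-aligned buffer gets the full 20480 bytes, while the worst-case alignment (one byte before a page boundary) still leaves 16385, so a 16384-byte request is safe anywhere.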

A few hours of experimentation showed that you could, indeed, ask the EHCI to read 16KBytes at a time. The hardware breaks that 16KB up into 32 512-byte packets and, following some arcane criteria, fits as many packets as possible into each 125-microsecond USB microframe. Now I was able to suck up the camera data as fast as it was being sent. In fact, when writing to DTCM (fast) RAM, the Teensy host can read camera data at 36 to 40MBytes/second! Even better, the EHCI can read data and store it in PSRAM (EXTMEM) at about 18-19MBytes/second. Wait, there's more! Those transfers are done entirely by the EHCI DMA and leave the CPU free for other chores in the foreground loop. (Unlike the Host Serial stream, I never need individual byte-by-byte access, so no time-wasting memcpy() calls.)

What other chores, you ask? Well, there is the small issue of what to do with 6 to 10MBytes of frame data per second. My answer is "Write it to an SD file and have MATLAB sort it out later." It seems that the SD driver plays nicely with the DMA from the camera in the background. As long as the SD card doesn't hiccup and take more than the usual 40 milliseconds to write a frame, all goes smoothly.

You can see the result on my YouTube page at:
The tentative title is "Grumpy programmer on swivel chair."

For the more technically inclined, I've attached the results of a test program which illustrates the USB Host transfer timing. It shows the difference in transfer rates between fast RAM as a destination, and PSRAM.

Code:
// Transfer to fast (DTCM) RAM
Tr #,  time(ms)   TrTime, B[0], Tr Len,   Frm. Bytes   

  0,   37.878,    37.875, 0C,   16384,    16384  // 0x0C indicates payload header at beginning
  1,   38.375,     0.500, 45,   16384,    32768
  2,   38.875,     0.500, 45,   16384,    49152
  3,   39.250,     0.375, 42,   16384,    65536
  4,   39.751,     0.500, 41,   16384,    81920
  5,   40.251,     0.500, 3B,   16384,    98304
  6,   40.750,     0.500, 42,   16384,   114688
  7,   41.250,     0.500, 7C,   16384,   131072
  8,   41.625,     0.375, 3A,   16384,   147456
  9,   42.000,     0.375, 44,   16384,   163840
 10,   42.375,     0.375, 38,   16384,   180224
 11,   42.750,     0.375, 41,   16384,   196608
 12,   43.251,     0.500, DA,   16384,   212992
 13,   43.751,     0.500, 3F,   16384,   229376
 14,   44.250,     0.500, 48,   16384,   245760
 15,   44.750,     0.500, 2C,   16384,   262144
 16,   45.125,     0.375, 4B,   16384,   278528
 17,   45.500,     0.375, DB,   16384,   294912
 18,   45.875,     0.375, 3A,   16384,   311296  // 16384 bytes in 375uSec is 43.7MB/sec burst rate
 19,   46.250,     0.375, AB,   16384,   327680
 20,   46.375,     0.125, 29,      12,   327692
 21,   46.875,     0.500, 0C,   16384,   344076
 22,   47.250,     0.375, 80,   16384,   360460  // 0x80 byte  is an Offset zero for chrominance
 23,   47.625,     0.375, 80,   16384,   376844
 24,   48.000,     0.375, 80,   16384,   393228
 25,   48.375,     0.375, 80,   16384,   409612
 26,   48.500,     0.125, 80,      12,   409624
 27,   49.000,     0.500, 0C,   16384,   426008
 28,   49.375,     0.375, 80,   16384,   442392
 29,   49.750,     0.375, 80,   16384,   458776
 30,   50.125,     0.375, 80,   16384,   475160
 31,   50.500,     0.375, 80,   16384,   491544
 32,   50.625,     0.125, 80,      12,   491556  // Frame data + 36 header bytes

 33,   87.875,    37.250, 0C,   16384,    16384  // overall rate is 491556 bytes in 13375uSec or 36.7MB/sec
 34,   88.250,     0.375, 45,   16384,    32768


// Transfer to (EXTMEM) PSRAM
Tr #,  time(ms)   TrTime, B[0], Tr Len,  Frm. Bytes
  0,   31.383,    31.381, 0C,   16384,    16384   // Note shorter inter-frame delay for EXTMEM
  1,   32.256,     0.875, 42,   16384,    32768   // as well as longer transfer times
  2,   33.131,     0.875, 41,   16384,    49152
  3,   34.006,     0.875, 41,   16384,    65536   // 16384 bytes in 875uSec = 18.7MB/sec burst rate
  4,   34.881,     0.875, 41,   16384,    81920   
  5,   35.756,     0.875, 3A,   16384,    98304
  6,   36.631,     0.875, 3F,   16384,   114688
  7,   37.506,     0.875, 89,   16384,   131072
  8,   38.381,     0.875, 3B,   16384,   147456
  9,   39.256,     0.875, 49,   16384,   163840
 10,   40.131,     0.875, 35,   16384,   180224
 11,   41.006,     0.875, 40,   16384,   196608
 12,   41.881,     0.875, 97,   16384,   212992
 13,   42.756,     0.875, 3E,   16384,   229376
 14,   43.631,     0.875, 52,   16384,   245760
 15,   44.506,     0.875, 29,   16384,   262144
 16,   45.381,     0.875, 46,   16384,   278528
 17,   46.256,     0.875, D3,   16384,   294912
 18,   47.131,     0.875, 3A,   16384,   311296
 19,   48.006,     0.875, BE,   16384,   327680
 20,   48.125,     0.119, 29,      12,   327692
 21,   49.006,     0.881, 0C,   16384,   344076
 22,   49.881,     0.875, 80,   16384,   360460  // 0x80 is Offset zero for chrominance
 23,   50.756,     0.875, 80,   16384,   376844
 24,   51.631,     0.875, 80,   16384,   393228
 25,   52.506,     0.875, 80,   16384,   409612
 26,   52.625,     0.119, 80,      12,   409624
 27,   53.506,     0.881, 0C,   16384,   426008
 28,   54.381,     0.875, 80,   16384,   442392
 29,   55.256,     0.875, 80,   16384,   458776
 30,   56.131,     0.875, 80,   16384,   475160  // overall rate is 491556 bytes in 25,742uSec or 19.1MB/sec
 31,   57.006,     0.875, 80,   16384,   491544
 32,   57.125,     0.119, 80,      12,   491556  // Frame data + 36 header bytes

 33,   81.381,    24.256, 0C,   16384,    16384
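The burst-rate annotations in the log can be reproduced with simple arithmetic. The helpers below are only a sketch with hypothetical names; they assume 512-byte packets and 125-microsecond microframes, as described in the post.

```cpp
#include <cstdint>

// Bytes per microsecond is numerically the same as (decimal) MB per second.
constexpr double megabytesPerSecond(uint32_t bytes, double microseconds) {
    return bytes / microseconds;
}

// Number of max-size packets needed to carry one transfer.
constexpr uint32_t packetsPerTransfer(uint32_t transferBytes, uint32_t maxPacket) {
    return (transferBytes + maxPacket - 1) / maxPacket;
}

// Transfer time expressed in 125-microsecond USB microframes.
constexpr double microframes(double microseconds) {
    return microseconds / 125.0;
}
```

With these, a 16384-byte transfer is 32 packets; at 375 microseconds it spans 3 microframes (about 43.7MB/sec), and at 875 microseconds it spans 7 microframes (about 18.7MB/sec).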
 
Something else that can help increase throughput a lot is having two (or more) transfers scheduled, so that when the first one completes and the host code goes off to process its callback function, data can continue to be received from the device by the other queued transfer.
 
I use a technique that I ran across in the USBDrive mass storage driver: I queue up the next transfer in the callback function. That way, when the callback returns from interrupt space, the next transfer is already queued up.

I have a simple state machine in the callback that differentiates between two states: Skipping and Capturing. In the Skipping state, the driver continuously sucks up the camera data, but writes it to the same 16KB buffer space over and over. To collect a frame, a nextState variable is set so that the driver switches to the Capturing state at the end of the current frame. In that state, the incoming data goes to a frame buffer--which will soon become a circular queue of frame buffers to handle the inevitable slow SD card writes.
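A rough sketch of how such a two-state callback helper might look. This is not the actual driver code: the class, its method names, and the endOfFrame flag are all assumptions, and the caller is expected to detect the frame boundary itself.

```cpp
#include <cstddef>
#include <cstdint>

enum class State { Skipping, Capturing };

// Hypothetical helper: decides where the next transfer should land.
class FrameGrabber {
public:
    FrameGrabber(uint8_t* scratch, uint8_t* frame)
        : scratch_(scratch), frame_(frame) {}

    // Ask for the next complete frame to be captured.
    void requestCapture() { nextState_ = State::Capturing; }

    // Call from the transfer-complete callback with the byte count of the
    // finished transfer. Returns the buffer address for the next transfer.
    uint8_t* onTransferDone(size_t received, bool endOfFrame) {
        if (state_ == State::Capturing) {
            fill_ += received;                  // data already landed in frame_
            if (endOfFrame) {
                frameReady_ = true;
                state_ = nextState_ = State::Skipping;
            }
        } else if (endOfFrame) {
            state_ = nextState_;                // switch only on a frame boundary
            if (state_ == State::Capturing) { fill_ = 0; frameReady_ = false; }
        }
        // Skipping reuses the same scratch buffer over and over.
        return state_ == State::Capturing ? frame_ + fill_ : scratch_;
    }

    bool frameReady() const { return frameReady_; }
    size_t bytesCaptured() const { return fill_; }
    State state() const { return state_; }

private:
    uint8_t* scratch_;
    uint8_t* frame_;
    State state_ = State::Skipping;
    State nextState_ = State::Skipping;
    size_t fill_ = 0;
    bool frameReady_ = false;
};
```

Deferring the state change to a frame boundary means a capture always starts with the first transfer of a frame, so the frame buffer never holds a partial image.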
 
That's what you want to avoid for uninterrupted high-speed transfer though. There's a delay between the transfer finishing and the interrupt triggering (the USBHost code sets the interrupt frequency to 1 frame = 1ms, which means a possible delay of 8 idle microframes), plus the overhead of exception handling and interrupt processing, all of which delay the start of the next transfer.

On the other hand if you have another transfer already queued up, data can continue to flow in the background while the callback is processed.
 
That's worth some investigation. One issue I'm concerned about: if I queue up two transfers of 16KBytes and the first one turns out to be a short transfer (12 bytes in the Boson stream), that transfer ends after the short packet. In that case, the transfers already queued behind it would need their buffer addresses adjusted to cope with the short transfer. If all incoming transfers were 16384 bytes, this would not be an issue. However, the Boson USB implementation inserts 12-byte payload headers into the packet stream at the end of the Y data and after each of the two chrominance planes. That makes maintaining sync and coping with transfer sizes a bit more complex.
 
True, if the transfer sizes are variable then you're better off sticking with the endpoint max size (512 bytes) and using dedicated buffers for the USB transfers, copying into a destination buffer during the callback to "pack" the variable length data. But if you know how big the planes are / how much data to expect for them, you'd know when the 12-byte payload headers will occur? Or is it sending compressed data, so the size is unpredictable?
 
With the proper study of the sequence I showed in the earlier post today, I could probably request the expected packet lengths for multiple transfers. Since I'm limited by transfer rates to EXTMEM and camera output rates, I'm not sure it is worth the added complexity for the driver. I am starting to appreciate the complexity of writing USB Host drivers for many different devices.

One of the nice things about the EHCI is that if I request 16384 bytes and the device returns less data, the EHCI ends the transfer and tells me what it got. So, no matter what the returned transfer lengths may be, I know how many bytes I have received and I can adjust my buffer pointer for the next transfer accordingly. At this point, I don't see any benefit to falling back to requesting only 512 bytes, as the EHCI will tell me what it got, no matter the size of my request. The returned transfer lengths are a good clue to the structure of the returned data.
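As a sketch of that bookkeeping: in a completed EHCI transfer descriptor, bits 30:16 of the token hold the byte count that was still outstanding when the transfer ended, so the received length is the requested length minus that remainder. The helper names below are hypothetical, and how the token is reached through the USBHost_t36 transfer structure is not shown.

```cpp
#include <cstdint>

// EHCI qTD token, bits 30:16: "Total Bytes to Transfer", decremented by the
// controller as data moves. Bit 31 is the data toggle and is masked off.
constexpr uint32_t bytesRemaining(uint32_t token) {
    return (token >> 16) & 0x7FFF;
}

// Bytes actually delivered for a request of `requested` bytes.
constexpr uint32_t bytesReceived(uint32_t requested, uint32_t token) {
    return requested - bytesRemaining(token);
}

// Destination for the next transfer so that data stays contiguous.
inline uint8_t* nextDestination(uint8_t* current, uint32_t requested, uint32_t token) {
    return current + bytesReceived(requested, token);
}
```

With a 16384-byte request that ends after a 12-byte short packet, the remainder is 16372 and the next destination is just 12 bytes further on.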
 