Does anyone know if there's a bare metal version of the USB code kcuzner created for the Teensy 3.2?
This is his code, right? Looks about as bare metal as you can get.
https://github.com/kcuzner/teensy-oscilloscope/blob/master/scope-teensy/src/usb.c
I've been looking at Paul's core code and it's too complex for me to figure out what I need as a bare minimum (still looking at it).
The basic idea is that the USB hardware responds to the tokens from the USB host by reading and writing your buffers. You have to give the hardware your buffers in advance, via the BDT (buffer descriptor table). Then it gives you interrupts to tell you when it has used the buffers, and you have to give the hardware more buffers, and of course actually do something with the ones it has finished using.
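To make that handoff concrete, here is a rough host-side sketch in C of a single BDT slot. The flag values and the two-word descriptor layout are modelled on the Kinetis BDT as I understand it, so treat them as assumptions and check the reference manual for your chip. `bdt_arm`, `bdt_count` and `sim_hw_out` are hypothetical names made up for illustration; `sim_hw_out` only stands in for what the hardware does when an OUT token arrives.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Descriptor flag bits (assumed layout, verify against the reference manual). */
#define BDT_OWN   0x80u  /* 1 = USB hardware owns the buffer, 0 = software owns it */
#define BDT_DATA1 0x40u  /* data toggle expected for this buffer */
#define BDT_DTS   0x08u  /* enable data toggle checking */

typedef struct {
    uint32_t desc;  /* flags in the low byte, byte count in bits 16..25 */
    void    *addr;  /* buffer the hardware reads from or writes into */
} bdt_t;

/* Byte count currently stored in the descriptor. */
static uint32_t bdt_count(const bdt_t *b)
{
    return (b->desc >> 16) & 0x3FFu;
}

/* Give a buffer to the hardware in advance. The address is written first,
   so the descriptor is complete by the time OWN is set. */
static void bdt_arm(bdt_t *b, void *buf, uint32_t len, int data1)
{
    assert(len <= 1023u);  /* the count field is 10 bits wide */
    b->addr = buf;
    b->desc = BDT_OWN | BDT_DTS | (data1 ? BDT_DATA1 : 0u) | (len << 16);
}

/* Hypothetical stand-in for the hardware handling an OUT token.
   Returns 1 (ACK) if a buffer was armed, 0 (NAK) if not. */
static int sim_hw_out(bdt_t *b, const uint8_t *data, uint32_t n)
{
    if (!(b->desc & BDT_OWN))
        return 0;                      /* no buffer ready: NAK */
    if (n > bdt_count(b))
        n = bdt_count(b);              /* never write past the armed length */
    memcpy(b->addr, data, n);
    /* Hand the buffer back: clear OWN and store the received byte count. */
    b->desc = (b->desc & 0x7Fu) | (n << 16);
    return 1;
}
```

In a real driver the "transfer complete" interrupt is where you would read `bdt_count()`, do something with the data, and then call something like `bdt_arm()` again so the slot is ready for the next token.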
If you're used to thinking of code as a reactive model, where the hardware gives you an interrupt and then your code does the work of moving data, this model of setting up buffers in advance can seem complex. But this is the way modern high performance DMA-based peripherals work.
Perhaps a bare minimum would use just two buffers per endpoint. Or maybe even just 1 buffer. The BDT is set up for ping-pong with 2 pointers, so 2 buffers might be simpler than 1.
But the key to good USB performance is always having a buffer ready to receive data when the host sends the OUT token, or having a buffer filled with data ready to go when the host sends the IN token. USB host controllers tend to be quite bursty in their access: with bulk & control transfers, you tend to be rewarded immediately with another IN or OUT token if you were ready for the last one, but as soon as you don't have a buffer ready the hardware replies with a NAK handshake. After a NAK, the host controller usually moves on to other devices or enters a brief idle pause, so you *really* hurt performance if you're ever not ready. The host controller also spends a chunk of each frame servicing isochronous & interrupt endpoints. If your application is generating or consuming data quickly, the key to performance is buffering up enough during those quiet times so you always have buffers ready to go in the BDT slots during the bursts of rapid IN-ACK / OUT-ACK sequences.
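As a rough illustration of keeping both ping-pong slots fed, here is a host-side C sketch for one IN endpoint with a small software queue behind the two BDT slots. It is a simplified model, not the code from my core: the `ep_in_t` structure, `ep_queue`, `ep_refill` and `sim_in_token` are hypothetical names, the `BDT_OWN` value is assumed from the Kinetis layout, and DATA0/DATA1 toggling is left out entirely. `sim_in_token` stands in for the hardware plus the interrupt handler.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define BDT_OWN 0x80u   /* assumed: 1 = hardware owns the slot */
#define NQUEUE  4u      /* depth of the software queue (arbitrary) */

typedef struct { uint32_t desc; void *addr; } bdt_t;

typedef struct {
    bdt_t    slot[2];        /* even and odd descriptors (ping-pong) */
    unsigned next_fill;      /* slot software arms next */
    unsigned next_hw;        /* slot the hardware uses next */
    void    *queue[NQUEUE];  /* filled buffers waiting for a free slot */
    uint32_t qlen[NQUEUE];
    unsigned head, count;
    unsigned acks, naks;     /* bookkeeping for the demo only */
} ep_in_t;

/* Move queued buffers into any free BDT slot, keeping even/odd order. */
static void ep_refill(ep_in_t *ep)
{
    while (ep->count > 0 && !(ep->slot[ep->next_fill].desc & BDT_OWN)) {
        bdt_t *b = &ep->slot[ep->next_fill];
        b->addr = ep->queue[ep->head];
        b->desc = BDT_OWN | (ep->qlen[ep->head] << 16);
        ep->head = (ep->head + 1u) % NQUEUE;
        ep->count--;
        ep->next_fill ^= 1u;
    }
}

/* Application side: queue a filled buffer. Returns 0 if the queue is full. */
static int ep_queue(ep_in_t *ep, void *buf, uint32_t len)
{
    if (ep->count == NQUEUE)
        return 0;
    unsigned tail = (ep->head + ep->count) % NQUEUE;
    ep->queue[tail] = buf;
    ep->qlen[tail] = len;
    ep->count++;
    ep_refill(ep);           /* arm a slot right away if one is free */
    return 1;
}

/* Hypothetical host IN token. Returns 1 if data went out, 0 on NAK. */
static int sim_in_token(ep_in_t *ep)
{
    bdt_t *b = &ep->slot[ep->next_hw];
    if (!(b->desc & BDT_OWN)) {
        ep->naks++;          /* nothing armed: the host gets a NAK */
        return 0;
    }
    b->desc &= ~BDT_OWN;     /* hardware is done, slot returns to software */
    ep->next_hw ^= 1u;
    ep->acks++;
    ep_refill(ep);           /* what the completion interrupt would do */
    return 1;
}
```

With three buffers queued, two go straight into the even and odd slots and the third waits in the queue until the first IN token completes. A burst of three IN tokens is then answered without a NAK, and only the fourth finds nothing armed.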
My USB code has quite a lot of work invested in achieving those performance goals. Maybe you'll manage something just as good with whatever "bare minimum" ends up looking like, but I'd bet performance will suffer.