Here are my thoughts, if you want to give it a try.
First, you'll probably start with usb_serial_getchar(), keeping this code intact:
Code:
intr_state = SREG;
cli();
if (!usb_configuration) {
	SREG = intr_state;
	return -1;
}
UENUM = CDC_RX_ENDPOINT;
retry:
c = UEINTX;
if (!(c & (1<<RWAL))) {
	// no data in buffer
	if (c & (1<<RXOUTI)) {
		UEINTX = 0x6B;
		goto retry;
	}
	SREG = intr_state;
	return -1;
}
This is the code you'll try replacing:
Code:
// take one byte out of the buffer
c = UEDATX;
// if buffer completely used, release it
if (!(UEINTX & (1<<RWAL))) UEINTX = 0x6B;
Probably the simplest thing to do is read UEBCLX to find out how many bytes are available in the currently viewable packet. You already know a packet is pending with at least 1 byte, so you should always get a number from 1 to 64.
A very simple approach would be to use a fixed 64 byte buffer, and just read UEDATX for each byte that's available and put it into the buffer. Then at the end, you can return that number, so the caller can know how many bytes were received.
If you always read every byte, the "!(UEINTX & (1<<RWAL))" test will always be true, so you could just always do "UEINTX = 0x6B" after reading the bytes, to return the buffer to the USB.
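Here's a rough sketch of that fixed-buffer idea, not verified on real hardware. The names rx_packet and read_whole_packet() are made up for illustration, and the fake_* variables are host-only stand-ins so the snippet compiles off-target. It is meant to slot in where the "take one byte out of the buffer" code sits, after the checks above have already confirmed a packet is pending.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>              /* real UEBCLX, UEDATX, UEINTX */
#else
/* Host-only stand-ins (assumptions) so the sketch compiles off-target. */
static uint8_t fake_fifo[64];    /* pretend endpoint FIFO */
static uint8_t fake_len;         /* bytes in the pending packet */
static uint8_t fake_pos;         /* next byte to hand out */
static uint8_t UEINTX;           /* records the last value written */
#define UEBCLX ((uint8_t)(fake_len - fake_pos))
#define UEDATX (fake_fifo[fake_pos++])
#endif

static uint8_t rx_packet[64];    /* hypothetical fixed receive buffer */

/* Call only after the existing checks have confirmed that a packet with
 * at least one byte is pending, with interrupts still disabled.
 * Restoring SREG is left to the caller. */
static uint8_t read_whole_packet(void)
{
    uint8_t n = UEBCLX;          /* expected to be 1..64 */
    uint8_t i;

    for (i = 0; i < n; i++) {
        rx_packet[i] = UEDATX;   /* each read takes one byte from the FIFO */
    }
    UEINTX = 0x6B;               /* every byte was read: return the buffer */
    return n;                    /* tell the caller how many bytes arrived */
}
```

The caller would then restore SREG from intr_state and use the first n bytes of rx_packet.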
If you allow the user (...you...) to pass in a pointer to a buffer and its size, of course you'll have to check if the buffer is large enough, only copy the data that fits, and avoid returning the packet buffer if you didn't read all its bytes.
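A minimal sketch of the caller-supplied buffer variant might look like this. The function name read_packet_into() is hypothetical, the fake_* variables are host-only stand-ins, and the "n == avail" comparison is my stand-in for the RWAL test in the original code, so treat it as an assumption until it's checked on hardware.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>              /* real UEBCLX, UEDATX, UEINTX */
#else
/* Host-only stand-ins (assumptions) so the sketch compiles off-target. */
static uint8_t fake_fifo[64];    /* pretend endpoint FIFO */
static uint8_t fake_len;         /* bytes in the pending packet */
static uint8_t fake_pos;         /* next byte to hand out */
static uint8_t UEINTX;           /* records the last value written */
#define UEBCLX ((uint8_t)(fake_len - fake_pos))
#define UEDATX (fake_fifo[fake_pos++])
#endif

/* Copy at most `size` bytes of the pending packet into `dst`.
 * Call with a packet already known to be pending and interrupts disabled. */
static uint8_t read_packet_into(uint8_t *dst, uint8_t size)
{
    uint8_t avail = UEBCLX;                     /* bytes left in the packet */
    uint8_t n = (avail < size) ? avail : size;  /* only copy what fits */
    uint8_t i;

    for (i = 0; i < n; i++) {
        dst[i] = UEDATX;
    }
    if (n == avail) {
        UEINTX = 0x6B;           /* packet drained: return the buffer */
    }                            /* otherwise keep it for the next call */
    return n;
}
```

If the caller's buffer is too small, the leftover bytes stay in the packet buffer and the next call picks up where this one stopped.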
The simplest way to read bytes is with a loop. A faster way involves a switch/case block with the read code unrolled. See the highly optimized usb_serial_write() function for an example. The code size is larger, but it's much faster. I'd recommend getting it working first the simplest way possible, and later attempt the switch/case optimization if you need it. Doing that really requires verifying the compiled output with "avr-objdump -d" on the generated .elf file, so you can check if the compiler really is generating a long sequence of only the 2 necessary instructions, with a computed jump into the sequence. Lots of seemingly random stuff will cause the compiler to generate much slower code. But even a simple loop with looping overhead is probably much faster than calling usb_serial_getchar() for each byte.
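For reference, the switch/case idea could be sketched roughly like this for the read side. This is not the actual usb_serial_write() code, just the same fall-through pattern turned around. The function name read_packet_unrolled() is hypothetical, the fake_* variables are host-only stand-ins, and dst is assumed to have room for 64 bytes. Whether the compiler really turns it into a computed jump is exactly what the avr-objdump check is for.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>              /* real UEBCLX, UEDATX, UEINTX */
#else
/* Host-only stand-ins (assumptions) so the sketch compiles off-target. */
static uint8_t fake_fifo[64];
static uint8_t fake_len, fake_pos;
static uint8_t UEINTX;           /* records the last value written */
#define UEBCLX ((uint8_t)(fake_len - fake_pos))
#define UEDATX (fake_fifo[fake_pos++])
#endif

#define RD *dst++ = UEDATX;      /* one unrolled read, no loop overhead */

/* Read the whole pending packet into dst (at least 64 bytes long).
 * Every case deliberately falls through to the one below it. */
static uint8_t read_packet_unrolled(uint8_t *dst)
{
    uint8_t n = UEBCLX;

    switch (n) {
    case 64: RD case 63: RD case 62: RD case 61: RD
    case 60: RD case 59: RD case 58: RD case 57: RD
    case 56: RD case 55: RD case 54: RD case 53: RD
    case 52: RD case 51: RD case 50: RD case 49: RD
    case 48: RD case 47: RD case 46: RD case 45: RD
    case 44: RD case 43: RD case 42: RD case 41: RD
    case 40: RD case 39: RD case 38: RD case 37: RD
    case 36: RD case 35: RD case 34: RD case 33: RD
    case 32: RD case 31: RD case 30: RD case 29: RD
    case 28: RD case 27: RD case 26: RD case 25: RD
    case 24: RD case 23: RD case 22: RD case 21: RD
    case 20: RD case 19: RD case 18: RD case 17: RD
    case 16: RD case 15: RD case 14: RD case 13: RD
    case 12: RD case 11: RD case 10: RD case  9: RD
    case  8: RD case  7: RD case  6: RD case  5: RD
    case  4: RD case  3: RD case  2: RD case  1: RD
    default: break;
    }
    UEINTX = 0x6B;               /* every byte was read: return the buffer */
    return n;
}
```

Entering at "case n" executes exactly n reads, so a 3 byte packet jumps straight to the last three RD statements.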
A really good approach would allow the user to pass in a pointer to any size buffer and automatically read as many packets as necessary, or as available, probably with interrupts enabled briefly between each packet. That's how the usb_serial_write() function works... and let me tell you, many hours of work went into those 148 lines of code... analyzing generated code, testing & benchmarking, etc. It's a lot of work!
If you get something simple working reliably, even a fixed buffer and a single-packet read with a slower loop, I hope you'll share the confirmed-working code here. Hopefully this description gives you a good start.