I have been watching this thread and learning a bit about DMA which looks interesting.
But before I could give any suggestions, I think I would need to understand the problem better. You say that the data comes in at about 4 MHz, but what I am not sure I heard is how much data there is. Is it continuous, or does it come in bursts? If the data is continuous at 4 MHz, I think USB throughput is a far bigger issue than how fast you can read the IO port.
Your call to Serial.write() will surely block, as there is no way you have 6000 bytes free in the output buffer, so it will sit in a hard loop waiting for space to become available in the output queue...
All during this time you will not be reading the IO port.
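To make the blocking point concrete, here is a minimal sketch of a write that never waits: it only hands over as many bytes as the output buffer has room for right now, and returns so the caller can go back to servicing the IO port. The helper name writeWhatFits is made up for illustration. It is written as a template over any port type with availableForWrite() and write(buf, n), which is the shape of the Arduino Serial interface; I am assuming the Teensy USB Serial reports free space that way, so check that before relying on it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of any library): send as much of
// data[offset..len) as fits in the port's output buffer right now and
// return the new offset. It never waits for space to free up.
// Port is assumed to provide availableForWrite() and write(buf, n).
template <typename Port>
size_t writeWhatFits(Port& port, const uint8_t* data, size_t len, size_t offset) {
    if (offset >= len) return len;           // nothing left to send
    int room = port.availableForWrite();     // free bytes in the output queue
    if (room <= 0) return offset;            // queue full: come back later
    size_t n = len - offset;
    if (n > static_cast<size_t>(room)) n = static_cast<size_t>(room);
    return offset + port.write(data + offset, n);
}
```

The idea would be to call this once per pass through loop(), keeping the offset between calls, instead of one big Serial.write() of the whole 6000 bytes.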
If it were me, here are some of the things I would investigate and experiment with.
a) If I am only interested in one IO pin, I would probably pack the data going out over USB. That is, if I pack 8 samples per byte, then instead of sending 4 MB of data per second, I would only send 0.5 MB per second.
b) I would probably look at other USB output methods. For example, I believe the Saleae logic analyzer uses a form of USB output called bulk transfer (http://support.saleae.com/hc/en-us/...-interfering-with-other-attached-USB-devices-). I am not sure if that maps to anything we can do with the Teensy 3.2. But I would probably experiment with RAW HID (http://www.pjrc.com/teensy/rawhid.html), which sends 64 bytes at a time. I would probably also see whether I could configure that to a larger size, and whether that would help.
c) Assuming I stayed with DMA reading of the IO port, I would look into setting up multiple DMA buffers. I would size the DMA transfers to maybe the size of my RAW HID transfer, or maybe 8 times that if I did a). Then when one DMA transfer is done, the system keeps sampling into the 2nd DMA buffer while I read the data out of the first and pack it into a buffer for the RAW HID send call, which starts the output over USB. Repeat when the 2nd DMA buffer has been filled (telling the first to start reading again)...
d) Maybe look again at how the DMA reads are working. That is, if it is reading something like 23M samples per second but your data is only coming in at about 4 MHz, maybe you can somehow slow down the DMA reads? Or maybe compress the data more on the output side? ...
e) If c) cannot keep up, but the data is of some reasonable size, I would try to add more buffering and compression of the data, let it fill up as much memory as I could while the USB output keeps working to catch up, and hope I reach the end of the sampling before I run out of memory...
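As a rough illustration of the packing in a), here is a minimal sketch that takes raw port reads (one byte per sample) and squeezes the one pin of interest down to 8 samples per byte. The function name packBits, the bit ordering (oldest sample in the top bit) and the choice to drop a trailing partial group are all my own assumptions, not anything from the Teensy libraries.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack one bit from each raw port sample into `out`,
// 8 samples per output byte. `bit` selects the pin of interest within each
// sample byte. The oldest sample of each group lands in the most
// significant bit. A trailing group of fewer than 8 samples is dropped.
// Returns the number of bytes written to `out`.
size_t packBits(const uint8_t* samples, size_t count, unsigned bit, uint8_t* out) {
    size_t outLen = 0;
    for (size_t i = 0; i + 8 <= count; i += 8) {
        uint8_t packed = 0;
        for (size_t j = 0; j < 8; ++j) {
            // Shift earlier samples up and append the selected bit.
            packed = static_cast<uint8_t>((packed << 1) | ((samples[i + j] >> bit) & 1u));
        }
        out[outLen++] = packed;
    }
    return outLen;
}
```

With this layout, a DMA buffer of 512 samples packs down to exactly 64 bytes, which is where the "times 8" sizing in c) comes from if the output goes out as 64-byte RAW HID packets.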
I am not sure if any of this helps, but that is what I would look into.
Good Luck
Kurt