Can Teensy 4.1 bit stream at 480Mbps with USB 2?

ee2inf

Hi,

Is it doable to capture signals from an AD9238 module with dual ADCs (12-bit/65Msps each), calculate the moving average of every 4 bits, and then stream the bit data through a USB device FIFO?
 
No, definitely not feasible.

Maximum theoretical USB 480 Mbit/sec speed with protocol overhead is 53,248,000 bytes/sec. See page 55 (83rd page in the PDF) of the USB 2.0 spec for details. But that doesn't include data-dependent bit-stuffing overhead or the trade-offs all USB host controllers make in bandwidth planning to ensure SOF packets transmit precisely on schedule. So in practice you'll see even the best USB hardware achieve only some fraction of this theoretical maximum. We've often seen about 50% with this benchmark, which includes binary-to-ASCII conversion overhead, though results vary quite substantially depending on which software on the PC side receives the data.

But even if you could get 53 Mbyte/sec, you're talking about 2 channels of 12 bits at 65 Msamples/sec (just for data, not including any sort of framing info). My calculator says that's 195 Mbyte/sec... almost 4 times the theoretical maximum.

Even if you had no protocol overhead at all, 480 Mbit/sec is 60 Mbyte/sec... not even 1/3rd of the data rate of that ADC chip.
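
For anyone who wants to check the arithmetic above, here is a minimal sketch that reproduces it with compile-time integer math. The constant and function names are made up for illustration; the numbers are the ones quoted in this thread, not measurements.

```cpp
#include <cstdint>

// Figures quoted in the thread (names are hypothetical, chosen for this sketch).
constexpr std::uint64_t kSampleRateHz = 65000000ULL;           // 65 Msamples/sec per ADC
constexpr std::uint64_t kBitsPerSample = 12;                   // 12-bit samples
constexpr std::uint64_t kChannels = 2;                         // dual ADC
constexpr std::uint64_t kUsbBulkMaxBytesPerSec = 53248000ULL;  // USB 2.0 spec ceiling quoted above
constexpr std::uint64_t kUsbRawBitsPerSec = 480000000ULL;      // wire rate, no overhead

// Raw ADC output in bytes per second, data only, no framing.
constexpr std::uint64_t rawAdcBytesPerSec() {
    return kSampleRateHz * kBitsPerSample * kChannels / 8;
}

// Integer percentage of `whole` that `part` represents (rounded down).
constexpr std::uint64_t percentOf(std::uint64_t part, std::uint64_t whole) {
    return part * 100 / whole;
}
```

With these definitions, `rawAdcBytesPerSec()` is 195,000,000, which is 366% of the 53,248,000 bytes/sec ceiling (the "almost 4 times" above), and the overhead-free 60 Mbyte/sec is only 30% of what the ADCs produce.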
 
But maybe by "moving average of every 4 bits" you intend for Teensy to downsample the data? Really just guessing here, as I really don't have any idea what those words are meant to say.
 
Thanks for your response! You're correct in assuming that by 'moving average of every 4 bits,' I intend to downsample the data. Essentially, I plan to average groups of four consecutive bits to reduce the data rate while retaining essential information. So the effective data rate would be approximately 390 Mbps (65 Msps x 12 bits x 2 ADCs / 4).
 
Do you mean averaging groups of four consecutive 12-bit values? To say "four consecutive bits" is incorrect and confusing.
Yeah, you are totally right! That's what I meant; I got lost in my own lingo.

But either way, it does not seem feasible with Teensy 4.1, considering that the benchmark is far off from 49 MB/s. I kind of expected it to perform way better than the CY7C68013A.
 

Well that moves everything from impossible to only extremely unlikely.

I mentioned the Serial printing speed (a.k.a. lines / second) benchmark where we saw a pretty wide range of performance depending on the PC side software. Maybe Defragster will comment, as he ran it many times with different conditions, as I recall. But I don't remember ever seeing speeds anywhere close to 90% of theoretical USB throughput. Even just 50% of the theoretical 53 Mbyte/sec speed requires very good software on the PC side (probably Linux).
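
To put a number on how close the downsampled stream sits to the USB ceiling, here is a rough sketch of the calculation. It assumes the averaged samples are sent tightly packed at 12 bits each with no framing, which is the most optimistic case; the function names are invented for this example.

```cpp
#include <cstdint>

// USB 2.0 theoretical maximum quoted earlier in the thread.
constexpr std::uint64_t kUsbCeilingBytesPerSec = 53248000ULL;

// Bytes per second after averaging every `decimation` samples into one.
// ASSUMPTION: outputs are packed at exactly `bitsPerSample` bits, no framing.
constexpr std::uint64_t streamBytesPerSec(std::uint64_t sampleRateHz,
                                          std::uint64_t bitsPerSample,
                                          std::uint64_t channels,
                                          std::uint64_t decimation) {
    return sampleRateHz * bitsPerSample * channels / decimation / 8;
}

// Integer percentage of the USB ceiling a stream would need (rounded down).
constexpr std::uint64_t percentOfCeiling(std::uint64_t bytesPerSec) {
    return bytesPerSec * 100 / kUsbCeilingBytesPerSec;
}
```

`streamBytesPerSec(65000000, 12, 2, 4)` comes to 48,750,000 bytes/sec, and `percentOfCeiling` of that is 91, so the plan needs roughly 91% of the theoretical maximum sustained, against the roughly 50% seen in the benchmark.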

Acquiring from the ADC at that speed will also be quite a challenge. FlexIO2 with DMA is probably the only realistic hope. Recently 16 bit high speed FlexIO reception has been discussed. Maybe you can find those conversations, or someone might give a link to those.

Implementing the downsampling code might also be quite a challenge. On average, you'll have only about 9.23 CPU cycles per incoming sample. Or maybe a better way to think about it would be almost 37 CPU cycles per downsampled output. You'll need to fetch 4 raw samples from memory, add them together, shift (divide by 4), and store to the output buffer. Since the data will be 16 bits (FlexIO can't do 12, so you must have 16 with 4 zeros), maybe you can use techniques like those in the audio library to bring in 2 samples per 32-bit bus read and use the DSP extension instructions to do the addition step. Complicated, but that's probably the sort of effort needed to get this to happen with 2 streams of 65 Msample/sec using only a 600 MHz CPU.

Not impossible, but pretty unlikely you'll succeed in pushing the hardware this far.
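
As an illustration of the fetch, add, shift, store steps described above, here is a minimal portable sketch. It is not the audio library's method and does not use the DSP extension instructions. It uses ordinary 32-bit additions instead, which works only because 12-bit values leave 4 bits of headroom in each 16-bit half. The memory layout is an assumption made for this example (one sample from each ADC per 32-bit word), and `packAB` and `averageBy4Packed` are hypothetical names. Whether something like this fits in about 37 cycles per output on real hardware is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION (not from the thread): each 32-bit word holds one 12-bit sample
// from ADC A in bits 0..11 and one from ADC B in bits 16..27, all other bits 0.
constexpr std::uint32_t packAB(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) | (static_cast<std::uint32_t>(b) << 16);
}

// Average every four consecutive words, both channels at once.
// `words` must be a multiple of 4; `out` receives words / 4 entries.
void averageBy4Packed(const std::uint32_t* in, std::uint32_t* out, std::size_t words) {
    assert(words % 4 == 0);
    for (std::size_t i = 0; i + 4 <= words; i += 4) {
        // Each 16-bit half sums to at most 4 * 4095 = 16380, so no carry
        // crosses from the A half into the B half.
        const std::uint32_t sum = in[i] + in[i + 1] + in[i + 2] + in[i + 3];
        // Shift divides both halves by 4 (truncating); the mask discards the
        // two low bits of B that the shift pushes into the top of the A half.
        out[i / 4] = (sum >> 2) & 0x0FFF0FFFu;
    }
}
```

For example, four words with A = 1, 2, 3, 6 and B = 4095 each produce a single output word with A = 3 and B = 4095. If the two ADC streams arrive in separate buffers instead, the same add-then-shift idea applies, but the words would need to be arranged differently.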
 
I was planning to overclock it to overcome any CPU cycle limitations and to dedicate a USB port on the machine to achieve full speed. But it seems that even if I managed to fetch two samples per 32-bit bus read to save CPU cycles, any drop in the rate on the USB side would cause data loss without a huge pool of RAM to maintain data flow. I am not sure if the Teensy DMA can handle SDRAM and two 12-bit ADCs.
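
To get a feel for how much buffering a USB hiccup would need, here is a small sketch of the calculation. The 256 KiB buffer size is a hypothetical value picked for illustration and does not come from the thread; the stream rate is the 390 Mbit/sec figure above expressed in bytes.

```cpp
#include <cstdint>

// How long a buffer can absorb a complete stall on the USB side before
// incoming data overruns it, in microseconds (integer, rounded down).
constexpr std::uint64_t stallToleranceMicros(std::uint64_t bufferBytes,
                                             std::uint64_t streamBytesPerSec) {
    return bufferBytes * 1000000ULL / streamBytesPerSec;
}

constexpr std::uint64_t kStreamBytesPerSec = 48750000ULL;        // 390 Mbit/sec as bytes/sec
constexpr std::uint64_t kExampleBufferBytes = 256ULL * 1024ULL;  // hypothetical buffer size
```

`stallToleranceMicros(kExampleBufferBytes, kStreamBytesPerSec)` gives 5377, so a buffer of that size covers a stall of about 5.4 ms. Even a full 1 MiB would cover only about 21.5 ms.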
 
RAM won't be the bottleneck. DTCM is plenty fast enough.

The 3 difficult tasks (in likely order of difficulty) are 1: FlexIO with DMA to acquire the data, 2: Optimized code on both Teensy and PC to sustain USB transfer speed, 3: Optimized code to process / downsample the data.

Don't underestimate how hard the PC side will be. You might think the 600 MHz Teensy would be the limiting factor, and that a fast PC with gigabytes of RAM and many multi-GHz CPU cores wouldn't be a problem, but over and over on this forum we've had people discover (including me in the early days of Teensy 4.0) that performance on the PC side often ends up being the speed-limiting factor.
 
Would a USB host storage device be able to maintain such a write speed if paired with NVMe USB storage? I am not aiming for a real-time solution, as the data will be post-processed on a PC anyway.
 