USB 2.0 Teensy port at 8000 Hz poll rate?

GDouglas

Hello,

Is there a way to custom-modify the Teensy USB implementation to permit an 8000 Hz poll rate for HID transmissions?

USB 2.0 supports this poll rate with Windows 10 (0.125 ms per poll), and the Teensy 4.0 has a USB 2.0 port.
 
The descriptor in usb_desc.h already sets the poll interval to the minimum value of 1, which corresponds to 8 kHz (125 µs) for a USB 2.0 high-speed device. To test it, I ran the following quick experiment with a Windows 10 host. The Teensy firmware (below) provides Raw HID reports for the PC to poll as fast as possible and writes the send time into the first 4 bytes of each report.

(T4.0, Mode RAW HID):
Code:
uint8_t report[64];

void setup(){
}

void loop()
{
    uint32_t ticks = micros();
    memcpy(report, &ticks, sizeof(uint32_t));

    usb_rawhid_send(report, 100);
}

The PC software (here C#) reads reports as fast as possible, extracts the send time from the data, and displays the time between two reports (µs). Ideally this time would be 125 µs.
Code:
using HidLibrary;
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;

namespace receiver
{       
    class Program
    {
        static HidFastReadDevice teensy;
        static BackgroundWorker worker = new BackgroundWorker();               
        static UInt32 oldTime = 0;
        static UInt32 dtTeensy = 0;
                
        static void Main(string[] args)
        {
            // set up a  worker thread to read in HID frames as quickly as possible. 
            // read out the data (send time, filled in by the Teensy) and calculate dt

            worker.DoWork += async (o, s) =>
            {                                    
                while (true)
                {                  
                    var report = await teensy.FastReadReportAsync();                          
                                                         
                    var newTime = BitConverter.ToUInt32(report.Data, 0);
                    dtTeensy = newTime - oldTime;
                    oldTime = newTime;
                }
            };
                        
            // open the RAW Hid interface
            var enumerator = new HidFastReadEnumerator();
            teensy = (HidFastReadDevice)enumerator.Enumerate(0x16C0, 0x0486)    // 0x486 -> usb type raw hid
                     .Where(d => d.Capabilities.Usage == 0x200)              // usage 0x200 -> RawHID usage
                     .FirstOrDefault();

            if (teensy != null)
            {
                teensy.OpenDevice();
                worker.RunWorkerAsync();

                while (!Console.KeyAvailable)
                {
                    Console.WriteLine($"dt: {dtTeensy} µs");
                }
                teensy.Dispose();
            }
        }
    }
}

Here is the output:
Code:
dt: 125 µs
dt: 125 µs
dt: 375 µs
dt: 125 µs
dt: 125 µs
dt: 250 µs
dt: 250 µs
dt: 250 µs
dt: 250 µs
dt: 1875 µs
dt: 250 µs
dt: 500 µs
dt: 125 µs
dt: 133 µs
dt: 374 µs
dt: 124 µs
dt: 250 µs
dt: 375 µs
dt: 375 µs
dt: 250 µs
dt: 375 µs
dt: 250 µs
dt: 125 µs
dt: 125 µs
dt: 250 µs
dt: 375 µs
dt: 250 µs
dt: 375 µs

You can see that the maximum HID report rate is indeed 8 kHz (125 µs between reports). However, a report sometimes takes more than one USB microframe, so the gap becomes a multiple of 125 µs. It is difficult to tell whether this is caused by the Teensy or by the PC. I suspect the PC is not polling fast enough rather than the Teensy sending too slowly, but this is only a gut feeling.
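To make the "multiples of 125 µs" pattern easier to see, the gaps can be binned by how many microframes each one spans. This is only a sketch: the function name and the rounding rule are assumptions, and the sample values are a few of the gaps from the output above.

```python
from collections import Counter

MICROFRAME_US = 125  # length of one USB 2.0 high-speed microframe in µs


def microframes_spanned(dt_us):
    """Round a measured gap to the nearest whole number of microframes."""
    return max(1, round(dt_us / MICROFRAME_US))


# A few of the gaps printed above (µs)
gaps = [125, 125, 375, 125, 125, 250, 250, 1875, 133, 374, 124]

histogram = Counter(microframes_spanned(g) for g in gaps)
for n, count in sorted(histogram.items()):
    print(f"{n} microframe(s): {count} gap(s)")
```

A gap of 1 means the report arrived in the very next microframe; anything larger means at least one poll went by without the application seeing a new report.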
 
I'm so glad you tested this! Whether a 125 µs response is actually possible from PCs has been on my list of things to check someday. Almost all of my testing has involved looking at the USB packets with a protocol analyzer displaying the communication on another machine, which only confirms that it should theoretically be possible, but not whether it can actually be achieved given all the limitations the PC's operating system imposes.

To get more consistent performance, you would probably need to do something special so this program runs with "real time" or otherwise higher priority scheduling.
 

I did some more experiments on that, with a (at least for me) surprising result: I had always assumed that a PC application needs to actively read the HID reports to 'remove them from the bus'. This is not the case.
Instead, the HID driver always reads available reports in the background and buffers them for potential user applications.

I therefore rewrote the test in a more reasonable way. The firmware now simply copies a running number into the report and prints the time between sending of two reports. The running number is used to check for missed reports on the PC side.
Code:
uint8_t report[64];

void setup(){
}

uint32_t t0 = 0;
uint32_t cnt = 0;

void loop()
{
    memcpy(report, &cnt, sizeof(uint32_t));  // copy the report number into the report
    usb_rawhid_send(report, 1000);

    uint32_t now = micros();                 // measure time between sending of two reports
    Serial.println(now - t0);
    t0 = now;
    cnt++;
}
This results in a very stable rate of 125µs per loop.
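The running number makes the loss check on the PC side simple. A minimal sketch in Python, assuming the counters have already been extracted from the reports into a list; `count_missed` is a hypothetical helper, not part of any library. The difference is taken modulo 2^32 so a counter wrap is not mistaken for a loss (at 8000 reports per second a 32-bit counter wraps after roughly six days).

```python
def count_missed(counters, bits=32):
    """Count reports missing between consecutive sequence numbers.

    Differences are taken modulo 2**bits so that a wrap from the maximum
    value back to 0 counts as a normal step of 1.
    """
    modulus = 1 << bits
    missed = 0
    for prev, cur in zip(counters, counters[1:]):
        step = (cur - prev) % modulus
        if step > 1:            # step == 0 would be a repeated report, ignored here
            missed += step - 1
    return missed


print(count_missed([10, 11, 12, 15]))        # two numbers (13, 14) are absent
print(count_missed([0xFFFFFFFF, 0, 1]))      # wraparound, nothing missing
```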

I also rewrote the Windows part of the test. It now reads 5000 reports (5000*64 = 320'000 bytes) from the bus without printing and analyzes the results afterwards. Here is the analysis result:
Code:
Reports:    5000 
Data:       320000 bytes
Total time: 0.62 s
Throughput: 501.3 kB/s  (target: 64B/125µs = 500 kB/s)
t_min:      3.5 µs (@ 12)
t_max:      1689.7 µs (@ 2177)

It is interesting to see that the throughput (501 kB/s) is a little bit higher than the theoretical rate of 500 kB/s. No reports were lost.
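A note on the units, as a small worked calculation: one 64-byte report per 125 µs microframe is 8000 reports per second. The "500 kB/s" target only comes out if a kB is taken as 1024 bytes, which is what the C# analysis code does when it divides by 1024. The constant names below are illustrative.

```python
REPORT_BYTES = 64          # Raw HID report size used in the test
REPORTS_PER_SECOND = 8000  # one report per 125 µs microframe

bytes_per_second = REPORT_BYTES * REPORTS_PER_SECOND

print(f"{bytes_per_second} B/s")
print(f"{bytes_per_second / 1000:.0f} kB/s (1 kB = 1000 bytes)")
print(f"{bytes_per_second / 1024:.0f} kB/s (1 kB = 1024 bytes, as in the C# test)")
```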

Looking at the beginning of the time series one sees the buffering effect:
Code:
Nr: 1 delta: 60.3 µs
Nr: 2 delta: 4.9 µs
Nr: 3 delta: 4.1 µs
Nr: 4 delta: 3.8 µs
Nr: 5 delta: 4.1 µs
Nr: 6 delta: 3.6 µs
Nr: 7 delta: 3.7 µs
Nr: 8 delta: 3.9 µs
Nr: 9 delta: 3.9 µs
Nr: 10 delta: 3.9 µs
Nr: 11 delta: 3.7 µs
Nr: 12 delta: 3.6 µs
Nr: 13 delta: 3.5 µs
Nr: 14 delta: 3.6 µs
Nr: 15 delta: 3.5 µs
Nr: 16 delta: 3.6 µs
Nr: 17 delta: 3.8 µs
Nr: 18 delta: 108.7 µs
Nr: 19 delta: 123.4 µs
Nr: 20 delta: 125.5 µs
Nr: 21 delta: 125.0 µs

The first reports obviously come from a buffer, which is why they arrive so quickly. Once the buffer is drained, they arrive at the expected data rate. Buffering also works when the application temporarily can't read the current reports. See #2178, after which the code catches up on the large delay of 1689.7 µs.

Code:
Nr: 2174 delta: 135.7 µs
Nr: 2175 delta: 107.4 µs
Nr: 2176 delta: 59.9 µs
Nr: 2177 delta: 125.9 µs
Nr: 2178 delta: 1689.7 µs  <======
Nr: 2179 delta: 6.9 µs
Nr: 2180 delta: 3.9 µs
Nr: 2181 delta: 11.8 µs
Nr: 2182 delta: 78.9 µs
Nr: 2183 delta: 5.7 µs
Nr: 2184 delta: 3.9 µs
Nr: 2185 delta: 3.7 µs
Nr: 2186 delta: 3.7 µs
Nr: 2187 delta: 3.6 µs
Nr: 2188 delta: 3.5 µs
Nr: 2189 delta: 3.7 µs
Nr: 2190 delta: 3.6 µs
Nr: 2191 delta: 3.6 µs
Nr: 2192 delta: 130.5 µs
Nr: 2193 delta: 106.5 µs
Nr: 2194 delta: 107.9 µs
Nr: 2195 delta: 161.7 µs
Nr: 2196 delta: 108.6 µs
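Stalls like the one at #2178 can be located automatically. A rough sketch, assuming the deltas are available as a list of microsecond values; `find_stalls` and its threshold `factor` are hypothetical names chosen for illustration. The backlog estimate is simply the stall length divided by the 125 µs period.

```python
def find_stalls(deltas_us, period_us=125.0, factor=4.0):
    """Return (position, delta, backlog) for every gap longer than factor * period.

    backlog estimates how many reports were queued by the driver while the
    application was not reading.
    """
    stalls = []
    for i, dt in enumerate(deltas_us):
        if dt > factor * period_us:
            backlog = int(dt // period_us)
            stalls.append((i, dt, backlog))
    return stalls


# Deltas for reports 2174 to 2183 from the listing above (µs)
deltas = [135.7, 107.4, 59.9, 125.9, 1689.7, 6.9, 3.9, 11.8, 78.9, 5.7]

for position, dt, backlog in find_stalls(deltas):
    print(f"stall at position {position}: {dt} µs, about {backlog} reports queued")
```

For the 1689.7 µs gap the estimate is 13 queued reports, which is in line with the 13 short deltas (#2179 to #2191) that follow it in the listing.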

All in all: on Win10, the HID transfer is very stable at the advertised rate of 8000 reports per second (500 kB/s), even with a high-level language like C#, which is not famous for speed :). I also changed the poll interval in the descriptor to 2 and 4, which throttles the transmission exactly as it should.


Here is the C# code in case someone wants to repeat the experiments:
Code:
using HidLibrary;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace receiver
{
    class Program
    {
        class dataPoint
        {
           public UInt32 count;
           public TimeSpan time;
        }

        const int nrOfReports = 5000;
                
        static void Main(string[] args)
        {                                  
            var enumerator = new HidFastReadEnumerator();                      

            var teensy = (HidFastReadDevice)enumerator.Enumerate(0x16C0, 0x0486)   // Get all devices with vid/pid 0x16C0/0x486 -> usb type raw hid
                         .Where(d => d.Capabilities.Usage == 0x200)                // filter by usage 0x200 -> RawHID device (not serEmu!)
                         .FirstOrDefault();                                        // take the first of the found devices or return null

            if (teensy != null)
            { 
                var data = new List<dataPoint>();                             
                var stopwatch = new Stopwatch();
                stopwatch.Start();
                teensy.OpenDevice();                             // open the device
                            
                for (int i = 0; i < nrOfReports; i++)            // read the reports, store report number and timestamp
                {                    
                    var report = teensy.FastReadReport();
                    data.Add(new dataPoint
                    {
                        time = stopwatch.Elapsed,
                        count = BitConverter.ToUInt32(report.Data, 0)
                    }); 
                }

                int errors = 0;
                
                for (int i = 1; i < data.Count; i++)             // print data
                {
                    var dt = data[i].time - data[i - 1].time;
                    var cnt = data[i].count;
                    if (data[i].count - data[i - 1].count != 1) errors++; // check for missed reports

                    Console.WriteLine($"Nr: {cnt - data.First().count} delta: {dt.TotalMilliseconds*1000:F1} µs");
                }

                var totalTime = data.Last().time - data.First().time;
                var totalData = data.Count * 64;
                var throughput = totalData / totalTime.TotalSeconds / 1024;   // kB/s with 1 kB = 1024 bytes
                
                Console.WriteLine();
                Console.WriteLine($"Reports:    {data.Count}");
                Console.WriteLine($"Data:       {totalData} bytes");
                Console.WriteLine($"Total time: {totalTime.TotalSeconds:F2} s");
                Console.WriteLine($"Throughput: {throughput:F1} kB/s  (target: 64B/125µs = 500 kB/s)");

                var times = Enumerable.Range(1, data.Count - 1).Select(i => data[i].time - data[i - 1].time).ToList();
                var tMin = times.Min();
                var tMax = times.Max();
                var maxIdx = times.IndexOf(tMax);
                var minIdx = times.IndexOf(tMin);

                Console.WriteLine($"t_min:      {tMin.TotalMilliseconds*1000:F1} µs (@ {minIdx})");
                Console.WriteLine($"t_max:      {tMax.TotalMilliseconds*1000:F1} µs (@ {maxIdx})");

                teensy.Dispose();
                while (Console.ReadKey().Key != ConsoleKey.Escape);
            }
        }
    }
}
 
Regarding "Whether 125us response is actually possible from PCs": it certainly is. Reliable 0.125 ms (125 µs) polling is possible with an optimized Windows 10 computer (cherry-picked hardware, clean install, and setup). Some commercial gaming peripherals are already testing 4000 Hz and 8000 Hz operation!

I'll have to figure out where the buffering is occurring, though. For best real-time performance, I find it best to make sure the high-poll-rate device is on its own dedicated USB chip. For example, if you use a PCI Express USB card, make the 8 kHz device the only USB device plugged into it for maximum reliability in timing-critical applications, since poll reliability can be sensitive to contention from other USB devices.
 
Actually, I tested this on my old computer, which I bought used some 3 years ago (i5, 2.9 GHz, 6 GB). It doesn't even have USB3 and is crammed with stuff accumulated over the years. Zoom and TeamViewer were running in the background, and the Teensy was connected to a cheap (<5 EUR) hub together with another Teensy... Alongside the HID transfer, the same Teensy spat out the time stamps over USB Serial to TyCommander. So, nothing special at all.
 
Interesting!

Either way, the computer side can sustain 0.125 ms (125 µs) polling with acceptable microjitter and without multi-poll clumping, provided you have a recent USB implementation and optimize the computer properly (keep only one high-poll-rate branch per USB trunk, and no background software).

In theory the Teensy 4.0 is powerful enough that it should not be the weak link if it already supports 125 µs polling on USB 2.0. Some gaming peripherals testing 4000 Hz and 8000 Hz use weaker microcontrollers than the Teensy 4.0.

You may have heard of the Razer 8000 Hz gaming mouse and the AtomPalm Hydrogen 8000 Hz gaming mouse. (Some testers are showing a worthwhile human-visible difference, since it co-operates better with less jittering/rounding with new ever higher-refresh-rate monitors getting too close to 1000 Hz mouse poll rate)
 
Is there a way to connect a 125 Hz (8 ms response) USB mouse to the Teensy and have the Teensy connect to the PC, so that the mouse can operate faster than 125 Hz (8 ms)?
 
Raspberry Pi Pico lacks 480 Mbit/sec USB, so I don't see how it could possibly achieve 8 kHz HID polling interval. At most 1 kHz should be possible.
 
Oh, sorry.
I posted the link because it shows an example of converting to 1 kHz using a Raspberry Pi Pico, and I thought it might be possible to do 8000 Hz with a Teensy 4. It is just an example.
 
Each endpoint has an interval number. Set it to 1 for maximum polling rate.

For example, on the joystick interface you'll find these lines in teensy4/usb_desc.h.

Code:
  #define JOYSTICK_INTERFACE    3       // Joystick
  #define JOYSTICK_ENDPOINT     6
  #define JOYSTICK_SIZE         12      //  12 = normal, 64 = extreme joystick
  #define JOYSTICK_INTERVAL     2
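For reference, this is how the interval number maps to a polling period under the USB 2.0 rules for interrupt endpoints: at high speed the period is 2^(bInterval-1) microframes of 125 µs, while at full speed bInterval counts 1 ms frames. The sketch below assumes the `*_INTERVAL` define ends up unchanged as the endpoint's bInterval; the function names are made up for illustration.

```python
def hs_interrupt_period_us(b_interval):
    """Polling period in µs of a high-speed (480 Mbit/s) interrupt endpoint."""
    if not 1 <= b_interval <= 16:
        raise ValueError("high-speed bInterval must be in 1..16")
    return 125.0 * 2 ** (b_interval - 1)


def fs_interrupt_period_us(b_interval):
    """Polling period in µs of a full-speed (12 Mbit/s) interrupt endpoint."""
    if not 1 <= b_interval <= 255:
        raise ValueError("full-speed bInterval must be in 1..255")
    return 1000.0 * b_interval


for b in (1, 2, 4):
    period = hs_interrupt_period_us(b)
    print(f"bInterval={b}: {period:.0f} µs -> {1e6 / period:.0f} polls/s")
```

Under that assumption, an interval of 2 on a high-speed connection gives a 250 µs period (4000 polls per second), and only an interval of 1 reaches 8000.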
 
I've tested it and it appears to be 1000 Hz. 
Code:
#define NUM_BUTTONS 15
const uint8_t BUTTON_PINS[NUM_BUTTONS] = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};


void setup()
{
    // set up buttons
    for (int i = 0; i < NUM_BUTTONS; i++)
    {
        pinMode(BUTTON_PINS[i], INPUT_PULLUP);
    }
}
boolean button0State = false;

void loop()
{
    // Toggle the state of button 0
    button0State = !button0State;

    // read the digital inputs and set the buttons
    for (int i = 1; i < NUM_BUTTONS; i++)
    {
        if (i == 1)
        {
            Joystick.button(1, button0State);
        }
        else
        {
            Joystick.button(i, !digitalRead(BUTTON_PINS[i])); // active low buttons
        }
    }

    delayMicroseconds(125);
}

tester script
Code:
import joystickapi  
import msvcrt  
import time  

print("start")  

num = joystickapi.joyGetNumDevs()  
ret, caps, startinfo = False, None, None  
for id in range(num):  
  ret, caps = joystickapi.joyGetDevCaps(id)  
  if ret:  
    print("gamepad detected: " + caps.szPname)  
    ret, startinfo = joystickapi.joyGetPosEx(id)  
    break  
else:  
  print("no gamepad detected")  

run = ret  
prev_time = time.time() * 1000000  # initialize prev_time
while run:  
  if msvcrt.kbhit() and msvcrt.getch() == chr(27).encode(): # detect ESC  
    run = False  

  ret, info = joystickapi.joyGetPosEx(id)  
  if ret:  
    current_time = time.time() * 1000000
    time_diff = current_time - prev_time
    prev_time = current_time
    print("Time difference:", time_diff, "μs")  # print the time difference

   
print("end")

output
Time difference: 0.0 μs
Time difference: 1001.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 999.25 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
 
Connected at high speed.

[Attachment: Capture.jpg]
 
That thread is about Xinput, so I think it is natural that the threads are separate. Here I am talking about Joystick. 
 
I did the same test on keyboard.
Code:
#include <Keyboard.h>

boolean uKeyState = false;

void setup()
{
    // Start Keyboard
    Keyboard.begin();
}

void loop()
{
    uKeyState = !uKeyState; // Toggle the state

    if (uKeyState)
    {
        // Press 'U' key
        Keyboard.press(KEY_U);
    }
    else
    {
        // Release 'U' key
        Keyboard.release(KEY_U);

        // Delay to make the key press noticeable
        delayMicroseconds(125);
    }
}

Code:
import keyboard
import time

prev_time = time.time() * 1000000

def on_key_event(e):
    global prev_time
    current_time = time.time() * 1000000
    time_diff = current_time - prev_time
    prev_time = current_time
    print('Time difference:', time_diff, 'μs')

keyboard.on_press(on_key_event)
keyboard.wait()

Looks like 1000 Hz. 
Time difference: 0.0 μs
Time difference: 999.0 μs
Time difference: 1000.25 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 1000.5 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 999.75 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 1000.0 μs
Time difference: 0.0 μs
Time difference: 999.5 μs
Time difference: 0.0 μs
Time difference: 1000.25 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 1000.0 μs
Time difference: 0.0 μs
Time difference: 1000.25 μs
Time difference: 0.0 μs
Time difference: 0.0 μs
Time difference: 1000.0 μs
 
I switched to time.perf_counter() and the timing resolution improved:

Time difference: 186.20000457763672 μs
Time difference: 433.99999237060547 μs
Time difference: 477.9000015258789 μs
Time difference: 235.70000457763672 μs
Time difference: 355.4000015258789 μs
Time difference: 385.7999954223633 μs
Time difference: 249.5 μs
Time difference: 161.1999969482422 μs
Time difference: 308.9000015258789 μs
Time difference: 390.5 μs
Time difference: 258.5999984741211 μs
Time difference: 330.6999969482422 μs
Time difference: 437.50000762939453 μs
Time difference: 247.6999969482422 μs
Time difference: 153.60000610351562 μs
Time difference: 335.1999969482422 μs
Time difference: 373.8999938964844 μs
Time difference: 224.50000762939453 μs
Time difference: 176.6999969482422 μs
Time difference: 395.3000030517578 μs
Time difference: 376.3999938964844 μs
Time difference: 154.3000030517578 μs
Time difference: 188.8000030517578 μs
Time difference: 387.0999984741211 μs
Time difference: 376.8999938964844 μs
Time difference: 184.4000015258789 μs
Time difference: 192.4000015258789 μs
Time difference: 443.1999969482422 μs
Time difference: 396.00000762939453 μs
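The earlier outputs (runs of 0.0 µs followed by a jump of about 1000 µs) look more like a coarse clock than real arrival times, which would fit time.time() ticking in large steps on this machine. A quick way to check is to measure the smallest step each clock can actually show. This is a sketch; `smallest_tick` is a hypothetical helper, and the numbers it prints depend on the OS and Python version.

```python
import time


def smallest_tick(clock, samples=200_000):
    """Return the smallest non-zero step seen between successive clock reads."""
    best = float("inf")
    prev = clock()
    for _ in range(samples):
        now = clock()
        step = now - prev
        if step > 0:
            best = min(best, step)
            prev = now
    return best


for name in ("time", "perf_counter"):
    tick = smallest_tick(getattr(time, name))
    info = time.get_clock_info(name)
    print(f"{name}: observed tick {tick * 1e6:.3f} µs, "
          f"reported resolution {info.resolution * 1e6:.3f} µs")
```

If the observed tick of a clock is close to 1000 µs, that clock cannot resolve 125 µs intervals, so a result of "1000 Hz" measured with it says nothing about the actual report rate.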
 
Some of the comments on the other thread apply...

Code:
void loop()
{
    uKeyState = !uKeyState; // Toggle the state

    if (uKeyState)
    {
        // Press 'U' key
        Keyboard.press(KEY_U);
    }
    else
    {
        // Release 'U' key
        Keyboard.release(KEY_U);

        // Delay to make the key press noticeable
        delayMicroseconds(125);
    }
}
Actually, I don't know what you are really trying to measure. For instance, you only delay on the release?
I also don't know how much time is taken up by other things...

Maybe something like:
Code:
void loop()
{
  for (;;) {  // stay inside loop() to avoid yield timing between iterations
    uKeyState = !uKeyState; // Toggle the state
    uint32_t start_time = micros();

    if (uKeyState)
    {
      // Press 'U' key
      Keyboard.press(KEY_U);
    }
    else
    {
      // Release 'U' key
      Keyboard.release(KEY_U);
    }

    // Pad every iteration (press and release alike) to 125 µs
    uint32_t delta_micros = micros() - start_time;
    if (delta_micros < 125) delayMicroseconds(125 - delta_micros);
  }
}
You could also do this with elapsedMicros. It would also work with normal loop() calls if you keep the time of the last send in a global variable and do the math from that.
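The "keep the last time and do the math from that" idea can be pushed one step further by scheduling against absolute deadlines, so the time spent in the send call does not add up as drift. The sketch below shows only the scheduling logic, written in Python for illustration rather than as Teensy code; `run_paced` is a hypothetical name, and Python's sleep is far too coarse to hold 125 µs on a real host.

```python
import time


def run_paced(action, period_s, iterations,
              clock=time.perf_counter, sleep=time.sleep):
    """Call action() once per period, pacing against absolute deadlines.

    Each deadline is derived from the previous deadline rather than from
    "now", so the duration of action() does not accumulate as drift.
    """
    deadline = clock()
    for _ in range(iterations):
        action()
        deadline += period_s
        remaining = deadline - clock()
        if remaining > 0:       # if action() overran, skip the wait entirely
            sleep(remaining)


# Demo with a relaxed 1 ms period
start = time.perf_counter()
run_paced(lambda: None, 0.001, 5)
print(f"5 iterations took {(time.perf_counter() - start) * 1e3:.2f} ms")
```

The clock and sleep functions are parameters so that the logic can be exercised with a simulated clock instead of real time.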
 