How to Achieve 8000 Hz with an XInput Controller on Teensy 4.0?

iori

Hello,

I purchased a Teensy 4 with the aim of creating a high polling rate game controller, focusing specifically on the high-speed USB capabilities of the device.

I created a loop that sends an XInput report every 125 µs, as shown in the code below:
Code:
#include <XInput.h>

#define NUM_BUTTONS 16
const uint8_t BUTTON_PINS[NUM_BUTTONS] = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17};

boolean buttonAState = false;

void setup()
{
    for (int i = 0; i < NUM_BUTTONS; i++)
    {
        pinMode(BUTTON_PINS[i], INPUT_PULLUP);
    }
    XInput.setAutoSend(false);
}

void loop()
{
    buttonAState = !buttonAState; // Toggle the state
    XInput.setButton(BUTTON_A, buttonAState);

    XInput.send();

    delayMicroseconds(125); // 125 us between sends, i.e. an 8000 Hz target
}

To verify the transmission intervals, I checked them using a Python program:
Code:
from inputs import get_gamepad
import time

prev_time = time.time() * 1000000

while True:
    events = get_gamepad()
    for event in events:
        current_time = time.time() * 1000000
        time_diff = current_time - prev_time
        prev_time = current_time
        print("Time difference:", time_diff, "μs")
        print(event.ev_type, event.code, event.state)

I was expecting intervals of about 125 μs, but the measured interval is 1000 μs, which suggests that reports are only arriving at 1000 Hz.

I've also tried changing bInterval to 1 in the Teensy XInput library's usb_desc.c, but this did not seem to have any effect.

What could I be overlooking here? Any suggestions or insights would be highly appreciated.

https://github.com/dmadison/Arduino...fa4/teensy/avr/cores/teensy4/usb_desc.c#L2819
Code:
#ifdef XINPUT_INTERFACE
    // configuration for 480 Mbit/sec speed
    // Interface 0
    9,    // bLength (length of interface descriptor 9 bytes)
    4,    // bDescriptorType (4 is interface)
    0,    // bInterfaceNumber (This is interface 0)
    0,    // bAlternateSetting (used to select alternate setting.  notused)
    2,    // bNumEndpoints (this interface has 2 endpoints)
    0xFF, // bInterfaceClass (Vendor Defined is 255)
    0x5D, // bInterfaceSubClass
    0x01, // bInterfaceProtocol
    0,    // iInterface (Index of string descriptor for describing this notused)
    // Some sort of common descriptor? I pulled this from Message Analyzer dumps of an actual controller
    17, 33, 0, 1, 1, 37, 129, 20, 0, 0, 0, 0, 19, 2, 8, 0, 0,
    // Endpoint 1 IN
    7,          // bLength (length of ep1in in descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x81,       // bEndpointAddress (0x81 is IN1)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Endpoint 2 OUT
    7,          // bLength (length of ep2out in descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x02,       // bEndpointAddress (0x02 is OUT2)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Interface 1
    9,    // bLength (length of interface descriptor 9 bytes)
    4,    // bDescriptorType (4 is interface)
    1,    // bInterfaceNumber (This is interface 1)
    0,    // bAlternateSetting (used to select alternate setting.  notused)
    4,    // bNumEndpoints (this interface has 4 endpoints)
    0xFF, // bInterfaceClass (Vendor Defined is 255)
    0x5D, // bInterfaceSubClass (93)
    0x03, // bInterfaceProtocol (3)
    0,    // iInterface (Index of string descriptor for describing this notused)
    // A different common descriptor? I pulled this from Message Analyzer dumps of an actual controller
    27, 33, 0, 1, 1, 1, 131, 64, 1, 4, 32, 22, 133, 0, 0, 0, 0, 0, 0, 22, 5, 0, 0, 0, 0, 0, 0,
    // Endpoint 3 IN
    7,          // bLength (length of ep3in descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x83,       // bEndpointAddress (0x83 is IN3)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Endpoint 4 OUT
    7,          // bLength (length of ep4out descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x04,       // bEndpointAddress (0x04 is OUT4)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Endpoint 5 IN
    7,          // bLength (length of ep5in descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x85,       // bEndpointAddress (0x85 is IN5)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Endpoint 5 OUT (shares endpoint number with previous)
    7,          // bLength (length of ep5out descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x05,       // bEndpointAddress (0x05 is OUT5)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Interface 2
    9,    // bLength (length of interface descriptor 9 bytes)
    4,    // bDescriptorType (4 is interface)
    2,    // bInterfaceNumber (This is interface 2)
    0,    // bAlternateSetting (used to select alternate setting.  not used)
    1,    // bNumEndpoints (this interface has 1 endpoint)
    0xFF, // bInterfaceClass (Vendor Defined is 255)
    0x5D, // bInterfaceSubClass (93)
    0x02, // bInterfaceProtocol (2)
    0,    // iInterface (Index of string descriptor for describing this, not used)
    // Common Descriptor.  Seems that these come after every interface description?
    9, 33, 0, 1, 1, 34, 134, 7, 0,
    // Endpoint 6 IN
    7,          // bLength (length of ep6in descriptor 7 bytes)
    5,          // bDescriptorType (5 is endpoint)
    0x86,       // bEndpointAddress (0x86 is IN6)
    0x03,       // bmAttributes (0x03 is interrupt no synch, usage type data)
    0x20, 0x00, // wMaxPacketSize (0x0020 is 1x32 bytes)
    1,          // bInterval (changed to 1, i.e. one 125 us microframe at high speed)
    // Interface 3
    // This is the interface on which all the security handshaking takes place
    // We don't use this but it could be used for man-in-the-middle stuff
    9,    // bLength (length of interface descriptor 9 bytes)
    4,    // bDescriptorType (4 is interface)
    3,    // bInterfaceNumber (This is interface 3)
    0,    // bAlternateSetting (used to select alternate setting.  notused)
    0,    // bNumEndpoints (this interface has 0 endpoints ???)
    0xFF, // bInterfaceClass (Vendor Defined is 255)
    0xFD, // bInterfaceSubClass (253)
    0x13, // bInterfaceProtocol (19)
    4,    // iInterface (Computer never asks for this, but an x360 would. so include one day?)
    // Another interface another Common Descriptor
    6, 65, 0, 1, 1, 3
#endif // XINPUT_INTERFACE
};
 
The library you're using is not from PJRC (Teensy), but rather an independently-developed library with a modified Teensy core, so it might be best to pose your question to the library author via the github page.
 
As mentioned, the code you pointed to modifies the Teensy core code, so you may want to ask the owner of that GitHub project for information or help.

But a few things...

I believe that the USB 2 protocol does 1000 frames per second. At high speed each 1 ms frame is divided into eight 125 µs micro-frames, but I don't remember how those apply in these cases.
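For reference, the USB 2.0 rule for interrupt endpoints can be written down in a few lines. This is only a sketch of the arithmetic; the function name `interrupt_interval_us` is made up for illustration and is not part of any library. It shows what interval a given bInterval asks for at full speed versus high speed. Whether the host's driver actually polls that often is a separate question that this does not answer.

```python
FRAME_US = 1000       # full-speed frame length in microseconds
MICROFRAME_US = 125   # high-speed microframe length in microseconds


def interrupt_interval_us(b_interval: int, high_speed: bool) -> int:
    """Requested polling interval of an interrupt endpoint, in microseconds.

    Hypothetical helper, for illustration only.
    Full speed: bInterval counts 1 ms frames (1..255).
    High speed: the interval is 2**(bInterval - 1) microframes (bInterval 1..16).
    """
    if high_speed:
        if not 1 <= b_interval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        return (2 ** (b_interval - 1)) * MICROFRAME_US
    if not 1 <= b_interval <= 255:
        raise ValueError("full-speed bInterval must be 1..255")
    return b_interval * FRAME_US


for b in (1, 2, 4):
    print("bInterval", b,
          "high speed:", interrupt_interval_us(b, True), "us,",
          "full speed:", interrupt_interval_us(b, False), "us")
```

So at high speed a bInterval of 1 requests 125 µs, while a bInterval of 4 requests 1000 µs.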

Side note: I noticed that the code was modified to use endpoint 1. I know Paul reserved endpoint 1 for some future undisclosed purpose, so if/when he starts using it for that purpose you run the risk of a conflict.

I also don't really know what your actual loop timing is.
That is, this:
Code:
void loop()
{
    buttonAState = !buttonAState; // Toggle the state
    XInput.setButton(BUTTON_A, buttonAState);

    XInput.send();

    delayMicroseconds(125); // 125 us between sends, i.e. an 8000 Hz target
}

will not run 8000 iterations per second, because your delayMicroseconds(125) assumes that the rest of the code takes no time to execute.
How far off it is, I don't know. Entering and leaving loop() and going back to main() will certainly take some time, as there is a call to yield() between the calls to loop().
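To put rough numbers on that, here is a small simulation of the two pacing strategies. It is only a sketch: the 5 µs of work per iteration is an invented figure, not a measurement of XInput.send(), and both function names are hypothetical. The first function models "do the work, then delay a fixed 125 µs"; the second models waiting for an absolute deadline that advances by 125 µs each pass (on the Teensy that would typically be done by comparing against micros() or an elapsedMicros variable).

```python
def fixed_delay_rate_hz(delay_us: float, work_us: float) -> float:
    """Loop rate when every pass costs work_us and then sleeps delay_us."""
    return 1_000_000 / (delay_us + work_us)


def deadline_schedule(period_us: float, work_us: float, iterations: int) -> list:
    """Start times (in us) when each pass waits for an absolute deadline."""
    now = 0.0
    next_deadline = 0.0
    starts = []
    for _ in range(iterations):
        if now < next_deadline:
            now = next_deadline      # idle until the deadline arrives
        starts.append(now)
        now += work_us               # time spent doing the send (assumed value)
        next_deadline += period_us   # deadlines do not drift with work_us
    return starts


# Assumed 5 us of work per pass, purely for illustration.
print("fixed delay :", round(fixed_delay_rate_hz(125, 5)), "Hz")
print("deadlines at:", deadline_schedule(125, 5, 4), "us")
```

With the fixed delay the period is the delay plus the work time, so the rate falls below 8000 Hz. With deadlines the start times stay 125 µs apart as long as the work fits inside the period.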

But more importantly, how long does the call to XInput.send() take?
I believe it probably is calling:
Code:
// Function used to send packets out of the TX endpoint
// This is used to send button reports
int usb_xinput_send(const void *_buffer, uint8_t nbytes)
{
  if (nbytes > TX_BUFSIZE) return -1;
  if (!usb_configuration) return -1;
  uint32_t head = tx_head;
  transfer_t *xfer = tx_transfer + head;
  uint32_t wait_begin_at = systick_millis_count;
  while (1) {
    uint32_t status = usb_transfer_status(xfer);
    if (!(status & 0x80)) {
      if (status & 0x68) {
        // TODO: what if status has errors???
        //printf("ERROR status = %x, i=%d, ms=%u\n",
        //        status, tx_head, systick_millis_count);
      }
      transmit_previous_timeout = 0;
      break;
    }
    if (transmit_previous_timeout) return -1;
    if (systick_millis_count - wait_begin_at > TX_TIMEOUT_MSEC) {
      // waited too long, assume the USB host isn't listening
      transmit_previous_timeout = 1;
      return -1;
    }
    if (!usb_configuration) return -1;
    yield();
  }
  delayNanoseconds(30); // TODO: why is status ready too soon?
  uint8_t *buffer = txbuffer + head * TX_BUFSIZE;
  memcpy(buffer, _buffer, nbytes);
  usb_prepare_transfer(xfer, buffer, nbytes, 0);
  arm_dcache_flush_delete(buffer, TX_BUFSIZE);
  usb_transmit(XINPUT_TX_ENDPOINT, xfer);
  if (++head >= TX_NUM) head = 0;
  tx_head = head;
  return 0;
}

It looks like the code has set up 4 TX buffers and transfers. Is that enough for high speed?

Again, some of my USB knowledge is rusty, and most of my time was spent on the host side...
 
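First, there was a problem with the test script. The timestamps should have been taken with time.perf_counter_ns() rather than time.time(), i.e.
prev_time = time.perf_counter_ns() / 1000
should have been used (and likewise for current_time inside the loop).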
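A quick way to see why the choice of clock matters is to measure the smallest step each clock actually reports. This is a minimal sketch; `smallest_step_us` is a hypothetical helper, and the numbers it prints depend on the operating system and Python version. On some platforms the wall clock can advance in steps much coarser than 125 µs, which would hide sub-millisecond intervals.

```python
import time


def smallest_step_us(clock_ns, samples=10000):
    """Smallest nonzero difference between consecutive readings, in microseconds.

    clock_ns is any zero-argument function returning integer nanoseconds.
    Returns None if the clock never advanced during the sampling run.
    """
    smallest = None
    prev = clock_ns()
    for _ in range(samples):
        cur = clock_ns()
        diff = cur - prev
        if diff > 0 and (smallest is None or diff < smallest):
            smallest = diff
        prev = cur
    return None if smallest is None else smallest / 1000


print("time.time_ns        :", smallest_step_us(time.time_ns), "us")
print("time.perf_counter_ns:", smallest_step_us(time.perf_counter_ns), "us")
```

If the first figure comes out near or above the interval being measured, the wall clock is not a usable reference for this test.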