Teensy 3.1 raw hid dying intermittently.

Status
Not open for further replies.

greg

Well-known member
Hi,

I have a head scratcher of a problem that I would really appreciate any help or thoughts on. We are building a Teensy 3.1-controlled video camera tracking system.
Basically, the Teensy is controlling 2 HiTech servos (pan and tilt) on which the camera sits. Position commands are sent from a PC to the T3.1 using Paul's rawhid library. The camera is also connected to the same PC and is pumping video frames back to the PC in real time.
Unfortunately, the PC (Surface Pro) has only one USB 3 port, so both the Teensy and the high-definition camera traffic are going through the same port (via an un-powered USB hub).
PC CPU is at around 50-70% utilization, which includes a fair amount of processing that I do on video frames once they arrive.

Everything works fine ... until RawHID messages stop arriving at the Teensy, anywhere between 1 minute and 1 hour in. Once that happens, that's it: no matter what I do on the PC side, RawHID is dead until I reboot the microcontroller. Interestingly, loop() is still running (I can verify that using the serial monitor), so the controller is not frozen; it's just that RawHID.recv(...) returns 0 when called from the loop. The boot loader is also running, i.e. I can re-flash the sketch and it all starts working again.

Any thoughts? Can USB traffic be a problem? Should I be looking for some electrical issues (I installed ferrite chokes all around, no effect)? Is it possible that I am crashing the rawhid library somehow by sending commands simultaneously (the frequency can be high sometimes)?
More specifically, are there any guidelines on things that are not advised on the PC side of the raw USB library because they may cause a crash on the microcontroller side? I have a weak hunch that the problem shows up earlier if the CPU on the PC is pegged.

The wiring is quite trivial. The Teensy is powered from USB (from the USB hub), the GND of the Teensy is connected to the GND of my power supply, and the servos are on pins 9 and 10, fed from the power supply.

Thanks for any hints,
Greg
 
Any takers?
Has anyone seen teensy-side raw hid library dying, while the rest of the sketch works fine (loop is still running)?

Not much to the sketch ... can be condensed to:

void loop()
{
    unsigned char readBuffer[64] = {0};
    // After a while this dies without recovery: no packets sent from the PC
    // appear here and recv() always returns 0, though loop() is still pumping.
    int n = RawHID.recv(readBuffer, 0);

    if (n > 0)
    {
        // Do something useful
    }
}
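To make the stall visible from the Teensy side, the time since the last received packet can be tracked. The following is only a minimal sketch under assumptions: `PacketWatch` is a hypothetical helper (not part of the rawhid library), and it is only meaningful if the PC normally sends packets at a steady rate. It is written as plain C++ that takes timestamps as arguments, so on the Teensy the caller would pass in `millis()`.

```cpp
#include <cstdint>

// Hypothetical helper: tracks how long it has been since the last
// Raw HID packet arrived. Pure logic; the caller supplies timestamps
// (for example from millis() on the Teensy).
class PacketWatch {
public:
    explicit PacketWatch(uint32_t stallAfterMs) : stallAfterMs_(stallAfterMs) {}

    // Call whenever RawHID.recv() returns a value greater than 0.
    void packetSeen(uint32_t nowMs) {
        lastPacketMs_ = nowMs;
        seenAny_ = true;
    }

    // True once at least one packet has arrived and nothing has arrived
    // for stallAfterMs since then. Unsigned subtraction keeps the result
    // correct across the 32-bit millisecond counter rollover.
    bool stalled(uint32_t nowMs) const {
        return seenAny_ && (nowMs - lastPacketMs_) >= stallAfterMs_;
    }

private:
    uint32_t stallAfterMs_;
    uint32_t lastPacketMs_ = 0;
    bool seenAny_ = false;
};
```

In loop(), one would call `packetSeen(millis())` inside the `if (n > 0)` branch and print a serial trace (with a timestamp) the first time `stalled(millis())` becomes true, which at least pins down when the link went quiet.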
 
Looks like it may be something on the windows side. After this condition happens - (which can take from minutes to a few hours) - debugging the hid_WINDOWS.cpp reveals that the overlapped write times out:

Here is the raw HID client code; I am calling it with a 100 ms timeout (and of course the other params are valid):
int rawhid_send(int num, void *buf, int len, int timeout)
{
    hid_t *hid;
    unsigned char tmpbuf[516];
    OVERLAPPED ov;
    DWORD n, r, lastErr = 0;

    if (sizeof(tmpbuf) < len + 1)
        return -1;

    hid = get_hid(num);

    if (!hid || !hid->open)
        return -2;

    EnterCriticalSection(&tx_mutex);
    ResetEvent(tx_event); // pass the HANDLE itself, not its address
    memset(&ov, 0, sizeof(ov));
    ov.hEvent = tx_event;
    tmpbuf[0] = 0; // report ID
    memcpy(tmpbuf + 1, buf, len);
    if (!WriteFile(hid->handle, tmpbuf, len + 1, NULL, &ov))
    {
        if (GetLastError() != ERROR_IO_PENDING)
            goto return_error;

        r = WaitForSingleObject(tx_event, timeout); // <-- TIMES OUT

        if (r == WAIT_TIMEOUT)
            goto return_timeout;

        if (r != WAIT_OBJECT_0)
            goto return_error;
    }
    if (!GetOverlappedResult(hid->handle, &ov, &n, FALSE))
        goto return_error;

    LeaveCriticalSection(&tx_mutex);

    if (n <= 0)
        return -1;

    return n - 1;

return_timeout:
    // Note: CancelIo only requests cancellation; ov lives on this stack
    // frame and must stay valid until the cancelled write has completed.
    CancelIo(hid->handle);
    LeaveCriticalSection(&tx_mutex);
    return 0;

return_error:
    lastErr = GetLastError();
    // print_win32_err();
    LeaveCriticalSection(&tx_mutex);
    SetLastError(lastErr);
    return -1;
}

Nothing I do on the Windows side works. The Teensy is still running (I can see the debug serial traces I put in the main loop), but no HID packets arrive. Re-flashing the firmware immediately solves the problem, but it's not an acceptable solution for the field.
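Since the only symptom on the PC is a run of timed-out sends, the host could at least detect and report the dead link. The following is a minimal sketch, not part of the sample client: `SendMonitor` is a hypothetical name, and it assumes the return convention of the rawhid_send() shown above (greater than 0 for bytes sent, 0 for a timeout, negative for an error).

```cpp
// Hypothetical helper: counts consecutive failed sends and reports when
// the link should be treated as dead. Feed it the return value of each
// rawhid_send() call (>0 bytes sent, 0 timeout, <0 error).
class SendMonitor {
public:
    explicit SendMonitor(int deadAfter) : deadAfter_(deadAfter) {}

    // Returns true once deadAfter sends in a row have failed.
    // Any successful send clears the failure count.
    bool record(int result) {
        if (result > 0) {
            failures_ = 0;
            return false;
        }
        if (failures_ < deadAfter_) {
            ++failures_;  // saturate so the counter cannot overflow
        }
        return failures_ >= deadAfter_;
    }

    int consecutiveFailures() const { return failures_; }

private:
    int deadAfter_;
    int failures_ = 0;
};
```

A caller would do something like `if (monitor.record(rawhid_send(0, buf, 64, 100))) { /* log, alert, or start recovery */ }`. This does not fix the stall; it only separates an occasional timeout from a link that has stopped for good.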

The same code works fine on T3; the only difference is that the client machine for the T3 runs Win7, whereas the T3.1 is run from Win8.
The T3.1 is running at the same clock speed: 48 MHz.
 
I didn't get much from the attached code fragment and have done little w/ HID stuff on Teensy, but out of curiosity, how confident are you that the timeout-handling code does what it should? Perhaps the timeout handling needs work, but you were never running into timeouts on the T3+W7 setup?

I'm reluctant to suggest such a brute-force workaround, but here goes: if all else fails, you could pull the reset pin LOW for a soft reset. It should have the same "fix" effect as re-flashing, but without the impracticality/overhead.
 
Thanks for the suggestion. I thought about a reset, but wanted to leave it as a measure of absolute last resort. Resetting the MCU will also destroy internal sketch state (which controls servo acceleration).
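If a reset did become unavoidable, the lost state could in principle be checkpointed and restored afterwards. The following is only a sketch under assumptions: `ServoState` and its fields are hypothetical stand-ins for whatever the real sketch tracks, and the packed bytes would have to be written to some store that survives a reset (EEPROM, or the PC itself), which is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical state; the real sketch's fields are not shown in the thread.
struct ServoState {
    float panPos;
    float tiltPos;
    float panVel;
    float tiltVel;
};

// One record = raw struct bytes followed by a 1-byte checksum.
constexpr std::size_t kRecordSize = sizeof(ServoState) + 1;

// XOR checksum with a non-zero seed, so an all-zero buffer is rejected.
static uint8_t checksum(const uint8_t* p, std::size_t n) {
    uint8_t c = 0xA5;
    for (std::size_t i = 0; i < n; ++i) c ^= p[i];
    return c;
}

// Serialize the state into out[kRecordSize].
void packState(const ServoState& s, uint8_t* out) {
    std::memcpy(out, &s, sizeof(s));
    out[sizeof(s)] = checksum(out, sizeof(s));
}

// Restore the state; returns false if the record is blank or corrupted,
// in which case s is left untouched.
bool unpackState(const uint8_t* in, ServoState& s) {
    if (checksum(in, sizeof(ServoState)) != in[sizeof(ServoState)]) return false;
    std::memcpy(&s, in, sizeof(s));
    return true;
}
```

The checksum lets setup() tell a valid checkpoint from a blank or half-written one and fall back to defaults. If EEPROM were used as the store, the record should be written sparingly, since EEPROM has limited write endurance.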
The code is from Paul's sample HID client. I tried different timeout values to see if I could avoid the issue, or even make it easier to reproduce, and it had no effect. It's as if the HID device is dead. The exact same code on the Windows and Teensy side has been working for months in a different device (Win7+T3).
 
I resolved the problem, though I can't say I understand what's been going on, or that I like the solution, which turned out to be mysteriously simple: switch the order of the USB devices connected to the USB hub. I don't know enough about how USB operates to tell why I was having this problem, but the solution was just that. I swapped the hub positions of the Teensy and the video camera, and it's been running for 30 hours now without an issue (previously, HID would die within an hour). I was using a fairly good quality hub.
 
I will have to hazard a guess it is Windows. I have seen some really weird stuff with Windows. USB devices will randomly reattach, or Windows decides it needs to inform you that a device could work better on another port even though it has worked perfectly on that port for months. Sometimes I think it is because a USB port had too much on the internal USB hub (yes there are USB hubs embedded in computers sometimes). So switching to another port and moving to a different hub within the computer might help.
 