USB Host port hardware questions

lukexyz

Why are surface solder pads used rather than vias for the D+/D- pins of the USB Host port on the Teensy 4.0 and 4.1? This makes mounting to a USB daughter board more difficult.

Do the D+/D- pins require dropping the voltage down from 5V to 3.3V when connecting the Teensy to a USB device? The documentation indicates this is required for all data pins.

Can the USB Device port be used in OTG mode, so that it can serve as a second USB host port?
 
Why are surface solder pads used rather than vias for the D+/D- pins of the USB Host port on the Teensy 4.0 and 4.1?

On Teensy 4.1 the USB host port is 5 through hole pins. Here's how it's meant to work with Teensy 4.1.

[Image: teensy41_usbhost2.jpg]


On Teensy 4.0 the USB host signals are 2 bottom side pads. There simply was no room to do anything larger on Teensy 4.0. It was felt that bringing the signals to those pads was better than leaving them inaccessible.


Do the D+/D- pins require dropping the voltage down from 5V to 3.3V when connecting the Teensy to a USB device?

No. Those pins are meant to connect directly.

USB doesn't use 5V signals. Only the power is 5V.


Can the USB Device port be used in OTG mode, so that it can serve as a second USB host port?

This really depends on the meaning of the word "can".

No software support exists. Even if such software were published, you would also have the very unpleasant experience of needing to unplug whatever USB device(s) you're using and plug in a cable to your PC each time you wanted to upload a new program onto your Teensy. So if "can" means doing so with the provided libraries and software, the answer is no.

But the hardware on each port is OTG capable. In theory you could probably make a modified copy of USBHost_t36 which accesses the other port. That would probably be the "easy" path where little or no new code is needed, but a mountain of minor edits is needed so the global names don't conflict between the 2 copies. Or maybe even just modify the code to allow 2 instances of the main EHCI driver, if you have more appetite for writing some code than renaming tons of stuff. Either way, you'll also need to build up some sort of USB signal mux or other way to take the pain out of swapping cables around for every code update. If you're willing to go to all that trouble, then yes, the hardware capability is there so you can do it with some large amount of work.

However, USBHost_t36 supports hubs, so if you want to connect more than 1 USB device just using a hub is the easy path.
 
On Teensy 4.1 the USB host port is 5 through hole pins. Here's how it's meant to work with Teensy 4.1.

Ah, OK, sorry, I was looking at the wrong section of the pinout (USB device rather than USB host). That's great.

This really depends on the meaning of the word "can".

No software support exists. Even if such software were published, you would also have the very unpleasant experience of needing to unplug whatever USB device(s) you're using and plug in a cable to your PC each time you wanted to upload a new program onto your Teensy. So if "can" means doing so with the provided libraries and software, the answer is no.

But the hardware on each port is OTG capable. In theory you could probably make a modified copy of USBHost_t36 which accesses the other port. That would probably be the "easy" path where little or no new code is needed, but a mountain of minor edits is needed so the global names don't conflict between the 2 copies. Or maybe even just modify the code to allow 2 instances of the main EHCI driver, if you have more appetite for writing some code than renaming tons of stuff. Either way, you'll also need to build up some sort of USB signal mux or other way to take the pain out of swapping cables around for every code update. If you're willing to go to all that trouble, then yes, the hardware capability is there so you can do it with some large amount of work.

However, USBHost_t36 supports hubs, so if you want to connect more than 1 USB device just using a hub is the easy path.

Since I need exactly two USB host ports during deployment, I may actually attempt this, so I do want to understand this better.

Is there any way to update the code on the Teensy 4.1 other than via the USB device port?

Does the standard method of updating code in the Teensy rely upon USBHost_t36? (I'm trying to understand what you are saying about needing to have two versions of the code.)

I do need the ability to update code during development. I don't mind swapping back and forth between a regular USB cable and an OTG cable for this. But if the code updating system in the Teensy doesn't depend upon USBHost_t36 (i.e. if only my code depends upon that library), then I don't understand why I would need two versions of the library. (Sorry, I am not familiar with the Teensy system yet -- I am at the stage of trying to pick a system for my project.)

Also I understand from what you wrote that the modifications to USBHost_t36 which would be needed to access the other port would be just changing of the primary port number or something, but that there are no actual changes needed to that library to support USB OTG per se, assuming that an OTG cable is used?

You said USBHost_t36 is designed to support hubs, so I assume it can access more than one port, but that the current code is hardwired to use just the host port for the first step in the USB chain? Is the library capable of treating the first step in the USB chain as a hub, so that both the on-board USB host port and the on-board USB device port (in OTG mode) are usable at the same time?

Will there be any hardware issues with using DMA transfers to/from both ports simultaneously (e.g. to copy a file from one USB Mass Storage device to another)? Is there sufficient memory bandwidth to support two concurrent transfers at 480Mbit/sec, so that reading from one drive can overlap with writing to the other drive, at the full USB 2.0 bandwidth?
 
Follow-up question: how can a USB OTG cable signal to the Teensy 4.1 that it is an OTG cable? Normally the Sense pin is shorted with GND in an OTG cable. But according to the Teensy 4.1 documentation, in the 5-pin USB Host header, there are two GND pins, and no Sense pin. Does switching to OTG mode therefore have to be done in software?
 
There is no support for OTG mode switching. USBHost_t36 is host only. I believe WZMX has an experimental port of some of the device code to run on the 2nd USB port, but that is also device only with no support for mode switching.

If you need OTG mode switch, you'll need to build everything yourself, from the ID pin detection to how power is handled to all the software.


Is there any way to update the code on the Teensy 4.1 other than via the USB device port?

USB device is the only supported way.


Does the standard method of updating code in the Teensy rely upon USBHost_t36?

No.


I'm trying to understand what you are saying about needing to have two versions of the code.

USBHost_t36 is hard coded to only use the 2nd USB port.

If you want it to use the 1st port, there are 2 ways. You could make a complete duplicate copy of the library which is hard coded to use only the 1st port. Then to use both ports, your program could use 2 separate libraries. Lots of inefficient duplication of code, lots of legwork to rename all global scope names, but relatively "easy" as you just change the hardware register names to access the 1st port. No "real" programming changes.

Or you could try to modify the library to use both USB ports. Obviously that is a more elegant and "cleaner" solution. Some programming required...


You said USBHost_t36 is designed to support hubs, so I assume it can access more than one port, but that the current code is hardwired to use just the host port for the first step in the USB chain?

Yes, that's correct.

You can create as many hub instances as you need and each manages the downstream ports on the connected hub.


Is the library capable of treating the first step in the USB chain as a hub, so that both the on-board USB host port and the on-board USB device port (in OTG mode) are usable at the same time?

Wow, so many questions rolled into one!

Yes, USBHost_t36 supports hubs. Since there is only 1 physical USB host port on Teensy 4.1, the *only* way to use a hub is to plug it into Teensy's USB host port (which I assume is what you mean by "first step in the USB chain"). You can plug hubs into hubs into hubs if you need a lot of ports. Just make sure to put enough hub instances into your program, and remember most 7 & 10 port hubs on the market today are internally just a network of 4 port hubs.

Yes, the 1st and 2nd USB ports on Teensy are fully independent. So you can use USB host while the 1st port is working as USB device.

No, there is no support for OTG mode on either port. Maybe with a lot of software work you might get either or both ports to work with OTG mode switching, but we don't have any support for that in any way at this time. You're completely on your own there! You'll need to dive into NXP's reference manual for all questions on making OTG work.

I really want to emphasize how you'll be essentially starting from zero and need to do everything yourself to get OTG mode switching to work.


Will there be any hardware issues with using DMA transfers to/from both ports simultaneously (e.g. to copy a file from one USB Mass Storage device to another)? Is there sufficient memory bandwidth to support two concurrent transfers at 480Mbit/sec,

Probably not an issue. The DMA goes over the AXI bus which is 64 bits wide at 150 MHz. That's a theoretical 1200 MByte/sec bandwidth. Both USB ports at sustained max 480 Mbit speed (not deducting substantial USB protocol overhead) is only 1/10th of that theoretical AXI bandwidth.
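
Spelling out that arithmetic as a quick sanity check. This is only a sketch: the constant and function names are made up for illustration, and only the 64 bit, 150 MHz and 480 Mbit figures come from the paragraph above.

```cpp
#include <cstdint>

// AXI bus figures quoted above: 64 bits wide at 150 MHz.
constexpr std::uint64_t kAxiWidthBits = 64;
constexpr std::uint64_t kAxiClockHz   = 150'000'000;

// Theoretical peak AXI throughput in bytes per second.
constexpr std::uint64_t axi_bytes_per_sec() {
    return kAxiWidthBits / 8 * kAxiClockHz;
}

// Raw USB 2.0 high-speed signalling rate, with no protocol overhead deducted.
constexpr std::uint64_t kUsbHighSpeedBitsPerSec = 480'000'000;

// Combined raw rate of `ports` high-speed ports in bytes per second.
constexpr std::uint64_t usb_bytes_per_sec(unsigned ports) {
    return ports * kUsbHighSpeedBitsPerSec / 8;
}

// Compile-time checks of the numbers used in the post.
static_assert(axi_bytes_per_sec() == 1'200'000'000, "1200 MByte/sec");
static_assert(usb_bytes_per_sec(2) == 120'000'000, "120 MByte/sec for two ports");
static_assert(axi_bytes_per_sec() / usb_bytes_per_sec(2) == 10, "ratio of 10");
```

Two ports at the raw 480 Mbit rate come to 120 MByte/sec, which is where the 1/10th figure comes from.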

But as far as I know, we've never managed to get either port to run at sustained max theoretical bandwidth. It's not DMA, but software that ultimately limits the speed. With my PC, the fastest I've seen on device mode is about 50%. I believe Frank ran some tests which were faster, but still not at 80% (the amount bulk transfer is supposed to be able to use if everything is perfectly fast on both host and device). I recently did some testing with USB host MIDI transmit. The simple 2 ping-pong style buffer and normal code sending the messages achieves about 25%. It's pretty easy to get to 10-20% of the USB bandwidth, and with quite a bit of optimization work 50% or more is doable. But sustained use of it all for real applications requires extremely difficult optimizations. In many cases with device mode we've seen the main limitation isn't on the Teensy side, but overhead on the PC side.
 
I looked through the NXP documentation to try to find what would be necessary to switch port 1 from device to host mode. I found the info on the USB_nUSBMODE register for setting the mode. However, this code for USBHost_t36's ehci.cpp makes it look like the code changes to support two ports would not be trivial:

Code:
	if (stat & USBHS_USBSTS_TI0) { // timer 0 - used for built-in port events
		//println("timer0");
		if (port_state == PORT_STATE_DEBOUNCE) {
			port_state = PORT_STATE_RESET;
			// Since we have only 1 port, no other device can
			// be in reset or enumeration.  If multiple ports
			// are ever supported, we would need to remain in
			// debounce if any other port was resetting or
			// enumerating a device.
			USBHS_PORTSC1 |= USBHS_PORTSC_PR; // begin reset sequence
			println("  begin reset");
		} else if (port_state == PORT_STATE_RECOVERY) {
			port_state = PORT_STATE_ACTIVE;
			println("  end recovery");
			//  HCSPARAMS  TTCTRL  page 1671
			uint32_t speed = (USBHS_PORTSC1 >> 26) & 3;
			rootdev = new_Device(speed, 0, 0);
		}
	}

I really don't know enough about USB to make deep changes to USBHost_t36. I'm sure I will miss something like this, and would have no idea how to debug the problem.

I guess I'll just submit this as a feature request then, for a future version of USBHost_t36: since both supported USB ports are OTG-capable, it would be great if both could be operated as host ports. Thanks!

For now I guess I'll have to use a second hub board to get two host ports.
 
Or you could try to modify the library to use both USB ports. Obviously that is a more elegant and "cleaner" solution. Some programming required...

I changed USBHost_t36 to support both USB1 and USB2 in host mode:

https://github.com/PaulStoffregen/USBHost_t36/pull/61

I was able to get USB1 powered up and switching on devices. However there's a strange issue that I was not able to figure out, where the isr() interrupt callback function is not being called at the end of asynchronous transfers. I didn't change anything fundamental about the functionality of the library -- in fact by default, user code should still work exactly the same for using USB2 as the only host interface, even with these changes to USBHost_t36. I did everything I could to try to debug the code, and I looked closely at the spec for the chip to try to understand the interrupt system, but I couldn't figure it out.

Paul -- I'm guessing you're the only person who deeply understands this library, since you read all the specs and wrote the library. Could you please take a look at why my changes caused async interrupt callbacks to stop being called?

In general these changes are useful for USBHost_t36, since they move all host-port-specific code into a new USBHostPort class (rather than using global variables), which makes the library more futureproof. Please consider merging the pull request once the above issue is fixed.
 
I looked briefly at your pull request. You've made choices which require a massive number of changes throughout all the files. :(


Could you please take a look at why my changes caused async interrupt callbacks to stop being called?

The very first step is to look for whether you're getting "Async Followup" printed.

Code:
	if (stat & USBHS_USBSTS_UAI) { // completed qTD(s) from the async schedule
		println("Async Followup");
		//print(async_followup_first, async_followup_last);
		Transfer_t *p = async_followup_first;
		while (p) {
			if (followup_Transfer(p)) {

If the interrupt is happening, add more prints inside that while loop so you can see if it's actually finding the expired qTDs.

I see you've already uncommented printing inside followup_Transfer(). If that too is printing, so the interrupt is happening and the expired qTDs are found, then add more printing here:

Code:
		if (transfer->qtd.token & 0x8000) {
			// this transfer caused an interrupt
			if (transfer->pipe->callback_function) {
				// do the callback
				(*(transfer->pipe->callback_function))(transfer);
			}
		}

I see you changed the callback function pointer to a C++ lambda scheme. So if you're getting all the way to "// this transfer caused an interrupt", then maybe all you need to do is consider whether testing the pointer for non-NULL works with your changes to the callback, and whether this code is the proper way to call the completion function.

Of course if followup_Transfer() isn't getting called but you do get the async interrupt, then dig into the code which maintains the list of transfers.
 
I looked briefly at your pull request. You've made choices which require a massive number of changes throughout all the files. :(

Thanks for taking a look at the code changes.

Yes, the changes are extensive. However, I tried multiple entirely different approaches, and I really don't think it's possible to do this in any other reasonable way. Also I think these changes will improve the ability to change and improve the library in the future, by making the library more generic, and therefore more flexible and modifiable. The use of global memory allocations in the trunk version that are not tied to any specific USBHost instance is really not ideal.

Let me explain why the changes are so extensive though. There is a domino effect of required changes due to the topology of dependencies between the hardware registers, code, and memory allocations. Starting from just the very first requirement of using different registers and constants for each host port, every other step pretty much inevitably follows.

(1) There are 30 different register addresses and constants that are different for USB1 and USB2, e.g. USBHS_USBCMD needs to expand to either USB1_USBCMD or USB2_USBCMD depending on which host port is targeted. This means that these macros need a new host port number parameter, and each USBHost instance needs to store an integer host_port, so that the correct registers and constants can be used. (In general programs may instantiate multiple USBHost instances, one for each host port.)

(2) However, USBHost in your git repo contains only static methods, and no instance variables. Adding an instance variable like uint8_t host_port requires making some of these methods non-static so that they can write to hardware registers. Then there's a long and complex process of figuring out the entire call tree of which currently-static methods call these non-static methods, since a static method has no instance through which to call a non-static one. This ends up turning the vast majority of methods in USBHost into non-static methods.

(3) Once most of USBHost is no longer static, it becomes obvious that the total number of global memory preallocations will be insufficient if the number of host ports increases. But more importantly, the memory ownership has to be looked at closely: for example, mixing transfers or pipes from multiple different host ports into a single linked list could cause all sorts of issues. So really every host port needs its own copy of the global memory allocations. Initially I tried making an array of multiple versions of these allocations, one per host port, then I indexed into these allocations using the USBHost::host_port field. But this was much uglier, and I realized the real issue here was that the topology of memory allocations needed to reflect the physical topology of the hardware. Therefore, I created the USBHostPort class, which moved all the global memory allocations into a new object instance, one per host port. The USBHost class now contains USBHostPort* usb_host_port, and uint8_t host_port was moved into USBHostPort, since the port number is a property of the port. All other host state variables were moved into USBHostPort too, e.g. the enumeration state.

(4) Tons of the methods in USBHost now contained code that repeatedly looked up usb_host_port->something, just so that the correct registers could be written and/or the correct memory allocations could be used. It became obvious that a lot of these methods exclusively acted on memory allocations and state variables in USBHostPort -- which makes conceptual sense, because they were doing work specific to a host port instance, and not something general to all active USB hosts. This made it obvious that a lot of the USBHost methods should actually be moved to USBHostPort, starting at the deepest call stack frames that wrote to hardware registers, and moving up from there.

(5) This all leads inevitably to the deepest and probably the trickiest change. Two methods are used as callbacks: isr() is attached as an interrupt vector, and enumeration() is added as a callback function that is called when a transfer is complete. However both of these need to access host-port-specific registers, which means that they need access to the host port number. However pointers to these functions have to be stateless -- you cannot use instance methods of a specific USBHostPort instance as a function pointer, due to C++ limitations. In other words, there need to be N different function pointers for each of the isr() and enumeration() functions, given N available host ports. This is what I am using C++ capture-less lambdas for, which you spotted in imxrt_usbhs.h. Each lambda expands to a piece of code that looks up the isr() or enumeration() instance method for a specific USBHostPort instance. That instance method now has access to its own host port number, so the correct host-port-specific registers and constants can be used.

(6) The only real change that I made that was arguably optional was that at the end of all the above refactoring, USBHost didn't contain many methods anymore, except for a large number of static println_ and printf_ methods etc. -- I moved these into a new class, PrintDebug, just to make things cleaner. This change doesn't have much impact, ultimately, because every source file that used them had #define println USBHost::println_ etc. at the top, and this could simply be changed to #define println PrintDebug::println_. This change could be in a separate pull request, but I had already refactored out so much of USBHost that it made sense to move these last static methods that were unrelated to USBHost into their own class.
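
To make point (5) more concrete, here is a minimal stand-alone sketch of the capture-less lambda idea. It is not the code from the pull request: HostPort, g_ports, IsrFn and g_isr_table are hypothetical names, and the isr() body only counts calls so that the example can run on a PC.

```cpp
#include <cstdint>

// Hypothetical per-port object; the real USBHostPort holds far more state.
class HostPort {
public:
    explicit HostPort(std::uint8_t port) : port_(port) {}
    // A real handler would read this port's status register here.
    void isr() { ++isr_count_; }
    std::uint8_t port() const { return port_; }
    unsigned isr_count() const { return isr_count_; }
private:
    std::uint8_t port_;
    unsigned isr_count_ = 0;
};

// One statically allocated instance per hardware port.
HostPort g_ports[2] = { HostPort(1), HostPort(2) };

// Plain function pointer type, as an interrupt vector table expects.
using IsrFn = void (*)();

// Capture-less lambdas convert to plain function pointers. Each one is
// hard-wired to a single instance, so no state travels with the pointer.
const IsrFn g_isr_table[2] = {
    [] { g_ports[0].isr(); },
    [] { g_ports[1].isr(); },
};
```

Calling g_isr_table[1]() ends up in g_ports[1].isr(), which can then use its own port number to pick the right registers. On the Teensy, pointers like these are what would be handed to attachInterruptVector(), one per USB interrupt.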

The very first step is to look for whether you're getting "Async Followup" printed.

Code:
	if (stat & USBHS_USBSTS_UAI) { // completed qTD(s) from the async schedule
		println("Async Followup");
		//print(async_followup_first, async_followup_last);
		Transfer_t *p = async_followup_first;
		while (p) {
			if (followup_Transfer(p)) {

Sorry, I should have explained this better -- when I said in the PR that "the isr() callback is never called for async-scheduled transfers, i.e. where (stat & USBHS_USBSTS_UAI) is nonzero in the isr() function", I was trying to say that this "Async Followup" code is never called. I then mentioned "However, isr() is still called for other non-async transfer types", which is referring to the fact that other blocks in this same method, such as (stat & USBHS_USBSTS_PCI) for "port change detected", are in fact called. This is why this is so mysterious: the isr() callback does in fact work, but only for non-async interrupts.

If the interrupt is happening, add more prints inside that while loop so you can see if it's actually finding the expired qTDs.

I see you've already uncommented printing inside followup_Transfer(). If that too is printing, so the interrupt is happening and the expired qTDs are found, then add more printing here:

followup_Transfer() is not being called for asynchronous transfers, because isr() is never called when (stat & USBHS_USBSTS_UAI) is nonzero. I expect that if isr() were called when it was supposed to be, all the other followup mechanisms would work fine.

I see you changed the callback function pointer to a C++ lambda scheme. So if you're getting all the way to "// this transfer caused an interrupt", then maybe all you need to do is consider whether testing the pointer for non-NULL works with your changes to the callback, and whether this code is the proper way to call the completion function.

Yes, I looked at that too, but that code is never even called, for the above reason.

Of course if followup_Transfer() isn't getting called but you do get the async interrupt, then dig into the code which maintains the list of transfers.

I looked over the code that maintains the list of transfers many times, but I didn't change that code, as far as I can tell. The only thing I can think I may have missed is some assumption about the way you were doing memory allocations, which broke something due to changing reuse of an object to non-reuse, or vice versa, once I moved the global memory allocations to instance variables of USBHostPort.

I ran into one issue like this already in driver_ready_for_device(USBDriver *driver). Depending on how I called the library, it was possible for drivers to try to add themselves twice to the list of available drivers, which meant that the same Driver_t allocation could be added twice to available_drivers. In that case the driver's next field would point back to the driver itself -- therefore anything traversing this list would run into an infinite loop. Making the following changes fixed the infinite loop:

Code:
void USBHostPort::driver_ready_for_device(USBDriver *driver)
{
	driver->device = NULL;
	driver->next = NULL;
	if (available_drivers == NULL) {
		available_drivers = driver;
	} else if (available_drivers != driver) {
		// append to end of list
		USBDriver *last = available_drivers;
		while (last->next) {
			if (last == driver) {
				// Driver already in list (avoid infinite loop if added twice)
				return;
			}
			last = last->next;
		}
		last->next = driver;
	}
}
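
One caveat, as far as I can tell from reading the version above: driver->next is cleared before the search, and the loop never compares the final node, so a driver that is already last in the list could still end up linked to itself. A more defensive variant is sketched below under simplifying assumptions: Node stands in for USBDriver, and append_unique is a hypothetical helper, not a library function.

```cpp
// Stand-in for USBDriver with only the field the list needs (hypothetical type).
struct Node {
    Node *next = nullptr;
};

// Append `node` to the singly linked list at `head` unless it is already present.
// Returns true if the node was appended, false if it was already in the list.
bool append_unique(Node *&head, Node *node) {
    Node **link = &head;
    while (*link) {
        if (*link == node) {
            return false;  // already queued, leave the list untouched
        }
        link = &(*link)->next;
    }
    node->next = nullptr;  // only reset once we know the node is not linked in
    *link = node;
    return true;
}
```

The search visits every node including the last one, and nothing is modified until the node is known to be absent, so adding the same driver twice leaves the list exactly as it was.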

However for the issue of async callbacks not working, I haven't found any other infinite loops that may be getting triggered -- the transfer is completely scheduled, then the scheduling code exits.

Do you have any other ideas?
 
By the way, while even the i.MX RT1170 processor still has just two USB OTG ports, it is entirely conceivable that future chips will have four USB OTG ports, and at some point the single-host limitation of USBHost_t36 will be an issue (it already is an issue for my particular application, where I need to use two USB devices but I can't use a hub, due to size constraints). It will be worth getting USBHost_t36 ready for a multi-host future.

It was an enormous amount of work to tease apart the logical structure of the USBHost_t36 library to figure out how to get it to work with multiple hosts, but the work is already basically done. The issue I mentioned needs to be fixed, and the pull request probably needs some further finessing so that you're happy with how the changes integrate. But I hope we can find a way to come up with a solution that satisfies all requirements.
 
I'm still trying to debug this.

If I insert the following code at the exit from queue_Transfer:

Code:
	while (!(halt->qtd.token & 0x40)) {
		println("TOKEN: ", halt->qtd.token, HEX);
		halt = (Transfer_t *)(halt->qtd.next);
	}

then for the queue_Control_Transfer that fails to trigger the asynchronous transfer interrupt on completion, I get:

Code:
TOKEN: 80280
TOKEN: 80080180
TOKEN: 80008080

Bit 15 is set as expected in the third qTD (i.e. 0x8000), which should trigger an interrupt on completion, according to this document:

https://www.nxp.com/docs/en/application-note-software/AN3520.pdf
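
For reference, the token words can be picked apart field by field. The sketch below follows the qTD token layout from the EHCI specification; the struct and function names are only illustrative and do not exist in USBHost_t36.

```cpp
#include <cstdint>

// Selected fields of the EHCI qTD token word (names are illustrative).
struct QtdToken {
    bool          data_toggle;  // bit 31
    std::uint16_t total_bytes;  // bits 30:16, bytes still to transfer
    bool          ioc;          // bit 15, interrupt on complete
    std::uint8_t  cerr;         // bits 11:10, error counter
    std::uint8_t  pid;          // bits 9:8, 0=OUT 1=IN 2=SETUP
    bool          active;       // bit 7 of the status byte
    bool          halted;       // bit 6 of the status byte
};

// Split a raw 32-bit token into its fields.
constexpr QtdToken decode_qtd_token(std::uint32_t t) {
    return QtdToken{
        (t >> 31) != 0,
        static_cast<std::uint16_t>((t >> 16) & 0x7FFF),
        (t & 0x8000) != 0,
        static_cast<std::uint8_t>((t >> 10) & 3),
        static_cast<std::uint8_t>((t >> 8) & 3),
        (t & 0x80) != 0,
        (t & 0x40) != 0,
    };
}
```

Applied to the three values above, this gives an 8 byte SETUP stage, an 8 byte IN data stage, and a zero-length OUT status stage with the interrupt-on-complete bit set, all three still marked active.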

The third of these output lines corresponds with the third init_qTD call inside queue_Control_Transfer, which uses irq = true in the last parameter position to request an interrupt:

Code:
    init_qTD(status, NULL, 0, status_direction, 1, true);

And yet adding println("ISR: ", stat, HEX); to the beginning of isr() shows that this interrupt vector is never called when the irq bit (bit 15) is set inside qtd.token. I would expect this debug line in isr() to print ISR: 4xxxx after an interrupt-on-completion transfer is scheduled (and indeed for the original library, that is what I see).

I'll keep digging, but I'm really not sure what is going on here.
 
I can also verify that this code in the new_Pipe method is executing with host_port==2 before the async transfer is initiated, so async interrupts should be enabled on the host:

Code:
			pipe->qh.capabilities[0] |= 0x8000; // H bit
			pipe->qh.horizontal_link = (uint32_t)&(pipe->qh) | 2; // 2=QH
			USBHS_ASYNCLISTADDR_(host_port) = (uint32_t)&(pipe->qh);
			USBHS_USBCMD_(host_port) |= USBHS_USBCMD_ASE; // enable async schedule

I really don't know what else to look at at this point. I have studied both the spec and the code, and debugged the code as best I could with numerous print statements. I even inserted print statements on entry to every function in the entire library, and compared the original library's execution path to my version -- the execution path is exactly the same for both library versions, up to the point where the async interrupt is not generated.

Paul -- I would appreciate it if you could please take a closer look. Thanks!
 