Turn the Teensy3.1 into a JTAG Adapter

Status
Not open for further replies.

sled

New member
Hi,

I just received my Teensy3.1 (nice work!), and my goal is to create a JTAG adapter for programming/debugging other chips. The plan is to provide a USB interface to the Teensy3.1 and write a driver with libusb for the urJTAG utility (maybe OpenOCD in the future).

JTAG uses a TCK clock signal: TDI and TMS are sampled on the rising edge of TCK, and TDO changes on the falling edge of TCK.
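
To make the edge relationship concrete, here is a minimal host-side sketch of clocking bits through a toy target. Everything in it is an assumption for illustration: `JtagPins`, `ToyTarget`, `clockBit` and `shiftByte` are made-up names, the target is just an 8-bit shift register, and on a real Teensy the field writes would be GPIO writes instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four JTAG lines; not a Teensy API.
struct JtagPins {
    bool tck = false, tms = false, tdi = false, tdo = false;
};

// Toy target: an 8-bit shift register between TDI and TDO.
struct ToyTarget {
    uint8_t reg = 0;
    // Present the first bit on TDO, as a target does once shifting starts.
    void enterShift(JtagPins& p) { p.tdo = reg & 1u; }
    // Rising TCK edge: sample TDI and shift it in at the top.
    void risingEdge(const JtagPins& p) {
        reg = static_cast<uint8_t>((reg >> 1) | (p.tdi ? 0x80 : 0x00));
    }
    // Falling TCK edge: TDO changes to the next bit.
    void fallingEdge(JtagPins& p) { p.tdo = reg & 1u; }
};

// Clock one bit: set up TMS/TDI while TCK is low, raise TCK, read TDO
// (still holding the value from the previous falling edge), lower TCK.
bool clockBit(JtagPins& p, ToyTarget& t, bool tms, bool tdi) {
    assert(!p.tck);  // TCK idles low between bits
    p.tms = tms;
    p.tdi = tdi;
    p.tck = true;
    t.risingEdge(p);
    bool tdo = p.tdo;
    p.tck = false;
    t.fallingEdge(p);
    return tdo;
}

// Shift eight bits, least significant bit first, with TMS held low.
uint8_t shiftByte(JtagPins& p, ToyTarget& t, uint8_t out) {
    uint8_t in = 0;
    for (int i = 0; i < 8; ++i) {
        if (clockBit(p, t, false, (out >> i) & 1u)) {
            in |= static_cast<uint8_t>(1u << i);
        }
    }
    return in;
}
```

Shifting a byte in returns the byte that was previously in the register, which is the usual "write and read in one pass" behaviour of a JTAG scan.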

In my previous home-made JTAG adapter, built on an Arduino, I simply used the delay function and toggled the TCK pin high/low. This worked great since the speed was limited anyway. With the Teensy3.1's powerful 72MHz, I think it's not far-fetched to reach for a 12MHz TCK clock signal, is it? But how would you implement this efficiently? My idea at the moment is:

- Bulk Transfer of TDI, TMS data (~2kB) into the buffer of the Teensy
- Starting a 12MHz timer (maybe make this user-selectable ranging from 10kHz to 12MHz?)
-> On Timer Interrupt:
-> Check if we are going to pull TCK high, if yes, fetch TDI, TMS state from the buffer and toggle pins
-> Check if we are going to pull TCK low, if yes, read TDO pin state and store in output buffer
-> No Data left? Stop Timer, set complete flag, bulk transfer buffer back to Host device

My Questions:

- What do you think, would this be possible?

- Is it a good idea to solve this with a timer or should I use a for-loop with delays?

- How would you go about sending a single clock pulse?

- Do you have some sort of code snippet or existing library code about setting up timers on the Teensy3?

And of course I'd make this an open-source project, maybe with a nice little PCB populating JTAG headers where you can put in your Teensy.

---
Simon
 
I've used the J-Link JTAG device for a long time. Its complexity, and the complex interface with the IDEs (like IAR, Keil, Eclipse), seems to me to be way too much to undertake.
The flash breakpoints are, for me, the great feature of the J-Link products. But that's complex too.
 
I don't know enough about your application to give a lot of intelligent answers. However, I can confirm that the Teensy 3 manages to produce 12MHz PWM signals just fine. The library is pretty well documented and it's literally 'set and forget' if you like.

I'm using the 12MHz PWM pulse train as an oscillator input on a MCP3911. Saves a component or two, space, and works well. So that's where I would start for a generated clock signal.
 
I believe 12 MHz TCK is pretty ambitious, but might be possible, at least in pretty good bursts where the speed matters.
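
A rough cycle budget shows why. The sketch below takes the 72MHz CPU clock mentioned earlier in the thread and assumes an interrupt entry cost of about 12 cycles, the usual best-case figure for a Cortex-M4; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: best-case Cortex-M4 interrupt entry latency, in CPU cycles,
// before any handler code has run.
const uint32_t kIsrEntryCycles = 12;

// CPU cycles available per TCK edge (there are two edges per TCK period).
uint32_t cyclesPerEdge(uint32_t cpuHz, uint32_t tckHz) {
    assert(tckHz > 0);
    return cpuHz / (2u * tckHz);
}

// True if one interrupt per edge leaves any cycles for the handler at all.
bool isrPerEdgePlausible(uint32_t cpuHz, uint32_t tckHz) {
    return cyclesPerEdge(cpuHz, tckHz) > kIsrEntryCycles;
}
```

At 72MHz and a 12MHz TCK this gives 3 cycles per edge, well under the interrupt entry cost alone, so a timer interrupt per edge cannot keep up at that speed. At 100kHz there are 360 cycles per edge, which is comfortable.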

I would try to use the SPI port for the lengthy stretches where TMS doesn't need to change. Then I'd fall back to bit bashing when TMS is changing to navigate the JTAG state machine, and also for short sequences of less than 8 bits, or the last few bits in a long sequence. You can leave the SPI port configured and switch the pins between SPI and GPIO, but of course you'll need to access the hardware registers to do this.

For bit bashing, use digitalWriteFast().
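
One way to read the SPI suggestion above is as a split of each scan into whole bytes for the SPI port and a bit-banged remainder. The sketch below is only that interpretation: `ShiftPlan` and `planShift` are hypothetical names. It keeps the final bit out of the SPI part because the last bit of a scan is clocked with TMS high to leave the shift state. JTAG shifts least significant bit first, so the SPI port would also need to be set to LSB-first.

```cpp
#include <cassert>
#include <cstddef>

struct ShiftPlan {
    std::size_t spiBytes;     // whole bytes sent over SPI with TMS held constant
    std::size_t bitBangBits;  // trailing bits toggled by hand, including the last bit
};

// Split an nbits-long scan so that at least the final bit is bit-banged.
ShiftPlan planShift(std::size_t nbits) {
    assert(nbits > 0);
    std::size_t spiBytes = (nbits - 1) / 8;
    return ShiftPlan{spiBytes, nbits - spiBytes * 8};
}
```

For example, a 100-bit scan becomes 12 bytes over SPI followed by 4 bit-banged bits, while anything of 8 bits or fewer is bit-banged entirely.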

However, even if you achieve 12 Mbit/sec, the main issue is likely to be latency associated with whatever protocol you end up using on the USB side. There are a few older threads on this forum with a LOT of info about USB latency, including a benchmark I wrote quite some time ago. If the search doesn't find them, use google with "site:forum.pjrc.com" and hopefully you can find that info.

Briefly, the main issue occurs with simple query-response protocols. If your PC sends a command and then waits for Teensy to reply before it sends the next command, you'll be doomed to slow performance. Most people think about communication this way. This type of code is the simplest to write. Time and time again I've seen people go down this path, always with poor results (at least performance-wise). If you want to achieve good performance, you really must avoid this design pattern.

Unfortunately, higher levels of JTAG protocols are complex. After a scan operation that writes and reads a DR chain, the next operation might be one of several things, depending on the data returned by that DR scan. If the PC needs to fully receive the DR scan data, parse it (involving latency of user level scheduling on the PC's operating system), before it can transmit another command, you'll be doomed to slow performance by the latency associated with USB's 1ms framing. Well, unless your scan chains are *extremely* long, but the typical 30 to 100 bit lengths will scan much faster than the USB latency for command-response.
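
To put numbers on that, here is a small back-of-the-envelope calculation. It assumes each command-response round trip costs one 1ms USB frame (in practice it may well be more), and the function names are invented for this example.

```cpp
#include <cassert>

// Time to clock `bits` bits through the chain at the given TCK frequency.
double scanSeconds(unsigned bits, double tckHz) {
    assert(tckHz > 0.0);
    return bits / tckHz;
}

// Scans per second when every scan waits for one command-response round trip.
double scansPerSecond(unsigned bits, double tckHz, double roundTripSeconds) {
    return 1.0 / (roundTripSeconds + scanSeconds(bits, tckHz));
}

// Effective bit rate seen end to end under the same assumption.
double effectiveBitsPerSecond(unsigned bits, double tckHz, double roundTripSeconds) {
    return bits * scansPerSecond(bits, tckHz, roundTripSeconds);
}
```

A 100-bit scan at 12MHz takes about 8.3 microseconds, so with a 1ms round trip the link manages roughly 990 scans per second, or about 99kbit/s out of a nominal 12Mbit/s. The TCK speed barely matters until the round trips are removed.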

To really achieve good throughput, you'll probably have to push some of the decision logic down to the Teensy. Of course that becomes far more complex than a simple protocol where the PC can request capture IR & DR scans and make all the decisions itself.


Regarding your specific question about timers, the simplest way on Teensy 3 is IntervalTimer. Here are the details.

http://www.pjrc.com/teensy/td_timing_IntervalTimer.html

I'm not sure how this helps, but there it is if you want it.
 
Hi,

Sure, I could get a J-Link, even at a reasonable price as a student, but this is more of a learning project to get some experience with ARM, USB and JTAG :)

The complexity isn't actually that huge. I spent quite some time researching various JTAG adapters; most of them do nothing more than toggle pins, and all the logic is done on the host computer. Setting breakpoints is nothing more than toggling pins in a certain sequence so that the state machine is in the correct state, and the data registers are written and read by toggling pins in a certain sequence too.
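
The state machine in question is the 16-state TAP controller from IEEE 1149.1, which advances on each rising TCK edge according to TMS. As a sketch of what "toggling pins in a certain sequence" means on the host side, the table below encodes the standard transitions; the identifiers (`TapState`, `kNext`, `walk`) are my own and not taken from urJTAG or any other tool.

```cpp
#include <cassert>
#include <cstdint>

// The 16 TAP controller states defined by IEEE 1149.1.
enum TapState : uint8_t {
    TLR, RTI,
    SelectDR, CaptureDR, ShiftDR, Exit1DR, PauseDR, Exit2DR, UpdateDR,
    SelectIR, CaptureIR, ShiftIR, Exit1IR, PauseIR, Exit2IR, UpdateIR
};

// kNext[state][tms]: state entered on the next rising TCK edge.
static const TapState kNext[16][2] = {
    /* TLR       */ { RTI,       TLR      },
    /* RTI       */ { RTI,       SelectDR },
    /* SelectDR  */ { CaptureDR, SelectIR },
    /* CaptureDR */ { ShiftDR,   Exit1DR  },
    /* ShiftDR   */ { ShiftDR,   Exit1DR  },
    /* Exit1DR   */ { PauseDR,   UpdateDR },
    /* PauseDR   */ { PauseDR,   Exit2DR  },
    /* Exit2DR   */ { ShiftDR,   UpdateDR },
    /* UpdateDR  */ { RTI,       SelectDR },
    /* SelectIR  */ { CaptureIR, TLR      },
    /* CaptureIR */ { ShiftIR,   Exit1IR  },
    /* ShiftIR   */ { ShiftIR,   Exit1IR  },
    /* Exit1IR   */ { PauseIR,   UpdateIR },
    /* PauseIR   */ { PauseIR,   Exit2IR  },
    /* Exit2IR   */ { ShiftIR,   UpdateIR },
    /* UpdateIR  */ { RTI,       SelectDR },
};

// Apply `count` TMS bits (least significant bit first) and return the final state.
TapState walk(TapState s, uint32_t tmsBits, int count) {
    assert(count >= 0 && count <= 32);
    for (int i = 0; i < count; ++i) {
        s = kNext[s][(tmsBits >> i) & 1u];
    }
    return s;
}
```

For instance, `walk(TLR, 0x2, 4)` sends TMS = 0, 1, 0, 0 and ends in `ShiftDR`, and five clocks with TMS high return to `TLR` from any state.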

Most IDEs use an abstract JTAG adapter which generates the toggle sequence as a byte array, and then a specific driver for the connected adapter (J-Link, ByteBlaster, ...) is used to send the byte buffer, so there's not much logic on the JTAG device itself.

Actually a parallel port on your computer provides almost the same functionality as any commercial/open-source JTAG adapter.

If you're interested you can check out the USB protocol for the J-Link adapter (http://www.segger.com/admin/uploads/productDocs/RM08001_JLinkUSBProtocol.pdf). It has some nice features though, like wrappers for accessing the memory on an ARM7/9.

It would even be possible to emulate a JLink adapter with the Teensy, but as stated in the document above this isn't allowed and also not my intention :)

For the hardware, the J-Link uses an AT91SAM7S64 (ARM7, 55MHz, 32kB flash, 16kB RAM). Fun fact: the product page on Atmel shows the PCB of a J-Link adapter, compare http://www.atmel.com/devices/sam7s64.aspx to http://www.jstuber.net/lego/nxt-programming/jlink_internal.jpg
 
As is common, the GUI is 80% or more of the effort!

Re breakpoints... yes, hardware breakpoints, usually limited to 2 or 3.
Flash breakpoints are unlimited and make the debugger feel like a RAM based computer debugging session.
 
However, even if you achieve 12 Mbit/sec, the main issue is likely to be latency associated with whatever protocol you end up using on the USB side. There are a few older threads on this forum with a LOT of info about USB latency, including a benchmark I wrote quite some time ago. If the search doesn't find them, use google with "site:forum.pjrc.com" and hopefully you can find that info.

I don't think I'll need 12Mbit/s over the USB bus; even commercial adapters like the J-Link don't achieve more than 600-800kb/s over USB (according to the datasheet). Only the JTAG clock runs at 12MHz, flushing the buffer (about 2kB) at 12Mbit/s. Also, most target processors don't have enough RAM or flash for a sustained 12Mbit/s end-to-end to matter.

Briefly, the main issue occurs with simple query-response protocols. If your PC sends a command and then waits for Teensy to reply before it sends the next command, you'll be doomed to slow performance. Most people think about communication this way. This type of code is the simplest to write. Time and time again I've seen people go down this path, always with poor results (at least performance-wise). If you want to achieve good performance, you really must avoid this design pattern.

Unfortunately, higher levels of JTAG protocols are complex. After a scan operation that writes and reads a DR chain, the next operation might be one of several things, depending on the data returned by that DR scan. If the PC needs to fully receive the DR scan data, parse it (involving latency of user level scheduling on the PC's operating system), before it can transmit another command, you'll be doomed to slow performance by the latency associated with USB's 1ms framing. Well, unless your scan chains are *extremely* long, but the typical 30 to 100 bit lengths will scan much faster than the USB latency for command-response.

Very true. I'm used to asynchronous transfers since I'm primarily a web developer. I don't intend to write a complete JTAG library from scratch; what I want to do is integrate my Teensy adapter into an existing JTAG framework by writing a driver and an abstraction layer for it. I chose urJTAG because it has good documentation and very readable code. Most of the logic is also done internally, and a bit-bang byte buffer is sent to the device.

Unfortunately it uses a synchronous transfer (send, busy wait, receive). But I think that's not really a problem, as I want to start small.

For anyone who is interested in this, the abstracted interface for a JTAG adapter is defined here: https://github.com/luke-jr/urjtag/blob/at32uc3a/urjtag/include/urjtag/cable.h#L75 and implementations for existing adapters (like J-Link, etc.) can be found here: https://github.com/luke-jr/urjtag/tree/at32uc3a/urjtag/src/tap/cable

To really achieve good throughput, you'll probably have to push some of the decision logic down to the Teensy. Of course that becomes far more complex than a simple protocol where the PC can request capture IR & DR scans and make all the decisions itself.

This will be the goal for Version 2.0, especially direct reading and writing to memory.

I'm not sure how this helps, but there it is if you want it.

It helps for sure! A big thank you, it's great to see that such a cool project as the Teensy is actively maintained by its author :)
 
I would try to use the SPI port for the lengthy stretches where TMS doesn't need to change. Then I'd fall back to bit bashing when TMS is changing to navigate the JTAG state machine, and also for short sequences of less than 8 bits, or the last few bits in a long sequence. You can leave the SPI port configured and switch the pins between SPI and GPIO, but of course you'll need to access the hardware registers to do this.

Just to be sure I understand it correctly, is your idea to use the SPI port to transfer the TDI data via the MOSI line? Could you explain your idea a little bit more? Especially the benefits of using the SPI port, it would be really interesting for me. Also whether I could take advantage of the SPI SCLK as TCK (although SPI sends data on the falling edge whereas JTAG sends data on the rising edge).

Thanks!
 