Software Debugger Stack


Paul, all,

while I fully agree with everything said in the post about JTAG/SWD debugging, I would like to propose another solution that doesn't require any hardware modification.

Some time ago, Thomas Roell implemented the GDB debug protocol on the ARM in software, using a simple UART connection (external FTDI board), and mentioned it in the diyRovers group (!topic/diyrovers/GtfSQDLP2_E). Over the last few days, we tried to port it to the Teensy, and the first results were encouraging.

However, it turned out that the Mini54 bootloader processor uses the breakpoint feature of the K20 in order to detect bootloader requests through JTAG. It programs a bit called C_DEBUGEN in the SCB_DHCSR register, which cannot be undone by software. From that point on, software breakpoints are routed to the external JTAG pins and trigger the bootloader process instead of generating debugger exceptions in software.

Unfortunately, as said before, the only way to change C_DEBUGEN is through external debug hardware; there is no way to clear it in software. This means that the bit effectively prohibits the use of a software debug stack, because the stack will never receive a debug exception.
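
To make the consequence concrete, here is a small host-side model in C of how an ARMv7-M core (such as the Cortex-M4 in the K20) disposes of a BKPT instruction. It is a simplified sketch, not Teensy code: the bit positions (C_DEBUGEN is bit 0 of DHCSR, MON_EN is bit 16 of DEMCR) come from the ARMv7-M architecture, while the enum and function names are made up for illustration, and exception priorities are ignored.

```c
#include <stdint.h>

/* Bit positions as defined by the ARMv7-M architecture. */
#define DHCSR_C_DEBUGEN (1u << 0)   /* halting debug enabled */
#define DEMCR_MON_EN    (1u << 16)  /* DebugMonitor exception enabled */

/* Hypothetical names, used only in this sketch. */
typedef enum {
    BKPT_HALTS_FOR_EXTERNAL_DEBUGGER,
    BKPT_RAISES_DEBUG_MONITOR,
    BKPT_ESCALATES_TO_HARD_FAULT
} bkpt_outcome_t;

/* Simplified model: decide what a BKPT does, given register snapshots. */
bkpt_outcome_t bkpt_outcome(uint32_t dhcsr, uint32_t demcr)
{
    if (dhcsr & DHCSR_C_DEBUGEN)
        return BKPT_HALTS_FOR_EXTERNAL_DEBUGGER;  /* software never sees it */
    if (demcr & DEMCR_MON_EN)
        return BKPT_RAISES_DEBUG_MONITOR;         /* what a software stub needs */
    return BKPT_ESCALATES_TO_HARD_FAULT;
}
```

In this model a software debug stub can only get control in the middle case, which is why C_DEBUGEN has to be cleared first.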

My suggestion is to implement a mechanism that allows this bit to be cleared. As far as I can see, breakpoint arguments ("BKPT #<nn>") seem to be a feasible way to do this while maintaining compatibility with existing versions. On a breakpoint event, the Mini54 could recover the argument <nn> and, if it matches a predefined number, clear C_DEBUGEN in DHCSR through JTAG. From that point on, a software-based debugger stack could take over.

This would be a pure software solution, requiring only an update of the bootloader code in the Mini54.
I've put C_DEBUGEN clearing on my feature list to implement on the next Mini54 update.

Originally I had hoped to include a Mini54 update in 1.20, to fix the pin 33 issue. But the SPI sharing issue has turned out to be a LOT of work, and numerous other smaller improvements and optimizations in libraries are pending. The Mini54 update is almost certainly going to get pushed back to 1.21, which has the upside of allowing time to implement this C_DEBUGEN clearing feature.

Jörg Bliesener and Thomas Roell have sent me several private emails about this issue in recent days. Thomas pointed out that MBED defines breakpoint 0xAB with R0 set to 0x105 as a request to disable C_DEBUGEN. I'll very likely use these same numbers. You can expect the Mini54 to typically take 0.25 to 0.5 seconds to respond, because it runs in a very slow low-power mode when not reprogramming the MK20 chip.
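
For illustration only, and not the actual Mini54 firmware: a rough C sketch of how a debug host could recognize such a request once the core has halted on a breakpoint. It assumes the host has already read the halfword at the halted PC and the value of R0. The Thumb encoding of BKPT (0xBE00 plus an 8-bit immediate) is architectural; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BKPT_OPCODE_MASK  0xFF00u
#define BKPT_OPCODE       0xBE00u  /* Thumb "BKPT #imm8" is 0xBE00 | imm8 */
#define RELEASE_BKPT_IMM  0xABu    /* breakpoint number mentioned above */
#define RELEASE_R0_VALUE  0x105u   /* R0 value mentioned above */

/* Returns true and stores the immediate if insn is a Thumb BKPT. */
bool decode_bkpt(uint16_t insn, uint8_t *imm)
{
    if ((insn & BKPT_OPCODE_MASK) != BKPT_OPCODE)
        return false;
    *imm = (uint8_t)(insn & 0xFFu);
    return true;
}

/* True if the halted instruction and R0 form the "clear C_DEBUGEN" request. */
bool is_debugen_release_request(uint16_t insn, uint32_t r0)
{
    uint8_t imm;
    return decode_bkpt(insn, &imm)
        && imm == RELEASE_BKPT_IMM
        && r0 == RELEASE_R0_VALUE;
}
```

Any other breakpoint number, or 0xAB with a different R0, would fall through to the existing handling.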

I do not have an exact time-frame for this feature. I can tell you no work will be done on it until after Teensyduino 1.20 is fully released. If you look at the announcements forum, you can see new Teensyduino releases happen about every 3 to 4 months.

When I do work on this, I'll be seeking a few beta testers. To "sign up" for beta testing, follow this forum thread. When I'm ready, I'll post a call for beta testers to this forum thread. Beta testers may need to work with very low-level stuff. This feature doesn't magically turn Teensy into a debugger. It will merely allow a monitor-style debug agent to disable C_DEBUGEN, so it can then gain access to debug events using the debug exception vector. I will give preference to beta testers who have a deep knowledge of low-level ARM debug features, and people who have previously tested and found issues in Teensyduino release candidates, experimental library releases or other test software.

Please, DO NOT SEND PRIVATE EMAIL to request beta testing or inquire about the status of this feature. Do not post "me too" or other trivial messages here. Simply follow this thread, and if necessary, whitelist in your email filtering, so you'll be sure to get notifications as this thread is updated.
Seeing the new posts on the other thread, I can't stop myself from asking about this feature. I'm closely following the commits on GitHub, but I didn't see any sign that C_DEBUGEN is being released. Any news?
I may be entirely wrong here (not well versed with ARM low-level stuff, my experience is mostly on the computer side), so bear with me if I say something stupid.

But couldn't we use the FPB to trigger a software interrupt instead of a hardware breakpoint, thus bypassing the need for the debug monitor ISR and avoiding the C_DEBUGEN problem?

Thomas Roell suggested the FPB track some time ago and found it feasible. However, in order to single step, you would have to analyze the object code and discover the length of the next instruction or the target of a jump, which may not be trivial. Here's what he wrote back in August, when we talked about the issue:

1) The FPB allows remapping of instructions. So instead of a break point instruction, one could use "permanently undefined" instruction coding, which would trigger a UsageFault instead of a debug exception. This way break points could work. Single stepping can be implemented using breakpoints (just figure out the length of the next instruction that would be executed and put a breakpoint after it, unless it changes the PC). The stub just had to implement software breakpoints as well, because otherwise gdb would use BKPT instructions to do so. For that I needed the value of FP_REMAP from the Teensy 3.x.
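
The "length of the next instruction" step can be sketched in C as follows. This is only an illustration with made-up function names, not code from Thomas's stub. It relies on two architectural facts about Thumb-2: a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, and the 16-bit "permanently undefined" instruction is UDF #imm8, encoded as 0xDE00 plus the immediate. Instructions that change the PC still need separate handling.

```c
#include <stdint.h>

/* Length in bytes of the Thumb instruction whose first halfword is hw. */
unsigned thumb_insn_length(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Address for the temporary breakpoint of a single step,
   assuming the current instruction does not change the PC. */
uint32_t next_sequential_pc(uint32_t pc, uint16_t first_halfword)
{
    return pc + thumb_insn_length(first_halfword);
}

/* 16-bit "permanently undefined" instruction: UDF #imm8. */
uint16_t thumb_udf(uint8_t imm8)
{
    return (uint16_t)(0xDE00u | imm8);
}
```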

So, while it may be feasible, it would add significant overhead to the code. Right now, the debug stub (using hardware breakpoints) seems to use about 512 bytes of RAM and 8K of code, which I consider reasonable. I don't know how much more FPB handling would consume. Maybe Thomas Roell can chime in and give us a clue...
This assumes the device has to be the one doing all the work. But instead we can implement an intermediate GDB server on the computer side, and relay lower-level commands to the Teensy. Instead of "place breakpoint here", the Teensy would receive "add/remove a flash patch here". Maybe we could even reduce the size of the stub by making it as stupid as possible:
- add/remove patch X
- read/write memory/registers

And we'd get to reuse existing code from binutils or LLVM to do most of the work regarding the binary code. It would be less practical than directly connecting GDB to /dev/ttyACM* but on the other hand, I think most Teensy devices run in HID mode now. So we need a gateway anyway.
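
To give an idea of how small such a "stupid" stub could be, here is a host-runnable C sketch of a command handler. Everything in it is hypothetical: the command codes, the argument layout, the slot count of 6, and the simulated memory array that stands in for the real target. A real stub would program the FPB and touch actual memory instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set and limits, for illustration only. */
enum { CMD_PATCH_SET = 1, CMD_PATCH_CLEAR = 2, CMD_MEM_READ = 3, CMD_MEM_WRITE = 4 };
enum { STUB_OK = 0, STUB_ERR = -1 };

#define NUM_PATCH_SLOTS 6    /* assumed number of patch slots */
#define SIM_MEM_SIZE    256  /* simulated target memory */

typedef struct {
    uint32_t patch_addr[NUM_PATCH_SLOTS];
    uint8_t  patch_used[NUM_PATCH_SLOTS];
    uint8_t  mem[SIM_MEM_SIZE];
} stub_state_t;

/* Handle one command. Meaning of the arguments depends on cmd:
   PATCH_SET:   arg0 = slot, arg1 = address
   PATCH_CLEAR: arg0 = slot
   MEM_READ:    arg0 = address, *data receives the byte
   MEM_WRITE:   arg0 = address, arg1 = byte value */
int stub_handle(stub_state_t *s, int cmd, uint32_t arg0, uint32_t arg1, uint8_t *data)
{
    switch (cmd) {
    case CMD_PATCH_SET:
        if (arg0 >= NUM_PATCH_SLOTS) return STUB_ERR;
        s->patch_addr[arg0] = arg1;
        s->patch_used[arg0] = 1;
        return STUB_OK;
    case CMD_PATCH_CLEAR:
        if (arg0 >= NUM_PATCH_SLOTS) return STUB_ERR;
        s->patch_used[arg0] = 0;
        return STUB_OK;
    case CMD_MEM_READ:
        if (arg0 >= SIM_MEM_SIZE || data == NULL) return STUB_ERR;
        *data = s->mem[arg0];
        return STUB_OK;
    case CMD_MEM_WRITE:
        if (arg0 >= SIM_MEM_SIZE) return STUB_ERR;
        s->mem[arg0] = (uint8_t)arg1;
        return STUB_OK;
    default:
        return STUB_ERR;
    }
}
```

All the knowledge about instructions, stepping and symbols would then live in the gateway on the computer.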

I know it would be slow, though. And it's a lot of work to make this work. I'd love to spend much time on this, but my exams are approaching fast and I'm already behind schedule. I should have time to spend on that this summer (June) if nothing has progressed by that time.
And gdb low level support can certainly do this. Many, many years ago, I wrote a gdb stub to interface with an on chip debugger.
As I said, the stub implementation has been done already, only waiting for the Mini54 releasing C_DEBUGEN...
As I was responding to the FLOPS thread, I was thinking of the tools I use in the server world to debug and tune applications. In addition to a debugger, one of the tools is a profiler that, on every clock tick, stashes away the PC and some machine events that can tell about various stalls, etc. I realize that in an embedded context things are often roll-your-own, but it may make sense to provide support in the library, so you can see how often the PC is within a region (perhaps with a number of buckets in the region), and how often it is outside. Perhaps it could also tie in with the -p option to gather call counts. Now, most Teensy programs probably spend most of their life busy-waiting for some external event, and are not bound by the CPU. But there are pieces of code that are performance driven (such as the sound libraries).
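
A minimal sketch of what the bucket idea could look like in C. The type and function names are invented here and are not part of any Teensy library. On a real board the sample call would presumably sit in a periodic timer interrupt that picks up the interrupted PC from the exception stack frame; this version just takes the PC as an argument so it can be tried on a computer.

```c
#include <stdint.h>

#define PROFILE_BUCKETS 16

typedef struct {
    uint32_t start;                   /* first address of the region */
    uint32_t end;                     /* one past the last address */
    uint32_t hits[PROFILE_BUCKETS];   /* samples per bucket inside the region */
    uint32_t outside;                 /* samples that fell outside the region */
} pc_profile_t;

/* Record one PC sample in the matching bucket, or count it as outside. */
void pc_profile_sample(pc_profile_t *p, uint32_t pc)
{
    if (pc < p->start || pc >= p->end) {
        p->outside++;
        return;
    }
    /* 64-bit math avoids overflow when scaling the offset to a bucket index. */
    uint64_t offset = (uint64_t)pc - p->start;
    uint64_t span = (uint64_t)p->end - p->start;
    p->hits[(offset * PROFILE_BUCKETS) / span]++;
}
```

Dumping the hits array over serial after a run would show which part of the region is hot.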
The audio library already does this type of profiling. You can query each object, or the entire library, for last-run and worst-ever CPU usage.
Some good news from Paul:

Teensy LC and 3.2 do clear C_DEBUGEN when the KL02 detects a low power mode and puts itself to sleep. If you want C_DEBUGEN cleared, you must enter a sleep mode for about 0.1 second or more.
I ordered my T3.2 on Monday and I'm anxiously waiting for it to arrive. Robin beat every record in shipping it: The time between the order and shipping confirmations was 40 minutes!!!

The thing is that I'm in Brazil and that customs also beats every record in handling it, however in the opposite sense. Believe it or not, it once took them SEVEN MONTHS to clear a Teensy. When I told Paul, he made a very relevant comment: Where do they store all that stuff?

Any news, I'll let you know...
What would be required to be able to single step through source code inside an Eclipse IDE using the latest Teensy (3.6?) version?
Finally found this thread after spending some hours :)
I was playing with the FP feature. It's nice because it should be possible to trigger a debug-monitor exception.
It seems that instead it triggers (escalates to?) a hard fault. I think it must be the C_DEBUGEN bit.
So it still gets set by the bootloader, and debugging via software is not possible (we cannot reset it). Paul, if you plan a bootloader update in the future, can you set it to 0 for all ARM Teensys, please?