Options for 'bare-metal' development

Status
Not open for further replies.
I made a first experiment to have DMA generate a configurable waveform.
https://github.com/PaulStoffregen/StepperPulse/blob/master/k66_dma_stepper.ino...

I think there is an opportunity for DMA to solve the following problem :

Stepper-motor drivers have a STEP signal, which is pulsed once per step, and a DIR (direction) signal, which selects clockwise or counterclockwise stepping.
All stepper drivers require that DIR is stable for some 'setup' time before the step pulse and remains stable for some 'hold' time after it.
They also specify minimum timings for the step pulse itself.

See for example this page with an inventory of commonly used stepper/servo motor drivers and the timings they expect.

So per step, you actually need 3 events:
* set up DIR (direction) - wait the setup time
* set the STEP pulse (active high or active low, depending on the driver) - wait max(step pulse time, DIR hold time)
* end the STEP pulse

Example of these timings for the A3967-Driver :

EasyDriver Timing.PNG

We could pre-calculate the bit patterns needed on the output pins for these 3 events, in the step timer interrupt.
Then DMA could take care of outputting them with the right timing, with no further interrupts needed.
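To make the idea concrete, here is a minimal sketch of pre-calculating the three events for one step. It is only an illustration: the pin masks, the struct names and the makeStepEvents() helper are hypothetical, the timings are placeholders to be taken from the driver's datasheet, and an active-high STEP is assumed. How the resulting patterns are handed to DMA is not covered here.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical pin masks (not from the thread); adapt to the real port layout.
constexpr uint32_t STEP_MASK = 1u << 0;
constexpr uint32_t DIR_MASK  = 1u << 1;

// Driver timing limits in nanoseconds, to be filled in from the datasheet.
struct DriverTiming {
  uint32_t dirSetupNs;   // DIR must be stable this long before the STEP edge
  uint32_t dirHoldNs;    // DIR must stay stable this long after the STEP edge
  uint32_t stepPulseNs;  // minimum STEP pulse width
};

// One output event: a bit pattern plus the time to wait before the next event.
struct PinEvent {
  uint32_t pins;
  uint32_t delayNs;
};

// Build the three events for a single step (active-high STEP assumed).
std::array<PinEvent, 3> makeStepEvents(bool forward, const DriverTiming& t) {
  const uint32_t dir = forward ? DIR_MASK : 0u;
  return {{
      {dir, t.dirSetupNs},                                      // 1. set DIR, wait setup time
      {dir | STEP_MASK, std::max(t.stepPulseNs, t.dirHoldNs)},  // 2. start STEP pulse
      {dir, 0u},                                                // 3. end STEP pulse
  }};
}
```

The step timer interrupt would call something like makeStepEvents(true, timing) once per step and queue the three patterns, instead of firing three separate interrupts.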

The alternative is an extra timer-interrupt, firing 3 times per step.

EDIT: Don't spend too much time on this - I've found a simple and elegant solution that uses the fact that we're 'oversampling' the timer interrupts anyway. As soon as we detect that a step is needed, DIR is set (which takes care of the setup timing constraint); on the next interrupt STEP is set; and on the one after that STEP is cleared automatically because of the way we calculate it.
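A rough sketch of that three-tick sequence, written as a small state machine so it can be tried on a PC. This is not the actual implementation (the post does not show how STEP is calculated); the StepSequencer type, the pin masks and the tick() signature are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical pin masks, for illustration only.
constexpr uint32_t STEP_MASK = 1u << 0;
constexpr uint32_t DIR_MASK  = 1u << 1;

enum class Phase : uint8_t { Idle, DirSet, StepHigh };

struct StepSequencer {
  Phase phase = Phase::Idle;
  uint32_t pins = 0;  // current output pattern

  // Call once per timer interrupt. stepDue is true on the tick where the
  // motion code decides that a step is needed.
  uint32_t tick(bool stepDue, bool forward) {
    switch (phase) {
      case Phase::Idle:
        if (stepDue) {
          pins = forward ? DIR_MASK : 0u;  // tick 1: DIR only (setup time starts)
          phase = Phase::DirSet;
        }
        break;
      case Phase::DirSet:
        pins |= STEP_MASK;                 // tick 2: STEP goes active
        phase = Phase::StepHigh;
        break;
      case Phase::StepHigh:
        pins &= ~STEP_MASK;                // tick 3: STEP ends, DIR is held
        phase = Phase::Idle;
        break;
    }
    return pins;
  }
};
```

This only meets the driver's timing if one timer period is at least as long as the setup time, the minimum pulse width and the hold time, and it limits the step rate to one third of the interrupt rate.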
 

Attachments

  • EasyDriver Timing.PNG
I've also created a spreadsheet that shows the alternate pin functions of a T3.5 (see attachment)

Thanks,
I merged our sheets together. This way, for each GPIO (that is brought out to a K64 pin) you can choose the function from those available on that pin.
It's done through a simple data-validation lookup in Excel - your original sheet has been added to the workbook.

Screen Shot 02-22-17 at 11.44 AM.PNG

Updated sheet is available in the Github repo
 
The first post in this thread was:
"Hi, I am doing a development for a CNC/Motion Controller. As a first target HW I want to run it on a Teensy3.5"

I've been working on this off and on for two years, getting Teacup and Marlin both running on Teensy 3.1, and was interested in this new thread.
I had boards made up to manage five axis (X/Y/Z1/Z2/E) plus heaters for bed and extruder.
I'll admit that MY implementation used servo-motors for each axis, but that's just extra overhead.

I'm not sure if I'm expected to apologise for butting in, or not...
 
I'm not sure if I'm expected to apologise for butting in, or not...
No apologies needed, I think your post was legitimate.
I will take a look at your work in the next few days - we can always learn from it.
My project is targeting CNC, and as such there is no heated bed.. but once we are up and running I'm willing to extend the scope to the 3D-printing and Laser-cutting community.
 
...Of course, if you want to start with a blank slate and do everything yourself from scratch, you certainly can. That first link will show you the way. But odds are you'll spend a year or more getting up to the already very good performance offered by Teensy's core library. It's your choice, but I hope you'll leverage what already exists and focus your efforts where they will matter most. Either way, get Joseph's Yiu's book!

Paul,

I am looking into the code that TeensyDuino is generating by compiling an 'empty' project and looking at the assembly output.
I found that the different 'Project Options' you can choose from are defined in C:\Program Files (x86)\Arduino\hardware\teensy\avr\boards.txt, and that the selected option sets some #defines which then include or exclude code from the source files in folder C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy3

So I guess I can add an option item 'Minimal' which will boot the K64 but exclude any USB, Serial, and Timer stuff..

teensy35.menu.usb.justboot=Minimal

When this option is selected, I (only) want to have the NVIC, CPU Clocks, FPU & Watchdog initialized.

Furthermore, I can see that most of the startup code comes from mk20dx128.c in C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy3, but I don't understand why.
For my understanding, could you explain why the build process finds the startup code in mk20dx128.c?

Finally, just confirming: when experimenting with the above, is there a risk of erasing the HalfKay bootloader?
I understand that without USB support, the TeensyLoader requires a manual press on the reset button, but that's ok.

Thanks!
 
On the Teensy LC, 3.0, 3.1, 3.2, 3.5, and 3.6, the HalfKay loader is not used. That was used in the earlier Teensy 2.0 and 2.0++ boards. Instead, a separate chip (MKL02/MKL04 in the LC, 3.2, 3.5, and 3.6; Mini54 in the 3.0 and 3.1 boards) does the boot loading. It has its own memory chip, and so normal operations won't erase it. If you want to make your own ARM board, you can buy pre-programmed MKL02/MKL04/MINI54 boards from PJRC.COM. Note that at present, the MKL02/MKL04 available for public sale does not support the chips used in the Teensy 3.5 or 3.6.
 
Thanks Michael!
I intend to use off-the-shelf Teensy3.5 / 3.6
I just wanted to double-check my bare-metal experiments wouldn't lock me out by breaking the built-in programming mechanism.
 
Thanks Michael!
I intend to use off-the-shelf Teensy3.5 / 3.6
I just wanted to double-check my bare-metal experiments wouldn't lock me out by breaking the built-in programming mechanism.

Sure. I wrote an eForth for the 3.6 in assembler and it works just fine.
 
For my understanding, could you explain why the build process finds the startup code in mk20dx128.c?

Short answer: because that's the filename I used when I wrote all the startup code.

In 2012, Teensy 3.0 was new and the only 32 bit Teensy, so the filename was based on its part number. I suppose a more forward-looking naming convention could have been used. But even now, I'm really not very concerned with such cosmetic details as (non-header) filenames deep within the core library.
 
Short answer: because that's the filename I used when I wrote all the startup code...

Ok, that's fine - I'm trying to understand what's in your startup code and why..

The next thing I don't understand is why commenting out 'yield()' in main() prevents the Teensy from booting properly..
I can see that yield() does some housekeeping on external interfaces etc., but I don't understand why it is needed for booting.

And what would happen if I wrote code in loop(), which would never exit, and so yield() would never run ?
 
The next thing I don't understand is why commenting out 'yield()' in main() prevents the Teensy from booting properly..
I can see that yield() does some housekeeping on external interfaces etc., but I don't understand why it is needed for booting.
It's not needed. Things work just fine, if you have an infinite loop in setup() and loop()/yield() are never called.

You have a bug somewhere.
 
I am using Visual Studio with Visual Micro plugin as IDE.
Just to be sure, I repeated my experiments with the Arduino IDE, and even with yield() commented out it runs fine, so it looks like a build issue..
 
I am using Visual Studio with Visual Micro plugin as IDE.
Just to be sure, I repeated my experiments with the Arduino IDE, and even with yield() commented out it runs fine, so it looks like a build issue..

Sure it is a build issue; you have a bug somewhere else. I suggest leaving yield() in and commenting out the individual functions it calls, to see which existing but unintended functionality breaks your execution.
 
In Arduino's File > Preferences, you can turn on verbose output while compiling. It might be helpful to compare Arduino's compiler commands to whatever you're doing.
 
Sure it is a build issue; you have a bug somewhere else. I suggest leaving yield() in and commenting out the individual functions it calls, to see which existing but unintended functionality breaks your execution.

Ok,

First I reinstalled the Arduino and TeensyDuino to rule out any corrupt installation setup..
I also verified all experiments below on both the vanilla Arduino IDE, as well as on Visual Studio / Visual Micro - they behave the same.

I wrote a basic sketch that does the following :
1. it sets up PIT0 and UART0. Every PIT0 interrupt, it sends a byte through UART0.
2. it polls UART0, when receiving a '1' it switches the LED on, when receiving something else, it switches the LED off
With a vanilla setup, this program runs fine (Teensy3.2 and 3.5). See attached file for the source.

Here is the body of yield() :
Screen Shot 03-05-17 at 07.29 PM.PNG

Then I started commenting out parts of yield(), and here is what I found.
Almost all of it can be commented out, except:
* if you build with 'USB Type: No USB', then line 44 - if (Serial1.available()) serialEvent1(); - can't be commented out, or something blocks all execution. Program size is 2448 bytes
* if you build with 'USB Type: Serial', then line 43 - if (Serial.available()) serialEvent(); - can't be commented out, or something blocks all execution. Program size is 2448 bytes

What works is:
* you build with 'USB Type: No USB' and include line 44 - if (Serial1.available()) serialEvent1(); - Program size is 7320 bytes
or
* you build with 'USB Type: Serial' and include line 43 - if (Serial.available()) serialEvent(); - Program size is 6392 bytes

Now the curious thing is that yield() is never reached, because I have an infinite loop inside loop().
But clearly the code inside yield makes the linker include some stuff which is needed to run.. It's about 4K of code.
Again, I don't care for the 4K extra code, but I'd like to fully understand what's going on.
 

Attachments

  • myBlink.ino
    3.1 KB · Views: 110
Some final observations for today :)

I was comparing the assembler output from a working and non-working version.
Seems that in the one which is not working, there is simply no NVIC table, boot-code etc...

For some reason the build process doesn't generate those once the executable file is below a certain size..
I am no expert in the whole build process, so I am stuck here (for the time being..)

Attached the output of the 2 different builds
 

Attachments

  • myBlink.asm.zip
    9.9 KB · Views: 115
I wonder if the linker is dropping 'unused' code in the build? What happens with line 44 or 43 included in the sketch's setup() - or elsewhere as placeholders?
 
Paul,

I ran your demo from your post above with the scope image https://github.com/PaulStoffregen/St...ma_stepper.ino with my oscilloscope hooked up to pin 2 (output).

It looks very promising, I see the output waveforms on my scope follow the "recipe" I typed into the table.
As you said, now to find a way to chain recipes together, and add the dir pulse pin.
I eagerly await the next demo.

The scope shows 3 MHz with the table set as below. Is this pretty much the narrowest pulse width?

Code:
uint32_t output[] = {
  (10u << 16) | 0xDD, // set on match
  (20u << 16) | 0xD9, // clear on match
  (30u << 16) | 0xDD, // set on match
  (40u << 16) | 0xD9, // clear on match
};
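As a rough way to relate the table to the scope reading, here is a small helper that computes the output frequency from the match values. It assumes, going by the comments in the table, that the upper 16 bits of each entry hold the timer compare value and that one set/clear pair makes up one period. The timer clock is passed in as a parameter because the post does not state it; matchValue() and outputFrequencyHz() are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Upper 16 bits of a table entry: assumed to be the timer compare (match) value.
constexpr uint32_t matchValue(uint32_t entry) { return entry >> 16; }

// Frequency of the generated square wave, assuming evenly spaced match values,
// one set/clear pair per period, and a timer that counts at timerHz.
double outputFrequencyHz(const uint32_t* table, std::size_t count, double timerHz) {
  if (count < 3) return 0.0;  // need two "set" entries to measure a period
  const uint32_t periodCounts = matchValue(table[2]) - matchValue(table[0]);
  return periodCounts ? timerHz / periodCounts : 0.0;
}
```

With the table above the period is 20 timer counts, so if the timer happened to run at 60 MHz (an assumption, not something measured here), outputFrequencyHz(output, 4, 60e6) would come out at 3 MHz.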
 
It looks very promising, I see the output waveforms on my scope follow the "recipe" I typed into the table.
As you said, now to find a way to chain recipes together, and add the dir pulse pin.
Chaining doesn't make much sense, given the usual timing restrictions on step / dir (dir needs to be set some time ahead of step). You can use an additional timer channel instead.

Instead of DMA chaining, I would set up output[] as a ring buffer and simply add commands to it. The DMA controller has support for that.

The scope shows 3 MHz with the table set as below. Is this pretty much the narrowest pulse width?
Based on DMA controller limitations, I would expect a maximum of around 5 MHz with a Teensy 3.6. If you can't reach that, there is probably some additional latency imposed by the FTM timer (the trigger value changes with each DMA transfer).
 
Both ideas make sense to me on the intuitive level, use a separate DMA channel for making the dir signal, and changing output[] into a ring buffer.

Towards the matter of how to implement an example, I assume it would have a source loop feeding the output[] ring buffer. (in chunks?) Then, DMA code would be rearranged to read output[] in "ring buffer mode" or equivalent. And I guess there are ways to sync the source and destination timing to prevent data loss, with the ring buffer absorbing brief jitters between in and out.

Much beyond that, I feel like a caveman looking at a spaceship, I am not at all familiar with DMA code and what the different bits do, but- I do want to climb in and take it for a spin!

Heh, I tried last night copying the code to alter output[] in loop() and it was an epic fail because I have no clue what I am doing. I saw some things shifting along, but it was a blurry mess on the o-scope, hah. I did not know about the ring buffer.
So, I need to know a lot more on how to do the ring buffer properly and how to mate it with the DMA commands. Many more questions than answers, ha.

Does anyone want to take a stab at extending the demo to use a ring buffer so we can set output[] from somewhere else like loop()? I would be willing to put time into this, but I am in way over my head so far as the details of DMA register commands go. I know there is the SPI DMA library, and the DMA libraries for driving small screens that play video, and apparently audio too, and the ADC library has an attempt at a ring buffer, though the author claimed some time ago it doesn't work very well yet.

But I don't see myself learning much there; I look at those and don't get warm fuzzy feeling of starting at a basic level. I feel the need for more bare bones demos, like a nice tutorial which builds up coding DMA pipes layer by layer, from bare minimum up, explaining each along the way.

I found this example from Paul interesting because it was small and approachable, and I was able to play around with it and effect changes to the output.

I assume adding a second channel for dir would be relatively easy, and not the crux of what is being attempted with using DMA; to output fast cnc pulses without as much CPU overhead.

I have a displacement sensor design prototype I would like to run in conjunction with the CNC control software being discussed here. It runs on Teensy 3.6, but can be fairly CPU intensive, so having the DMA stunt shave off motion controller CPU cycles would be a beautiful thing, and would allow my sensor to run and communicate on the same CPU as the motion controller code. The sensor can scan a workpiece for depth correction.

I am looking for any other DMA examples I can play with, but I feel a bit intimidated by it, because it smells like a very deep learning dive. Maybe not as bad as I imagine. Can DMA marshal data from the ADC to SPI, or perhaps to USB Serial or a Hardware Serial somehow? Always wondered about that.

I will continue to watch for updates... This DMA for cnc pulses has great potential.
 
Does anyone want to take a stab at extending the demo to use a ring buffer so we can set output[] from somewhere else like loop()?
That should already work. Note the "dma.TCD->SLAST = -sizeof(output);". This resets the DMA source pointer to the beginning of the buffer after the whole buffer was transferred.

You can access the current DMA source address via dma.sourceAddress() so that you know what the DMA controller is currently doing.
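For the bookkeeping on the producer side, something along these lines might do. It is a host-side model only, not tested on hardware: CommandRing and indexFromAddress() are hypothetical names, and readIndex stands in for the position you would derive from dma.sourceAddress(). It does not deal with stopping the DMA when the ring runs empty; a free-running channel would simply replay old entries.

```cpp
#include <cstddef>
#include <cstdint>

// Model of a ring of commands that a DMA channel drains while loop() refills it.
template <std::size_t N>
struct CommandRing {
  uint32_t slots[N] = {};
  std::size_t writeIndex = 0;  // next slot the producer will fill

  // Number of slots that can be written without overtaking the reader.
  // One slot is kept empty so that "full" and "empty" can be told apart.
  std::size_t freeSlots(std::size_t readIndex) const {
    return (readIndex + N - writeIndex - 1) % N;
  }

  // Append one command; returns false if the ring is full.
  bool push(uint32_t command, std::size_t readIndex) {
    if (freeSlots(readIndex) == 0) return false;
    slots[writeIndex] = command;
    writeIndex = (writeIndex + 1) % N;
    return true;
  }
};

// Convert a source address (as a DMA channel would report it) to a slot index.
template <std::size_t N>
std::size_t indexFromAddress(const CommandRing<N>& ring, const void* src) {
  return static_cast<std::size_t>(static_cast<const uint32_t*>(src) - ring.slots);
}
```

In loop() the idea would be to compute readIndex from the DMA source address each time round and call push() until it returns false.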
 
I wonder if the linker is dropping 'unused' code in the build?

I've put this on my low priority issue list. It's entirely possible the linker is discarding the entire vector table when none of it seems to be actually used.

Realistically, this is going to end up together with the low priority issue of the hardware serial code always getting linked even when it's not used. It would be nice to solve someday... but such a disruptive change to code so widely used is going to take a lot of careful testing. Unlikely to happen anytime soon.

On the vector table, this might be as simple as adding an attribute. Maybe??
 
I've put this on my low priority issue list. It's entirely possible the linker is discarding the entire vector table when none of it seems to be actually used.

Hard to believe since a good part of mk20dx128.c is the vector table.

The problem appears to be in the link stage. The startup code (mk20dx128.c.o) is bundled into core.a along with the rest of the libraries. During the link process, if nothing to the left of core.a on the command line has an undefined symbol that is satisfied by something within mk20dx128.c.o, then that object will not be linked. There are several ways to fix this; the best is to keep this crucial file out of core.a and explicitly reference it first on the command line.
 
This would seem to be a problem with VisualMicro emulation of IDE build. Under the IDE 1.8.1 I get the following results with no problem compiled on a T_3.6 at 180 MHz.

With No USB, the PJRC yield() brings in 10 KB of code, but taking yield() out compiles and runs fine from the IDE and is much smaller. qBlink and USB with void yield() is only 4 KB:

// compiled FASTER with LTO
// with PJRC yield():: Sketch uses 11472 bytes
// void yield():: Sketch uses 2440 bytes
// void yield() with qBlink & delayMicroseconds:: Sketch uses 3304 bytes
// compiled Smallest with LTO
// with PJRC yield() qBlink & delayMicroseconds:: Sketch uses 12928 bytes
// void yield():: Sketch uses 1132 bytes
// void yield() qBlink & delayMicroseconds:: Sketch uses 1804 bytes
// compiled Smallest with LTO :: USB Serial
// with PJRC yield() qBlink & delayMicroseconds:: Sketch uses 16040 bytes
// void yield() qBlink & delayMicroseconds:: Sketch uses 4412 bytes

Code:
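// Toggle the built-in LED by writing back the inverse of its current state
#define qBlink() (digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN)))

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
}

void loop() {
  qBlink();
  delayMicroseconds(100000);  // 100 ms between toggles
}

// Empty yield(), used for the "void yield()" sizes listed above
void yield() {}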

BTW - I think this is an issue FrankB noted - in creating this test with LTO I did run into this from the IDE (with or without USB Serial):
I:\arduino-1.8.1\hardware\teensy\avr\cores\teensy3/mk66fx1m0.ld:45 cannot move location counter backwards (from 00000408 to 00000400)

collect2.exe: error: ld returned 1 exit status

Error compiling for board Teensy 3.6.
Code:
void setup() {
  pinMode(LED_BUILTIN, OUTPUT); // removing this extraneous line (or having qBlink in loop) removes the above ERROR
}
void loop() {
}
void yield() {}
 
This would seem to be a problem with VisualMicro emulation of IDE build. Under the IDE 1.8.1 I get the following results with no problem compiled on a T_3.6 at 180 MHz.

In my experiments building with VisualMicro behaves the same as doing it from within Arduino IDE (I reinstalled VM, Arduino and TeensyDuino with recent versions to rule out anything there)
I have no T3.6 but I'll try to reproduce your results with a T3.5 tonight. (in my experiments I can reproduce the problem with both the MK64FX512 and the MK20DX256)

I see you use digitalWrite and other functions. These also pull in a certain amount of code, which may result in an executable that is large enough not to have this problem.
Could you maybe repeat your test using only direct register access?

I had no in-depth knowledge about how linking actually works, but as my intuition tells me there is something not 100% there, I am reading a few tutorials on it now, and trying to reverse engineer what exactly is happening. Here is a good tutorial I found : http://www.emprog.com/support/documentation/thunderbench-Linker-Script-guide.pdf

Thanks to all of you for looking into this. I think we are taking good steps forward towards using the Teensy 'bare-metal' from within Arduino/TeensyDuino.
 