Low-level library and timer assistance -- TeensyStep library ???

epicycloid

We're currently dead in the water without T3.x's and no working T4.x library. Can anyone help us over the hurdle?


TL;DR

Is there anyone here who might be able to investigate / re-visit the low-level timer changes from T3.x to T4.x to re-implement the original TeensyStep library, *not* the incomplete TeensyStep4 library, on the T4.x?

Can anyone step in and pick up where @luni left off, and get the original TeensyStep library working again on T4.x, even if not at the performance level anyone would expect on the T4.x platform?


The rest of the story...

TeensyStep was a fantastic stepper library, that was unfortunately broken by the low-level timer architectural changes going from T3.x to T4.x.

@luni / luni64 on GitHub / Lutz Niggl is a brilliant guy, but other demands have pulled him away from working on or updating the TeensyStep library.

That said, he was part of identifying the issues in the early T4 beta, and he did take a stab at a completely different programming model, and introduced his partially re-architected TeensyStep4. Unfortunately it isn't finished, has numerous open issues, and is provided with the warning, "Please note: This library is very experimental and no support can be given at the moment." He hasn't edited or updated it in a while now.

TeensyStep on GitHub -- https://github.com/luni64/TeensyStep
TeensyStep4 on GitHub -- https://github.com/luni64/TeensyStep4

The trifecta of the pandemic, chip shortage and EoL of the T3.x all collided to leave our project dead in the water with no path forward. We have a complicated, coordinated motion project, in use by quite a few people, but with no way to move forward now that T3.x's have been discontinued, and a "replacement" T4.x library that isn't fully functional.

Our project, currently with about 40,000 lines of code, is working perfectly with TeensyStep, controlling multiple coordinated stepper motors, with a Nextion touchscreen for the UI.

We have added a controller and stepper motors to a specialized woodturning lathe, known as a rose engine. The project hatched about 15 years ago, and has evolved into a complex and robust system, capable of cutting a wide array of patterns. Examples of pre-controller work, and the lathe that serves as the underlying platform can be read about here: https://www.rogueturner.com/ewExternalFiles/Its small world.pdf and https://www.rogueturner.com/ewExternalFiles/RoseEngine.pdf. Additional articles here: https://www.rogueturner.com/ornamental-obsessions.html

The current project is posted on GitHub here: https://github.com/elfren/RoseEngine_SpindleAndAxis, and documented here: https://mdfre2.colvintools.com

@luni replied to "Is TeensyStep4 an update of TeensyStep or a whole different library?" in the GitHub issues thread here:
https://github.com/luni64/TeensyStep4/issues/4 and lays out a number of issues and why he isn't moving forward with the old model on the newer chips.

An attempted port of our project to the TeensyStep4 library has stalled when we hit various issues. Two of our open issues have yet to be acknowledged here: https://github.com/luni64/TeensyStep4/issues

The architectural changes to the low-level timers in the move from the T3.x's MK64FX512VMD12 / MK66FX1M0VMD18 Cortex-M4F to the IMXRT1062DVJ6 Cortex-M7 of the T4.1, while above my pay grade, are what @luni wrote required a redesign of the entire library.

@luni describes the issue with the peripherals, including the PIT timers, running on a separate bus from the ARM core and not synchronizing with it, here: https://github.com/luni64/TeensyStep/issues/56

@luni and @defragster discussed here: https://forum.pjrc.com/index.php?threads/teensy-4-intervaltimer-max-speed.57959/#post-218577 and reference an earlier T4 beta thread with input from @Frank B, @KurtE, @manitou, @PaulStoffregen and others here: https://forum.pjrc.com/index.php?threads/teensy-4-0-first-beta-test.54711/page-48. Most of that is w-a-y above my pay grade, I'm just a lurker when I start reading their posts and discussions. But it was clear there was an identified timer and bus issue early on.

Is there anyone here willing to pick up the torch and help update the original library to allow us to move forward on the T4.x's?

TIA -- epicycloid & @Elf
 
I'm currently working on a project which will include some steppers. Thus, chances are good that I need to get TeensyStep working with a T4 anyway. I can't promise anything but I'll try. As you already mentioned: don't expect better performance compared to a T3.5/6 but for normal applications it should be fast enough.

@epicycloid: Do you have a deadline? And, since I was in Greece for a couple of days last summer: You might like this ;-)
[Attached image: 1743528235579.png]
 
I see that Luni just responded, but I'll post this anyway...

I ran the test program from 2019 that showed the problem with 4 x IntervalTimer (PIT), and got the same result as in the thread.

f:100.0 kHz Load: 41.7 (w/o interrupts: 6500008 with interrupts 11151196)
f: 50.0 kHz Load: 17.1 (w/o interrupts: 6500008 with interrupts 7838228)
f: 25.0 kHz Load: 8.5 (w/o interrupts: 6500008 with interrupts 7105345)
f: 12.5 kHz Load: 4.3 (w/o interrupts: 6500008 with interrupts 6792574)
f: 6.2 kHz Load: 2.1 (w/o interrupts: 6500008 with interrupts 6641502)
f: 3.1 kHz Load: 1.1 (w/o interrupts: 6500008 with interrupts 6571087)
f: 1.6 kHz Load: 0.5 (w/o interrupts: 6500008 with interrupts 6535767)

Then I modified the program to use TeensyTimerTool and 4 x PeriodicTimer (TMR1), and the interrupt load dropped by a factor of 3-4. Is use of QuadTimer a solution?

f:100.0 kHz Load: 12.0 (w/o interrupts: 6500010 with interrupts 7385069)
f: 50.0 kHz Load: 6.0 (w/o interrupts: 6500010 with interrupts 6916715)
f: 25.0 kHz Load: 3.1 (w/o interrupts: 6500010 with interrupts 6706815)
f: 12.5 kHz Load: 1.5 (w/o interrupts: 6500010 with interrupts 6602343)
f: 6.2 kHz Load: 0.8 (w/o interrupts: 6500010 with interrupts 6550821)
f: 3.1 kHz Load: 0.4 (w/o interrupts: 6500010 with interrupts 6525032)
f: 1.6 kHz Load: 0.2 (w/o interrupts: 6500010 with interrupts 6512418)
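
For anyone reading the two tables: the Load column is consistent with the extra cycles spent in interrupts divided by the total cycles with interrupts running. That is my reading of the printed numbers, not something taken from the test program's source, and the function and parameter names below are made up for illustration. A minimal sketch:

```cpp
#include <cassert>
#include <cstdint>

// Share of CPU time consumed by interrupt handling, in percent.
// cyclesIdle: cycles the benchmark loop takes without interrupts.
// cyclesBusy: cycles the same loop takes with the timer interrupts running.
// (Hypothetical helper; the formula is inferred from the printed results.)
double interruptLoadPercent(uint32_t cyclesIdle, uint32_t cyclesBusy)
{
    assert(cyclesBusy > 0 && cyclesBusy >= cyclesIdle);
    return 100.0 * (cyclesBusy - cyclesIdle) / cyclesBusy;
}
```

Feeding in the 100 kHz rows (6500008 / 11151196 for PIT, 6500010 / 7385069 for TMR1) reproduces the 41.7 and 12.0 shown above.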
 
The Quads are only 16-bit, which will limit the minimal possible step rate. Adjusting the prescaler on the fly would help, but I'm afraid that this will open another can of worms. I'll stick with the PITs for the main timer and use the TMRs instead of the FTMs for the pulse generation. Let's see how this works out performance-wise.
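
To put a rough number on the 16-bit limit: a single counter cannot produce a period longer than 2^bits ticks of its prescaled clock. The sketch below assumes a 150 MHz timer input clock and prescalers from 1 to 128, which I believe matches a T4.x at its default 600 MHz but should be checked against the reference manual; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Slowest periodic rate (in Hz) that one up-counting timer can generate
// before the counter overflows.
// clockHz:     timer input clock (assumed 150 MHz in the example below)
// prescaler:   clock divider in front of the counter
// counterBits: counter width (16 for a single QuadTimer channel)
double minTimerRateHz(double clockHz, uint32_t prescaler, unsigned counterBits)
{
    assert(prescaler > 0 && counterBits > 0 && counterBits <= 32);
    const double maxCount = static_cast<double>(1ULL << counterBits);
    return clockHz / prescaler / maxCount;
}
```

Under those assumptions, `minTimerRateHz(150e6, 1, 16)` is about 2289 steps/s and `minTimerRateHz(150e6, 128, 16)` is about 18 steps/s, so slow moves would need either the largest prescaler (with coarser resolution) or a prescaler change on the fly.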
 
@luni -- It is so good to know you are alive and have come up for air! First trip back to Greece since 1983? ;) Nice "Greek Key" pattern, as we call it in ornamental turning.

Deadline? Not per se, but we are truly dead in the water: no one can build a new controller, and as T3.x's die there is no way to move forward.

One user in our group, whose T3 died, bravely tried the T4.1 "beta" port of our system, and eventually gave up. Too many issues with TeensyStep4, and the "rose" function not able to work at all. I mailed him one of the last T3.5's I had in my personal stash, and he was able to get back up and running, and produced this box... the sinusoidal infill is the "rose" pattern, while "rocking" mechanically with a 5-lobe rosette. Thought you might like to see a current example of work...

[Attached images: Tom-RoseRock2.jpg, Tom-RoseRock1.jpg]


As I think you know, we are less concerned about overall speed, but maintaining the same programming model would be incredibly valuable for us.

If you can get the "original" TeensyStep up and working on the T4 we would all be eternally grateful!! And I can't tell you how nice it is to "hear" your voice again.

--epicycloid
 
Hi @luni, I would be interested in following along as you do this, just to learn more about the timers, or if there is something I can do to help, let me know.

QuadTimers are cascadable, though, so with 2 x QTMR you could have 4 x 32-bit timers, without the limitation of only one interrupt, right? Also, I don't know how much you've used FlexPWM, but it's pretty easy to operate with interrupts and duty cycle update on every cycle. I've done that up to 10 kHz for a motor control, and I'm pretty sure 100 kHz would work, too, but I don't know enough about how you use PIT and FTM in signal generation to know whether that helps.
 
@luni — I’m just checking back in to see if you have been able to spend any time on TeensyStep?

As I was lying sleepless in bed the other night, I wondered if I could write a detailed enough request for Claude AI or Grok AI, with the chip versions between T3.6 and T4.1, and the TeensyStep github repository, to possibly generate a working revision… then I drifted off to sleep.

Any updates appreciated as we are really dead in the water now… no T3.5’s or T3.6’s to fall back on. Our controller board is ready for T4.1’s.

Cheers,

@epicycloid & @Elf
 
Hi @epicycloid
as I like to do some woodturning and like to make something "nice", I was happy to see your post and information. Very nice way to carve wood! I also have some experience with stepper motors; for example, my lathe is controlled like a CNC machine.
The reason I write is that I wanted to ask about the maximum speed you really need, in microsteps per second, for your application. From what I have seen in the videos, this speed seems to be much lower than the 300.000 steps/sec that this sophisticated library can do. If so, a much simpler approach might be possible that does not need in-depth knowledge of the timers. As far as I understand, you use up to 5 steppers and have them move with a fixed relation of speeds (or rather travel)?
I would try the TeensyThreads library and do things in software. Put the movement of the 5 axes in one thread and use, for example, the Bresenham algorithm. I did some experiments with the threads here: https://forum.pjrc.com/index.php?th...iciency-thread-priority-and-cycle-time.69475/ You can go down to about 3 µs per time slice. I assume that some jitter in absolute speed is of no importance, as the movements are slow, as long as the relation of speeds is held.
Instead of TeensyThreads you could also use IntervalTimer. Use the fastest axis to drive Bresenham.
Just some ideas, Christof
 
Christof / @cebersp -- Thank you for your comments and suggestions. Yes, our maximum speed is nowhere near what TeensyStep is capable of. But the TeensyStep library handles the coordinated motion and timing in a very clean way for our needs.

That is a nice analysis of the TeensyThreads library you did. That and Bresenham's would certainly work at our required speeds (up to 5 motors on a T4.1), but the math, managing multiple motors, and acceleration / deceleration can all get very complicated when trying to produce some of the patterns we are after...

Here is an example in @luni's 'PathControl' branch, showing a perfect example of the complexity of the patterns we were trying to solve at the time. Those worked easily in the original version of the library.

That said, our biggest issue is 40,000+ lines of code based on using the original version of the TeensyStep library, multiple users with systems based on that code, and thorough documentation of the UX and building the lathes and controllers.

Way back when, we started with the AccelStepper library, which is also very good, but a very different programming model when controlling / coordinating multiple steppers. The TeensyStep library solves a lot of complicated challenges, in a very straightforward way. Unfortunately the internal changes in the microprocessors, moving from T3.x to T4.x caused the old architecture to break, and @luni hasn't had time to get it working yet. 🤞🤞🤞

Cheers,

--Jon
 
Hi Jon,
thank you for your answer, and very much for the nice pattern with the underlying math. I will have a look into that, because I love such things. :)

Perhaps you will like my easter eggs:
[Attached image: 1761642508344.png]


These patterns have been done in three steps on the fly:
1. Vary some parameters with a random function within given ranges. So each pattern will be different and also each egg.
2. Calculate positions.
3. Draw with only linear movements with the steppers.
This has been done with a different microcontroller and using the Forth language.
More is here: https://forums.parallax.com/discuss...eater-to-egg-painter-build-a-p2-eggbot#latest

My experience is that you can do such things even with a Forth language system, which is about 10 times slower than compiled code. And the Parallax P2 is much slower than the T4. The eggs program did not use additional cores. My experience is also that acceleration only pays off if the load generated by inertia is rather big in comparison to friction and other loads. So for your type of application and machine, I think you can directly start and stop at end speed, because you need to keep enough torque in reserve at any time. It's different from a pick-and-place machine or a 3D printer.

I think that the library does two different things in your project.
1. Generate linear movements with up to 5 axes. - If speed is less critical, this part can be done in software, I believe.
2. Do the movements as a background job, while software can still do other things in parallel, like handling the user interface. - This is where "interval timer" or "teensy threads" come in. They should allow you to maintain the overall structure of the existing software.

I define the axis with the longest travel to be the lead. The other axes follow the leading one, so its acceleration and speed will be reflected by them. If X is the lead and we start at 0,0, then after each step of X: Y = travel.y * X / travel.x. If Y is new, do the Y step. That is not difficult.
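
A minimal sketch of that lead/follower rule for two axes, in plain C++ so it can be tried on a PC first. The names `Point` and `followLead` are made up for illustration, and this is the truncating form written above rather than a classic Bresenham with an error accumulator. On real hardware the line marked in the comment is where the Y step pin would be pulsed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Point { int32_t x; int32_t y; };

// Two-axis linear move starting at (0, 0) with X as the lead axis.
// Requires travelX >= travelY >= 0, so Y never needs more than one step
// per X step. Returns the position reached after every X step.
std::vector<Point> followLead(int32_t travelX, int32_t travelY)
{
    assert(travelX > 0 && travelY >= 0 && travelY <= travelX);
    std::vector<Point> path;
    int32_t y = 0;
    for (int32_t x = 1; x <= travelX; ++x) {
        // Y = travel.y * X / travel.x, using a 64-bit product to avoid overflow
        const int32_t yTarget =
            static_cast<int32_t>(static_cast<int64_t>(travelY) * x / travelX);
        if (yTarget != y) {
            y = yTarget;            // "Y is new": pulse the Y step pin here
        }
        path.push_back({x, y});
    }
    return path;
}
```

For a move of 10 by 4 steps, Y steps at X = 3, 5, 8 and 10, and the move ends exactly at (10, 4). More followers can be added the same way, each compared against the lead's position.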

Of course I understand that it will be simplest if you get an update. I only thought that you must be rather desperate if you even considered asking AI....
Cheers Christof
 
Hi Christof — Sorry for the slow reply, but your eggs are great! I enjoyed reading your post on the Parallax forum too. Your Lissajous / Lemniscates and variants are nice. Some would be better opened up, but they’re all pretty pleasing.

Thank you for all the good points on timers and performance. You are right about accelerations, since we run relatively slow compared to other applications. We’re mostly taking advantage of stepper motors for their torque, as opposed to speed. The big advantage of the TeensyStep library is the coordination. And yes, of course there are other options to get the same result. Likewise, we are not even close to needing T4 performance, as most everything has been working from the T3.2 days.

We’re still hoping for a reply / fix from @luni, which is the simplest. If we have to re-write everything, we may be going back to a “clean white board” along the lines of your suggestions.

Cheers,

—Jon
 
Jon, I'm interested in trying to help get TeensyStep working with T4. Can you provide a small sketch that runs on T3 and captures the capabilities you need for your application? What I mean is, I'd like to understand how many stepper motors you use, what are the motor parameters, and at least one example of what you require in terms of coordination.
 
Joe — I’m not sure about a small sketch, since it’s kind of difficult to extract something like an individual function from a huge project.

The project / Teensy code is here — https://github.com/elfren/RoseEngine_SpindleAndAxis/tree/master/Teensy if you want to read through it.

Basically we’re using the library for coordinating the motion of 2 to 4 motors at a time, generally running at relatively slow speeds. The coordination allows for patterns as simple as helices, e.g. turning a spindle while moving a single axis, then rewind, index and repeat, to more complex patterns like rose curves. We’ve added ancillary motors to perform some on-the-fly adjustments, while a primary function (of 2 motors) is running.

I’ll talk to @Elf and see if maybe there’s a “building block” example that could serve as a test bed.

Thanks,

—Jon
 
Thanks. Good to know that it's 2-4 motors. It would be helpful if you could synthesize requirements that are not specific to your application. For example, what are typical motor resolutions (steps and microsteps per rev) and speeds (microsteps/s)? What is a simple example of motor coordination that would capture some, if not all, of your requirements? Would the MultipleSteppers example in the TeensyStep library be a good starting point? If an existing example captures some of your requirements, could you extend that example to capture the others?
 
Joe — I think the Path_Following example would be a closer example for what we’re doing, using RotateControl, and was what @luni based some of the library changes on when we were trying to get things to work a few years ago.

[edit] I just thought to go look and confirm it was still there… You can see a lot of what @luni was trying to get working back then by digging through the PathControl branch of the TeensyStep library. I shared Frank Farris’ paper Wheels on Wheels on Wheels (linked in that branch), which @luni ended up using for a couple examples and test rigs. The PeriodicFunctionFollower was early trials with the rose curves and other cycloids mentioned earlier.

Axes and slides vary by individual machine and driver selections. I’ll try to get some representative values to help define the problem.

—Jon
 
This sketch shows one of the main issues with T4.1. Typical step rates are under 30,000/sec. Microsteps are usually set to 32 with a few machines using 16. 99.9% of the routines will use four or fewer steppers synchronously.
 
Okay, thanks. It's going to take me some time to get into the low-level stuff. I have to understand how this program works on T3, then try to understand how Luni has mapped T3 timer functions onto T4 timers.
 
Elf, I read the messages on github. Do you mean that this same program runs well on T3.5/3.6, with the only change to the source code being to #include "TeensyStep.h" rather than "TeensyStep4.h"?
 
No, the stepper/controller interface is different. The Path Following routine here is the T3.5/3.6 version.
 
I figured out why the RoseFollower_T4 program hangs, and I have a work-around for you to test.

The program hangs when the value of speedFac is too close to 0. The workaround is to add the code shown below to the tick() handler, before the call to slide.overrideSpeed(). I found that 0.00005 was the smallest magnitude of speedFac that avoided the problem, but it might be safer to make it a little larger, like 0.0001, or any value that you think is sufficiently close to 0 speed, but not 0.

I'm going to try to fix overrideSpeed(), but that could take a little more time, so I thought it would be good if you could test the work-around in the meantime. More on the problem in overrideSpeed() tomorrow.

Please let me know how it goes.

Code:
    // make sure speedFac is not too close to 0
    const float minSpeedFac = 0.00005;
    if (speedFac >= 0 && speedFac < minSpeedFac)
      speedFac = minSpeedFac;
    else if (speedFac <= 0 && speedFac > -minSpeedFac)
      speedFac = -minSpeedFac;
   
    slide.overrideSpeed(speedFac);             // set new speed
 
Here is a fix to TeensyStep4 for the slide getting "stuck" at 9999 in example sketch RoseFollower_T4. Please use this instead of the work-around from my previous message.

In file stepperbase.cpp, modify function overrideSpeed() as shown below. This change is necessary because if argument float factor is close to 0.0f, the new v_tgt (int32_t) will be zero, which results in stopping the motor. With this fix, unless the argument float factor is exactly 0.0f, v_tgt will be non-zero (>= 1 or <= -1).

I was surprised to see that TeensyStep4 is not just an update to TeensyStep (T3) to use T4 timers. It's quite different. Does anyone know why? Is it meant to address some shortcomings in TeensyStep (T3)?

Code:
    void StepperBase::overrideSpeed(float factor)
    {
        if (mode == mode_t::rotate)
        {
            noInterrupts();
            // make sure v_tgt (int32_t) is non-zero unless factor is exactly 0.0f
            float temp = v_tgt_orig * factor;
            v_tgt = temp;
            if (v_tgt == 0 && temp > 0.0f)
                v_tgt = 1;
            else if (v_tgt == 0 && temp < 0.0f)
                v_tgt = -1;
            v_tgt_sqr = (int64_t)signum(v_tgt) * v_tgt * v_tgt;
            vDir      = (int32_t)signum(v_tgt_sqr - v_sqr);
            interrupts();

            // Serial.print(v_tgt_sqr);
            // Serial.printf("  %d \n", v_tgt);
        }
    }
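
The truncation that the fix guards against can be reproduced off-target. The helper below is only an illustration of the same clamp in isolation (`scaleSpeedNonZero` is a made-up name, not part of TeensyStep4): converting a float to int32_t truncates toward zero, so any product with magnitude below 1 would otherwise become 0.

```cpp
#include <cassert>
#include <cstdint>

// Scale an integer speed by a float factor, but never let a small non-zero
// product collapse to 0. Only an exactly-zero product returns 0.
// (Hypothetical stand-alone helper mirroring the clamp shown above.)
int32_t scaleSpeedNonZero(int32_t vOrig, float factor)
{
    const float scaled = vOrig * factor;
    int32_t v = static_cast<int32_t>(scaled);  // truncates toward zero
    if (v == 0 && scaled > 0.0f)
        v = 1;
    else if (v == 0 && scaled < 0.0f)
        v = -1;
    return v;
}
```

For example, with an original target of 10000 and a factor of 0.00005 the product is about 0.5, which truncates to 0 without the clamp and becomes 1 with it.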
 