Low-level library and timer assistance -- TeensyStep library ???

Preliminary tests on my desktop test unit were successful. I have a project to finish on my rose engine before I can fully test it.

Thank you very much for this.
 
I let the example run overnight, and something different happened after about 6 hours. I have the program set up to print if the slide position gets stuck at 9999. That didn't happen, but something else did, so there may be another edge case that is not yet handled correctly. I'm going to add more logging, perhaps to SD, to try to capture what happened.
 
I just noticed a bug in the RoseFollower_T41.ino file. The call to tickTimer.priority(255) should be AFTER the call to begin(). Calling priority() before calling begin() has no effect, so please also make this change before you test.
 
Hi Joe -- I want to make sure we're not chasing the wrong goal here... @Elf / Ed posted a link to illustrate an issue with TeensyStep4, trying to answer your question...

But our ideal / real goal has always been to get the *original* TeensyStep library to be compatible with the timer changes in the T4.x. That way the same project can be compiled to run on T3.x and T4.x versions. (Eventually allowing for EoL of the T3.x versions.)

As you have now noticed, the TeensyStep4 library is a completely different model. @Elf / Ed has tested and rewritten portions of the code to try to get it to work, and found issues along the way, but TeensyStep4 is not the path we wanted to go down. (And it was just @luni's attempt to put a band-aid on the T4.x timer issue.)

Can you focus on whatever it might take to get original TeensyStep to work on T4.x's?

Thanks, and apologies for any confusion,

--Jon & @Elf
 
Thanks for addressing this question. Do you know any of the reasons or history behind WHY TeensyStep4 uses a different model, or why the T4 timers don't map easily onto the way the T3 timers are used in TeensyStep?
 
I will dig back into my history notes and links... there are a couple threads where @luni described the issues, during the beta and prior to the T4.x's release. My vague recollection was that there was a discontinuity / disconnect between the bus frequency and syncing with the (peripheral?) timers he was using. While I'm looking, you can also search through @luni's posts.

He created TeensyStep4 as a sort of re-architected stop-gap, to try to get something working. But he was also sucked into a big work project and couldn't commit time to finding a solution to get TeensyStep running on the T4.x. He also couldn't spend time to fix issues with TeensyStep4. He did say at one point, if he did get it running, with the original TeensyStep API, not to expect T4 performance, again implying timer performance / sync issues.

P.S. I always try to add the caveat that this low-level timer stuff is "above my pay grade".
 
I found and read the "Fast Stepper Motor Library" thread. Luni doesn't say why the TeensyStep approach won't work on T4, but he seems to have been very clear that a different approach was needed, thus TeensyStep4. On Aug 13, 2019, someone asked whether TeensyStep would be ported to T4, and Luni answered:

Sorry, not yet. The timers of the T4 behave differently from those of the T3s. So, I need to rethink a few concepts to utilize the speed of the T4 which will take some time.

I'm guessing, but based on this message from Luni, I think you're right that the issue is something to do with synchronization between timers. After all, synchronization is what TeensyStep is about. I wonder if the issue is that PIT and FTM are "synchronized" on T3 in some fundamental way that PIT and the PWM-capable timers on T4 are not. If that's so, then I wonder if PIT can be avoided altogether on T4? The PWM timers do seem more capable on T4 than on T3.

I need a better understanding of TeensyStep on T3 before I can think about alternative ways to get the same concept to work on T4. Luni is a very sophisticated C++ developer, so it's not easy for me to pick up on his designs quickly, if ever. One thing I'll mention, though, is that the TeensyStep thread contains a fair amount of discussion of overrideSpeed() in TeensyStep4, so it's interesting that this is where I found the issue that was causing Elf's problem with RoseFollower_T41.
 
Status update, still working on the problem in TeensyStep4 (TS4). The fix to overrideSpeed() allows the test sketch RoseFollower_T41 to run for quite a long time (6+ hours) without trouble, but the slide does eventually get stuck as before. I've got it running on my bench with logic to try to capture the conditions at which it gets stuck, and I think I'll have an answer later today.

I've been reviewing the source of TS4 versus TS(3), and TS4 is much simpler and much less code. It uses only the QuadTimer (TMR). The problem in overrideSpeed() can be described as unintentionally setting the slide's speed to 0, which results in it stopping. My change to the logic in overrideSpeed() will avoid the problem if the function argument is very small, but I think my bench test may show that function tick() can (on rare occasions) call overrideSpeed() with argument speedFac exactly 0.0f. If that's true, the full fix will require a change to the spindle-versus-slide speed logic in tick(). I'm not sure whether it's important to be able to use overrideSpeed() to stop a motor, or if there is an alternative to that, but RoseFollower_T41 does use overrideSpeed(0) for that purpose.

TS4 uses a lot of single-precision floating-point, which I think is a holdover from TS, since T3.x has a single-precision FPU, and code speed depends on using float rather than double. T4.x has a double-precision FPU, so there is no speed advantage to float versus double, but there are still many instances in TS4 of using the float versions of standard functions, such as cosf(), sqrtf(), etc. There are also a number of places in TS4 where single-precision floating-point functions are used to operate on 64-bit integers, such as sqrtf(v_tgt_sqr). This will convert the 64-bit integer to float, possibly with loss of precision, before the sqrt is taken. It should probably use sqrt(v_tgt_sqr), which will convert to double, and then the result of sqrt() can be cast to float, with less chance of loss of precision. TS(3) uses int64_t where float does not provide sufficient precision. TS4 could probably use double everywhere that either float or int64_t is used, avoiding some of these pitfalls of casting back and forth.
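
To illustrate the conversion issue described above, here is a minimal desktop C++ sketch. It is not taken from TS4; the helper names and the test value are assumptions chosen only to show that a 64-bit integer can lose low-order bits when it passes through float (24-bit significand) but not when it passes through double (53-bit significand).

```cpp
#include <cmath>
#include <cstdint>

// Convert a 64-bit integer to float and back (hypothetical helper).
// Values above 2^24 may not survive this round trip unchanged.
int64_t roundTripFloat(int64_t x) {
    return static_cast<int64_t>(static_cast<float>(x));
}

// Same round trip through double; exact for magnitudes up to 2^53.
int64_t roundTripDouble(int64_t x) {
    return static_cast<int64_t>(static_cast<double>(x));
}

// Square root with the integer converted to float first,
// which is what a call like sqrtf(some_int64) does implicitly.
float sqrtViaFloat(int64_t x) {
    return sqrtf(static_cast<float>(x));
}

// Square root taken in double precision, then narrowed to float.
float sqrtViaDouble(int64_t x) {
    return static_cast<float>(std::sqrt(static_cast<double>(x)));
}
```

For example, roundTripFloat(16777217) returns 16777216 because 2^24 + 1 has no exact float representation, while roundTripDouble(16777217) returns the value unchanged.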
 
After more testing of sketch RoseFollower_T41 with TeensyStep4, I'm confident now that the fix to overrideSpeed() does work correctly. If you let the sketch run long enough, about 8 hours, the slide will get stuck, not because of a problem in the library, but because of the use of single-precision float variables in the tick() function. spindlePos increases continuously, and eventually reaches a value that can't be accurately represented with a 32-bit float. When that happens, the position error (slidePos - slideTarget) loses accuracy and can become 0, which causes the slide to stop.

Put another way, when you first start RoseFollower_T41, the slidePos and slideTarget are sinusoids (via cosf), and the velocity changes sinusoidally. If you let the program run long enough, slidePos and slideTarget degrade to triangles, and the velocity becomes more discontinuous.
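
The loss of resolution can be shown with a small desktop C++ sketch. This is not the RoseFollower_T41 code, and the numbers are illustrative assumptions (I am not claiming these are the actual units or magnitudes of spindlePos). It only shows that once a float grows large enough, the gap between adjacent representable values exceeds the increment being added, so the value stops changing.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next larger representable float.
float floatStepAt(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Add `delta` to `pos` a total of `n` times, in type T (float or double).
template <typename T>
T accumulate(T pos, T delta, int n) {
    for (int i = 0; i < n; ++i) {
        pos += delta;
    }
    return pos;
}
```

At 16777216.0f (2^24) the gap between floats is 2.0, so accumulate<float>(16777216.0f, 1.0f, 100) returns 16777216.0f, while accumulate<double>(16777216.0, 1.0, 100) returns 16777316.0 as expected.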

I'll set it up to run tonight with the local variables in tick() changed from float to double, and I think that will run forever without getting stuck.

I spent some more time studying TeensyStep, and I'm still trying to understand the timer usage. It uses PIT for stepTimer, and FTM for accTimer and delayTimer, but I don't yet understand the detail of how they are used, or why this approach won't work on T4.
 
This worked. With a few variables in tick() changed from float to double, RoseFollower_T41 has run for 20+ hours without getting stuck.
 
Not much feedback from the TeensyStep users, but here's another status update.

I've spent some time trying to understand TeensyStep for T3.x, and I'll share my progress here for anyone who is interested.

TeensyStep allows for control of up to 4 groups of motors, with up to 10 motors in each group. Each of the 4 possible groups uses 3 timer resources (1 x PIT and 2 x FTM). The PIT instance is referred to as stepTimer, and is configured to interrupt at the desired step frequency. The leading edges of the step pulses for all of the motors in a group are generated in stepTimerISR(). The two FTM channels are referred to as delay timers. They are both used in output compare mode to produce interrupts at some future time ("delay"). One is referred to as delayTimer, and is used to generate the trailing edges of all step pulses for the group. The other is referred to as accTimer. It interrupts at a lower frequency, and controls motor acceleration (steps/sec^2) by computing a new motor velocity (steps/sec) and modifying the PIT stepTimer to interrupt at that frequency.
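
As a rough sketch of what a periodic acceleration update like this could look like, here is some plain C++. It is not TeensyStep's code. The function names are hypothetical and the constant-acceleration ramp is my assumption; the only details taken from the description are that a new velocity is computed at a fixed interval and that the step timer period is then derived from it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// One acceleration update (hypothetical): move v toward vTarget by at most
// accel * dt. v and vTarget are in steps/sec, accel in steps/sec^2, and
// dtUs is the update interval in microseconds.
float updateVelocity(float v, float vTarget, float accel, uint32_t dtUs) {
    const float dv = accel * (dtUs * 1e-6f);
    if (v < vTarget) {
        return std::min(v + dv, vTarget);  // ramp up, clamp at the target
    }
    return std::max(v - dv, vTarget);      // ramp down, clamp at the target
}

// Step timer period in microseconds for speed v (steps/sec); v must be nonzero.
float stepPeriodUs(float v) {
    return 1e6f / std::fabs(v);
}
```

With an acceleration of 50000 steps/sec^2 and a 5000 us interval, each update changes the velocity by about 250 steps/sec, so a target of 10000 steps/sec is reached after roughly 40 updates, at which point the step period is 100 us.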

The PIT channel's stepTimerISR() generates the leading edge of each step pulse and also calls the delayTimer FTM channel's trigger() function to configure the FTM channel to generate an interrupt at the desired time for the trailing edges. The FTM channel's trigger() function reads the CNT register, adds the desired delay, and writes that sum to the compare value (CV) register. When the counter (CNT) reaches the value written to CV, an FTM interrupt occurs, and delayTimerISR() writes the trailing edge of the pulse. The second FTM channel is used to control acceleration, and is referred to as accTimer. By default, the FTM accTimer channel is configured to interrupt every 5 ms (5000 us). Since both FTM channels are part of the same FTM module, they share an interrupt, and thus delayTimerISR(). When either match occurs, delayTimer or accTimer, delayTimerISR() checks both channel flags. If the flag for the delayTimer channel is set, the ISR generates the pulse trailing edges, and if the flag for the accTimer channel is set, it calls the acceleration update function, which computes a new velocity, modifies the PIT frequency accordingly, and configures the FTM channel to generate the next acceleration interrupt.
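
The "read CNT, add the delay, write CV" step can be simulated on a desktop with a few lines of C++. This is only a model under stated assumptions: FakeCompareChannel is a hypothetical stand-in, not the real FTM register layout or TeensyStep's trigger() code. It relies on the FTM counter being 16 bits wide, so the sum wraps modulo 65536 and the match still occurs after the requested number of ticks.

```cpp
#include <cstdint>

// Hypothetical model of one output-compare channel on a 16-bit free-running counter.
struct FakeCompareChannel {
    uint16_t cnt = 0;     // models the counter (CNT)
    uint16_t cv = 0;      // models the compare value (CV)
    bool armed = false;

    // Schedule a match delayTicks from now. The cast makes the sum wrap
    // modulo 65536, as a 16-bit hardware register would.
    void trigger(uint16_t delayTicks) {
        cv = static_cast<uint16_t>(cnt + delayTicks);
        armed = true;
    }

    // Advance the counter one tick; returns true when the compare match fires.
    bool tick() {
        ++cnt;
        if (armed && cnt == cv) {
            armed = false;
            return true;
        }
        return false;
    }
};

// Count how many ticks pass between trigger() and the match.
int ticksUntilMatch(FakeCompareChannel& ch, uint16_t delayTicks) {
    ch.trigger(delayTicks);
    int n = 0;
    for (;;) {
        ++n;
        if (ch.tick()) {
            return n;
        }
    }
}
```

Starting with cnt at 65530 and a delay of 10 ticks, cv is written as 4 and the match fires after exactly 10 ticks, even though the counter rolled over in between.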

As a side note, I'm not sure why a separate FTM channel is used as accTimer, as opposed to updating velocity on every N occurrences of either the leading edge (PIT) or trailing edge (FTM) interrupts.

Control of the step and direction pins for all motors is done via the T3's "bitband" I/O feature. This feature provides separate register locations to set or clear a given I/O bit that controls a pin. So, step pulse leading and trailing edges and direction changes are generated by writing 1 to the appropriate register locations.

There are 4 PIT channels, and each FTM has 8 channels, so 4 groups of motors can be controlled. Each group can contain up to 10 motors. Within the group, the one with the highest step frequency is the "lead" motor, and the PIT interrupt frequency is set according to the current step frequency for that motor. All other motors in the group are "slave" motors. They are optionally stepped on each PIT interrupt, with the composite frequency yielding the desired speed for each motor. The decision whether to step each slave motor on a step of the lead motor is made according to the Bresenham algorithm.
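
For anyone not familiar with Bresenham, here is a generic textbook version in desktop C++ showing how a slave motor's steps can be spread evenly across the lead motor's steps. This is not TeensyStep's implementation, and the function names are made up for the example. It assumes the slave makes no more steps than the lead.

```cpp
#include <cstdint>
#include <vector>

// For each of leadSteps lead-motor steps, decide whether the slave also steps,
// so that the slave makes exactly slaveSteps steps in total.
// Assumes 0 <= slaveSteps <= leadSteps.
std::vector<bool> bresenhamSchedule(int32_t leadSteps, int32_t slaveSteps) {
    std::vector<bool> stepSlave;
    stepSlave.reserve(leadSteps);
    int32_t err = 2 * slaveSteps - leadSteps;  // classic Bresenham error term
    for (int32_t i = 0; i < leadSteps; ++i) {
        if (err >= 0) {
            stepSlave.push_back(true);   // slave steps along with the lead
            err -= 2 * leadSteps;
        } else {
            stepSlave.push_back(false);  // slave skips this lead step
        }
        err += 2 * slaveSteps;
    }
    return stepSlave;
}

// Count how many times the slave steps in a schedule.
int32_t countSlaveSteps(const std::vector<bool>& schedule) {
    int32_t n = 0;
    for (bool s : schedule) {
        if (s) {
            ++n;
        }
    }
    return n;
}
```

With 10 lead steps and 3 slave steps, the slave steps on lead steps 2, 5 and 9 (counting from 1), which is as even a spacing as integer arithmetic allows.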

TeensyStep4 is quite different. It uses just one QuadTimer (TMR) channel for a motor group. That single TMR channel is used in PWM mode, with the PWM period set for the desired step frequency, and the PWM pulse width set to the desired step pulse width. Perhaps because the T4 has so much more computing power, acceleration is controlled by updating the velocity (steps/sec), and therefore the PWM frequency, on every cycle.

I don't know of any reason why the T3 approach (1 x PIT + 2 x FTM) could not be mapped onto the T4 (1 x PIT + 2 x TMR), so perhaps @luni just decided it made more sense to use a single TMR channel in PWM mode to go from 3 timer resources to 1, and to take advantage of the T4's greater computing power to control acceleration by updating velocity on every step. I wonder whether the T4 approach could be mapped back onto the T3, using a single FTM channel in edge-aligned PWM mode rather than 1 PIT and 2 FTM channels in output compare mode. That would actually allow support for more groups, though I doubt that more than 4 would be necessary. Also, the T4 does not have bitband I/O functions, so TeensyStep4 uses digitalWriteFast() and digitalToggleFast() to change the state of step and direction pins.

So, the possibilities are:
  • modify TeensyStep(3) to support T4 via the same timer approach used for T3 (1 x PIT + 2 x TMR)
  • update TeensyStep4 to have the same API as TeensyStep, so applications can easily move to TeensyStep4

@elfren and other TeensyStep users, now that I've shown that RoseFollower_T41 can run without getting "stuck", can you please comment on what else you cannot do with TeensyStep4 that you can do with TeensyStep(3)?
 
It looks like all of the functions are working. The rose function is the only one which changes speeds and directions without stopping. I'm still seeing random crashes that I can't repro.

Thanks for all of your work on this. It's giving hope for the future.
 
Do you mean crashes in the RoseFollower_T41 example sketch? If so, can you estimate how long after starting the crash occurred?
 
No, in the main program. I need to do quite a bit more testing in it, but the crashes appear to be happening when changing a setting.
 
I’m so new to this, please assume I know nothing about your application. The more detail you can give me, the better chance I have of being able to help, and of course, if you can, try to provide a sketch that shows the problem.
 
The sketch in this link shows that rotateAsync doesn't work correctly when the faster motor's speed/direction is negative.

Okay, that one was simple. In file TeensyStep4/src/steppergroupbase.h, the startRotate() function contains code used for sorting the steppers to find the one with highest speed to act as the lead motor. That code was comparing the max velocities, but should compare the absolute values of the max velocities as shown here. With this change, all 4 cases work correctly.

Code:
auto deltaSorter = [](Stepper* a, Stepper* b) { return std::abs(a->vMax) > std::abs(b->vMax); };
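
To see why the comparison needs absolute values, here is a standalone desktop C++ sketch. FakeStepper and the two helper functions are hypothetical stand-ins, not TS4 types. Only the idea of sorting by vMax to pick the lead motor comes from the fix above.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a stepper: an id and a signed max velocity.
struct FakeStepper {
    int id;
    int32_t vMax;
};

// Pick the lead by comparing signed velocities (the buggy comparison).
int leadBySignedSpeed(std::vector<FakeStepper> s) {
    std::sort(s.begin(), s.end(),
              [](const FakeStepper& a, const FakeStepper& b) { return a.vMax > b.vMax; });
    return s.front().id;
}

// Pick the lead by comparing magnitudes (the corrected comparison).
int leadByAbsSpeed(std::vector<FakeStepper> s) {
    std::sort(s.begin(), s.end(),
              [](const FakeStepper& a, const FakeStepper& b) {
                  return std::abs(a.vMax) > std::abs(b.vMax);
              });
    return s.front().id;
}
```

With stepper 1 at -10000 and stepper 2 at 1000, the signed comparison picks stepper 2 as the lead even though stepper 1 is ten times faster, while the absolute-value comparison picks stepper 1.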

Here is output of a modified sketch that prints motor positions once per second and shows that all 4 cases are working correctly. Below that is the modified source. The code from main.cpp was moved into SynchroSpeeds.ino, so there is just one file. There is a new subroutine doRotation() that takes the two stepper velocities and accelerations as arguments and sets the positions of each motor to 0 before starting, so it's easier to add cases and see what's happening.

Code:
s1= 10000, a1= 50000, v2=  1000, a2= 25000

t = 1, s1pos =     9056, s2pos =      905
t = 2, s1pos =    19050, s2pos =     1905
t = 3, s1pos =    29045, s2pos =     2904
t = 4, s1pos =    39040, s2pos =     3904
t = 5, s1pos =    49034, s2pos =     4903

s1=-10000, a1= 50000, v2= -1000, a2= 25000

t = 1, s1pos =    -9056, s2pos =     -905
t = 2, s1pos =   -19050, s2pos =    -1905
t = 3, s1pos =   -29045, s2pos =    -2904
t = 4, s1pos =   -39040, s2pos =    -3904
t = 5, s1pos =   -49034, s2pos =    -4903

s1=-10000, a1= 50000, v2=  1000, a2= 25000

t = 1, s1pos =    -9056, s2pos =      905
t = 2, s1pos =   -19050, s2pos =     1905
t = 3, s1pos =   -29045, s2pos =     2904
t = 4, s1pos =   -39040, s2pos =     3904
t = 5, s1pos =   -49034, s2pos =     4903

s1= 10000, a1= 50000, v2= -1000, a2= 25000

t = 1, s1pos =     9056, s2pos =     -905
t = 2, s1pos =    19050, s2pos =    -1905
t = 3, s1pos =    29045, s2pos =    -2904
t = 4, s1pos =    39040, s2pos =    -3904
t = 5, s1pos =    49034, s2pos =    -4903

Code:
#include "Arduino.h"
#include "teensystep4.h"

using namespace TS4;

Stepper s1(3, 2);
Stepper s2(6, 5);

// define a group of steppers that move in sync
StepperGroup g1{s2, s1};

void setup()
{
  Serial.begin( 9600 );
  while (!Serial && millis() < 3000) {}
  pinMode(LED_BUILTIN, OUTPUT);
  TS4::begin();
}

void doRotation( int32_t v1, int32_t a1, int32_t v2, int32_t a2 )
{
  // print function arguments
  Serial.printf( "s1=%6ld, a1=%6ld, v2=%6ld, a2=%6ld\n", v1, a1, v2, a2 );
 
  // set position of both steppers to 0
  s1.setPosition(0);
  s2.setPosition(0);
 
  // set max speed and accel for both steppers
  s1.setMaxSpeed(v1).setAcceleration(a1);
  s2.setMaxSpeed(v2).setAcceleration(a2);

  // start group, let it run, stop group
  int seconds = 0;
  elapsedMillis ms = 0;
  g1.startRotate();
  while (seconds < 5) {
    if (ms >= 1000) {
      seconds++;
      ms -= 1000;
      Serial.printf( "t = %1d, s1pos = %8ld, s2pos = %8ld\n",
        seconds, s1.getPosition(), s2.getPosition() );
    }
  }
  g1.stopAsync();
  delay(1500); 
}

void loop()
{ 
  doRotation(  10'000, 50'000,  1'000, 25'000 ); // s1 10x s2, both positive
  doRotation( -10'000, 50'000, -1'000, 25'000 ); // s1 10x s2, both negative
  doRotation( -10'000, 50'000,  1'000, 25'000 ); // s1 10x s2, s1 neg, s2 pos
  doRotation(  10'000, 50'000, -1'000, 25'000 ); // s1 10x s2, s1 pos, s2 neg

  digitalToggleFast(LED_BUILTIN);
  while(1) {};
}
 
I think that fixes all of the speed issues. I'm still having random crashes and reboots. They are generally happening when changing settings while the steppers are not running. Unfortunately, repeating the same set of actions doesn't trigger a crash.
 
Can you tell me what you mean by "changing settings" and provide a representative example?
 
"Not much feedback from the TeensyStep users, but here's another status update."
Apologies, my notifications for the thread stopped working, for some unknown reason. But @Elf is really the one doing all the work. ;)

"and sets the positions of each motor to 0 before starting"
This might be a concern, and I’ll let Ed chime in, but we have a lot of places where we have to retain the motor position. Ed may be able to save and rewrite the current position(s), or maybe the motors don’t have to be set to 0 once we know the library fixes are working?

Thank you Joe for the detailed explanation of how the libraries work. Even though the low-level timer stuff is above my pay grade, your description was readable.

—Jon
 
Hi Jon. Resetting the position to 0 was just to make it easier to see that all of the speeds were correct. The position was not changed by that command. It just redefines the current position as 0.
 