# analogWriteFrequency and analogWrite rounding

#### nox771

##### Well-known member
I've been working with the PWM functions on a Teensy3. I noticed that the analogWriteFrequency() and analogWrite() functions truncate rather than round fractional periods. This becomes more noticeable at high PWM frequencies.

For analogWriteFrequency(), since the period is truncated (ie. rounded down), the frequency always gets rounded up to the next quantized value. For instance if you do:
Code:
``analogWriteFrequency(20,1600000);``
then you get 1.6MHz PWM since 1.6MHz is an even divisor into the 48MHz bus clock.

But if you do this:
Code:
``analogWriteFrequency(20,1600001);``
then you get 1.655172MHz since the period (29.99998125 bus clocks) got truncated to 29 bus clocks.

A patch for this is to change the mod calculation in analogWriteFrequency() from:
Code:
``mod = ((F_BUS >> prescale) / frequency) - 1;``
to
Code:
``mod = (((F_BUS >> prescale)*10/frequency)+5)/10 - 1; // integer round``
which rounds up fractional clocks greater than or equal to 1/2 bus clocks. This effectively picks the closest quantized PWM freq given a specified freq.

Likewise for analogWrite() the duty cycle is truncated (rounded down regardless of fraction):
Code:
``````if (pin == 3 || pin == 4) {
cval = ((uint32_t)val * (uint32_t)(FTM1_MOD + 1)) >> analog_write_res;
} else {
cval = ((uint32_t)val * (uint32_t)(FTM0_MOD + 1)) >> analog_write_res;
}``````

Similar to above this can be fixed by adding 1/2 LSB prior to truncating (a bit easier here since the divisor is always a power of two, and the 'max' value is already defined in the preceding code):
Code:
``````if (pin == 3 || pin == 4) {
cval = ((uint32_t)val * (uint32_t)(FTM1_MOD + 1) + (max >> 1)) >> analog_write_res;
} else {
cval = ((uint32_t)val * (uint32_t)(FTM0_MOD + 1) + (max >> 1)) >> analog_write_res;
}``````

Is there any way Teensyduino can be patched like this? Currently I redefine the functions for my stuff, but it's a little bothersome since I have to keep a separate copy of analog_write_res (since it is a static global in pins_teensy.c I can't access it from my code).

Code:
``        mod = (((F_BUS >> prescale) + (frequency >> 1)) / frequency) - 1;``

Ah I should have seen that, that's clever, yes it should work.

Code:
``        mod = (((F_BUS >> prescale) + (frequency >> 1)) / frequency) - 1;``

Actually, I'm kind of wondering if there is a slight error.
Code:
``(frequency >> 1)/frequency``
is 0.5 unless frequency is an odd number, in which case it is 0.499... (it suffers the same truncation problem). I don't know if this is a practical problem for the calculation of mod, since the error is so small, but if you don't mind an extra addition then perhaps this:

Code:
``mod = (((F_BUS >> prescale) + ((frequency+1) >> 1)) / frequency) - 1;``

I had another thought - this method may not be good at very low frequencies (< 1kHz). I would need to script up a comparison of the 'mod' numbers that come out using the different methods.

Last edited:
Can you think of any specific frequencies where the results are not ok?

Code:
``        mod = (((F_BUS >> prescale) + (frequency >> 1)) / frequency) - 1;``

Your method here is fine. I was curious about it so I did a quick perl script to check the different algorithms:

Code:
``````#!/usr/bin/perl
\$FBUS = 48e6;
\$ptsPerDec = 10000; # points per decade
\$display = 0;       # set to 1 to dump all results

for(\$freq=10; \$freq <= 12e6; \$freq=\$freq*10**(1/\$ptsPerDec)) {
for(\$prescale = 0; \$prescale < 7; \$prescale++) {
\$minfreq = (\$FBUS >> 16) >> \$prescale;
last if(\$freq > \$minfreq);
}
\$modA = int((\$FBUS >> \$prescale)/\$freq) - 1;
if(\$modA > 65535) { \$modA = 65535; }
\$modB = int((int((\$FBUS >> \$prescale)*10/\$freq)+5)/10) - 1;
if(\$modB > 65535) { \$modB = 65535; }
\$modC = int(((\$FBUS >> \$prescale)+(\$freq >> 1))/\$freq) - 1;
if(\$modC > 65535) { \$modC = 65535; }
\$modD = int(((\$FBUS >> \$prescale)+((\$freq+1) >> 1))/\$freq) - 1;
if(\$modD > 65535) { \$modD = 65535; }

\$freqA = (\$FBUS >> \$prescale)/(\$modA+1);
\$freqB = (\$FBUS >> \$prescale)/(\$modB+1);
\$freqC = (\$FBUS >> \$prescale)/(\$modC+1);
\$freqD = (\$FBUS >> \$prescale)/(\$modD+1);

\$errA = (1-\$freqA/\$freq)*100;
\$errB = (1-\$freqB/\$freq)*100;
\$errC = (1-\$freqC/\$freq)*100;
\$errD = (1-\$freqD/\$freq)*100;

\$totErrA += abs(\$errA);
\$totErrB += abs(\$errB);
\$totErrC += abs(\$errC);
\$totErrD += abs(\$errD);

# only print freqs where error between methods B/C/D is different
# (method A is known bad so it is not checked)
if(\$display || \$errB != \$errC || \$errC != \$errD || \$errB != \$errD) {
printf("target freq:%8.2f  |  prescale:%d  |  ".
"modA:%6s freqA:%8.2f errA:%8.5f%  |  ".
"modB:%6s freqB:%8.2f errB:%8.5f%  |  ".
"modC:%6s freqC:%8.2f errC:%8.5f%  |  ".
"modD:%6s freqD:%8.2f errD:%8.5f%\n",
\$freq, \$prescale,
\$modA, \$freqA, \$errA,
\$modB, \$freqB, \$errB,
\$modC, \$freqC, \$errC,
\$modD, \$freqD, \$errD);
}
}
print("\ntotErrA: \$totErrA  totErrB: \$totErrB  totErrC: \$totErrC  totErrD: \$totErrD\n");``````

The original method is of course bad due to its truncation, but for all other methods they match almost all the time, and when they don't it is a difference of negligible amount. So I think the method you show above is fine.

Great. I'll put this into version 1.14.

I made the above calculations with the mentioned numbers and got the same rounding dependent results as described in this thread. In the current file teensy/avr/cores/teensy3/pins_teensy.c I found this line: mod = (float)(ftmClock >> prescale) / frequency - 0.5f;

I also tried this formula, but the described problem did not dissapear completely.

So I would like to offer an alternative approach with more accurate results:

Code:
``````double input_freq = 400.0;
double increment = 10.0;

uint32_t mck = F_BUS;
uint8_t prescaler = 1;                          // best accuracy with smallest divider(s)
uint8_t divisor;                                // smallest possible divisor will be calced
uint32_t period;                                // required value for period will be calced
double output_freq;
double mod;
uint32_t duration;

void setup() {
Serial.begin(9600);
while (!Serial);
Serial.println(" ");
Serial.print(" input freq");
Serial.print("               calced mod");
Serial.print("           calced in µs");
Serial.println("          output freq");
Serial.println(" ");
}

double calc_mod(double freq) {
divisor = 0;
uint32_t ticks = mck / (freq * 256);
if (ticks >= 262144) return 0;
else {
if (ticks >= 256) divisor = lmb12(ticks >> 7);
uint32_t mck_div = mck >> divisor;
period = (mck_div / (prescaler * freq)) + 0.5;
output_freq = (double) mck_div / (prescaler * period);
return period;
}
}

uint32_t lmb12(int x) {
// find index of left most bit (lmb) position equal 1 in x
// last bit has index 0
// require x >= 0 and x < 4096 (max 12 bits)
// x = 0 -> 0
uint32_t log2_tab[64] = {
0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
};
if (x >= 64) return log2_tab[x >> 6] + 6;
return log2_tab[x];
}

void loop() {
delay(400);
Serial.print("  ");
Serial.print(input_freq, 5);
Serial.print("                ");
duration = micros();
mod = calc_mod(input_freq);
duration = micros() - duration;
Serial.print(mod);
Serial.print("                 ");
Serial.print(duration);
Serial.print("                 ");
if (output_freq == input_freq) {
Serial.print(output_freq, 5);
Serial.println(" *     (exact match)");
}
else Serial.println(output_freq, 5);
input_freq += increment;
}``````
I developed this method on the M3 (Arduino DUE) and was able to compile and run it on my brand new teensy 3.6. I used this code to control stepper motors with acceleration ramps and would like to upgrade my projects with the teensy. Here you can see a video showing a Nema17 running at 15.000 RPM driven with this sketch!

Last edited:
Edit: I'm working with this code now.... still evaluating....

Last edited:
Ok, I've spend some time trying to compare against the existing algorithm. I can't find any cases where the results are different.

Can you please tell me which frequency "got the same rounding dependent results as described in this thread"?

Here's a slightly modified copy which computes both.

Code:
``````// https://forum.pjrc.com/threads/23448-analogWriteFrequency-and-analogWrite-rounding?p=148504&viewfull=1#post148504

double input_freq = 400.0;
double increment = 10.0;

uint32_t mck = F_BUS;
uint8_t prescaler = 1;                          // best accuracy with smallest divider(s)
uint8_t divisor;                                // smallest possible divisor will be calced
uint32_t period;                                // required value for period will be calced
double output_freq;
double err;
double mod;
uint32_t duration;

void setup() {
Serial.begin(9600);
while (!Serial);
Serial.println(" ");
Serial.print(" input freq");
Serial.print("               calced mod");
Serial.print("           calced in µs");
Serial.println("          output freq");
Serial.println(" ");
}

double calc_mod(double freq) {
divisor = 0;
uint32_t ticks = mck / (freq * 256);
if (ticks >= 262144) return 0;
else {
if (ticks >= 256) divisor = lmb12(ticks >> 7);
uint32_t mck_div = mck >> divisor;
period = (mck_div / (prescaler * freq)) + 0.5;
output_freq = (double) mck_div / (prescaler * period);
return period;
}
}

uint32_t lmb12(int x) {
// find index of left most bit (lmb) position equal 1 in x
// last bit has index 0
// require x >= 0 and x < 4096 (max 12 bits)
// x = 0 -> 0
uint32_t log2_tab[64] = {
0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
};
if (x >= 64) return log2_tab[x >> 6] + 6;
return log2_tab[x];
}

void loop() {
delay(40);
Serial.printf("%7.0f", input_freq);
Serial.print("       ");
duration = micros();
mod = calc_mod(input_freq);
duration = micros() - duration;
Serial.printf("%7.0f", mod - 1);
Serial.print("       ");
Serial.printf("%3d", duration);
Serial.print("       ");

Serial.print(output_freq, 4);
err = (output_freq - input_freq) / input_freq * 1e6;
Serial.printf(" (%7.2fppm)", err);

uint32_t ftmClock = F_BUS;
uint32_t prescale;
for (prescale = 0; prescale < 7; prescale++) {
float minfreq = (float)(ftmClock >> prescale) / 65536.0f;
if (input_freq >= minfreq) break;
}

uint16_t mod2 = (float)(ftmClock >> prescale) / ((float)input_freq) - 0.5f;
output_freq = (double)(ftmClock >> prescale) / (double)(mod2 + 1);

Serial.print("         ");
Serial.print(mod2);
Serial.print("     ");
Serial.print(output_freq, 4);
err = (output_freq - input_freq) / input_freq * 1e6;
Serial.printf(" (%7.2fppm)", err);

if ((mod - 1) != mod2) Serial.print(" DIFF");

Serial.println();

input_freq += increment;
}``````

Can you please tell me which frequency "got the same rounding dependent results as described in this thread"?

I am on tour, as soon as I am home I will run your code. Many thanks! To answer your question: I calculated the mentioned versions of the formula for 1600000 and 1600001 and as far as I remember either 1600000 was even and 1600001 too inaccurate (as stated by the TO) or 1600001 was closer and 1600000 did not result in an even frequency.

Another observation: I tried to run acceleration ramps on the teensy 3.6 with analogeWriteFrequency and found that if the ramp steps follow too fast the PWM output would not always update. Only with insertet delays I get a smooth ramp. Pseudocode:

Code:
``````void loop() {
analogWriteFrequency(pin, freq);
freq++
}``````

Works only if modified like this:

Code:
``````void loop() {
delayMicroseconds(x);
analogWriteFrequency(pin, freq);
freq += 20;
}``````

The divider(s) in my code are calculated without a for loop. I will mesaure if there are any speed differences between our codes and post the results. Thanks again and sorry if I made unnecessary troubles.

I just ran it again. The both compute the same results for 1600000 and 1600001.

Maybe it's a timing issue? Or maybe a hardware register update / double-buffer issue? Hard to say without code to reproduce the problem.

Another observation: I tried to run acceleration ramps on the teensy 3.6 with analogeWriteFrequency and found that if the ramp steps follow too fast the PWM output would not always update. Only with insertet delays I get a smooth ramp.
analogWriteFrequency() aborts the current PWM cycle. It disables and restarts the timer, so you get a PWM glitch.

The hardware has synchronization for the timer control registers. You can get glitch-free PWM output by programming the timer directly. However, that register synchronization doesn't include the timer clock source / pre-scaler.

I have one other open bug report about the PWM glitch when changing frequency. My understanding is the mod register can't be changed while it's running. Or maybe it can? Is there any way analogWriteFrequency could do better, for a nicer transition?

I have one other open bug report about the PWM glitch when changing frequency. My understanding is the mod register can't be changed while it's running. Or maybe it can?
For FTM in TPM mode, the MOD register write is buffered and the synchronization / update happens when the counter overflows (assuming you leave the timer running).

E.g. K20 manual, '35.4.10.2 MOD register update' or AN5261, '4.2. MOD register update (FTMEN = 0)'.

However, you would probably want to update both CnV and MOD at the same time and you would probably want to use FTM mode for that (in TPM mode you would need to be sure that a timer overflow doesn't happen). Since a pre-scaler change won't be synchronized, I'm not sure a change is worth it with the current Arduino API.

For FTM in TPM mode, the MOD register write is buffered and the synchronization / update happens when the counter overflows (assuming you leave the timer running).

Can I feed a counter which counts the MOD counter overflow? It would be great to know the absolute number of PWM pulses sent at any time during PWM stream. With that I could use PWM for accurate positioning of my steppers too. Pseudocode of what I intend to do:

Code:
``````pwm_move_to_position(desired_number_of_pulses) {
pwm_ramp(up);
while (read_mod_overflow_counter() < 0.5 * desired_number_of_pulses) do_nothing;
pwm_ramp(down);
}``````

Can I feed a counter which counts the MOD counter overflow? It would be great to know the absolute number of PWM pulses sent at any time during PWM stream.

You can abuse the DMA controller to get a 15-bit counter (you can poll it and use the delta to get an extended counter). The DMA trigger occurs on the falling PWM edge. If you want it at the rising edge, you could use a second timer channel, e.g. FTM0 channel 1 and set FTM0_C1V to 0 for the DMA trigger.

Code:
``````#include <DMAChannel.h>

auto& serial = Serial;

uint8_t out_pin = 22; // FTM0_CH0 (ALT4)

DMAChannel dma;
uint8_t dma_dummy_src = 0;
uint8_t dma_dummy_dest = 0;

uint32_t getPulseCount() {
return 0x7fffu - dma.TCD->CITER;
}

void setup() {
serial.begin(115200);
delay(2000);
serial.println("Starting...");

dma.disable();
dma.source(dma_dummy_src);
dma.destination(dma_dummy_dest);
dma.transferCount(0x7fffu); // maximum supported value is 15 counter bits
dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM0_CH0);

analogWriteFrequency(out_pin, 1000);

// enable DMA trigger for FTM0 channel 0
FTM0_C0SC |= FTM_CSC_CHIE | FTM_CSC_DMA;

analogWrite(out_pin, 50);

elapsedMillis report_timer;
dma.enable();

while(true) {
if(report_timer > 1000) {
report_timer = 0;
uint32_t pulse_count = getPulseCount();
Serial.printf("Pulse count: %u\n", pulse_count);
}
}
}

void loop() {}``````

Of course, if you use analogWriteFrequency(), you have the potential PWM signal glitch and the counter may or may not have been updated (and the pulse may or may not have been long enough for the stepper to recognize).

I just ran it again. The both compute the same results for 1600000 and 1600001.

Maybe it's a timing issue? Or maybe a hardware register update / double-buffer issue? Hard to say without code to reproduce the problem.

OK I just came home and made new calculations based on your latest code: now I could not find differences any more. But I found the reason: in my old calculations I overlooked the "+1" thing in line

Code:
``output_freq = (double)(ftmClock >> prescale) / (double)(mod2 + 1);``
Also I made speed tests: on the DUE the prescaler derived with a for-loop takes about twice as long as with the lmb12()-version; on the teensy 3.6 everything runs much faster plus the lmb12()-version was slower than your for-loop.

You can abuse the DMA controller to get a 15-bit counter (you can poll it and use the delta to get an extended counter). The DMA trigger occurs on the falling PWM edge. If you want it at the rising edge, you could use a second timer channel, e.g. FTM0 channel 1 and set FTM0_C1V to 0 for the DMA trigger.
Many thanks for the code!

Of course, if you use analogWriteFrequency(), you have the potential PWM signal glitch and the counter may or may not have been updated (and the pulse may or may not have been long enough for the stepper to recognize).

It would be extremely helpful for me to have glitch free transitions in my ramps, but my coding skills are limited. I managed to run PWM acceleration ramps on the teensy 3.6 with Pauls above code. The ramp calculations are built with kind help of Mr. Walter Bislin. He also made a diagram explaining it:

Code:
``````#include <math.h>
#include <DMAChannel.h>

// input values:
uint16_t settle = 100;                          // delay time to let the hardware settle
uint8_t pwm_pin = 22; // FTM0_CH0 (ALT4)        // PWM output pin
uint8_t res = 12;                               // PWM resolution
const byte duty_percent = 50;                   // 50% duty cycle
double f_min = 500.0;                           // start frequency in steps/s
double delta_t_ms = 1.0;                        // ramp stage duration in ms
double a_f_max_ms = 500.0;                      // max possible frequency increment per ms in steps/ms
double steps_R = 200.0;                         // steps per motor revolution
double microsteps = 16.0;                       // microstepping in microsteps per step

// timing belt or rack & pinion:
double diameter_mm = 40.0;                      // diameter of pinion in mm
double ratio = 1.0;                             // gear ratio (> 1 = slower speed)

double threadpitch_mm = 8;                      // thread pitch in mm: set to zero if pinion driven

// desired:
double s_wanted_cm = 8.0;                       // desired travel in cm
double v_wanted_cms = 20.0;                     // desired speed in cm/s
double s_c_min = 0.001;                         // minimum travel with constant speed in m

double delta_t;                                 // ramp stage duration (derived from delta_t_ms) in s
uint32_t delta_t_us;                            // ramp stage duration (derived from delta_t_ms) in µs
double delta_f;                                 // frequency increment per ramp stage
double a_f_max;                                 // max possible frequency increment per ms (derived from a_f_max_ms) in steps/ms
double s_wanted;                                // desired travel (derived from s_wanted_cm) in m
double v_wanted;                                // desired speed (derived from v_wanted_cms) in m/s
double a_f;                                     // actual acceleration in steps/s^2
double t_a;                                     // duration of acceleration in s
double t_c;                                     // duration of travel with constant speed v_wanted in s
uint32_t t_c_ms;                                // duration of travel with constant speed v_wanted in ms
uint32_t t_c_us;                                // duration of travel with constant speed v_wanted in µs
double N;                                       // total number of ramp stages
double n;                                       // actual ramp stage
double u;                                       // conversion factor frequency to speed v/f in m/steps
double f;                                       // actual frequency in steps/s
double f_wanted;                                // desired frequency in steps/s
uint32_t t_n_us;                                // time stamp of ramp stage in µs
double t_delay_us;                              // ramp stage duration minus computing time in µs

double t_total;                                 // total time = 2 * t_a + t_c
double s_c;                                     // travel with constant speed in m
double a;                                       // acceleration in m/s^2
double v;                                       // achievable max speed in m/s
double s_a;                                     // travel of ramp in m

uint32_t ftmClock = F_BUS;                      // PWM source frequency
uint8_t prescale;                               // smallest possible divider will be calced
uint16_t duty;                                  // duty cicle value

DMAChannel dma;
uint8_t dma_dummy_src = 0;
uint8_t dma_dummy_dest = 0;

void setup() {
Serial.begin(115200);

pinMode(pwm_pin, OUTPUT);
analogWriteResolution(res);
init();

delay(settle);

dma.disable();
dma.source(dma_dummy_src);
dma.destination(dma_dummy_dest);
dma.transferCount(0x7fffu); // maximum supported value is 15 counter bits
dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM0_CH0);

analogWrite(pwm_pin, 0);

// enable DMA trigger for FTM0 channel 0
FTM0_C0SC |= FTM_CSC_CHIE | FTM_CSC_DMA;
dma.enable();

start();

uint32_t pulse_count = 0x7fffu - dma.TCD->CITER;
Serial.printf("Pulse count: %u\n", pulse_count);
}

void loop() {
setup();
delay(400);
}

double calc_freq_dds(double freq) {
return (double) 125000000 * uint32_t(freq * 4294967296 / 125000000 + 0.5) / 4294967296; // 125Mhz = DDS clock frequency of AD9850
}

double calc_freq(double freq) {
uint16_t mod = calc_mod(freq);
return (double)(ftmClock >> prescale) / (double)(mod + 1);
}

uint16_t calc_mod(double freq) {
for (prescale = 0; prescale < 7; prescale++) {
float minfreq = (float)(ftmClock >> prescale) / 65536.0f;
if (freq >= minfreq) break;
}
return (float)(ftmClock >> prescale) / ((float)freq) - 0.5f;
}

double simul_ramp(double f_min, double a_f, double delta_t, uint32_t N) {
// simulation of ramp in order to obtain accurate travel

double delta_f = a_f * delta_t;
double f = f_min;
double s_f = 0;                                                       // travel / u, where u = v / f
uint32_t n = 0;
while (n < N) {
s_f += calc_freq(f) * delta_t;
f += delta_f;
n += 1;
}
return (double) u * s_f;
}

void init() {
delta_t = delta_t_ms / 1000;                                          // time increment in s
a_f_max = a_f_max_ms * 1000;                                          // frequency increment in steps/s^2
s_wanted = s_wanted_cm / 100;                                         // desired travel in m
v_wanted = v_wanted_cms / 100;                                        // desired speed in m/s

u = (PI * (diameter_mm / 1000) / ratio) / (steps_R * microsteps);   // in m/steps
}
else {
u = threadpitch_mm / 1000 / (steps_R * microsteps);                 // in m/steps
}

// calculations with the presumption that speed is the limiting factor
f_wanted = v_wanted / u;                                              // in steps/s

// corrections for frequencys
f_wanted = calc_freq(f_wanted);                                       // use calc_freq_dds() if DDS chips generate the frequencys
f_min = calc_freq(f_min);                                             // use calc_freq_dds() if DDS chips generate the frequencys

a_f = a_f_max;                                                        // in steps/s^2
t_a = (f_wanted - f_min) / a_f;                                       // in s
N = floor(t_a / delta_t) + 1;

// correction of a_f in order to obtain equal ramp stages of f for N ramp stages of delta_t
a_f = (f_wanted - f_min) / (N * delta_t);

// re-calculation with new a_f
t_a = N * delta_t;

// predict minimum needed travel to reach v_wanted
// s_a = u * a_f * ((N - 1) / (2 * N)) * (t_a * t_a) + u * f_min * t_a;    // in m     // formula if no simulation is needed
s_a = simul_ramp(f_min, a_f, delta_t, N);

// in case s_wanted is the limiting factor
if (2 * s_a + s_c_min > s_wanted) {

// calculations with the presumption that travel is the limiting factor
// s_wanted/2 is limiting the duration of the ramp
a_f = a_f_max;
s_a = s_wanted / 2 - s_c_min;
t_a = sqrt((2 * s_a / (u * a_f)) + ((f_min / a_f) * (f_min / a_f))) - (f_min / a_f);

// recalculate max possible speed
f_wanted = calc_freq(a_f * t_a);                                                     // use calc_freq_dds() if DDS chips generate the frequencys
// f_wanted = a_f * t_a;                                                             // formula if no simulation is needed

// re-calculate number of ramp stages
N = floor(t_a / delta_t) + 1;

// correction of a_f in order to obtain equal ramp stages of f for N ramp stages of delta_t
a_f = (f_wanted - f_min) / (N * delta_t);

// re-calculation with new a_f
t_a = N * delta_t;

// re-predict travel
s_a = simul_ramp(f_min, a_f, delta_t, N);                                            // delete this line if no simulation is needed
}

s_c = s_wanted - 2 * s_a;                                                              // travel with constant speed in m
v = u * f_wanted;                                                                      // reached speed (v <= v_wanted) in m/s
a = u * a_f;                                                                           // acceleration in m/s^2
t_c = s_c / v;                                                                         // duration of travel with constant speed in s
t_total = 2 * t_a + t_c;                                                               // total duration in s

// preparations:
t_c_ms = floor(t_c * 1000);
t_c_us = t_c * 1000000;
delta_f = a_f * delta_t;
duty = (duty_percent << res) * 0.01;
}

void start() {
// ramp-up
uint32_t t_c_start = 0;
uint32_t t_c_duration = 0;
delta_t_us = delta_t_ms * 1000;
f = f_min;
n = 0;
t_n_us = micros();
while (n < N) {
n += 1;
analogWriteFrequency(pwm_pin, f);
analogWrite(pwm_pin, duty);
f += delta_f;
// calculate next time stamp:
t_n_us += delta_t_us;
// needed waiting time = next time stamp minus last time stamp:
t_delay_us = t_n_us - micros();
delayMicroseconds(t_delay_us);
}

// travel with constant speed
analogWriteFrequency(pwm_pin, f_wanted);
analogWrite(pwm_pin, duty);
t_c_start = micros();
if (t_c_ms > 0) {
delay(t_c_ms - 1);
}
t_c_duration = micros() - t_c_start;

while (t_c_duration < t_c_us) {                                                        // in case delay() is not accurate compensate with millis()
t_c_duration = micros() - t_c_start;
}

// ramp-down
f = f_wanted - delta_f;
n = 0;
t_n_us = micros();
while (n < N) {
n += 1;
analogWriteFrequency(pwm_pin, f);
analogWrite(pwm_pin, duty);
f -= delta_f;
// calculate next time stamp:
t_n_us += delta_t_us;
// needed waiting time = next time stamp minus last time stamp:
t_delay_us = t_n_us - micros();
delayMicroseconds(t_delay_us);
}
analogWrite(pwm_pin, 0);
}``````

The sketch automatically compensates for the inaccuracies of the generated frequencies. It's also possible to use it to drive DDS chips such as the AD9850 which would result in something similar to the UKCNC Pulse Train Hat (there is a nice online calculator for the exact output frequencies of the AD9850 and other DDS chips).

I tested the sketch on my stepper driven linear rail to check for inconsistencies: I can mesaure tiny variations in repeats and also small inaccuracies in the 1/100mm range with the values in above code. I wanted to change the values in line 10 and 11 to something more like delta_t_ms = 0.1 and a_f_max_ms = 50.0, but that did not work as the PWM frequency did not update fast enough.

If glitch-free PWM transitions are possible it might be worth to make a single step accurate ramp library with user selectable choice of internal PWM and/or any number of external DDS boards.

EDIT1: I included the counting code of tni in #18: with s_wanted_cm = 8.0 I should see 32.000 steps, but the counted number varies from that which explains the mesaured inaccuracies. Is there a way to make it accurate at least in regard to repeatability?

EDIT2: analogWrite(pwm_pin, 0) at the end of my code seems to not stop PWM completely, that could also be a reason why I get more steps than I should...

Last edited:
EDIT: the problem I have seems related to a more basic problem, please see below #22

I made some tests with the above DMA counter and also with the FreqCount library. I improved my ramp calculations to allow singe pulse accurate positioning of my steppers. But if I let the teensy repeat the code I get only the same result from the second time on: the first time I always get 2 more pulses. Did I overlook something to do with initialization of PWM and/or DMA? Here is the code:
Code:
``````#include <math.h>
#include <DMAChannel.h>

// input values:
uint16_t settle = 100;                          // delay time to let the hardware settle
uint8_t pwm_pin = 22; // FTM0_CH0 (ALT4)        // PWM output pin
uint8_t res = 12;                               // PWM resolution
const byte duty_percent = 50;                   // 50% duty cycle
double f_min = 5000.0;                          // start frequency in steps/s
double delta_t_ms = 0.2;                        // ramp stage duration in ms
double a_f_max_ms = 500.0;                      // max possible frequency increment per ms in steps/ms
double steps_R = 200.0;                         // steps per motor revolution
double microsteps = 16.0;                       // microstepping in microsteps per step

// timing belt or rack & pinion:
double diameter_mm = 40.0;                      // diameter of pinion in mm
double ratio = 1.0;                             // gear ratio (> 1 = slower speed)

double threadpitch_mm = 8;                      // thread pitch in mm: set to zero if pinion driven

// desired:
double s_wanted_cm = 8.0;                       // desired travel in cm
double v_wanted_cms = 15.0;                     // desired speed in cm/s
double s_c_min = 0.001;                         // minimum travel with constant speed in m

double delta_t;                                 // ramp stage duration (derived from delta_t_ms) in s
double delta_t_us;                              // ramp stage duration (derived from delta_t_ms) in µs
double delta_f;                                 // frequency increment per ramp stage
double a_f_max;                                 // max possible frequency increment per ms (derived from a_f_max_ms) in steps/ms
double s_wanted;                                // desired travel (derived from s_wanted_cm) in m
double v_wanted;                                // desired speed (derived from v_wanted_cms) in m/s
double a_f;                                     // actual acceleration in steps/s^2
double t_a;                                     // duration of acceleration in s
double t_c;                                     // duration of travel with constant speed v_wanted in s
uint32_t t_c_ms;                                // duration of travel with constant speed v_wanted in ms
uint32_t t_c_us;                                // duration of travel with constant speed v_wanted in µs
double N;                                       // total number of ramp stages
double n;                                       // actual ramp stage
double u;                                       // conversion factor frequency to speed v/f in m/steps
double f;                                       // actual frequency in steps/s
double f_wanted;                                // desired frequency in steps/s
double t_n_us;                                  // time stamp of ramp stage in µs
double t_delay_us;                              // ramp stage duration minus computing time in µs

double t_total;                                 // total time = 2 * t_a + t_c
double s_c;                                     // travel with constant speed in m
double a;                                       // acceleration in m/s^2
double v;                                       // achievable max speed in m/s
double s_a;                                     // travel of ramp in m
double s_a_pulses;                              // travel of ramp in pulses

uint32_t ftmClock = F_BUS;                      // PWM source frequency
double timer_res = 0.000001;                    // Aufloesung des Timers in s
uint8_t prescale;                               // smallest possible divider will be calced
uint16_t duty;                                  // duty cicle value

DMAChannel dma;
uint8_t dma_dummy_src = 0;
uint8_t dma_dummy_dest = 0;

void setup() {
Serial.begin(115200);
pinMode(pwm_pin, OUTPUT);
analogWriteResolution(res);

init();

dma.disable();
dma.source(dma_dummy_src);
dma.destination(dma_dummy_dest);
dma.transferCount(0x7fffu); // maximum supported value is 15 counter bits
dma.triggerAtHardwareEvent(DMAMUX_SOURCE_FTM0_CH0);

// enable DMA trigger for FTM0 channel 0
FTM0_C0SC |= FTM_CSC_CHIE | FTM_CSC_DMA;
dma.enable();

start();

uint32_t pulses = 0x7fffu - dma.TCD->CITER;
Serial.println(pulses);
}

void loop() {
setup();
delay(300);
}

double calc_freq_dds(double freq) {
return (double) 125000000 * uint32_t(freq * 4294967296 / 125000000 + 0.5) / 4294967296; // 125Mhz = DDS clock frequency of AD9850
}

double calc_freq(double freq) {
uint16_t mod = calc_mod(freq);
return (double)(ftmClock >> prescale) / (double)(mod + 1);
}

uint16_t calc_mod(double freq) {
for (prescale = 0; prescale < 7; prescale++) {
float minfreq = (float)(ftmClock >> prescale) / 65536.0f;
if (freq >= minfreq) break;
}
return (float)(ftmClock >> prescale) / ((float)freq) - 0.5f;
}

double simul_ramp(double f_min, double a_f, double delta_t, uint32_t N) {
// simulation of ramp in order to obtain accurate travel

double delta_f = a_f * delta_t;
double f = f_min;
double s_f = 0;                                                       // travel / u, where u = v / f
uint32_t n = 0;
while (n < N) {
s_f += floor(floor(delta_t / timer_res) * timer_res * calc_freq(f));
// s_f += calc_freq(f) * delta_t;                                   // old version of formula
f += delta_f;
n += 1;
}
return (double) u * s_f;
}

void init() {
delta_t = delta_t_ms / 1000;                                          // time increment in s
a_f_max = a_f_max_ms * 1000;                                          // frequency increment in steps/s^2
s_wanted = s_wanted_cm / 100;                                         // desired travel in m
v_wanted = v_wanted_cms / 100;                                        // desired speed in m/s

u = (PI * (diameter_mm / 1000) / ratio) / (steps_R * microsteps);   // in m/steps
}
else {
u = threadpitch_mm / 1000 / (steps_R * microsteps);                 // in m/steps
}

// calculations with the presumption that speed is the limiting factor
f_wanted = v_wanted / u;                                              // in steps/s

// corrections for frequencys
f_wanted = calc_freq(f_wanted);                                       // use calc_freq_dds() if DDS chips generate the frequencys
f_min = calc_freq(f_min);                                             // use calc_freq_dds() if DDS chips generate the frequencys

a_f = a_f_max;                                                        // in steps/s^2
t_a = (f_wanted - f_min) / a_f;                                       // in s
N = floor(t_a / delta_t) + 1;

// correction of a_f in order to obtain equal ramp stages of f for N ramp stages of delta_t
a_f = (f_wanted - f_min) / (N * delta_t);

// re-calculation with new a_f
t_a = N * delta_t;

// predict minimum needed travel to reach v_wanted
// s_a = u * a_f * ((N - 1) / (2 * N)) * (t_a * t_a) + u * f_min * t_a;    // in m     // formula if no simulation is needed
s_a = simul_ramp(f_min, a_f, delta_t, N);

// in case s_wanted is the limiting factor
if (2 * s_a + s_c_min > s_wanted) {

// calculations with the presumption that travel is the limiting factor
// s_wanted/2 is limiting the duration of the ramp
a_f = a_f_max;
s_a = s_wanted / 2 - s_c_min;
t_a = sqrt((2 * s_a / (u * a_f)) + ((f_min / a_f) * (f_min / a_f))) - (f_min / a_f);

// recalculate max possible speed
f_wanted = calc_freq(a_f * t_a);                                                     // use calc_freq_dds() if DDS chips generate the frequencys
// f_wanted = a_f * t_a;                                                             // formula if no simulation is needed

// re-calculate number of ramp stages
N = floor(t_a / delta_t) + 1;

// correction of a_f in order to obtain equal ramp stages of f for N ramp stages of delta_t
a_f = (f_wanted - f_min) / (N * delta_t);

// re-calculation with new a_f
t_a = N * delta_t;

// re-predict travel
s_a = simul_ramp(f_min, a_f, delta_t, N);                                            // delete this line if no simulation is needed
}

s_c = s_wanted - 2 * s_a;                                                              // travel with constant speed in m
v = u * f_wanted;                                                                      // reached speed (v <= v_wanted) in m/s
a = u * a_f;                                                                           // acceleration in m/s^2
t_c = s_c / v;                                                                         // duration of travel with constant speed in s
t_total = 2 * t_a + t_c;                                                               // total duration in s

// preparations:
delta_f = a_f * delta_t;
duty = (duty_percent << res) * 0.01;
s_a_pulses = s_a / u;
}

void start() {
// ramp-up
uint32_t t_c_start = 0;
uint32_t t_c_duration = 0;
delta_t_us = delta_t_ms * 1000;
f = f_min;
n = 0;
t_n_us = micros();
while (n < N) {
analogWriteFrequency(pwm_pin, f);
analogWrite(pwm_pin, duty);
n += 1;
// calculate next time stamp:
t_n_us += (ceil((floor((delta_t_us / (1 / f * 1000000))) * (1 / f * 1000000))));
// t_n_us += delta_t_us;
f += delta_f;
// needed waiting time = next time stamp minus last time stamp:
t_delay_us = t_n_us - micros();
delayMicroseconds(t_delay_us);
}
double s_a_pulses_counted = 0x7fffu - dma.TCD->CITER;

// travel with constant speed
t_c_start = micros();
analogWriteFrequency(pwm_pin, f_wanted);
analogWrite(pwm_pin, duty);

t_c = t_c - (((s_a_pulses_counted - s_a_pulses) * 2) / f_wanted);
t_c_ms = floor(t_c * 1000);
t_c_us = t_c * 1000000;

f = f_wanted - delta_f;
n = 0;
if (t_c_ms > 0) {
delay(t_c_ms - 1);
}
t_c_duration = micros() - t_c_start;
delayMicroseconds(t_c_us - t_c_duration);

// ramp-down
t_n_us = micros();
while (n < N) {
analogWriteFrequency(pwm_pin, f);
analogWrite(pwm_pin, duty);
n += 1;
// calculate next time stamp:
t_n_us += (ceil((floor((delta_t_us / (1 / f * 1000000))) * (1 / f * 1000000))));
// t_n_us += delta_t_us;
f -= delta_f;
// needed waiting time = next time stamp minus last time stamp:
t_delay_us = t_n_us - micros();
delayMicroseconds(t_delay_us);
}
analogWrite(pwm_pin, 0);
}``````

Last edited:
I made some basic tests with analogWriteFrequency() and found unexpected output at the LED connected (with resistor) to pin 22 of my teensy 3.6, please verify the following 4 sketches:

Code:
``````void setup() {
pinMode(22, OUTPUT);
analogWriteResolution(12);

analogWriteFrequency(22, 4);    //  1st time:    blink 4x    ->    BROKEN
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);

delay(2000);

analogWriteFrequency(22, 4);    //  2nd time:    blink 4x    ->    OK
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);
}

void loop() {}``````

Code:
``````void setup() {
pinMode(22, OUTPUT);
analogWriteResolution(12);

analogWriteFrequency(22, 4);    //  1st time:    blink 4x    ->    BROKEN
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);

delay(2000);

analogWriteFrequency(22, 2);    //  2nd time:    blink 2x    ->    OK
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);
}

void loop() {}``````

Code:
``````void setup() {
pinMode(22, OUTPUT);
analogWriteResolution(12);

analogWriteFrequency(22, 2);    //  1st time:    blink 2x    ->    BROKEN
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);

delay(2000);

analogWriteFrequency(22, 4);    //  2nd time:    blink 4x    ->    BROKEN
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);
}

void loop() {}``````

Code:
``````void setup() {
pinMode(22, OUTPUT);
analogWriteResolution(12);

analogWriteFrequency(22, 4);    //  1st time:    blink 4x    ->    OK
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);

//  delay(2000);

analogWriteFrequency(22, 2);    //  2nd time:    blink 2x    ->    BROKEN
analogWrite(22, 2000);
delay(1000);
analogWrite(22, 0);
}

void loop() {}``````

Your 'analogWrite()' updates 'FTM0_C0V'. This register is double-buffered and synchronized at the end of the PWM cycle / timer counter overflow. The initial 'FTM0_C0V' value is zero, so you miss the first PWM cycle.

'analogWrite(22, 0)' doesn't change 'FTM0_C0V'. It changes the pin mode instead, forcing the pin to low.

Your 'analogWrite()' updates 'FTM0_C0V'. This register is double-buffered and synchronized at the end of the PWM cycle / timer counter overflow. The initial 'FTM0_C0V' value is zero, so you miss the first PWM cycle. 'analogWrite(22, 0)' doesn't change 'FTM0_C0V'. It changes the pin mode instead, forcing the pin to low.

Thanks for the hints, I got it working now:

Code:
``````uint8_t LED_PIN = 22;

double duty;
double f_max = 3;
uint8_t res = 12;
uint8_t duty_percent = 50;

void setup() {
pinMode(LED_PIN, OUTPUT);
analogWriteResolution(res);
duty = (duty_percent << res) / f_wanted * 0.01;
}

void loop() {
while (1);
}

analogWriteFrequency(LED_PIN, 1);
analogWrite(LED_PIN, duty);
if (FTM0_C0V == 0) delay(1000);              // add 1 cycle if first time
delay(4000);

analogWriteFrequency(LED_PIN, 2);
delay(4000);

analogWriteFrequency(LED_PIN, f_max);
delay(4000);

analogWriteFrequency(LED_PIN, 2);
delay(4000);

analogWriteFrequency(LED_PIN, 1);
delay(3500);                                // cut 0.5 cycle if last time
pinMode(LED_PIN, OUTPUT);
digitalWrite(LED_PIN, LOW);
}``````

BTW how can I configure your DMA counter to count pulses from an input pin?

Last edited:
The pins are grouped in 5 ports, port A - port E. Each port can trigger DMA transfers. The schematic has the pin / port mapping; e.g. pin 22 is PTC1 (pin 1 of port C).

Code:
``````    volatile uint32_t *pin_config = portConfigRegister(pin_nr);
*pin_config |= PORT_PCR_IRQC(0b0010); // DMA on falling edge; use 0b0001 for rising edge
...
dma.triggerAtHardwareEvent(DMAMUX_SOURCE_PORTC);``````

'portConfigRegister()' is the pin control register PORTx_PCRn.

Look at the manual, '12.5.1 Pin Control Register n (PORTx_PCRn)' and '23.1.1 DMA MUX request sources'.

\\

You could also use a FTM timer in input capture mode as DMA trigger, but setup is more complex, e.g:
https://github.com/tni/teensy-samples/blob/master/input_capture_dma.ino#L83