Using teensy 4.1 to generate square wave at precise frequency

jaze

I’m planning on using a Teensy 4.1 to generate the clock for a high-voltage shift register. The shift register drives a piezo element at resonance, so the signal needs a precise frequency. I’d like the 4.1 to generate a 32 MHz square wave with about 100 kHz of precision. Is this possible?
 
How to generate a 32MHz square wave on pin 1:
analogWriteFrequency(1, 32000000); // 32 MHz
analogWrite(1, 128); // 128 / 256 = 50% duty cycle
 
Thanks for the reply. I’ll explain my reasoning so you can show me what I’m missing. Let’s say we can get 32 MHz; the period for that is 1/32 µs. The CPU clock on the 4.1 is 600 MHz, so the CPU clock period is 1/600 µs. Now say I add two clock cycles, one for high and one for low, to the 32 MHz square wave. The frequency of the resulting signal would be (1/32 + 2/600)^-1 MHz, which works out to about 28.9 MHz. So the step size going down in this case is about 3.1 MHz, which is not precise enough for my application. I’m not very familiar with embedded systems, so I’m assuming my logic is off. Can you explain or point me in the right direction?
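Written out, the arithmetic in this question is below (frequencies in MHz, periods in µs). One refinement: 600/32 = 18.75 is not a whole number of cycles, so a counter running from a 600MHz clock could not land on 32MHz at all; its nearest whole-cycle periods are 19 and 18 cycles. The later replies point out that the PWM counter actually runs from the bus clock rather than the CPU clock, but the arithmetic has the same shape.

$$
f' = \left(\frac{1}{32} + \frac{2}{600}\right)^{-1} = \left(0.03125 + 0.00333\right)^{-1} \approx 28.92\ \text{MHz}, \qquad \Delta f = 32 - f' \approx 3.08\ \text{MHz}
$$

$$
\frac{600}{19} \approx 31.58\ \text{MHz}, \qquad \frac{600}{18} \approx 33.33\ \text{MHz}
$$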
 
Will that actually generate a 32MHz signal? I thought all the PWM frequencies had to be 150MHz divided by an integer.
It's getting late and there are rumbles of thunder here, so I'm going to shut down the computers. I'll try some test code tomorrow.
 
Probably not accurately. The basic Teensyduino code is pretty rigid and based around powers-of-2 resolution. A 528MHz CPU clock would run the bus clock at 132MHz, which could be divided by 4 to give 33MHz...

A while ago I did discover FlexPWM supports fractional delay logic as a workaround to avoid modifying the bus clock. But even without that it's possible to get pretty close to an arbitrary frequency if it's acceptable to deviate slightly from a 50% duty cycle.
 
Another possibility is to use a programmable clock generator chip. Or a DDS tone generator like the AD9850? That's going to have more than enough precision and low jitter too.
 
Also, it might be worth investigating the internal PLLs. Certainly the one used for I²S / TDM is capable of generating 24.576MHz (i.e. 256x 96kHz), and I'd expect it'd go up to 32MHz. You'd need to check the Reference Manual...
 
It turns out that you can generate a 32MHz output with accuracy as good as the clock on the Teensy. The key to getting exactly 32MHz is to set F_CPU to a frequency that, when divided by 4, is an integer multiple of 32MHz. I picked 768MHz (768 / 4 = 192MHz = 6 x 32MHz). 512MHz looks like it should work too, but the ARM PLL only runs in 12MHz steps, so set_arm_clock rounds a 512MHz request to 510MHz.

To set the F_CPU to a frequency which is not one of the standard menu frequencies, I added a clone of the set_arm_clock function that is in clockspeed.c.

WARNING! I'm not sure what effect setting the clock after startup will have on other peripherals. If the peripherals behave nicely and pay attention to F_CPU_ACTUAL and F_BUS_ACTUAL, you'll probably be OK. At least it works for USB output, but that has its own clock generator.

Here's the simple demo code that generates 31.99993MHz according to the frequency counter in my Siglent oscilloscope:

Code:
// 32MHZ clock generator   MJB  9/2/2024
const int fpin = 1;

void setup() {
  // Set the ARM clock to 768MHz, which happens to be 24 * 32MHz;
  // the bus clock is then 768MHz / 4 = 192MHz, an integer multiple (6x) of 32MHz.

  Xset_arm_clock(768000000);  // Call a duplicate of set_arm_clock to avoid mods to cores
  delay(1);  // Does the clock need some settling time??
  Serial.begin(9600);
  delay(2000); // wait a bit then see if USB is working

  Serial.printf("F_CPU_ACTUAL:  %lu   F_BUS_ACTUAL:  %lu\n", F_CPU_ACTUAL, F_BUS_ACTUAL);
  analogWriteFrequency(fpin, 32000000); // 32 MHz
  analogWriteRes(3); // 192MHz / 32MHz = 6 bus clocks per period, so only about 3 bits of duty resolution are usable
  analogWrite(fpin, 4); // 4/8 for 50% duty cycle
}

void loop() {
  // put your main code here, to run repeatedly:
}

Here's the clone of set_arm_clock.


Code:
//  A clone of set_arm_clock() from clockspeed.c in the Teensy 4 core
#include <stdint.h>
#include "imxrt.h"
#include "wiring.h"
#include "debug/printf.h"

// A brief explanation of F_CPU_ACTUAL vs F_CPU
//  https://forum.pjrc.com/threads/57236?p=212642&viewfull=1#post212642
//volatile uint32_t F_CPU_ACTUAL = 396000000;  already defined in startup code
//volatile uint32_t F_BUS_ACTUAL = 132000000;

// Define these to increase the voltage when attempting overclocking
// The frequency step is how quickly to increase voltage per frequency
// The datasheet says 1600 is the absolute maximum voltage.  The hardware
// can actually create up to 1575.  But 1300 is the recommended limit.
//  (earlier versions of the datasheet said 1300 was the absolute max)
#define OVERCLOCK_STEPSIZE  28000000
#define OVERCLOCK_MAX_VOLT  1575


// stuff needing wait handshake:
//  CCM_CACRR  ARM_PODF
//  CCM_CBCDR  PERIPH_CLK_SEL
//  CCM_CBCMR  PERIPH2_CLK_SEL
//  CCM_CBCDR  AHB_PODF
//  CCM_CBCDR  SEMC_PODF

uint32_t Xset_arm_clock(uint32_t frequency)
{
    uint32_t cbcdr = CCM_CBCDR; // pg 1021
    uint32_t cbcmr = CCM_CBCMR; // pg 1023
    uint32_t dcdc = DCDC_REG3;

    // compute required voltage
    uint32_t voltage = 1150; // default = 1.15V
    if (frequency > 528000000) {
        voltage = 1250; // 1.25V
#if defined(OVERCLOCK_STEPSIZE) && defined(OVERCLOCK_MAX_VOLT)
        if (frequency > 600000000) {
            voltage += ((frequency - 600000000) / OVERCLOCK_STEPSIZE) * 25;
            if (voltage > OVERCLOCK_MAX_VOLT) voltage = OVERCLOCK_MAX_VOLT;
        }
#endif
    } else if (frequency <= 24000000) {
        voltage = 950; // 0.95
    }

    // if voltage needs to increase, do it before switch clock speed
    CCM_CCGR6 |= CCM_CCGR6_DCDC(CCM_CCGR_ON);
    if ((dcdc & DCDC_REG3_TRG_MASK) < DCDC_REG3_TRG((voltage - 800) / 25)) {
        printf("Increasing voltage to %u mV\n", voltage);
        dcdc &= ~DCDC_REG3_TRG_MASK;
        dcdc |= DCDC_REG3_TRG((voltage - 800) / 25);
        DCDC_REG3 = dcdc;
        while (!(DCDC_REG0 & DCDC_REG0_STS_DC_OK)) ; // wait voltage settling
    }

    if (!(cbcdr & CCM_CBCDR_PERIPH_CLK_SEL)) {
        printf("need to switch to alternate clock during reconfigure of ARM PLL\n");
        const uint32_t need1s = CCM_ANALOG_PLL_USB1_ENABLE | CCM_ANALOG_PLL_USB1_POWER |
            CCM_ANALOG_PLL_USB1_LOCK | CCM_ANALOG_PLL_USB1_EN_USB_CLKS;
        uint32_t sel, div;
        if ((CCM_ANALOG_PLL_USB1 & need1s) == need1s) {
            printf("USB PLL is running, so we can use 120 MHz\n");
            sel = 0;
            div = 3; // divide down to 120 MHz, so IPG is ok even if IPG_PODF=0
        } else {
            printf("USB PLL is off, use 24 MHz crystal\n");
            sel = 1;
            div = 0;
        }
        if ((cbcdr & CCM_CBCDR_PERIPH_CLK2_PODF_MASK) != CCM_CBCDR_PERIPH_CLK2_PODF(div)) {
            // PERIPH_CLK2 divider needs to be changed
            cbcdr &= ~CCM_CBCDR_PERIPH_CLK2_PODF_MASK;
            cbcdr |= CCM_CBCDR_PERIPH_CLK2_PODF(div);
            CCM_CBCDR = cbcdr;
        }
        if ((cbcmr & CCM_CBCMR_PERIPH_CLK2_SEL_MASK) != CCM_CBCMR_PERIPH_CLK2_SEL(sel)) {
            // PERIPH_CLK2 source select needs to be changed
            cbcmr &= ~CCM_CBCMR_PERIPH_CLK2_SEL_MASK;
            cbcmr |= CCM_CBCMR_PERIPH_CLK2_SEL(sel);
            CCM_CBCMR = cbcmr;
            while (CCM_CDHIPR & CCM_CDHIPR_PERIPH2_CLK_SEL_BUSY) ; // wait
        }
        // switch over to PERIPH_CLK2
        cbcdr |= CCM_CBCDR_PERIPH_CLK_SEL;
        CCM_CBCDR = cbcdr;
        while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait
    } else {
        printf("already running from PERIPH_CLK2, safe to mess with ARM PLL\n");
    }

    // TODO: check if PLL2 running, can 352, 396 or 528 can work? (no need for ARM PLL)

    // DIV_SELECT: 54-108 = official range 648 to 1296 in 12 MHz steps
    uint32_t div_arm = 1;
    uint32_t div_ahb = 1;
    while (frequency * div_arm * div_ahb < 648000000) {
        if (div_arm < 8) {
            div_arm = div_arm + 1;
        } else {
            if (div_ahb < 5) {
                div_ahb = div_ahb + 1;
                div_arm = 1;
            } else {
                break;
            }
        }
    }
    uint32_t mult = (frequency * div_arm * div_ahb + 6000000) / 12000000;
    if (mult > 108) mult = 108;
    if (mult < 54) mult = 54;
    printf("Freq: 12 MHz * %u / %u / %u\n", mult, div_arm, div_ahb);
    frequency = mult * 12000000 / div_arm / div_ahb;

    printf("ARM PLL=%x\n", CCM_ANALOG_PLL_ARM);
    const uint32_t arm_pll_mask = CCM_ANALOG_PLL_ARM_LOCK | CCM_ANALOG_PLL_ARM_BYPASS |
        CCM_ANALOG_PLL_ARM_ENABLE | CCM_ANALOG_PLL_ARM_POWERDOWN |
        CCM_ANALOG_PLL_ARM_DIV_SELECT_MASK;
    if ((CCM_ANALOG_PLL_ARM & arm_pll_mask) != (CCM_ANALOG_PLL_ARM_LOCK
      | CCM_ANALOG_PLL_ARM_ENABLE | CCM_ANALOG_PLL_ARM_DIV_SELECT(mult))) {
        printf("ARM PLL needs reconfigure\n");
        CCM_ANALOG_PLL_ARM = CCM_ANALOG_PLL_ARM_POWERDOWN;
        // TODO: delay needed?
        CCM_ANALOG_PLL_ARM = CCM_ANALOG_PLL_ARM_ENABLE
            | CCM_ANALOG_PLL_ARM_DIV_SELECT(mult);
        while (!(CCM_ANALOG_PLL_ARM & CCM_ANALOG_PLL_ARM_LOCK)) ; // wait for lock
        printf("ARM PLL=%x\n", CCM_ANALOG_PLL_ARM);
    } else {
        printf("ARM PLL already running at required frequency\n");
    }

    if ((CCM_CACRR & CCM_CACRR_ARM_PODF_MASK) != (div_arm - 1)) {
        CCM_CACRR = CCM_CACRR_ARM_PODF(div_arm - 1);
        while (CCM_CDHIPR & CCM_CDHIPR_ARM_PODF_BUSY) ; // wait
    }

    if ((cbcdr & CCM_CBCDR_AHB_PODF_MASK) != CCM_CBCDR_AHB_PODF(div_ahb - 1)) {
        cbcdr &= ~CCM_CBCDR_AHB_PODF_MASK;
        cbcdr |= CCM_CBCDR_AHB_PODF(div_ahb - 1);
        CCM_CBCDR = cbcdr;
        while (CCM_CDHIPR & CCM_CDHIPR_AHB_PODF_BUSY); // wait
    }

    uint32_t div_ipg = (frequency + 149999999) / 150000000;
    if (div_ipg > 4) div_ipg = 4;
    if ((cbcdr & CCM_CBCDR_IPG_PODF_MASK) != (CCM_CBCDR_IPG_PODF(div_ipg - 1))) {
        cbcdr &= ~CCM_CBCDR_IPG_PODF_MASK;
        cbcdr |= CCM_CBCDR_IPG_PODF(div_ipg - 1);
        // TODO: how to safely change IPG_PODF ??
        CCM_CBCDR = cbcdr;
    }

    //cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
    //CCM_CBCDR = cbcdr;  // why does this not work at 24 MHz?
    CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
    while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait

    F_CPU_ACTUAL = frequency;
    F_BUS_ACTUAL = frequency / div_ipg;
    scale_cpu_cycles_to_microseconds = 0xFFFFFFFFu / (uint32_t)(frequency / 1000000u);

    printf("New Frequency: ARM=%u, IPG=%u\n", frequency, frequency / div_ipg);

    // if voltage needs to decrease, do it after switch clock speed
    if ((dcdc & DCDC_REG3_TRG_MASK) > DCDC_REG3_TRG((voltage - 800) / 25)) {
        printf("Decreasing voltage to %u mV\n", voltage);
        dcdc &= ~DCDC_REG3_TRG_MASK;
        dcdc |= DCDC_REG3_TRG((voltage - 800) / 25);
        DCDC_REG3 = dcdc;
        while (!(DCDC_REG0 & DCDC_REG0_STS_DC_OK)) ; // wait voltage settling
    }

    return frequency;
}

Here's a question perhaps only Paul can answer: why is the overclock step size in clockspeed.c 28MHz, and why aren't more of the CPU clock speeds nice power-of-2 MHz values such as 128, 256, or 512MHz? My guess is that the somewhat odd existing frequencies divide down nicely to 44.1kHz or other audio-related frequencies.
 
Why is the overclock step size in clockspeed.c 28MHz

I just made that up.

But maybe you're reading more into the name than it really means? It's not any sort of limit or step size for how you can set the clock speed. Instead it's a scale factor for deciding how much to "over-volt" the CPU core: for every additional 28 MHz above 600 MHz, the code adds another 25mV to the CPU voltage (the variable DC-DC power supply does indeed have a minimum step size of 25mV). The 25mV step is what's fixed in the hardware; 28 MHz is just my decision about when to take that step. It isn't based on any spec, only guesswork and gut feeling from experimenting. And since that code was written, NXP revised their datasheet, which previously said the max voltage was 1.6V. So perhaps this overclocking code is more risky than it seemed at the time based on NXP's earlier specs?


and why aren't more of the CPU clock speeds nice power-of-2 MHz values such as 128, 256, 512MHz?

Again, mostly just arbitrary decisions I made early on.

Some of that thinking revolved around a desire to limit the number of peripheral clock frequencies in common use. Remember, this was mostly decided in late 2018 and early 2019, before we had much experience with this new chip. Based on experience with the older Kinetis chips (which in hindsight really isn't as relevant now), quite a bit of frustration came from clock frequencies where the CPU was faster but, because of the clock dividers, the peripherals ran slower. Since we had to support 600 and 528 MHz, where the peripherals run at 150 and 132 MHz (the CPU-to-peripheral clock ratio must be 1, 2, 3, or 4), the other non-overclock frequencies are multiples of 150 and 132 MHz.
 
why aren't more of the CPU clock speeds nice power-of-2 MHz values such as 128, 256, 512MHz?

Two reasons really.

1: NXP didn't rate the chip at such a speed.

2: So far there really hasn't been any compelling need for under-clocking to those specific speeds.
 
Why clone set_arm_clock() instead of just using the existing function?

Regarding overclock step size, we actually ran into some trouble with the SDRAM devboards, because they use industrial-temperature, 500MHz-rated chips instead of 600MHz ones, and the default voltage settings eventually fail (causing bafflingly weird execution behaviour) at both 600MHz and 528MHz. These were the modifications that eventually got them running stably at all available frequencies up to 600MHz:
Code:
#define OVERCLOCK_STEPSIZE  25000000
#define OVERCLOCK_MAX_VOLT  1300

    // compute required voltage
    uint32_t voltage = 1150; // default = 1.15V
    if (frequency > 432000000) {
#if defined(OVERCLOCK_STEPSIZE) && defined(OVERCLOCK_MAX_VOLT)
      voltage += ((frequency - 432000000) / OVERCLOCK_STEPSIZE) * 25;
      if (voltage > OVERCLOCK_MAX_VOLT) voltage = OVERCLOCK_MAX_VOLT;
#else
      voltage = 1225;
#endif
    } else if (frequency <= 24000000) {
        voltage = 950; // 0.95
    }
It also turned out to be possible to distinguish between the 500/600MHz rated MCUs by checking bits 16-17 in HW_OCOTP_CFG3, but by that time I'd already used the sledgehammer approach of "branding" my SDRAM boards by putting a signature in the flash security registers.
 
I cloned the file because the original has no header that I could find, and trying to call it caused compilation errors. I tried a few workarounds to let me call the function, but none of them worked. A major problem is that the file defines the storage for F_CPU_ACTUAL, so you can’t simply compile a second copy of it. Note that I have commented out the storage allocation lines in the clone.
 
It should work. The only possible catch would be calling it from C++ code without the right declaration:
extern "C" uint32_t set_arm_clock(uint32_t frequency);
 