Software detail

Yerffog

Member
I am used to knowing what is going on behind the C code.
Is there a compiler that is not as noddy as the Arduino pack?

The 4.1 is a nice fast processor but the very slow operation of the compiler currently makes it pretty useless for me.
As an example, I am trying to modulate the output of the PWM, which is running at 38 kHz. Stopping and restarting it is currently taking several milliseconds. I have no idea what is going on below the surface. It is like writing code in BASIC. No debug makes it just one big, time-consuming guessing game.
Do I go back to Microchip or struggle on?
 
If you can say more about what you are trying to do, and post your code, you will find that people will try to help you.

If you're using the older Arduino 1.8.x IDE, you'll find that builds on the newer 2.3.x are faster.

If you've never worked with Arduino before, yes, there is a basic Arduino API for most things, including PWM. If you want to do something that is not supported by the API, or you're not satisfied with the behavior via the API, you can go around the API and do things yourself at a lower level, just like with any microcontroller. There is a big learning curve to the T4.x processor family, but you'll find many, many libraries and examples and discussions on this forum.

Working without a debugger takes some getting used to if you haven't done it before.
 
Would like to help you, but without seeing your code or knowing specifically what you're actually doing and how you're observing the result as taking several milliseconds, can't do much. Please show us your code, or if it's a huge or secret program, please take just the part that's running too slow and post it here as a sample. Best if you give a complete program I or anyone else can load onto a Teensy 4.1 to reproduce the problem. We're much better at solving problems when they're reproducible!
 
As a quick check on the time taken to modulate PWM, I wrote this tiny program which alters the duty cycle every 0.1ms.

Code:
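void setup() {
  pinMode(1, OUTPUT);              // pin 1: scope marker, toggled at each duty change
  analogWriteFrequency(2, 38000);  // pin 2: PWM carrier at 38 kHz
}

void loop() {
  digitalToggleFast(1);    // mark the moment the new duty cycle is written
  analogWrite(2, 50);      // low duty cycle (50 of 256 at the default 8-bit resolution)
  delayMicroseconds(100);  // hold for 0.1 ms
  digitalToggleFast(1);
  analogWrite(2, 200);     // high duty cycle (200 of 256)
  delayMicroseconds(100);
}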

These are the waveforms my oscilloscope sees. The PWM hardware is double buffered, so when you write a new value it always completes the current cycle and the new value takes effect on the next cycle. At 38 kHz, one cycle is 26.3us, so that is the worst-case delay.
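
For anyone who wants to check the 26.3us figure, here is a minimal sketch in plain C++ (it runs on a PC, not only on a Teensy). The function names are made up for illustration and are not part of any Teensy or Arduino API.

```cpp
#include <cassert>

// Length of one PWM cycle in microseconds for a given carrier frequency.
// With double-buffered hardware, a newly written duty value waits at most
// this long before it takes effect.
double pwmPeriodMicros(double frequencyHz) {
  assert(frequencyHz > 0.0);
  return 1.0e6 / frequencyHz;
}

// Number of whole carrier cycles that fit in a time window of the given
// length (the fractional remainder is dropped).
int wholeCyclesInWindow(double windowMicros, double frequencyHz) {
  return static_cast<int>(windowMicros / pwmPeriodMicros(frequencyHz));
}
```

At 38 kHz this gives a period of about 26.3us, so each 100us step in the test program above spans only 3 whole PWM cycles.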


file.png
 
My first post here. I need to sort out a data upload path. Tomorrow hopefully.
My objective:
Generate a group of 25 bursts of 38 kHz carrier, some of which are about 700us long followed by a gap of similar length, and others about 600us long followed by a gap of similar length. At the end of the 25 bursts the signal goes quiet until 20ms has elapsed.
While this is happening I need to vary the start time of the whole 20ms signal by 500us each time. This causes it to start at 20.5ms, then 21ms, and so on. It repeats after a total time shift of 20ms.
At the same time I need to do the same to the 25 pulses in the burst, varying them by +/-2.5%.
I have tried this on a PIC and it needs more speed. Hence the Teensy at 600 MHz, which has ample speed.
I have written hundreds and hundreds of programs in many types of processors so this is not new to me.
I really need to bypass the Arduino software and get to the peripheral hardware directly. Or find out what these noddy instructions are actually doing.
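
One possible reading of the timing described above, written as a rough sketch in plain C++. The constants and function names are hypothetical, and the interpretation (a start offset that grows by 500us per frame and wraps after a full 20ms) is an assumption, not something confirmed in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants taken from the description above.
const uint32_t kFrameMicros = 20000;  // one frame lasts 20 ms
const uint32_t kShiftMicros = 500;    // extra start delay added per frame

// Start offset (in microseconds) of frame number n relative to an unshifted
// 20 ms grid. Wraps back to 0 once the accumulated shift reaches a full frame.
uint32_t frameStartOffsetMicros(uint32_t n) {
  const uint32_t steps = kFrameMicros / kShiftMicros;  // 40 steps per cycle
  return (n % steps) * kShiftMicros;
}

// Nominal burst length scaled by a signed amount in tenths of a percent,
// e.g. +25 means +2.5 %, -25 means -2.5 %. Result is truncated to whole us.
uint32_t scaledBurstMicros(uint32_t nominalMicros, int32_t tenthsOfPercent) {
  assert(tenthsOfPercent > -1000);
  const int64_t scaled =
      static_cast<int64_t>(nominalMicros) * (1000 + tenthsOfPercent) / 1000;
  return static_cast<uint32_t>(scaled);
}
```

With these assumptions, a 700us burst varies between 682us and 717us, and the start offset returns to zero on the 40th frame.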
 
The installed Teensyduino software (written directly for the hardware at hand) goes straight from C/C++ through the compiler and linker to a HEX executable for the hardware.

Arduino is nothing but the IDE and the API that the Teensyduino software follows.

All the CORES code for the ARM hardware is installed and present with the install.

Arduino just provides a uniform way to install the toolchain and run the build, so the PJRC-provided uploader can get the code onto the board and ready to run.
 
Here is a shot of my current output. I will try in the morning the fast output instruction and some of the code you have uploaded. I will upload my code as well.
Thanks for the help. I will be back soon.
 

Attachments

  • Tek000.png (158.1 KB)
Apart from scrolling through other code, are there any docs on the peripherals or what the C code does to the Teensy?
 
I will upload my code as well.

Good. Will wait for the code before trying to make sense of that scope screenshot.

But in the meantime, to try answering these latest questions.

At the same time I need to do the same to the 25 pulses in the burst varying them by +/-2.5%.
I have tried this on a PIC and it needs more speed. Hence the Teensy at 600 MHz, which has ample speed.
I have written hundreds and hundreds of programs in many types of processors so this is not new to me.
I really need to bypass the Arduino software and get to the peripheral hardware directly. Or find out what these noddy instructions are actually doing.

As with all microcontrollers, you have 3 basic ways to do this sort of thing.

1: bitbanging with tight loops
2: bitbanging with interrupt routines
3: leverage special timer hardware

If you go with bitbanging, use digitalWriteFast(pin, value). If the pin is a const and value is HIGH or LOW (both inputs are known to the compiler as constants at compile time), digitalWriteFast() always compiles to a direct register write. The ARM architecture has a store (STR) instruction which depends on 2 registers being loaded with the address and the value to write. For tight loops and relatively simple code the compiler will usually pre-load those registers, so digitalWriteFast() often involves just the single STR instruction. But in the worst case it might involve some other instructions to load the registers STR needs.

If you really care about those details, the best thing you can do is check the generated assembly code. It gets written as a .lst file in the temporary folder where Arduino compiles your program. In Arduino IDE, click File > Preferences (or Arduino IDE > Settings if using macOS) and turn on verbose output during compile. Then you can see the compiler commands and look for the full pathname of the temporary folder. On Windows and macOS it's usually inside a hidden folder, but once you know the full pathname you can use the tools of those platforms. On Linux it's usually in a folder inside /tmp.

If you go with the interrupt approach, you might try using IntervalTimer.

It sounds like you really want to just access the hardware directly and not use any of the Arduino API or core library functions. You can do that. All the hardware registers are available to use from your program, with the names as published in the reference manual. Download it from the Teensy 4.1 product page. Just click or scroll down to "Technical Information" and it's the first document in the list. You can also get it from NXP's website, but they require registration. The PJRC copy is just a single click for the PDF, and ours has annotations so you can quickly see which pins and other details on Teensy correspond to NXP's rather cryptic pad naming.

But to be frank, if you're just going to do bitbanging, then unless you want to toggle many pins at once, going straight to the registers is probably just going to waste a lot of time. Just using digitalWriteFast and IntervalTimer will give you access to the native hardware.
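
To put numbers on the bitbanging options, here is a small sketch in plain C++ that works out how often the pin must toggle for a 38 kHz square wave. The function names are invented for illustration, and the 600 MHz CPU clock is the figure quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Half of one carrier period in microseconds: the interval between pin
// toggles when bit-banging a square wave of the given frequency.
double togglePeriodMicros(double carrierHz) {
  assert(carrierHz > 0.0);
  return 1.0e6 / (2.0 * carrierHz);
}

// The same interval expressed in CPU clock cycles, rounded to nearest.
uint32_t togglePeriodCycles(double cpuHz, double carrierHz) {
  assert(cpuHz > 0.0 && carrierHz > 0.0);
  return static_cast<uint32_t>(cpuHz / (2.0 * carrierHz) + 0.5);
}
```

For 38 kHz the pin toggles roughly every 13.16us, which is about 7895 CPU cycles at 600 MHz, so either a tight loop or a timer interrupt has plenty of headroom.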

You definitely do need to dive into the reference manual if you decide to go with route #3. You'll probably want to focus on the FlexPWM timers, which are chapter 55 starting on page 3091. The good news is FlexPWM is incredibly capable. These timers clock at 150 MHz when the CPU runs at 600 MHz, so assuming you don't prescale the clock you'll get 6.7ns timing resolution. While that's 1/4th the speed of the CPU, the timers generally give results that are independent of software latency like interrupts or bus usage by DMA-based peripherals (eg, USB). The bad news is that with so much hardware capability comes a lot of info to read and a lot of registers to digest.
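
The resolution figures above can be reproduced with a few lines of plain C++. This is only arithmetic on the 150 MHz clock quoted in the post. The function names are hypothetical and nothing here touches FlexPWM registers.

```cpp
#include <cassert>
#include <cstdint>

// Duration of one timer tick in nanoseconds (no prescaler).
double tickNanos(double timerHz) {
  assert(timerHz > 0.0);
  return 1.0e9 / timerHz;
}

// Whole timer ticks in one PWM period (fraction dropped).
uint32_t ticksPerPeriod(double timerHz, double pwmHz) {
  assert(timerHz > 0.0 && pwmHz > 0.0);
  return static_cast<uint32_t>(timerHz / pwmHz);
}

// Convert a duration in microseconds to timer ticks, rounded to nearest.
uint32_t microsToTicks(double timerHz, double micros) {
  assert(timerHz > 0.0 && micros >= 0.0);
  return static_cast<uint32_t>(micros * timerHz / 1.0e6 + 0.5);
}
```

At 150 MHz one tick is about 6.67ns and a 38 kHz period is 3947 ticks. A 2.5% adjustment of a 700us burst (17.5us) corresponds to 2625 ticks, so the timer resolution is far finer than the adjustment being asked for.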


Apart from scrolling through other code, are there any docs on the peripherals or what the C code does to the Teensy?

You might look at the comments in pwm.c starting on line 101. But generally speaking, the website documentation about PWM doesn't feature a deep dive into the low-level details of the hardware. It's generally meant to allow Arduino-style access without needing to worry about those low-level details. Reading the actual code really is the way to figure out what the APIs are actually doing. A lot of effort has gone into keeping that code compact and relatively easy to read, using the actual register names as documented in the reference manual (eg, not using extra abstraction layers which are supposed to make code easier to read but usually just end up adding a lot of extra complexity to unravel to the actual hardware accessed).
 
Wow! Now that is a lot of very useful information.
Thanks Paul.
Let me digest and learn.
Back with some code soon..ish.
My current scheme is heavily interrupt driven. It should make the timing flexible from a bunch of parameters.
At the moment, using pins 3 and 6, it almost feels as if the PWM and timer are interacting. Changing one upsets the other.
 
Working without a debugger takes some getting used to if you haven't done it before.
If you want a debugger and a programming environment that I think you would be more familiar/happy with then I suggest you look at VisualMicro.
It sits on top of the Arduino IDE so you still have all the Arduino (Teensy, ESP32, ARDUINO etc) libraries available but gives a much better environment in which to program and gives debugging capability.
I use it all the time.
In order to be able to use it you would have to install Microsoft Visual Studio, then VisualMicro is installed from within that.
 
At the moment, using pins 3 and 6, it almost feels as if the PWM and timer are interacting. Changing one upsets the other.
PWM uses timers. Looking at my Excel document, this time I will show part of the MUX page (https://github.com/KurtE/TeensyDocuments/blob/master/Teensy4x Pins.xlsx)

1729776667750.png

Shows that pins 3 and 6 both use FLEXPWM for PWM. Chapter 55 in reference manual.
There are multiple (4) FlexPWM objects; pin 3 is on FlexPWM4 and pin 6 is on FlexPWM2, so they should not be upsetting each other...

Each of these objects has multiple sub-modules.
Each of these sub-modules can have 3 IO pins associated with it: A, B, X.
So pin 3 (FLEXPWM4_PWMB02) is module 4, sub-module 2, pin B.
Note: if you are using pin 2, it has (FLEXPWM4_PWMA02), so same module and sub-module. They will run at the same frequency but can have different pulse widths...
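
The module/sub-module rule above can be written down as a tiny lookup, sketched here in plain C++. The struct and function names are made up for illustration. The entries for pins 2 and 3 come from the post, but the sub-module and channel for pin 6 are not stated there, so that row is an assumption to be checked against the spreadsheet.

```cpp
#include <cassert>

// One PWM-capable pin described by its FlexPWM module, sub-module and channel.
struct FlexPwmPin {
  int pin;
  int module;     // FlexPWM1..FlexPWM4
  int submodule;  // 0..3
  char channel;   // 'A', 'B' or 'X'
};

const FlexPwmPin kPins[] = {
    {2, 4, 2, 'A'},  // FLEXPWM4_PWMA02 (from the post)
    {3, 4, 2, 'B'},  // FLEXPWM4_PWMB02 (from the post)
    {6, 2, 2, 'A'},  // FlexPWM2 (from the post); sub-module/channel ASSUMED
};

// Returns the table entry for a pin, or nullptr if the pin is not listed.
const FlexPwmPin* findPin(int pin) {
  for (const FlexPwmPin& p : kPins) {
    if (p.pin == pin) return &p;
  }
  return nullptr;
}

// Two pins are forced to the same PWM frequency when they sit on the same
// module and the same sub-module, because they share one counter.
bool sharesFrequency(int pinA, int pinB) {
  const FlexPwmPin* a = findPin(pinA);
  const FlexPwmPin* b = findPin(pinB);
  assert(a != nullptr && b != nullptr);
  return a->module == b->module && a->submodule == b->submodule;
}
```

With this table, pins 2 and 3 share a frequency while pins 3 and 6 do not, which matches the point that changing PWM on one of 3 and 6 should not disturb the other.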

Note some of the other pins use Quad timers (Chapter 54 in Reference Manual) instead. Different rules and capabilities.
 
One of the many things I've dreamed of making (in my dream world of infinite daily programming hours and zero mundane business stuff) is a DMA-based pulse output library. And also one for capturing pulses. We get these questions pretty regularly where someone wants to do really unusual PWM stuff and sad reality is the FlexPWM learning curve is just too steep, even for experienced programmers who learned on other hardware like 8 bit AVR & PIC. The solution pretty much always ends up as bitbanging, which gets the job done as long as they don't add too much other stuff using interrupts or too much DMA to cause bus latency, but it's kinda sad to know that incredibly capable FlexPWM hardware is just sitting there unused while the CPU toils away at poking the GPIO registers.

I recently did some fiddling with FlexPWM input capture linked to DMA, meant for verifying the pulse timing of the many WS2812 libraries. With a different API and access to prescaling that might become useful for a generic pulse input library. As for generic pulse output, coming up with an actually useful API seems like a big challenge.
 
Paul mentions FlexPWM, so I'll add that it can be configured to interrupt on each PWM period, and the duty cycle can be updated, with the new value applying on the next period. If you search the forum or github for eFlexPWM, you will find a library by that name, with two examples. At least one of them runs at 10 kHz, so it's in the same ballpark as your 38 kHz, and I'm sure it would work at that frequency. One example varies the duty cycle as a 60-Hz sine wave, and I was thinking you could achieve your goal either via a state machine or simply an array of duty cycle values to generate the pattern you want. There is a learning curve to the library, but the two examples work out of the box, so at least you're starting from working code.

One more note on the eFlexPWM library, it's built on top of the PWM driver from the NXP SDK.
 
Thanks Paul,
I have been working on the application today.
It is just about working. On one of my sensors it is a bit slow to respond to the code.
Took your advice and bit-banged. Just about enough adjustment in the counts to get the thing to work.
The code is a bit expanded but it does allow me to adjust the timings to fine tune the output.
The code generates a pulse train that loops every 46ms. It time-shifts its start by about 250us each time until it has shifted a total of 46ms, then repeats. This sequence is repeated with a slightly adjusted clock a few times until it tracks in with the code out of the sensor.
 

Attachments

  • sketch_oct25.ino (4.4 KB)
  • Tek003.png (171.8 KB)
I have come back to try to use the hardware PWM and some timers.
First the PWM:
It starts okay at 38 kHz on pin 4. However, any attempt to change the MS ratio results in a totally wrong ratio.
I have attached code and the scope display.
Not sure how to get the sketch to display here?
 

Attachments

  • pwm test failure.txt (727 bytes)
  • Tek000.jpg (192.6 KB)
With 12-bit resolution (4096 steps), the value for a 50% PWM mark/space ratio should be half of that: not 128 but 2048.
I will keep experimenting... I wonder how the timers work?
 
If you look at my previous post, I mentioned library eFlexPWM and Paul provided links. Start from the working example with 10-kHz PWM. If you want to know how the FlexPWM works, read the chapter in the reference manual. You will make a lot more progress that way than by trial and error with the Arduino API.

I wasn't sure what you meant by MS ratio. We normally use the term duty cycle, and you will see that in the eFlexPWM library.
 
Here's a few tips...

First, 38000 Hz is slightly beyond 12 bit PWM resolution. This is documented on the PWM page. It's also easy math. The FlexPWM timers clock at 150 MHz. 150 MHz / 4096 = 36621.09 Hz.

1730391207198.png
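
The "easy math" can be sketched in plain C++ as follows. This is just arithmetic on the 150 MHz timer clock mentioned above. The function names are invented for illustration and are not part of the Teensy core.

```cpp
#include <cassert>

// Highest PWM frequency that still gives a full 2^bits steps per period.
double maxPwmHzForBits(double timerHz, unsigned bits) {
  assert(timerHz > 0.0 && bits < 32);
  return timerHz / static_cast<double>(1u << bits);
}

// Largest whole number of bits whose 2^bits steps fit in one PWM period.
unsigned usableBits(double timerHz, double pwmHz) {
  assert(timerHz > 0.0 && pwmHz > 0.0);
  const double ticks = timerHz / pwmHz;  // timer counts per period
  unsigned bits = 0;
  while (bits < 31 && static_cast<double>(1u << (bits + 1)) <= ticks) {
    ++bits;
  }
  return bits;
}
```

For a 150 MHz clock, 12 bits tops out at about 36621 Hz, and at 38000 Hz only 11 full bits fit in a period, which is why 38 kHz sits slightly beyond 12-bit resolution.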


Your use of analogWrite(4, 1) only turns off the PWM because of the limited resolution. The smallest pulse the hardware can create is 6.7ns, because it clocks at 150 MHz. Your call requests a pulse of 1/38000 * 1/4096 = 6.4ns, which is shorter than that.

After you have configured for 12 bit resolution, to get a 50% duty cycle waveform you need to use analogWrite(4, 2047). If you use analogWrite(4, 128), you can expect 3.125% duty cycle because the range becomes 0 to 4096.
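
Here is a minimal sketch in plain C++ of the duty-cycle arithmetic, assuming the written value is simply divided by 2^bits as the "0 to 4096" range suggests. The function names are hypothetical and this does not call analogWrite().

```cpp
#include <cassert>
#include <cstdint>

// Duty cycle in percent for a written value at a given resolution,
// assuming the full range is 0 to 2^bits.
double dutyPercent(uint32_t value, unsigned bits) {
  assert(bits < 32);
  return 100.0 * static_cast<double>(value) / static_cast<double>(1u << bits);
}

// Pulse width in nanoseconds that a written value asks for at a given
// PWM frequency and resolution.
double requestedPulseNanos(uint32_t value, unsigned bits, double pwmHz) {
  assert(bits < 32 && pwmHz > 0.0);
  const double periodNanos = 1.0e9 / pwmHz;
  return periodNanos * static_cast<double>(value) /
         static_cast<double>(1u << bits);
}
```

Under that assumption, 128 at 12 bits is 3.125%, 2047 and 2048 both land within 0.03% of 50%, and a value of 1 at 38 kHz asks for about 6.4ns, just under the 6.7ns tick.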

To show us your program, when composing your message look for the </> icon. It's the very first button in the forum's editor toolbar. Click it and a popup appears. In Arduino IDE, use Ctrl-A and Ctrl-C to copy your program to clipboard. Then in that popup, use Ctrl-V. Simpler than saving a text file, and everyone reading this thread can see it more easily.

Before you copy to the clipboard, you might also press Ctrl-T in Arduino IDE. It will automatically "fix" the whitespace in your code, which makes reading easier for everyone.
 