Teensy 3.1 - Any way to show CPU speed on my OLED?

Status
Not open for further replies.

Andology

Member
Hi chaps (and chapess's)

I'm making a sort of status page in the menu for an OLED I'm working on. Just wondered whether there's a way to get the CPU usage so I can display it (e.g. 34%)?
Also, I've tried a few functions to display RAM usage / remaining (stack vs heap, from the Adafruit forum) but wondered if there was anything special in the Teensy that will give me that info? SRAM percentage used would be ideal, or just the number of bytes remaining (estimated; I appreciate the limitations of calculating available RAM).

Anyway, I would really appreciate some help. I've made some great progress with this board and I love it (now my Arduino Mega is looking at me funny on the desk in front of me LOL).

Cheers

Andy
 
Not really. CPU usage is more useful on a multi-threaded or context/program-switching OS. With the Teensy and Arduino you don't have that; it's predominantly single-threaded, with the odd interrupt. So if you are doing something and then go into a delay loop, you would have to record how many NOP loops you do and work out the percentage from that.
 
The yield() function in the Arduino/Teensy libs might help. You can "hook" it to catch all calls to yield() and thereby see the unused CPU time.
Alas, this depends on all programs/libraries calling yield() within a spin-loop. And some may not. For example, delay() does call it.

That is CPU utilization. CPU speed, on the other hand, is fixed by the oscillator mode in the startup code, assuming there's no added code that uses sleep strategies for battery-powered systems.
 
That won't be effective either-- it only proves that yield() was called, just to give control away to the next "thread", whether it was ready or not.

To get a sense of "idle time" you need a "ready queue" that lists the "threads" that are ready to execute. The dispatching code can then spin until the ready queue has "entry(ies)" in it. The spin on an empty ready queue tells you how much idle time you have.
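To make the ready-queue idea concrete, here is a minimal sketch in plain C++. `Dispatcher`, `post()` and `runOnce()` are made-up names for illustration, not part of Teensyduino or any RTOS, and the queue is not interrupt-safe as written (posting from an ISR would need interrupts masked around the queue update).

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical task type: a plain function pointer that runs to completion.
using Task = void (*)();

// Hypothetical dispatcher with a small fixed-size ready queue (ring buffer).
class Dispatcher {
public:
  // Mark a task as ready to run; returns false if the queue is full.
  bool post(Task t) {
    if (size_ == kCapacity) return false;
    queue_[(head_ + size_) % kCapacity] = t;
    ++size_;
    return true;
  }

  // One dispatcher pass: run the next ready task, or count one idle spin
  // if the ready queue is empty.
  void runOnce() {
    if (size_ == 0) {
      ++idleSpins_;
      return;
    }
    Task t = queue_[head_];
    head_ = (head_ + 1) % kCapacity;
    --size_;
    ++busyPasses_;
    t();
  }

  uint32_t idleSpins() const { return idleSpins_; }
  uint32_t busyPasses() const { return busyPasses_; }

private:
  static constexpr size_t kCapacity = 8;
  Task queue_[kCapacity] = {};
  size_t head_ = 0;
  size_t size_ = 0;
  uint32_t idleSpins_ = 0;
  uint32_t busyPasses_ = 0;
};
```

Calling `runOnce()` from loop() and sampling `idleSpins()` at a fixed interval gives a number that shrinks as the tasks take more time. It is a spin count rather than a time, so it only becomes a percentage once it is compared against the count seen when nothing is ever posted.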

Instead, what you have in Arduino-esque code is "CPU burn" within the application asking "do you have something for me?" (input) or "can I send you something now?" (output). The fact that this constant querying happens in one or several threads doesn't offer a view of "idle time" (unless you manually track it in your own application).

The concept of a ready queue (idle time) probably does exist within an RTOS, where there is preemptive scheduling. But I have not looked at the current offerings to say for certain.
 
Given we don't have threading intrinsic to Teensyduino/Arduino, the use of yield() is perhaps the best we have.
It's probably a good estimate for CPU utilization, in that most applications use delay(). A few might use delayMicroseconds(). But a program that suspends due to, say, a read on Serial or Serial1, or loops on an I/O busy bit, would not show in the yield() stats.
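One way to make such busy-waits visible would be to route them through a small helper that calls an idle function on every unsuccessful poll. This is only a sketch: `waitUntil()` is a hypothetical helper, not an Arduino or Teensyduino function, and it only helps for waits in your own code, not for ones buried inside libraries.

```cpp
// Spins until ready() returns true, calling idle() after every unsuccessful
// poll. Returns the number of idle() calls made, which is handy for debugging.
template <typename Ready, typename Idle>
unsigned long waitUntil(Ready ready, Idle idle) {
  unsigned long spins = 0;
  while (!ready()) {
    idle();
    ++spins;
  }
  return spins;
}
```

In a sketch this might be used as `waitUntil([] { return Serial1.available() > 0; }, yield);`, so the time spent waiting for a byte shows up in whatever yield() is counting.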

Unlikely that the Arduino world will see much use of a threaded OS/RTOS due to the premise of Arduino's minimalism.

FreeRTOS (et al.) have an idle task that runs when no tasks are ready, i.e. when all are waiting on an event, timer, message queue, etc.
 
Let's take wwg's idea one step further. Assume you have several modules that can report if they need attention, and maybe some interrupts that fire every now and then. You can ask all modules if they need attention and, if not, enter a shallow (??? as opposed to deep - I think you know what I mean) sleep mode that leaves interrupts enabled. That facilitates power saving. The amount of time you spend waiting for stuff needing attention can be related to the time you spend actually doing stuff, and you can calculate CPU usage accordingly.

What you need in the first place is a number of modules that can report if they need attention.
 
There's this rather simple notion:
Take the case of a pin interrupt causing an ISR to want to use the SPI port to read interrupt cause/status. That pin interrupt ISR could simply disable further pin interrupts using the ARM NVIC slot for that ISR. Then the ISR would set a flag that non-ISR service is needed soon, then exit the interrupt. The interrupting device would keep the pin asserted because the ISR didn't use the SPI to read the interrupt status. The same could be done for interrupts from the SPI controller itself, or for a DMA-finished interrupt for an SPI transfer where the DMA controller has to own the SPI port for a while.

When the application or library (pseudo-OS) sees the flag set, it can try to take a mutex on the SPI port and, if successful, read the interrupt cause via SPI, then tell the NVIC to re-enable the pin interrupt.
This is clean and simple, but it does add delay to reading the interrupt cause. The only way to avoid that where there are multiple devices on one SPI port is to not do that: one device per SPI port.

The challenging part is how the pin interrupt ISR can signal the non-ISR code to take note. In FreeRTOS, there's an ISR-safe way to do this.
In the minimal Arduino world, there'd have to be conventions set on how to do this via a common SPI port manager or some such. Having freeware authors conform is a challenge.
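A rough sketch of that flag-and-defer pattern is below. Everything hardware-related is simulated: `disablePinIrq()`, `enablePinIrq()` and `readCauseViaSpi()` are hypothetical stand-ins for the NVIC enable/disable of the pin's IRQ and the real SPI transfer, and the sketch assumes the toolchain provides `<atomic>`.

```cpp
#include <atomic>

// Simulated hardware state (stand-ins, not real Teensy registers).
static bool pinIrqEnabled = true;
static int causeReads = 0;

static void disablePinIrq() { pinIrqEnabled = false; }
static void enablePinIrq() { pinIrqEnabled = true; }
static void readCauseViaSpi() { ++causeReads; }

// Set by the ISR, cleared by non-ISR code once the device has been serviced.
static std::atomic<bool> servicePending{false};

// Crude "mutex" for the shared SPI port: set while someone owns the port.
static std::atomic_flag spiBusy = ATOMIC_FLAG_INIT;

// Pin ISR body: no SPI access here, just mask the interrupt and raise a flag.
void pinIsr() {
  disablePinIrq();
  servicePending.store(true);
}

// Called from non-ISR code (e.g. once per loop()).
// Returns true if the deferred interrupt was serviced on this call.
bool serviceDeferredIrq() {
  if (!servicePending.load()) return false;   // nothing to do
  if (spiBusy.test_and_set()) return false;   // port in use, try again later
  readCauseViaSpi();                          // device releases its IRQ line
  spiBusy.clear();
  servicePending.store(false);
  enablePinIrq();                             // safe to take interrupts again
  return true;
}
```

The order at the end matters: the pin interrupt is only re-enabled after the cause has been read, otherwise the still-asserted line would fire the ISR again immediately.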
 
Maybe something along these lines to start with:

Code:
class ExponentialFilter
{
  public:
    // the constructor takes one argument: the alpha value
    ExponentialFilter(double alpha)
      : alpha_(alpha),
      beta_(1 - alpha_),
      value_(0) // start from a defined value
    { }
    // feed a value into the filter and update the filtered value
    void feed(double sample)
    {
      value_ = alpha_*sample + beta_*value_;
    }
    // get filtered value
    double get() const
    {
      return value_;
    }
    // additional function-like access
    double operator()() const
    {
      return get();
    }
  private:
    const double alpha_;
    const double beta_;
    double value_;
};

class AbstractModule
{
  public:
    AbstractModule()
    {
      // add to list of modules
      prepend(this);
    }

    /** do whatever is necessary to update this module
      \return true if the module had work to do, false if the module remained idle
     **/
    virtual bool update() = 0;

    /** Update all modules **/
    static bool updateAll()
    {
      static ExponentialFilter filter(0.01); // smooth the cpu usage value
      static uint32_t idleTime = 0; // used to measure idle time

      bool busy = false; // this is true if *any* module was busy in the loop

      auto p = modules_();
      elapsedMicros loopTime; // how long did we need to ask / execute all modules

      // loop through modules and ask / execute
      while(p != nullptr)
      {
        if (p->update())
        {
          busy = true;
        }
        p = p->next();
      }
      if (busy) // some module was busy, so regard the loop time as busy time
      {
        uint32_t busyTime_ = loopTime;
        uint32_t totalTime = busyTime_ + idleTime;
        // guard against 0/0: a NaN would stick in the filter forever
        double cpuUsage = (totalTime > 0) ? (double)busyTime_/totalTime : 0.0;
        idleTime = 0;
        filter.feed(cpuUsage);
        Serial.printf("cpu usage: %5.3f\r", filter.get());
      }
      else // no module was busy, add loop time to idle time
      {
        idleTime += loopTime;
      }

      return busy;
    }

  protected:
    virtual ~AbstractModule() {}

  private:
    /** prepend a module to the list of registered modules **/
    static void prepend(AbstractModule* module)
    {
      module->next_ = modules_();
      modules_() = module;
    }

    /** Don't allow copy construction **/
    AbstractModule(const AbstractModule& other);

    /** Don't allow copy assignment **/
    AbstractModule& operator=(const AbstractModule& other);

    /** get pointer to list of modules **/
    static AbstractModule*& modules_()
    {
      static AbstractModule* p = nullptr;
      return p;
    }

    /** get next module in list **/
    AbstractModule* next() {return next_;}

    /** pointer to next module (intrusive linked list) **/
    AbstractModule* next_;

};

template <uint8_t pin>
class Blink: public AbstractModule
{
  public:
    Blink(uint16_t period)
      : period_(period)
    {
      pinMode(pin, OUTPUT);
    }
    bool update() override
    {
      if (timer_ < period_)
      {
        return false;
      }
      timer_ -= period_;
      digitalWriteFast(pin, !digitalReadFast(pin));
//      delay(50); // <-- uncomment to turn this into a CPU hog
      return true;
    }
  private:
    elapsedMillis timer_;
    uint16_t period_;
};

void setup()
{
  static Blink<LED_BUILTIN> ledBlinker(200);
  static Blink<10> otherBlinker(300);
}
void loop()
{
  AbstractModule::updateAll();
}

Classes:
  • ExponentialFilter is not really needed, but is used to smooth the CPU usage value.
  • AbstractModule provides an interface for modules, gives them CPU time and measures idle vs busy time.
  • Blink is a module implementation that does what it says. Its update() method returns true if anything had to be updated, and false otherwise. The CPU usage measurement depends on the correctness of the return value!
Program your Teensy with this code and connect via USB Serial. You should see that two blinking LEDs use an astonishing 6% CPU on a Teensy 3.1, but then again I didn't really code for speed (or precision, or beauty...), and we get some additional functionality on top.
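For a feel of how much the ExponentialFilter in the code above smooths the reading, here is a small host-side sketch of its step response. `stepResponse()` and `stepResponseClosedForm()` are names made up for this illustration; the update rule is the one from `feed()`.

```cpp
#include <cmath>

// Same update rule as ExponentialFilter::feed():
//   value = alpha * sample + (1 - alpha) * value
// Feeds a constant sample n times, starting from 0, and returns the result.
double stepResponse(double alpha, double sample, unsigned n) {
  double value = 0.0;
  for (unsigned i = 0; i < n; ++i) {
    value = alpha * sample + (1.0 - alpha) * value;
  }
  return value;
}

// Closed form of the same thing: sample * (1 - (1 - alpha)^n).
double stepResponseClosedForm(double alpha, double sample, unsigned n) {
  return sample * (1.0 - std::pow(1.0 - alpha, static_cast<double>(n)));
}
```

With alpha = 0.01 the filter has covered roughly 63% of a step after 100 updates. Since feed() only runs on passes where some module was busy, the printed value can lag noticeably behind a sudden change in load.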

If you decide to use this kind of strategy, it will force you to divide your application into reasonably small chunks that can be handled on their own and have more or less clean interfaces between them. If your application is not structured like that, it's usually a good idea to refactor things, and here's your motivation: you can then measure CPU usage without adding an RTOS.

Regards

Christoph
 
Note that delay() itself calls yield(). yield() can be overridden at link time with something that schedules/runs other code that shares the single stack.
So delay() need not be a CPU hog, if you wish.
Dumb example: code a yield() replacement that increments a counter and returns. Every second, read the counter and zero it. The counter is then the number of yield() calls per unit of time, which indicates the time available to do other tasks. If your main program does nothing but call yield(), or call delay() and nothing more, the counter will show the maximum, i.e. the whole CPU less the overhead of the call/return of your own yield(), and so on. I've used this to get a good feel for the CPU idle time available.
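The counter idea might look roughly like this. It is a sketch under assumptions: `YieldMeter` is a hypothetical helper, the maximum is self-calibrated from the busiest counting interval seen so far, and the Arduino part assumes the core declares yield() weak so a sketch can replace it (which also replaces whatever the core's default yield() would have done).

```cpp
#include <stdint.h>

// Hypothetical helper: turns "yield() calls per interval" into a rough
// idle percentage relative to the highest count seen so far.
struct YieldMeter {
  volatile uint32_t count = 0;   // incremented from yield()
  uint32_t maxPerInterval = 0;   // best case: sketch did nothing but yield

  void onYield() { count = count + 1; }

  // Call once per interval (e.g. every second). Returns idle time in percent
  // and zeroes the counter for the next interval.
  uint8_t sampleIdlePercent() {
    uint32_t c = count;
    count = 0;
    if (c > maxPerInterval) maxPerInterval = c;  // self-calibrating maximum
    if (maxPerInterval == 0) return 0;           // no yield() calls seen yet
    return static_cast<uint8_t>((100ull * c) / maxPerInterval);
  }
};

YieldMeter yieldMeter;

#if defined(ARDUINO)
#include <Arduino.h>
// Assumption: the core's yield() is a weak symbol, so this definition
// replaces it at link time.
void yield(void) { yieldMeter.onYield(); }
#endif
```

CPU usage is then 100 minus the returned value. The figure is only meaningful once the sketch has had at least one interval in which it really was idle, because that interval sets the 100% mark.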
 
Note that delay() itself calls yield(). yield() can be overridden at link time with something that schedules/runs other code that shares the single stack.
So delay() need not be a CPU hog, if you wish.

Can you elaborate a bit more on this, please? I can't see how that would relate to my example. It does use delay(), but only to demonstrate the effect of modules that need more CPU time - I should have mentioned that.
 
Note that delay() itself calls yield(). yield() can be overridden at link time with something that schedules/runs other code that shares the single stack.
So delay() need not be a CPU hog, if you wish.
Dumb example: code a yield() replacement that increments a counter and returns. The counter is then the number of yield() calls per second, which indicates the time available to do other tasks. If your main program does nothing but call yield(), or call delay() and nothing more, the counter will show the maximum, i.e. the whole CPU less the overhead of the call/return of your own yield(), and so on. I've used this to get a good feel for the CPU idle time available.

This, of course, can bring up some reentrancy issues. For example, one "thread" calls a USB serial function, which then calls yield(), which (if a thread context switch is done) runs another "thread", which might also make a USB serial call. Alternatively, another USB serial call could be made directly from within the yield() function. I've not had time to trace the code for this yet, but I believe that is what I saw last night while testing my Fibers library (each test fiber writes a message using USB serial).
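For the simpler of the two cases (a nested call on the same stack), a reentrancy guard inside the yield() replacement is one possible defence. The sketch below is illustrative only: `guardedYield()` and `runOtherWork()` are made-up names, and `runOtherWork()` just calls straight back in to simulate code, such as a serial write, that itself calls yield().

```cpp
// Counters exist only to make the behaviour observable.
static int scheduled = 0;  // times the guarded section actually ran
static int rejected = 0;   // times a nested call was turned away

void guardedYield();

// Hypothetical stand-in for "run other code": it re-enters the hook, as a
// library function that calls yield() internally would.
static void runOtherWork() { guardedYield(); }

// yield()-style hook with a reentrancy guard: nested calls return at once.
void guardedYield() {
  static bool running = false;
  if (running) {
    ++rejected;
    return;
  }
  running = true;
  ++scheduled;
  runOtherWork();
  running = false;
}
```

This only covers nested calls on one stack. With real context switches between fibers the flag would still be set while another fiber runs, so the shared resource itself (here the USB serial port) would more likely need its own lock.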
 
Would it be interesting to use current consumption as a proxy for CPU utilization? Might be simpler than inching down the slippery slope towards writing your own RTOS.
 
Would it be interesting to use current consumption as a proxy for CPU utilization? Might be simpler than inching down the slippery slope towards writing your own RTOS.
I don't think so. Current consumption depends heavily on what any outputs might be driving, what internal peripherals are enabled, and whether we're sitting in delay loops. It would drop while the CPU is in some sleep mode - but to enable that, we need to know that no part of the application would rather be calculating something.
 
It is a simple problem, as said above.
When your application has no work to do, call yield(). Code your yield() to count calls and compare the count against time intervals; that gives you utilization.
 