Lightweight C++ callbacks

It is probably writing to stderr. Since (IIRC) stderr is redirected to Serial now, terminate (and probably a lot of other C library functions) ends up writing stuff to the serial port. If so, I'm not sure redirecting was really a good idea.
 
I don't see any non-intended output to the serial port (in my programs, but I don't use all the features mentioned in the list below), so I think it's fine. I don't see a lot of `printf` calls in the core libraries. Here's what I see with a quick search:

1. analog_init() in analog.c
2. set_arm_clock() in clockspeed.c
3. IntervalTimer::beginCycles() in IntervalTimer.cpp
4. Some calls in startup.c
5. Some calls in usb_audio.cpp
6. Some calls in usb_flightsim.cpp
7. usb_joystick_send() in usb_joystick.c
8. usb_keyboard_transmit() in usb_keyboard.c
9. Some calls in usb_midi.c
10. usb_mouse_transmit() in usb_mouse.c
11. usb_mtp_configure() in usb_mtp.c
12. usb_rawhid_configure() in usb_rawhid.c
13. Some calls in usb_seremu.c
14. Some calls in usb_serial.c
15. Some calls in usb_serial2.c
16. Some calls in usb_serial3.c
17. Some calls in usb.c

I think some of those run before system initialization is done, but I don't think some of them should be there, e.g. the one in IntervalTimer.

As well, I don't think the C library uses `printf` for internal debugging. If it did, I'd be surprised because it's production-level code that shouldn't be including <stdio.h>.

Update: looks like many of those `printf` calls use a blank version from debug/printf.h. My comments are partially moot. I wish they'd use `printf_debug` instead.
 
As well, I don't think the C library uses `printf`
Agreed. Nonetheless, the library function terminate(), which shouldn't know anything about Teensies, prints to USB Serial. I assume this works since _write() is now defined to forward write requests on files 0..2 (stdin/stdout/stderr) to the USB serial port. If the terminate function does this, who knows what other library functions are now able to write to the serial port. While this can be useful, it also might be dangerous. One doesn't know what hardware is connected to a microcontroller.


Test code
Code:
#include <exception>

void setup() {
  while(!Serial){}
  Serial.println("start");
  std::terminate();
}

void loop() {
}

prints:
Code:
start
terminate called without an active exception
 
You'll get no argument from me! :) I especially agree that unknown things printed to the serial port isn't the best idea.

I just verified that `std::terminate()` prints to `stderr` by modifying Print.cpp's `_write()` to look at stuff. I want to see if I can find out if `std::terminate()` can be pointed to a custom thing...
 
Here's what `terminate()` (approximately) looks like:
https://gcc.gnu.org/git/?p=gcc.git;...c;hb=cd5dea63a67ccca09f086df98d11d141d0f86f01

It prints some stuff to `stderr` and then calls abort(). It's easy to change the handler. I tested it with this:

Code:
  std::set_terminate([]() {
    Serial.println("Here!");
    abort();
  });
  std::terminate();

I wonder what the "best way" is to define `stderr` to have the least impact, but still be useful. Right now, Print.cpp redirects all of stdin(0), stdout(1), and stderr(2) to Serial. I could see changing the redirect to exclude 2 (stderr), but that would mean casting "2" to a "Print*" and then trying to call a function on that. Maybe this (see Print.cpp):

Code:
int _write(int file, char *ptr, int len) {
  if (file >= 0 && file <= 1) file = (int)&Serial;
  else if (file == 2) return len;  // Ignore stderr output
  return ((class Print *)file)->write((uint8_t *)ptr, len);
}

I'm sure there are reasons for doing it the way it's done, probably "most efficient code possible". I also don't know what "best" means here for `stderr`, however. Maybe a macro definition or some such for configuring it.
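The macro idea could look roughly like the following desktop model. `TEENSY_STDERR_TO_SERIAL`, `FakeSerial`, and `routed_write()` are hypothetical names made up for illustration; none of them exist in the Teensy core. The real `_write()` casts the file number to a `Print*`, which this sketch deliberately avoids so that it compiles on a desktop compiler.

```cpp
#include <cstddef>
#include <string>

// Hypothetical compile-time switch (not an existing core option):
// 1 = forward stderr to the serial sink, 0 = silently discard it.
#ifndef TEENSY_STDERR_TO_SERIAL
#define TEENSY_STDERR_TO_SERIAL 0
#endif

// Stand-in for Serial so the sketch runs on a desktop compiler.
struct FakeSerial {
    std::string received;
    std::size_t write(const char* data, std::size_t len) {
        received.append(data, len);
        return len;
    }
};

// Models the routing decision only: files 0..1 always go to the sink,
// file 2 (stderr) goes there only when the switch is enabled.
inline int routed_write(FakeSerial& serial, int file, const char* ptr, int len) {
    if (file == 2 && !TEENSY_STDERR_TO_SERIAL) {
        return len;  // report success, drop the bytes
    }
    if (file >= 0 && file <= 2) {
        return static_cast<int>(serial.write(ptr, static_cast<std::size_t>(len)));
    }
    return -1;  // unknown file number in this model
}
```

With the switch left at 0, library output on stderr is swallowed while `printf`-style output on stdout still reaches the sink. Building with `-DTEENSY_STDERR_TO_SERIAL=1` would restore the current behaviour.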
 

Yes, that's the one. And I'm using this define:

Code:
#define SG14_INPLACE_FUNCTION_THROW(x) std::__throw_bad_function_call()

In summary: that's equivalent to the call-an-empty-std::function behaviour. It calls `std::terminate()`, which then calls the default `std::terminate_handler`, which prints some stuff to `stderr` and then calls `abort()`.
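For comparison, here is what the empty-`std::function` case looks like on a desktop build with exceptions enabled, where the failure surfaces as a catchable `std::bad_function_call`. This is only a sketch of the standard library behaviour; the helper name `empty_std_function_throws()` is made up. On a Teensy build, where exceptions are (as far as I know) disabled, nothing can catch it, which is how the call ends up in `std::terminate()` as described above.

```cpp
#include <functional>

// Returns true if calling an empty std::function raised std::bad_function_call.
// Requires a build with exceptions enabled (the usual desktop default).
inline bool empty_std_function_throws() {
    std::function<void()> f;  // default-constructed: holds no target
    try {
        f();
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```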
 
Of course this special case is easily fixed by not doing a __throw_bad_function_call() but crashing the sketch as proposed above.

However, I see a general, but admittedly small, safety issue with redirecting stdout/stderr to the USB serial port by default. Here is another nice example:

Code:
#include "Arduino.h"
#include "assert.h"

void someInnocentLibraryFunction(int i){
    assert(i < 5000);
    // do something here which requires i < 5000
}

void setup(){
}

void loop(){
    unsigned i = millis();
    someInnocentLibraryFunction(i);
}

Which, after 5s, prints
Code:
assertion "i < 5000" failed: file "src/main.cpp", line 8, function: void someInnocentLibraryFunction(int)

It's kind of nice that assert now works out of the box, but the redirection might generate unexpected behaviour. However, the chance that something bad happens is probably negligible.
 
I've been doing testing with staticFunctional, inplace_function, fixed_size_function, and delegate.

Maybe I've missed some subtle C++ stuff, but the first 3 seem to be pretty much the same thing. If anyone can explain the practical differences between them, please do!

Delegate seems to be quite an odd duck. It does have special support for function(void *context) and calling non-static member functions, which are pretty compelling features, but using them requires C++ template syntax in user-level code. It's also the only one which is "trivially copyable", which may not really matter. Sadly, it can't support capturing lambdas, even just 4 bytes which could fit into its unused storage. So Delegate is probably out of consideration.
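For readers unfamiliar with the `function(void *context)` style, a minimal sketch follows. `ContextCallback` and `Counter` are hypothetical names; this is not the API of the Delegate library, just the general "function pointer plus context pointer" idea, which costs two pointers (8 bytes on a 32-bit MCU).

```cpp
// Hypothetical two-word callback: a plain function plus an opaque context.
struct ContextCallback {
    void (*fn)(void*) = nullptr;
    void* context = nullptr;

    // Calling an empty callback does nothing.
    void operator()() const {
        if (fn) fn(context);
    }
};

class Counter {
public:
    // Hands out a callback that ends up in the non-static onEvent().
    ContextCallback callback() {
        ContextCallback cb;
        cb.fn = &Counter::trampoline;
        cb.context = this;
        return cb;
    }
    int count() const { return count_; }

private:
    // Static trampoline: recovers the object from the context pointer.
    static void trampoline(void* self) { static_cast<Counter*>(self)->onEvent(); }
    void onEvent() { ++count_; }
    int count_ = 0;
};
```

The trampoline is the boilerplate that a capturing lambda would otherwise generate for you.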

Another big question is choosing the default capture storage size. They all use 8 bytes of RAM on top of the storage. Is only 8 bytes of lambda capture too limiting? For example, if we do this with attachInterrupt(), each 4 byte pointer we're storing in a table in RAM becomes 16 bytes. Or 24 bytes if we allow lambda capture to hold 16 bytes. I'm guessing the majority of lambda captures will involve just one 4 byte integer or pointer. But there are probably some pretty important uses that need more, right?
 
staticFunctional is some home-brew code I copied from a forum thread. That doesn't mean it's bad, but I'd trust inplace_function more since it comes from an official C++ working group. Chances are that inplace_function will be part of the official STL later. Other than that, both do the job as far as I tested them last year.

inplace_function is part of the TimerTool and the EncoderTool, i.e., it has implicitly been used by quite a few users over the last months without issues.

Never tested "fixed_size_function".

Regarding the storage size, I think 8 bytes is borderline. I'd vote for 16 bytes. Having this too small will be problematic when you store functors, which tend to have more parameters than lambdas. 16 bytes of storage for 60 pins would add about 20 bytes x 60 = 1200 bytes of RAM compared to the old attachInterrupt (8 bytes overhead + 16 bytes storage, minus the 4 byte pointer stored today). That doesn't seem like a big thing for the T4.x but might be problematic for the T3.x? Maybe different sizes for 3.x and 4.x?
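The arithmetic can be written down as a small compile-time check. The constants come from the figures quoted in this thread (8 bytes of overhead per callback, 4 bytes for a plain function pointer); the names are made up for the sketch.

```cpp
#include <cstddef>

constexpr std::size_t kOverheadBytes = 8;    // per-callback overhead quoted in the thread
constexpr std::size_t kOldPointerBytes = 4;  // plain function pointer on a 32-bit Teensy

// Extra RAM compared to a table of plain function pointers.
constexpr std::size_t extra_ram(std::size_t capture_bytes, std::size_t pins) {
    return (kOverheadBytes + capture_bytes - kOldPointerBytes) * pins;
}

static_assert(extra_ram(16, 60) == 1200, "16-byte capture, 60 pins");
static_assert(extra_ram(8, 60) == 720, "8-byte capture, 60 pins");
```

So going from 8 to 16 bytes of capture costs an additional 480 bytes across 60 pins under these assumptions.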
 
For me the most interesting use case is the integration of pin interrupts, timers, etc. in user classes. In this case you need to capture the "this" pointer. Having only 8 bytes of storage would leave only one int for captured variables :-/
 
Ok, looks like inplace_function is the way to go.

I've been testing with this:

Code:
#define SG14_INPLACE_FUNCTION_THROW(x) /* allow uninitialized no-op without error */

The idea is libraries can create uninitialized callbacks and not need to store extra info about whether it's actually initialized. The situation is like we have now, where the function pointers are initialized to an empty function, so calling through the pointer is always safe.
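The "situation like we have now" can be modelled as follows. This is a desktop sketch with made-up names (`isr_table`, `dummy_isr`, `attach`, `detach`, `fire_all`), not code copied from the core: every slot always points at something callable, so dispatch needs no "is it set?" check.

```cpp
#include <cstddef>

using isr_t = void (*)();

static void dummy_isr() {}  // does nothing, but is always safe to call

constexpr std::size_t kSlots = 4;  // arbitrary table size for the sketch
static isr_t isr_table[kSlots] = {dummy_isr, dummy_isr, dummy_isr, dummy_isr};

static int demo_hits = 0;  // only here so the effect can be observed
static void count_hit() { ++demo_hits; }

// A null pointer is replaced by the dummy so the table never holds nullptr.
inline void attach(std::size_t slot, isr_t fn) {
    if (slot < kSlots) isr_table[slot] = fn ? fn : dummy_isr;
}

inline void detach(std::size_t slot) {
    if (slot < kSlots) isr_table[slot] = dummy_isr;
}

// Dispatch calls every slot unconditionally.
inline void fire_all() {
    for (isr_t fn : isr_table) fn();
}
```

Defining the throw macro as empty aims for the same property with `inplace_function`: a default-constructed callback behaves like `dummy_isr`.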
 
Since `inplace_function` is more likely to be in the standard, its syntax, including sizing for captures, will be well documented. I think the minimum size should be chosen to accommodate the most common use case, and let the user choose bigger sizes as needed, because the `inplace_function` API explicitly allows that. Sure, a bit of documentation and examples may be required, but since most people are either used to no-capture lambdas (includes plain Arduino-style callback functions), or perhaps some functions that track some extra state (i.e. one pointer), including just enough space for the `this` pointer, is, in my view, sufficient as a starting place.
 
By any chance are you planning on enabling exceptions for C++17 or threading?
 
I'm afraid user-defined sizes won't work, since things like attachInterrupt need to store the function objects somewhere and therefore need the size to be predefined.
 
Benchmark for various C++ function implementations: https://github.com/user706/CxxFunctionBenchmark/tree/feature/sg14_inplace_function
Code:
[size]
stdex::function<int(int)>: 24
std::function<int(int)>: 64
cxx_function::function<int(int)>: 40
multifunction<int(int)>: 32
boost::function<int(int)>: 40
func::function<int(int)>: 32
generic::delegate<int(int)>: 48
fu2::function<int(int)>: 32
fixed_size_function<int(int)>: 128
embxx_util_StaticFunction: 64
Function_: 56

[function_pointer]
Perf< no_abstraction >: 0.1330217596 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 0.2122339664 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2399393114 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2386651034 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.2141137326 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2151225938 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.2121811386 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.2123649795 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2420536333 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2389506757 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2130780047 [s] {checksum: 0}
Perf< Function_ >:    0.2130749860 [s] {checksum: 0}

[compile_time_function_pointer]
Perf< no_abstraction >: 0.3972803010 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 1.5937800175 [s] {checksum: 0}
Perf< std::function<int(int)> >: 1.8513632305 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 1.6328985953 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 1.5960648980 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 1.5989849202 [s] {checksum: 0}
Perf< func::function<int(int)> >: 1.3389178922 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 1.6025210662 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 1.5986084840 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 1.8629920010 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 1.8321471718 [s] {checksum: 0}
Perf< Function_ >:    1.8688015562 [s] {checksum: 0}

[compile_time_delegate]
Perf< no_abstraction >: 0.0626496160 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 0.1589898226 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2116471244 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.1846861995 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.1856941551 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.1645116925 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1423058787 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.1793970741 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2156849841 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2176550105 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.1899209876 [s] {checksum: 0}
Perf< Function_ >:    0.1872478979 [s] {checksum: 0}

[heavy_functor]
Perf< stdex::function<int(int)> >: 1.5955586561 [s] {checksum: 0}
Perf< std::function<int(int)> >: 1.8375015018 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 1.6050371818 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 1.6027912430 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 1.6006808455 [s] {checksum: 0}
Perf< func::function<int(int)> >: 1.3345585372 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 1.6102707624 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 1.5997166615 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 1.8603795878 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 1.8277974767 [s] {checksum: 0}
Perf< Function_ >:    1.8649366700 [s] {checksum: 0}

[non_assignable]
Perf< stdex::function<int(int)> >: 0.1590927615 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2110050395 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.1838614811 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.1871413366 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.1631176407 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1413607129 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.2028013256 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2168885537 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2178153053 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.1893721816 [s] {checksum: 0}
Perf< Function_ >:    0.1874921136 [s] {checksum: 0}

[lambda_capture]
Perf< stdex::function<int(int)> >: 1.5930697092 [s] {checksum: 0}
Perf< std::function<int(int)> >: 1.9196068641 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 1.8563353887 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 1.6529472210 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 1.6056910397 [s] {checksum: 0}
Perf< func::function<int(int)> >: 1.5992049860 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 1.8260656285 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 1.8579377332 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 2.0478922288 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 1.8469540663 [s] {checksum: 0}
Perf< Function_ >:    1.7864459408 [s] {checksum: 0}

[stateless_lambda]
Perf< stdex::function<int(int)> >: 1.5926567463 [s] {checksum: 0}
Perf< std::function<int(int)> >: 1.8454776031 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 1.6209733850 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 1.6152552974 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 1.6308521949 [s] {checksum: 0}
Perf< func::function<int(int)> >: 1.3416845627 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 1.6141353468 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 1.6046740280 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 1.8767809781 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 1.8332145964 [s] {checksum: 0}
Perf< Function_ >:    1.8790269169 [s] {checksum: 0}
 
Ok, I've committed inplace_function and usage in IntervalTimer to the core library.

https://github.com/PaulStoffregen/cores/commit/a9d7ece998701d44bc498525afa659422b2d87a2

https://github.com/PaulStoffregen/cores/commit/673a4c0aac64a98bf88f6590dfbc16da81bfb0c6

More usage to come, but hopefully this gives a good starting point.

Can someone explain in two sentences:
Apart from replacing "void (*funct)()" by "callback funct" is there any other advantage, and what is different to "typedef void (*callback)() ;" that I used in the past?
and why should this be lightweight?
 
Would love to see example of expressions in use - assume like luni's TimerTool and EncoderTool.
> github.com/luni64/TeensyTimerTool/wiki

AFAIK it allows _isr() functions to be called with parameter values: github.com/luni64/TeensyTimerTool/wiki/Callbacks#functors-as-callback-objects
So rather than an _isr() having no incoming data, it can be given constant parameter values, or perhaps values resolved from variables, possibly from outside the context of the _isr(). IIRC some examples were initialized with expressions at some point.

The choice of method minimizes overhead and storage to be lightweight. Seems the path to that has been resolved for a first pass. But not sure of the extent of the parameter choices with the method under study.
 
Would love to see example of expressions in use

Yes, I think this might be a good idea. Probably the wiki would be a good place to collect those examples?

Here is something simple which would be complicated to write with the traditional function pointer interface:

Code:
#include "Arduino.h"

IntervalTimer t[4];

void callback(const char* text, int i)  // Callbacks with some input parameters
{
   Serial.printf("%s %d\n",text, i);
}

void setup()
{
    const char text[12] = "someText";  

    for (int i = 1; i < 4; i++)
    {
        // Set up the timers in one loop. The lambda captures copies of "i" and "text"
        // for later use (necessary because both are local variables).
        t[i].begin([i, text] { callback(text, i); }, i * 100'000);
    }
}

void loop()
{
}
 
Yes, I think this might be a good idea. Probably the wiki would be a good place to collect those examples?
...

So it stores known static values allowing the same shared _isr() code to be entered and present known unique behavior? Prior posts have seemed to use 'live' variables?
Doing 5 pin interrupts to a single _isr() wouldn't have to guess/calculate what pin caused it to be called. The same single copy of the _isr() code would be executed - but would be 'passed'/provided identifying values. The added RAM use of interest is just saving the passed data and not any duplicated code?

On creation it uses "//... capture the parameter" to save those 'identifier values' - so the example above the "char text[12]" could be edited between each iteration of the for() and that unique 'string' would be captured before the local is destroyed?

It doesn't help eliminate 'volatile globals' for shared data if they are captured/static?

If there was some shared data a pointer (to struct or anything) could be passed?
 
BTW: Your questions are more related to lambda expressions. Using lambdas as callbacks is just one of the possibilities you now have to attach callbacks. Anyway,

So it stores known static values allowing the same shared _isr() code to be entered and present known unique behavior? Prior posts have seemed to use 'live' variables?

You can choose in the capturing expression. If you change the lambda to
Code:
[&i, &text](){...}
it will capture references and you have "live" variables. Of course, in this example capturing "i" or "text" by reference would not be good, since they live on the stack and are only valid during setup.
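The difference between the two capture modes can be seen in a few lines of standard C++. The function names are made up for the sketch, and it runs on any desktop compiler.

```cpp
// Capture by value: the lambda keeps its own copy, made when it is created.
inline int read_after_change_by_value() {
    int i = 1;
    auto f = [i] { return i; };
    i = 2;       // does not affect the stored copy
    return f();
}

// Capture by reference: the lambda sees the variable's current value.
// This is only valid while "i" is still alive.
inline int read_after_change_by_reference() {
    int i = 1;
    auto f = [&i] { return i; };
    i = 2;       // visible through the reference
    return f();
}
```

The first function returns 1 and the second returns 2. A timer callback outlives `setup()`, so the by-reference form would read a dead stack variable there.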

Doing 5 pin interrupts to a single _isr() wouldn't have to guess/calculate what pin caused it to be called. The same single copy of the _isr() code would be executed - but would be 'passed'/provided identifying values. The added RAM use of interest is just saving the passed data and not any duplicated code?
Yes, that's true. As it is set now, you can capture up to 16 bytes. If you increase the size of the text variable to 13, you'll get a compiler error, since together with the 4 bytes for "i" it won't fit in the preallocated space. This limitation is the main difference from the usual std::function, which would allocate the required space dynamically and would have no such limit (but of course has the disadvantage of dynamic allocation).
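The size limit can be checked on a desktop compiler with `sizeof`. This is a sketch with made-up helper names; closure sizes are implementation-defined, so the results assume a common ABI with a 4-byte `int` (which is what GCC and Clang use on typical targets).

```cpp
#include <cstddef>

constexpr std::size_t kCapacity = 16;  // capture size mentioned in the post

// True if the callable would fit into kCapacity bytes of storage.
template <typename F>
constexpr bool fits_in_capacity(const F&) {
    return sizeof(F) <= kCapacity;
}

// 12 chars + 4-byte int = 16 bytes: fits exactly.
inline bool twelve_char_capture_fits() {
    const char text[12] = "someText";
    int i = 1;
    auto f = [i, text] { return text[0] + i; };
    return fits_in_capacity(f);
}

// 13 chars + 4-byte int is more than 16 bytes: too big.
inline bool thirteen_char_capture_fits() {
    const char text[13] = "someText";
    int i = 1;
    auto f = [i, text] { return text[0] + i; };
    return fits_in_capacity(f);
}
```

A fixed-size callback type turns the second case into a compile error instead of a run-time result.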

On creation it uses "//... capture the parameter" to save those 'identifier values' - so the example above the "char text[12]" could be edited between each iteration of the for() and that unique 'string' would be captured before the local is destroyed?
Yes, the same happens with "i"

It doesn't help eliminate 'volatile globals' for shared data if they are captured/static?

This is all just solving the problem of storing more general functions. It does not change anything about the fact that the IntervalTimer callbacks run in an interrupt context. So, if you want to access variables which are changed in an interrupt, you need to tell the compiler that it should always fetch the current value (i.e. use volatile).

If there was some shared data a pointer (to struct or anything) could be passed?

Sure, actually for me this is the most interesting use case. If you want to embed, say, an interval timer in a class, you would capture the "this" pointer in the constructor. Thus your callback has access to all class members without the usual static ISR workaround. I'll do an example later.
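A rough desktop-only sketch of that pattern is shown here. `FakeTimer` and `Blinker` are hypothetical, and `std::function` is used as a stand-in only so the code compiles anywhere; on a Teensy the timer would be an IntervalTimer storing the lambda in its fixed-size callback.

```cpp
#include <functional>
#include <utility>

// Desktop stand-in for a timer (not the IntervalTimer API).
class FakeTimer {
public:
    void begin(std::function<void()> cb) { callback_ = std::move(cb); }
    void simulateTick() {  // pretend the interrupt fired
        if (callback_) callback_();
    }

private:
    std::function<void()> callback_;
};

class Blinker {
public:
    Blinker() {
        // Capturing "this" gives the callback access to all members,
        // with no static trampoline function needed.
        timer_.begin([this] { toggle(); });
    }
    // The stored callback holds this object's address, so copying a
    // Blinker would leave a callback pointing at the wrong object.
    Blinker(const Blinker&) = delete;
    Blinker& operator=(const Blinker&) = delete;

    bool state() const { return state_; }
    FakeTimer& timer() { return timer_; }

private:
    void toggle() { state_ = !state_; }
    FakeTimer timer_;
    bool state_ = false;
};
```

The `[this]` capture is a single pointer, so it fits even in the smallest capture size discussed above.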
 
BTW: Your questions are more related to lambda expressions. Using lambdas as callbacks is just one of the possibilities you now have to attach callbacks. Anyway,
...

Cool, with one 'captured' example I wasn't sure if lambda expressions were supported in the chosen implementation. So indeed, creation of exemplary samples will be of great value to understand proper use and any limitations.
 
and why should this be lightweight?

Originally when I started this thread with "lightweight" in the title, I had imagined we would increase from 4 to 8 bytes as Delegate does. I still like a lot of things about Delegate, but after spending a *lot* of time, I'm convinced (limited size) capturing lambda support is a valuable feature. Like so many features, not everyone will use it, but I'm pretty sure some people will use it to very good effect in libraries many others will enjoy. This is about "software ecosystem" over the long term.

So we've gone from 4 to 8 to 8+capture, which means 16 or 24 bytes in most cases. Each library or API offering callbacks can choose its own capture size. For IntervalTimer where we allocate only 4 callbacks, 24 bytes makes sense. For attachInterrupt where memory is needed for every pin, 8 or maybe even only 4 bytes might make sense. We're still in the early phase of adopting this.

The main meaning of "lightweight" is an absolute guarantee we never call malloc() or otherwise allocate heap. The size checking is done at compile time. Lightweight could also be taken to mean the added CPU overhead for calling the function is only single-digit cycles. But avoiding heap allocation is the main reason to go to all this trouble rather than just use std::function.
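To make the "no heap, size checked at compile time" idea concrete, here is a stripped-down sketch. `TinyCallback` is a made-up name and this is not the SG14 `inplace_function` implementation; it only supports `void()` callables that are trivially copyable, which keeps it short.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Minimal fixed-capacity callback for void() callables (illustration only).
template <std::size_t Capacity>
class TinyCallback {
public:
    TinyCallback() = default;  // empty: calling it is a no-op

    template <typename F>
    TinyCallback(F f) {
        // Oversized captures are rejected at compile time, never at run time.
        static_assert(sizeof(F) <= Capacity, "callable does not fit in the storage");
        static_assert(alignof(F) <= alignof(std::max_align_t), "over-aligned callable");
        // Sketch limitation: byte-wise copies, and no destructor call is needed.
        static_assert(std::is_trivially_copyable<F>::value &&
                          std::is_trivially_destructible<F>::value,
                      "sketch supports only trivially copyable callables");
        new (storage_) F(std::move(f));  // placement new into the member buffer
        invoke_ = [](void* p) { (*static_cast<F*>(p))(); };
    }

    void operator()() {
        if (invoke_) invoke_(storage_);
    }
    explicit operator bool() const { return invoke_ != nullptr; }

private:
    alignas(std::max_align_t) unsigned char storage_[Capacity] = {};
    void (*invoke_)(void*) = nullptr;
};
```

The object is just a byte buffer plus one function pointer, and the call is a single indirect jump. Nothing in it can reach `malloc()`.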


is there any other advantage

Yes, of course. We're not going to use 12 or 20 extra bytes for no extra features!
 