Lightweight Teensy3.x Fibers Library (C++) Available

Status
Not open for further replies.
I'm late to the party here (sorry, but been busy).

The fibres library should work just fine with malloc/free/printf et al., assuming the stack and heap are properly set up. Because the context switch only happens when yield() is called, you'll never interrupt printf- or malloc-type functions. The only thing left that you have to be careful about is what your interrupt service routines are doing. Those can occur at any time (when not blocked) and should thus not be doing anything with malloc/free etc.
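To illustrate the ISR point, here is a minimal sketch of one common way to keep heap calls out of interrupt context: the ISR only records that something happened, and a fiber does the allocation later. All names here (g_pending_events, on_event_isr, take_event_buffer) are made up for the example and are not part of either library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical names throughout; not part of Fibers or Zilch.
volatile uint32_t g_pending_events = 0;   // written by the ISR, consumed by a fiber

// ISR side: only records that an event happened. No malloc/free/printf here.
void on_event_isr() {
    g_pending_events = g_pending_events + 1;
}

// Fiber side: call this between yield() calls. Returns a freshly allocated
// buffer if an event is pending, or nullptr if there is nothing to do.
uint8_t *take_event_buffer(size_t bytes) {
    assert(bytes > 0);
    if (g_pending_events == 0)
        return nullptr;
    // On real hardware this read-modify-write should run with interrupts
    // briefly disabled, since the ISR may update the counter at any time.
    g_pending_events = g_pending_events - 1;
    return static_cast<uint8_t *>(std::malloc(bytes));
}
```

The fiber owns the returned buffer and frees it when done. Because only fibers touch the heap and fibers only switch at yield(), no allocator call is ever interrupted by another allocator call.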

Warren.

I thought so too, that's why I was a bit confused by the warning about malloc, printf and such. Nonetheless, as soon as I used printf in a Zilch fiber, the application stopped or hung. I didn't try it with the fibers library (the one this thread is actually about).
 
I don't have the time at the moment to look more deeply at this, but I suspect printf()/foo() has exhausted your available stack space and overrun it, thus corrupting another fiber's stack frame (or even the heap).

If things work with a simple bar() call (with low stack frame needs), then it should work for printf() as well, given enough stack. I would look at the printf() source code to estimate how much space it needs. This may require some guesswork because it probably calls other functions to complete its work.

IIRC, I also put some stack sentinel capability in the fibers library so you could run a test to determine how much stack was needed. Of course, you'd have to make the output available through some external means (how is your blinking LED morse code?)
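For readers unfamiliar with the idea, a stack sentinel test can be sketched roughly as follows. This is a generic illustration, not the library's actual instrumentation code: the pattern value and both function names are assumptions, and it presumes a descending stack as on Cortex-M.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel word; any value unlikely to occur by chance will do.
constexpr uint32_t kStackPattern = 0xA5A5A5A5u;

// Fill a stack region with the sentinel before the fiber first runs.
// `base` is the lowest address of the region, `words` its length in 32-bit words.
void fill_stack_pattern(volatile uint32_t *base, size_t words) {
    assert(base != nullptr);
    for (size_t i = 0; i < words; ++i)
        base[i] = kStackPattern;
}

// Estimate peak usage in bytes for a descending stack: scan upwards from the
// lowest address and count the words that still hold the sentinel.
size_t stack_bytes_used(const volatile uint32_t *base, size_t words) {
    assert(base != nullptr);
    size_t untouched = 0;
    while (untouched < words && base[untouched] == kStackPattern)
        ++untouched;
    return (words - untouched) * sizeof(uint32_t);
}
```

The result is only a lower bound: a large local buffer that is reserved but never written leaves its sentinel words intact, and the deepest word actually written may sit below such a gap or not, depending on the call pattern. It is best treated as a guide for choosing a size with some margin.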

wwg

PS: Apologies re Zilch hijack. I assumed because I started this thread with my own version of the fibers library.
 
Some tests

This one works flawlessly. No printf, no malloc:
Code:
#include <zilch.h>

Zilch task(1000);// main task
/*******************************************************************/
void setup() {
  delay(2000);
  pinMode(LED_BUILTIN ,OUTPUT);
  task.create(task1, 1000, 0);
  task.create(task2, 1000, 0);

}

// main thread 
void loop() {
  static uint32_t count = 0;
  Serial.printf("loop #%d\n", count);
  count++;
  delay(1000);
}

/*******************************************************************/
// First task (coroutine)
static void task1(void *arg) {
  while ( 1 ) {
    Serial.println("task1");
    delay(1000);
  }
}

/*******************************************************************/
// 2nd task (coroutine)
static void task2(void *arg) {
  while ( 1 ) {
    Serial.println("task2");
    delay(1000);
  }
}

Now let's sneak malloc into the task and hope that the volatile keyword outsmarts the compiler (I'm a bit paranoid that the compiler would optimize the array away. They do arcane stuff sometimes). Task2 remains as it is above. No problems here, it just runs:
Code:
static void task1(void *arg) {
  volatile int* pInt = (volatile int*)malloc(1000*sizeof(int));
  for (int i = 0; i < 1000; i++)
  {
    pInt[i] = 0;
  }
  while ( 1 ) {
    Serial.print("task1 [0] = "); Serial.println(pInt[0]);
    for (int i = 0; i < 1000; i++)
    {
      pInt[i]++;
    }
    delay(1000);
  }
}

Putting too much stuff on the stack instead of allocating on the heap:
Code:
static void task1(void *arg) {
  volatile int pInt[1000];
  for (int i = 0; i < 1000; i++)
  {
    pInt[i] = 0;
  }
  while ( 1 ) {
    Serial.print("task1 [0] = "); Serial.println(pInt[0]);
    for (int i = 0; i < 1000; i++)
    {
      pInt[i]++;
    }
    delay(1000);
  }
}
This crashes. Output:
Code:
loop #0
task1 [0] = 0
task2
loop #1
task1 [0] = 1  // <== actual last line

So now that we have the gut feeling that malloc isn't that bad after all, how about printf()? Here we go:
Code:
static void task1(void *arg) {
  volatile int* pInt = (volatile int*)malloc(1000*sizeof(int));
  for (int i = 0; i < 1000; i++)
  {
    pInt[i] = 0;
  }
  while ( 1 ) {
    Serial.printf("task1 (printf) [0] = %d\n", pInt[0]);
    for (int i = 0; i < 1000; i++)
    {
      pInt[i]++;
    }
    delay(1000);
  }
}
This crashes as well. Output:
Code:
loop #0
task1 (printf) [0] = 0
task2
loop #1
task1 (printf) [0] = 1  // <== actual last line
Enlarging the stack solves this problem. I've tried
  • 10000 bytes(no crash),
  • divide and conquer... (no crashes),
  • 1250 bytes(no crash),
  • 1024 bytes(crash),
  • 1100 bytes(no crash),
It might be that printf allocates a 1K buffer or something like that on the stack. This is in the same ballpark as the numbers stated here (found by a quick google search):
https://mcuoneclipse.com/2013/04/19/why-i-dont-like-printf/
To be honest I really don't want to sift through the sources of printf and whatever might be beyond that unless I really have to... Allocating 2K for a task seems to be reasonable when I know that I might be using printf. On a Teensy 3.6 that's not too much.

So both malloc and printf seem to be fine *if* the task stack is large enough and we don't let the heap crash into our stacks. But the second point is true for all applications (and yes I know we usually don't need malloc in embedded applications anyway. Just let me have my candy!)

Additional thoughts anyone, or am I off track here?
 
Oh btw my blinky morse code is quite advanced but in this case I'm using SPI and SCK0 is the builtin LED pin. I could move SCK0 to the adjacent pin (14), but that one's occupied by a control signal. Realizing this actually cost me two hours yesterday because I tried to morse debug messages on the LED while writing code that uses the SPI.
 
Returning from a task
I've tried the naive approach below to see what happens when a task is simply finished with its work, and there seems to be more to it than I expected. Just returning crashes the app:
Code:
#include <zilch.h>

Zilch task(1000);// main task
/*******************************************************************/
void setup() {
  delay(2000);
  pinMode(LED_BUILTIN ,OUTPUT);
  task.create(task1, 1100, 0);
  task.create(task2, 1000, 0);

}

// main thread 
void loop() {
  static uint32_t count = 0;
  Serial.printf("loop #%d\n", count);
  count++;
  delay(1000);
}

/*******************************************************************/
// First task (coroutine)
static void task1(void *arg) {
  volatile int* pInt = (volatile int*)malloc(1000*sizeof(int));
  for (int i = 0; i < 1000; i++)
  {
    pInt[i] = 0;
  }
  while ( pInt[0] < 5 ) {
    Serial.printf("task1 (printf) [0] = %d\n", pInt[0]);
    for (int i = 0; i < 1000; i++)
    {
      pInt[i]++;
    }
    delay(1000);
  }
  yield();
}

/*******************************************************************/
// 2nd task (coroutine)
static void task2(void *arg) {
  while ( 1 ) {
    Serial.println("task2");
    delay(1000);
  }
}
Output:
Code:
loop #0
task1 (printf) [0] = 0
task2
<-- snip -->
loop #4
task1 (printf) [0] = 4
task2
loop #5
task2
loop #6
I know that I can sync with a task, but that requires knowledge about which task will finish first, doesn't it?
 
1024 bytes(crash),
1100 bytes(no crash),

Heh, heh, I thought so. Printf() does a fair bit of work. On normal platforms, one doesn't worry too much about buffer sizes on the stack or the heap. But for embedded, it can be quite a different story.

As for the crash from a thread returning in Zilch, I can't help. I know that my fibres library was tested as well behaved in this area.

wwg
 
Returning from a task
I've tried the naive approach below to see what happens when a task is simply finished with its work, and there seems to be more to it than I expected. Just returning crashes the app:

I know that I can sync with a task, but that requires knowledge about which task will finish first, doesn't it?
Increase the stack size of the main task, i.e. Zilch task(2000); // main task
It seems to work then, but unless you really know what printf is doing I would recommend not using it. I have not investigated printf and just avoid it. I have had problems with printf before in a program without fibers or threads (big loop) that used the Audio Library.
 
About syncing/joining

TaskState Zilch::sync( task_func_t task ) takes a function pointer
whereas
FiberState Fibers::join(uint32_t fiberx) takes an index to a fiber

Can I start the same function twice as a task? If that is the case, I can only sync with one of them when using Zilch. In that case, the fiber library allows me to do more. If both don't support two fibers running the same function, then the signature of Zilch::sync correctly implies that such a limitation exists. So the bottom line is:

Can I run the same function twice in two fibers, either with the Zilch lib or with the Fibers lib?
 
TaskState Zilch::sync( task_func_t task ) takes a function pointer
This is for when two tasks get out of sync, not for joining a new task.

Can I run the same function twice in two fibers, either with the Zilch lib or with the Fibers lib?
yes you can, but why?
Do you mean this?
Code:
  task.create(task1, 1000, 0);
  task.create(task1, 1000, 0);

You can do whatever you want with them but it gets pretty ugly quick when you have lots of tasks.

I find it best that each task is used for a specific purpose, like monitoring a serial port. Here is something I do all the time if I don't want the delays of the loop impeding the USB receive of packets.
Code:
#include <zilch.h>


Zilch task(1000);// main task
/*******************************************************************/
void setup() {
  while (!Serial);
  delay(100);
  pinMode(LED_BUILTIN , OUTPUT);
  task.create(usb_receive_task, 1000, 0);
}


// main thread
void loop() {
  Serial.println("doing loop stuff with lots of delays");
  delay(random(10, 1000));
}
/*******************************************************************/
// Usb task
static void usb_receive_task(void *arg) {
  int idx = 0;
  while ( 1 ) {
    if (Serial.available()) {
      char n = Serial.read();
      Serial.print(n);
    }
    yield();
  }
}
 
Worker threads come to mind, but as we don't get a handle back when creating a task it's hard to manage them.

Relating to your example: monitoring two serial ports or more doesn't sound that exotic to me, and the tasks would simply get a pointer to different port objects
 
Looking at the fibers library again:

First of all, I'd like to get rid of the linker script modifications. The easiest way of doing that seems to be to remove all the instrumentation code. However, I'd also like to understand how that stack space measurement works in order to get it to work with the original linker script.

Another thing that strikes me is that when fibers have returned, they are basically available for reuse in Fibers::create(), but this opportunity is not taken. So instead of giving up at
Code:
if ( n_fibers >= max_fibers )
  return 0;
the library could call Fibers::restart() on a fiber that has returned. However, I'm not sure how that approach might interfere with Fibers::join(), because that relies on a Fiber being marked as FiberState::FiberReturned. Is there a scenario where Fibers::join() might not work as expected because the fiber was already restarted?

Neither Fibers::restart() nor the underlying fiber_restart() check if the fiber is still running. Is that safe?
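To make the reuse idea concrete, here is a rough sketch of the slot bookkeeping only. SlotTable, SlotState and the -1 failure value are stand-ins invented for this example; the real Fibers class is laid out differently and also has to set up stacks and contexts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the library's types; the real Fibers class differs.
enum class SlotState : uint8_t { Unused, Running, Returned };

template <unsigned max_slots>
class SlotTable {
    SlotState state[max_slots] = {};   // value-initialised to Unused
    unsigned n_slots = 0;              // slots handed out so far
public:
    // Returns a slot index, or -1 if every slot is still running.
    int acquire() {
        if (n_slots < max_slots) {     // fresh slot available
            state[n_slots] = SlotState::Running;
            return static_cast<int>(n_slots++);
        }
        for (unsigned i = 0; i < max_slots; ++i) {
            if (state[i] == SlotState::Returned) {   // finished fiber: recycle it
                state[i] = SlotState::Running;
                return static_cast<int>(i);
            }
        }
        return -1;
    }
    void mark_returned(unsigned i) {
        assert(i < max_slots);
        state[i] = SlotState::Returned;
    }
    SlotState get(unsigned i) const { return state[i]; }
};
```

The sketch also shows why the join() question matters: once a slot is recycled its state flips back to Running, so a fiber that only later gets around to joining on that index would wait for the new occupant rather than seeing the old one as returned.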
 
Looking at the fibers library again:

First of all, I'd like to get rid of the linker script modifications. The easiest way of doing that seems to be to remove all the instrumentation code. However, I'd also like to understand how that stack space measurement works in order to get it to work with the original linker script.

What does this section of your current script look like?

Code:
	.stack :
	{
	    . = ALIGN(4);
...
	} >RAM

	_estack = ORIGIN(RAM) + 64 * 1024;

The symbol _minimum_stack_size is not needed by the fibers code, unless I missed it. What is needed is the _estack symbol, so that the library knows where the stack ends. Can you quote me what your current linker script provides here? Let's start there and work one issue out.
 
That would be
Code:
  .stack : {
    . = ALIGN(4);
    _sstack = .;
    . = . + _minimum_stack_size;
    . = ALIGN(4);
  } >RAM

  _estack = ORIGIN(RAM) + LENGTH(RAM);
(actually this won't work because minimum_stack_size is undefined, but it's what's suggested in fibers.h)
 
Ah, I see. Basically, having it that way is the safest because the linker script is the only place where the whole memory layout is known. However, you could assume or override what is in there by using something like (in fiberslc.h):

Code:
template <unsigned max_fibers>
uint32_t *
Fibers<max_fibers>::end_stack() {
    const uint32_t my_stack_size = <whatever>; // Your chosen stack size, in bytes

    // Do the subtraction in bytes, then cast back to the declared return type
    return (uint32_t *)((char *)(&_estack) - my_stack_size);
}

Even if my_stack_size is specified larger than actual, you should be ok if you avoid using instrumentation and your code doesn't overrun the stack. The start of stack is only needed so that the stack can be written with sentinel values for later testing (instrumentation). The value of _estack should always be known.

So with that small change above, you should be golden.
 
I tried something similar:
Code:
	if ( instrument ) {
		extern uint32_t _estack;
//		volatile uint32_t *ep = end_stack();
		// upper end - main_stack (bytes) rounded up to whole 32-bit words;
		// subtracting from a uint32_t pointer counts in words, not bytes
		volatile uint32_t *ep = &_estack - (main_stack + 4 - 1) / 4;
		uint32_t *sp = fiber_getsp();

		while ( --sp > ep )
			*sp = pattern;	// Fill stack with pattern
	}
that makes sense because it takes the actual stack size given as a parameter into account, BUT fiber_getsp() is undefined and indeed it's nowhere to be found in the .h and .cpp file. Something is wrong here. Also, the file "fiberslc.h" you mentioned above doesn't exist in the repo.
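One detail worth checking in any expression like the ep computation above: subtracting an integer from a uint32_t pointer moves in 4-byte steps, so a stack size given in bytes has to be converted to words first. A small sketch of that conversion, with helper names that are assumptions and not library functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round a byte count up to a whole number of 32-bit words.
constexpr size_t bytes_to_words(size_t bytes) {
    return (bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}

// Lowest address of a descending stack of `stack_bytes` bytes that ends at `top`.
// The offset is passed to the pointer subtraction in words, not bytes.
inline uint32_t *stack_low_end(uint32_t *top, size_t stack_bytes) {
    assert(top != nullptr);
    return top - bytes_to_words(stack_bytes);
}
```

With a 1000-byte stack this lands 250 words (1000 bytes) below the top. Passing the byte count straight into the pointer subtraction would land 4000 bytes below it instead.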
 
I have to say, I am stumped. The name fiber_getsp() certainly sounds like it should have come from my source code and nowhere else. Yet, I know I have used this in the past and it worked. I am doing an exhaustive find on my Mac, but only the references are coming up. Was this something the avr gcc provided before? I can't explain it. If I had the time right now, I could figure out exactly what is needed (is it just the current SP?)

However, the only thing that is in your way really is this instrumentation business. I suggest you just neuter it and carry on:

Code:
#if 0
        if ( instrument ) {
                volatile uint32_t *ep = end_stack();
                uint32_t *sp = fiber_getsp();

                while ( --sp > ep )
                        *sp = pattern;  // Fill stack with pattern
        }
#endif

Obviously instrumentation won't work after the change, but after a recompile there should be no references to the offending symbol.
Hopefully, that addresses your immediate need. After I finish with my book, I need to acquire a recent Teensy module (which looks exciting) and come back to this.

wwg

PS: I was looking at the teensylc version of the library (which has teensylc.h). Both projects lack the fiber_getsp() from what I can see.
 
I have never seen anything called "fiber_*" being provided by avr-gcc.

So for the fibers library I'll just remove instrumentation and carry on with my experiments
 
Like I said, I can't explain it. It used to link ok, but doesn't now. Obviously it was either generated by the compiler or buried somewhere else 3 years ago. Others in the past have used it also and so that remains a complete mystery.
 
well for now the fibers library seems not to work at all, even with instrumentation commented out. I'd prefer its api over that of Zilch, though, so I'll try to look into it.
 
If I understand correctly, you are using this for the Teensy 3.6, which I haven't tested it on. Someone worked with me to develop and test a port for the TeensyLC, which you might try. There were some differences in the saving/restoring of context as I recall. Beyond that, there is not much else I can do for you at the present time. Be sure to apply the same patch to the instrumentation to work around linking issues.

https://github.com/ve3wwg/teensylc_fibers
 
Warren and Christoph,

I've been testing with the Fibers and Zilch libraries and following your discussion. I find the Fibers example builds and runs on T3.2, but won't build on T3.5 due to missing "fiber_getsp" and "_sstack". I've also searched and have been unable to find "fiber_getsp" anywhere, so I don't know how that is being resolved for T3.2. My fix has been to delete the instrumentation, which I think is useful but not necessary. With that small change, Fibers builds and runs for T3.2, and at least builds for T3.5. I can run it tomorrow.

I have also made one small change to what you call the "rounding" of stack size to word boundary.

stack_size = ( stack_size + sizeof (uint32_t) ) / sizeof (uint32_t) * sizeof (uint32_t);

When stack_size is on a word boundary, your code increases the size of the stack by 4 bytes, which I don't think you intended. I think it would be better to use this code, which rounds stack_size down if it's not on a uint32_t boundary:

stack_size = ( stack_size / sizeof (uint32_t) ) * sizeof (uint32_t);

I'm working on a set of mailbox, queue, semaphore, etc. functions as an optional extension to fibers. Fibers can pend on mailboxes, etc. with a timeout, and ISRs can signal fibers via mailboxes, etc.

Zilch has support for both 3.x and LC in one set of files, which would be useful to duplicate in Fibers. While Zilch does build for both T3.2 and T3.5, I prefer the simpler API of Fibers. In my own experience with cooperative executives that include mailboxes, etc., I've always been able to get by without ending, pausing, or resuming tasks. That can all be done by having them pend on signals.

Joe
 
Thanks for the summary of portability. As for the rounding, I should have provided:

Code:
stack_size = ( stack_size + sizeof (uint32_t) - 1 ) / sizeof (uint32_t) * sizeof (uint32_t);

That way, if the specification is 1 to 3 bytes short of alignment, it would bump the size up by the necessary difference. Good catch.
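For comparison, the three expressions discussed above can be checked side by side. This is just a small compile-time sketch; the function names are made up for the example.

```cpp
#include <cstdint>

constexpr uint32_t kWord = sizeof(uint32_t);

// Original expression: adds a full word before truncating, so aligned sizes grow.
constexpr uint32_t round_original(uint32_t n) { return (n + kWord) / kWord * kWord; }
// Joe's variant: rounds down to the previous word boundary.
constexpr uint32_t round_down(uint32_t n) { return n / kWord * kWord; }
// Warren's correction: rounds up, leaving aligned sizes unchanged.
constexpr uint32_t round_up(uint32_t n) { return (n + kWord - 1) / kWord * kWord; }

static_assert(round_original(1000) == 1004, "aligned size grows by one word");
static_assert(round_down(1001) == 1000, "unaligned size shrinks");
static_assert(round_up(1000) == 1000, "aligned size is unchanged");
static_assert(round_up(1001) == 1004, "unaligned size is bumped up");
```

Rounding up never hands a fiber less stack than was asked for, which seems the safer default when the requested size was itself chosen with little margin.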

It would be real nice to find out how fiber_getsp() is resolved. This would perhaps require the use of the toolchain's nm command (arm-none-eabi-nm for the Teensy 3.x) to list out symbols in all of the participating object files (starting with the application). However, I'll have to return to this next year sometime. I'm simply swamped at the moment.
 