Zilch cooperative multi-tasking for Teensy 3.x/LC

Status
Not open for further replies.

joepasquariello

Well-known member
I'm using Zilch, and I have a suggestion for the CPP file. task_create() currently has a different signature for KINETISK/L. Both can use the same signature as shown below, which allows much of task_create() to be common. I have tested for KINETISK on Teensy 3.2 and 3.5. I can test on Teensy LC tomorrow, probably.

(1) In the extern "C" block, eliminate the KINETISK/L differences and have one declaration for task_create()

Code:
TaskState  task_create ( volatile stack_frame_t *frame, task_func_t func, size_t stack_size, void *arg );

(2) This eliminates KINETISK/L differences in Zilch::create()

Code:
TaskState Zilch::create( task_func_t task, size_t stack_size, void *arg ) {
    // Round stack size to a word multiple
    int s_size = ( stack_size + sizeof ( uint32_t ) - 1 ) / sizeof ( uint32_t ) * sizeof ( uint32_t );
    if ( num_task+1 >= MAX_TASKS ) return TaskInvalid;
    TaskState p = task_create( &process_tasks[++num_task], task, s_size, arg );
    return p;
}

(3) And allows much of task_create() to be common for KINETISK/L:

Code:
TaskState task_create( volatile stack_frame_t *frame, task_func_t func, size_t stack_size, void *arg ) {
#if defined(KINETISL)
    asm volatile("push {r0,r1,r2,r3}\n");
    asm volatile("stmia r0!,{r4,r5,r6,r7}\n");	// Save lower regs
    asm volatile("mov r1,r8\n");
    asm volatile("mov r2,r9\n");
    asm volatile("mov r3,sl\n");
    asm volatile("stmia r0!,{r1,r2,r3}\n");		// Save r8,r9 & sl
    asm volatile("mov r1,fp\n");
    asm volatile("mov r3,lr\n");
    asm volatile("stmia r0!,{r1,r2,r3}\n");		// Save fp,(placeholder for sp) & lr
    asm volatile("pop {r0,r1,r2,r3}\n");		// Restore regs
    frame->r7 = (uint32_t)frame;				// Overwrite saved r7 with frame ptr
#elif defined(KINETISK)
    asm volatile( "STMEA %0,{r1-r11}\n" : "+r" ( frame ) :: "memory" );// Save r1-r11 to task struct
    frame->r12 = (uint32_t)frame;				// r12 points to struct
#endif
    frame->stack_size = stack_size;				// Save task size
    frame->func_ptr   = func;           		// Save task function
    frame->arg        = arg;            		// Save task arg
    frame->sp         = stackroot;              // Save as task's sp
    frame->lr         = (void*)task_start;		// Task startup code
    frame->state      = TaskCreated;		    // Set state of this task
    frame->initial_sp = stackroot;		        // Save sp for restart()
    stackroot        -= stack_size;             // This is the new root of the stack
    
    int address          = (1 << num_task);    // get task address
    frame->address  |= address;                // set task address
    task_mask         |= (1 << num_task);    // task swap mask
    init_mask            = task_mask;            // num of tasks
    return TaskCreated;
}
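
The word-rounding in Zilch::create() above can be checked on its own on a host machine. This is only a sketch: the helper names round_up_to_word and round_up_to_word_mask are mine and are not part of Zilch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in Zilch): round a byte count up to the next
// multiple of sizeof(uint32_t), using the same arithmetic as Zilch::create().
inline std::size_t round_up_to_word(std::size_t bytes) {
    const std::size_t word = sizeof(std::uint32_t);
    assert(bytes <= SIZE_MAX - (word - 1));  // the addition must not wrap
    return (bytes + word - 1) / word * word;
}

// Equivalent mask form; valid only because the word size is a power of two.
inline std::size_t round_up_to_word_mask(std::size_t bytes) {
    const std::size_t word = sizeof(std::uint32_t);
    assert(bytes <= SIZE_MAX - (word - 1));
    return (bytes + word - 1) & ~(word - 1);
}
```

A request for 250 bytes becomes 252, and a request that is already a multiple of four is left unchanged.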

Also, should the setting of frame->address be using "|=", or should it be like this?

Code:
    int address         = (1 << num_task);	// task address
    frame->address   = address;              // save address in frame
    task_mask        |= address;	        // add address to task mask
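
To make the difference concrete: the two forms only disagree when frame->address is non-zero before the call, for example if a frame slot is reused or was never zeroed. Here is a small host-side sketch; frame_t is a stand-in I made up with just the one field, not the real stack_frame_t.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the one field of stack_frame_t that matters here
// (assumption: the real struct has many more members).
struct frame_t {
    std::uint32_t address;
};

// Current code: OR the task bit into whatever is already in the frame.
inline void set_address_or(frame_t &f, int num_task) {
    assert(num_task >= 0 && num_task < 32);
    f.address |= (1u << num_task);
}

// Suggested code: overwrite, so the frame always holds exactly one bit.
inline void set_address_assign(frame_t &f, int num_task) {
    assert(num_task >= 0 && num_task < 32);
    f.address = (1u << num_task);
}
```

With a zeroed frame both give 0x8 for task 3. With a stale 0x2 left in the frame, "|=" gives 0xA, which names two tasks, while "=" still gives 0x8.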
 
I'll take a look at this tonight. I might be able to do that, but the LC is a different beast from the 3.x boards. Last night I worked on some optimizations to the context switch code (3.x) that might pay some dividends on the Teensy LC side of things.


Also, should the setting of frame->address be using "|=", or should it be like this?

Code:
    int address         = (1 << num_task);    // task address
    frame->address   = address;              // save address in frame
    task_mask        |= address;            // add address to task mask
I'll take a look at this; you might be right. I need to go through all the code again to see what I was thinking. Guess I need better comments ;)
 
I'll take a look at this tonight. I might be able to do that, but the LC is a different beast from the 3.x boards. Last night I worked on some optimizations to the context switch code (3.x) that might pay some dividends on the Teensy LC side of things.

Thanks for taking a look. Just to be clear, I didn't touch any of the asm() statements in task_create(). With the "frame" argument available for both KINETISK/L (instead of "frame" and "p"), it became clear that all of the non-assembler statements were identical for KINETISK/L except for the setting of the "r7" for KINETISL and "r12" for KINETISK. So, I moved that non-assembler statement to be just below the last asm() statement, and the remainder of the function became common to KINETISK/L.

One thing that I had trouble figuring out was that each task, when it gets swapped in the first time, begins at task_start(), and within task_start(), the task function is called with the specified argument. I don't understand why/how you are using interrupts. Is that necessary to do the task management functions?
 
Zilch doesn't save the high registers on LC. Are you simply assuming they are not used?
 
I don't understand why/how you are using interrupts. Is that necessary to do the task management functions?
That's so no other task could possibly run while doing any administrative stuff. These tasks run at the lowest priority level, or I guess you could say no priority.

The context switch runs in Thread mode, which means it won't interfere with any libraries or code that use interrupts (Handler mode).

One thing that I had trouble figuring out was that each task, when it gets swapped in the first time, begins at task_start(), and within task_start(), the task function is called with the specified argument.
You can pass an argument when you create the task, and it is handed to that task's static function when the task starts. I've used it a few times, but for most applications I assume a global variable would suffice.
 
Zilch doesn't save the high registers on LC. Are you simply assuming they are not used?
No, I'm not assuming that. Many of the Thumb instructions can only access the low registers, so look at the task_swap code: it moves the high registers to the low ones.
 
Also, should the setting of frame->address be using "|=", or should it be like this?

Code:
    int address         = (1 << num_task);    // task address
    frame->address   = address;              // save address in frame
    task_mask        |= address;            // add address to task mask
I'll fix this tonight...
 
No, I'm not assuming that. Many of the Thumb instructions can only access the low registers, so look at the task_swap code: it moves the high registers to the low ones.
Cortex M0 has R0 - R12, just like M4. R10 - R12 aren't saved by task_swap.
 
Cortex M0 has R0 - R12, just like M4. R10 - R12 aren't saved by task_swap.
Oh r12 is not getting saved, but r10-11 are getting saved -> r10 = sl, r11 = fp (gcc). I haven't looked at the LC stuff in a year or so, so maybe I missed something, but it works without saving r12.

If you see a better way, please send a pull request. I'm not really focused on the LC at the moment, but I will be once I get all the M4 Teensys optimized.
 
Oh r12 is not getting saved, but r10-11 are getting saved -> r10 = sl, r11 = fp (gcc).
Ah, ok I missed those aliases.
I haven't looked at the LC stuff in a year or so, so maybe I missed something, but it works without saving r12.
Looking at the ARM calling convention, it seems to be ok not to save R12.
 
I got my Teensy LC today, and when I tried to build Zilch with the suggested changes that started this thread, an error occurs in task_create() that probably explains why you did what you did. I don't know why yet, but with the LC, the "frame" argument can't be passed into the function as I've shown. I'll let you know if I figure out why and get it to work.
 
I got my Teensy LC today, and when I tried to build Zilch with the suggested changes that started this thread, an error occurs in task_create() that probably explains why you did what you did. I don't know why yet, but with the LC, the "frame" argument can't be passed into the function as I've shown. I'll let you know if I figure out why and get it to work.
Yep, that's why it had to be done that way. I'm looking into a lot of cruft that needs to be fixed or thrown out right now, and once I have the 3.x working right I'll be delving into the LC stuff.

I have found another issue with the way I was setting up the stacks. I was assigning each task's initial stack pointer backwards. This meant that task1 was using main's stack space, task2 would use task1's stack space, and so on. This might be a problem with the fibers library too; I haven't looked yet. So I'm working on a new approach where you pass the "create" function a 32-bit array that will become that task's stack space.
 
I was assigning each task's initial stack pointer backwards.

I was trying to understand how stackroot was working. Your init was stackroot=0, and then task_create() was doing stackroot -= stack_size. I assumed all the stacks were in the heap, but I couldn't make sense of the layout. Passing in the address of the stack makes sense. Let me know when you want someone else to try it.
 
I was trying to understand how stackroot was working. Your init was stackroot=0, and then task_create() was doing stackroot -= stack_size. I assumed all the stacks were in the heap, but I couldn't make sense of the layout. Passing in the address of the stack makes sense. Let me know when you want someone else to try it.
If you look in the "init_stack" function you'll see where I store the MSP stack pointer in "stackroot" for later use; it's in the asm code. "stackroot" will now refer only to the main task's stack space (loop). The other tasks will use a 32-bit array that you declare in the sketch.
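
For anyone following along, here is a host-side model of the old bookkeeping as I read it from the task_create() posted earlier in the thread (frame->sp = stackroot, then stackroot -= stack_size). It is only a sketch: StackModel is a made-up name, the addresses are plain integers, and nothing here touches a real stack.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of the original task_create() bookkeeping:
//     frame->sp  = stackroot;
//     stackroot -= stack_size;
struct StackModel {
    std::uint32_t stackroot;                // starts at the saved main stack pointer
    std::vector<std::uint32_t> initial_sp;  // one entry per created task

    explicit StackModel(std::uint32_t main_sp) : stackroot(main_sp) {}

    void create(std::uint32_t stack_size) {
        assert(stack_size <= stackroot);    // do not run below address 0
        initial_sp.push_back(stackroot);    // task starts where the root currently is
        stackroot -= stack_size;            // only then does the root move down
    }
};
```

Starting from an illustrative main stack pointer of 0x20008000 and creating tasks of 256 and 512 bytes, the first task's initial sp comes out as 0x20008000, the very address main is already using, and the second task starts at 0x20007F00. That matches the "task1 was using main's stack space" description.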

Doing some serious housecleaning now but should have something by later tonight or tomorrow. Thanks for helping out:)
 
For T3.5/3.6 do floating point regs/state need to be saved/restored?

You can safely do floating-point in a task, but if you do it within any ISR, you would need to save those registers. You would also have to save them in all of your ISRs. I have always found I can avoid it. Do you think you need it?
 
Here is the overhauled beta version of Zilch I've been working on. I fixed a whole host of issues and made some optimizations to context switching. Right now only the Teensy 3.x boards are working, but I plan to add support for the LC soon. Besides simplifying the code, I fixed the initial stack pointer pointing to the incorrect address at startup, where each task was using its neighbor's memory :p. The one thing I'm having a hard time figuring out is how best to lay out the actual stack memory. Right now the user declares a global static 32-bit array, which takes care of the alignment for the most part, but I'm wary of leaving memory allocation open to the user. While this is the simplest way, I'm thinking that the library could create a "Master" pool and allocate from that statically. Then where should this pool reside: .bss or FASTRUN? I'm not sure yet. Maybe that's beyond the scope of this library, and I should just leave it in .bss and have the user provide it?
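
As a rough sketch of the "Master" pool idea, assuming tasks are never destroyed: a static array plus a bump allocator that only ever moves forward. All names and sizes here (stack_pool, POOL_WORDS, pool_alloc_stack) are illustrative and not part of Zilch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical master pool. A uint32_t array gives 4-byte alignment; alignas(8)
// raises it to the 8-byte stack alignment that the ARM procedure call standard
// expects at public interfaces. Being zero-initialized and static, it lands in .bss.
constexpr std::size_t POOL_WORDS = 1024;
alignas(8) static std::uint32_t stack_pool[POOL_WORDS];
static std::size_t pool_used_words = 0;

// Hand out `bytes` of stack from the pool, rounded up to a multiple of 8 bytes.
// Returns nullptr when the pool is exhausted. Nothing is ever freed, which
// fits tasks that live for the whole run of the sketch.
inline std::uint32_t *pool_alloc_stack(std::size_t bytes) {
    assert(bytes > 0);
    const std::size_t words = ((bytes + 7) / 8) * 2;   // 8 bytes == 2 words
    if (words > POOL_WORDS - pool_used_words) return nullptr;
    std::uint32_t *base = &stack_pool[pool_used_words];
    pool_used_words += words;
    return base;
}
```

The trade-off is that the pool size has to be picked at compile time, so an undersized pool shows up as a nullptr at task creation instead of as a silent overlap.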

The example shows how each task uses stack memory, I found 256 bytes is about the lowest you can go with a task that just "yields".
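
One common way to measure how much of a stack array a task has touched is "stack painting": fill the array with a known pattern before the task runs, then count how much of the pattern survives. I am not claiming this is how the bundled example does it; the sketch below is a generic version with made-up names (STACK_PAINT, paint_stack, stack_bytes_used).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t STACK_PAINT = 0xDEADBEEFu;  // arbitrary sentinel value

// Fill the whole stack array with the sentinel before the task first runs.
inline void paint_stack(std::uint32_t *stack, std::size_t words) {
    assert(stack != nullptr);
    for (std::size_t i = 0; i < words; ++i) stack[i] = STACK_PAINT;
}

// Stacks grow downward on ARM, so the untouched words sit at the low end of
// the array. Count them from index 0 and report the rest as "used" bytes.
inline std::size_t stack_bytes_used(const std::uint32_t *stack, std::size_t words) {
    assert(stack != nullptr);
    std::size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_PAINT) ++untouched;
    return (words - untouched) * sizeof(std::uint32_t);
}
```

The result is a high-water mark, not the current depth, and it can under-report if the task happens to write the sentinel value itself.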

In respect to saving the floating point registers, I don't know. These tasks don't run from an ISR, so it probably follows how the stack handles calling a function that deals with floating point registers. I'll have to read up on that. This isn't intended to be deterministic like an RTOS, but to provide easy-to-use multi-tasking.
 
Downloaded and put in sketchbook\libraries\zilch and it works on my T_3.6! Win 10 - IDE 1.6.12 - TD_1.31

In the example with minor extension [lines below in loop1() were in task1()] for some reason task3 uses 32 bits less stack space it seems?

The delay(100) and delay(3000) in memory_layout setup() make it seem lagging to start and aren't needed?

Would it make sense to just malloc() the task stacks from the heap as they are created? I assume a task is permanent and fixed so no heap fragmentation? (just need to be sure of alignment)

I see the main loop() is still working - so serialEvent() and other normal coding works? Except I don't see that code in the zilch.cpp's replacement yield()?

First thing I did to see it working of course was add qBlink()**, task[1,2,3] each at delay(50) with an odd # of tasks for a uniform blink. Then in each task I did delay() with [50,10,150] for a pulse effect as they toggle over each other.

Apologies as needed for trying it without reading enough/anything before posting - but the good news is it worked and seems to function logically as far as I got.

As noted below trying to see how to best use this - it seems this would essentially supplement a normal running sketch with serialEvent() like tasks? Are tasks run during normal loop() or outside loop()? - (interrupt driven notes are in the code but seem to be ifdef/commented out).

Having not read anything - I made the task1() while(1) into an if(1) and the task is no longer called - as I see in task_start( ) last lines.

Trying to see how this could be used and understand it - I then tried this to be more Arduino'y (for all 3 task#). This worked but used stack space decreased- so I assume the non task loop1() was made in the global stack? I suppose if I RTFM this would be clear, but from my assumption - this would work outside the Zilch scheme? Creating Worker_tasks not queued for round robin but created like a task is either documented - or outside the scope - or a good extension easily done?
Code:
// First task
static void task1(void *arg) {
  // task setup() goes here
  while ( 1 ) {
    yield();
    loop1();
  }
}
static void loop1() {
  static uint32_t ii;

  ii = micros();
  qBlink();
  delay(50);
}

** qBlink() toggles LED_BUILTIN: #define qBlink() (digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN)))
 
In respect to saving the floating point registers, I don't know. These tasks don't run from an ISR, so it probably follows how the stack handles calling a function that deals with floating point registers. I'll have to read up on that. This isn't intended to be deterministic like an RTOS, but to provide easy-to-use multi-tasking.

FWIW, there is code to save floating point state in Bill's -ChibiOS-RTand-FreeRTOS-for-Teensy-3-0. To get it to compile for T3.5/T3.6 I had to make some changes:
https://forum.pjrc.com/threads/34808-K66-Beta-Test?p=119026&viewfull=1#post119026
 
In the example with minor extension [lines below in loop1() were in task1()] for some reason task3 uses 32 bits less stack space it seems?
I noticed that too, but it looks like normal operation of the stack.

Would it make sense to just malloc() the task stacks from the heap as they are created? I assume a task is permanent and fixed so no heap fragmentation? (just need to be sure of alignment)
Heap is slower, but you could try to pass it to the "create" function and see if it works?
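
If someone wants to try the heap route, a sketch of the allocation side might look like this. heap_alloc_stack is a name I made up, and I have not confirmed what the beta's "create" function expects, so treat the hand-off to it as an open question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate a task stack from the heap. malloc() returns storage suitably
// aligned for any fundamental type, so viewing it as uint32_t words is fine.
// The byte count is rounded up to a whole number of 32-bit words and the
// word count is reported through words_out (0 on failure).
inline std::uint32_t *heap_alloc_stack(std::size_t bytes, std::size_t *words_out) {
    assert(bytes > 0 && words_out != nullptr);
    const std::size_t words = (bytes + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);
    void *p = std::malloc(words * sizeof(std::uint32_t));
    if (p == nullptr) {
        *words_out = 0;
        return nullptr;
    }
    *words_out = words;
    return static_cast<std::uint32_t *>(p);
}
```

As long as every stack is allocated once at startup and never freed, fragmentation should not come into it; the cost is one malloc() call per task at creation time.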

I see the main loop() is still working - so serialEvent() and other normal coding works? Except I don't see that code in the zilch.cpp's replacement yield()?
Don't use serialEvent; you can do the same, if not better, with tasks. serialEvent is really just like a single task in this library, being called whenever yield is called. I override the yield function so it won't call the serialEvent yield!

FWIW, there is code to save floating point state in Bill's -ChibiOS-RTand-FreeRTOS-for-Teensy-3-0. To get it to compile for T3.5/T3.6 I had to make some changes:
https://forum.pjrc.com/threads/34808-K66-Beta-Test?p=119026&viewfull=1#post119026
Yes, you do if you don't use the Lazy Stacking feature, which is enabled by default for interrupts, from what I'm reading. Since this doesn't use interrupts to do the context switch, I don't know if it applies here. I'm still looking, but how do normal function calls handle the floating point registers? I guess I'll have to figure out some way of handling the floating point registers between two tasks. Maybe a non-context-switch example would be funcA using floating point but not finishing its floating point operation when funcB is called from funcA. Does the hardware handle that for us?
 