T3 Hooks for ChibiOS RTOS?

Status
Not open for further replies.

Bill Greiman

I have ported the ChibiOS/RT RTOS kernel to Teensy 3.0 as a library. Here is a link to ChibiOS http://www.chibios.org/dokuwiki/doku.php?id=start.

Porting the RTOS has required several patches to the Teensy 3.0 core. I have tried to minimize these patches and would appreciate any advice on a better way.

I need three features to run a preemptive kernel. Here are changes to the beta7 code.

A hook in the SysTick handler. I modified systick_isr() like this.
Code:
void (*systick_hook)() = 0;
void systick_isr(void)
{
	timer0_millis_count++;
	if (systick_hook) systick_hook();
}

Access to the Supervisor Call exception. I added a weak symbol like this:
Code:
diff -r org/mk20dx128.c mod/mk20dx128.c
28a29,30
> void svcall_isr(void)		  __attribute__ ((weak, alias("fault_isr")));
> 
92c94
< 	fault_isr,					// 11 ARM: Supervisor call (SVCall)
---
> 	svcall_isr,					// 11 ARM: Supervisor call (SVCall)
Ability to use thread mode and handler mode. I added this code at the start of void ResetHandler(void).
Code:
164a167,168
> int unused_handler_stack() {return 0;}
> int handler_stack_size(void)		__attribute__ ((weak, alias("unused_handler_stack")));
167a172,182
>   int mstk = handler_stack_size();
>   if (mstk) {
>     uint32_t psp, reg;
>     /* Process Stack initialization - allow mstk entries for handler stack */
>     asm volatile ("cpsid   i");
>     psp = (uint32_t)(&_estack - mstk);
>     asm volatile ("msr     PSP, %0" : : "r" (psp));
>     reg = 2;
>     asm volatile ("msr     CONTROL, %0" : : "r" (reg));
>     asm volatile ("isb");
>   }

I would appreciate any advice on a better way to implement the above features.

Here is an example of how ChibiOS can be used with Teensy 3.0. This example runs two high priority tasks and the normal Arduino loop().
Code:
// Simple demo of three threads
// LED blink thread, print thread, and idle loop
#include <ChibiOSTeensy3.h>
const uint8_t LED_PIN = 13;
volatile uint32_t count = 0;
//------------------------------------------------------------------------------
// thread 1 - high priority for blinking LED
// 64 byte stack beyond task switch needs
static WORKING_AREA(waThread1, 64);

static msg_t Thread1(void *arg) {
  pinMode(LED_PIN, OUTPUT);
  while (TRUE) {
    digitalWrite(LED_PIN, HIGH);
    chThdSleepMilliseconds(50);
    digitalWrite(LED_PIN, LOW);
    chThdSleepMilliseconds(150);
  }
  return 0;
}
//------------------------------------------------------------------------------
// thread 2 - print idle loop count every second
// 200 byte stack beyond task switch needs
static WORKING_AREA(waThread2, 200);

static msg_t Thread2(void *arg) {
  while (TRUE) {
    Serial.println(count);
    chThdSleepMilliseconds(1000);
  }
  return 0;
}
//------------------------------------------------------------------------------

void setup() {
  Serial.begin(9600);
  while(!Serial) {};
  delay(2000);
  Serial.println("chBegin");
  
  // initialize ChibiOS
  chBegin();
  
  // start blink thread
  chThdCreateStatic(waThread1, sizeof(waThread1),
    NORMALPRIO + 2, Thread1, NULL);
    
  // start print thread
  chThdCreateStatic(waThread2, sizeof(waThread2),
    NORMALPRIO + 1, Thread2, NULL);
}
//------------------------------------------------------------------------------
// idle loop runs at NORMALPRIO
void loop() {
  count++;
}
 
This is very cool! Would including these patches in the "normal" Teensy 3.0 distribution affect performance much? Or maybe the mods could be turned on/off with a compiler flag so multitasking could be enabled just by including your .h file?
 
The only patch that affects performance is the hook for SysTick. The SysTick interrupt happens every millisecond. The extra overhead when this feature is not used is the time to execute this if() statement when systick_hook is zero.
Code:
  if (systick_hook) systick_hook();

My guess is this would cost on the order of 100 ns every millisecond. That's like a 0.01% performance hit.
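
For anyone who wants to check the arithmetic behind that estimate, here is a minimal sketch. The function name is made up for illustration, and the 100 ns figure is only the guess from the post, not a measurement.

```c
#include <assert.h>

/* Fraction of CPU time spent on a fixed cost that recurs once per period.
   Both arguments are in nanoseconds. */
double overhead_fraction(double cost_ns, double period_ns) {
  assert(period_ns > 0.0);
  return cost_ns / period_ns;
}

/* Example: overhead_fraction(100.0, 1000000.0) gives 0.0001,
   i.e. 0.01% for a ~100 ns check once per 1 ms SysTick period. */
```

The same helper works for any other per-tick cost you want to budget.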

The cost for the Supervisor Call exception is zero since it takes no flash and is not executed when the feature is not used.

The thread mode patch adds a few bytes of flash and a few ns of overhead at start-up.
 
Nice work Bill :)

I'm looking at putting these hooks, or something very similar you could use, into mk20dx128.c.

I'm having a difficult time following part of the diff. When you run diff, please use the "-u" option to create a "unified" format diff. That's the format everyone uses. It's a lot easier to follow and match up against modified code because it includes the context of nearby lines.

For systick_isr, rather than add the overhead of reading a pointer, I'd prefer to make the whole thing a weakly bound symbol so you can simply override it and replicate the Teensy-specific code if you like.
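
As a rough illustration of the overriding side of that idea, something like the following might work. It is a host-compilable sketch only: rtos_tick() and rtos_tick_count() are hypothetical names, and the counter is defined here just as a stand-in for the one the Teensy core owns (a real library would declare it extern instead).

```c
#include <stdint.h>

/* Stand-in for the core's millisecond counter, named as in the beta7 code.
   On real hardware the core defines it and this would be an extern. */
volatile uint32_t timer0_millis_count = 0;

/* Hypothetical kernel tick entry point; a real port would call the RTOS. */
static volatile uint32_t rtos_ticks = 0;
static void rtos_tick(void) { rtos_ticks++; }
uint32_t rtos_tick_count(void) { return rtos_ticks; }

/* Strong definition. If the core declares systick_isr as a weak symbol,
   the linker uses this version instead of the default one. */
void systick_isr(void) {
  timer0_millis_count++;  /* replicate the Teensy-specific bookkeeping */
  rtos_tick();            /* then drive the kernel's time base */
}
```

With this arrangement there is no pointer to read at run time; sketches that do not link an RTOS simply get the core's default handler.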

If you've been following the Arduino developers mailing list, you probably saw I'm working (and trying to collaborate with the Arduino team) on cooperative concurrency features. The most important is adding yield() into busy loops. I put some into beta7. This means you can provide a yield() function and have the normal stuff like delay() work as expected.

My main goal is to create an API similar to Java's Timer object. That's a lot less sophisticated than ChibiOS, but also a lot simpler. I want to provide a hook in that so it can also work together nicely with ChibiOS or similar schedulers.... but at this point it's all in the early design phase.

ChibiOS is pretty awesome, but like all multi-stack schedulers, there is a requirement for users to allocate stack memory. How does anyone decide what sizes to use for WORKING_AREA()?
 
Sorry about the diff, I agree more context is better.

Making systick_isr a weak symbol is better. It would be nice to call a function with your code rather than replicate it. Probably not a big issue for the SysTick isr.

How does anyone decide what sizes to use for WORKING_AREA()?

At task start-up I fill the stack with a pattern (0X55 in each byte). I wrote a function to return the unused space like this:
Code:
  // unused stack for thread 1
  size_t unused1 = chUnusedStack(waThread1, sizeof(waThread1));

To begin you allocate a generous stack. You run the application and then check how much space is unused to adjust the thread's stack size.
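
The fill-and-scan technique itself can be sketched on a plain byte buffer as below. This is not the chUnusedStack() implementation; stack_paint() and stack_unused() are hypothetical names, and the sketch ignores any bookkeeping a real kernel may keep inside the same working area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0x55  /* fill byte mentioned in the post */

/* Paint a stack area with the fill pattern before the thread starts. */
void stack_paint(uint8_t *area, size_t size) {
  assert(area != NULL);
  memset(area, STACK_FILL, size);
}

/* Count bytes still holding the pattern, scanning up from the lowest
   address. Cortex-M stacks grow downward, so the untouched part of the
   area is at the low end. */
size_t stack_unused(const uint8_t *area, size_t size) {
  assert(area != NULL);
  size_t n = 0;
  while (n < size && area[n] == STACK_FILL) n++;
  return n;
}
```

One caveat: a thread that happens to store 0x55 bytes at its deepest point makes the result slightly optimistic, so it is worth leaving some margin when trimming the stack size.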

ChibiOS really performs on Teensy 3.0; it was designed for Cortex. I used a scope with this sketch and got a context switch in under three microseconds. That's over five times faster than on an AVR Arduino.

Code:
// Connect a scope to pin 13
// Measure difference in time between first pulse with no context switch
// and second pulse started in thread 2 and ended in thread 1.
// Difference should be about 15-16 usec on a 16 MHz 328 Arduino.
#include <ChibiOS.h>

const uint8_t LED_PIN = 13;

// Semaphore to trigger context switch
Semaphore sem;
//------------------------------------------------------------------------------
// thread 1 - high priority thread to set pin low
// 64 byte stack beyond task switch and interrupt needs
static WORKING_AREA(waThread1, 64);

static msg_t Thread1(void *arg) {

  while (TRUE) {
    chSemWait(&sem);
    digitalWrite(LED_PIN, LOW);
  }
  return 0;
}
//------------------------------------------------------------------------------
// thread 2 - lower priority thread to toggle LED and trigger thread 1
// 64 byte stack beyond task switch and interrupt needs
static WORKING_AREA(waThread2, 64);

static msg_t Thread2(void *arg) {
  pinMode(LED_PIN, OUTPUT);
  while (TRUE) {
    // first pulse to get time with no context switch
    digitalWrite(LED_PIN, HIGH);
    digitalWrite(LED_PIN, LOW);
    // start second pulse
    digitalWrite(LED_PIN, HIGH);
    // trigger context switch for task that ends pulse
    chSemSignal(&sem);
    // sleep until next tick (1024 microseconds tick on Arduino)
    chThdSleep(1);
  }
  return 0;
}
//------------------------------------------------------------------------------
void setup() {
  // initialize ChibiOS with interrupts disabled
  // ChibiOS will enable interrupts
  cli();
  halInit();
  chSysInit();

  // initialize semaphore
  chSemInit(&sem, 0);

  // start high priority thread
  chThdCreateStatic(waThread1, sizeof(waThread1),
    NORMALPRIO+2, Thread1, NULL);

  // start lower priority thread
  chThdCreateStatic(waThread2, sizeof(waThread2),
    NORMALPRIO+1, Thread2, NULL);
}
//------------------------------------------------------------------------------
// idle task not used
void loop() {}
 
One more point on stack size. I configured ChibiOS to allocate free memory to the stack for loop(). This works just like the native Arduino IDE.

If you use good design, only high priority threads will have stacks allocated by the WORKING_AREA macro. High priority threads should be lightweight and are usually well behaved for stack usage.

The bulk of most applications will be in the loop() thread with the more unpredictable stack usage and third party libraries.

Use of preemptive RTOSs is not for beginners. The Arduino coop scheduler is probably a good next step.

I started using RTOSs around 1971 with the DEC PDP-11. Many of my professional projects used VxWorks, the OS used in the Mars landers.

At this point I don't find the various scheduler projects for Arduino very interesting. They seem like going back to the cooperative scheduler used in Apollo in the 1960s.

A good RTOS like ChibiOS has cooperative scheduling primitives so you can design an app with cooperative concurrency or a mixture of preemptive and cooperative concurrency.
 
It was at 48 MHz and the time was close to two microseconds. This is really more than a context switch.

ChibiOS does a context switch in 1.04 usec on a 72 MHz STM32 Cortex M3 so the actual context switch is probably well under 2 usec on Teensy 3.0 at 48 MHz. I used the STM32 Cortex M3/M4 code as the base for my port.
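
A back-of-the-envelope version of that scaling, assuming the switch takes the same number of CPU cycles on both chips (which ignores flash wait states and bus differences), could look like this. The helper name is made up.

```c
#include <assert.h>

/* Scale a time measured at one clock rate to another clock rate,
   assuming the cycle count stays the same. */
double scale_time_us(double t_us, double from_mhz, double to_mhz) {
  assert(to_mhz > 0.0);
  return t_us * from_mhz / to_mhz;
}

/* Example: scale_time_us(1.04, 72.0, 48.0) is 1.04 * 72 / 48 = 1.56 usec. */
```

That rough figure is consistent with "well under 2 usec", though only a scope measurement on the Teensy itself would confirm it.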
 
Bill, did you do a baseline ChibiOS port and then put the Arduino stuff on top? Do you mind sharing your ChibiOS port? I have been playing with it but haven't had a lot of time lately. I would love to see a port that has been done, so that work could then go into the HAL.

--Carl
 
pricecw,

I did not do a base port for Kinetis. I am integrating the ChibiOS kernel and ARMv7-M port with the Teensy 3.0 core. The result is an Arduino-style library.

I am trying to make it easy for Arduino/Teensy users to take advantage of all the existing libraries and software and use a full featured high performance preemptive kernel.

It would be really easy to port the ChibiOS kernel to Kinetis. I thought I recognized your name. Remember this http://forum.chibios.org/phpbb/viewtopic.php?f=3&t=652. Giovanni's explanation of what needs to be done is far better than what I could do. As Giovanni says, the HAL is the hard part.

I will be posting my Teensy 3.0 library soon so Paul and others can comment on it.
 
Paul,

For AVR Arduino I ported ChibiOS and FreeRTOS. I am now looking into FreeRTOS http://www.freertos.org/. It looks like the following is sufficient for both.

1. Make most vectors weak. At least SVCallVector, PendSV, and SysTick.

2. Provide a flexible way to set up main and process stacks in ResetHandler.

3. Fill stack/heap area with 0X55 pattern in each byte so usage can be determined.

Your suggestions for the vectors take care of the first point. I think you should always fill the stack/heap area since stack usage determination is a big problem for Arduino.

That leaves point two. Here, again, is how I did it in my current version of the ChibiOS library. This is what I added to your ResetHandler. The added stuff is between the Bill Greiman start/end comments.
Code:
// Bill Greiman start
__attribute__((weak))
int handler_stack_size() {return 0;}
// Bill Greiman end
__attribute__ ((section(".startup")))
void ResetHandler(void)
{
  // Bill Greiman start
  int mstk = handler_stack_size();
  if (mstk) {
    uint32_t psp, reg;
    /* Process Stack initialization - allow mstk entries for handler stack */
    asm volatile ("cpsid   i");
    psp = (uint32_t)(&_estack - mstk);
    asm volatile ("msr     PSP, %0" : : "r" (psp));
    reg = 2;
    asm volatile ("msr     CONTROL, %0" : : "r" (reg));
    asm volatile ("isb");
  }
  // Bill Greiman end
        uint32_t *src = &_etext;
        uint32_t *dest = &_sdata;
	//void (* ptr)(void);

	WDOG_UNLOCK = WDOG_UNLOCK_SEQ1;
	WDOG_UNLOCK = WDOG_UNLOCK_SEQ2;
	WDOG_STCTRLH = WDOG_STCTRLH_ALLOWUPDATE;

This results in the main stack at the high end of RAM and the loop() stack between the heap and the main stack. I would appreciate any guidance on how I should do this.
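
To make the resulting layout concrete, the address arithmetic can be sketched on the host as follows. It assumes _estack is declared as a 32-bit object, so that subtracting mstk moves the pointer by mstk words (the "mstk entries" comment suggests this). The function name and the example address are illustrative assumptions only.

```c
#include <assert.h>
#include <stdint.h>

/* Initial process stack pointer when `entries` 32-bit words at the top of
   RAM are reserved for the handler (main) stack. Mirrors the pointer
   arithmetic in `&_estack - mstk`. */
uint32_t initial_psp(uint32_t estack_addr, uint32_t entries) {
  uint32_t bytes = entries * (uint32_t)sizeof(uint32_t);
  assert(bytes < estack_addr);
  return estack_addr - bytes;
}

/* Example with an assumed top-of-RAM address of 0x20002000:
   initial_psp(0x20002000u, 100u) is 0x20001E70, leaving 400 bytes
   above it for the handler stack. */
```

Everything below the returned address down to the end of the heap is then available to the loop() stack.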

Here is the CMx ResetHandler code from ChibiOS:
Code:
/**
 * @brief   Early initialization.
 * @details This hook is invoked immediately after the stack initialization
 *          and before the DATA and BSS segments initialization. The
 *          default behavior is to do nothing.
 * @note    This function is a weak symbol.
 */
#if !defined(__DOXYGEN__)
__attribute__((weak))
#endif
void __early_init(void) {}

/**
 * @brief   Late initialization.
 * @details This hook is invoked after the DATA and BSS segments
 *          initialization and before any static constructor. The
 *          default behavior is to do nothing.
 * @note    This function is a weak symbol.
 */
#if !defined(__DOXYGEN__)
__attribute__((weak))
#endif
void __late_init(void) {}

/**
 * @brief   Default @p main() function exit handler.
 * @details This handler is invoked or the @p main() function exit. The
 *          default behavior is to enter an infinite loop.
 * @note    This function is a weak symbol.
 */
#if !defined(__DOXYGEN__)
__attribute__((weak, naked))
#endif
void _default_exit(void) {
  while (1)
    ;
}
/**
 * @brief   Memory fill.
 *
 * @param[in] start     fill area start
 * @param[in] end       fill area end
 * @param[in] filler    filler pattern
 */
static void fill32(uint32_t *start, uint32_t *end, uint32_t filler) {

  while (start < end)
    *start++ = filler;
}

/**
 * @brief   Reset vector.
 */
#if !defined(__DOXYGEN__)
__attribute__((naked))
#endif
void ResetHandler(void) {
  uint32_t psp, reg;

  /* Process Stack initialization, it is allocated starting from the
     symbol __process_stack_end__ and its lower limit is the symbol
     __process_stack_base__.*/
  asm volatile ("cpsid   i");
  psp = SYMVAL(__process_stack_end__);
  asm volatile ("msr     PSP, %0" : : "r" (psp));

#if CORTEX_USE_FPU
  /* Initializing the FPU context save in lazy mode.*/
  SCB_FPCCR = FPCCR_ASPEN | FPCCR_LSPEN;

  /* CP10 and CP11 set to full access.*/
  SCB_CPACR |= 0x00F00000;

  /* FPSCR and FPDSCR initially zero.*/
  reg = 0;
  asm volatile ("vmsr    FPSCR, %0" : : "r" (reg) : "memory");
  SCB_FPDSCR = reg;

  /* CPU mode initialization, enforced FPCA bit.*/
  reg = CRT0_CONTROL_INIT | 4;
#else
  /* CPU mode initialization.*/
  reg = CRT0_CONTROL_INIT;
#endif
  asm volatile ("msr     CONTROL, %0" : : "r" (reg));
  asm volatile ("isb");

  /* Early initialization hook invocation.*/
  __early_init();

#if CRT0_INIT_STACKS
  /* Main and Process stacks initialization.*/
  fill32(&__main_stack_base__,
         &__main_stack_end__,
         CRT0_STACKS_FILL_PATTERN);
  fill32(&__process_stack_base__,
         &__process_stack_end__,
         CRT0_STACKS_FILL_PATTERN);
#endif

#if CRT0_INIT_DATA
  /* DATA segment initialization.*/
  {
    uint32_t *tp, *dp;

    tp = &_textdata;
    dp = &_data;
    while (dp < &_edata)
      *dp++ = *tp++;
  }
#endif

#if CRT0_INIT_BSS
  /* BSS segment initialization.*/
  fill32(&_bss_start, &_bss_end, 0);
#endif

  /* Late initialization hook invocation.*/
  __late_init();

#if CRT0_CALL_CONSTRUCTORS
  /* Constructors invocation.*/
  {
    funcpp_t fpp = &__init_array_start;
    while (fpp < &__init_array_end) {
      (*fpp)();
      fpp++;
    }
  }
#endif

  /* Invoking application main() function.*/
  main();

#if CRT0_CALL_DESTRUCTORS
  /* Destructors invocation.*/
  {
    funcpp_t fpp = &__fini_array_start;
    while (fpp < &__fini_array_end) {
      (*fpp)();
      fpp++;
    }
  }
#endif
}
 
Thanks for the info, Bill. I would prefer to have a baseline ChibiOS or FreeRTOS, so I will continue to port it from the base up as I have time.

--Carl
 
pricecw,

I understand. Full ChibiOS is overwhelmingly a better choice than Arduino for experienced embedded system developers. I have used ChibiOS on about a dozen different boards.

ChibiOS on STM32 is my favorite choice for serious projects. I use ST-link/V2 for program load/debug over SWD. I love the STM-Studio debugger with its noninvasive access to the chip and its plots of variables in real-time. It's like having an oscilloscope/logic analyzer in the chip.

stm-studio.jpg

Other companies like Keil have similar products. A feature like this would be great for Arduino.
 
Bill,

I agree, it is a good system. I have used it on a couple of STM32 Discovery boards and a couple of other basic STM boards. I really like the idea of small microcontroller boards that I can program for small tasks and then not worry about. The Teensy 3.0 fits this use wonderfully (before these, I was using a custom MSP430 board I built, a little smaller than the Teensy).

For this, I want something where I can quickly compile a couple of threads and use them in various quick-use places. If the need continues, I will throw a board together to fulfill the task.

If you find yourself going to the bare ChibiOS, let me know. If I get something working I will make sure to put it out there.

--Carl
 
Here's what I have so far for the upcoming "beta8". Will this work?

http://www.pjrc.com/teensy/beta/mk20dx128.c

Edit: for systick, I made systick_isr a weak symbol aliased to a default version that only increments the count. Rather than adding the overhead of checking a hook at runtime, the idea is to just replicate systick_isr in ChibiOS's support code.

I added weak symbols for all the other system interrupts, and hooks for early and late initialization. The early hook is as early as possible, before static variables are initialized. The late hook is after everything is initialized (including C++ constructors), right before main() is called.
 
Paul,

I need to set up the thread mode and handler mode stacks. I think I will try it in startup_late_hook() and call main there.

I assumed systick_isr would be a weak symbol in my current development.

I think it should work since timer0_millis_count is a global.
 
Sounds good. I changed the name from "timer0_millis_count" to "systick_millis_count".

I'm going to publish beta8 very soon, since the linker script bug is serious. If there's any other last minute stuff I should put in for ChibiOS, just let me know.
 
Paul,

You might make void ResetHandler(void) naked since it never returns. That saves some stack and I may not need to set the MSP register which I want near _estack.

Edit: I am really happy with your proposal!

Here is the startup_late_hook() for ChibiOS:
Code:
__attribute__((naked))
void startup_late_hook() {
  uint32_t psp, reg;
  /* Process Stack initialization - allow MSP_SIZE entries for handler stack */
  asm volatile ("cpsid   i");
  psp = (uint32_t)(&_estack - MSP_SIZE);
  asm volatile ("msr     PSP, %0" : : "r" (psp));
  reg = 2;
  asm volatile ("msr     CONTROL, %0" : : "r" (reg));
  asm volatile ("isb");
  {
    // fill memory - loop works since compiler doesn't use stack
    uint32_t *p = &_ebss;
    while (p < &_estack) *p++ = 0X55555555;
  }
  
  sei();
  main();
  while(1) {}
}

Using naked on ResetHandler() and startup_late_hook() means I don't need to set MSP and all memory up to _estack is used for the handler stack.
 
I'm planning to post update #22 to the T3 Kickstarter project, mentioning the beta8 software, ChibiOS and other recent developments.

Should I link to this forum topic for ChibiOS? Or would you like to create a new topic, or a github repo or something else?

Or maybe it's best to wait until there's an update for the new hooks in beta8?
 
Paul,

I have both ChibiOS/RT and FreeRTOS running with beta8. I will post them on Google Code soon.

I think a separate topic would be best to point to the code.

I am doing final testing, finishing examples, and a minimal writeup.

I have four examples for each RTOS.

One example is very simple, runs two tasks, loop(), and can show stack use for all tasks, loop/heap and the handler (isr and kernel) stack. I fill all stacks with a 0X55 pattern and have functions to determine the maximum stack used by any task.

Here is an example of stack use for two tasks plus loop and the handler stack. This makes it easy to make sure you have the right size stacks.
Memory use
Area,Size,Unused
Thread 1 stack,168,104
Thread 2 stack,304,264
Handler stack,400,344
Heap/loop,11604,11476

I have a serious data logger that logs a set of analog pins to a text file. It can log and format 1000 samples per second with four pins in each sample. It has two tasks, a high priority task to read analog pins and queue the values, and a lower priority task to format and write the data to an SD card.
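
The queue between the two tasks is not shown in this thread. One common way to build such a hand-off is a single-producer, single-consumer ring buffer, sketched below under stated assumptions: all names and the 64-slot depth are hypothetical, and a real RTOS version would normally pair this with a semaphore or mailbox so the writer can sleep while the queue is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PINS_PER_SAMPLE 4  /* four analog pins per sample, as in the post */
#define FIFO_SLOTS 64      /* hypothetical depth; must be a power of two */

typedef struct { uint16_t value[PINS_PER_SAMPLE]; } sample_t;

typedef struct {
  sample_t slot[FIFO_SLOTS];
  volatile uint32_t head;  /* written only by the producer */
  volatile uint32_t tail;  /* written only by the consumer */
} sample_fifo_t;

/* Producer side (high priority sampler). Returns false on overrun. */
bool fifo_put(sample_fifo_t *f, const sample_t *s) {
  assert(f != NULL && s != NULL);
  if (f->head - f->tail == FIFO_SLOTS) return false;  /* queue is full */
  f->slot[f->head % FIFO_SLOTS] = *s;
  f->head++;
  return true;
}

/* Consumer side (lower priority writer). Returns false when empty. */
bool fifo_get(sample_fifo_t *f, sample_t *out) {
  assert(f != NULL && out != NULL);
  if (f->head == f->tail) return false;               /* queue is empty */
  *out = f->slot[f->tail % FIFO_SLOTS];
  f->tail++;
  return true;
}
```

A zero-initialized sample_fifo_t starts out empty. At 1000 samples per second, 64 slots cover about 64 ms of writer stalls, so the depth would need tuning against the worst-case latency of the storage device.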

I am pleased that the jitter in sample time is about one microsecond. Try to do that in loop() with the normal Arduino environment and no user written ISR.
 
Would you like to start the new topic, perhaps in the Project Guidance section?

I just finished the schematic. Going to post a Kickstarter update very soon!
 