
Stack guard pages on Cortex-M4

Posted: Mon Feb 11, 2019 1:08 am
by wpaul

I'm still working on a hobby project using the Nordic Semi nRF52840, which uses a Cortex-M4 core. Since it includes MPU support, I decided to experiment with the PORT_ENABLE_GUARD_PAGES feature. I got it to work, but I ran into a couple of unusual issues. I'm not sure whether they're bugs per se, but at the very least they seem like inconsistencies.

My configuration:
OS: ChibiOS 18.2.1
Compiler: GCC 8.2.0
CPU: Nordic nRF52840 Cortex-M4 with FPU enabled
RAM: 256KB
Flash: 1MB

First, in os/common/ports/ARMCMx/chcore_v7m.h, there is the following macro:


I understand that you need to reserve some space on the stack to hold exception frames, but I'm curious as to where this specific number came from. I got the ARM Cortex-M4 user guide from here: ... UI0553.pdf

Figure 2-3 on page 2-27 shows that there are two possible exception frame layouts: one with preserved floating point context and one without. When there is no floating point context the exception frame is 8 words in size (32 bytes), and when there is floating point context it's 26 words (104 bytes).

That makes the worst case size 104 bytes, which is larger than the 64 bytes that ChibiOS currently reserves. I would expect the definition to be something more like:


Is there any particular reason why it's not done this way? I can confirm that in my setup the CPU does occasionally save FP context. (Disclaimer: I don't know the exception frame formats of all ARM V7M processors, so maybe there is one that's 64 bytes in size.)
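For reference, the worst-case arithmetic from the user guide can be written down as a small C sketch. The macro and function names are hypothetical; the word counts follow the two frame layouts (the basic frame holds R0-R3, R12, LR, PC and xPSR; the extended frame adds S0-S15, FPSCR and one reserved word).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; sizes follow the Cortex-M4 exception frame layouts. */
#define FRAME_WORD_BYTES      4u
#define BASIC_FRAME_WORDS     8u                              /* R0-R3, R12, LR, PC, xPSR */
#define EXTENDED_FRAME_WORDS  (BASIC_FRAME_WORDS + 16u + 1u + 1u) /* + S0-S15, FPSCR, reserved */

static_assert(BASIC_FRAME_WORDS * FRAME_WORD_BYTES == 32u, "basic frame is 32 bytes");
static_assert(EXTENDED_FRAME_WORDS * FRAME_WORD_BYTES == 104u, "extended frame is 104 bytes");

/* Worst-case bytes the hardware stacks for a single exception entry. */
static size_t worst_case_frame_bytes(bool fpu_context_possible) {
    size_t words = fpu_context_possible ? EXTENDED_FRAME_WORDS : BASIC_FRAME_WORDS;
    return words * FRAME_WORD_BYTES;
}
```

The sketch ignores the extra padding word the core may push when it realigns the stack to 8 bytes on exception entry, so the real worst case can be 4 bytes larger.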

The second inconsistency has to do with alignment of the stack. The exception stack and main thread stacks are allocated via the linker scripts, in particular os/common/startup/ARMCMx/compilers/GCC/ld/rules_stacks.ld.

The MPU requires a protected region's base address to be aligned on a boundary that matches the region size. The smallest size you can specify is 32 bytes, which is the size ChibiOS uses for its guard pages, so a 32-byte region must start on a 32-byte boundary: 0x20002000 is OK, but 0x20002010 is not. If you specify the latter address, the MPU will treat it as 0x20002000.
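A minimal sketch of these rules in C, assuming the ARMv7-M MPU behaviour described above (power-of-two region sizes of at least 32 bytes, with a misaligned base effectively rounded down to the region size). The function names are hypothetical helpers, not ChibiOS or CMSIS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Region size must be a power of two and at least 32 bytes. */
static bool mpu_region_size_valid(uint32_t size) {
    return size >= 32u && (size & (size - 1u)) == 0u;
}

/* Base must be a multiple of the region size. */
static bool mpu_base_aligned(uint32_t base, uint32_t size) {
    return (base & (size - 1u)) == 0u;
}

/* Base the MPU effectively uses: address bits below the region size
 * are dropped, which rounds a misaligned base down. */
static uint32_t mpu_effective_base(uint32_t base, uint32_t size) {
    return base & ~(size - 1u);
}
```

With a 32-byte region, mpu_effective_base(0x20002010, 32) yields 0x20002000, which is the silent rounding that causes trouble later in this post.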

In os/common/ports/ARMCMx/chcore_v7m.h, we have:


This enforces the correct alignment for all stacks declared with THD_WORKING_AREA().

The problem is that this macro doesn't apply to the main thread stack and exception stack since they're declared via linker script instead. In os/common/startup/ARMCMx/compilers/GCC/ld/rules_stacks.ld, it says:


    /* Special section for exceptions stack.*/
    .mstack :
    {
        . = ALIGN(8);
        __main_stack_base__ = .;
        . += __main_stack_size__;
        . = ALIGN(8);
        __main_stack_end__ = .;
    } > MAIN_STACK_RAM

    /* Special section for process stack.*/
    .pstack :
    {
        __process_stack_base__ = .;
        __main_thread_stack_base__ = .;
        . += __process_stack_size__;
        . = ALIGN(8);
        __process_stack_end__ = .;
        __main_thread_stack_end__ = .;
    } > PROCESS_STACK_RAM

Here the enforced alignment is only 8 bytes. If you are lucky, the main and exception stacks might still end up aligned on a 32-byte boundary, but I was not lucky. The MPU rounded the main thread stack's guard region down, so it overlapped the top of the exception stack (the end of .mstack, which is where the first exception frame gets pushed), and I got a memory management fault as soon as the first exception occurred after the main thread was created.
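The failure can be reproduced with plain address arithmetic. This sketch assumes .pstack begins exactly where .mstack ends and that the 32-byte guard is programmed at __main_thread_stack_base__; the helper name and the addresses in the example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_SIZE 32u  /* guard page size used by ChibiOS, per the post */

static_assert((GUARD_SIZE & (GUARD_SIZE - 1u)) == 0u, "guard size must be a power of two");

/* Bytes at the top of the exception stack (.mstack) that fall inside the
 * main thread's guard region once the MPU rounds the guard base down. */
static uint32_t guard_overlap_bytes(uint32_t mstack_end, uint32_t pstack_base) {
    uint32_t guard_start = pstack_base & ~(GUARD_SIZE - 1u);
    return (guard_start < mstack_end) ? (mstack_end - guard_start) : 0u;
}
```

For example, with both symbols at 0x20000418 the guard region starts at 0x20000400 and covers the top 24 bytes of the exception stack, so the very first exception frame pushed on the main stack faults. At 0x20000420 there is no overlap.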

I ended up doing the following workarounds:

- In chconf.h:
  - define PORT_INT_REQUIRED_STACK as 104 (I have the FPU enabled)

- In my project's custom linker script:
  - change ALIGN(8) to ALIGN(32)

Then everything worked as expected.

Re: Stack guard pages on Cortex-M4

Posted: Mon Feb 11, 2019 8:40 am
by Giovanni

PORT_INT_REQUIRED_STACK is not the space required for exception frames; those are represented by the port_extctx structure. That macro represents the extra stack space required by the ISR epilogue code used for context switching. That code is in the asm part and is executed on ISR return. Its stack requirement depends on the compiler, the compiler version and the compiler options used, which is why the macro is set to a very large value. You are supposed to trim it down after freezing your code in order to save RAM, if this is an issue.
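To make the distinction concrete, here is a sketch of how a working-area size decomposes, assuming PORT_WA_SIZE() has roughly the shape "internal context + port_extctx + requested size + PORT_INT_REQUIRED_STACK". The struct and function are hypothetical stand-ins and the numbers in the example are illustrative only; the real values come from the port headers, and guard-page builds may reserve additional space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sizeof(struct port_intctx),
 * sizeof(struct port_extctx) and PORT_INT_REQUIRED_STACK. */
typedef struct {
    size_t intctx_bytes;       /* registers saved by the context switch */
    size_t extctx_bytes;       /* exception frame (port_extctx) */
    size_t int_required_bytes; /* extra space for the ISR epilogue code */
} wa_budget_t;

/* The thread's own need n, plus saved contexts, plus the epilogue margin.
 * Only the last term is what PORT_INT_REQUIRED_STACK controls. */
static size_t wa_size_sketch(size_t n, wa_budget_t b) {
    return b.intctx_bytes + b.extctx_bytes + n + b.int_required_bytes;
}
```

Trimming PORT_INT_REQUIRED_STACK only shrinks the last term; the exception frame space is accounted for separately through the port_extctx term.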

Main and process stack sizes are handled by the linker scripts. If you are going to change the defaults, you are supposed to use values compatible with the alignment constraints. Enforcing 32-byte alignment by default could be a good idea.
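The linker's ALIGN(32) rounds the location counter up to the next multiple of 32. Below is a C sketch of the same rounding, plus a check that a stack's base and size are both 32-byte multiples so the following section also stays aligned. Both helpers are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of align (a power of two),
 * like ALIGN(align) applied to the location counter. */
static uint32_t align_up(uint32_t x, uint32_t align) {
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (x + align - 1u) & ~(align - 1u);
}

/* A stack suits a 32-byte guard page if its base and size are both
 * multiples of 32, which keeps the next section's base aligned too. */
static bool stack_guard_friendly(uint32_t base, uint32_t size) {
    return (base % 32u) == 0u && (size % 32u) == 0u;
}
```

Checking the size as well as the base matters because the next stack section starts where this one ends.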


Re: Stack guard pages on Cortex-M4

Posted: Wed Feb 13, 2019 12:00 am
by wpaul
You know, I looked right at the port_extctx part of PORT_WA_SIZE() several times without realizing what it was for. Sorry for being so thick and thanks for pointing that out to me.

When I first turned on the stack guard feature, I found that I did have to increase PORT_INT_REQUIRED_STACK a little in order to get things to work reliably. I investigated some more and now I realize why. It has to do with my specific configuration. There are two important additional details: a) I'm using the Nordic SoftDevice, and b) I enabled advanced kernel mode.

Advanced kernel mode uses the svc instruction to perform thread context switches in the SVCALL handler. The Nordic SoftDevice also uses the SVCALL handler to process its own API calls. It does support applications using the SVCALL handler too, provided they use a specific range of system call numbers (0 to 15). This means advanced kernel mode should work with ChibiOS, but when I first tried it everything would just crash, so I switched to compact kernel mode to get around the problem.

Recently I got curious and tried it again, and I figured out why things were crashing. ChibiOS sets the SVCALL handler priority to 1. This implies that the current BASEPRI value must be 2 or higher (i.e. a lower priority, since on Cortex-M a lower number means a higher priority) for an svc instruction to execute successfully. When ChibiOS performs a context switch, it temporarily sets BASEPRI to 2 in order to lock out any interrupts with lower priority.

It turns out that when you activate the SoftDevice, it changes the SVCALL priority to 4. ChibiOS still sets the BASEPRI to 2 when it does a context switch, but this results in the svc instruction being executed at a higher priority than the SVCALL handler, which triggers a hard fault.
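The priority rule can be modelled in a few lines of C. This is a simplified sketch: it assumes 3 implemented priority bits (as on the nRF52840), considers only a nonzero level set through BASEPRI, and ignores any active exception that may raise the execution priority further. Macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 3 priority bits, so a level lives in the top 3 bits of
 * the 8-bit priority/BASEPRI field. */
#define PRIO_BITS 3u
#define PRIO_TO_REG(level) ((uint32_t)(level) << (8u - PRIO_BITS))

/* Lower number = higher priority. BASEPRI masks every exception whose
 * priority number is >= the BASEPRI level. SVC is synchronous and cannot
 * stay pending, so a masked svc escalates to HardFault. */
static bool svc_escalates_to_hardfault(uint32_t basepri_level, uint32_t svcall_level) {
    return svcall_level >= basepri_level;
}
```

With BASEPRI at level 2, an SVCall priority of 1 lets the svc through, while an SVCall priority of 4 escalates it to a hard fault, which matches the behaviour described above.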

Once I figured that out, I changed CORTEX_PRIORITY_SVCALL to 4 to match the SoftDevice, and that made advanced kernel mode work correctly. I set all my device interrupt priorities to 5 and 7, so they're all at lower priority than the SVCALL handler and the kernel, so things should have been consistent.

But I forgot something: the SoftDevice manages some devices on its own, like the radio and one of the timers. It sets the priority for the radio interrupt fairly high, higher than CORTEX_BASEPRI_KERNEL.

The result is that we can sometimes take an interrupt in the middle of _port_switch(), which ChibiOS doesn't expect. So what would sometimes happen is:

- current context is the idle thread
- a device interrupt occurs - the CPU pushes the exception frame to the stack (104 bytes)
- _port_switch() pushes some core registers on the stack (36 bytes)
- _port_switch() does a vpush to save some of the floating point context on the stack (64 bytes)
- a radio interrupt occurs right after the vpush - the CPU pushes _another_ exception frame to the stack (104 bytes)

Unfortunately, when the radio interrupt happens there are only 64 bytes left before the stack guard, so a memory management fault occurs when the CPU tries to push the second exception frame.
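The stack arithmetic of that sequence can be checked with a short C sketch. The headroom is inferred from the text: 64 bytes remained after the first 204 bytes were pushed, so 268 bytes were available above the guard. The array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts taken from the sequence described above. */
static const size_t pushes[] = {
    104, /* device interrupt: extended exception frame */
    36,  /* _port_switch(): core registers */
    64,  /* _port_switch(): vpush of FP registers */
    104, /* nested radio interrupt: another extended frame */
};

/* Bytes by which the whole sequence exceeds the headroom (0 if it fits). */
static size_t overflow_bytes(size_t headroom) {
    size_t used = 0;
    for (size_t i = 0; i < sizeof pushes / sizeof pushes[0]; i++) {
        used += pushes[i];
    }
    return used > headroom ? used - headroom : 0;
}
```

overflow_bytes(268) returns 40, the amount by which the second exception frame overruns the guard.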

This is obviously not a problem with ChibiOS: it's a problem with my configuration, and it's up to me to figure out the best way to deal with it. I suppose the simplest thing is to just go back to using compact kernel mode.

In any case, thanks for the clarifications.


Re: Stack guard pages on Cortex-M4

Posted: Wed Feb 13, 2019 8:36 am
by Giovanni

This is always the problem when integrating things not designed for RTOS integration.

Just a note: you can have interrupts at priorities above the kernel one (usually 0 and 1). In ChibiOS those are called "fast interrupts", but you need to follow two rules:

1) Do not use the ChibiOS ISR macros inside the ISR code. There is a specific macro for declaring fast ISRs, but it is just a void-void function.
2) Do not call ChibiOS from inside fast ISRs. Some X-class functions could be used but, in general, it is better not to interact with the OS there.
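A sketch of the priority split, assuming the kernel masks at level 2 (the BASEPRI value mentioned earlier in the thread), so levels 0 and 1 are fast interrupts. The threshold macro and the classifier are hypothetical. By this rule, an interrupt configured above the kernel's masking level, such as the SoftDevice's radio interrupt, behaves like a fast interrupt as far as ChibiOS is concerned.

```c
#include <assert.h>

/* Hypothetical threshold: the numerically smallest (highest-priority)
 * level from which an ISR may still call the OS. */
#define MAX_KERNEL_PRIORITY 2u

typedef enum { ISR_FAST, ISR_OS_AWARE } isr_class_t;

/* Levels numerically below the threshold are never masked by the kernel,
 * so they must follow the fast-interrupt rules above. */
static isr_class_t classify_isr(unsigned level) {
    return level < MAX_KERNEL_PRIORITY ? ISR_FAST : ISR_OS_AWARE;
}
```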

If you need to call the OS from a fast ISR you can use "delegation": from the fast ISR you trigger a SW interrupt and, from that interrupt's handler, you can call the OS. The NVIC allows interrupts to be triggered in software. The priority of the SW-triggered interrupt must, of course, be in the range allowed for OS calls.
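A host-side sketch of the delegation pattern. On real hardware the fast ISR would pend a spare interrupt through the NVIC (for example with CMSIS NVIC_SetPendingIRQ()); here a flag stands in for the NVIC so the flow can run anywhere. All names are hypothetical, and the delegated handler is where OS calls would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock of the NVIC pending bit for the spare SW interrupt. */
static volatile bool sw_irq_pending;
static volatile uint32_t handoff_data;
static uint32_t processed;

static void mock_pend_sw_irq(void) { sw_irq_pending = true; }

/* Fast ISR: no OS calls, just capture the data and pend the SW interrupt. */
static void fast_isr(uint32_t sample) {
    handoff_data = sample;
    mock_pend_sw_irq();
}

/* SW-triggered ISR at an OS-allowed priority: OS calls would be legal
 * here; the sketch only records the value it received. */
static void delegated_isr(void) {
    if (sw_irq_pending) {
        sw_irq_pending = false;
        processed = handoff_data;
    }
}

/* Simulates one event: fast ISR first, delegated handler afterwards. */
static uint32_t run_delegation_once(uint32_t sample) {
    fast_isr(sample);
    delegated_isr();
    return processed;
}
```

A single-slot handoff like this can lose data if the fast ISR fires twice before the delegated handler runs; a real design would typically use a small queue or counter instead.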