Priority order violation (different question :) )

Post by wpaul » Mon Oct 22, 2018 7:10 pm

So, recently I turned on the CH_DBG_ENABLE_ASSERTS flag in chconf.h for my project (along with other checks) and it exposed some problems. Most were easily fixed, but I also ran into the "priority order violation" problem. I *think* I've actually fixed it now, but I'm trying to understand exactly what happened. Sorry for possibly re-covering old ground.

First, the configuration I have is this:

ChibiOS 18.1.2
Nordic NRF52840 MCU
ARM Cortex-M4 core
1MB flash
256KB RAM
NVIC implements 3 priority bits (8 priority levels total)
Peripherals: UART0, TIMER1, TIMER2, SPI0, SPI3, QSPI0, GPIO, I2C0, I2S0, SysTick timer
Additional support: Nordic S140 SoftDevice
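For readers unfamiliar with how the 3 priority bits are used, here is a minimal sketch in plain C. It assumes the usual Cortex-M convention that the implemented bits sit at the top of each 8-bit priority field and that a numerically lower value is more urgent. The `model_` names are made up for illustration and are not ChibiOS or Nordic APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3 implemented priority bits, as on the NRF52840. */
#define MODEL_PRIORITY_BITS 3u

/* Convert a logical priority (0..7) to the raw 8-bit register value. */
static inline uint8_t model_prio_to_reg(unsigned prio)
{
    assert(prio < (1u << MODEL_PRIORITY_BITS));
    return (uint8_t)(prio << (8u - MODEL_PRIORITY_BITS));
}

/* Convert a raw 8-bit register value back to a logical priority. */
static inline unsigned model_reg_to_prio(uint8_t reg)
{
    return (unsigned)reg >> (8u - MODEL_PRIORITY_BITS);
}

/* An interrupt at priority a can preempt a handler running at priority b
   only if a is numerically lower (more urgent) than b. */
static inline int model_can_preempt(unsigned a, unsigned b)
{
    return a < b;
}
```

Under these assumptions, priority 5 is stored as 0xA0, and an interrupt at priority 3 can preempt a handler running at priority 4 but not another one at priority 3.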

The SoftDevice is basically a binary blob from Nordic which implements driver support for the on-board Bluetooth radio and a BLE5 stack. Normally when you load ChibiOS onto an ARM target, it's positioned at address 0 in the flash. This is also where the interrupt vector table lives, which is constructed as part of the ChibiOS build process. So normally ChibiOS' interrupt handlers service interrupts directly.

However when you use the SoftDevice, things are a bit different:

- The SoftDevice is flashed at address 0
- The SoftDevice provides its own interrupt vector table
- The "application" (in this case ChibiOS) is placed at a different flash offset, immediately after the SoftDevice blob (in this case 0x26000)
- The SoftDevice acts as a filter, capturing interrupt calls and forwarding them to the application
- Software makes calls to the SoftDevice using the SVCall instruction
- When active, the SoftDevice notifies the application of events using a software interrupt (there are four SWI vectors in the NRF52840 vector table)

Initially the SoftDevice is not activated, in which case it just passes through all interrupt calls to the application code. The only effect is a slight increase in interrupt latency. When it's turned on, it takes over control of certain peripherals and interrupts (notably the radio) and uses SWI interrupts to signal radio-related events.
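To make the forwarding idea concrete, here is a toy host-side model in C. It is only a sketch of the behavior described above (pass everything through when disabled, keep certain interrupts when enabled). It is not Nordic's code, and every name in it (`forwarder_t`, `model_dispatch`, `owned_mask`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*isr_t)(void);

#define MODEL_NUM_VECTORS 4u

/* Hypothetical model of the blob's forwarding state. */
typedef struct {
    const isr_t *app_vectors; /* application table (the one at 0x26000) */
    int enabled;              /* nonzero once the blob has been activated */
    unsigned owned_mask;      /* bit n set: IRQ n is kept by the blob */
} forwarder_t;

/* Sample application handler that just counts its invocations. */
int model_calls = 0;
void model_app_handler(void) { model_calls++; }

/* Returns 1 if the IRQ was forwarded to the application, 0 otherwise. */
int model_dispatch(const forwarder_t *f, unsigned irq)
{
    assert(f != NULL && f->app_vectors != NULL);
    if (irq >= MODEL_NUM_VECTORS)
        return 0;                       /* no such vector in this model */
    if (f->enabled && (f->owned_mask & (1u << irq)))
        return 0;                       /* consumed by the blob itself */
    if (f->app_vectors[irq] == NULL)
        return 0;                       /* application has no handler */
    f->app_vectors[irq]();              /* pass through to the application */
    return 1;
}
```

In this model an IRQ listed in `owned_mask` reaches the application handler while `enabled` is 0 and stops reaching it once `enabled` is set, which mirrors the radio being taken over when the SoftDevice is turned on.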

Because the SoftDevice uses the SVCall vector, and because it wants to intercept/filter all interrupts, I had to do the following:

#define CORTEX_SIMPLIFIED_PRIORITY TRUE
#define CRT0_VTOR_INIT FALSE

I rigged up my linker script to place the ChibiOS vector table at 0x26000 and fixed it so that I can link ChibiOS and the SoftDevice into a single image, and I can now flash the whole thing in one go and it all works. Everything seemed fine, until I turned on the asserts and checks and ran into the priority order violation.

Also, I increased the system tick frequency like this:

#define CH_CFG_ST_FREQUENCY 24576

I checked carefully to make sure that I wasn't calling an I-class function without a reschedule. That was not the case. It seemed as if the problem only manifested when I enabled the SoftDevice. I most often got it to trigger when there was heavy I/O activity. In this case, I was trying to display video and play sound, which involves:

- SPI0 (reading data from SD card, polling touch screen)
- SPI3 (writing to the display)
- I2S (writing audio samples)

In the meantime, the SoftDevice gets radio events and signals them to a separate thread. This thread makes an SVCall to pop events off the event queue. There may be other SVCalls made to manage other things.

Ultimately I tracked the problem down to interrupt priorities. Originally things were set like this:

#define CORTEX_MAX_KERNEL_PRIORITY 0U

#define CORTEX_PRIORITY_PENDSV CORTEX_MAX_KERNEL_PRIORITY
#define NRF5_ST_PRIORITY CORTEX_MAX_KERNEL_PRIORITY

The latter is the SysTick timer priority. Also, all peripherals were set to either 2, 3 or 5.

What ultimately seemed to fix the problem was to set all the priorities like this:

#define NRF5_SPI_SPI0_IRQ_PRIORITY 5
#define NRF5_SPI_SPI3_IRQ_PRIORITY 5
#define NRF5_SPI_QSPI0_IRQ_PRIORITY 5
#define NRF5_SERIAL_UART0_PRIORITY 5
#define NRF5_GPT_TIMER1_IRQ_PRIORITY 5
#define NRF5_GPT_TIMER2_IRQ_PRIORITY 5
#define NRF5_I2C_I2C1_IRQ_PRIORITY 5
#define NRF5_EXT_GPIOTE_IRQ_PRIORITY 5
#define NRF5_SD_IRQ_PRIORITY 5 /* SoftDevice event interrupt via SWI */
#define NRF5_I2S_IRQ_PRIORITY 5
#define NRF5_ST_PRIORITY 5

The PendSV priority is still set to CORTEX_MAX_KERNEL_PRIORITY though.

One other data point: as an experiment last night I set things like this:

#define NRF5_SPI_SPI0_IRQ_PRIORITY 3
#define NRF5_SPI_SPI3_IRQ_PRIORITY 3
#define NRF5_SPI_QSPI0_IRQ_PRIORITY 3
#define NRF5_SERIAL_UART0_PRIORITY 3
#define NRF5_GPT_TIMER1_IRQ_PRIORITY 3
#define NRF5_GPT_TIMER2_IRQ_PRIORITY 3
#define NRF5_I2C_I2C1_IRQ_PRIORITY 3
#define NRF5_EXT_GPIOTE_IRQ_PRIORITY 3
#define NRF5_SD_IRQ_PRIORITY 3
#define NRF5_I2S_IRQ_PRIORITY 3
#define NRF5_ST_PRIORITY 4

and I left the board running. When I checked this morning, it had hit a priority order violation assert again.

Anyway, while I'm pretty sure that the SysTick priority is the culprit here, I don't fully understand why. If someone can clarify it for me, I would appreciate it. (Note that since I think I have things working ok now, I'm not in a huge rush for an answer or anything; I just don't like it when things work and I can't explain why. :) )

-Bill

Post by Giovanni » Mon Oct 22, 2018 7:44 pm

Hi,

It very much depends on how the blob forwards IRQs to the second vector table. The OS relies heavily on the exact behavior of ISRs; any difference, for example extra stacking before calling the ISR, would make it not work. The OS assumes that the main stack is empty after returning from the outer ISR. That is probably no longer the case here, and it would cause all kinds of problems on context switch.

Try doing the opposite: set VTOR to point to your vector table and, from your ISRs, call the original vectors in the proprietary blob.

Giovanni

Post by wpaul » Mon Oct 22, 2018 9:45 pm

I had considered trying to intercept the interrupts first, but there are complications.

For example, remember that the SoftDevice assumes it will capture interrupts first and then forward them. If I capture them first, then call the SoftDevice, the SoftDevice will then attempt to forward them again. This means I would have to put ChibiOS at, say, 0x26100, and then have a dummy table at 0x26000 where all the vectors branch to a stub that just returns.

Your comment about the stack is interesting though. It occurs to me that I don't know what the MSP is when the ChibiOS vector handlers run. I would think the SoftDevice would be careful to keep things as transparent as possible, but I suppose I should double check.

-Bill

Post by Giovanni » Tue Oct 23, 2018 7:34 am

Also consider that LR does not contain the usual magic value on ISR entry, it contains a real return address.

Giovanni

Post by wpaul » Tue Oct 23, 2018 10:06 pm

"Also consider that LR does not contain the usual magic value on ISR entry, it contains a real return address."

If you mean the SoftDevice is not passing through the right EXC_RETURN value in the link register, that does not appear to be the case. The Nordic engineers seem to have thought of that.

I investigated with the debugger, and it looks like the SoftDevice is correctly fixing things up before calling the ChibiOS ISRs. I set breakpoints on the SysTick vector addresses in both the SoftDevice's vector table at 0x0 and ChibiOS's vector table at 0x26000: ChibiOS gets the right stack pointer (equal to __main_stack_base__) and the link register contains the same EXC_RETURN value in both places. So the state seems correct.

When I got home yesterday I found that the target had hit the assertion failure again. But I discovered that I made a couple of mistakes in my driver code that botched my attempts to fix up the interrupt priorities: due to a couple of typos, the QSPI and SPI bus 3 priorities were still set to 3. (I found this by dumping the NVIC registers to confirm what was happening.) The QSPI bus isn't being used right now so it doesn't generate any interrupts after boot, but SPI bus 3 is used for the graphics display. I fixed that and I also changed the SysTick priority to 7 so now everything looks like this:

#define NRF5_SPI_SPI0_IRQ_PRIORITY 5
#define NRF5_SPI_SPI3_IRQ_PRIORITY 5
#define NRF5_QSPI_QSPI0_IRQ_PRIORITY 5
#define NRF5_SERIAL_UART0_PRIORITY 5
#define NRF5_GPT_TIMER1_IRQ_PRIORITY 5
#define NRF5_GPT_TIMER2_IRQ_PRIORITY 5
#define NRF5_I2C_I2C1_IRQ_PRIORITY 5
#define NRF5_EXT_GPIOTE_IRQ_PRIORITY 5
#define NRF5_SD_IRQ_PRIORITY 5
#define NRF5_I2S_IRQ_PRIORITY 5
#define NRF5_ST_PRIORITY 7
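The NVIC dump that exposed the typos can be turned into an automatic check. The following is a minimal sketch, assuming the raw 8-bit priority bytes have already been copied out of the NVIC priority registers into an array (on target, CMSIS `NVIC_GetPriority()` returns the already-decoded value instead). The function name and the array layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 3 implemented priority bits, stored in the top bits. */
#define MODEL_PRIORITY_BITS 3u

/* Count the entries whose decoded priority differs from 'expected'.
   The index of the first mismatch is stored in *first_bad, or -1 if
   every entry matches. */
size_t model_count_mismatches(const uint8_t *raw, size_t n,
                              unsigned expected, int *first_bad)
{
    size_t bad = 0;

    assert(raw != NULL && first_bad != NULL);
    *first_bad = -1;
    for (size_t i = 0; i < n; i++) {
        unsigned prio = (unsigned)raw[i] >> (8u - MODEL_PRIORITY_BITS);
        if (prio != expected) {
            if (*first_bad < 0)
                *first_bad = (int)i;
            bad++;
        }
    }
    return bad;
}
```

Running this at startup over the peripherals that are supposed to share one priority would have flagged the two entries that were still at 3 without needing a debugger session.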

The only higher priority interrupt used by ChibiOS is PendSV, which is at 0. Some of the peripherals taken over by the SoftDevice have higher priorities, but I don't know if ChibiOS would be affected by them.

After this, I tried stressing the target for a bit and couldn't get it to crash. I left it running overnight and it was still going this morning. (Every so often it refreshes the display so there's some periodic SPI bus activity.) If it's still going when I get home tonight I'll consider it fixed. :)

However, I still have my original question. If we assume for the sake of argument that the SoftDevice is not modifying the CPU state in a way that makes ChibiOS angry, then that means getting the interrupt priorities set right is key to avoiding the "priority order violation" problem. But why is that? What is it about getting the priorities wrong that causes ChibiOS to end up with the run queue in this state? (I tried reading some of the other forum threads on this topic but they don't really explain it.)

-Bill

Post by Giovanni » Wed Oct 24, 2018 7:44 am

Hi,

Priorities at 0..2 are fine as long as ChibiOS code is not called from those ISRs, directly or indirectly through callbacks.

Giovanni

Post by wpaul » Thu Oct 25, 2018 11:59 pm

Quick update: with the priorities adjusted as shown previously, the target has been running for a couple of days now without any problems, with all the debug checks and asserts enabled.

I still don't understand exactly how the interrupt priority assignments can lead to the "priority order violation" failure though. (I understand that higher priority ISRs will preempt lower priority ones -- assuming interrupts are unmasked -- but I don't get how that botches the scheduling.)

-Bill

Post by Giovanni » Fri Oct 26, 2018 7:38 am

Kernel critical sections only mask interrupts with priority 2 or lower. Operations performed at higher priority can preempt the kernel and corrupt internal data structures: imagine two list insertion operations performed at the same time, or other things like that. Kernel corruption or assertions are almost always a symptom of an IRQ priority problem.
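The "two list insertions at the same time" case can be reproduced on a host with a toy model. The sketch below is not the ChibiOS ready list code. It uses a hypothetical singly linked list kept in descending priority order, and splits insertion into a search step and a link step so that a second insertion can be run in between, the way an ISR would if it were not masked by the critical section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; 'head' nodes are sentinels whose prio is unused. */
typedef struct node {
    int prio;
    struct node *next;
} node_t;

/* Step 1: find the node after which a node of priority 'prio' belongs. */
static node_t *find_pred(node_t *head, int prio)
{
    node_t *p = head;
    while (p->next != NULL && p->next->prio >= prio)
        p = p->next;
    return p;
}

/* Step 2: link 'n' right after 'pred'. */
static void link_after(node_t *pred, node_t *n)
{
    n->next = pred->next;
    pred->next = n;
}

/* Normal insertion: both steps run without interruption. */
void model_insert(node_t *head, node_t *n)
{
    assert(head != NULL && n != NULL);
    link_after(find_pred(head, n->prio), n);
}

/* Insertion of 'a' that is preempted between its two steps by a complete
   insertion of 'b', so 'a' is linked using a stale search result. */
void model_insert_preempted(node_t *head, node_t *a, node_t *b)
{
    node_t *pred = find_pred(head, a->prio);
    model_insert(head, b);      /* the "ISR" runs here */
    link_after(pred, a);
}

/* Returns 1 if priorities never increase along the list, 0 otherwise. */
int model_is_ordered(const node_t *head)
{
    for (const node_t *p = head->next; p != NULL && p->next != NULL; p = p->next)
        if (p->prio < p->next->prio)
            return 0;
    return 1;
}
```

Starting from a list holding priorities 10 and 2, inserting 5 while an insertion of 8 sneaks in between the two steps yields 10, 5, 8, 2. Nothing crashes, but the list is no longer in priority order, which is the kind of inconsistency a debug assertion would later trip over.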

Giovanni

