Potential benefit of using stdatomic.h

Post by **faisal** » Tue Oct 09, 2018 6:05 pm

Any ideas on the potential benefits of using processor intrinsics for atomic operations which the kernel and HAL use all over the place? STM32 has the ldrex/strex instructions, and I'm sure other processor families have their own version of supporting atomic operations.

As per the current kernel design, what are we leaving on the table performance-wise by not using those instructions?

Post by **Giovanni** » Tue Oct 09, 2018 7:36 pm

Hi,

I don't see much use for those right now. This will change with multicore architectures where, in critical zones, you need to protect not just against interrupt sources but also against other cores. The kernel will require some changes; NIL will probably be the testbed and RT will follow.

No performance advantages anyway.

Giovanni

Post by **apmorton** » Sun Oct 21, 2018 7:56 pm

FWIW, this is what using ldrex/strex on arm looks like to write a boolean atomically.

As you can see it must be implemented as a loop.

Purely hypothetically this *could* be an infinite loop if you somehow managed to have an interrupt fire precisely between the ldrex/strex instructions every time whatever thread this code ran in was scheduled. You would have to be incredibly unlucky, but it is technically possible.
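The same load / try-store / retry shape appears in portable C11 when a read-modify-write is written out with a compare-and-swap. What follows is only a minimal sketch, assuming a C11 toolchain that ships `stdatomic.h`; `set_bits` is a hypothetical helper (not a ChibiOS function), and whether the compiler lowers it to ldrex/strex depends on the target and the compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: atomically OR `mask` into *word and return the
   value the word held before the update. */
static uint32_t set_bits(_Atomic uint32_t *word, uint32_t mask)
{
    assert(word != NULL);

    uint32_t old = atomic_load(word);

    /* The weak compare-exchange may fail, either because another context
       changed *word or spuriously. On failure it stores the current value
       of *word into `old`, so each retry works on fresh data. */
    while (!atomic_compare_exchange_weak(word, &old, old | mask)) {
        /* nothing to do here: just try again with the updated `old` */
    }
    return old;
}
```

As with the hand-written loop, nothing bounds the number of retries in theory; in practice each retry only happens when something else touched the word in between.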

Strictly speaking you *could* get a very slight cycle count improvement in some cases using ldrex/strex instead of masking and unmasking interrupts, but you would have to have just the right situation for it to make sense.

Pretend you want to atomically swap some integer to 0 and do something with the old value (maybe the integer is a bunch of flags for example).

Given the instruction cycle durations from here: http://infocenter.arm.com/help/index.js ... DIGAC.html

Assume R2 holds the address of your integer and R1 contains the new value; R3 will receive the old value.

Implemented using interrupt masking:

CPSID I          // 2 cycles
LDR R3, [R2]     // 2 cycles
STR R1, [R2]     // 2 cycles
CPSIE I          // 2 cycles

// do your stuff with old value outside critical zone

Implemented using ldrex/strex (R0 receives the STREX status, 0 on success):

again:
LDREX R3, [R2]      // 2 cycles
STREX R0, R1, [R2]  // 2 cycles
CMP   R0, #0        // 1 cycle
BNE   again         // 1 cycle if branch not taken

// do your stuff with old value outside critical zone

As you can see, ldrex/strex saves a whopping 2 CPU cycles in the best case in this example (6 cycles versus 8).
However, if something interrupts your exclusive load/store, you have to take the branch and try again, which makes the worst case significantly worse than simply masking interrupts.
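For reference, the "swap the integer to 0 and keep the old value" operation can be written with `stdatomic.h` as a single exchange. This is a minimal sketch, assuming a C11 compiler; `take_flags` is a hypothetical name, and the instructions actually generated (an ldrex/strex loop, interrupt masking, or a library call) depend on the target and toolchain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: atomically replace the flag word with 0 and
   return whatever flags were pending at that moment. */
static uint32_t take_flags(_Atomic uint32_t *flags)
{
    assert(flags != NULL);
    return atomic_exchange(flags, 0u);
}
```

The caller can then process the returned flags outside any critical zone, exactly as in the assembly versions.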

Not to mention that if you want to perform *multiple* operations on memory in your critical zone you basically cannot use ldrex/strex on their own, since the operations wouldn't be atomic as a whole.
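To illustrate the limit: a single exclusive load/store pair (or a single C11 compare-exchange) covers exactly one word. If the related values happen to fit in that one word, they can still be updated together. The sketch below assumes a hypothetical layout (head index in the low 16 bits, element count in the high 16 bits) and hypothetical helpers `pack` and `pop_slot`; anything that does not fit in one word still needs a real critical zone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: head in bits 0..15, count in bits 16..31. */
static uint32_t pack(uint16_t head, uint16_t count)
{
    return ((uint32_t)count << 16) | head;
}

/* Advance head and decrement count as one atomic step.
   Returns 1 and writes the old head to *head_out on success,
   or returns 0 and changes nothing if count is already 0. */
static int pop_slot(_Atomic uint32_t *state, uint16_t *head_out)
{
    assert(state != NULL && head_out != NULL);

    uint32_t old = atomic_load(state);
    for (;;) {
        uint16_t head  = (uint16_t)(old & 0xFFFFu);
        uint16_t count = (uint16_t)(old >> 16);
        if (count == 0) {
            return 0;
        }
        uint32_t next = pack((uint16_t)(head + 1u), (uint16_t)(count - 1u));
        if (atomic_compare_exchange_weak(state, &old, next)) {
            *head_out = head;
            return 1;
        }
        /* `old` was refreshed by the failed exchange; recompute and retry. */
    }
}
```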

TL;DR: Giovanni is correct. In the best case you are leaving 2 CPU cycles on the table, but in the worst case ldrex/strex performs significantly worse than masking interrupts.

Post by **faisal** » Mon Oct 22, 2018 5:13 pm

Thanks for the detailed reply!

ChibiOS Free Embedded RTOS