Any ideas on the potential benefits of using processor intrinsics for atomic operations which the kernel and HAL use all over the place? STM32 has the ldrex/strex instructions, and I'm sure other processor families have their own version of supporting atomic operations.
As the per the current kernel design, what are we leaving on the table performance wise by not using those instructions?
Potential benefit of using stdatomic.h
- Giovanni
- Site Admin
- Posts: 14457
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: Potential benefit of using stdatomic.h
Hi,
I don't see much use for those right now, this will change with multicore architectures where, in critical zones, you need to protect not just vs interrupt sources but also other cores. The kernel will require some changes, probably NIL will be a testbed, RT will follow.
Not performance advantages anyway.
Giovanni
I don't see much use for those right now, this will change with multicore architectures where, in critical zones, you need to protect not just vs interrupt sources but also other cores. The kernel will require some changes, probably NIL will be a testbed, RT will follow.
Not performance advantages anyway.
Giovanni
Re: Potential benefit of using stdatomic.h
FWIW, this is what using ldrex/strex on arm looks like to write a boolean atomically.
As you can see it must be implemented as a loop.
Purely hypothetically this *could* be an infinite loop if you somehow managed to have an interrupt fire precisely between the ldrex/strex instructions every time whatever thread this code ran in was scheduled. You would have to be incredibly unlucky, but it is technically possible.
Strictly speaking you *could* get a very slight cycle count improvement in some cases using ldrex/strex instead of masking and unmasking interrupts, but you would have to have just the right situation for it to make sense.
Pretend you want to atomically swap some integer to 0 and do something with the old value (maybe the integer is a bunch of flags for example).
Given the instruction cycle durations from here: http://infocenter.arm.com/help/index.js ... DIGAC.html
Assuming R2 is the address of your integer, R3 will be the old value and R1 contains the new value
Implemented using interrupt masking
implemented using ldrex/strex
as you can see you can save a whopping 2 cpu cycles in the best case in this example using ldrex/strex.
However, if something interrupts your exclusive load/store you have to take a branch and try again, which makes the worst case significantly worse than simply masking interrupts.
Not to mention that if you want to perform *multiple* operations on memory in your critical zone you basically cannot use ldrex/strex on their own, since the operations wouldn't be atomic as a whole.
TL;DR; Giovanni is correct. In the best case you are leaving 2 cpu cycles on the table, but in the worst case you perform significantly worse than masking interrupts when using ldrex/strex
As you can see it must be implemented as a loop.
Purely hypothetically this *could* be an infinite loop if you somehow managed to have an interrupt fire precisely between the ldrex/strex instructions every time whatever thread this code ran in was scheduled. You would have to be incredibly unlucky, but it is technically possible.
Strictly speaking you *could* get a very slight cycle count improvement in some cases using ldrex/strex instead of masking and unmasking interrupts, but you would have to have just the right situation for it to make sense.
Pretend you want to atomically swap some integer to 0 and do something with the old value (maybe the integer is a bunch of flags for example).
Given the instruction cycle durations from here: http://infocenter.arm.com/help/index.js ... DIGAC.html
Assuming R2 is the address of your integer, R3 will be the old value and R1 contains the new value
Implemented using interrupt masking
Code: Select all
CPSID I // 2 cycles
LDR R3, [R2] // 2 cycles
STR R1, [R2] // 2 cycles
CPSIE I // 2 cycles
// do your stuff with old value outside critical zone
implemented using ldrex/strex
Code: Select all
again:
LDREX R3, [R2] // 2 cycles
STREX R0, R1, [R2] // 2 cycles
CMP R0, #0 // 1 cycle
BNE again // 1 cycle if branch not taken
// do your stuff with old value outside critical zone
as you can see you can save a whopping 2 cpu cycles in the best case in this example using ldrex/strex.
However, if something interrupts your exclusive load/store you have to take a branch and try again, which makes the worst case significantly worse than simply masking interrupts.
Not to mention that if you want to perform *multiple* operations on memory in your critical zone you basically cannot use ldrex/strex on their own, since the operations wouldn't be atomic as a whole.
TL;DR; Giovanni is correct. In the best case you are leaving 2 cpu cycles on the table, but in the worst case you perform significantly worse than masking interrupts when using ldrex/strex
Return to “Development and Feedback”
Who is online
Users browsing this forum: No registered users and 42 guests