FPU is not being used?

psavr
Posts: 26
Joined: Thu Feb 08, 2018 8:38 am
Location: Switzerland
Been thanked: 3 times

Re: FPU is not being used?

Postby psavr » Mon Jan 06, 2020 11:08 am

Hmm... Or could the linker flags be the issue, so that the linker is not told correctly which variant of the library functions to pull in for the single-precision FPU?

Giovanni
Site Admin
Posts: 14457
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1076 times
Been thanked: 922 times

Postby Giovanni » Mon Jan 06, 2020 12:14 pm

It could be. Isn't the linker getting the same options?

Edit: It is getting the same options, I just tried.

arm-none-eabi-gcc -c -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16  -Wall -Wextra -Wundef -Wstrict-prototypes -Wa,-alms=./build/lst/main.lst -DCORTEX_USE_FPU=TRUE  -MD -MP -MF ./.dep/main.o.d -I. -I./cfg -I../../../os/license -I../../../os/common/portability/GCC -I../../../os/common/startup/ARMCMx/compilers/GCC -I../../../os/common/startup/ARMCMx/devices/STM32G4xx -I../../../os/common/ext/ARM/CMSIS/Core/Include -I../../../os/common/ext/ST/STM32G4xx -I../../../os/hal/include -I../../../os/hal/ports/common/ARMCMx -I../../../os/hal/ports/STM32/STM32G4xx -I../../../os/hal/ports/STM32/LLD/DACv1 -I../../../os/hal/ports/STM32/LLD/DMAv1 -I../../../os/hal/ports/STM32/LLD/EXTIv1 -I../../../os/hal/ports/STM32/LLD/GPIOv2 -I../../../os/hal/ports/STM32/LLD/I2Cv2 -I../../../os/hal/ports/STM32/LLD/RTCv3 -I../../../os/hal/ports/STM32/LLD/QUADSPIv1 -I../../../os/hal/ports/STM32/LLD/RNGv1 -I../../../os/hal/ports/STM32/LLD/SPIv2 -I../../../os/hal/ports/STM32/LLD/TIMv1 -I../../../os/hal/ports/STM32/LLD/USARTv2 -I../../../os/hal/ports/STM32/LLD/USBv1 -I../../../os/hal/ports/STM32/LLD/xWDGv1 -I../../../os/hal/boards/ST_NUCLEO64_G474RE -I../../../os/hal/osal/rt-nil -I../../../os/rt/include -I../../../os/oslib/include -I../../../os/common/ports/ARMCMx -I../../../os/common/ports/ARMCMx/compilers/GCC -I../../../test/lib -I../../../test/rt/source/test -I../../../test/oslib/source/test main.c -o build/obj/main.o

arm-none-eabi-gcc ./build/obj/crt0_v7m.o ./build/obj/vectors.o ./build/obj/chcoreasm_v7m.o   ./build/obj/crt1.o ./build/obj/hal.o ./build/obj/hal_st.o ./build/obj/hal_buffers.o ./build/obj/hal_queues.o ./build/obj/hal_flash.o ./build/obj/hal_mmcsd.o ./build/obj/hal_pal.o ./build/obj/hal_serial.o ./build/obj/nvic.o ./build/obj/stm32_isr.o ./build/obj/hal_lld.o ./build/obj/stm32_dma.o ./build/obj/stm32_exti.o ./build/obj/hal_pal_lld.o ./build/obj/hal_st_lld.o ./build/obj/hal_serial_lld.o ./build/obj/board.o ./build/obj/osal.o ./build/obj/chsys.o ./build/obj/chdebug.o ./build/obj/chtrace.o ./build/obj/chvt.o ./build/obj/chschd.o ./build/obj/chthreads.o ./build/obj/chregistry.o ./build/obj/chsem.o ./build/obj/chmtx.o ./build/obj/chcond.o ./build/obj/chevents.o ./build/obj/chmsg.o ./build/obj/chdynamic.o ./build/obj/chmboxes.o ./build/obj/chmemcore.o ./build/obj/chmemheaps.o ./build/obj/chmempools.o ./build/obj/chpipes.o ./build/obj/chobjcaches.o ./build/obj/chdelegates.o ./build/obj/chfactory.o ./build/obj/chcore.o ./build/obj/chcore_v7m.o ./build/obj/ch_test.o ./build/obj/rt_test_root.o ./build/obj/rt_test_sequence_001.o ./build/obj/rt_test_sequence_002.o ./build/obj/rt_test_sequence_003.o ./build/obj/rt_test_sequence_004.o ./build/obj/rt_test_sequence_005.o ./build/obj/rt_test_sequence_006.o ./build/obj/rt_test_sequence_007.o ./build/obj/rt_test_sequence_008.o ./build/obj/rt_test_sequence_009.o ./build/obj/rt_test_sequence_010.o ./build/obj/rt_test_sequence_011.o ./build/obj/oslib_test_root.o ./build/obj/oslib_test_sequence_001.o ./build/obj/oslib_test_sequence_002.o ./build/obj/oslib_test_sequence_003.o ./build/obj/oslib_test_sequence_004.o ./build/obj/oslib_test_sequence_005.o ./build/obj/oslib_test_sequence_006.o ./build/obj/oslib_test_sequence_007.o ./build/obj/oslib_test_sequence_008.o ./build/obj/oslib_test_sequence_009.o ./build/obj/main.o    -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections 
-fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nostartfiles  -Wl,-Map=./build/ch.map,--cref,--no-warn-mismatch,--library-path=../../../os/common/startup/ARMCMx/compilers/GCC/ld,--script=../../../os/common/startup/ARMCMx/compilers/GCC/ld/STM32G474xE.ld,--gc-sections,--defsym=__process_stack_size__=0x400,--defsym=__main_stack_size__=0x400   -o build/ch.elf
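One way to double-check that these options actually reach a given translation unit is to look at the macros the compiler predefines. The following is only a minimal sketch: the function name fp_config is made up for illustration, and it relies on the __ARM_FP bitmask (bit 2 for single, bit 3 for double precision hardware support) and on __SOFTFP__ as GCC defines them for ARM targets.

```c
/* Report which floating-point configuration this file was compiled for.
 * __ARM_FP and __SOFTFP__ are predefined by GCC on ARM targets; on other
 * targets (for example a PC build) they may both be undefined.
 * fp_config is a hypothetical helper name, not part of ChibiOS. */
static const char *fp_config(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return "hardware single and double precision";
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    return "hardware single precision only";
#elif defined(__SOFTFP__)
    return "software floating point";
#else
    return "not an ARM target, or unknown";
#endif
}
```

Printing the returned string from the shell at start-up would show whether a file was built with the intended -mfloat-abi and -mfpu settings. It says nothing about how the prebuilt libm that gets linked in was compiled.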


Giovanni

Postby psavr » Mon Jan 06, 2020 1:43 pm

The same for me, except I'm using -Os instead of -O2.

Postby psavr » Mon Jan 06, 2020 3:11 pm

Final conclusion: Using the FPU speeds up my own pow(b,n) implementations by a factor of about 7.5 (for loop) to about 13 (no loop), but not the calculations done by the math library (pow and powf).

Note: All calculations use the following variables:
float result, b;
unsigned int n;
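The float declarations matter here because the FPU selected with -mfpu=fpv4-sp-d16 handles single precision only, so any expression that ends up in double is computed by software routines even in a hardware-FPU build. This may be part of why pow(), which takes and returns double, behaves differently from hand-written float code. The sketch below (helper names are made up for illustration) shows how easily a float expression is promoted to double by an unsuffixed literal:

```c
#include <stddef.h>

/* Size in bytes of the result type when a float is multiplied by a double
 * literal: the float operand is promoted, so the expression is a double. */
static size_t size_with_double_literal(void)
{
    float b = 5.0f;
    return sizeof(b * 2.0);
}

/* Same expression with a float literal: the result type stays float. */
static size_t size_with_float_literal(void)
{
    float b = 5.0f;
    return sizeof(b * 2.0f);
}
```

GCC's -Wdouble-promotion warning can help to find such implicit promotions in application code.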

Kernel: 6.0.3
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: Jan 6 2020 - 14:05:19
ChibiOS/RT Shell

FPU: => software
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1889 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.4976 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.3850 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.1913 s
------------------------------------------------

FPU: => hardware
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2701 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.3552 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
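For readers who want to reproduce the "for loop" and "no loop" measurements, the two hand-written variants could look roughly like the sketch below. The actual test code was not posted, so this is a hypothetical reconstruction: all function names are assumptions, and the timing around the repeat loop is left out.

```c
/* "for loop" variant: multiply b by itself n times, staying in single
 * precision throughout. */
static float pow_loop(float b, unsigned int n)
{
    float result = 1.0f;
    for (unsigned int i = 0; i < n; i++) {
        result *= b;
    }
    return result;
}

/* "no loop" variant for the fixed exponent 5. */
static float pow5_unrolled(float b)
{
    return b * b * b * b * b;
}

/* Repeat the loop variant `iterations` times. The volatile input and
 * output force a fresh read and a fresh store on every iteration, so the
 * compiler cannot hoist the calculation out of the loop at -O2 or -Os. */
static float repeat_pow_loop(float b, unsigned int n, unsigned long iterations)
{
    volatile float input = b;
    volatile float sink = 0.0f;
    for (unsigned long k = 0; k < iterations; k++) {
        sink = pow_loop(input, n);
    }
    return sink;
}
```

Both variants only use float multiplications, which map directly to FPU instructions in a hard-float build. That is consistent with these two cases being the only ones that got faster.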

Postby Giovanni » Mon Jan 06, 2020 5:21 pm

Thanks for the test. This points to a compiler issue.

Could you try the latest compiler, 9.2.1? It seems to ship many more libraries compiled with different options. Look under arm-none-eabi/lib/thumb, then compare with 7.3.1.

Giovanni

mobyfab
Posts: 483
Joined: Sat Nov 19, 2011 6:47 pm
Location: Le Mans, France
Has thanked: 21 times
Been thanked: 30 times

Postby mobyfab » Sat Apr 11, 2020 4:23 pm

Hi,

You can try the optimized functions from the ARM libraries.
I've made a repo that can be included in projects: https://github.com/fpoussin/cm4-dsp-lib
These functions take advantage of the FPU.

Postby psavr » Wed May 13, 2020 11:56 am

Update: With GCC 9.2.1, powf(b,n) is about three times faster than with GCC 7.2.1 (using the FPU); the other calculations were not affected.
Note: Using the new kernel 6.1.1 has no influence (as expected).

-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:41:29

ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1851 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.2727 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------

-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:44:10

ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2527 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=01.0398 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0551 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------

