FPU is not being used?

Postby **psavr** » Mon Jan 06, 2020 11:08 am

Hmm... Or could the problem be the linker flags, i.e. the linker not being told correctly which function implementations to pull out of the library for the SP-FPU (single-precision FPU)?

Postby **Giovanni** » Mon Jan 06, 2020 12:14 pm

It could be. Isn't the linker getting the same options?

Edit: it is getting the same options; I just checked.


arm-none-eabi-gcc -c -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16  -Wall -Wextra -Wundef -Wstrict-prototypes -Wa,-alms=./build/lst/main.lst -DCORTEX_USE_FPU=TRUE  -MD -MP -MF ./.dep/main.o.d -I. -I./cfg -I../../../os/license -I../../../os/common/portability/GCC -I../../../os/common/startup/ARMCMx/compilers/GCC -I../../../os/common/startup/ARMCMx/devices/STM32G4xx -I../../../os/common/ext/ARM/CMSIS/Core/Include -I../../../os/common/ext/ST/STM32G4xx -I../../../os/hal/include -I../../../os/hal/ports/common/ARMCMx -I../../../os/hal/ports/STM32/STM32G4xx -I../../../os/hal/ports/STM32/LLD/DACv1 -I../../../os/hal/ports/STM32/LLD/DMAv1 -I../../../os/hal/ports/STM32/LLD/EXTIv1 -I../../../os/hal/ports/STM32/LLD/GPIOv2 -I../../../os/hal/ports/STM32/LLD/I2Cv2 -I../../../os/hal/ports/STM32/LLD/RTCv3 -I../../../os/hal/ports/STM32/LLD/QUADSPIv1 -I../../../os/hal/ports/STM32/LLD/RNGv1 -I../../../os/hal/ports/STM32/LLD/SPIv2 -I../../../os/hal/ports/STM32/LLD/TIMv1 -I../../../os/hal/ports/STM32/LLD/USARTv2 -I../../../os/hal/ports/STM32/LLD/USBv1 -I../../../os/hal/ports/STM32/LLD/xWDGv1 -I../../../os/hal/boards/ST_NUCLEO64_G474RE -I../../../os/hal/osal/rt-nil -I../../../os/rt/include -I../../../os/oslib/include -I../../../os/common/ports/ARMCMx -I../../../os/common/ports/ARMCMx/compilers/GCC -I../../../test/lib -I../../../test/rt/source/test -I../../../test/oslib/source/test main.c -o build/obj/main.o

arm-none-eabi-gcc ./build/obj/crt0_v7m.o ./build/obj/vectors.o ./build/obj/chcoreasm_v7m.o   ./build/obj/crt1.o ./build/obj/hal.o ./build/obj/hal_st.o ./build/obj/hal_buffers.o ./build/obj/hal_queues.o ./build/obj/hal_flash.o ./build/obj/hal_mmcsd.o ./build/obj/hal_pal.o ./build/obj/hal_serial.o ./build/obj/nvic.o ./build/obj/stm32_isr.o ./build/obj/hal_lld.o ./build/obj/stm32_dma.o ./build/obj/stm32_exti.o ./build/obj/hal_pal_lld.o ./build/obj/hal_st_lld.o ./build/obj/hal_serial_lld.o ./build/obj/board.o ./build/obj/osal.o ./build/obj/chsys.o ./build/obj/chdebug.o ./build/obj/chtrace.o ./build/obj/chvt.o ./build/obj/chschd.o ./build/obj/chthreads.o ./build/obj/chregistry.o ./build/obj/chsem.o ./build/obj/chmtx.o ./build/obj/chcond.o ./build/obj/chevents.o ./build/obj/chmsg.o ./build/obj/chdynamic.o ./build/obj/chmboxes.o ./build/obj/chmemcore.o ./build/obj/chmemheaps.o ./build/obj/chmempools.o ./build/obj/chpipes.o ./build/obj/chobjcaches.o ./build/obj/chdelegates.o ./build/obj/chfactory.o ./build/obj/chcore.o ./build/obj/chcore_v7m.o ./build/obj/ch_test.o ./build/obj/rt_test_root.o ./build/obj/rt_test_sequence_001.o ./build/obj/rt_test_sequence_002.o ./build/obj/rt_test_sequence_003.o ./build/obj/rt_test_sequence_004.o ./build/obj/rt_test_sequence_005.o ./build/obj/rt_test_sequence_006.o ./build/obj/rt_test_sequence_007.o ./build/obj/rt_test_sequence_008.o ./build/obj/rt_test_sequence_009.o ./build/obj/rt_test_sequence_010.o ./build/obj/rt_test_sequence_011.o ./build/obj/oslib_test_root.o ./build/obj/oslib_test_sequence_001.o ./build/obj/oslib_test_sequence_002.o ./build/obj/oslib_test_sequence_003.o ./build/obj/oslib_test_sequence_004.o ./build/obj/oslib_test_sequence_005.o ./build/obj/oslib_test_sequence_006.o ./build/obj/oslib_test_sequence_007.o ./build/obj/oslib_test_sequence_008.o ./build/obj/oslib_test_sequence_009.o ./build/obj/main.o    -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nostartfiles  -Wl,-Map=./build/ch.map,--cref,--no-warn-mismatch,--library-path=../../../os/common/startup/ARMCMx/compilers/GCC/ld,--script=../../../os/common/startup/ARMCMx/compilers/GCC/ld/STM32G474xE.ld,--gc-sections,--defsym=__process_stack_size__=0x400,--defsym=__main_stack_size__=0x400   -o build/ch.elf

Giovanni
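The gcc driver selects the library variant (multilib) it links against from the -mcpu/-mfloat-abi/-mfpu options it sees at link time, which is why the linker options matter here. A complementary check on the compiler side is a tiny function that reports which floating-point configuration a translation unit was actually built for, based on the ACLE predefined macros `__arm__`, `__ARM_FP` and `__ARM_PCS_VFP`. This is only a sketch: `fp_config` is a hypothetical name, not part of ChibiOS, and it says nothing about which prebuilt libm variant was picked.

```c
/* Hypothetical helper: reports the floating-point configuration this
   translation unit was compiled for, using ACLE predefined macros.
   Build it with exactly the same -mcpu/-mfloat-abi/-mfpu flags as the
   rest of the project. */
const char *fp_config(void) {
#if !defined(__arm__)
    /* Host build (x86, AArch64, ...): the 32-bit ARM macros do not apply. */
    return "not a 32-bit ARM target";
#elif defined(__ARM_FP) && (__ARM_FP & 0x4) && defined(__ARM_PCS_VFP)
    /* Bit 2 of __ARM_FP: single-precision FP instructions available.
       __ARM_PCS_VFP: float arguments are passed in FPU registers
       (-mfloat-abi=hard). */
    return "hard-float ABI, FPU with single precision";
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    /* FPU instructions allowed, but the soft-float calling convention
       is used (-mfloat-abi=softfp). */
    return "softfp ABI, FPU with single precision";
#else
    /* No FP instructions: all float math goes through software routines. */
    return "no FPU (soft-float)";
#endif
}
```

Printing the returned string from the shell on the target (with the project's normal build flags) should report the hard-float variant for the options shown above.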

Postby **psavr** » Mon Jan 06, 2020 1:43 pm

The same for me, except that I'm using -Os instead of -O2.

Postby **psavr** » Mon Jan 06, 2020 3:11 pm

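Final conclusion: using the FPU speeds up my own pow(b,n) calculations by a factor of about 7.5 (for loop: 0.3850 s vs 0.0513 s) to about 13 (no loop: 0.1913 s vs 0.0150 s), but it does not speed up the calculations done by the math library (pow() and powf()).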

Note: all calculations use these variables:
float result, b;
unsigned int n;
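The thread does not show the code of the own pow(b,n) function. A minimal sketch of what the "for loop" variant presumably looks like, using the same types as above (`float b`, `unsigned int n`); `pow_loop` is a hypothetical name. For comparison, `pow_sq` uses exponentiation by squaring, a technique not used in the thread, which needs only O(log n) multiplications instead of n.

```c
/* Hypothetical reconstruction of the "for loop" variant:
   n repeated single-precision multiplications. */
float pow_loop(float b, unsigned int n) {
    float result = 1.0f;
    for (unsigned int i = 0; i < n; i++) {
        result *= b;  /* a single VMUL.F32 when compiled for the FPU */
    }
    return result;
}

/* Alternative (not from the thread): exponentiation by squaring.
   Each step squares the base and multiplies it into the result
   whenever the current low bit of the exponent is set. */
float pow_sq(float b, unsigned int n) {
    float result = 1.0f;
    while (n != 0u) {
        if (n & 1u) {
            result *= b;
        }
        b *= b;
        n >>= 1;
    }
    return result;
}
```

Both return 3125.0f for b = 5, n = 5. All intermediate values here are integers well below 2^24, so they are exactly representable in float and the results are exact.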

Kernel: 6.0.3
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: Jan 6 2020 - 14:05:19
ChibiOS/RT Shell

FPU: => software
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1889 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.4976 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.3850 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.1913 s
------------------------------------------------

FPU: => hardware
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2701 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.3552 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
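A likely explanation for pow() staying at about 13.2 s with and without the FPU: pow() takes and returns double, and the Cortex-M4 FPU (fpv4-sp-d16) supports single precision only, so double arithmetic is done by software routines (such as `__aeabi_dmul`) in both builds. The same trap applies to ordinary expressions, because an unsuffixed constant such as `2.0` is a double. GCC's `-Wdouble-promotion` warning reports implicit float-to-double conversions, and `-fsingle-precision-constant` makes unsuffixed constants single precision. A minimal sketch; the function names are hypothetical:

```c
/* Build with -Wdouble-promotion to have GCC flag the implicit
   float -> double conversion in average_promoted(). On a
   single-precision-only FPU such as fpv4-sp-d16, double arithmetic
   is done in software. */

/* C semantics: 2.0 is a double constant, so (a + b) is converted to
   double for the division and the quotient is converted back to float
   on return. */
float average_promoted(float a, float b) {
    return (a + b) / 2.0;
}

/* Single precision throughout: the 'f' suffix makes 0.5f a float. */
float average_single(float a, float b) {
    return (a + b) * 0.5f;
}

/* Same pitfall with the math library: pow() takes and returns double,
   while powf() is its float counterpart. */
```

Both functions return the same value for these simple inputs; the difference is in the precision the arithmetic is specified to use, and therefore in whether the FPU can do the work.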

Postby **Giovanni** » Mon Jan 06, 2020 5:21 pm

Thanks for the test. This points to a compiler issue.

Could you try the latest compiler, 9.2.1? It seems to ship many more libraries compiled with different options; look under arm-none-eabi/lib/thumb and compare with 7.3.1.

Giovanni

Postby **mobyfab** » Sat Apr 11, 2020 4:23 pm

Hi,

You can try the optimized functions from ARM's libraries.
I've made a repo that can be included in projects: https://github.com/fpoussin/cm4-dsp-lib
It takes advantage of the FPU.

Postby **psavr** » Wed May 13, 2020 11:56 am

Update: with GCC 9.2.1, powf(b,n) is about three times faster than with GCC 7.2.1 (3.2727 s vs 1.0398 s, using the FPU); the other calculations were not affected.
Note: switching to the new kernel 6.1.1 made no difference (as expected).

-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:41:29

ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1851 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.2727 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------

-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:44:10

ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2527 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=01.0398 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0551 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
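The benchmark harness itself is not shown in the thread. When repeating a comparison like this (for example across GCC 7.2.1 and 9.2.1), one detail matters: with constant inputs such as b = 5, n = 5, the compiler may evaluate the call at compile time or hoist it out of the timing loop. A host-portable sketch reads the inputs through volatile objects and times with the standard `clock()`; `bench_pow`, `pow_fn` and `workload_loop` are hypothetical names, and on the target `clock()` would have to be replaced by the RTOS's own time source.

```c
#include <time.h>

/* Signature of the functions being timed (hypothetical). */
typedef float (*pow_fn)(float b, unsigned int n);

/* Example workload: simple multiplication loop (hypothetical). */
float workload_loop(float b, unsigned int n) {
    float result = 1.0f;
    for (unsigned int i = 0; i < n; i++) {
        result *= b;
    }
    return result;
}

/* Calls fn(b, n) `iterations` times and returns the elapsed processor
   time in seconds. The inputs are re-read from volatile objects on every
   iteration and every result is stored to a volatile sink, so the
   compiler can neither fold the call into a constant nor hoist it out
   of the loop. */
double bench_pow(pow_fn fn, float b, unsigned int n, unsigned long iterations) {
    volatile float vb = b;
    volatile unsigned int vn = n;
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        sink = fn(vb, vn);
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, `bench_pow(workload_loop, 5.0f, 5u, 100000ul)` times 100000 calls, presumably matching the third argument of the `pow 5 5 100000` shell command.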

ChibiOS Free Embedded RTOS