FPU is not being used?
Re: FPU is not being used?
Hmm... Or could there be an issue with the linker flags, so that the linker is not told correctly which functions to pull from the library for the single-precision FPU?
- Giovanni
- Site Admin
- Posts: 14457
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: FPU is not being used?
It could be. Isn't the linker getting the same options?
Edit: it is getting the same options, I just tried.
arm-none-eabi-gcc -c -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16 -Wall -Wextra -Wundef -Wstrict-prototypes -Wa,-alms=./build/lst/main.lst -DCORTEX_USE_FPU=TRUE -MD -MP -MF ./.dep/main.o.d -I. -I./cfg -I../../../os/license -I../../../os/common/portability/GCC -I../../../os/common/startup/ARMCMx/compilers/GCC -I../../../os/common/startup/ARMCMx/devices/STM32G4xx -I../../../os/common/ext/ARM/CMSIS/Core/Include -I../../../os/common/ext/ST/STM32G4xx -I../../../os/hal/include -I../../../os/hal/ports/common/ARMCMx -I../../../os/hal/ports/STM32/STM32G4xx -I../../../os/hal/ports/STM32/LLD/DACv1 -I../../../os/hal/ports/STM32/LLD/DMAv1 -I../../../os/hal/ports/STM32/LLD/EXTIv1 -I../../../os/hal/ports/STM32/LLD/GPIOv2 -I../../../os/hal/ports/STM32/LLD/I2Cv2 -I../../../os/hal/ports/STM32/LLD/RTCv3 -I../../../os/hal/ports/STM32/LLD/QUADSPIv1 -I../../../os/hal/ports/STM32/LLD/RNGv1 -I../../../os/hal/ports/STM32/LLD/SPIv2 -I../../../os/hal/ports/STM32/LLD/TIMv1 -I../../../os/hal/ports/STM32/LLD/USARTv2 -I../../../os/hal/ports/STM32/LLD/USBv1 -I../../../os/hal/ports/STM32/LLD/xWDGv1 -I../../../os/hal/boards/ST_NUCLEO64_G474RE -I../../../os/hal/osal/rt-nil -I../../../os/rt/include -I../../../os/oslib/include -I../../../os/common/ports/ARMCMx -I../../../os/common/ports/ARMCMx/compilers/GCC -I../../../test/lib -I../../../test/rt/source/test -I../../../test/oslib/source/test main.c -o build/obj/main.o
arm-none-eabi-gcc ./build/obj/crt0_v7m.o ./build/obj/vectors.o ./build/obj/chcoreasm_v7m.o ./build/obj/crt1.o ./build/obj/hal.o ./build/obj/hal_st.o ./build/obj/hal_buffers.o ./build/obj/hal_queues.o ./build/obj/hal_flash.o ./build/obj/hal_mmcsd.o ./build/obj/hal_pal.o ./build/obj/hal_serial.o ./build/obj/nvic.o ./build/obj/stm32_isr.o ./build/obj/hal_lld.o ./build/obj/stm32_dma.o ./build/obj/stm32_exti.o ./build/obj/hal_pal_lld.o ./build/obj/hal_st_lld.o ./build/obj/hal_serial_lld.o ./build/obj/board.o ./build/obj/osal.o ./build/obj/chsys.o ./build/obj/chdebug.o ./build/obj/chtrace.o ./build/obj/chvt.o ./build/obj/chschd.o ./build/obj/chthreads.o ./build/obj/chregistry.o ./build/obj/chsem.o ./build/obj/chmtx.o ./build/obj/chcond.o ./build/obj/chevents.o ./build/obj/chmsg.o ./build/obj/chdynamic.o ./build/obj/chmboxes.o ./build/obj/chmemcore.o ./build/obj/chmemheaps.o ./build/obj/chmempools.o ./build/obj/chpipes.o ./build/obj/chobjcaches.o ./build/obj/chdelegates.o ./build/obj/chfactory.o ./build/obj/chcore.o ./build/obj/chcore_v7m.o ./build/obj/ch_test.o ./build/obj/rt_test_root.o ./build/obj/rt_test_sequence_001.o ./build/obj/rt_test_sequence_002.o ./build/obj/rt_test_sequence_003.o ./build/obj/rt_test_sequence_004.o ./build/obj/rt_test_sequence_005.o ./build/obj/rt_test_sequence_006.o ./build/obj/rt_test_sequence_007.o ./build/obj/rt_test_sequence_008.o ./build/obj/rt_test_sequence_009.o ./build/obj/rt_test_sequence_010.o ./build/obj/rt_test_sequence_011.o ./build/obj/oslib_test_root.o ./build/obj/oslib_test_sequence_001.o ./build/obj/oslib_test_sequence_002.o ./build/obj/oslib_test_sequence_003.o ./build/obj/oslib_test_sequence_004.o ./build/obj/oslib_test_sequence_005.o ./build/obj/oslib_test_sequence_006.o ./build/obj/oslib_test_sequence_007.o ./build/obj/oslib_test_sequence_008.o ./build/obj/oslib_test_sequence_009.o ./build/obj/main.o -mcpu=cortex-m4 -mthumb -O2 -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common 
-flto -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nostartfiles -Wl,-Map=./build/ch.map,--cref,--no-warn-mismatch,--library-path=../../../os/common/startup/ARMCMx/compilers/GCC/ld,--script=../../../os/common/startup/ARMCMx/compilers/GCC/ld/STM32G474xE.ld,--gc-sections,--defsym=__process_stack_size__=0x400,--defsym=__main_stack_size__=0x400 -o build/ch.elf
Giovanni
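For anyone who wants to double-check what the compiler itself was told, here is a minimal sketch that reads the predefined macros GCC sets from -mfloat-abi and -mfpu. It assumes the ACLE macros __ARM_FP and __ARM_PCS_VFP; the function names are made up for illustration and are not part of ChibiOS. With -mfpu=fpv4-sp-d16 I would expect the "single precision only" branch, since that FPU has no double-precision hardware.

```c
#include <stdio.h>

/* Hypothetical helper: describes the hardware FP support the compiler
   assumes, based on the ACLE macro __ARM_FP (bit 2 = single precision,
   bit 3 = double precision). */
static const char *fpu_build_summary(void) {
#if defined(__ARM_FP)
#  if (__ARM_FP & 0x08)
  return "hardware FP including double precision";
#  elif (__ARM_FP & 0x04)
  return "hardware FP, single precision only (double is done in software)";
#  else
  return "hardware FP, half precision only";
#  endif
#elif defined(__arm__)
  return "ARM target, software floating point only";
#else
  return "not an ARM target";
#endif
}

/* Hypothetical helper: __ARM_PCS_VFP is defined when floating-point
   arguments are passed in FPU registers (-mfloat-abi=hard). */
static const char *float_abi_summary(void) {
#if defined(__ARM_PCS_VFP)
  return "hard";
#elif defined(__arm__)
  return "soft or softfp";
#else
  return "n/a";
#endif
}

/* Prints both summaries on one line; on the target you would route this
   through whatever output channel the shell uses. */
static void print_fpu_build_summary(void) {
  printf("FP: %s, float ABI: %s\n", fpu_build_summary(), float_abi_summary());
}
```

Calling print_fpu_build_summary() from a shell command in both builds shows whether the two configurations really differ at compile time. It says nothing about which library variant the linker picked.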
Re: FPU is not being used?
Final conclusion: Using the FPU speeds up my own pow(b,n) implementations by a factor of about 7.5 (for loop) to about 13 (no loop), but it does not speed up the calculations done by math.lib (pow and powf).
Note: All calculations use these declarations:
float result, b;
unsigned int n;
Kernel: 6.0.3
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: Jan 6 2020 - 14:05:19
ChibiOS/RT Shell
FPU: => software
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1889 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.4976 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.3850 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.1913 s
------------------------------------------------
FPU: => hardware
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2701 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.3552 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
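The source of the benchmark is not shown in this thread, so the following is only a sketch of what the "for loop" and "no loop" variants might look like, using the declarations from the note above. The function names are invented, and I am assuming the "no loop" case multiplies b five times (with b = n = 5 the printed result is the same either way). Everything here stays in float, so a single-precision FPU can execute each multiply in hardware. By contrast, pow() takes and returns double, and the FPU selected by -mfpu=fpv4-sp-d16 is single precision only, so double arithmetic goes through software routines with or without the FPU. That would be consistent with the pow() timings barely changing; it does not by itself explain the powf() numbers.

```c
/* Sketch of the "for loop" variant: n single-precision multiplies. */
static float pow_loop(float b, unsigned int n) {
  float result = 1.0f;            /* 'f' suffix keeps the constant in float */
  for (unsigned int i = 0; i < n; i++) {
    result *= b;
  }
  return result;
}

/* Sketch of the "no loop" variant for the fixed exponent 5. */
static float pow5_unrolled(float b) {
  return b * b * b * b * b;
}
```

When mixing float code with literals or library calls, GCC's -Wdouble-promotion warning can help spot places where a float is silently promoted to double.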
Re: FPU is not being used?
Thanks for the test, this points to a compiler issue.
Could you try the latest compiler, 9.2.1? It seems to ship many more libraries compiled with different options; look under arm-none-eabi/lib/thumb and compare with 7.3.1.
Giovanni
-
- Posts: 483
- Joined: Sat Nov 19, 2011 6:47 pm
- Location: Le Mans, France
- Has thanked: 21 times
- Been thanked: 30 times
Re: FPU is not being used?
Hi,
You can try the optimized functions from the ARM libraries.
I've made a repo that can be included in projects: https://github.com/fpoussin/cm4-dsp-lib
This will take advantage of the FPU.
Re: FPU is not being used?
Update: With GCC 9.2.1, powf(b,n) is about three times faster than with GCC 7.2.1 (using the FPU); the other calculations were not affected.
Note: Using the new kernel 6.1.1 has no influence (as expected).
-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:41:29
ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.1851 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=03.2727 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0513 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
-------------------------------------------
Kernel: 6.1.1
Compiler: GCC 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: May 13 2020 - 12:44:10
ChibiOS/RT Shell
>
>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, b^n=3125.000000000, duration=13.2527 s
------------------------------------------------
math.lib: result = powf(b,n)
b=5, n=5, b^n=3125.000000000, duration=01.0398 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, b^n=3125.000000000, duration=00.0551 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, b^n=3125.000000000, duration=00.0150 s
------------------------------------------------
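Even with GCC 9.2.1, powf() is still far slower in these runs than the plain loop (1.0398 s against 0.0551 s). Since n is an unsigned int here, one option that stays entirely in single precision is exponentiation by squaring. This is a sketch of a standard technique, not something taken from the benchmark or from any library; the name powu_f32 is made up. It needs about log2(n) squarings instead of n multiplies, which only starts to matter for larger exponents than 5.

```c
/* Computes b^n for an unsigned integer exponent using only float
   multiplies (exponentiation by squaring). */
static float powu_f32(float b, unsigned int n) {
  float result = 1.0f;
  while (n != 0u) {
    if (n & 1u) {
      result *= b;        /* this bit of the exponent is set */
    }
    n >>= 1;
    if (n != 0u) {
      b *= b;             /* square the base for the next bit */
    }
  }
  return result;
}
```

Rounding can differ slightly from powf() for inputs whose powers are not exactly representable in float, so compare results before swapping it in.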