FPU is not being used?

psavr
Posts: 26
Joined: Thu Feb 08, 2018 8:38 am
Location: Switzerland
Been thanked: 3 times

Postby psavr » Mon Jan 06, 2020 8:24 am

Hi, I am trying to use the FPU on the STM32L432C, but there is no speed improvement when I enable the FPU in the Makefile with:

# Enables the use of FPU (no, softfp, hard).
ifeq ($(USE_FPU),)
USE_FPU = hard
endif

I'm aware that there is no FPU support for double pow(b,n), but even for float powf(b,n) the calculation is only slightly faster with "USE_FPU = hard" (3.2639 s) than with "USE_FPU = no" (3.3451 s).

Is there anything else I have to do so that the math library uses the FPU for float?

OUTPUT:
_____________________________________________________________________________________
Kernel: 6.0.3
Compiler: GCC 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Architecture: ARMv7E-M
Core Variant: Cortex-M4F
Port Info: Advanced kernel mode
Platform: STM32L4xx Ultra Low Power
Board: STMicroelectronics STM32 Nucleo32-L432KC
SysClk: 80 MHz
Build time: Jan 6 2020 - 08:10:27

ChibiOS/RT Shell
ch>pow 5 5 100000
------------------------------------------------
math.lib: result = pow(b,n)
b=5, n=5, a^n=3125.000000000, duration=13.0876 s
------------------------------------------------
math.lib: result = fpow(b,n)
b=5, n=5, a^n=3125.000000000, duration=03.2639 s
------------------------------------------------
no loop: result = n*n*n*n*n
b=5, n=5, a^n=3125, duration=00.0075 s
------------------------------------------------
for loop: result = b^n
b=5, n=5, a^n=3125, duration=00.0450 s
------------------------------------------------
ch>

CODE: (relevant part)
_____________________________________________________________________________________

unsigned int tsa, ts;
unsigned int n, ms, ss;   /* n (the loop count) is not assigned in this excerpt */
unsigned int ipower, ibase;
unsigned long int iresult;
double dbase, dpower, dresult;
float fbase, fpower, fresult;

/* Base and exponent come from the first two shell arguments. */
dbase = atoi(argv[0]);
ibase = atoi(argv[0]);
fbase = atoi(argv[0]);
dpower = atoi(argv[1]);
ipower = atoi(argv[1]);
fpower = atoi(argv[1]);
//------------------------------------------------------
printf("------------------------------------------------\n\r");
printf("math.lib: result = pow(b,n)\n\r");
tsa = (unsigned int)chVTGetSystemTime();
for (unsigned int i = 0; i < n; i++)
{
    dresult = pow(dbase, dpower);    /* double precision */
}
ts = (unsigned int)chVTGetSystemTime() - tsa;
ms = ts % 10000;   /* ticks within the second; the arithmetic implies 10000 ticks/s */
ts = ts / 10000;
ss = ts % 60;
printf("b=%u, n=%u, a^n=%f, duration=%02u.%04u s\n\r", ibase, ipower, dresult, ss, ms);
printf("------------------------------------------------\n\r");
printf("math.lib: result = fpow(b,n)\n\r");
tsa = (unsigned int)chVTGetSystemTime();
for (unsigned int i = 0; i < n; i++)
{
    fresult = powf(fbase, fpower);   /* single precision */
}
ts = (unsigned int)chVTGetSystemTime() - tsa;
ms = ts % 10000;
ts = ts / 10000;
ss = ts % 60;
printf("b=%u, n=%u, a^n=%f, duration=%02u.%04u s\n\r", ibase, ipower, fresult, ss, ms);
printf("------------------------------------------------\n\r");

Giovanni
Site Admin
Posts: 12949
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 732 times
Been thanked: 609 times

Postby Giovanni » Mon Jan 06, 2020 8:29 am

Hi,

Interesting, it probably depends on how the library has been compiled. You should look at the asm code in there: does it use the FPU directly, or does it check for FPU presence with a SW fallback?

Try doing benchmarks on code generated by the compiler, that would be easier to analyze.

Giovanni
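
A minimal sketch of such a compiler-generated benchmark, assuming plain C arithmetic with no library calls inside the loop. The function names bench_float_mac and bench_double_mac are hypothetical and not part of the original project.

```c
#include <stdint.h>

/* Hypothetical helper: single-precision multiply-accumulate loop.
   The volatile accumulator forces every iteration to be executed. */
float bench_float_mac(uint32_t iterations, float a, float b)
{
    volatile float acc = 0.0f;
    for (uint32_t i = 0; i < iterations; i++) {
        acc = acc * a + b;
    }
    return acc;
}

/* Same loop in double precision, for comparison. */
double bench_double_mac(uint32_t iterations, double a, double b)
{
    volatile double acc = 0.0;
    for (uint32_t i = 0; i < iterations; i++) {
        acc = acc * a + b;
    }
    return acc;
}
```

Each call can be timed with chVTGetSystemTime() in the same way as the pow()/powf() loops in the first post. With a single-precision FPU one would expect the float loop to change noticeably between "USE_FPU = hard" and "USE_FPU = no", while the double loop should stay slow in both cases; the list file shows which instructions were actually generated.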

Postby psavr » Mon Jan 06, 2020 9:12 am

I was expecting the library to already have the option to make use of the FPU.

I have no experience reading ARM asm (list file). There are FPU instructions used in the powf function, but I am curious about the complexity:

fresult = powf(fbase,fpower);
8004298: eef0 0a68 vmov.f32 s1, s17
800429c: eeb0 0a48 vmov.f32 s0, s16
80042a0: f005 fe36 bl 8009f10 <powf>

08009f10 <powf>:
8009f10: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
8009f14: b087 sub sp, #28
8009f16: ed8d 0a00 vstr s0, [sp]
8009f1a: 9800 ldr r0, [sp, #0]
8009f1c: ee10 5a90 vmov r5, s1
8009f20: f5a0 0100 sub.w r1, r0, #8388608 ; 0x800000
8009f24: 006b lsls r3, r5, #1
8009f26: f1b1 4ffe cmp.w r1, #2130706432 ; 0x7f000000
8009f2a: ee10 ba90 vmov fp, s1
8009f2e: f103 32ff add.w r2, r3, #4294967295
8009f32: 49c9 ldr r1, [pc, #804] ; (800a258 <powf+0x348>)
8009f34: f080 80dc bcs.w 800a0f0 <powf+0x1e0>
8009f38: 428a cmp r2, r1
8009f3a: f200 813e bhi.w 800a1ba <powf+0x2aa>
8009f3e: 2300 movs r3, #0
8009f40: 9303 str r3, [sp, #12]
8009f42: f100 4640 add.w r6, r0, #3221225472 ; 0xc0000000
8009f46: f506 064d add.w r6, r6, #13434880 ; 0xcd0000
8009f4a: 4cc4 ldr r4, [pc, #784] ; (800a25c <powf+0x34c>)
8009f4c: 0df7 lsrs r7, r6, #23
8009f4e: 05ff lsls r7, r7, #23
8009f50: f3c6 46c3 ubfx r6, r6, #19, #4
8009f54: eb04 1606 add.w r6, r4, r6, lsl #4
8009f58: 1bc0 subs r0, r0, r7
8009f5a: f7f6 fcb5 bl 80008c8 <__aeabi_f2d>
8009f5e: e9d6 2300 ldrd r2, r3, [r6]
8009f62: f7f6 fd09 bl 8000978 <__aeabi_dmul>
8009f66: 2200 movs r2, #0
8009f68: 4bbd ldr r3, [pc, #756] ; (800a260 <powf+0x350>)
8009f6a: f7f6 fb4d bl 8000608 <__aeabi_dsub>
8009f6e: 4602 mov r2, r0
8009f70: 460b mov r3, r1
8009f72: 4680 mov r8, r0
8009f74: 4689 mov r9, r1
8009f76: f7f6 fcff bl 8000978 <__aeabi_dmul>
8009f7a: 4682 mov sl, r0
8009f7c: 15f8 asrs r0, r7, #23
8009f7e: 468b mov fp, r1
8009f80: f7f6 fc90 bl 80008a4 <__aeabi_i2d>
8009f84: e9d6 2302 ldrd r2, r3, [r6, #8]
8009f88: f7f6 fb40 bl 800060c <__adddf3>
8009f8c: e9d4 2348 ldrd r2, r3, [r4, #288] ; 0x120
8009f90: 4606 mov r6, r0
8009f92: 460f mov r7, r1
8009f94: 4640 mov r0, r8
8009f96: 4649 mov r1, r9
8009f98: f7f6 fcee bl 8000978 <__aeabi_dmul>
8009f9c: 4602 mov r2, r0
8009f9e: 460b mov r3, r1
8009fa0: 4630 mov r0, r6
8009fa2: 4639 mov r1, r7
8009fa4: f7f6 fb32 bl 800060c <__adddf3>
8009fa8: e9d4 2340 ldrd r2, r3, [r4, #256] ; 0x100
8009fac: e9cd 0100 strd r0, r1, [sp]
8009fb0: 4640 mov r0, r8
8009fb2: 4649 mov r1, r9
8009fb4: f7f6 fce0 bl 8000978 <__aeabi_dmul>
8009fb8: e9d4 2342 ldrd r2, r3, [r4, #264] ; 0x108
8009fbc: f7f6 fb26 bl 800060c <__adddf3>
8009fc0: 4652 mov r2, sl
8009fc2: 4606 mov r6, r0
8009fc4: 460f mov r7, r1
8009fc6: 465b mov r3, fp
8009fc8: 4650 mov r0, sl
8009fca: 4659 mov r1, fp
8009fcc: f7f6 fcd4 bl 8000978 <__aeabi_dmul>
8009fd0: 4602 mov r2, r0
8009fd2: 460b mov r3, r1
8009fd4: 4630 mov r0, r6
8009fd6: 4639 mov r1, r7
8009fd8: f7f6 fcce bl 8000978 <__aeabi_dmul>
8009fdc: e9d4 2344 ldrd r2, r3, [r4, #272] ; 0x110
8009fe0: 4606 mov r6, r0
8009fe2: 460f mov r7, r1
8009fe4: 4640 mov r0, r8
8009fe6: 4649 mov r1, r9
8009fe8: f7f6 fcc6 bl 8000978 <__aeabi_dmul>
8009fec: e9d4 2346 ldrd r2, r3, [r4, #280] ; 0x118
8009ff0: f7f6 fb0c bl 800060c <__adddf3>
8009ff4: 4652 mov r2, sl
8009ff6: 465b mov r3, fp
8009ff8: f7f6 fcbe bl 8000978 <__aeabi_dmul>
8009ffc: e9dd 2300 ldrd r2, r3, [sp]
800a000: f7f6 fb04 bl 800060c <__adddf3>
800a004: 4632 mov r2, r6
800a006: 463b mov r3, r7
800a008: f7f6 fb00 bl 800060c <__adddf3>
800a00c: 4606 mov r6, r0
800a00e: 4628 mov r0, r5
800a010: 460f mov r7, r1
800a012: f7f6 fc59 bl 80008c8 <__aeabi_f2d>
800a016: 4602 mov r2, r0
800a018: 460b mov r3, r1
800a01a: 4630 mov r0, r6
800a01c: 4639 mov r1, r7
800a01e: f7f6 fcab bl 8000978 <__aeabi_dmul>
800a022: 2500 movs r5, #0
800a024: 0bca lsrs r2, r1, #15
800a026: 2300 movs r3, #0
800a028: b292 uxth r2, r2
800a02a: f248 04be movw r4, #32958 ; 0x80be
800a02e: 429d cmp r5, r3
800a030: bf08 it eq
800a032: 4294 cmpeq r4, r2
800a034: 4606 mov r6, r0
800a036: 460f mov r7, r1
800a038: d375 bcc.n 800a126 <powf+0x216>
800a03a: f8df a234 ldr.w sl, [pc, #564] ; 800a270 <powf+0x360>
800a03e: e9da 8940 ldrd r8, r9, [sl, #256] ; 0x100
800a042: 4630 mov r0, r6
800a044: 4642 mov r2, r8
800a046: 464b mov r3, r9
800a048: 4639 mov r1, r7
800a04a: f7f6 fadf bl 800060c <__adddf3>
800a04e: 4642 mov r2, r8
800a050: 464b mov r3, r9
800a052: 4604 mov r4, r0
800a054: f7f6 fad8 bl 8000608 <__aeabi_dsub>
800a058: 4602 mov r2, r0
800a05a: 460b mov r3, r1
800a05c: 4630 mov r0, r6
800a05e: 4639 mov r1, r7
800a060: f7f6 fad2 bl 8000608 <__aeabi_dsub>
800a064: f004 021f and.w r2, r4, #31
800a068: eb0a 0cc2 add.w ip, sl, r2, lsl #3
800a06c: e9dc 8900 ldrd r8, r9, [ip]
800a070: e9da 2346 ldrd r2, r3, [sl, #280] ; 0x118
800a074: 4606 mov r6, r0
800a076: 460f mov r7, r1
800a078: e9cd 8900 strd r8, r9, [sp]
800a07c: f7f6 fc7c bl 8000978 <__aeabi_dmul>
800a080: 2200 movs r2, #0
800a082: 4b77 ldr r3, [pc, #476] ; (800a260 <powf+0x350>)
800a084: f7f6 fac2 bl 800060c <__adddf3>
800a088: e9da 2342 ldrd r2, r3, [sl, #264] ; 0x108
800a08c: e9cd 0104 strd r0, r1, [sp, #16]
800a090: 4630 mov r0, r6
800a092: 4639 mov r1, r7
800a094: f7f6 fc70 bl 8000978 <__aeabi_dmul>
800a098: e9da 2344 ldrd r2, r3, [sl, #272] ; 0x110
800a09c: f7f6 fab6 bl 800060c <__adddf3>
800a0a0: 4632 mov r2, r6
800a0a2: 4680 mov r8, r0
800a0a4: 4689 mov r9, r1
800a0a6: 463b mov r3, r7
800a0a8: 4630 mov r0, r6
800a0aa: 4639 mov r1, r7
800a0ac: f7f6 fc64 bl 8000978 <__aeabi_dmul>
800a0b0: 4602 mov r2, r0
800a0b2: 460b mov r3, r1
800a0b4: 4640 mov r0, r8
800a0b6: 4649 mov r1, r9
800a0b8: f7f6 fc5e bl 8000978 <__aeabi_dmul>
800a0bc: e9dd 2304 ldrd r2, r3, [sp, #16]
800a0c0: f7f6 faa4 bl 800060c <__adddf3>
800a0c4: 9b03 ldr r3, [sp, #12]
800a0c6: e9dd 8900 ldrd r8, r9, [sp]
800a0ca: 18e4 adds r4, r4, r3
800a0cc: 2200 movs r2, #0
800a0ce: eb18 0802 adds.w r8, r8, r2
800a0d2: ea4f 33c4 mov.w r3, r4, lsl #15
800a0d6: eb49 0903 adc.w r9, r9, r3
800a0da: 4642 mov r2, r8
800a0dc: 464b mov r3, r9
800a0de: f7f6 fc4b bl 8000978 <__aeabi_dmul>
800a0e2: f7f6 ff21 bl 8000f28 <__aeabi_d2f>
800a0e6: ee00 0a10 vmov s0, r0
800a0ea: b007 add sp, #28
800a0ec: e8bd 8ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
800a0f0: 428a cmp r2, r1
800a0f2: d862 bhi.n 800a1ba <powf+0x2aa>
800a0f4: 0043 lsls r3, r0, #1
800a0f6: 1e5a subs r2, r3, #1
800a0f8: 428a cmp r2, r1
800a0fa: d868 bhi.n 800a1ce <powf+0x2be>
800a0fc: 2800 cmp r0, #0
800a0fe: db30 blt.n 800a162 <powf+0x252>
800a100: 2300 movs r3, #0
800a102: 9303 str r3, [sp, #12]
800a104: f5b0 0f00 cmp.w r0, #8388608 ; 0x800000
800a108: f4bf af1b bcs.w 8009f42 <powf+0x32>
800a10c: eddf 7a55 vldr s15, [pc, #340] ; 800a264 <powf+0x354>
800a110: ed9d 7a00 vldr s14, [sp]
800a114: ee67 7a27 vmul.f32 s15, s14, s15
800a118: ee17 0a90 vmov r0, s15
800a11c: f020 4000 bic.w r0, r0, #2147483648 ; 0x80000000
800a120: f1a0 6038 sub.w r0, r0, #192937984 ; 0xb800000
800a124: e70d b.n 8009f42 <powf+0x32>
800a126: a348 add r3, pc, #288 ; (adr r3, 800a248 <powf+0x338>)
800a128: e9d3 2300 ldrd r2, r3, [r3]
800a12c: f7f6 feb4 bl 8000e98 <__aeabi_dcmpgt>
800a130: 2800 cmp r0, #0
800a132: d139 bne.n 800a1a8 <powf+0x298>
800a134: 2200 movs r2, #0
800a136: 4b4c ldr r3, [pc, #304] ; (800a268 <powf+0x358>)
800a138: 4630 mov r0, r6
800a13a: 4639 mov r1, r7
800a13c: f7f6 fe98 bl 8000e70 <__aeabi_dcmple>
800a140: bb28 cbnz r0, 800a18e <powf+0x27e>
800a142: a343 add r3, pc, #268 ; (adr r3, 800a250 <powf+0x340>)
800a144: e9d3 2300 ldrd r2, r3, [r3]
800a148: 4630 mov r0, r6
800a14a: 4639 mov r1, r7
800a14c: f7f6 fe86 bl 8000e5c <__aeabi_dcmplt>
800a150: 2800 cmp r0, #0
800a152: f43f af72 beq.w 800a03a <powf+0x12a>
800a156: 9803 ldr r0, [sp, #12]
800a158: b007 add sp, #28
800a15a: e8bd 4ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
800a15e: f001 b891 b.w 800b284 <__math_may_uflowf>
800a162: f3c5 53c7 ubfx r3, r5, #23, #8
800a166: 2b7e cmp r3, #126 ; 0x7e
800a168: dd17 ble.n 800a19a <powf+0x28a>
800a16a: 2b96 cmp r3, #150 ; 0x96
800a16c: dc22 bgt.n 800a1b4 <powf+0x2a4>
800a16e: 2201 movs r2, #1
800a170: f1c3 0396 rsb r3, r3, #150 ; 0x96
800a174: fa02 f303 lsl.w r3, r2, r3
800a178: 1e5a subs r2, r3, #1
800a17a: 422a tst r2, r5
800a17c: d10d bne.n 800a19a <powf+0x28a>
800a17e: 402b ands r3, r5
800a180: bf18 it ne
800a182: f44f 3380 movne.w r3, #65536 ; 0x10000
800a186: 9303 str r3, [sp, #12]
800a188: f020 4000 bic.w r0, r0, #2147483648 ; 0x80000000
800a18c: e7ba b.n 800a104 <powf+0x1f4>
800a18e: 9803 ldr r0, [sp, #12]
800a190: b007 add sp, #28
800a192: e8bd 4ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
800a196: f001 b86f b.w 800b278 <__math_uflowf>
800a19a: ed9d 0a00 vldr s0, [sp]
800a19e: b007 add sp, #28
800a1a0: e8bd 4ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
800a1a4: f001 b88c b.w 800b2c0 <__math_invalidf>
800a1a8: 9803 ldr r0, [sp, #12]
800a1aa: b007 add sp, #28
800a1ac: e8bd 4ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
800a1b0: f001 b86e b.w 800b290 <__math_oflowf>
800a1b4: 2300 movs r3, #0
800a1b6: 9303 str r3, [sp, #12]
800a1b8: e7e6 b.n 800a188 <powf+0x278>
800a1ba: bb3b cbnz r3, 800a20c <powf+0x2fc>
800a1bc: f480 0380 eor.w r3, r0, #4194304 ; 0x400000
800a1c0: 005b lsls r3, r3, #1
800a1c2: f513 0f00 cmn.w r3, #8388608 ; 0x800000
800a1c6: d81a bhi.n 800a1fe <powf+0x2ee>
800a1c8: eeb7 0a00 vmov.f32 s0, #112 ; 0x3f800000 1.0
800a1cc: e78d b.n 800a0ea <powf+0x1da>
800a1ce: eddd 7a00 vldr s15, [sp]
800a1d2: 2800 cmp r0, #0
800a1d4: ee27 0aa7 vmul.f32 s0, s15, s15
800a1d8: db55 blt.n 800a286 <powf+0x376>
800a1da: 2000 movs r0, #0
800a1dc: 2b00 cmp r3, #0
800a1de: d149 bne.n 800a274 <powf+0x364>
800a1e0: f1bb 0f00 cmp.w fp, #0
800a1e4: da81 bge.n 800a0ea <powf+0x1da>
800a1e6: b007 add sp, #28
800a1e8: e8bd 4ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
800a1ec: f001 b856 b.w 800b29c <__math_divzerof>
800a1f0: f48b 0b80 eor.w fp, fp, #4194304 ; 0x400000
800a1f4: ea4f 034b mov.w r3, fp, lsl #1
800a1f8: f513 0f00 cmn.w r3, #8388608 ; 0x800000
800a1fc: d9e4 bls.n 800a1c8 <powf+0x2b8>
800a1fe: eddd 7a00 vldr s15, [sp]
800a202: ee07 5a10 vmov s14, r5
800a206: ee37 0a87 vadd.f32 s0, s15, s14
800a20a: e76e b.n 800a0ea <powf+0x1da>
800a20c: f1b0 5f7e cmp.w r0, #1065353216 ; 0x3f800000
800a210: d0ee beq.n 800a1f0 <powf+0x2e0>
800a212: 0040 lsls r0, r0, #1
800a214: f1b0 4f7f cmp.w r0, #4278190080 ; 0xff000000
800a218: d8f1 bhi.n 800a1fe <powf+0x2ee>
800a21a: f1b3 4f7f cmp.w r3, #4278190080 ; 0xff000000
800a21e: d1ee bne.n 800a1fe <powf+0x2ee>
800a220: f1b0 4ffe cmp.w r0, #2130706432 ; 0x7f000000
800a224: d0d0 beq.n 800a1c8 <powf+0x2b8>
800a226: f1b0 4ffe cmp.w r0, #2130706432 ; 0x7f000000
800a22a: ea6f 0b0b mvn.w fp, fp
800a22e: bf34 ite cc
800a230: 2000 movcc r0, #0
800a232: 2001 movcs r0, #1
800a234: ea4f 7bdb mov.w fp, fp, lsr #31
800a238: 4558 cmp r0, fp
800a23a: d038 beq.n 800a2ae <powf+0x39e>
800a23c: ed9f 0a0b vldr s0, [pc, #44] ; 800a26c <powf+0x35c>
800a240: e753 b.n 800a0ea <powf+0x1da>
800a242: bf00 nop
800a244: f3af 8000 nop.w
800a248: ffd1d571 .word 0xffd1d571
800a24c: 405fffff .word 0x405fffff
800a250: 00000000 .word 0x00000000
800a254: c062a000 .word 0xc062a000
800a258: fefffffe .word 0xfefffffe
800a25c: 0800df38 .word 0x0800df38
800a260: 3ff00000 .word 0x3ff00000
800a264: 4b000000 .word 0x4b000000
800a268: c062c000 .word 0xc062c000
800a26c: 00000000 .word 0x00000000
800a270: 0800e078 .word 0x0800e078
800a274: f1bb 0f00 cmp.w fp, #0
800a278: f6bf af37 bge.w 800a0ea <powf+0x1da>
800a27c: eef7 7a00 vmov.f32 s15, #112 ; 0x3f800000 1.0
800a280: ee87 0a80 vdiv.f32 s0, s15, s0
800a284: e731 b.n 800a0ea <powf+0x1da>
800a286: f3c5 52c7 ubfx r2, r5, #23, #8
800a28a: f1a2 017f sub.w r1, r2, #127 ; 0x7f
800a28e: 2917 cmp r1, #23
800a290: d8a3 bhi.n 800a1da <powf+0x2ca>
800a292: f1c2 0096 rsb r0, r2, #150 ; 0x96
800a296: 2201 movs r2, #1
800a298: fa02 f000 lsl.w r0, r2, r0
800a29c: 1e41 subs r1, r0, #1
800a29e: 4229 tst r1, r5
800a2a0: d19b bne.n 800a1da <powf+0x2ca>
800a2a2: 4028 ands r0, r5
800a2a4: d09a beq.n 800a1dc <powf+0x2cc>
800a2a6: eeb1 0a40 vneg.f32 s0, s0
800a2aa: 4610 mov r0, r2
800a2ac: e796 b.n 800a1dc <powf+0x2cc>
800a2ae: ee07 5a90 vmov s15, r5
800a2b2: ee27 0aa7 vmul.f32 s0, s15, s15
800a2b6: e718 b.n 800a0ea <powf+0x1da>

Postby Giovanni » Mon Jan 06, 2020 9:46 am

This code calls double-precision functions, so apparently the library performs the calculation in double precision internally.

__aeabi_f2d converts from float to double; __aeabi_dmul and __aeabi_dsub are double-precision operations performed in SW.

I wonder if the compiler options are OK in the makefile, it should be:

USE_FPU_OPT = -mfloat-abi=$(USE_FPU) -mfpu=fpv4-sp-d16

It is also possible that the library is not designed for single precision at all; you should look at the library code.

Giovanni
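
To illustrate the pattern seen in the listing, here is a small sketch of the same arithmetic written once with a double-precision intermediate and once purely in float. The function names are hypothetical. On an FPU that only supports single precision, the first version would be expected to turn into helper calls (float to double conversion, double multiply, double add, conversion back), while the second can stay in FPU registers.

```c
/* Hypothetical example: float in, float out, but computed in double.
   On a single-precision-only FPU each double step is expected to become
   a software helper call (e.g. __aeabi_f2d, __aeabi_dmul, __aeabi_d2f). */
float mul_add_via_double(float x, float y)
{
    double dx = x;               /* float -> double conversion */
    double dy = y;
    double r  = dx * dy + dx;    /* double multiply and add */
    return (float)r;             /* double -> float conversion */
}

/* The same formula kept entirely in single precision. */
float mul_add_single(float x, float y)
{
    return x * y + x;
}
```

Comparing the disassembly of the two functions in the list file is a quick way to see whether a given piece of code really uses the FPU or falls back to the software double routines.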

Postby psavr » Mon Jan 06, 2020 10:04 am

Yes, from my Makefile:

# Enables the use of FPU (no, softfp, hard).
ifeq ($(USE_FPU),)
USE_FPU = hard
endif

# FPU-related options.
ifeq ($(USE_FPU_OPT),)
USE_FPU_OPT = -mfloat-abi=$(USE_FPU) -mfpu=fpv4-sp-d16
endif

# Target settings.
MCU = cortex-m4

and of course:
# List all user libraries here
ULIBS = -lm

I've downloaded the source code and have attached the pow() functions from ..\gcc-arm-none-eabi-7-2017-q4-major\src\newlib\newlib\libm\math
I guess the called function is in "wf_pow.c".

Anything else I could do or check?
Attachments
pow.zip
(10.3 KiB) Downloaded 15 times

Postby psavr » Mon Jan 06, 2020 10:08 am

Indeed, the powf(float x, float y) function casts its arguments to double :cry:
exc.arg1 = (double)x;
exc.arg2 = (double)y;

What is the point of providing a powf() function for float that casts to double internally? To me this makes no sense...
=> Well, it's still 4 times faster than the pow() function for double.

Postby Giovanni » Mon Jan 06, 2020 10:12 am

That code is conditional; what is __OBSOLETE_MATH?

You could try to include it in your project (so that it is compiled with the same options as the whole application) and see if the result is the same.

Giovanni
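
One way to check the macro from inside the project, sketched under the assumption that __OBSOLETE_MATH (if this newlib version defines it at all) becomes visible after including <math.h>; this has not been verified for this toolchain. The helper name obsolete_math_state is hypothetical.

```c
#include <math.h>

/* Hypothetical helper: returns -1 if <math.h> does not define
   __OBSOLETE_MATH, otherwise the macro's value (0 if it expands to nothing). */
int obsolete_math_state(void)
{
#ifdef __OBSOLETE_MATH
    return (int)(__OBSOLETE_MATH + 0);
#else
    return -1;
#endif
}
```

Printing the returned value from a shell command would show which branch of the conditional code is selected for the application's own build options. It does not tell how the prebuilt library itself was compiled.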

Postby psavr » Mon Jan 06, 2020 10:21 am

How do I find the definition of __OBSOLETE_MATH?

Well, I'll write my own function. For my purpose the exponent y is always an unsigned int, which makes things simple.
I was just wondering whether I was doing something fundamentally wrong in using the FPU...

Thanks for taking the time to look at my issue.
Peter
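
A possible shape for such a function, sketched with exponentiation by squaring so that only single-precision multiplications are needed. The name powf_uint is hypothetical; this is an illustration, not the function Peter actually wrote.

```c
/* Hypothetical helper: base raised to a non-negative integer power,
   using only float multiplications (exponentiation by squaring). */
float powf_uint(float base, unsigned int exp)
{
    float result = 1.0f;

    while (exp != 0u) {
        if (exp & 1u) {
            result *= base;      /* this bit of the exponent contributes */
        }
        exp >>= 1;
        if (exp != 0u) {
            base *= base;        /* square only while bits remain */
        }
    }
    return result;
}
```

For the benchmark case, powf_uint(5.0f, 5u) needs three multiplications by the running base or result plus two squarings, all of which the single-precision FPU can execute directly. Rounding may differ slightly from powf() for large exponents, because the error accumulates across the multiplications.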

Postby Giovanni » Mon Jan 06, 2020 10:25 am

I think you are not doing anything obviously wrong; most likely it is a problem with the library/compiler/build-options combination.

It could be a good idea to involve the ARM people maintaining the compiler distribution.

Giovanni

Postby psavr » Mon Jan 06, 2020 10:41 am

Just to add: there is also a second source directory (I missed it):

\gcc-arm-none-eabi-7-2017-q4-major\src\newlib\newlib\libm\mathfp with "s_pow.c" and "sf_pow.c"
I guess this "sf_pow.c" would be the correct one, but to me the asm code in the list file does not seem to match this function.

The math library should probably be rebuilt with the correct settings for optimal single-precision FPU support.
Attachments
pow.zip
(10.3 KiB) Downloaded 20 times