a6230320c149a8e1ae790433fe0828e6060c53fa - platform_external_arm-optimized-routines

commit	a6230320c149a8e1ae790433fe0828e6060c53fa
author	Szabolcs Nagy <szabolcs.nagy@arm.com>	Thu Jun 21 17:53:15 2018 +0100
committer	Szabolcs Nagy <szabolcs.nagy@arm.com>	Fri Jun 22 14:38:31 2018 +0100
tree	a39a631bc71a5786cda7746fceb7d04d5007ca24
parent	db6e4e96bea641fff18006803c4b1eea19d664f7

Improve pow implementation

The log part of pow was rewritten to use a slightly different algorithm.
This improves precision and throughput while keeping the same table size.

Inputs near 1 are no longer special-cased, so there is a slight
performance regression for them.  When the fma instruction is not
available, this algorithm is also expected to perform slightly worse.

Worst-case error improved from 0.67 ULP to 0.57 ULP.

On Cortex-A72 I see:
throughput near 1:  7% worse
latency near 1:     2% worse
throughput general: 8% better
latency general:    2% better

math/math_config.h
math/pow.c
math/pow_log_data.c

3 files changed

tree: a39a631bc71a5786cda7746fceb7d04d5007ca24

auxiliary/
math/
test/
.gitignore
config.mk.dist
LICENSE
Makefile
README