d69e504577169c5f75803f1b97a42822898a78b3 - platform_external_arm-optimized-routines

commit	d69e504577169c5f75803f1b97a42822898a78b3
author	Szabolcs Nagy <szabolcs.nagy@arm.com>	Tue Jun 05 16:15:27 2018 +0100
committer	Szabolcs Nagy <szabolcs.nagy@arm.com>	Wed Jun 06 16:17:19 2018 +0100
tree	6196f61c3386e50ad8257d6a1f21c90ef39dddb8
parent	a7711a35d57cae0c9fcf0cd61903bbf4701240cf

Add new log2 implementation

The algorithm is similar to the one used in log, but the final
multiplication by 1/ln2 adds more operations (and more error).

There is a separate code path for when the fma instruction is not
available: it computes x/c - 1 precisely, which requires doubling the
table size, and it computes (x/c - 1)/ln2 precisely.

The worst-case error is 0.547 ULP (0.55 ULP without fma), and the
read-only global data size is 1168 bytes (2192 bytes without fma).  In
non-nearest rounding modes the error is less than 1 ULP.
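
One common way to express such error figures is to compare the double
result against a higher-precision reference and divide by the spacing of
doubles at the reference value. A sketch, assuming long double serves as
the reference (ulp_error is a hypothetical helper, not the project's
test harness):

```c
#include <math.h>

/* Error of y in ULPs relative to a higher-precision reference ref.
   Assumes ref is finite, nonzero and in the normal double range. */
static double ulp_error(double y, long double ref)
{
  int e;
  frexpl(ref, &e);                        /* ref = m * 2^e, 0.5 <= |m| < 1 */
  long double ulp = ldexpl(1.0L, e - 53); /* double spacing in [2^(e-1), 2^e) */
  return (double)(fabsl((long double)y - ref) / ulp);
}
```

For example, ulp_error(log2(x), log2l(x)) estimates the error at a single
point; a worst-case figure such as 0.547 ULP comes from searching many
inputs, and a value below 1 means the result is always one of the two
doubles adjacent to the exact value.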

Improvements on Cortex-A72 compared to current glibc master:
log latency: 2.04x
log throughput: 1.87x

math/include/mathlib.h
math/log2.c (added)
math/log2_data.c (added)
math/math_config.h
test/mathtest.c
test/testcases/directed/log2.tst (added)
test/testcases/random/double.tst

7 files changed
