ded60aa7f8bd05700454795dad69b1f42ca3bda2 - platform_external_llvm80

commit	ded60aa7f8bd05700454795dad69b1f42ca3bda2
author	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	Wed Jun 06 22:22:32 2018 +0000
committer	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	Wed Jun 06 22:22:32 2018 +0000
tree	66e07c7f700afee70e063b0f552ea366e003a932
parent	e07c2606ba321b13198fa3c222a5076075d73cf3

[AMDGPU] Improve reciprocal handling

When denormals are supported, we produce a full division for
1.0f / x. That can still be replaced by a faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x);

if the requested accuracy is 2.5 ulp or less. The same version is
used for non-1.0 numerators when denormals are not supported; for a
1.0 numerator in that mode, just v_rcp_f32 is used.

The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.
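
The sequence above can be sketched as standalone C to make the scaling
step concrete. This is only an illustration, not the code the backend
emits: rcp_f32_model is a hypothetical stand-in for the v_rcp_f32
instruction, modeled here as an ordinary IEEE division, and the names
scaled_rcp and scaled_neg_rcp are invented for this sketch. Presumably
the point of the pre-scaling is to keep the intermediate reciprocal in
the normal range for very large |x|, but the commit message does not
spell that out.

```c
#include <math.h>

/* Hypothetical stand-in for the hardware v_rcp_f32 instruction.
   Assumption: modeled as an exact IEEE division, whereas the real
   instruction is an approximation. */
static float rcp_f32_model(float x) {
    return 1.0f / x;
}

/* 1/x using the scaled sequence from the commit message:
   inputs with magnitude above 2^96 are scaled down by 2^-32 before
   the reciprocal, and the result is scaled by the same factor. */
static float scaled_rcp(float x) {
    int c = fabsf(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * rcp_f32_model(x);
}

/* -1/x: the same sequence, differing only in the sign of the result. */
static float scaled_neg_rcp(float x) {
    return -scaled_rcp(x);
}
```

For inputs at or below 2^96 in magnitude the scale factor is 1.0f, so
the sketch reduces to a plain reciprocal; only larger inputs take the
scaled path.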

OpenCL conformance passed with denormals both enabled and disabled.

Differential Revision: https://reviews.llvm.org/D47805

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334142 91177308-0d34-0410-b5e6-96231b3b80d8

2 files changed
