[X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS

Summary:
The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates.

For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch.

Reviewers: spatel

Reviewed By: spatel

Subscribers: RKSimon, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D46498

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331606 91177308-0d34-0410-b5e6-96231b3b80d8
4 files changed