[x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:
vxorps %ymm0, %ymm0, %ymm0
vcmpps $15, %ymm0, %ymm0, %ymm0
is consistently faster (by about 9%) than:
vpcmpeqd %xmm0, %xmm0, %xmm0
vinsertf128 $1, %xmm0, %ymm0, %ymm0
Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.
Committed on behalf of @dtemirbulatov
Differential Revision: https://reviews.llvm.org/D32416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302989 91177308-0d34-0410-b5e6-96231b3b80d8
9 files changed