b10a96eb1c3bc4f60aceb86f6bcb473895b06d3b - platform_external_llvm

commit	b10a96eb1c3bc4f60aceb86f6bcb473895b06d3b
author	Sanjay Patel <spatel@rotateright.com>	Thu Dec 15 18:03:38 2016 +0000
committer	Sanjay Patel <spatel@rotateright.com>	Thu Dec 15 18:03:38 2016 +0000
tree	48790deb33a892f5d443a5773e13e68c0c3a3a16
parent	6677747efbc6d4d7e5cab4077a573c2bbc144d13

[x86] use a single shufps when it can save instructions

This is a tiny patch with a big pile of test changes.

This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

-       vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
-       vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
-       vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+       vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

And this happens several times in the diffs. For chips with domain-crossing
penalties, the instruction count and size reduction should usually overcome
any potential domain-crossing penalty due to using an FP op in a sequence of
int ops. For chips such as recent Intel big cores and Atom, there is no
domain-crossing penalty for shufps, so using shufps is a pure win.

So the test case diffs all appear to be improvements except one test in
vector-shuffle-combining.ll where we miss an opportunity to use a shift to
generate zero elements and one test in combine-sra.ll where multiple uses
prevent the expected shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289837 91177308-0d34-0410-b5e6-96231b3b80d8
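
To see why the motivating case collapses to one instruction, the lane movement can be modelled outside the compiler. The following is only an illustrative sketch, not code from the patch: it treats each xmm register as a list of four 32-bit lanes, the helper names (pshufd, pblendw_low_high, shufps, old_sequence, new_sequence) are hypothetical, and the word-level blend mask xmm0[0,1,2,3],xmm1[4,5,6,7] is expressed as its dword equivalent (low two dwords from the first operand, high two from the second).

```python
# Lane-level model of the shuffles in the motivating case.
# Assumption: a register is a list of four 32-bit lanes, index 0 = lowest.

def pshufd(src, idx):
    """Result lane i takes src[idx[i]] (single-source dword shuffle)."""
    return [src[i] for i in idx]

def pblendw_low_high(a, b):
    """Blend words 0-3 from a and words 4-7 from b.
    In dword terms: lanes 0-1 from a, lanes 2-3 from b."""
    return a[:2] + b[2:]

def shufps(a, b, lo, hi):
    """Low two result lanes are selected from a, high two from b."""
    return [a[i] for i in lo] + [b[i] for i in hi]

def old_sequence(xmm0, xmm1):
    # vpshufd xmm1 = xmm1[0,1,0,2]
    xmm1 = pshufd(xmm1, [0, 1, 0, 2])
    # vpshufd xmm0 = xmm0[0,2,2,3]
    xmm0 = pshufd(xmm0, [0, 2, 2, 3])
    # vpblendw xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
    return pblendw_low_high(xmm0, xmm1)

def new_sequence(xmm0, xmm1):
    # vshufps xmm0 = xmm0[0,2],xmm1[0,2]
    return shufps(xmm0, xmm1, [0, 2], [0, 2])

if __name__ == "__main__":
    a = [10, 11, 12, 13]
    b = [20, 21, 22, 23]
    print(old_sequence(a, b))
    print(new_sequence(a, b))
```

Both sequences select lanes 0 and 2 of each input, so the three-instruction form and the single shufps produce the same register contents. The model says nothing about execution domains or latency, which are what the domain-crossing discussion above weighs.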