[X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.

Previously we used vptestmd, but the SKX scheduling data says vpmovq2m/vpmovd2m have lower latency. We already use vpmovb2m/vpmovw2m for byte/word truncates, so this is also more consistent.
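
For illustration only, here is a scalar C++ model of why the two instruction choices can produce the same vXi1 result for a v16i32 truncate. It is a sketch, not the backend code: the names Vec16, truncViaTest and truncViaMovToMask are hypothetical. The operand choice for the test form (AND with 1) and the shift by 31 before the sign-bit extraction are assumptions of this model; the commit message does not spell out the exact sequences emitted.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a v16i32 vector.
using Vec16 = std::array<std::uint32_t, 16>;

// Models a vptestmd-style truncate: a mask bit is set when (a & b) != 0.
// Assumption of this sketch: b is the constant 1, so only bit 0 is tested.
std::uint16_t truncViaTest(const Vec16 &v) {
  std::uint16_t mask = 0;
  for (unsigned i = 0; i < 16; ++i) {
    if ((v[i] & 1u) != 0)
      mask |= static_cast<std::uint16_t>(1u << i);
  }
  return mask;
}

// Models a vpmovd2m-style truncate: vpmovd2m copies the most significant bit
// of each 32-bit element into the mask, so the low bit is first moved into the
// sign position (assumed here to be a left shift by 31).
std::uint16_t truncViaMovToMask(const Vec16 &v) {
  std::uint16_t mask = 0;
  for (unsigned i = 0; i < 16; ++i) {
    std::uint32_t shifted = v[i] << 31; // bit 0 becomes bit 31
    if ((shifted >> 31) != 0)           // read back the sign bit
      mask |= static_cast<std::uint16_t>(1u << i);
  }
  return mask;
}

// Small self-check: both models agree on a handful of inputs.
void selfCheck() {
  Vec16 v{};
  for (unsigned i = 0; i < 16; ++i)
    v[i] = i * 3u + 0x80000000u; // high bit set, low bit follows parity of i
  assert(truncViaTest(v) == truncViaMovToMask(v));
}
```

Both functions only ever look at bit 0 of each element, which is what a truncate to i1 keeps; the difference in the real instructions is latency, per the SKX scheduling data cited above.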

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325534 91177308-0d34-0410-b5e6-96231b3b80d8
14 files changed