[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.

This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326991 91177308-0d34-0410-b5e6-96231b3b80d8
11 files changed