[X86] Remove the max vector width restriction from combineLoopMAddPattern and rely splitOpsAndApply to handle splitting.

This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337656 91177308-0d34-0410-b5e6-96231b3b80d8
3 files changed