[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS.

Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.
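The known-bits reasoning works because PACKUSWB saturates each signed 16-bit lane to [0, 255]. That is the same as a plain truncation whenever the upper 8 bits of the lane are known to be zero. The C++ below is a scalar sketch of that equivalence, not LLVM code. `packus_lane`, `trunc_lane`, `upper_byte_zero` and `mulhu_i16_lane` are hypothetical helpers. The unsigned i8 MULH lane (widen, multiply, logical shift right by 8) is an assumed example of a computation that leaves the upper byte zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of one PACKUSWB lane: the signed 16-bit input is saturated
// to the unsigned 8-bit range [0, 255].
inline uint8_t packus_lane(int16_t x) {
  return static_cast<uint8_t>(std::clamp<int>(x, 0, 255));
}

// Scalar model of an i16 -> i8 truncate lane: keep only the low 8 bits.
inline uint8_t trunc_lane(int16_t x) {
  return static_cast<uint8_t>(static_cast<uint16_t>(x) & 0xFF);
}

// The known-bits condition: the upper byte is zero, so the value already
// lies in [0, 255] and saturation can never fire.
inline bool upper_byte_zero(int16_t x) {
  return (static_cast<uint16_t>(x) & 0xFF00) == 0;
}

// Exhaustively checks that PACKUS and truncate agree on every 16-bit value
// whose upper byte is zero.
inline bool packus_matches_trunc_when_upper_zero() {
  for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t x = static_cast<int16_t>(v);
    if (upper_byte_zero(x) && packus_lane(x) != trunc_lane(x))
      return false;
  }
  return true;
}

// Hypothetical per-lane model of an unsigned i8 MULH computed in i16:
// widen, multiply, then logically shift right by 8 to keep the high byte.
// After the shift the upper byte is zero, so either narrowing step works.
inline int16_t mulhu_i16_lane(uint8_t a, uint8_t b) {
  uint16_t prod = static_cast<uint16_t>(a * b);  // a * b <= 65025 fits in u16
  return static_cast<int16_t>(prod >> 8);        // logical shift: result <= 254
}
```

Called from a `main`, `trunc_lane(mulhu_i16_lane(200, 200))` yields 156, and `packus_lane` gives the same value for that lane. Outside the known-bits condition the two diverge. For 0x0134, PACKUS saturates to 255 while truncation keeps 0x34, so the lowering can only pick PACKUS for a truncate when the upper bits are proven zero.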

This causes one test change: when avx512f is available but avx512bw is not, we now prefer to extend v16i16->v16i32 and then truncate v16i32->v16i8, rather than using extract_subvector+packus.
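The sketch below is a scalar model of the two v16i16->v16i8 lowerings that this test change compares. It assumes every i16 lane has its upper byte known zero. It is not LLVM code, and the type aliases and functions are hypothetical. `packuswb` models the 128-bit PACKUSWB lane order, where the first operand fills bytes 0-7 and the second fills bytes 8-15. `lower_via_zext_trunc` models the zero-extend-then-truncate form. Its v16i32->v16i8 step is the narrowing that AVX512F's VPMOVDB performs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using V16i16 = std::array<int16_t, 16>;
using V8i16 = std::array<int16_t, 8>;
using V16i8 = std::array<uint8_t, 16>;

// Model of 128-bit PACKUSWB: bytes 0-7 come from `lo`, bytes 8-15 from `hi`,
// each signed 16-bit lane saturated to [0, 255].
V16i8 packuswb(const V8i16& lo, const V8i16& hi) {
  V16i8 r{};
  for (int i = 0; i < 8; ++i) {
    r[i] = static_cast<uint8_t>(std::clamp<int>(lo[i], 0, 255));
    r[i + 8] = static_cast<uint8_t>(std::clamp<int>(hi[i], 0, 255));
  }
  return r;
}

// Strategy A: extract_subvector the low and high v8i16 halves, then PACKUS.
V16i8 lower_via_packus(const V16i16& v) {
  V8i16 lo{}, hi{};
  for (int i = 0; i < 8; ++i) {
    lo[i] = v[i];
    hi[i] = v[i + 8];
  }
  return packuswb(lo, hi);
}

// Strategy B: zero-extend v16i16 -> v16i32, then truncate v16i32 -> v16i8.
V16i8 lower_via_zext_trunc(const V16i16& v) {
  V16i8 r{};
  for (int i = 0; i < 16; ++i) {
    uint32_t wide = static_cast<uint16_t>(v[i]);  // zext i16 -> i32
    r[i] = static_cast<uint8_t>(wide);            // trunc i32 -> i8 (low byte)
  }
  return r;
}
```

Under the upper-byte-zero assumption both strategies return identical byte vectors, so choosing between them is purely a cost decision. If a lane exceeds 255 they diverge, because PACKUS saturates while truncation wraps.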

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346697 91177308-0d34-0410-b5e6-96231b3b80d8
2 files changed