AMDGPU: Custom lower v4i16/v4f16 vector operations

This avoids stack accesses when lowering these operations.

Also handle the extract-high-element pattern formed by
truncate + shift, to avoid a couple of test regressions.
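
For illustration only, the equivalence behind the truncate + shift
pattern can be sketched in standalone C++. This is not the code from
this change: packV2I16 and truncShiftHi are hypothetical names, and
the sketch assumes element 0 of a two-element 16-bit vector occupies
the low half of a 32-bit register.

```cpp
#include <cstdint>

// Hypothetical helper: pack two 16-bit elements into one 32-bit value,
// assuming element 0 lives in the low half and element 1 in the high half.
constexpr uint32_t packV2I16(uint16_t Lo, uint16_t Hi) {
  return static_cast<uint32_t>(Lo) | (static_cast<uint32_t>(Hi) << 16);
}

// The truncate + shift form: a logical shift right by 16 followed by a
// truncation to 16 bits yields the high element.
constexpr uint16_t truncShiftHi(uint32_t Packed) {
  return static_cast<uint16_t>(Packed >> 16);
}

// Compile-time check that the shift form recovers element 1.
static_assert(truncShiftHi(packV2I16(0x1234, 0xABCD)) == 0xABCD,
              "trunc(x >> 16) should equal the high element");
```

Under that layout assumption, a truncate of a 16-bit right shift and
an extract of element 1 name the same value, which is why the former
can be matched as an extract of the high element.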

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332453 91177308-0d34-0410-b5e6-96231b3b80d8