AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls

This improves code for the same reasons as scalarizing 32-bit
element vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338418 91177308-0d34-0410-b5e6-96231b3b80d8
4 files changed