[AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16

This further improves Ahmed's change in rL299482.  See the new comment for the
rationale.

The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.

Differential Revision: https://reviews.llvm.org/D32028

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300276 91177308-0d34-0410-b5e6-96231b3b80d8
4 files changed