make fast unaligned memory accesses implicit with SSE4.2 or SSE4a

This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245950 91177308-0d34-0410-b5e6-96231b3b80d8
2 files changed