Fixes for the ARM-specific bswap_16, bswap_32, and bswap_64.

1. Make the feature test work by excluding known-deficient processors, so
we don't have to maintain a complete list of all the processors that support
REV and REV16.

2. Don't abuse 'register' to get an effect similar to GCC's +l constraint,
but which was unnecessarily restrictive.

3. Fix __swap64md so _x isn't clobbered, breaking 64-bit swaps.

4. Make <byteswap.h> (which declars bswap_16 and friends) use <endian.h>
rather than <sys/endian.h>, so we get the machine-dependent implementations.

Change-Id: I6a38fad7a9fbe394aff141489617eb3883e1e944
diff --git a/libc/include/byteswap.h b/libc/include/byteswap.h
index 16d2ad4..74b0e91 100644
--- a/libc/include/byteswap.h
+++ b/libc/include/byteswap.h
@@ -28,7 +28,8 @@
 #ifndef _BYTESWAP_H_
 #define _BYTESWAP_H_
 
-#include <sys/endian.h>
+/* endian.h rather than sys/endian.h so we get the machine-specific file. */
+#include <endian.h>
 
 #define  bswap_16(x)   swap16(x)
 #define  bswap_32(x)   swap32(x)