libc/arch-arm/bionic/memcpy.a9.S: memcpy from cortex-strings.

This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is tuned specifically for Cortex-A15 but
should also perform well on Cortex-A9. It is equivalent to the code
in cortex-strings rev 116.

This patch is a follow-up to the existing Gerrit change:

I7f6f77995f3ca903ad9c66d14261441667a2a935

This version includes a tweak for performance on misaligned
buffers and splits the header comment into license and
documentation sections.

Change-Id: Ibd2e23c8d8e01357ba0247be1d05192de3ceba69
Signed-off-by: Will Newton <will.newton@linaro.org>
diff --git a/libc/arch-arm/bionic/memcpy.a9.S b/libc/arch-arm/bionic/memcpy.a9.S
index 550989a..2ba1ff5 100644
--- a/libc/arch-arm/bionic/memcpy.a9.S
+++ b/libc/arch-arm/bionic/memcpy.a9.S
@@ -28,6 +28,9 @@
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+ */
+
+/*
    This memcpy routine is optimised for Cortex-A15 cores and takes advantage
    of VFP or NEON when built with the appropriate flags.