string: optimize memcpy

Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
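
As a rough illustration of how a small-copy case of up to 32 bytes can
avoid a byte loop, the sketch below uses fixed-size block moves from both
ends of the buffer, which overlap when the length is not a multiple of the
block size. The helper name copy_upto32 and its size thresholds are
hypothetical and written in C for readability; this is not the assembly
from this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the routine from this patch: copy n <= 32 bytes
   with a fixed number of block moves. All loads are done before any store,
   so each case reads the source completely before writing. */
static void copy_upto32(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 32);
    if (n >= 16) {
        /* 16..32 bytes: first 16 and last 16 bytes, overlapping if n < 32. */
        unsigned char head[16], tail[16];
        memcpy(head, src, 16);
        memcpy(tail, src + n - 16, 16);
        memcpy(dst, head, 16);
        memcpy(dst + n - 16, tail, 16);
    } else if (n >= 8) {
        /* 8..15 bytes: first 8 and last 8 bytes. */
        uint64_t head, tail;
        memcpy(&head, src, 8);
        memcpy(&tail, src + n - 8, 8);
        memcpy(dst, &head, 8);
        memcpy(dst + n - 8, &tail, 8);
    } else if (n >= 4) {
        /* 4..7 bytes: first 4 and last 4 bytes. */
        uint32_t head, tail;
        memcpy(&head, src, 4);
        memcpy(&tail, src + n - 4, 4);
        memcpy(dst, &head, 4);
        memcpy(dst + n - 4, &tail, 4);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every position. */
        unsigned char first = src[0], mid = src[n >> 1], last = src[n - 1];
        dst[0] = first;
        dst[n >> 1] = mid;
        dst[n - 1] = last;
    }
}
```

Each size class costs one or two compares and a fixed number of moves,
which is the general reason for widening the small-copy range; the actual
case boundaries and instruction selection in the patch may differ.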

Improves glibc's memcpy-random benchmark by ~10% on Neoverse N1.