commit | a92a3c7b7858cd28a3c42e57e770839aad1f4441 | |
---|---|---|
author | Jake Weinstein <xboxlover360@gmail.com> | Thu Aug 25 20:03:25 2016 -0400 |
committer | Meninblack007 <sanyam.53jain@gmail.com> | Wed Mar 29 20:43:22 2017 +0530 |
tree | aa4c6799f53f1e9ae51159ef361730480e2e36cc | |
parent | 4c683977777441187277916a14a280b38f3edd7b | |
libc: optimize memcpy and memmove for 32-bit kryo

* Memcpy is based on Scorpion due to Qualcomm's 128-bit cache line size optimizations
* Memmove is identical to Denver, but uses 128-bit cache line size.
* PLDOFFSET and PLDSIZE taken from ARM64 Kryo memcpy routine
* Tested on OnePlus 3 with MSM8996

Before:

    BM_string_memcpy/8        1000k      8    0.934 GiB/s
    BM_string_memcpy/64       1000k     11    5.785 GiB/s
    BM_string_memcpy/512      1000k     25   19.918 GiB/s
    BM_string_memcpy/1024       50M     42   23.938 GiB/s
    BM_string_memcpy/8Ki        10M    473   17.291 GiB/s
    BM_string_memcpy/16Ki        5M    565   28.976 GiB/s
    BM_string_memcpy/32Ki     1000k   1105   29.631 GiB/s
    BM_string_memcpy/64Ki     1000k   2194   29.864 GiB/s
    BM_string_memmove/8       1000k      9    0.816 GiB/s
    BM_string_memmove/64      1000k     12    5.126 GiB/s
    BM_string_memmove/512     1000k     27   18.544 GiB/s
    BM_string_memmove/1024      50M     46   22.108 GiB/s
    BM_string_memmove/8Ki        5M    323   25.323 GiB/s
    BM_string_memmove/16Ki       5M    641   25.544 GiB/s
    BM_string_memmove/32Ki    1000k   1235   26.523 GiB/s
    BM_string_memmove/64Ki    1000k   2428   26.984 GiB/s

After:

    BM_string_memcpy/8        1000k      6    1.221 GiB/s
    BM_string_memcpy/64       1000k      7    8.601 GiB/s
    BM_string_memcpy/512      1000k     18   27.405 GiB/s
    BM_string_memcpy/1024       50M     33   30.354 GiB/s
    BM_string_memcpy/8Ki        10M    255   32.020 GiB/s
    BM_string_memcpy/16Ki        5M    527   31.055 GiB/s
    BM_string_memcpy/32Ki     1000k   1186   27.615 GiB/s
    BM_string_memcpy/64Ki     1000k   2365   27.710 GiB/s
    BM_string_memmove/8       1000k      7    1.007 GiB/s
    BM_string_memmove/64      1000k      9    6.425 GiB/s
    BM_string_memmove/512     1000k     21   24.302 GiB/s
    BM_string_memmove/1024      50M     37   27.002 GiB/s
    BM_string_memmove/8Ki       10M    284   28.759 GiB/s
    BM_string_memmove/16Ki       5M    585   27.959 GiB/s
    BM_string_memmove/32Ki    1000k   1335   24.542 GiB/s
    BM_string_memmove/64Ki    1000k   2651   24.719 GiB/s

Change-Id: Id7a9c37ef75a306dd5cf8d374d79d0fe83f8a3ba
The C library. Stuff like fopen(3) and kill(2).
The math library. Traditionally Unix systems kept stuff like sin(3) and cos(3) in a separate library to save space in the days before shared libraries.
The dynamic linker interface library. This is actually just a bunch of stubs that the dynamic linker replaces with pointers to its own implementation at runtime. This is where stuff like dlopen(3) lives.
The C++ ABI support functions. The C++ compiler doesn't know how to implement thread-safe static initialization and the like, so it just calls functions that are supplied by the system. Stuff like __cxa_guard_acquire and __cxa_pure_virtual lives here.
The dynamic linker. When you run a dynamically-linked executable, its ELF file has a PT_INTERP program header that says "use the following program to start me". On Android, that's either linker or linker64 (depending on whether it's a 32-bit or 64-bit executable). It's responsible for loading the ELF executable into memory and resolving references to symbols (so that when your code tries to jump to fopen(3), say, it lands in the right place).
The tests/ directory contains unit tests, roughly arranged as one file per publicly-exported header file.
The benchmarks/ directory contains benchmarks.
Adding a system call usually involves:
As mentioned above, this is currently a two-step process:
This is fully automated (and these days handled by the libcore team, because they own icu, and that needs to be updated in sync with bionic):
If you make a change that is likely to have a wide effect on the tree (such as a libc header change), you should run make checkbuild. A regular make will not build the entire tree; just the minimum number of projects that are required for the device. Tests, additional developer tools, and various other modules will not be built. Note that make checkbuild will not be complete either, as make tests covers a few additional modules, but generally speaking make checkbuild is enough.
The tests are all built from the tests/ directory.
    $ mma
    $ adb remount
    $ adb sync
    $ adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests32
    $ adb shell \
        /data/nativetest/bionic-unit-tests-static/bionic-unit-tests-static32
    # Only for 64-bit targets
    $ adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests64
    $ adb shell \
        /data/nativetest64/bionic-unit-tests-static/bionic-unit-tests-static64
The host tests require that you have lunched either an x86 or x86_64 target.
    $ mma
    $ mm bionic-unit-tests-run-on-host32
    $ mm bionic-unit-tests-run-on-host64  # For 64-bit *targets* only.
As a way to check that our tests do in fact test the correct behavior (and not just the behavior we think is correct), it is possible to run the tests against the host's glibc. The executables are already in your path.
    $ mma
    $ bionic-unit-tests-glibc32
    $ bionic-unit-tests-glibc64
For either host or target coverage, you must first:
$ export NATIVE_COVERAGE=true
Set bionic_coverage=true in libc/Android.mk and libm/Android.mk.

    $ mma
    $ adb sync
    $ adb shell \
        GCOV_PREFIX=/data/local/tmp/gcov \
        GCOV_PREFIX_STRIP=`echo $ANDROID_BUILD_TOP | grep -o / | wc -l` \
        /data/nativetest/bionic-unit-tests/bionic-unit-tests32
    $ acov
acov will pull all coverage information from the device, push it to the right directories, run lcov, and open the coverage report in your browser.
First, build and run the host tests as usual (see above).
    $ croot
    $ lcov -c -d $ANDROID_PRODUCT_OUT -o coverage.info
    $ genhtml -o covreport coverage.info # or lcov --list coverage.info
The coverage report is now available at covreport/index.html.
Bionic's test runner will run each test in its own process by default to prevent test failures from impacting other tests. This also has the added benefit of running them in parallel, so they are much faster.
However, this also makes it difficult to run the tests under GDB. To prevent each test from being forked, run the tests with the flag --no-isolate.
This probably belongs in the NDK documentation rather than here, but these are the known ABI bugs in the 32-bit ABI:
time_t is 32-bit. http://b/5819737. In the 64-bit ABI, time_t is 64-bit.
off_t is 32-bit. There is off64_t, and in newer releases there is almost-complete support for _FILE_OFFSET_BITS. Unfortunately our stdio implementation uses 32-bit offsets and -- worse -- function pointers to functions that use 32-bit offsets, so there's no good way to implement the last few pieces. http://b/24807045. In the 64-bit ABI, off_t is off64_t.
sigset_t is too small on ARM and x86 (but correct on MIPS), so support for real-time signals is broken. http://b/5828899. In the 64-bit ABI, sigset_t is the correct size for every architecture.