Improve strcmp performance for misaligned strings

This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. Replace the simple byte-wise compare in the
misaligned case with a dword compare with page boundary checks in
place. For simplicity its uses a 4K page boundary so that it does not
have to query the actual page size on the system.

Comparison on the default bionic and proposed optimized routines
shows the following performance improvements on A64 (using the
new proposed memcmp input data from test_strcmp.xml):

  - Small improvement for aligned arguments with sizes up to 56 bytes
    (from 10% to 20%).

  - Large improvements for unaligned arguments for small sizes (from
    3 to 256 bytes).

Benchmark                              Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------
BM_string_strcmp/1/0/0              +0.0034         +0.0034            11            11            11            11
BM_string_strcmp/2/0/0              +0.0000         +0.0000            11            11            11            11
BM_string_strcmp/3/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/4/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/5/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/6/0/0              -0.1719         -0.1719            11             9            11             9
BM_string_strcmp/7/0/0              -0.1724         -0.1724            11             9            11             9
BM_string_strcmp/8/0/0              -0.1718         -0.1718            11             9            11             9
BM_string_strcmp/9/0/0              -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/10/0/0             -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/11/0/0             -0.2040         -0.2040            16            13            16            13
BM_string_strcmp/12/0/0             -0.1991         -0.1991            16            13            16            13
BM_string_strcmp/13/0/0             -0.1997         -0.1997            16            13            16            13
BM_string_strcmp/14/0/0             -0.1988         -0.1989            16            13            16            13
BM_string_strcmp/15/0/0             -0.2006         -0.2006            16            13            16            13
BM_string_strcmp/16/0/0             -0.2043         -0.2043            16            13            16            13
BM_string_strcmp/24/0/0             -0.1927         -0.1927            18            15            18            15
BM_string_strcmp/32/0/0             -0.1743         -0.1743            20            17            20            17
BM_string_strcmp/40/0/0             -0.1427         -0.1427            22            19            22            19
BM_string_strcmp/48/0/0             -0.1053         -0.1053            24            22            24            22
BM_string_strcmp/56/0/0             -0.0805         -0.0805            26            24            26            24
BM_string_strcmp/64/0/0             -0.0454         -0.0454            28            27            28            27
BM_string_strcmp/72/0/0             -0.0303         -0.0303            30            29            30            29
BM_string_strcmp/80/0/0             -0.0111         -0.0111            32            32            32            32
BM_string_strcmp/88/0/0             -0.0004         -0.0004            34            34            34            34
BM_string_strcmp/96/0/0             -0.0058         -0.0058            37            37            37            37
BM_string_strcmp/104/0/0            +0.0000         +0.0000            40            40            40            40
BM_string_strcmp/112/0/0            -0.0457         -0.0457            61            58            61            58
BM_string_strcmp/120/0/0            -0.0486         -0.0487            61            58            61            58
BM_string_strcmp/128/0/0            -0.0499         -0.0499            64            61            64            61
BM_string_strcmp/136/0/0            -0.0529         -0.0529            66            63            66            63
BM_string_strcmp/144/0/0            -0.0492         -0.0492            69            66            69            66
BM_string_strcmp/160/0/0            -0.0459         -0.0459            74            71            74            71
BM_string_strcmp/176/0/0            -0.0400         -0.0401            79            76            79            76
BM_string_strcmp/192/0/0            -0.0378         -0.0378            85            81            85            81
BM_string_strcmp/208/0/0            -0.0009         -0.0009            89            89            89            89
BM_string_strcmp/224/0/0            -0.0003         -0.0003            95            95            95            95
BM_string_strcmp/240/0/0            -0.0320         -0.0320           100            96           100            96
BM_string_strcmp/256/0/0            -0.0303         -0.0304           105           102           105           102
BM_string_strcmp/512/0/0            -0.0171         -0.0171           187           183           187           183
BM_string_strcmp/1024/0/0           -0.0091         -0.0091           350           347           350           347
BM_string_strcmp/8192/0/0           -0.0030         -0.0031          2668          2660          2668          2660
BM_string_strcmp/16384/0/0          +0.0007         +0.0007          5449          5452          5448          5452
BM_string_strcmp/32768/0/0          +0.0635         +0.0635         10868         11558         10867         11557
BM_string_strcmp/65536/0/0          -0.0017         -0.0017         21824         21786         21822         21784
BM_string_strcmp/131072/0/0         +0.0012         +0.0012         43485         43536         43480         43532
BM_string_strcmp/1/4/0              +0.7630         +0.7630             7            12             7            12
BM_string_strcmp/2/4/0              +0.9265         +0.9265            12            23            12            23
BM_string_strcmp/3/4/0              -0.0000         -0.0000            14            14            14            14
BM_string_strcmp/4/4/0              +0.0372         +0.0372            19            19            19            19
BM_string_strcmp/6/4/0              -0.0921         -0.0921            20            19            20            19
BM_string_strcmp/7/4/0              -0.0291         -0.0291            19            19            19            19
BM_string_strcmp/8/4/0              +0.0648         +0.0648            20            22            20            22
BM_string_strcmp/9/4/0              +0.0001         -0.0055            22            22            22            22
BM_string_strcmp/10/4/0             -0.1924         -0.1924            23            19            23            19
BM_string_strcmp/11/4/0             -0.2347         -0.2347            24            19            24            19
BM_string_strcmp/12/4/0             -0.2738         -0.2739            26            19            26            19
BM_string_strcmp/13/4/0             -0.3804         -0.3804            42            26            42            26
BM_string_strcmp/14/4/0             -0.3581         -0.3582            41            26            41            26
BM_string_strcmp/15/4/0             -0.3905         -0.3905            43            26            43            26
BM_string_strcmp/16/4/0             -0.4068         -0.4068            44            26            44            26
BM_string_strcmp/24/4/0             -0.4917         -0.4917            57            29            57            29
BM_string_strcmp/32/4/0             -0.5607         -0.5607            70            31            70            31
BM_string_strcmp/40/4/0             -0.5940         -0.5940            82            33            82            33
BM_string_strcmp/48/4/0             -0.5303         -0.5302            95            45            95            45
BM_string_strcmp/56/4/0             -0.4975         -0.4975           108            54           108            54
BM_string_strcmp/64/4/0             -0.5167         -0.5167           121            58           121            58
BM_string_strcmp/72/4/0             -0.5325         -0.5325           133            62           133            62
BM_string_strcmp/80/4/0             -0.5523         -0.5523           146            65           146            65
BM_string_strcmp/88/4/0             -0.5686         -0.5686           159            69           159            69
BM_string_strcmp/96/4/0             -0.5815         -0.5815           172            72           172            72
BM_string_strcmp/104/4/0            -0.5931         -0.5931           185            75           185            75
BM_string_strcmp/112/4/0            -0.6046         -0.6046           197            78           197            78
BM_string_strcmp/120/4/0            -0.6113         -0.6113           210            82           210            82
BM_string_strcmp/128/4/0            -0.6186         -0.6186           223            85           223            85
BM_string_strcmp/136/4/0            -0.6278         -0.6278           237            88           237            88
BM_string_strcmp/144/4/0            -0.6410         -0.6410           253            91           253            91
BM_string_strcmp/160/4/0            -0.6506         -0.6506           280            98           280            98
BM_string_strcmp/176/4/0            -0.6593         -0.6593           304           104           304           104
BM_string_strcmp/192/4/0            -0.6647         -0.6647           330           111           330           111
BM_string_strcmp/208/4/0            -0.6741         -0.6741           357           116           357           116
BM_string_strcmp/224/4/0            -0.6761         -0.6761           381           123           381           123
BM_string_strcmp/240/4/0            -0.6824         -0.6824           406           129           406           129
BM_string_strcmp/256/4/0            -0.6846         -0.6846           432           136           432           136
BM_string_strcmp/1/0/4              +1.0024         +1.0024             7            14             7            14
BM_string_strcmp/2/0/4              +0.1591         +0.1591            12            14            12            14
BM_string_strcmp/3/0/4              -0.0015         -0.0015            14            14            14            14
BM_string_strcmp/4/0/4              -0.0809         -0.0809            15            14            15            14
BM_string_strcmp/5/0/4              -0.1535         -0.1536            17            14            17            14
BM_string_strcmp/6/0/4              -0.2111         -0.2111            18            14            18            14
BM_string_strcmp/7/0/4              -0.2650         -0.2650            19            14            19            14
BM_string_strcmp/8/0/4              -0.3118         -0.3118            20            14            20            14
BM_string_strcmp/9/0/4              -0.1741         -0.1740            22            18            22            18
BM_string_strcmp/10/0/4             -0.2201         -0.2201            23            18            23            18
BM_string_strcmp/11/0/4             -0.2610         -0.2610            24            18            24            18
BM_string_strcmp/12/0/4             -0.2987         -0.2987            26            18            26            18
BM_string_strcmp/13/0/4             -0.5748         -0.5748            42            18            42            18
BM_string_strcmp/14/0/4             -0.5796         -0.5796            43            18            43            18
BM_string_strcmp/15/0/4             -0.6167         -0.6167            47            18            47            18
BM_string_strcmp/16/0/4             -0.6303         -0.6303            49            18            49            18
BM_string_strcmp/24/0/4             -0.6557         -0.6557            61            21            61            21
BM_string_strcmp/32/0/4             -0.6612         -0.6612            70            24            70            24
BM_string_strcmp/40/0/4             -0.6812         -0.6813            82            26            82            26
BM_string_strcmp/48/0/4             -0.6974         -0.6974            95            29            95            29
BM_string_strcmp/56/0/4             -0.7151         -0.7151           108            31           108            31
BM_string_strcmp/64/0/4             -0.5717         -0.5717           121            52           121            52
BM_string_strcmp/72/0/4             -0.5927         -0.5927           134            54           134            54
BM_string_strcmp/80/0/4             -0.6004         -0.6004           146            58           146            58
BM_string_strcmp/88/0/4             -0.6145         -0.6145           159            61           159            61
BM_string_strcmp/96/0/4             -0.6287         -0.6287           172            64           172            64
BM_string_strcmp/104/0/4            -0.6351         -0.6351           185            67           185            67
BM_string_strcmp/112/0/4            -0.6423         -0.6423           197            71           197            71
BM_string_strcmp/120/0/4            -0.6489         -0.6489           210            74           210            74
BM_string_strcmp/128/0/4            -0.6578         -0.6578           223            76           223            76
BM_string_strcmp/136/0/4            -0.6597         -0.6597           236            80           236            80
BM_string_strcmp/144/0/4            -0.6674         -0.6674           250            83           250            83
BM_string_strcmp/160/0/4            -0.6751         -0.6751           274            89           274            89
BM_string_strcmp/176/0/4            -0.6798         -0.6798           300            96           300            96
BM_string_strcmp/192/0/4            -0.6873         -0.6855           327           102           325           102
BM_string_strcmp/208/0/4            -0.6903         -0.6903           351           109           351           109
BM_string_strcmp/224/0/4            -0.6907         -0.6907           376           116           376           116
BM_string_strcmp/240/0/4            -0.6897         -0.6897           402           125           402           125
BM_string_strcmp/256/0/4            -0.6937         -0.6937           427           131           427           131
BM_string_strcmp/1/4/4              +0.0009         +0.0009            14            14            14            14
BM_string_strcmp/2/4/4              -0.2229         -0.2229            14            11            14            11
BM_string_strcmp/3/4/4              -0.2256         -0.2256            14            11            14            11
BM_string_strcmp/4/4/4              -0.2241         -0.2240            14            11            14            11
BM_string_strcmp/5/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/6/4/4              -0.2267         -0.2267            20            15            20            15
BM_string_strcmp/7/4/4              -0.2228         -0.2227            20            15            20            15
BM_string_strcmp/8/4/4              -0.2219         -0.2219            20            15            20            15
BM_string_strcmp/9/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/10/4/4             -0.2227         -0.2227            20            15            20            15
BM_string_strcmp/11/4/4             -0.2210         -0.2210            20            15            20            15
BM_string_strcmp/12/4/4             -0.2224         -0.2224            20            15            20            15
BM_string_strcmp/13/4/4             -0.1778         -0.1778            21            17            21            17
BM_string_strcmp/14/4/4             -0.1863         -0.1863            21            17            21            17
BM_string_strcmp/15/4/4             -0.1780         -0.1780            21            17            21            17
BM_string_strcmp/16/4/4             +0.0031         +0.0031            21            21            21            21
BM_string_strcmp/24/4/4             +0.0041         +0.0041            24            24            24            24
BM_string_strcmp/32/4/4             -0.0001         -0.0000            25            25            25            25
BM_string_strcmp/40/4/4             +0.0016         +0.0016            26            26            26            26
BM_string_strcmp/48/4/4             +0.0001         +0.0001            28            28            28            28
BM_string_strcmp/56/4/4             -0.0001         -0.0001            30            30            30            30
BM_string_strcmp/64/4/4             -0.0342         -0.0342            32            31            32            31
BM_string_strcmp/72/4/4             -0.0186         -0.0186            34            34            34            34
BM_string_strcmp/80/4/4             +0.0004         +0.0004            36            36            36            36
BM_string_strcmp/88/4/4             -0.0000         -0.0000            39            39            39            39
BM_string_strcmp/96/4/4             -0.0510         -0.0510            62            59            62            59
BM_string_strcmp/104/4/4            -0.0502         -0.0502            63            60            63            60
BM_string_strcmp/112/4/4            -0.0490         -0.0490            65            62            65            62
BM_string_strcmp/120/4/4            -0.0387         -0.0387            67            65            67            65
BM_string_strcmp/128/4/4            -0.0426         -0.0426            70            67            70            67
BM_string_strcmp/136/4/4            -0.0408         -0.0408            73            70            73            70
BM_string_strcmp/144/4/4            -0.0194         -0.0194            75            74            75            74
BM_string_strcmp/160/4/4            -0.0035         -0.0035            81            81            81            81
BM_string_strcmp/176/4/4            -0.0001         -0.0001            86            86            86            86
BM_string_strcmp/192/4/4            -0.0002         -0.0002            91            91            91            91
BM_string_strcmp/208/4/4            -0.0335         -0.0335            96            93            96            93
BM_string_strcmp/224/4/4            -0.0314         -0.0314           101            98           101            98
BM_string_strcmp/240/4/4            -0.0303         -0.0303           106           103           106           103
BM_string_strcmp/256/4/4            -0.0288         -0.0288           111           108           111           108

[1] Commit id: f98f2a6780d686ca3d44f8011c7823d42d9b083a

Test: bionic tests and benchmarks on aarch64.
Change-Id: I75f8948782b8bd459d21f15e75e1d420905f5e5a
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
1 file changed
tree: 6eae8c82a3d5bf99422185b593144d03b0c01254
  1. benchmarks/
  2. build/
  3. docs/
  4. libc/
  5. libdl/
  6. libm/
  7. libstdc++/
  8. linker/
  9. tests/
  10. tools/
  11. .clang-format
  12. .gitignore
  13. .gitreview
  14. android-changes-for-ndk-developers.md
  15. Android.bp
  16. Android.mk
  17. CleanSpec.mk
  18. CPPLINT.cfg
  19. OWNERS
  20. PREUPLOAD.cfg
  21. README.md
README.md

Using bionic

See the additional documentation.

Working on bionic

What are the big pieces of bionic?

libc/ --- libc.so, libc.a

The C library. Stuff like fopen(3) and kill(2).

libm/ --- libm.so, libm.a

The math library. Traditionally Unix systems kept stuff like sin(3) and cos(3) in a separate library to save space in the days before shared libraries.

libdl/ --- libdl.so

The dynamic linker interface library. This is actually just a bunch of stubs that the dynamic linker replaces with pointers to its own implementation at runtime. This is where stuff like dlopen(3) lives.

libstdc++/ --- libstdc++.so

The C++ ABI support functions. The C++ compiler doesn't know how to implement thread-safe static initialization and the like, so it just calls functions that are supplied by the system. Stuff like __cxa_guard_acquire and __cxa_pure_virtual live here.

linker/ --- /system/bin/linker and /system/bin/linker64

The dynamic linker. When you run a dynamically-linked executable, its ELF file has a DT_INTERP entry that says "use the following program to start me". On Android, that's either linker or linker64 (depending on whether it's a 32-bit or 64-bit executable). It's responsible for loading the ELF executable into memory and resolving references to symbols (so that when your code tries to jump to fopen(3), say, it lands in the right place).

tests/ --- unit tests

The tests/ directory contains unit tests. Roughly arranged as one file per publicly-exported header file.

benchmarks/ --- benchmarks

The benchmarks/ directory contains benchmarks, with its own documentation.

What's in libc/?

Adding libc wrappers for system calls

The first question you should ask is "should I add a libc wrapper for this system call?". The answer is usually "no".

The answer is "yes" if the system call is part of the POSIX standard.

The answer is probably "yes" if the system call has a wrapper in at least one other C library.

The answer may be "yes" if the system call has three/four distinct users in different projects, and there isn't a more specific library that would make more sense as the place to add the wrapper.

In all other cases, you should use syscall(3) instead.

Adding a system call usually involves:

  1. Add entries to SYSCALLS.TXT. See SYSCALLS.TXT itself for documentation on the format.
  2. Run the gensyscalls.py script.
  3. Add constants (and perhaps types) to the appropriate header file. Note that you should check to see whether the constants are already in kernel uapi header files, in which case you just need to make sure that the appropriate POSIX header file in libc/include/ includes the relevant file or files.
  4. Add function declarations to the appropriate header file. Don't forget to include the appropriate __INTRODUCED_IN().
  5. Add the function name to the correct section in libc/libc.map.txt and run ./libc/tools/genversion-scripts.py.
  6. Add at least basic tests. Even a test that deliberately supplies an invalid argument helps check that we're generating the right symbol and have the right declaration in the header file, and that you correctly updated the maps in step 5. (You can use strace(1) to confirm that the correct system call is being made.)

Updating kernel header files

As mentioned above, this is currently a two-step process:

  1. Use generate_uapi_headers.sh to go from a Linux source tree to appropriate contents for external/kernel-headers/.
  2. Run update_all.py to scrub those headers and import them into bionic.

Note that if you're actually just trying to expose device-specific headers to build your device drivers, you shouldn't modify bionic. Instead use TARGET_DEVICE_KERNEL_HEADERS and friends described in config.mk.

Updating tzdata

This is fully automated (and these days handled by the libcore team, because they own icu, and that needs to be updated in sync with bionic):

  1. Run update-tzdata.py in external/icu/tools/.

Verifying changes

If you make a change that is likely to have a wide effect on the tree (such as a libc header change), you should run make checkbuild. A regular make will not build the entire tree; just the minimum number of projects that are required for the device. Tests, additional developer tools, and various other modules will not be built. Note that make checkbuild will not be complete either, as make tests covers a few additional modules, but generally speaking make checkbuild is enough.

Running the tests

The tests are all built from the tests/ directory.

Device tests

$ mma # In $ANDROID_ROOT/bionic.
$ adb root && adb remount && adb sync
$ adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests
$ adb shell \
    /data/nativetest/bionic-unit-tests-static/bionic-unit-tests-static
# Only for 64-bit targets
$ adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests
$ adb shell \
    /data/nativetest64/bionic-unit-tests-static/bionic-unit-tests-static

Note that we use our own custom gtest runner that offers a superset of the options documented at https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#running-test-programs-advanced-options, in particular for test isolation and parallelism (both on by default).

Device tests via CTS

Most of the unit tests are executed by CTS. By default, CTS runs as a non-root user, so the unit tests must also pass when not run as root. Some tests cannot do any useful work unless run as root. In this case, the test should check getuid() == 0 and do nothing otherwise (typically we log in this case to prevent accidents!). Obviously, if the test can be rewritten to not require root, that's an even better solution.

Currently, the list of bionic CTS tests is generated at build time by running a host version of the test executable and dumping the list of all tests. In order for this to continue to work, all architectures must have the same number of tests, and the host version of the executable must also have the same number of tests.

Running the gtests directly is orders of magnitude faster than using CTS, but in cases where you really have to run CTS:

$ make cts # In $ANDROID_ROOT.
$ adb unroot # Because real CTS doesn't run as root.
# This will sync any *test* changes, but not *code* changes:
$ cts-tradefed \
    run singleCommand cts --skip-preconditions -m CtsBionicTestCases

Host tests

The host tests require that you have lunched either an x86 or x86_64 target. Note that due to ABI limitations (specifically, the size of pthread_mutex_t), 32-bit bionic requires PIDs less than 65536. To enforce this, set /proc/sys/kernel/pid_max to 65536.

$ ./tests/run-on-host.sh 32
$ ./tests/run-on-host.sh 64   # For x86_64-bit *targets* only.

You can supply gtest flags as extra arguments to this script.

Against glibc

As a way to check that our tests do in fact test the correct behavior (and not just the behavior we think is correct), it is possible to run the tests against the host's glibc.

$ ./tests/run-on-host.sh glibc

Gathering test coverage

For either host or target coverage, you must first:

  • $ export NATIVE_COVERAGE=true
    • Note that the build system is ignorant to this flag being toggled, i.e. if you change this flag, you will have to manually rebuild bionic.
  • Set bionic_coverage=true in libc/Android.mk and libm/Android.mk.

Coverage from device tests

$ mma
$ adb sync
$ adb shell \
    GCOV_PREFIX=/data/local/tmp/gcov \
    GCOV_PREFIX_STRIP=`echo $ANDROID_BUILD_TOP | grep -o / | wc -l` \
    /data/nativetest/bionic-unit-tests/bionic-unit-tests
$ acov

acov will pull all coverage information from the device, push it to the right directories, run lcov, and open the coverage report in your browser.

Coverage from host tests

First, build and run the host tests as usual (see above).

$ croot
$ lcov -c -d $ANDROID_PRODUCT_OUT -o coverage.info
$ genhtml -o covreport coverage.info # or lcov --list coverage.info

The coverage report is now available at covreport/index.html.

Attaching GDB to the tests

Bionic's test runner will run each test in its own process by default to prevent tests failures from impacting other tests. This also has the added benefit of running them in parallel, so they are much faster.

However, this also makes it difficult to run the tests under GDB. To prevent each test from being forked, run the tests with the flag --no-isolate.

32-bit ABI bugs

See 32-bit ABI bugs.