Merge "simpleperf: update simpleperf prebuilts to build 4445499."
diff --git a/simpleperf/demo/README.md b/simpleperf/demo/README.md
index 2c5f2b3..2a293c5 100644
--- a/simpleperf/demo/README.md
+++ b/simpleperf/demo/README.md
@@ -10,7 +10,7 @@
 ## Introduction
 
 Simpleperf is a native profiler used on Android platform. It can be used to profile Android
-applications. It's document is at [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md).
+applications. Its documentation is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md).
 Instructions of preparing your Android application for profiling are [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md#Android-application-profiling).
 This directory is to show examples of using simpleperf to profile Android applications. The
 meaning of each directory is as below:
@@ -22,20 +22,30 @@
 
 It can be downloaded as below:
 
-    $ git clone https://android.googlesource.com/platform/system/extras
-    $ cd extras/simpleperf/demo
+```sh
+$ git clone https://android.googlesource.com/platform/system/extras
+$ cd extras/simpleperf/demo
+```
 
-## Profiling Java application
+The testing environment:
 
-    Android Studio project: SimpleExamplePureJava
-    test device: Android O (Google Pixel XL)
-    test device: Android N (Google Nexus 5X)
+```
+Android Studio 3.0
+test device: Android O (Google Pixel 2)
+test device: Android N (Google Nexus 6P)
+Please make sure your device runs Android version >= N.
+```
+
+## Profile a Java application
+
+Android Studio project: SimpleExamplePureJava
 
 steps:
-1. Build and install app:
-```
+1. Build and install the application:
+
+```sh
 # Open SimpleperfExamplesPureJava project with Android Studio,
-# and build this project sucessfully, otherwise the `./gradlew` command below will fail.
+# and build this project successfully, otherwise the `./gradlew` command below will fail.
 $ cd SimpleperfExamplePureJava
 
 # On windows, use "gradlew" instead.
@@ -44,32 +54,28 @@
 ```
 
 2. Record profiling data:
-```
+
+```sh
 $ cd ../../scripts/
+# app_profiler.py collects profiling data in perf.data, and binaries on device in binary_cache/.
 $ python app_profiler.py -p com.example.simpleperf.simpleperfexamplepurejava
 ```
 
 3. Show profiling data:
-```
-a. show call graph in txt mode
-    $ python report.py -g | more
-b. show call graph in gui mode
-    $ python report.py -g --gui
-c. show samples in source code
-    $ python annotate.py -s ../demo/SimpleperfExamplePureJava
-    $ find annotated_files -name "MainActivity.java"
-        check the annoated source file MainActivity.java.
+
+```sh
+# report_html.py generates the profiling report in report.html.
+$ python report_html.py --add_source_code --source_dirs ../demo --add_disassembly
 ```
 
-## Profiling Java/C++ application
+## Profile a Java/C++ application
 
-    Android Studio project: SimpleExampleWithNative
-    test device: Android O (Google Pixel XL)
-    test device: Android N (Google Nexus 5X)
+Android Studio project: SimpleExampleWithNative
 
 steps:
-1. Build and install app:
-```
+1. Build and install the application:
+
+```sh
 # Open SimpleperfExamplesWithNative project with Android Studio,
-# and build this project sucessfully, otherwise the `./gradlew` command below will fail.
+# and build this project successfully, otherwise the `./gradlew` command below will fail.
 $ cd SimpleperfExampleWithNative
@@ -80,33 +86,28 @@
 ```
 
 2. Record profiling data:
-```
+
+```sh
 $ cd ../../scripts/
+# app_profiler.py collects profiling data in perf.data, and binaries on device in binary_cache/.
 $ python app_profiler.py -p com.example.simpleperf.simpleperfexamplewithnative
-    It runs the application and collects profiling data in perf.data, binaries on device in binary_cache/.
 ```
 
 3. Show profiling data:
-```
-a. show call graph in txt mode
-    $ python report.py -g | more
-b. show call graph in gui mode
-    $ python report.py -g --gui
-c. show samples in source code
-    $ python annotate.py -s ../demo/SimpleperfExampleWithNative
-    $ find annotated_files -name "native-lib.cpp"
-        check the annoated source file native-lib.cpp.
+
+```sh
+# report_html.py generates the profiling report in report.html.
+$ python report_html.py --add_source_code --source_dirs ../demo --add_disassembly
 ```
 
-## Profiling Kotlin application
+## Profile a Kotlin application
 
-    Android Studio project: SimpleExampleOfKotlin
-    test device: Android O (Google Pixel XL)
-    test device: Android N (Google Nexus 5X)
+Android Studio project: SimpleExampleOfKotlin
 
 steps:
-1. Build and install app:
-```
+1. Build and install the application:
+
+```sh
 # Open SimpleperfExamplesOfKotlin project with Android Studio,
-# and build this project sucessfully, otherwise the `./gradlew` command below will fail.
+# and build this project successfully, otherwise the `./gradlew` command below will fail.
 $ cd SimpleperfExampleOfKotlin
@@ -117,19 +118,16 @@
 ```
 
 2. Record profiling data:
-```
+
+```sh
 $ cd ../../scripts/
+# app_profiler.py collects profiling data in perf.data, and binaries on device in binary_cache/.
 $ python app_profiler.py -p com.example.simpleperf.simpleperfexampleofkotlin
-    It runs the application and collects profiling data in perf.data, binaries on device in binary_cache/.
 ```
 
 3. Show profiling data:
-```
-a. show call graph in txt mode
-    $ python report.py -g | more
-b. show call graph in gui mode
-    $ python report.py -g --gui
-c. show samples in source code
-    $ python annotate.py -s ../demo/SimpleperfExampleOfKotlin
-    $ find . -name "MainActivity.kt"
+
+```sh
+# report_html.py generates the profiling report in report.html.
+$ python report_html.py --add_source_code --source_dirs ../demo --add_disassembly
 ```
diff --git a/simpleperf/doc/README.md b/simpleperf/doc/README.md
index db58485..a82eecc 100644
--- a/simpleperf/doc/README.md
+++ b/simpleperf/doc/README.md
@@ -15,12 +15,6 @@
 - [Simpleperf introduction](#simpleperf-introduction)
     - [Why simpleperf](#why-simpleperf)
     - [Tools in simpleperf](#tools-in-simpleperf)
-    - [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
-    - [Main simpleperf commands](#main-simpleperf-commands)
-        - [Simpleperf list](#simpleperf-list)
-        - [Simpleperf stat](#simpleperf-stat)
-        - [Simpleperf record](#simpleperf-record)
-        - [Simpleperf report](#simpleperf-report)
 - [Android application profiling](#android-application-profiling)
     - [Prepare an Android application](#prepare-an-android-application)
     - [Record and report profiling data (using command-lines)](#record-and-report-profiling-data-using-commandlines)
@@ -30,6 +24,13 @@
     - [Annotate source code](#annotate-source-code)
     - [Trace offcpu time](#trace-offcpu-time)
     - [Profile from launch of an application](#profile-from-launch-of-an-application)
+- [Executable commands reference](#executable-commands-reference)
+    - [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
+    - [Main simpleperf commands](#main-simpleperf-commands)
+        - [Simpleperf list](#simpleperf-list)
+        - [Simpleperf stat](#simpleperf-stat)
+        - [Simpleperf record](#simpleperf-record)
+        - [Simpleperf report](#simpleperf-report)
 - [Answers to common issues](#answers-to-common-issues)
     - [Why we suggest profiling on android >= N devices](#why-we-suggest-profiling-on-android-n-devices)
 
@@ -110,361 +111,6 @@
 `simpleperf_report_lib.py` provides a python interface for parsing profiling data.
 
 
-### Simpleperf's profiling principle
-
-Modern CPUs have a hardware component called the performance monitoring unit
-(PMU). The PMU has several hardware counters, counting events like how many cpu
-cycles have happened, how many instructions have executed, or how many cache
-misses have happened.
-
-The Linux kernel wraps these hardware counters into hardware perf events. In
-addition, the Linux kernel also provides hardware independent software events
-and tracepoint events. The Linux kernel exposes all this to userspace via the
-perf_event_open system call, which simpleperf uses.
-
-Simpleperf has three main functions: stat, record and report.
-
-The stat command gives a summary of how many events have happened in the
-profiled processes in a time period. Here’s how it works:
-1. Given user options, simpleperf enables profiling by making a system call to
-linux kernel.
-2. Linux kernel enables counters while scheduling on the profiled processes.
-3. After profiling, simpleperf reads counters from linux kernel, and reports a
-counter summary.
-
-The record command records samples of the profiled process in a time period.
-Here’s how it works:
-1. Given user options, simpleperf enables profiling by making a system call to
-linux kernel.
-2. Simpleperf creates mapped buffers between simpleperf and linux kernel.
-3. Linux kernel enable counters while scheduling on the profiled processes.
-4. Each time a given number of events happen, linux kernel dumps a sample to a
-mapped buffer.
-5. Simpleperf reads samples from the mapped buffers and generates perf.data.
-
-The report command reads a "perf.data" file and any shared libraries used by
-the profiled processes, and outputs a report showing where the time was spent.
-
-
-### Main simpleperf commands
-
-Simpleperf supports several subcommands, including list, stat, record and report.
-Each subcommand supports different options. This section only covers the most
-important subcommands and options. To see all subcommands and options,
-use --help.
-
-    # List all subcommands.
-    $ simpleperf --help
-
-    # Print help message for record subcommand.
-    $ simpleperf record --help
-
-
-#### Simpleperf list
-
-simpleperf list is used to list all events available on the device. Different
-devices may support different events because of differences in hardware and
-kernel.
-
-    $ simpleperf list
-    List of hw-cache events:
-      branch-loads
-      ...
-    List of hardware events:
-      cpu-cycles
-      instructions
-      ...
-    List of software events:
-      cpu-clock
-      task-clock
-      ...
-
-
-#### Simpleperf stat
-
-simpleperf stat is used to get a raw event counter information of the profiled program
-or system-wide. By passing options, we can select which events to use, which
-processes/threads to monitor, how long to monitor and the print interval.
-Below is an example.
-
-    # Stat using default events (cpu-cycles,instructions,...), and monitor
-    # process 7394 for 10 seconds.
-    $ simpleperf stat -p 7394 --duration 10
-    Performance counter statistics:
-
-     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
-       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
-         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
-    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
-               753  context-switches   # 75.121 /sec                      (100%)
-               870  page-faults        # 86.793 /sec                      (100%)
-
-    Total test time: 10.023829 seconds.
-
-**Select events**
-We can select which events to use via -e option. Below are examples:
-
-    # Stat event cpu-cycles.
-    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10
-
-    # Stat event cache-references and cache-misses.
-    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
-
-When running the stat command, if the number of hardware events is larger than
-the number of hardware counters available in the PMU, the kernel shares hardware
-counters between events, so each event is only monitored for part of the total
-time. In the example below, there is a percentage at the end of each row,
-showing the percentage of the total time that each event was actually monitored.
-
-    # Stat using event cache-references, cache-references:u,....
-    $ simpleperf stat -p 7394 -e     cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
-    Performance counter statistics:
-
-    4,331,018  cache-references     # 4.861 M/sec    (87%)
-    3,064,089  cache-references:u   # 3.439 M/sec    (87%)
-    1,364,959  cache-references:k   # 1.532 M/sec    (87%)
-       91,721  cache-misses         # 102.918 K/sec  (87%)
-       45,735  cache-misses:u       # 51.327 K/sec   (87%)
-       38,447  cache-misses:k       # 43.131 K/sec   (87%)
-    9,688,515  instructions         # 10.561 M/sec   (89%)
-
-    Total test time: 1.026802 seconds.
-
-In the example above, each event is monitored about 87% of the total time. But
-there is no guarantee that any pair of events are always monitored at the same
-time. If we want to have some events monitored at the same time, we can use
---group option. Below is an example.
-
-    # Stat using event cache-references, cache-references:u,....
-    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
-    Performance counter statistics:
-
-    3,638,900  cache-references     # 4.786 M/sec          (74%)
-       65,171  cache-misses         # 1.790953% miss rate  (74%)
-    2,390,433  cache-references:u   # 3.153 M/sec          (74%)
-       32,280  cache-misses:u       # 1.350383% miss rate  (74%)
-      879,035  cache-references:k   # 1.251 M/sec          (68%)
-       30,303  cache-misses:k       # 3.447303% miss rate  (68%)
-    8,921,161  instructions         # 10.070 M/sec         (86%)
-
-    Total test time: 1.029843 seconds.
-
-**Select target to monitor**
-We can select which processes or threads to monitor via -p option or -t option.
-Monitoring a process is the same as monitoring all threads in the process.
-Simpleperf can also fork a child process to run the new command and then monitor
-the child process. Below are examples.
-
-    # Stat process 11904 and 11905.
-    $ simpleperf stat -p 11904,11905 --duration 10
-
-    # Stat thread 11904 and 11905.
-    $ simpleperf stat -t 11904,11905 --duration 10
-
-    # Start a child process running `ls`, and stat it.
-    $ simpleperf stat ls
-
-**Decide how long to monitor**
-When monitoring existing threads, we can use --duration option to decide how long
-to monitor. When monitoring a child process running a new command, simpleperf
-monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
-at any time. Below are examples.
-
-    # Stat process 11904 for 10 seconds.
-    $ simpleperf stat -p 11904 --duration 10
-
-    # Stat until the child process running `ls` finishes.
-    $ simpleperf stat ls
-
-    # Stop monitoring using Ctrl-C.
-    $ simpleperf stat -p 11904 --duration 10
-    ^C
-
-**Decide the print interval**
-When monitoring perf counters, we can also use --interval option to decide the print
-interval. Below are examples.
-
-    # Print stat for process 11904 every 300ms.
-    $ simpleperf stat -p 11904 --duration 10 --interval 300
-
-    # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
-    # system wide profiling needs root privilege
-    $ su 0 simpleperf stat -a --duration 10 --interval 300
-
-**Display counters in systrace**
-simpleperf can also work with systrace to dump counters in the collected trace.
-Below is an example to do a system wide stat
-
-    # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
-    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
-    # on host launch systrace to collect trace for 10 seconds
-    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
-    # open the collected new.html in browser and perf counters will be shown up
-
-
-#### Simpleperf record
-
-simpleperf record is used to dump records of the profiled program. By passing
-options, we can select which events to use, which processes/threads to monitor,
-what frequency to dump records, how long to monitor, and where to store records.
-
-    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
-    # using default sample frequency (4000 samples per second), writing records
-    # to perf.data.
-    $ simpleperf record -p 7394 --duration 10
-    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
-
-**Select events**
-In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
-As a hardware event, it is both accurate and efficient. We can also use other
-events via -e option. Below is an example.
-
-    # Record using event instructions.
-    $ simpleperf record -e instructions -p 11904 --duration 10
-
-**Select target to monitor**
-The way to select target in record command is similar to that in stat command.
-Below are examples.
-
-    # Record process 11904 and 11905.
-    $ simpleperf record -p 11904,11905 --duration 10
-
-    # Record thread 11904 and 11905.
-    $ simpleperf record -t 11904,11905 --duration 10
-
-    # Record a child process running `ls`.
-    $ simpleperf record ls
-
-**Set the frequency to record**
-We can set the frequency to dump records via the -f or -c options. For example,
--f 4000 means dumping approximately 4000 records every second when the monitored
-thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
-or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
-records every second. Another way is using -c option. For example, -c 10000
-means dumping one record whenever 10000 events happen. Below are examples.
-
-    # Record with sample frequency 1000: sample 1000 times every second running.
-    $ simpleperf record -f 1000 -p 11904,11905 --duration 10
-
-    # Record with sample period 100000: sample 1 time every 100000 events.
-    $ simpleperf record -c 100000 -t 11904,11905 --duration 10
-
-**Decide how long to monitor**
-The way to decide how long to monitor in record command is similar to that in
-stat command. Below are examples.
-
-    # Record process 11904 for 10 seconds.
-    $ simpleperf record -p 11904 --duration 10
-
-    # Record until the child process running `ls` finishes.
-    $ simpleperf record ls
-
-    # Stop monitoring using Ctrl-C.
-    $ simpleperf record -p 11904 --duration 10
-    ^C
-
-**Set the path to store records**
-By default, simpleperf stores records in perf.data in current directory. We can
-use -o option to set the path to store records. Below is an example.
-
-    # Write records to data/perf2.data.
-    $ simpleperf record -p 11904 -o data/perf2.data --duration 10
-
-
-#### Simpleperf report
-
-simpleperf report is used to report based on perf.data generated by simpleperf
-record command. Report command groups records into different sample entries,
-sorts sample entries based on how many events each sample entry contains, and
-prints out each sample entry. By passing options, we can select where to find
-perf.data and executable binaries used by the monitored program, filter out
-uninteresting records, and decide how to group records.
-
-Below is an example. Records are grouped into 4 sample entries, each entry is
-a row. There are several columns, each column shows piece of information
-belonging to a sample entry. The first column is Overhead, which shows the
-percentage of events inside current sample entry in total events. As the
-perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
-time used in each function.
-
-    # Reports perf.data, using only records sampled in libsudo-game-jni.so,
-    # grouping records using thread name(comm), process id(pid), thread id(tid),
-    # function name(symbol), and showing sample count for each row.
-    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
-    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
-    Arch: arm64
-    Event: cpu-cycles (type 0, config 0)
-    Samples: 28235
-    Event count: 546356211
-
-    Overhead  Sample  Command    Pid   Tid   Symbol
-    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
-    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
-    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
-    6.24%     1756    sudogame  7394  7394  @plt
-
-**Set the path to read records**
-By default, simpleperf reads perf.data in current directory. We can use -i
-option to select another file to read records.
-
-    $ simpleperf report -i data/perf2.data
-
-**Set the path to find executable binaries**
-If reporting function symbols, simpleperf needs to read executable binaries
-used by the monitored processes to get symbol table and debug information. By
-default, the paths are the executable binaries used by monitored processes while
-recording. However, these binaries may not exist when reporting or not contain
-symbol table and debug information. So we can use --symfs to redirect the paths.
-Below is an example.
-
-    $ simpleperf report
-    # In this case, when simpleperf wants to read executable binary /A/b,
-    # it reads file in /A/b.
-
-    $ simpleperf report --symfs /debug_dir
-    # In this case, when simpleperf wants to read executable binary /A/b,
-    # it prefers file in /debug_dir/A/b to file in /A/b.
-
-**Filter records**
-When reporting, it happens that not all records are of interest. Simpleperf
-supports five filters to select records of interest. Below are examples.
-
-    # Report records in threads having name sudogame.
-    $ simpleperf report --comms sudogame
-
-    # Report records in process 7394 or 7395
-    $ simpleperf report --pids 7394,7395
-
-    # Report records in thread 7394 or 7395.
-    $ simpleperf report --tids 7394,7395
-
-    # Report records in libsudo-game-jni.so.
-    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
-
-    # Report records in function checkValid or canFindSolution_r.
-    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
-
-**Decide how to group records into sample entries**
-Simpleperf uses --sort option to decide how to group sample entries. Below are
-examples.
-
-    # Group records based on their process id: records having the same process
-    # id are in the same sample entry.
-    $ simpleperf report --sort pid
-
-    # Group records based on their thread id and thread comm: records having
-    # the same thread id and thread name are in the same sample entry.
-    $ simpleperf report --sort tid,comm
-
-    # Group records based on their binary and function: records in the same
-    # binary and function are in the same sample entry.
-    $ simpleperf report --sort dso,symbol
-
-    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
-    # thread, and belong to the same function in the same binary.
-    $ simpleperf report
-
-
 ## Android application profiling
 
 This section shows how to profile an Android application.
@@ -889,6 +535,362 @@
       -a .MainActivity --arch arm64 -r "-g -e cpu-cycles:u --duration 1" \
       --profile_from_launch
 
+## Executable commands reference
+
+### Simpleperf's profiling principle
+
+Modern CPUs have a hardware component called the performance monitoring unit
+(PMU). The PMU has several hardware counters, counting events like how many cpu
+cycles have happened, how many instructions have executed, or how many cache
+misses have happened.
+
+The Linux kernel wraps these hardware counters into hardware perf events. In
+addition, the Linux kernel also provides hardware independent software events
+and tracepoint events. The Linux kernel exposes all this to userspace via the
+perf_event_open system call, which simpleperf uses.
+
+Simpleperf has three main functions: stat, record and report.
+
+The stat command gives a summary of how many events have happened in the
+profiled processes in a time period. Here’s how it works:
+1. Given user options, simpleperf enables profiling by making a system call to
+the Linux kernel.
+2. The Linux kernel enables counters while the profiled processes are scheduled.
+3. After profiling, simpleperf reads counters from the Linux kernel, and reports a
+counter summary.
+
+The record command records samples of the profiled process in a time period.
+Here’s how it works:
+1. Given user options, simpleperf enables profiling by making a system call to
+the Linux kernel.
+2. Simpleperf creates mapped buffers between simpleperf and the Linux kernel.
+3. The Linux kernel enables counters while the profiled processes are scheduled.
+4. Each time a given number of events happen, the Linux kernel dumps a sample to a
+mapped buffer.
+5. Simpleperf reads samples from the mapped buffers and generates perf.data.
+
+The report command reads a "perf.data" file and any shared libraries used by
+the profiled processes, and outputs a report showing where the time was spent.
+
+
+### Main simpleperf commands
+
+Simpleperf supports several subcommands, including list, stat, record and report.
+Each subcommand supports different options. This section only covers the most
+important subcommands and options. To see all subcommands and options,
+use --help.
+
+    # List all subcommands.
+    $ simpleperf --help
+
+    # Print help message for record subcommand.
+    $ simpleperf record --help
+
+
+#### Simpleperf list
+
+simpleperf list is used to list all events available on the device. Different
+devices may support different events because of differences in hardware and
+kernel.
+
+    $ simpleperf list
+    List of hw-cache events:
+      branch-loads
+      ...
+    List of hardware events:
+      cpu-cycles
+      instructions
+      ...
+    List of software events:
+      cpu-clock
+      task-clock
+      ...
+
+
+#### Simpleperf stat
+
+simpleperf stat is used to get raw event counter information of the profiled program
+or system-wide. By passing options, we can select which events to use, which
+processes/threads to monitor, how long to monitor and the print interval.
+Below is an example.
+
+    # Stat using default events (cpu-cycles,instructions,...), and monitor
+    # process 7394 for 10 seconds.
+    $ simpleperf stat -p 7394 --duration 10
+    Performance counter statistics:
+
+     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
+       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
+         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
+    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
+               753  context-switches   # 75.121 /sec                      (100%)
+               870  page-faults        # 86.793 /sec                      (100%)
+
+    Total test time: 10.023829 seconds.
+
+**Select events**
+We can select which events to use via -e option. Below are examples:
+
+    # Stat event cpu-cycles.
+    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10
+
+    # Stat event cache-references and cache-misses.
+    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
+
+When running the stat command, if the number of hardware events is larger than
+the number of hardware counters available in the PMU, the kernel shares hardware
+counters between events, so each event is only monitored for part of the total
+time. In the example below, there is a percentage at the end of each row,
+showing the percentage of the total time that each event was actually monitored.
+
+    # Stat using event cache-references, cache-references:u,....
+    $ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
+    Performance counter statistics:
+
+    4,331,018  cache-references     # 4.861 M/sec    (87%)
+    3,064,089  cache-references:u   # 3.439 M/sec    (87%)
+    1,364,959  cache-references:k   # 1.532 M/sec    (87%)
+       91,721  cache-misses         # 102.918 K/sec  (87%)
+       45,735  cache-misses:u       # 51.327 K/sec   (87%)
+       38,447  cache-misses:k       # 43.131 K/sec   (87%)
+    9,688,515  instructions         # 10.561 M/sec   (89%)
+
+    Total test time: 1.026802 seconds.
+
+In the example above, each event is monitored about 87% of the total time. But
+there is no guarantee that any pair of events are always monitored at the same
+time. If we want to have some events monitored at the same time, we can use
+--group option. Below is an example.
+
+    # Stat using event cache-references, cache-references:u,....
+    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
+    Performance counter statistics:
+
+    3,638,900  cache-references     # 4.786 M/sec          (74%)
+       65,171  cache-misses         # 1.790953% miss rate  (74%)
+    2,390,433  cache-references:u   # 3.153 M/sec          (74%)
+       32,280  cache-misses:u       # 1.350383% miss rate  (74%)
+      879,035  cache-references:k   # 1.251 M/sec          (68%)
+       30,303  cache-misses:k       # 3.447303% miss rate  (68%)
+    8,921,161  instructions         # 10.070 M/sec         (86%)
+
+    Total test time: 1.029843 seconds.
+
+**Select target to monitor**
+We can select which processes or threads to monitor via -p option or -t option.
+Monitoring a process is the same as monitoring all threads in the process.
+Simpleperf can also fork a child process to run the new command and then monitor
+the child process. Below are examples.
+
+    # Stat process 11904 and 11905.
+    $ simpleperf stat -p 11904,11905 --duration 10
+
+    # Stat thread 11904 and 11905.
+    $ simpleperf stat -t 11904,11905 --duration 10
+
+    # Start a child process running `ls`, and stat it.
+    $ simpleperf stat ls
+
+**Decide how long to monitor**
+When monitoring existing threads, we can use --duration option to decide how long
+to monitor. When monitoring a child process running a new command, simpleperf
+monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
+at any time. Below are examples.
+
+    # Stat process 11904 for 10 seconds.
+    $ simpleperf stat -p 11904 --duration 10
+
+    # Stat until the child process running `ls` finishes.
+    $ simpleperf stat ls
+
+    # Stop monitoring using Ctrl-C.
+    $ simpleperf stat -p 11904 --duration 10
+    ^C
+
+**Decide the print interval**
+When monitoring perf counters, we can also use --interval option to decide the print
+interval. Below are examples.
+
+    # Print stat for process 11904 every 300ms.
+    $ simpleperf stat -p 11904 --duration 10 --interval 300
+
+    # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
+    # system wide profiling needs root privilege
+    $ su 0 simpleperf stat -a --duration 10 --interval 300
+
+**Display counters in systrace**
+simpleperf can also work with systrace to dump counters in the collected trace.
+Below is an example of a system-wide stat:
+
+    # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
+    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
+    # on host launch systrace to collect trace for 10 seconds
+    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
+    # open the collected new.html in a browser and the perf counters will show up
+
+
+#### Simpleperf record
+
+simpleperf record is used to dump records of the profiled program. By passing
+options, we can select which events to use, which processes/threads to monitor,
+what frequency to dump records, how long to monitor, and where to store records.
+
+    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
+    # using default sample frequency (4000 samples per second), writing records
+    # to perf.data.
+    $ simpleperf record -p 7394 --duration 10
+    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
+
+**Select events**
+In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
+As a hardware event, it is both accurate and efficient. We can also use other
+events via the -e option. Below is an example.
+
+    # Record using event instructions.
+    $ simpleperf record -e instructions -p 11904 --duration 10
+
+**Select target to monitor**
+The way to select a target in the record command is similar to that in the stat
+command. Below are examples.
+
+    # Record processes 11904 and 11905.
+    $ simpleperf record -p 11904,11905 --duration 10
+
+    # Record threads 11904 and 11905.
+    $ simpleperf record -t 11904,11905 --duration 10
+
+    # Record a child process running `ls`.
+    $ simpleperf record ls
+
+**Set the frequency to record**
+We can set the frequency to dump records via the -f or -c options. For example,
+-f 4000 means dumping approximately 4000 records every second when the monitored
+thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
+or blocked at other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
+records every second. Another way is using the -c option. For example, -c 10000
+means dumping one record whenever 10000 events happen. Below are examples.
+
+    # Record with sample frequency 1000: sample 1000 times for every second the thread runs.
+    $ simpleperf record -f 1000 -p 11904,11905 --duration 10
+
+    # Record with sample period 100000: sample once every 100000 events.
+    $ simpleperf record -c 100000 -t 11904,11905 --duration 10
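+
+The arithmetic behind -f can be sketched in shell. This is only an illustration:
+`estimate_records_per_sec` is a hypothetical helper, not part of simpleperf, and
+the time a thread runs in each second is an assumed input rather than something
+simpleperf reports.
+
+```sh
+# Hypothetical helper (not a simpleperf command): estimate how many records
+# are dumped per second for a given -f value.
+estimate_records_per_sec() {
+    freq=$1      # value passed to -f
+    run_ms=$2    # assumed milliseconds the thread runs in each second
+    echo $(( freq * run_ms / 1000 ))
+}
+
+# The case from the text: -f 4000, thread running 0.2s in each second.
+estimate_records_per_sec 4000 200    # prints 800
+```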
+
+**Decide how long to monitor**
+The way to decide how long to monitor in the record command is similar to that in
+the stat command. Below are examples.
+
+    # Record process 11904 for 10 seconds.
+    $ simpleperf record -p 11904 --duration 10
+
+    # Record until the child process running `ls` finishes.
+    $ simpleperf record ls
+
+    # Stop monitoring using Ctrl-C.
+    $ simpleperf record -p 11904 --duration 10
+    ^C
+
+**Set the path to store records**
+By default, simpleperf stores records in perf.data in the current directory. We can
+use the -o option to set the path to store records. Below is an example.
+
+    # Write records to data/perf2.data.
+    $ simpleperf record -p 11904 -o data/perf2.data --duration 10
+
+
+#### Simpleperf report
+
+simpleperf report is used to report based on perf.data generated by the simpleperf
+record command. The report command groups records into different sample entries,
+sorts sample entries based on how many events each sample entry contains, and
+prints out each sample entry. By passing options, we can select where to find
+perf.data and executable binaries used by the monitored program, filter out
+uninteresting records, and decide how to group records.
+
+Below is an example. Records are grouped into 4 sample entries, and each entry is
+a row. There are several columns, and each column shows a piece of information
+belonging to a sample entry. The first column is Overhead, which shows the
+percentage of total events that fall inside the current sample entry. As the
+perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
+time used in each function.
+
+    # Report perf.data, using only records sampled in libsudo-game-jni.so,
+    # grouping records by thread name (comm), process id (pid), thread id (tid),
+    # and function name (symbol), and showing the sample count for each row.
+    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
+    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
+    Arch: arm64
+    Event: cpu-cycles (type 0, config 0)
+    Samples: 28235
+    Event count: 546356211
+
+    Overhead  Sample  Command   Pid   Tid   Symbol
+    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
+    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
+    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
+    6.24%     1756    sudogame  7394  7394  @plt
+
+**Set the path to read records**
+By default, simpleperf reads perf.data in the current directory. We can use the -i
+option to select another file to read records from.
+
+    $ simpleperf report -i data/perf2.data
+
+**Set the path to find executable binaries**
+When reporting function symbols, simpleperf needs to read the executable binaries
+used by the monitored processes to get the symbol table and debug information. By
+default, the paths are those of the executable binaries used by the monitored
+processes while recording. However, these binaries may not exist when reporting,
+or may not contain the symbol table and debug information. So we can use --symfs
+to redirect the paths. Below is an example.
+
+    $ simpleperf report
+    # In this case, when simpleperf wants to read executable binary /A/b,
+    # it reads the file at /A/b.
+
+    $ simpleperf report --symfs /debug_dir
+    # In this case, when simpleperf wants to read executable binary /A/b,
+    # it prefers file in /debug_dir/A/b to file in /A/b.
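+
+The lookup order described above can be sketched as a small shell function. This
+is an assumption-laden illustration of the documented behavior, not simpleperf's
+actual implementation: `resolve_binary` is a hypothetical helper, and it only
+checks whether the redirected file exists.
+
+```sh
+# Hypothetical helper (not part of simpleperf): pick the file to read for a
+# binary path recorded in perf.data, preferring the copy under the --symfs dir.
+resolve_binary() {
+    symfs=$1    # directory passed to --symfs, or empty if the option is not used
+    path=$2     # path recorded while profiling, e.g. /A/b
+    if [ -n "$symfs" ] && [ -f "$symfs$path" ]; then
+        echo "$symfs$path"
+    else
+        echo "$path"
+    fi
+}
+
+# Without --symfs, the recorded path is used as is.
+resolve_binary "" /A/b    # prints /A/b
+```
+
+Calling `resolve_binary /debug_dir /A/b` prints `/debug_dir/A/b` when that file
+exists, and falls back to `/A/b` otherwise.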
+
+**Filter records**
+When reporting, not all records may be of interest. Simpleperf supports five
+filters to select records of interest. Below are examples.
+
+    # Report records in threads having name sudogame.
+    $ simpleperf report --comms sudogame
+
+    # Report records in process 7394 or 7395.
+    $ simpleperf report --pids 7394,7395
+
+    # Report records in thread 7394 or 7395.
+    $ simpleperf report --tids 7394,7395
+
+    # Report records in libsudo-game-jni.so.
+    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
+
+    # Report records in function checkValid or canFindSolution_r.
+    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
+
+**Decide how to group records into sample entries**
+Simpleperf uses the --sort option to decide how to group records into sample
+entries. Below are examples.
+
+    # Group records based on their process id: records having the same process
+    # id are in the same sample entry.
+    $ simpleperf report --sort pid
+
+    # Group records based on their thread id and thread name (comm): records
+    # having the same thread id and thread name are in the same sample entry.
+    $ simpleperf report --sort tid,comm
+
+    # Group records based on their binary and function: records in the same
+    # binary and function are in the same sample entry.
+    $ simpleperf report --sort dso,symbol
+
+    # Default option: --sort comm,pid,tid,dso,symbol. Group records that are in
+    # the same thread and belong to the same function in the same binary.
+    $ simpleperf report
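+
+To make the grouping idea concrete, here is a minimal sketch that mimics
+`--sort pid` on a handful of made-up records. The record layout
+(`pid tid symbol event_count`), the values, and the `group_by_pid` helper are
+all hypothetical. They do not reflect the perf.data format or simpleperf's code.
+
+```sh
+# Hypothetical records, one per line: "pid tid symbol event_count".
+records='7394 7394 checkValid 300
+7394 7395 canFindSolution_r 100
+7394 7394 checkValid 200
+7395 7395 randomBlock_r 50'
+
+# Group records by pid, sum the event counts of each group, and print the
+# groups with the largest event count first.
+group_by_pid() {
+    printf '%s\n' "$records" |
+        awk '{ sum[$1] += $4 } END { for (k in sum) print sum[k], k }' |
+        sort -rn
+}
+
+group_by_pid
+# prints:
+# 600 7394
+# 50 7395
+```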
+
 
 ## Answers to common issues