llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...

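The YAML report is plain text, so simple values can be pulled out with standard tools. A minimal sketch, assuming the flow-style measurement lines shown above; `extract_latency` is a hypothetical helper name, and its `sed` pattern is not a general YAML parser:

.. code-block:: bash

  # Hypothetical helper: read an llvm-exegesis YAML report on stdin and print
  # the value of each "latency" measurement. It only matches the flow-style
  # "- { key: latency, value: ... }" lines shown above; it is not a YAML parser.
  extract_latency() {
    sed -n 's/.*key: latency, value: \([0-9.eE+-]*\).*/\1/p'
  }

  # Example run, skipped when llvm-exegesis is not on PATH.
  if command -v llvm-exegesis >/dev/null 2>&1; then
    llvm-exegesis -mode=latency -opcode-name=ADD64rr | extract_latency
  fi

On the report above, this prints `1.0058`.
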
To measure the latency of all instructions for the host architecture (X86
here), run:

.. code-block:: bash

  #!/bin/bash
  readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.


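Several named opcodes can also be measured in a single invocation, because `-opcode-name` accepts a comma-separated list (see OPTIONS). A minimal sketch that builds the list in the shell; `join_opcodes` is a hypothetical helper and the opcode names are arbitrary examples:

.. code-block:: bash

  # Hypothetical helper: join its arguments with commas, the list format
  # accepted by -opcode-name.
  join_opcodes() {
    local IFS=,
    printf '%s\n' "$*"
  }

  # Example run, skipped when llvm-exegesis is not on PATH.
  if command -v llvm-exegesis >/dev/null 2>&1; then
    llvm-exegesis -mode=latency -opcode-name="$(join_opcodes ADD64rr SUB64rr IMUL64rr)"
  fi
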
EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`-snippets-file` option (`-` reads from standard input).

.. code-block:: bash

  $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg_name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg_name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg_name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

  # LLVM-EXEGESIS-LIVEIN RDI
  # LLVM-EXEGESIS-DEFREG XMM1 42
  vmulps (%rdi), %xmm1, %xmm2
  vhaddps %xmm2, %xmm2, %xmm3
  addq $0x10, %rdi

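Instead of piping through standard input, the snippet can be saved to a file and passed to `-snippets-file`. A minimal sketch; the path `/tmp/snippet.s` is an arbitrary example, and each line is single-quoted so that the shell does not expand `$0x10`:

.. code-block:: bash

  # Write the snippet above to a file (the path is an arbitrary example). Each
  # line is single-quoted so that the shell does not expand "$0x10".
  printf '%s\n' \
    '# LLVM-EXEGESIS-LIVEIN RDI' \
    '# LLVM-EXEGESIS-DEFREG XMM1 42' \
    'vmulps (%rdi), %xmm1, %xmm2' \
    'vhaddps %xmm2, %xmm2, %xmm3' \
    'addq $0x10, %rdi' > /tmp/snippet.s

  # Benchmark it, skipped when llvm-exegesis is not on PATH.
  if command -v llvm-exegesis >/dev/null 2>&1; then
    llvm-exegesis -mode=uops -snippets-file=/tmp/snippet.s
  fi
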

EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

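The input file can be produced with the `latency` mode from example 1. A minimal sketch, assuming `llvm-exegesis` is on `PATH`; `bench_latency` is a hypothetical helper, the opcode selection is arbitrary, and the `sed` filter is the one example 1 uses to keep only the YAML part of the output:

.. code-block:: bash

  # Hypothetical helper: benchmark each opcode named after the output path in
  # latency mode and collect the YAML documents in that file. The sed filter
  # keeps only the YAML part of each run, as in example 1.
  bench_latency() {
    local out="$1"
    shift
    : > "${out}"
    local opcode
    for opcode in "$@"; do
      llvm-exegesis -mode=latency -opcode-name="${opcode}" \
        | sed -n '/---/,$p' >> "${out}"
    done
  }

  # Example run, skipped when llvm-exegesis is not on PATH.
  if command -v llvm-exegesis >/dev/null 2>&1; then
    bench_latency /tmp/benchmarks.yaml ADD64rr SUB64rr IMUL64rr
    llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html
  fi
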
This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class,latency
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...

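A quick way to skim the CSV is to list the clusters whose members do not all share a scheduling class, since instructions that measured alike but are modeled separately may be worth a closer look. A minimal sketch; `mixed_sched_clusters` is a hypothetical helper, and it assumes that no CSV field contains a comma:

.. code-block:: bash

  # Hypothetical helper: print the id of every cluster whose members do not
  # all share the same scheduling class (column 4). Assumes that no field
  # contains a comma.
  mixed_sched_clusters() {
    awk -F, '
      $1 !~ /^[0-9]+$/ { next }   # skip the header and other non-data lines
      {
        key = $1 SUBSEP $4
        if (!(key in seen)) { seen[key] = 1; classes[$1]++ }
      }
      END { for (c in classes) if (classes[c] > 1) print c }
    ' "$1" | sort -n
  }

  if [ -f /tmp/clusters.csv ]; then
    mixed_sched_clusters /tmp/clusters.csv
  fi

On the excerpt above, this reports cluster 2, whose members use `WriteALU`,
`WriteSETCC` and several other classes.
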
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class ID
is shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

  Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

  Specify the opcode to measure, by index. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

  Specify the opcode to measure, by name. Several opcodes can be specified as
  a comma-separated list. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

  Specify the custom code snippet to measure. See example 2 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|analysis]

  Specify the run mode: measure latency, measure uop decomposition, or analyze
  previously recorded benchmark results.

.. option:: -num-repetitions=<Number of repetitions>

  Specify the number of repetitions of the asm snippet.
  Higher values lead to more accurate measurements but lengthen the benchmark.

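  To pick a repetition count, one can measure the same opcode with several
  values and compare the results. A minimal sketch;
  `latency_vs_repetitions` is a hypothetical helper, the counts are
  arbitrary, and the latency is parsed from the flow-style measurement line
  shown in example 1:

  .. code-block:: bash

    # Hypothetical helper: for each repetition count, print the count followed
    # by the measured latency of the given opcode.
    latency_vs_repetitions() {
      local opcode="$1"
      shift
      local reps value
      for reps in "$@"; do
        value=$(llvm-exegesis -mode=latency -opcode-name="${opcode}" \
                  -num-repetitions="${reps}" \
                | sed -n 's/.*key: latency, value: \([0-9.eE+-]*\).*/\1/p')
        printf '%s %s\n' "${reps}" "${value}"
      done
    }

    # Example run, skipped when llvm-exegesis is not on PATH.
    if command -v llvm-exegesis >/dev/null 2>&1; then
      latency_vs_repetitions ADD64rr 100 1000 10000
    fi
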
.. option:: -benchmarks-file=</path/to/file>

  File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
  results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

  If provided, write the analysis clusters as CSV to this file. `-` prints to
  stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

  If non-empty, write inconsistencies found during analysis to this file. `-`
  prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

  Specify the numPoints parameter to be used for DBSCAN clustering
  (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

  Specify the epsilon parameter to be used for DBSCAN clustering
  (`analysis` mode).

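  Because the clustering depends on this parameter, it can help to re-run
  the analysis of one benchmarks file with a few epsilon values and compare
  the resulting clusters. A minimal sketch; `sweep_epsilon`, the values, and
  the `/tmp/clusters-<epsilon>.csv` naming are hypothetical:

  .. code-block:: bash

    # Hypothetical helper: analyze one benchmarks file once per epsilon value
    # and write each clustering to /tmp/clusters-<epsilon>.csv for comparison.
    sweep_epsilon() {
      local benchmarks="$1"
      shift
      local eps
      for eps in "$@"; do
        llvm-exegesis -mode=analysis \
          -benchmarks-file="${benchmarks}" \
          -analysis-epsilon="${eps}" \
          -analysis-clusters-output-file="/tmp/clusters-${eps}.csv"
      done
    }

    # Example run, skipped when llvm-exegesis or the input file is missing.
    if command -v llvm-exegesis >/dev/null 2>&1 && [ -f /tmp/benchmarks.yaml ]; then
      sweep_epsilon /tmp/benchmarks.yaml 0.05 0.1 0.2
    fi
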
.. option:: -ignore-invalid-sched-class=false

  If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

  If set, measure the CPU characteristics using the counters for this CPU. This
  is useful when creating new sched models (the host CPU is unknown to LLVM).

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
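
In scripts, the exit status can be used to detect failed runs. A minimal sketch of a wrapper that reports a failure and passes the status on; `run_exegesis` is a hypothetical name:

.. code-block:: bash

  # Hypothetical wrapper: run llvm-exegesis with the given arguments, report a
  # non-zero exit status on stderr, and return that status to the caller.
  run_exegesis() {
    llvm-exegesis "$@"
    local status=$?
    if [ "${status}" -ne 0 ]; then
      echo "llvm-exegesis $* failed with exit status ${status}" >&2
    fi
    return "${status}"
  }

For example, `run_exegesis -mode=latency -opcode-name=ADD64rr || exit 1` stops
a script at the first failed benchmark.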