llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [options]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial as possible (to measure
latency) or as parallel as possible (to measure uop decomposition).
The code snippet is jitted and executed on the host subtarget. The time taken
(for latency) or the resource usage (for uops) is measured using hardware
performance counters. The result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name:     ADD64rr
      mode:            latency
      config:          ''
    cpu_name:        haswell
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error:           ''
    info:            'explicit self cycles, selecting one aliasing configuration.
      Snippet:
      ADD64rr R8, R8, R10
      '
    ...
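
When the YAML has been saved to a file, the measured value can be pulled out
with standard text tools. The following is a minimal sketch, not part of
:program:`llvm-exegesis`: the helper name ``extract_latency`` is hypothetical,
and it relies on a plain text match against the ``measurements`` line format
shown above rather than on a YAML parser.

```shell
#!/bin/bash
# Print the value of every `latency` measurement found in a benchmark YAML
# file. This is a text match on the line format shown above, not a YAML
# parser, so it only works while that format is unchanged.
extract_latency() {
  sed -n 's/.*key: latency, value: \([0-9.][0-9.]*\),.*/\1/p' "$1"
}

# Usage (the path is an assumption): extract_latency /tmp/benchmarks.yaml
```

One value is printed per matching measurement, so a file holding several
benchmarks yields one line per benchmark.
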

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
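
For a small, hand-picked set of opcodes, the comma-separated form of
`-opcode-name` (see OPTIONS) avoids the loop. The sketch below only builds and
prints the command line; the helper name ``join_opcodes`` is hypothetical, and
the opcode names are taken from the sample outputs in this document.

```shell
#!/bin/bash
# Join the given opcode names with commas, the form -opcode-name accepts.
join_opcodes() {
  local IFS=,
  echo "$*"
}

# Substitute the opcodes you care about; these are only examples.
OPCODES=$(join_opcodes ADD64rr ADD32rr SETBr)

# Print the command instead of running it, so the sketch works without
# the tool installed.
echo "llvm-exegesis -mode=latency -opcode-name=${OPCODES}"
```
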


EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi
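
One way to benchmark this snippet is to save it to a file and pass that file
through `-snippets-file`. The following is a sketch under assumptions: the
helper name ``write_snippet`` and the temporary file location are hypothetical,
and the benchmark is only attempted when :program:`llvm-exegesis` is found on
``PATH``.

```shell
#!/bin/bash
# Write the example snippet, including its directives, to the given file.
write_snippet() {
  cat > "$1" <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps (%rdi), %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF
}

SNIPPET_FILE=$(mktemp)   # hypothetical location; any writable path works
write_snippet "${SNIPPET_FILE}"

# Only attempt the benchmark if the tool is available.
if command -v llvm-exegesis >/dev/null 2>&1; then
  llvm-exegesis -mode=latency -snippets-file="${SNIPPET_FILE}" \
    || echo "benchmark failed" >&2
fi
```

The quoted heredoc delimiter (``'EOF'``) keeps the shell from expanding
``$0x10`` and ``%rdi``, so the assembly is written out verbatim.
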


EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...
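
The CSV is easy to post-process. As an illustration only, the sketch below
counts how many opcodes fall into each scheduling class; the helper name
``count_by_sched_class`` is hypothetical, and it assumes the column layout
shown above, with the scheduling class in the fourth field.

```shell
#!/bin/bash
# Count opcodes per scheduling class in a clusters CSV file.
# The header line and any line with fewer than four comma-separated
# fields are skipped. Output is "<sched_class>,<count>", sorted.
count_by_sched_class() {
  awk -F, 'NR > 1 && NF >= 4 { count[$4]++ }
           END { for (c in count) print c "," count[c] }' "$1" | sort
}

# Usage: count_by_sched_class /tmp/clusters.csv
```

If the scheduling class names were not resolved, the same grouping applies to
whatever identifier appears in the fourth column.
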

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

 Specify the opcode to measure, by name. Several opcodes can be specified as
 a comma-separated list. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. `-` prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

 If set, measure the CPU characteristics using the counters for this CPU. This
 is useful when creating new sched models (the host CPU is unknown to LLVM).

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
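
In scripts, this convention makes failures straightforward to detect. The
wrapper below is a minimal sketch; the name ``run_checked`` is hypothetical,
and it works for any command that follows the same exit status convention.

```shell
#!/bin/bash
# Run a command; on failure, report its exit status on stderr and
# return that same status to the caller.
run_checked() {
  local status=0
  "$@" || status=$?
  if [ "${status}" -ne 0 ]; then
    echo "command failed with exit status ${status}: $*" >&2
  fi
  return "${status}"
}

# Usage: run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr
```
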