Blame - docs/TransformMetadata.rst - platform_external_llvm

blob: 68649424b717c0bdd5f72ee71d2b7329d5ea8213 [file] [log] [blame]

Michael Kruse	9a395de	2018-12-12 17:32:52 +0000	[diff] [blame]	1	.. _transformation-metadata:
				2
				3	============================
				4	Code Transformation Metadata
				5	============================
				6
				7	.. contents::
				8	:local:
				9
				10	Overview
				11	========
				12
				13	LLVM transformation passes can be controlled by attaching metadata to
				14	the code to transform. By default, transformation passes use heuristics
				15	to determine whether or not to perform transformations, and when doing
				16	so, other details of how the transformations are applied (e.g., which
				17	vectorization factor to select).
				18	Unless the optimizer is otherwise directed, transformations are applied
				19	conservatively. This conservatism generally allows the optimizer to
				20	avoid unprofitable transformations, but in practice, this results in the
				21	optimizer not applying transformations that would be highly profitable.
				22
				23	Frontends can give additional hints to LLVM passes on which
				24	transformations they should apply. This can be additional knowledge that
				25	cannot be derived from the emitted IR, or directives passed from the
				26	user/programmer. OpenMP pragmas are an example of the latter.
				27
				28	If any such metadata is dropped from the program, the code's semantics
				29	must not change.
				30
				31	Metadata on Loops
				32	=================
				33
				34	Attributes can be attached to loops as described in :ref:`llvm.loop`.
				35	Attributes can describe properties of the loop, disable transformations,
				36	force specific transformations and set transformation options.
				37
				38	Because metadata nodes are immutable (with the exception of
				39	``MDNode::replaceOperandWith`` which is dangerous to use on uniqued
				40	metadata), in order to add or remove a loop attributes, a new ``MDNode``
				41	must be created and assigned as the new ``llvm.loop`` metadata. Any
				42	connection between the old ``MDNode`` and the loop is lost. The
				43	``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
				44	the loop effectively gets a new identifier. For instance,
				45	``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
				46	the parallel access property is to be preserved after adding/removing
				47	loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
				48	updated to the new LoopID.
				49
				50	Transformation Metadata Structure
				51	=================================
				52
				53	Some attributes describe code transformations (unrolling, vectorizing,
				54	loop distribution, etc.). They can either be a hint to the optimizer
				55	that a transformation might be beneficial, instruction to use a specific
				56	option, , or convey a specific request from the user (such as
				57	``#pragma clang loop`` or ``#pragma omp simd``).
				58
				59	If a transformation is forced but cannot be carried-out for any reason,
				60	an optimization-missed warning must be emitted. Semantic information
				61	such as a transformation being safe (e.g.
				62	``llvm.mem.parallel_loop_access``) can be unused by the optimizer
				63	without generating a warning.
				64
				65	Unless explicitly disabled, any optimization pass may heuristically
				66	determine whether a transformation is beneficial and apply it. If
				67	metadata for another transformation was specified, applying a different
				68	transformation before it might be inadvertent due to being applied on a
				69	different loop or the loop not existing anymore. To avoid having to
				70	explicitly disable an unknown number of passes, the attribute
				71	``llvm.loop.disable_nonforced`` disables all optional, high-level,
				72	restructuring transformations.
				73
				74	The following example avoids the loop being altered before being
				75	vectorized, for instance being unrolled.
				76
				77	.. code-block:: llvm
				78
				79	br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
				80	...
				81	!0 = distinct !{!0, !1, !2}
				82	!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				83	!2 = !{!"llvm.loop.disable_nonforced"}
				84
				85	After a transformation is applied, follow-up attributes are set on the
				86	transformed and/or new loop(s). This allows additional attributes
				87	including followup-transformations to be specified. Specifying multiple
				88	transformations in the same metadata node is possible for compatibility
				89	reasons, but their execution order is undefined. For instance, when
				90	``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
				91	specified at the same time, unrolling may occur either before or after
				92	vectorization.
				93
				94	As an example, the following instructs a loop to be vectorized and only
				95	then unrolled.
				96
				97	.. code-block:: llvm
				98
				99	!0 = distinct !{!0, !1, !2, !3}
				100	!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				101	!2 = !{!"llvm.loop.disable_nonforced"}
				102	!3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}}
				103
				104	If, and only if, no followup is specified, the pass may add attributes itself.
				105	For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
				106	all attributes from the original loop excluding its loop vectorizer
				107	attributes. To avoid this, an empty followup attribute can be used, e.g.
				108
				109	.. code-block:: llvm
				110
				111	!3 = !{!"llvm.loop.vectorize.followup_vectorized"}
				112
				113	The followup attributes of a transformation that cannot be applied will
				114	never be added to a loop and are therefore effectively ignored. This means
				115	that any followup-transformation in such attributes requires that its
				116	prior transformations are applied before the followup-transformation.
				117	The user should receive a warning about the first transformation in the
				118	transformation chain that could not be applied if it a forced
				119	transformation. All following transformations are skipped.
				120
				121	Pass-Specific Transformation Metadata
				122	=====================================
				123
				124	Transformation options are specific to each transformation. In the
				125	following, we present the model for each LLVM loop optimization pass and
				126	the metadata to influence them.
				127
				128	Loop Vectorization and Interleaving
				129	-----------------------------------
				130
				131	Loop vectorization and interleaving is interpreted as a single
				132	transformation. It is interpreted as forced if
				133	``!{"llvm.loop.vectorize.enable", i1 true}`` is set.
				134
				135	Assuming the pre-vectorization loop is
				136
				137	.. code-block:: c
				138
				139	for (int i = 0; i < n; i+=1) // original loop
				140	Stmt(i);
				141
				142	then the code after vectorization will be approximately (assuming an
				143	SIMD width of 4):
				144
				145	.. code-block:: c
				146
				147	int i = 0;
				148	if (rtc) {
				149	for (; i + 3 < n; i+=4) // vectorized/interleaved loop
				150	Stmt(i:i+3);
				151	}
				152	for (; i < n; i+=1) // epilogue loop
				153	Stmt(i);
				154
				155	where ``rtc`` is a generated runtime check.
				156
				157	``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
				158	the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
				159	combined with the original loop's attributes to avoid it being
				160	vectorized multiple times.
				161
				162	``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
				163	the remainder loop. If not specified, it will have the original loop's
				164	attributes combined with ``llvm.loop.isvectorized`` and
				165	``llvm.loop.unroll.runtime.disable`` (unless the original loop already
				166	has unroll metadata).
				167
				168	The attributes specified by ``llvm.loop.vectorize.followup_all`` are
				169	added to both loops.
				170
				171	When using a follow-up attribute, it replaces any automatically deduced
				172	attributes for the generated loop in question. Therefore it is
				173	recommended to add ``llvm.loop.isvectorized`` to
				174	``llvm.loop.vectorize.followup_all`` which avoids that the loop
				175	vectorizer tries to optimize the loops again.
				176
				177	Loop Unrolling
				178	--------------
				179
				180	Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}``
				181	metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
				182	is present. Unrolling can be full unrolling, partial unrolling of a loop
				183	with constant trip count or runtime unrolling of a loop with a trip
				184	count unknown at compile-time.
				185
				186	If the loop has been unrolled fully, there is no followup-loop. For
				187	partial/runtime unrolling, the original loop of
				188
				189	.. code-block:: c
				190
				191	for (int i = 0; i < n; i+=1) // original loop
				192	Stmt(i);
				193
				194	is transformed into (using an unroll factor of 4):
				195
				196	.. code-block:: c
				197
				198	int i = 0;
				199	for (; i + 3 < n; i+=4) // unrolled loop
				200	Stmt(i);
				201	Stmt(i+1);
				202	Stmt(i+2);
				203	Stmt(i+3);
				204	}
				205	for (; i < n; i+=1) // remainder loop
				206	Stmt(i);
				207
				208	``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
				209	the unrolled loop. If not specified, the attributes of the original loop
				210	without the ``llvm.loop.unroll.*`` attributes are copied and
				211	``llvm.loop.unroll.disable`` added to it.
				212
				213	``llvm.loop.unroll.followup_remainder`` defines the attributes of the
				214	remainder loop. If not specified the remainder loop will have no
				215	attributes. The remainder loop might not be present due to being fully
				216	unrolled in which case this attribute has no effect.
				217
				218	Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
				219	unrolled and remainder loops.
				220
				221	To avoid that the partially unrolled loop is unrolled again, it is
				222	recommended to add ``llvm.loop.unroll.disable`` to
				223	``llvm.loop.unroll.followup_all``. If no follow-up attribute specified
				224	for a generated loop, it is added automatically.
				225
				226	Unroll-And-Jam
				227	--------------
				228
				229	Unroll-and-jam uses the following transformation model (here with an
				230	unroll factor if 2). Currently, it does not support a fallback version
				231	when the transformation is unsafe.
				232
				233	.. code-block:: c
				234
				235	for (int i = 0; i < n; i+=1) { // original outer loop
				236	Fore(i);
				237	for (int j = 0; j < m; j+=1) // original inner loop
				238	SubLoop(i, j);
				239	Aft(i);
				240	}
				241
				242	.. code-block:: c
				243
				244	int i = 0;
				245	for (; i + 1 < n; i+=2) { // unrolled outer loop
				246	Fore(i);
				247	Fore(i+1);
				248	for (int j = 0; j < m; j+=1) { // unrolled inner loop
				249	SubLoop(i, j);
				250	SubLoop(i+1, j);
				251	}
				252	Aft(i);
				253	Aft(i+1);
				254	}
				255	for (; i < n; i+=1) { // remainder outer loop
				256	Fore(i);
				257	for (int j = 0; j < m; j+=1) // remainder inner loop
				258	SubLoop(i, j);
				259	Aft(i);
				260	}
				261
				262	``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
				263	of the unrolled outer loop. If not specified, the attributes of the
				264	original outer loop without the ``llvm.loop.unroll.*`` attributes are
				265	copied and ``llvm.loop.unroll.disable`` added to it.
				266
				267	``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
				268	of the unrolled inner loop. If not specified, the attributes of the
				269	original inner loop are used unchanged.
				270
				271	``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
				272	attributes of the outer remainder loop. If not specified it will not
				273	have any attributes. The remainder loop might not be present due to
				274	being fully unrolled.
				275
				276	``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
				277	attributes of the inner remainder loop. If not specified it will have
				278	the attributes of the original inner loop. It the outer remainder loop
				279	is unrolled, the inner remainder loop might be present multiple times.
				280
				281	Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
				282	added to all of the aforementioned output loops.
				283
				284	To avoid that the unrolled loop is unrolled again, it is
				285	recommended to add ``llvm.loop.unroll.disable`` to
				286	``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
				287	as well as an additional inner loop unrolling. If no follow-up
				288	attribute specified for a generated loop, it is added automatically.
				289
				290	Loop Distribution
				291	-----------------
				292
				293	The LoopDistribution pass tries to separate vectorizable parts of a loop
				294	from the non-vectorizable part (which otherwise would make the entire
				295	loop non-vectorizable). Conceptually, it transforms a loop such as
				296
				297	.. code-block:: c
				298
				299	for (int i = 1; i < n; i+=1) { // original loop
				300	A[i] = i;
				301	B[i] = 2 + B[i];
				302	C[i] = 3 + C[i - 1];
				303	}
				304
				305	into the following code:
				306
				307	.. code-block:: c
				308
				309	if (rtc) {
				310	for (int i = 1; i < n; i+=1) // coincident loop
				311	A[i] = i;
				312	for (int i = 1; i < n; i+=1) // coincident loop
				313	B[i] = 2 + B[i];
				314	for (int i = 1; i < n; i+=1) // sequential loop
				315	C[i] = 3 + C[i - 1];
				316	} else {
				317	for (int i = 1; i < n; i+=1) { // fallback loop
				318	A[i] = i;
				319	B[i] = 2 + B[i];
				320	C[i] = 3 + C[i - 1];
				321	}
				322	}
				323
				324	where ``rtc`` is a generated runtime check.
				325
				326	``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
				327	all loops without loop-carried dependencies (i.e. vectorizable loops).
				328	There might be more than one such loops. If not defined, the loops will
				329	inherit the original loop's attributes.
				330
				331	``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
				332	loop with potentially unsafe dependencies. There should be at most one
				333	such loop. If not defined, the loop will inherit the original loop's
				334	attributes.
				335
				336	``llvm.loop.distribute.followup_fallback`` defines the loop attributes
				337	for the fallback loop, which is a copy of the original loop for when
				338	loop versioning is required. If undefined, the fallback loop inherits
				339	all attributes from the original loop.
				340
				341	Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
				342	all of the aforementioned output loops.
				343
				344	It is recommended to add ``llvm.loop.disable_nonforced`` to
				345	``llvm.loop.distribute.followup_fallback``. This avoids that the
				346	fallback version (which is likely never executed) is further optimzed
				347	which would increase the code size.
				348
				349	Versioning LICM
				350	---------------
				351
				352	The pass hoists code out of loops that are only loop-invariant when
				353	dynamic conditions apply. For instance, it transforms the loop
				354
				355	.. code-block:: c
				356
				357	for (int i = 0; i < n; i+=1) // original loop
				358	A[i] = B[0];
				359
				360	into:
				361
				362	.. code-block:: c
				363
				364	if (rtc) {
				365	auto b = B[0];
				366	for (int i = 0; i < n; i+=1) // versioned loop
				367	A[i] = b;
				368	} else {
				369	for (int i = 0; i < n; i+=1) // unversioned loop
				370	A[i] = B[0];
				371	}
				372
				373	The runtime condition (``rtc``) checks that the array ``A`` and the
				374	element `B[0]` do not alias.
				375
				376	Currently, this transformation does not support followup-attributes.
				377
				378	Loop Interchange
				379	----------------
				380
				381	Currently, the ``LoopInterchange`` pass does not use any metadata.
				382
				383	Ambiguous Transformation Order
				384	==============================
				385
				386	If there multiple transformations defined, the order in which they are
				387	executed depends on the order in LLVM's pass pipeline, which is subject
				388	to change. The default optimization pipeline (anything higher than
				389	``-O0``) has the following order.
				390
				391	When using the legacy pass manager:
				392
				393	- LoopInterchange (if enabled)
				394	- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				395	- VersioningLICM (if enabled)
				396	- LoopDistribute
				397	- LoopVectorizer
				398	- LoopUnrollAndJam (if enabled)
				399	- LoopUnroll (partial and runtime unrolling)
				400
				401	When using the legacy pass manager with LTO:
				402
				403	- LoopInterchange (if enabled)
				404	- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				405	- LoopVectorizer
				406	- LoopUnroll (partial and runtime unrolling)
				407
				408	When using the new pass manager:
				409
				410	- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
				411	- LoopDistribute
				412	- LoopVectorizer
				413	- LoopUnrollAndJam (if enabled)
				414	- LoopUnroll (partial and runtime unrolling)
				415
				416	Leftover Transformations
				417	========================
				418
				419	Forced transformations that have not been applied after the last
				420	transformation pass should be reported to the user. The transformation
				421	passes themselves cannot be responsible for this reporting because they
				422	might not be in the pipeline, there might be multiple passes able to
				423	apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
				424	transformation attribute may be 'hidden' inside another passes' followup
				425	attribute.
				426
				427	The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
				428	emits such warnings. It should be placed after the last transformation
				429	pass.
				430
				431	The current pass pipeline has a fixed order in which transformations
				432	passes are executed. A transformation can be in the followup of a pass
				433	that is executed later and thus leftover. For instance, a loop nest
				434	cannot be distributed and then interchanged with the current pass
				435	pipeline. The loop distribution will execute, but there is no loop
				436	interchange pass following such that any loop interchange metadata will
				437	be ignored. The ``-transform-warning`` should emit a warning in this
				438	case.
				439
				440	Future versions of LLVM may fix this by executing transformations using
				441	a dynamic ordering.