Michael Kruse | 9a395de | 2018-12-12 17:32:52 +0000 | [diff] [blame] | 1 | .. _transformation-metadata: |
| 2 | |
| 3 | ============================ |
| 4 | Code Transformation Metadata |
| 5 | ============================ |
| 6 | |
| 7 | .. contents:: |
| 8 | :local: |
| 9 | |
| 10 | Overview |
| 11 | ======== |
| 12 | |
| 13 | LLVM transformation passes can be controlled by attaching metadata to |
| 14 | the code to transform. By default, transformation passes use heuristics |
| 15 | to determine whether or not to perform transformations, and when doing |
| 16 | so, other details of how the transformations are applied (e.g., which |
| 17 | vectorization factor to select). |
| 18 | Unless the optimizer is otherwise directed, transformations are applied |
| 19 | conservatively. This conservatism generally allows the optimizer to |
| 20 | avoid unprofitable transformations, but in practice, this results in the |
| 21 | optimizer not applying transformations that would be highly profitable. |
| 22 | |
| 23 | Frontends can give additional hints to LLVM passes on which |
| 24 | transformations they should apply. This can be additional knowledge that |
| 25 | cannot be derived from the emitted IR, or directives passed from the |
| 26 | user/programmer. OpenMP pragmas are an example of the latter. |
| 27 | |
| 28 | If any such metadata is dropped from the program, the code's semantics |
| 29 | must not change. |
| 30 | |
| 31 | Metadata on Loops |
| 32 | ================= |
| 33 | |
| 34 | Attributes can be attached to loops as described in :ref:`llvm.loop`. |
| 35 | Attributes can describe properties of the loop, disable transformations, |
| 36 | force specific transformations and set transformation options. |
| 37 | |
| 38 | Because metadata nodes are immutable (with the exception of |
| 39 | ``MDNode::replaceOperandWith`` which is dangerous to use on uniqued |
| 40 | metadata), in order to add or remove a loop attributes, a new ``MDNode`` |
| 41 | must be created and assigned as the new ``llvm.loop`` metadata. Any |
| 42 | connection between the old ``MDNode`` and the loop is lost. The |
| 43 | ``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e. |
| 44 | the loop effectively gets a new identifier. For instance, |
| 45 | ``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if |
| 46 | the parallel access property is to be preserved after adding/removing |
| 47 | loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be |
| 48 | updated to the new LoopID. |
| 49 | |
| 50 | Transformation Metadata Structure |
| 51 | ================================= |
| 52 | |
| 53 | Some attributes describe code transformations (unrolling, vectorizing, |
| 54 | loop distribution, etc.). They can either be a hint to the optimizer |
| 55 | that a transformation might be beneficial, instruction to use a specific |
| 56 | option, , or convey a specific request from the user (such as |
| 57 | ``#pragma clang loop`` or ``#pragma omp simd``). |
| 58 | |
| 59 | If a transformation is forced but cannot be carried-out for any reason, |
| 60 | an optimization-missed warning must be emitted. Semantic information |
| 61 | such as a transformation being safe (e.g. |
| 62 | ``llvm.mem.parallel_loop_access``) can be unused by the optimizer |
| 63 | without generating a warning. |
| 64 | |
| 65 | Unless explicitly disabled, any optimization pass may heuristically |
| 66 | determine whether a transformation is beneficial and apply it. If |
| 67 | metadata for another transformation was specified, applying a different |
| 68 | transformation before it might be inadvertent due to being applied on a |
| 69 | different loop or the loop not existing anymore. To avoid having to |
| 70 | explicitly disable an unknown number of passes, the attribute |
| 71 | ``llvm.loop.disable_nonforced`` disables all optional, high-level, |
| 72 | restructuring transformations. |
| 73 | |
| 74 | The following example avoids the loop being altered before being |
| 75 | vectorized, for instance being unrolled. |
| 76 | |
| 77 | .. code-block:: llvm |
| 78 | |
| 79 | br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0 |
| 80 | ... |
| 81 | !0 = distinct !{!0, !1, !2} |
| 82 | !1 = !{!"llvm.loop.vectorize.enable", i1 true} |
| 83 | !2 = !{!"llvm.loop.disable_nonforced"} |
| 84 | |
| 85 | After a transformation is applied, follow-up attributes are set on the |
| 86 | transformed and/or new loop(s). This allows additional attributes |
| 87 | including followup-transformations to be specified. Specifying multiple |
| 88 | transformations in the same metadata node is possible for compatibility |
| 89 | reasons, but their execution order is undefined. For instance, when |
| 90 | ``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are |
| 91 | specified at the same time, unrolling may occur either before or after |
| 92 | vectorization. |
| 93 | |
| 94 | As an example, the following instructs a loop to be vectorized and only |
| 95 | then unrolled. |
| 96 | |
| 97 | .. code-block:: llvm |
| 98 | |
| 99 | !0 = distinct !{!0, !1, !2, !3} |
| 100 | !1 = !{!"llvm.loop.vectorize.enable", i1 true} |
| 101 | !2 = !{!"llvm.loop.disable_nonforced"} |
| 102 | !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}} |
| 103 | |
| 104 | If, and only if, no followup is specified, the pass may add attributes itself. |
| 105 | For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and |
| 106 | all attributes from the original loop excluding its loop vectorizer |
| 107 | attributes. To avoid this, an empty followup attribute can be used, e.g. |
| 108 | |
| 109 | .. code-block:: llvm |
| 110 | |
| 111 | !3 = !{!"llvm.loop.vectorize.followup_vectorized"} |
| 112 | |
| 113 | The followup attributes of a transformation that cannot be applied will |
| 114 | never be added to a loop and are therefore effectively ignored. This means |
| 115 | that any followup-transformation in such attributes requires that its |
| 116 | prior transformations are applied before the followup-transformation. |
| 117 | The user should receive a warning about the first transformation in the |
| 118 | transformation chain that could not be applied if it a forced |
| 119 | transformation. All following transformations are skipped. |
| 120 | |
| 121 | Pass-Specific Transformation Metadata |
| 122 | ===================================== |
| 123 | |
| 124 | Transformation options are specific to each transformation. In the |
| 125 | following, we present the model for each LLVM loop optimization pass and |
| 126 | the metadata to influence them. |
| 127 | |
| 128 | Loop Vectorization and Interleaving |
| 129 | ----------------------------------- |
| 130 | |
| 131 | Loop vectorization and interleaving is interpreted as a single |
| 132 | transformation. It is interpreted as forced if |
| 133 | ``!{"llvm.loop.vectorize.enable", i1 true}`` is set. |
| 134 | |
| 135 | Assuming the pre-vectorization loop is |
| 136 | |
| 137 | .. code-block:: c |
| 138 | |
| 139 | for (int i = 0; i < n; i+=1) // original loop |
| 140 | Stmt(i); |
| 141 | |
| 142 | then the code after vectorization will be approximately (assuming an |
| 143 | SIMD width of 4): |
| 144 | |
| 145 | .. code-block:: c |
| 146 | |
| 147 | int i = 0; |
| 148 | if (rtc) { |
| 149 | for (; i + 3 < n; i+=4) // vectorized/interleaved loop |
| 150 | Stmt(i:i+3); |
| 151 | } |
| 152 | for (; i < n; i+=1) // epilogue loop |
| 153 | Stmt(i); |
| 154 | |
| 155 | where ``rtc`` is a generated runtime check. |
| 156 | |
| 157 | ``llvm.loop.vectorize.followup_vectorized`` will set the attributes for |
| 158 | the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is |
| 159 | combined with the original loop's attributes to avoid it being |
| 160 | vectorized multiple times. |
| 161 | |
| 162 | ``llvm.loop.vectorize.followup_epilogue`` will set the attributes for |
| 163 | the remainder loop. If not specified, it will have the original loop's |
| 164 | attributes combined with ``llvm.loop.isvectorized`` and |
| 165 | ``llvm.loop.unroll.runtime.disable`` (unless the original loop already |
| 166 | has unroll metadata). |
| 167 | |
| 168 | The attributes specified by ``llvm.loop.vectorize.followup_all`` are |
| 169 | added to both loops. |
| 170 | |
| 171 | When using a follow-up attribute, it replaces any automatically deduced |
| 172 | attributes for the generated loop in question. Therefore it is |
| 173 | recommended to add ``llvm.loop.isvectorized`` to |
| 174 | ``llvm.loop.vectorize.followup_all`` which avoids that the loop |
| 175 | vectorizer tries to optimize the loops again. |
| 176 | |
| 177 | Loop Unrolling |
| 178 | -------------- |
| 179 | |
| 180 | Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}`` |
| 181 | metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``) |
| 182 | is present. Unrolling can be full unrolling, partial unrolling of a loop |
| 183 | with constant trip count or runtime unrolling of a loop with a trip |
| 184 | count unknown at compile-time. |
| 185 | |
| 186 | If the loop has been unrolled fully, there is no followup-loop. For |
| 187 | partial/runtime unrolling, the original loop of |
| 188 | |
| 189 | .. code-block:: c |
| 190 | |
| 191 | for (int i = 0; i < n; i+=1) // original loop |
| 192 | Stmt(i); |
| 193 | |
| 194 | is transformed into (using an unroll factor of 4): |
| 195 | |
| 196 | .. code-block:: c |
| 197 | |
| 198 | int i = 0; |
| 199 | for (; i + 3 < n; i+=4) // unrolled loop |
| 200 | Stmt(i); |
| 201 | Stmt(i+1); |
| 202 | Stmt(i+2); |
| 203 | Stmt(i+3); |
| 204 | } |
| 205 | for (; i < n; i+=1) // remainder loop |
| 206 | Stmt(i); |
| 207 | |
| 208 | ``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of |
| 209 | the unrolled loop. If not specified, the attributes of the original loop |
| 210 | without the ``llvm.loop.unroll.*`` attributes are copied and |
| 211 | ``llvm.loop.unroll.disable`` added to it. |
| 212 | |
| 213 | ``llvm.loop.unroll.followup_remainder`` defines the attributes of the |
| 214 | remainder loop. If not specified the remainder loop will have no |
| 215 | attributes. The remainder loop might not be present due to being fully |
| 216 | unrolled in which case this attribute has no effect. |
| 217 | |
| 218 | Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the |
| 219 | unrolled and remainder loops. |
| 220 | |
| 221 | To avoid that the partially unrolled loop is unrolled again, it is |
| 222 | recommended to add ``llvm.loop.unroll.disable`` to |
| 223 | ``llvm.loop.unroll.followup_all``. If no follow-up attribute specified |
| 224 | for a generated loop, it is added automatically. |
| 225 | |
| 226 | Unroll-And-Jam |
| 227 | -------------- |
| 228 | |
| 229 | Unroll-and-jam uses the following transformation model (here with an |
| 230 | unroll factor if 2). Currently, it does not support a fallback version |
| 231 | when the transformation is unsafe. |
| 232 | |
| 233 | .. code-block:: c |
| 234 | |
| 235 | for (int i = 0; i < n; i+=1) { // original outer loop |
| 236 | Fore(i); |
| 237 | for (int j = 0; j < m; j+=1) // original inner loop |
| 238 | SubLoop(i, j); |
| 239 | Aft(i); |
| 240 | } |
| 241 | |
| 242 | .. code-block:: c |
| 243 | |
| 244 | int i = 0; |
| 245 | for (; i + 1 < n; i+=2) { // unrolled outer loop |
| 246 | Fore(i); |
| 247 | Fore(i+1); |
| 248 | for (int j = 0; j < m; j+=1) { // unrolled inner loop |
| 249 | SubLoop(i, j); |
| 250 | SubLoop(i+1, j); |
| 251 | } |
| 252 | Aft(i); |
| 253 | Aft(i+1); |
| 254 | } |
| 255 | for (; i < n; i+=1) { // remainder outer loop |
| 256 | Fore(i); |
| 257 | for (int j = 0; j < m; j+=1) // remainder inner loop |
| 258 | SubLoop(i, j); |
| 259 | Aft(i); |
| 260 | } |
| 261 | |
| 262 | ``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes |
| 263 | of the unrolled outer loop. If not specified, the attributes of the |
| 264 | original outer loop without the ``llvm.loop.unroll.*`` attributes are |
| 265 | copied and ``llvm.loop.unroll.disable`` added to it. |
| 266 | |
| 267 | ``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes |
| 268 | of the unrolled inner loop. If not specified, the attributes of the |
| 269 | original inner loop are used unchanged. |
| 270 | |
| 271 | ``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop |
| 272 | attributes of the outer remainder loop. If not specified it will not |
| 273 | have any attributes. The remainder loop might not be present due to |
| 274 | being fully unrolled. |
| 275 | |
| 276 | ``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop |
| 277 | attributes of the inner remainder loop. If not specified it will have |
| 278 | the attributes of the original inner loop. It the outer remainder loop |
| 279 | is unrolled, the inner remainder loop might be present multiple times. |
| 280 | |
| 281 | Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are |
| 282 | added to all of the aforementioned output loops. |
| 283 | |
| 284 | To avoid that the unrolled loop is unrolled again, it is |
| 285 | recommended to add ``llvm.loop.unroll.disable`` to |
| 286 | ``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam |
| 287 | as well as an additional inner loop unrolling. If no follow-up |
| 288 | attribute specified for a generated loop, it is added automatically. |
| 289 | |
| 290 | Loop Distribution |
| 291 | ----------------- |
| 292 | |
| 293 | The LoopDistribution pass tries to separate vectorizable parts of a loop |
| 294 | from the non-vectorizable part (which otherwise would make the entire |
| 295 | loop non-vectorizable). Conceptually, it transforms a loop such as |
| 296 | |
| 297 | .. code-block:: c |
| 298 | |
| 299 | for (int i = 1; i < n; i+=1) { // original loop |
| 300 | A[i] = i; |
| 301 | B[i] = 2 + B[i]; |
| 302 | C[i] = 3 + C[i - 1]; |
| 303 | } |
| 304 | |
| 305 | into the following code: |
| 306 | |
| 307 | .. code-block:: c |
| 308 | |
| 309 | if (rtc) { |
| 310 | for (int i = 1; i < n; i+=1) // coincident loop |
| 311 | A[i] = i; |
| 312 | for (int i = 1; i < n; i+=1) // coincident loop |
| 313 | B[i] = 2 + B[i]; |
| 314 | for (int i = 1; i < n; i+=1) // sequential loop |
| 315 | C[i] = 3 + C[i - 1]; |
| 316 | } else { |
| 317 | for (int i = 1; i < n; i+=1) { // fallback loop |
| 318 | A[i] = i; |
| 319 | B[i] = 2 + B[i]; |
| 320 | C[i] = 3 + C[i - 1]; |
| 321 | } |
| 322 | } |
| 323 | |
| 324 | where ``rtc`` is a generated runtime check. |
| 325 | |
| 326 | ``llvm.loop.distribute.followup_coincident`` sets the loop attributes of |
| 327 | all loops without loop-carried dependencies (i.e. vectorizable loops). |
| 328 | There might be more than one such loops. If not defined, the loops will |
| 329 | inherit the original loop's attributes. |
| 330 | |
| 331 | ``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the |
| 332 | loop with potentially unsafe dependencies. There should be at most one |
| 333 | such loop. If not defined, the loop will inherit the original loop's |
| 334 | attributes. |
| 335 | |
| 336 | ``llvm.loop.distribute.followup_fallback`` defines the loop attributes |
| 337 | for the fallback loop, which is a copy of the original loop for when |
| 338 | loop versioning is required. If undefined, the fallback loop inherits |
| 339 | all attributes from the original loop. |
| 340 | |
| 341 | Attributes defined in ``llvm.loop.distribute.followup_all`` are added to |
| 342 | all of the aforementioned output loops. |
| 343 | |
| 344 | It is recommended to add ``llvm.loop.disable_nonforced`` to |
| 345 | ``llvm.loop.distribute.followup_fallback``. This avoids that the |
| 346 | fallback version (which is likely never executed) is further optimzed |
| 347 | which would increase the code size. |
| 348 | |
| 349 | Versioning LICM |
| 350 | --------------- |
| 351 | |
| 352 | The pass hoists code out of loops that are only loop-invariant when |
| 353 | dynamic conditions apply. For instance, it transforms the loop |
| 354 | |
| 355 | .. code-block:: c |
| 356 | |
| 357 | for (int i = 0; i < n; i+=1) // original loop |
| 358 | A[i] = B[0]; |
| 359 | |
| 360 | into: |
| 361 | |
| 362 | .. code-block:: c |
| 363 | |
| 364 | if (rtc) { |
| 365 | auto b = B[0]; |
| 366 | for (int i = 0; i < n; i+=1) // versioned loop |
| 367 | A[i] = b; |
| 368 | } else { |
| 369 | for (int i = 0; i < n; i+=1) // unversioned loop |
| 370 | A[i] = B[0]; |
| 371 | } |
| 372 | |
| 373 | The runtime condition (``rtc``) checks that the array ``A`` and the |
| 374 | element `B[0]` do not alias. |
| 375 | |
| 376 | Currently, this transformation does not support followup-attributes. |
| 377 | |
| 378 | Loop Interchange |
| 379 | ---------------- |
| 380 | |
| 381 | Currently, the ``LoopInterchange`` pass does not use any metadata. |
| 382 | |
| 383 | Ambiguous Transformation Order |
| 384 | ============================== |
| 385 | |
| 386 | If there multiple transformations defined, the order in which they are |
| 387 | executed depends on the order in LLVM's pass pipeline, which is subject |
| 388 | to change. The default optimization pipeline (anything higher than |
| 389 | ``-O0``) has the following order. |
| 390 | |
| 391 | When using the legacy pass manager: |
| 392 | |
| 393 | - LoopInterchange (if enabled) |
| 394 | - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| 395 | - VersioningLICM (if enabled) |
| 396 | - LoopDistribute |
| 397 | - LoopVectorizer |
| 398 | - LoopUnrollAndJam (if enabled) |
| 399 | - LoopUnroll (partial and runtime unrolling) |
| 400 | |
| 401 | When using the legacy pass manager with LTO: |
| 402 | |
| 403 | - LoopInterchange (if enabled) |
| 404 | - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| 405 | - LoopVectorizer |
| 406 | - LoopUnroll (partial and runtime unrolling) |
| 407 | |
| 408 | When using the new pass manager: |
| 409 | |
| 410 | - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) |
| 411 | - LoopDistribute |
| 412 | - LoopVectorizer |
| 413 | - LoopUnrollAndJam (if enabled) |
| 414 | - LoopUnroll (partial and runtime unrolling) |
| 415 | |
| 416 | Leftover Transformations |
| 417 | ======================== |
| 418 | |
| 419 | Forced transformations that have not been applied after the last |
| 420 | transformation pass should be reported to the user. The transformation |
| 421 | passes themselves cannot be responsible for this reporting because they |
| 422 | might not be in the pipeline, there might be multiple passes able to |
| 423 | apply a transformation (e.g. ``LoopInterchange`` and Polly) or a |
| 424 | transformation attribute may be 'hidden' inside another passes' followup |
| 425 | attribute. |
| 426 | |
| 427 | The pass ``-transform-warning`` (``WarnMissedTransformationsPass``) |
| 428 | emits such warnings. It should be placed after the last transformation |
| 429 | pass. |
| 430 | |
| 431 | The current pass pipeline has a fixed order in which transformations |
| 432 | passes are executed. A transformation can be in the followup of a pass |
| 433 | that is executed later and thus leftover. For instance, a loop nest |
| 434 | cannot be distributed and then interchanged with the current pass |
| 435 | pipeline. The loop distribution will execute, but there is no loop |
| 436 | interchange pass following such that any loop interchange metadata will |
| 437 | be ignored. The ``-transform-warning`` should emit a warning in this |
| 438 | case. |
| 439 | |
| 440 | Future versions of LLVM may fix this by executing transformations using |
| 441 | a dynamic ordering. |