Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 1 | ============================ |
| 2 | Global Instruction Selection |
| 3 | ============================ |
| 4 | |
| 5 | .. contents:: |
| 6 | :local: |
| 7 | :depth: 1 |
| 8 | |
| 9 | .. warning:: |
| 10 | This document is a work in progress. It reflects the current state of the |
| 11 | implementation, as well as open design and implementation issues. |
| 12 | |
| 13 | Introduction |
| 14 | ============ |
| 15 | |
| 16 | GlobalISel is a framework that provides a set of reusable passes and utilities |
| 17 | for instruction selection --- translation from LLVM IR to target-specific |
| 18 | Machine IR (MIR). |
| 19 | |
| 20 | GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to |
| 21 | solve three major problems: |
| 22 | |
| 23 | * **Performance** --- SelectionDAG introduces a dedicated intermediate |
| 24 | representation, which has a compile-time cost. |
| 25 | |
| 26 | GlobalISel directly operates on the post-isel representation used by the |
| 27 | rest of the code generator, MIR. |
| 28 | It does require extensions to that representation to support arbitrary |
| 29 | incoming IR: :ref:`gmir`. |
| 30 | |
| 31 | * **Granularity** --- SelectionDAG and FastISel operate on individual basic |
| 32 | blocks, losing some global optimization opportunities. |
| 33 | |
| 34 | GlobalISel operates on the whole function. |
| 35 | |
| 36 | * **Modularity** --- SelectionDAG and FastISel are radically different and share |
| 37 | very little code. |
| 38 | |
| 39 | GlobalISel is built in a way that enables code reuse. For instance, both the |
| 40 | optimized and fast selectors share the :ref:`pipeline`, and targets can |
| 41 | configure that pipeline to better suit their needs. |
| 42 | |
| 43 | |
| 44 | .. _gmir: |
| 45 | |
| 46 | Generic Machine IR |
| 47 | ================== |
| 48 | |
| 49 | Machine IR operates on physical registers, register classes, and (mostly) |
| 50 | target-specific instructions. |
| 51 | |
| 52 | To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to |
| 53 | Machine IR: |
| 54 | |
| 55 | .. contents:: |
| 56 | :local: |
| 57 | |
| 58 | ``NOTE``: |
| 59 | The generic MIR (GMIR) representation still contains references to IR |
| 60 | constructs (such as ``GlobalValue``). Removing those should let us write more |
| 61 | accurate tests, or delete IR after building the initial MIR. However, it is |
| 62 | not part of the GlobalISel effort. |
| 63 | |
| 64 | .. _gmir-instructions: |
| 65 | |
| 66 | Generic Instructions |
| 67 | -------------------- |
| 68 | |
| 69 | The main addition is support for pre-isel generic machine instructions (e.g., |
| 70 | ``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or |
| 71 | ``PHI``), these are available on all targets. |
| 72 | |
| 73 | ``TODO``: |
| 74 | While we're progressively adding instructions, one kind in particular exposes |
| 75 | interesting problems: compares and how to represent condition codes. |
| 76 | Some targets (x86, ARM) have generic comparisons setting multiple flags, |
| 77 | which are then used by predicated variants. |
| 78 | Others (IR) specify the predicate in the comparison and users just get a single |
| 79 | bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to |
| 80 | represent this. |
| 81 | |
| 82 | The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides |
| 83 | a convenient way to create these generic instructions. |
| 84 | |
| 85 | .. _gmir-gvregs: |
| 86 | |
| 87 | Generic Virtual Registers |
| 88 | ------------------------- |
| 89 | |
| 90 | Generic instructions operate on a new kind of register: "generic" virtual |
| 91 | registers. As opposed to non-generic vregs, they are not assigned a Register |
| 92 | Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned |
| 93 | a :ref:`gmir-regbank`. |
| 94 | |
| 95 | ``MachineRegisterInfo`` tracks the same information that it does for |
| 96 | non-generic vregs (e.g., use-def chains). Additionally, it also tracks the |
| 97 | :ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``, |
| 98 | its :ref:`gmir-regbank`, if any. |
| 99 | |
| 100 | For simplicity, most generic instructions only accept generic vregs: |
| 101 | |
| 102 | * instead of immediates, they use a gvreg defined by an instruction |
| 103 | materializing the immediate value (see :ref:`irtranslator-constants`). |
| 104 | * instead of physical register, they use a gvreg defined by a ``COPY``. |
| 105 | |
| 106 | ``NOTE``: |
| 107 | We started with an alternative representation, where MRI tracks a size for |
| 108 | each gvreg, and instructions have lists of types. |
| 109 | That had two flaws: the type and size are redundant, and there was no generic |
| 110 | way of getting a given operand's type (as there was no 1:1 mapping between |
| 111 | instruction types and operands). |
| 112 | We considered putting the type in some variant of MCInstrDesc instead: |
| 113 | See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs |
| 114 | need a type but this increases the memory footprint of the related objects |
| 115 | |
| 116 | .. _gmir-regbank: |
| 117 | |
| 118 | Register Bank |
| 119 | ------------- |
| 120 | |
| 121 | A Register Bank is a set of register classes defined by the target. |
| 122 | A bank has a size, which is the maximum store size of all covered classes. |
| 123 | |
| 124 | In general, cross-class copies inside a bank are expected to be cheaper than |
| 125 | copies across banks. They are also coalesceable by the register coalescer, |
| 126 | whereas cross-bank copies are not. |
| 127 | |
| 128 | Also, equivalent operations can be performed on different banks using different |
| 129 | instructions. |
| 130 | |
| 131 | For example, X86 can be seen as having 3 main banks: general-purpose, x87, and |
| 132 | vector (which could be further split into a bank per domain for single vs |
| 133 | double precision instructions). |
| 134 | |
| 135 | Register banks are described by a target-provided API, |
| 136 | :ref:`RegisterBankInfo <api-registerbankinfo>`. |
| 137 | |
| 138 | .. _gmir-llt: |
| 139 | |
| 140 | Low Level Type |
| 141 | -------------- |
| 142 | |
| 143 | Additionally, every generic virtual register has a type, represented by an |
| 144 | instance of the ``LLT`` class. |
| 145 | |
| 146 | Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed |
| 147 | integer types. Furthermore, it also has no distinction between integer and |
| 148 | floating-point types: it mainly conveys absolutely necessary information, such |
| 149 | as size and number of vector lanes: |
| 150 | |
| 151 | * ``sN`` for scalars |
| 152 | * ``pN`` for pointers |
| 153 | * ``<N x sM>`` for vectors |
| 154 | * ``unsized`` for labels, etc.. |
| 155 | |
| 156 | ``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG. |
| 157 | |
| 158 | Here are some LLT examples and their ``EVT`` and ``Type`` equivalents: |
| 159 | |
| 160 | ============= ========= ====================================== |
| 161 | LLT EVT IR Type |
| 162 | ============= ========= ====================================== |
| 163 | ``s1`` ``i1`` ``i1`` |
| 164 | ``s8`` ``i8`` ``i8`` |
| 165 | ``s32`` ``i32`` ``i32`` |
| 166 | ``s32`` ``f32`` ``float`` |
| 167 | ``s17`` ``i17`` ``i17`` |
| 168 | ``s16`` N/A ``{i8, i8}`` |
| 169 | ``s32`` N/A ``[4 x i8]`` |
| 170 | ``p0`` ``iPTR`` ``i8*``, ``i32*``, ``%opaque*`` |
| 171 | ``p2`` ``iPTR`` ``i8 addrspace(2)*`` |
| 172 | ``<4 x s32>`` ``v4f32`` ``<4 x float>`` |
| 173 | ``s64`` ``v1f64`` ``<1 x double>`` |
| 174 | ``<3 x s32>`` ``v3i32`` ``<3 x i32>`` |
| 175 | ``unsized`` ``Other`` ``label`` |
| 176 | ============= ========= ====================================== |
| 177 | |
| 178 | |
| 179 | Rationale: instructions already encode a specific interpretation of types |
| 180 | (e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that |
| 181 | information in the type system requires introducing bitcast with no real |
| 182 | advantage for the selector. |
| 183 | |
| 184 | Pointer types are distinguished by address space. This matches IR, as opposed |
| 185 | to SelectionDAG where address space is an attribute on operations. |
| 186 | This representation better supports pointers having different sizes depending |
| 187 | on their addressspace. |
| 188 | |
| 189 | ``NOTE``: |
| 190 | Currently, LLT requires at least 2 elements in vectors, but some targets have |
| 191 | the concept of a '1-element vector'. Representing them as their underlying |
| 192 | scalar type is a nice simplification. |
| 193 | |
| 194 | ``TODO``: |
| 195 | Currently, non-generic virtual registers, defined by non-pre-isel-generic |
| 196 | instructions, cannot have a type, and thus cannot be used by a pre-isel generic |
| 197 | instruction. Instead, they are given a type using a COPY. We could relax that |
| 198 | and allow types on all vregs: this would reduce the number of MI required when |
| 199 | emitting target-specific MIR early in the pipeline. This should purely be |
| 200 | a compile-time optimization. |
| 201 | |
| 202 | .. _pipeline: |
| 203 | |
| 204 | Core Pipeline |
| 205 | ============= |
| 206 | |
| 207 | There are four required passes, regardless of the optimization mode: |
| 208 | |
| 209 | .. contents:: |
| 210 | :local: |
| 211 | |
| 212 | Additional passes can then be inserted at higher optimization levels or for |
| 213 | specific targets. For example, to match the current SelectionDAG set of |
| 214 | transformations: MachineCSE and a better MachineCombiner between every pass. |
| 215 | |
| 216 | ``NOTE``: |
| 217 | In theory, not all passes are always necessary. |
| 218 | As an additional compile-time optimization, we could skip some of the passes by |
| 219 | setting the relevant MachineFunction properties. For instance, if the |
| 220 | IRTranslator did not encounter any illegal instruction, it would set the |
| 221 | ``legalized`` property to avoid running the :ref:`milegalizer`. |
| 222 | Similarly, we considered specializing the IRTranslator per-target to directly |
| 223 | emit target-specific MI. |
| 224 | However, we instead decided to keep the core pipeline simple, and focus on |
| 225 | minimizing the overhead of the passes in the no-op cases. |
| 226 | |
| 227 | |
| 228 | .. _irtranslator: |
| 229 | |
| 230 | IRTranslator |
| 231 | ------------ |
| 232 | |
| 233 | This pass translates the input LLVM IR ``Function`` to a GMIR |
| 234 | ``MachineFunction``. |
| 235 | |
| 236 | ``TODO``: |
| 237 | This currently doesn't support the more complex instructions, in particular |
| 238 | those involving control flow (``switch``, ``invoke``, ...). |
| 239 | For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass. |
| 240 | |
| 241 | .. _api-calllowering: |
| 242 | |
| 243 | API: CallLowering |
| 244 | ^^^^^^^^^^^^^^^^^ |
| 245 | |
| 246 | The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also |
| 247 | implements the ABI's calling convention by lowering calls, returns, and |
| 248 | arguments to the appropriate physical register usage and instruction sequences. |
| 249 | |
| 250 | .. _irtranslator-aggregates: |
| 251 | |
| 252 | Aggregates |
| 253 | ^^^^^^^^^^ |
| 254 | |
| 255 | Aggregates are lowered to a single scalar vreg. |
| 256 | This differs from SelectionDAG's multiple vregs via ``GetValueVTs``. |
| 257 | |
| 258 | ``TODO``: |
| 259 | As some of the bits are undef (padding), we should consider augmenting the |
| 260 | representation with additional metadata (in effect, caching computeKnownBits |
| 261 | information on vregs). |
| 262 | See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during |
| 263 | IR to MachineInstr translation for aggregate type |
| 264 | |
| 265 | .. _irtranslator-constants: |
| 266 | |
| 267 | Constant Lowering |
| 268 | ^^^^^^^^^^^^^^^^^ |
| 269 | |
| 270 | The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined |
| 271 | by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions. |
| 272 | Currently, these instructions are always emitted in the entry basic block. |
| 273 | In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg. |
| 274 | |
| 275 | This is beneficial as it allows us to fold constants into immediate operands |
| 276 | during :ref:`instructionselect`, while still avoiding redundant materializations |
| 277 | for expensive non-foldable constants. |
| 278 | However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as |
| 279 | these vregs can have long live ranges. |
| 280 | |
| 281 | ``TODO``: |
| 282 | We're investigating better placement of these instructions, in fast and |
| 283 | optimized modes. |
| 284 | |
| 285 | |
| 286 | .. _milegalizer: |
| 287 | |
| 288 | Legalizer |
| 289 | --------- |
| 290 | |
| 291 | This pass transforms the generic machine instructions such that they are legal. |
| 292 | |
| 293 | A legal instruction is defined as: |
| 294 | |
| 295 | * **selectable** --- the target will later be able to select it to a |
| 296 | target-specific (non-generic) instruction. |
| 297 | |
| 298 | * operating on **vregs that can be loaded and stored** -- if necessary, the |
| 299 | target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand. |
| 300 | |
| 301 | As opposed to SelectionDAG, there are no legalization phases. In particular, |
| 302 | 'type' and 'operation' legalization are not separate. |
| 303 | |
| 304 | Legalization is iterative, and all state is contained in GMIR. To maintain the |
| 305 | validity of the intermediate code, instructions are introduced: |
| 306 | |
Quentin Colombet | ae44772 | 2017-09-25 22:03:06 +0000 | [diff] [blame] | 307 | * ``G_MERGE_VALUES`` --- concatenate multiple registers of the same |
| 308 | size into a single wider register. |
| 309 | |
| 310 | * ``G_UNMERGE_VALUES`` --- extract multiple registers of the same size |
| 311 | from a single wider register. |
Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 312 | |
Quentin Colombet | 33e9c8a | 2017-09-25 22:03:01 +0000 | [diff] [blame] | 313 | * ``G_EXTRACT`` --- extract a simple register (as contiguous sequences of bits) |
Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 314 | from a single wider register. |
| 315 | |
| 316 | As they are expected to be temporary byproducts of the legalization process, |
| 317 | they are combined at the end of the :ref:`milegalizer` pass. |
| 318 | If any remain, they are expected to always be selectable, using loads and stores |
| 319 | if necessary. |
| 320 | |
| 321 | .. _api-legalizerinfo: |
| 322 | |
| 323 | API: LegalizerInfo |
| 324 | ^^^^^^^^^^^^^^^^^^ |
| 325 | |
| 326 | Currently the API is broadly similar to SelectionDAG/TargetLowering, but |
| 327 | extended in two ways: |
| 328 | |
| 329 | * The set of available actions is wider, avoiding the currently very |
| 330 | overloaded ``Expand`` (which can cover everything from libcalls to |
| 331 | scalarization depending on the node's opcode). |
| 332 | |
| 333 | * Since there's no separate type legalization, independently varying |
| 334 | types on an instruction can have independent actions. For example a |
| 335 | ``G_ICMP`` has 2 independent types: the result and the inputs; we need |
| 336 | to be able to say that comparing 2 s32s is OK, but the s1 result |
| 337 | must be dealt with in another way. |
| 338 | |
| 339 | As such, the primary key when deciding what to do is the ``InstrAspect``, |
| 340 | essentially a tuple consisting of ``(Opcode, TypeIdx, Type)`` and mapping to a |
| 341 | suggested course of action. |
| 342 | |
| 343 | An example use might be: |
| 344 | |
| 345 | .. code-block:: c++ |
| 346 | |
| 347 | // The CPU can't deal with an s1 result, do something about it. |
| 348 | setAction({G_ICMP, 0, s1}, WidenScalar); |
| 349 | // An s32 input (the second type) is fine though. |
| 350 | setAction({G_ICMP, 1, s32}, Legal); |
| 351 | |
| 352 | |
| 353 | ``TODO``: |
| 354 | An alternative worth investigating is to generalize the API to represent |
| 355 | actions using ``std::function`` that implements the action, instead of explicit |
| 356 | enum tokens (``Legal``, ``WidenScalar``, ...). |
| 357 | |
| 358 | ``TODO``: |
| 359 | Moreover, we could use TableGen to initially infer legality of operation from |
| 360 | existing patterns (as any pattern we can select is by definition legal). |
| 361 | Expanding that to describe legalization actions is a much larger but |
| 362 | potentially useful project. |
| 363 | |
Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 364 | .. _milegalizer-non-power-of-2: |
| 365 | |
| 366 | Non-power of 2 types |
| 367 | ^^^^^^^^^^^^^^^^^^^^ |
| 368 | |
| 369 | ``TODO``: |
| 370 | Types which have a size that isn't a power of 2 aren't currently supported. |
| 371 | The setAction API will probably require changes to support them. |
| 372 | Even notionally explicitly specified operations only make suggestions |
| 373 | like "Widen" or "Narrow". The eventual type is still unspecified and a |
| 374 | search is performed by repeated doubling/halving of the type's |
| 375 | size. |
| 376 | This is incorrect for types that aren't a power of 2. It's reasonable to |
| 377 | expect we could construct an efficient set of side-tables for more general |
| 378 | lookups though, encoding a map from the integers (i.e. the size of the current |
| 379 | type) to types (the legal size). |
| 380 | |
| 381 | .. _milegalizer-vector: |
| 382 | |
| 383 | Vector types |
| 384 | ^^^^^^^^^^^^ |
| 385 | |
| 386 | Vectors first get their element type legalized: ``<A x sB>`` becomes |
| 387 | ``<A x sC>`` such that at least one operation is legal with ``sC``. |
| 388 | |
| 389 | This is currently specified by the function ``setScalarInVectorAction``, called |
| 390 | for example as: |
| 391 | |
| 392 | setScalarInVectorAction(G_ICMP, s1, WidenScalar); |
| 393 | |
| 394 | Next the number of elements is chosen so that the entire operation is |
| 395 | legal. This aspect is not controllable at the moment, but probably |
| 396 | should be (you could imagine disagreements on whether a ``<2 x s8>`` |
| 397 | operation should be scalarized or extended to ``<8 x s8>``). |
| 398 | |
| 399 | |
| 400 | .. _regbankselect: |
| 401 | |
| 402 | RegBankSelect |
| 403 | ------------- |
| 404 | |
| 405 | This pass constrains the :ref:`gmir-gvregs` operands of generic |
| 406 | instructions to some :ref:`gmir-regbank`. |
| 407 | |
| 408 | It iteratively maps instructions to a set of per-operand bank assignment. |
| 409 | The possible mappings are determined by the target-provided |
| 410 | :ref:`RegisterBankInfo <api-registerbankinfo>`. |
| 411 | The mapping is then applied, possibly introducing ``COPY`` instructions if |
| 412 | necessary. |
| 413 | |
| 414 | It traverses the ``MachineFunction`` top down so that all operands are already |
| 415 | mapped when analyzing an instruction. |
| 416 | |
| 417 | This pass could also remap target-specific instructions when beneficial. |
| 418 | In the future, this could replace the ExeDepsFix pass, as we can directly |
| 419 | select the best variant for an instruction that's available on multiple banks. |
| 420 | |
| 421 | .. _api-registerbankinfo: |
| 422 | |
| 423 | API: RegisterBankInfo |
| 424 | ^^^^^^^^^^^^^^^^^^^^^ |
| 425 | |
| 426 | The ``RegisterBankInfo`` class describes multiple aspects of register banks. |
| 427 | |
| 428 | * **Banks**: ``addRegBankCoverage`` --- which register bank covers each |
| 429 | register class. |
| 430 | |
| 431 | * **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank |
| 432 | to another. |
| 433 | |
| 434 | * **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for |
| 435 | a given instruction. |
| 436 | |
| 437 | * **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other |
| 438 | possible bank assignments for a given instruction. |
| 439 | |
| 440 | ``TODO``: |
| 441 | All this information should eventually be static and generated by TableGen, |
| 442 | mostly using existing information augmented by bank descriptions. |
| 443 | |
| 444 | ``TODO``: |
| 445 | ``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping`` |
| 446 | because the latter is more expensive: as we move to static mapping info, |
| 447 | both methods should be free, and we should merge them. |
| 448 | |
| 449 | .. _regbankselect-modes: |
| 450 | |
| 451 | RegBankSelect Modes |
| 452 | ^^^^^^^^^^^^^^^^^^^ |
| 453 | |
| 454 | ``RegBankSelect`` currently has two modes: |
| 455 | |
| 456 | * **Fast** --- For each instruction, pick a target-provided "default" bank |
| 457 | assignment. This is the default at -O0. |
| 458 | |
| 459 | * **Greedy** --- For each instruction, pick the cheapest of several |
| 460 | target-provided bank assignment alternatives. |
| 461 | |
| 462 | We intend to eventually introduce an additional optimizing mode: |
| 463 | |
| 464 | * **Global** --- Across multiple instructions, pick the cheapest combination of |
| 465 | bank assignments. |
| 466 | |
| 467 | ``NOTE``: |
| 468 | On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at |
| 469 | backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from |
| 470 | integer scalars, the default assignment for loads and stores is the integer |
| 471 | bank, introducing cross-bank copies on most floating point operations. |
| 472 | |
| 473 | |
| 474 | .. _instructionselect: |
| 475 | |
| 476 | InstructionSelect |
| 477 | ----------------- |
| 478 | |
| 479 | This pass transforms generic machine instructions into equivalent |
| 480 | target-specific instructions. It traverses the ``MachineFunction`` bottom-up, |
| 481 | selecting uses before definitions, enabling trivial dead code elimination. |
| 482 | |
| 483 | .. _api-instructionselector: |
| 484 | |
| 485 | API: InstructionSelector |
| 486 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
| 487 | |
| 488 | The target implements the ``InstructionSelector`` class, containing the |
| 489 | target-specific selection logic proper. |
| 490 | |
| 491 | The instance is provided by the subtarget, so that it can specialize the |
| 492 | selector by subtarget feature (with, e.g., a vector selector overriding parts |
| 493 | of a general-purpose common selector). |
| 494 | We might also want to parameterize it by MachineFunction, to enable selector |
| 495 | variants based on function attributes like optsize. |
| 496 | |
| 497 | The simple API consists of: |
| 498 | |
| 499 | .. code-block:: c++ |
| 500 | |
| 501 | virtual bool select(MachineInstr &MI) |
| 502 | |
| 503 | This target-provided method is responsible for mutating (or replacing) a |
| 504 | possibly-generic MI into a fully target-specific equivalent. |
| 505 | It is also responsible for doing the necessary constraining of gvregs into the |
Daniel Sanders | 86274cd | 2017-10-23 17:18:44 +0000 | [diff] [blame] | 506 | appropriate register classes as well as passing through COPY instructions to |
| 507 | the register allocator. |
Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 508 | |
| 509 | The ``InstructionSelector`` can fold other instructions into the selected MI, |
| 510 | by walking the use-def chain of the vreg operands. |
| 511 | As GlobalISel is Global, this folding can occur across basic blocks. |
| 512 | |
Daniel Sanders | 86274cd | 2017-10-23 17:18:44 +0000 | [diff] [blame] | 513 | SelectionDAG Rule Imports |
| 514 | ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 515 | |
| 516 | TableGen will import SelectionDAG rules and provide the following function to |
| 517 | execute them: |
| 518 | |
| 519 | .. code-block:: c++ |
| 520 | |
| 521 | bool selectImpl(MachineInstr &MI) |
| 522 | |
| 523 | The ``--stats`` option can be used to determine what proportion of rules were |
| 524 | successfully imported. The easiest way to use this is to copy the |
| 525 | ``-gen-globalisel`` tablegen command from ``ninja -v`` and modify it. |
| 526 | |
| 527 | Similarly, the ``--warn-on-skipped-patterns`` option can be used to obtain the |
| 528 | reasons that rules weren't imported. This can be used to focus on the most |
| 529 | important rejection reasons. |
| 530 | |
| 531 | PatLeaf Predicates |
| 532 | ^^^^^^^^^^^^^^^^^^ |
| 533 | |
| 534 | PatLeafs cannot be imported because their C++ is implemented in terms of |
| 535 | ``SDNode`` objects. PatLeafs that handle immediate predicates should be |
| 536 | replaced by ``ImmLeaf``, ``IntImmLeaf``, or ``FPImmLeaf`` as appropriate. |
| 537 | |
| 538 | There's no standard answer for other PatLeafs. Some standard predicates have |
| 539 | been baked into TableGen but this should not generally be done. |
| 540 | |
| 541 | Custom SDNodes |
| 542 | ^^^^^^^^^^^^^^ |
| 543 | |
| 544 | Custom SDNodes should be mapped to Target Pseudos using ``GINodeEquiv``. This |
| 545 | will cause the instruction selector to import them but you will also need to |
| 546 | ensure the target pseudo is introduced to the MIR before the instruction |
| 547 | selector. Any preceeding pass is suitable but the legalizer will be a |
| 548 | particularly common choice. |
| 549 | |
| 550 | ComplexPatterns |
| 551 | ^^^^^^^^^^^^^^^ |
| 552 | |
| 553 | ComplexPatterns cannot be imported because their C++ is implemented in terms of |
| 554 | ``SDNode`` objects. GlobalISel versions should be defined with |
| 555 | ``GIComplexOperandMatcher`` and mapped to ComplexPattern with |
| 556 | ``GIComplexPatternEquiv``. |
| 557 | |
| 558 | The following predicates are useful for porting ComplexPattern: |
| 559 | |
| 560 | * isBaseWithConstantOffset() - Check for base+offset structures |
| 561 | * isOperandImmEqual() - Check for a particular constant |
| 562 | * isObviouslySafeToFold() - Check for reasons an instruction can't be sunk and folded into another. |
| 563 | |
| 564 | There are some important points for the C++ implementation: |
| 565 | |
| 566 | * Don't modify MIR in the predicate |
| 567 | * Renderer lambdas should capture by value to avoid use-after-free. They will be used after the predicate returns. |
| 568 | * Only create instructions in a renderer lambda. GlobalISel won't clean up things you create but don't use. |
Ahmed Bougacha | bbf0c3a | 2016-11-04 17:57:34 +0000 | [diff] [blame] | 569 | |
| 570 | |
| 571 | .. _maintainability: |
| 572 | |
| 573 | Maintainability |
| 574 | =============== |
| 575 | |
| 576 | .. _maintainability-iterative: |
| 577 | |
| 578 | Iterative Transformations |
| 579 | ------------------------- |
| 580 | |
| 581 | Passes are split into small, iterative transformations, with all state |
| 582 | represented in the MIR. |
| 583 | |
| 584 | This differs from SelectionDAG (in particular, the legalizer) using various |
| 585 | in-memory side-tables. |
| 586 | |
| 587 | |
| 588 | .. _maintainability-mir: |
| 589 | |
| 590 | MIR Serialization |
| 591 | ----------------- |
| 592 | |
| 593 | .. FIXME: Update the MIRLangRef to include GMI additions. |
| 594 | |
| 595 | :ref:`gmir` is serializable (see :doc:`MIRLangRef`). |
| 596 | Combined with :ref:`maintainability-iterative`, this enables much finer-grained |
| 597 | testing, rather than requiring large and fragile IR-to-assembly tests. |
| 598 | |
| 599 | The current "stage" in the :ref:`pipeline` is represented by a set of |
| 600 | ``MachineFunctionProperties``: |
| 601 | |
| 602 | * ``legalized`` |
| 603 | * ``regBankSelected`` |
| 604 | * ``selected`` |
| 605 | |
| 606 | |
| 607 | .. _maintainability-verifier: |
| 608 | |
| 609 | MachineVerifier |
| 610 | --------------- |
| 611 | |
| 612 | The pass approach lets us use the ``MachineVerifier`` to enforce invariants. |
| 613 | For instance, a ``regBankSelected`` function may not have gvregs without |
| 614 | a bank. |
| 615 | |
| 616 | ``TODO``: |
| 617 | The ``MachineVerifier`` being monolithic, some of the checks we want to do |
| 618 | can't be integrated to it: GlobalISel is a separate library, so we can't |
| 619 | directly reference it from CodeGen. For instance, legality checks are |
| 620 | currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out |
| 621 | the checks, or we could add some sort of verifier API. |
| 622 | |
| 623 | |
| 624 | .. _progress: |
| 625 | |
| 626 | Progress and Future Work |
| 627 | ======================== |
| 628 | |
| 629 | The initial goal is to replace FastISel on AArch64. The next step will be to |
| 630 | replace SelectionDAG as the optimized ISel. |
| 631 | |
| 632 | ``NOTE``: |
| 633 | While we iterate on GlobalISel, we strive to avoid affecting the performance of |
| 634 | SelectionDAG, FastISel, or the other MIR passes. For instance, the types of |
| 635 | :ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``, |
| 636 | that is destroyed after :ref:`instructionselect`. |
| 637 | |
| 638 | .. _progress-fastisel: |
| 639 | |
| 640 | FastISel Replacement |
| 641 | -------------------- |
| 642 | |
| 643 | For the initial FastISel replacement, we intend to fallback to SelectionDAG on |
| 644 | selection failures. |
| 645 | |
| 646 | Currently, compile-time of the fast pipeline is within 1.5x of FastISel. |
| 647 | We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be |
| 648 | challenging given the multi-pass approach. |
| 649 | Still, supporting all IR (via a complete legalizer) and avoiding the fallback |
| 650 | to SelectionDAG in the worst case should enable better amortized performance |
| 651 | than SelectionDAG+FastISel. |
| 652 | |
| 653 | ``NOTE``: |
| 654 | We considered never having a fallback to SelectionDAG, instead deciding early |
| 655 | whether a given function is supported by GlobalISel or not. The decision would |
| 656 | be based on :ref:`milegalizer` queries. |
| 657 | We abandoned that for two reasons: |
| 658 | a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`; |
| 659 | b) to be robust against unforeseen failures and to enable iterative |
| 660 | improvements. |
| 661 | |
| 662 | .. _progress-targets: |
| 663 | |
| 664 | Support For Other Targets |
| 665 | ------------------------- |
| 666 | |
| 667 | In parallel, we're investigating adding support for other - ideally quite |
| 668 | different - targets. For instance, there is some initial AMDGPU support. |
| 669 | |
| 670 | |
| 671 | .. _porting: |
| 672 | |
| 673 | Porting GlobalISel to A New Target |
| 674 | ================================== |
| 675 | |
| 676 | There are four major classes to implement by the target: |
| 677 | |
| 678 | * :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments |
| 679 | according to the ABI. |
| 680 | * :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe |
| 681 | :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of |
| 682 | operands onto banks for each instruction. |
| 683 | * :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how |
| 684 | to legalize what isn't. |
| 685 | * :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR |
| 686 | to target-specific MIR. |
| 687 | |
| 688 | Additionally: |
| 689 | |
| 690 | * ``TargetPassConfig`` --- create the passes constituting the pipeline, |
| 691 | including additional passes not included in the :ref:`pipeline`. |
Daniel Sanders | 86274cd | 2017-10-23 17:18:44 +0000 | [diff] [blame] | 692 | |
| 693 | .. _other_resources: |
| 694 | |
| 695 | Resources |
| 696 | ========= |
| 697 | |
| 698 | * `Global Instruction Selection - A Proposal by Quentin Colombet @LLVMDevMeeting 2015 <https://www.youtube.com/watch?v=F6GGbYtae3g>`_ |
| 699 | * `Global Instruction Selection - Status by Quentin Colombet, Ahmed Bougacha, and Tim Northover @LLVMDevMeeting 2016 <https://www.youtube.com/watch?v=6tfb344A7w8>`_ |
| 700 | * `GlobalISel - LLVM's Latest Instruction Selection Framework by Diana Picus @FOSDEM17 <https://www.youtube.com/watch?v=d6dF6E4BPeU>`_ |
| 701 | * GlobalISel: Past, Present, and Future by Quentin Colombet and Ahmed Bougacha @LLVMDevMeeting 2017 |
| 702 | * Head First into GlobalISel by Daniel Sanders, Aditya Nandakumar, and Justin Bogner @LLVMDevMeeting 2017 |