===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. To LLVM, patch points
look much like calls. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project (see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
pointers into a collected heap captured by the ``GCRoot`` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the ``GCFunctionMetadata`` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the
``live values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
                                                        i64* %ptr)
  %val = load i64, i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                       i8* <target>, i32 <numArgs>, ...)
  declare i64
    @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                      i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

On PowerPC, note that ``<target>`` must be the ABI function pointer for the
intended target of the indirect call. Specifically, when compiling for the
ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
the C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.
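
The "patch all reserved bytes" rule can be sketched on the runtime
side as follows. This is only an illustration, not part of LLVM: the
helper name ``build_patch`` is hypothetical, and the single-byte
``0x90`` nop assumes an x86-64 target.

.. code-block:: python

  X86_NOP = b"\x90"  # assumption: x86-64, where 0x90 is the one-byte nop

  def build_patch(code: bytes, num_bytes: int) -> bytes:
      """Return exactly num_bytes of patch text: code, then nop padding.

      num_bytes is the <numBytes> operand that was passed to the
      patchpoint intrinsic. (Hypothetical helper, not an LLVM API.)
      """
      if len(code) > num_bytes:
          raise ValueError("patch does not fit in the reserved region")
      return code + X86_NOP * (num_bytes - len(code))

The runtime would then write the returned bytes over the whole reserved
region in one step, so that no part of the code LLVM originally emitted
there remains.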

This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi``, and a return value in ``%rax`` per the native calling
convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: none

  %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
                                                     i8* %target, i32 1,
                                                     i64* %ptr)

The stack map now indicates the location of the ``%ptr`` argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9              <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                             <--- end of reserved 15-bytes
  0x0f addq $0x3, %r9
  0x10 movq %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 3)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
    uint64 : Record Count
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (expected to be 0)
      uint16 : Location Size
      uint16 : Dwarf RegNum
      uint16 : Reserved (expected to be 0)
      int32  : Offset or SmallConstant
    }
    uint32 : Padding (only if required to align to 8 byte)
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 byte)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
======== ============= =================== ===========================
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.
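
The five location types can be read as a small dispatch on the type
byte. The sketch below is illustrative only: ``resolve_location``, the
``regs`` dictionary (DWARF register number to captured value), and the
``read_memory`` callback are hypothetical runtime-side names, and the
``Location Size`` field is ignored for brevity.

.. code-block:: python

  # Type encodings from the table above.
  REGISTER, DIRECT, INDIRECT, CONSTANT, CONSTANT_INDEX = 1, 2, 3, 4, 5

  def resolve_location(kind, regnum, offset, regs, read_memory, constants):
      """Return the value described by one Location entry.

      regs:        dict of DWARF register number -> value (assumed to be
                   captured by the runtime at the stack map address)
      read_memory: callable taking an address and returning the value there
      constants:   the Constants[] table from the same stack map section
      """
      if kind == REGISTER:
          return regs[regnum]                        # Reg
      if kind == DIRECT:
          return regs[regnum] + offset               # Reg + Offset (an address)
      if kind == INDIRECT:
          return read_memory(regs[regnum] + offset)  # [Reg + Offset]
      if kind == CONSTANT:
          return offset                              # small constant
      if kind == CONSTANT_INDEX:
          return constants[offset]                   # Constants[Offset]
      raise ValueError("unknown location type %d" % kind)

Note that only the ``Indirect`` case touches memory. A ``Direct``
location yields the address itself as the requested value.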

At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.
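
As a rough illustration of such a parsing step, the following sketch
reads a version 3 section into plain tuples. It is not an LLVM API:
all names are hypothetical, and it assumes little-endian data and a
buffer that starts at an 8-byte-aligned section address.

.. code-block:: python

  import struct
  from collections import namedtuple

  Function = namedtuple("Function", "address stack_size record_count")
  Location = namedtuple("Location", "kind size regnum offset")
  LiveOut = namedtuple("LiveOut", "regnum size")
  Record = namedtuple("Record", "id offset locations liveouts")

  def align8(pos):
      """Round pos up to the next multiple of 8."""
      return (pos + 7) & ~7

  def parse_stackmaps(data):
      """Parse a version 3 stack map section held in a bytes object."""
      version, _res8, _res16 = struct.unpack_from("<BBH", data, 0)
      if version != 3:
          raise ValueError("unsupported stack map version %d" % version)
      num_functions, num_constants, num_records = struct.unpack_from(
          "<III", data, 4)
      pos = 16
      functions = []
      for _ in range(num_functions):
          functions.append(Function(*struct.unpack_from("<QQQ", data, pos)))
          pos += 24
      constants = list(struct.unpack_from("<%dQ" % num_constants, data, pos))
      pos += 8 * num_constants
      records = []
      for _ in range(num_records):
          rec_id, offset, _flags, num_locations = struct.unpack_from(
              "<QIHH", data, pos)
          pos += 16
          locations = []
          for _ in range(num_locations):
              kind, _r0, size, regnum, _r1, value = struct.unpack_from(
                  "<BBHHHi", data, pos)
              locations.append(Location(kind, size, regnum, value))
              pos += 12
          pos = align8(pos)  # optional uint32 padding
          _pad, num_liveouts = struct.unpack_from("<HH", data, pos)
          pos += 4
          liveouts = []
          for _ in range(num_liveouts):
              regnum, _r2, size = struct.unpack_from("<HBB", data, pos)
              liveouts.append(LiveOut(regnum, size))
              pos += 4
          pos = align8(pos)  # optional trailing uint32 padding
          records.append(Record(rec_id, offset, locations, liveouts))
      return functions, constants, records

A runtime would typically call something like this once per compiled
module and then convert the result into its own, more compact
representation.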

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

For MachO (e.g. on Darwin), the stack map section name is
"__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

For ELF (e.g. on Linux), the stack map section name is
".llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, records, and functions, which LLVM preserves.
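
One way a runtime might use this ordering is to walk the function table
and the record list in parallel, consuming ``Record Count`` records per
function. The sketch below assumes that the records of each function
are contiguous and appear in function-table order. The helper name and
the tuple shapes are hypothetical.

.. code-block:: python

  def stackmap_addresses(functions, records):
      """Pair each record ID with its absolute instruction address.

      functions: list of (function_address, stack_size, record_count)
      records:   list of (record_id, instruction_offset), in section order
      """
      result = []
      index = 0
      for address, _stack_size, record_count in functions:
          for rec_id, offset in records[index:index + record_count]:
              # Instruction Offset is relative to the function's entry.
              result.append((rec_id, address + offset))
          index += record_count
      return result

Because LLVM does not check IDs for uniqueness, a runtime that reuses
IDs would need to key its own tables on the resulting address rather
than on the ID alone.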

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow meta-data to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.


Supported Architectures
=======================

Support for StackMap generation and the related intrinsics requires
some code for each backend. Today, only a subset of LLVM's backends
are supported. The currently supported architectures are X86_64,
PowerPC, and AArch64.