==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
    void g(Foo a, Foo b);
    void f() {
      g(Foo(), Foo());
    }

.. code-block:: text

    %struct.Foo = type { i32, i32 }
    declare void @Foo_ctor(%struct.Foo* %this)
    declare void @Foo_dtor(%struct.Foo* %this)
    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

    define void @f() {
    entry:
      %base = call i8* @llvm.stacksave()
      %memargs = alloca <{ %struct.Foo, %struct.Foo }>
      ; Field 1 of the packed argument struct holds b.
      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
      call void @Foo_ctor(%struct.Foo* %b)

      ; If a's ctor throws, we must destruct b.
      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
      invoke void @Foo_ctor(%struct.Foo* %a)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      call void @Foo_dtor(%struct.Foo* %b)
      call void @llvm.stackrestore(i8* %base)
      ...
    }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
argument stack space with alloca and calls the default constructor for
``b``. The default constructor for ``a`` could throw an exception, so
the frontend has to create a landing pad that destroys the already
constructed argument ``b`` before restoring the stack pointer. If the
constructor does not unwind, ``g`` is called. In the Microsoft C++ ABI,
``g`` will destroy its arguments, and then the stack is restored in
``f``.
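
The cleanup order described above can be modelled in plain C++ with
placement new. This is only a sketch under stated assumptions, not what
any frontend emits: the ``Slot`` and ``MemArgs`` types, the ``live`` and
``throw_next`` members, and the ``bool`` return value are hypothetical
additions for illustration, and the exception is caught here rather than
propagated so that the outcome can be observed.

.. code-block:: c++

    #include <cassert>
    #include <new>
    #include <stdexcept>

    // Hypothetical stand-in for Foo. The counters exist only to make
    // construction and destruction observable.
    struct Foo {
      static int live;         // number of currently constructed objects
      static bool throw_next;  // when true, the next default ctor throws
      int a = 0, b = 0;
      Foo() {
        if (throw_next) {
          throw_next = false;
          throw std::runtime_error("Foo() failed");
        }
        ++live;
      }
      Foo(const Foo &) { ++live; }
      ~Foo() { assert(live > 0); --live; }
    };
    int Foo::live = 0;
    bool Foo::throw_next = false;

    // Raw storage for one argument; the union suppresses automatic
    // construction and destruction of the member.
    union Slot {
      Foo obj;
      Slot() {}
      ~Slot() {}
    };

    // Models the argument block <{ %struct.Foo, %struct.Foo }>.
    struct MemArgs {
      Slot a, b;
    };

    // Models the callee-destroys rule: g destroys its own arguments.
    void g(MemArgs *args) {
      args->a.obj.~Foo();
      args->b.obj.~Foo();
    }

    // Returns true if g was called, false if a's constructor threw.
    bool f(bool a_ctor_throws) {
      MemArgs args;                      // stands in for the alloca
      Foo *b = new (&args.b.obj) Foo();  // construct b in its slot first
      try {
        Foo::throw_next = a_ctor_throws;
        new (&args.a.obj) Foo();         // may throw, like the invoke
      } catch (const std::runtime_error &) {
        b->~Foo();                       // the invoke.unwind path
        return false;
      }
      g(&args);                          // the invoke.cont path
      return true;
    }

Calling ``f(false)`` follows the ``invoke.cont`` path and ``f(true)``
follows the ``invoke.unwind`` path. In both cases ``Foo::live`` is back
at zero afterwards, which is the property the landing pad exists to
preserve.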

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments. We cannot vend pointers to that memory at function entry
because after code generation they will alias.

The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem. Cleanup and
lifetime are handled explicitly with stack save and restore calls. In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack-adjusting cleanup is less
powerful than a full stack save and restore.

Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots. This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
    Foo bar(Foo b);
    int main() {
      bar(bar(Foo()));
    }

In this case, we want to be able to elide copies into ``bar``'s argument
slots. That means we need to have more than one set of argument frames
active at the same time. First, we need to allocate the frame for the
outer call so we can pass its address as the hidden struct return
pointer to the middle call (the inner ``bar``). Then we do the same for
the middle call, allocating a frame and passing its address to ``Foo``'s
default constructor. By wrapping the evaluation of the inner ``bar``
with stack save and restore, we can have multiple overlapping active
call frames.
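
The nesting of frames can be illustrated with a toy model of a
downward-growing stack. Everything in this sketch is an assumption made
for illustration: ``StackModel``, ``Trace``, ``evaluate_nested_call``,
and the byte sizes are hypothetical, and the model tracks only the stack
pointer, not the objects stored in the frames.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>

    // Toy stack that grows towards lower addresses.
    class StackModel {
      std::size_t sp_;

    public:
      explicit StackModel(std::size_t top) : sp_(top) {}
      std::size_t sp() const { return sp_; }
      // Plays the role of llvm.stacksave.
      std::size_t save() const { return sp_; }
      // Plays the role of llvm.stackrestore.
      void restore(std::size_t saved) { sp_ = saved; }
      // Plays the role of a dynamic alloca; returns the new block address.
      std::size_t alloca_bytes(std::size_t n) {
        assert(n <= sp_);
        sp_ -= n;
        return sp_;
      }
    };

    // Stack pointer values observed while evaluating bar(bar(Foo())).
    struct Trace {
      std::size_t outer_frame;
      std::size_t inner_frame;
      std::size_t sp_at_inner_call;
      std::size_t sp_at_outer_call;
      std::size_t sp_at_end;
    };

    Trace evaluate_nested_call(StackModel &stack, std::size_t frame_size) {
      Trace t{};
      std::size_t outer_save = stack.save();
      // Frame for the outer bar; its address doubles as the inner bar's
      // hidden struct return pointer.
      t.outer_frame = stack.alloca_bytes(frame_size);

      std::size_t inner_save = stack.save();
      // Frame for the inner bar; Foo() would be constructed in it.
      t.inner_frame = stack.alloca_bytes(frame_size);
      t.sp_at_inner_call = stack.sp();  // inner frame is on top
      stack.restore(inner_save);        // inner frame released

      t.sp_at_outer_call = stack.sp();  // outer frame is on top again
      stack.restore(outer_save);        // outer frame released
      t.sp_at_end = stack.sp();
      return t;
    }

With a stack top of 1024 and 8-byte frames, both frames are live during
the inner call, and each restore returns the stack pointer to the level
at which the next call expects to find its arguments.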

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, LLVM models
this as a write of undef to all of the inalloca values passed to the
call rather than as a stack adjustment. Frontends should still restore
the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.