===============================
MCJIT Design and Implementation
===============================

Introduction
============

This document describes the internal workings of the MCJIT execution
engine and the RuntimeDyld component. It is intended as a high-level
overview of the implementation, showing the flow and interactions of
objects throughout the code generation and dynamic loading process.

Engine Creation
===============

In most cases, an EngineBuilder object is used to create an instance of
the MCJIT execution engine. The EngineBuilder takes an llvm::Module
object as an argument to its constructor. The client may then set various
options that will later be passed along to the MCJIT engine, including the
selection of MCJIT as the engine type to be created. Of particular
interest is the EngineBuilder::setMCJITMemoryManager function. If the
client does not explicitly create a memory manager at this time, a default
memory manager (specifically SectionMemoryManager) will be created when
the MCJIT engine is instantiated.

Once the options have been set, a client calls EngineBuilder::create to
create an instance of the MCJIT engine. If the client does not use the
form of this function that takes a TargetMachine as a parameter, a new
TargetMachine will be created based on the target triple associated with
the Module that was used to create the EngineBuilder.

.. image:: MCJIT-engine-builder.png

EngineBuilder::create will call the static MCJIT::createJIT function,
passing in its pointers to the module, memory manager and target machine
objects, all of which will subsequently be owned by the MCJIT object.

The MCJIT class has a member variable, Dyld, which contains an instance of
the RuntimeDyld wrapper class. This member will be used for
communications between MCJIT and the actual RuntimeDyldImpl object that
gets created when an object is loaded.

.. image:: MCJIT-creation.png

Upon creation, MCJIT holds a pointer to the Module object that it received
from EngineBuilder, but it does not immediately generate code for this
module. Code generation is deferred until either the
MCJIT::finalizeObject method is called explicitly or a function that
requires the code to have been generated, such as
MCJIT::getPointerToFunction, is called.

Code Generation
===============

When code generation is triggered, as described above, MCJIT will first
attempt to retrieve an object image from its ObjectCache member, if one
has been set. If a cached object image cannot be retrieved, MCJIT will
call its emitObject method. MCJIT::emitObject uses a local PassManager
instance and creates a new ObjectBufferStream instance, both of which it
passes to TargetMachine::addPassesToEmitMC before calling PassManager::run
on the Module with which it was created.

.. image:: MCJIT-load.png

The PassManager::run call causes the MC code generation mechanisms to emit
a complete relocatable binary object image (in either ELF or MachO
format, depending on the target) into the ObjectBufferStream object, which
is flushed to complete the process. If an ObjectCache is being used, the
image will be passed to the ObjectCache here.
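
The cache-first flow described above can be pictured with a small,
self-contained model. This is only a sketch and not LLVM code: ToyObjectCache,
ToyEngine and all of their members are hypothetical names, strings stand in
for modules and object images, and the emitObject body is a placeholder for
the PassManager run.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <string>

   // Hypothetical stand-in for an object cache: maps a module name to a
   // previously emitted object image.
   class ToyObjectCache {
   public:
     // Returns true and fills Image if an image is cached for ModuleName.
     bool getObject(const std::string &ModuleName, std::string &Image) const {
       std::map<std::string, std::string>::const_iterator It =
           Images.find(ModuleName);
       if (It == Images.end())
         return false;
       Image = It->second;
       return true;
     }

     // Called by the engine after it has emitted a fresh image.
     void notifyObjectCompiled(const std::string &ModuleName,
                               const std::string &Image) {
       Images[ModuleName] = Image;
     }

   private:
     std::map<std::string, std::string> Images;
   };

   // Hypothetical engine: consults the cache first and runs the placeholder
   // code generator only on a miss. The cache pointer may be null.
   class ToyEngine {
   public:
     explicit ToyEngine(ToyObjectCache *Cache) : Cache(Cache), EmitCount(0) {}

     std::string getObjectImage(const std::string &ModuleName) {
       std::string Image;
       if (Cache && Cache->getObject(ModuleName, Image))
         return Image;                     // cache hit: no code generation
       Image = emitObject(ModuleName);     // cache miss: generate the image
       if (Cache)
         Cache->notifyObjectCompiled(ModuleName, Image);
       return Image;
     }

     int emitCount() const { return EmitCount; }

   private:
     // Placeholder for the real code generation step.
     std::string emitObject(const std::string &ModuleName) {
       assert(!ModuleName.empty() && "module must have a name");
       ++EmitCount;
       return "object-image-for-" + ModuleName;
     }

     ToyObjectCache *Cache;
     int EmitCount;
   };

In this model, two ToyEngine objects that share one ToyObjectCache and
request the same module run the emit step only once; the second engine is
served from the cache.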

At this point, the ObjectBufferStream contains the raw object image.
Before the code can be executed, the code and data sections from this
image must be loaded into suitable memory, relocations must be applied,
memory permissions must be set and the code cache must be invalidated (if
required).

Object Loading
==============

Once an object image has been obtained, either through code generation or
having been retrieved from an ObjectCache, it is passed to RuntimeDyld to
be loaded. The RuntimeDyld wrapper class examines the object to determine
its file format and creates an instance of either RuntimeDyldELF or
RuntimeDyldMachO (both of which derive from the RuntimeDyldImpl base
class) and calls the RuntimeDyldImpl::loadObject method to perform the
actual loading.
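
One plausible way to make the format decision described above is to look at
the leading magic bytes of the image. The sketch below is an illustration
under that assumption and does not reproduce LLVM's own detection logic;
ToyFormat, detectFormat and loaderNameFor are hypothetical names. The magic
values themselves (0x7F 'E' 'L' 'F' for ELF, 0xFEEDFACE and 0xFEEDFACF for
32-bit and 64-bit Mach-O) come from the respective file format definitions.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstring>

   enum class ToyFormat { ELF, MachO, Unknown };

   // Classifies an object image by its first four bytes.
   ToyFormat detectFormat(const unsigned char *Data, std::size_t Size) {
     assert((Data != nullptr || Size == 0) && "null buffer with nonzero size");
     if (Size < 4)
       return ToyFormat::Unknown;

     static const unsigned char ElfMagic[4] = {0x7F, 'E', 'L', 'F'};
     if (std::memcmp(Data, ElfMagic, 4) == 0)
       return ToyFormat::ELF;

     // Mach-O headers start with a 32-bit magic stored in the byte order of
     // the target, so read the word both ways.
     const uint32_t BigEndian =
         (uint32_t(Data[0]) << 24) | (uint32_t(Data[1]) << 16) |
         (uint32_t(Data[2]) << 8) | uint32_t(Data[3]);
     const uint32_t LittleEndian =
         (uint32_t(Data[3]) << 24) | (uint32_t(Data[2]) << 16) |
         (uint32_t(Data[1]) << 8) | uint32_t(Data[0]);
     const uint32_t Candidates[2] = {BigEndian, LittleEndian};
     for (uint32_t Magic : Candidates)
       if (Magic == 0xFEEDFACEu || Magic == 0xFEEDFACFu)
         return ToyFormat::MachO;

     return ToyFormat::Unknown;
   }

   // Names the loader class the text associates with each format; returns
   // a null pointer when no loader applies.
   const char *loaderNameFor(ToyFormat Format) {
     switch (Format) {
     case ToyFormat::ELF:
       return "RuntimeDyldELF";
     case ToyFormat::MachO:
       return "RuntimeDyldMachO";
     case ToyFormat::Unknown:
       break;
     }
     return nullptr;
   }

Passing a buffer that begins with 0x7F 'E' 'L' 'F' to detectFormat selects
the ELF branch, and a buffer shorter than four bytes is always reported as
Unknown.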

.. image:: MCJIT-dyld-load.png

RuntimeDyldImpl::loadObject begins by creating an ObjectImage instance
from the ObjectBuffer it received. ObjectImage, which wraps the
ObjectFile class, is a helper class which parses the binary object image
and provides access to the information contained in the format-specific
headers, including section, symbol and relocation information.

RuntimeDyldImpl::loadObject then iterates through the symbols in the
image. Information about common symbols is collected for later use. For
each function or data symbol, the associated section is loaded into memory
and the symbol is stored in a symbol table map data structure. When the
iteration is complete, a section is emitted for the common symbols.

Next, RuntimeDyldImpl::loadObject iterates through the sections in the
object image and for each section iterates through the relocations for
that section. For each relocation, it calls the format-specific
processRelocationRef method, which will examine the relocation and store
it in one of two data structures, a section-based relocation list map and
an external symbol relocation map.

.. image:: MCJIT-load-object.png

When RuntimeDyldImpl::loadObject returns, all of the code and data
sections for the object will have been loaded into memory allocated by the
memory manager and relocation information will have been prepared, but the
relocations have not yet been applied and the generated code is still not
ready to be executed.
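
The bookkeeping described above, where each relocation is recorded in one of
two maps and nothing is patched yet, can be modelled in a few lines. This is
a simplified sketch rather than the RuntimeDyld data structures:
ToyRelocation, ToySymbolLoc and ToyRelocationRecorder are hypothetical names,
and the filing rule (a symbol defined in the object goes under its defining
section, anything else goes under its name) is an assumption drawn from the
description in this document.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>
   #include <map>
   #include <string>
   #include <vector>

   // One pending fix-up: where to patch, and what to add to the base
   // address of the thing being referenced.
   struct ToyRelocation {
     unsigned TargetSectionID; // section whose bytes will be patched
     uint64_t Offset;          // offset of the patch site in that section
     int64_t Addend;           // added to the referenced base address
   };

   // Where a symbol defined in the object lives.
   struct ToySymbolLoc {
     unsigned SectionID;
     uint64_t Offset;
   };

   class ToyRelocationRecorder {
   public:
     void defineSymbol(const std::string &Name, unsigned SectionID,
                       uint64_t Offset) {
       assert(SymbolTable.count(Name) == 0 && "symbol defined twice");
       ToySymbolLoc Loc = {SectionID, Offset};
       SymbolTable[Name] = Loc;
     }

     // Records a relocation without applying it.
     void recordRelocation(const std::string &SymbolName,
                           unsigned TargetSectionID, uint64_t Offset,
                           int64_t Addend) {
       std::map<std::string, ToySymbolLoc>::const_iterator It =
           SymbolTable.find(SymbolName);
       if (It != SymbolTable.end()) {
         // Known symbol: fold its offset into the addend and file the
         // relocation under the section that contains the symbol.
         ToyRelocation R = {TargetSectionID, Offset,
                            Addend + static_cast<int64_t>(It->second.Offset)};
         SectionRelocations[It->second.SectionID].push_back(R);
       } else {
         // Unknown symbol: keep it by name until it can be looked up.
         ToyRelocation R = {TargetSectionID, Offset, Addend};
         ExternalSymbolRelocations[SymbolName].push_back(R);
       }
     }

     std::map<std::string, ToySymbolLoc> SymbolTable;
     std::map<unsigned, std::vector<ToyRelocation> > SectionRelocations;
     std::map<std::string, std::vector<ToyRelocation> >
         ExternalSymbolRelocations;
   };

After a symbol has been defined in section 0, a reference to it from section
1 lands in SectionRelocations[0], while a reference to an undefined name
such as printf lands in ExternalSymbolRelocations.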

[Currently (as of August 2013) the MCJIT engine will immediately apply
relocations when loadObject completes. However, this shouldn't be
happening. Because the code may have been generated for a remote target,
the client should be given a chance to re-map the section addresses before
relocations are applied. It is possible to apply relocations multiple
times, but in the case where addresses are to be re-mapped, this first
application is wasted effort.]

Address Remapping
=================

At any time after initial code has been generated and before
finalizeObject is called, the client can remap the address of sections in
the object. Typically this is done because the code was generated for an
external process and is being mapped into that process's address space.
The client remaps the section address by calling MCJIT::mapSectionAddress.
This should happen before the section memory is copied to its new
location.

When MCJIT::mapSectionAddress is called, MCJIT passes the call on to
RuntimeDyldImpl (via its Dyld member). RuntimeDyldImpl stores the new
address in an internal data structure but does not update the code at this
time, since other sections are likely to change.
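
The key property of remapping, that only the recorded address changes while
the section contents stay untouched, is easy to see in a toy model. The
sketch below is not the RuntimeDyldImpl implementation: ToySection and
ToySectionTable are hypothetical names, and plain integers stand in for the
local and target addresses.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>
   #include <utility>
   #include <vector>

   struct ToySection {
     uint64_t LocalAddress;      // where the loader placed the section
     uint64_t TargetAddress;     // where the section will live when executed
     std::vector<uint8_t> Bytes; // section contents held by the loader
   };

   class ToySectionTable {
   public:
     // Registers a section; until remapped, it is assumed to run in place.
     unsigned addSection(uint64_t LocalAddress, std::vector<uint8_t> Bytes) {
       ToySection S;
       S.LocalAddress = LocalAddress;
       S.TargetAddress = LocalAddress;
       S.Bytes = std::move(Bytes);
       Sections.push_back(std::move(S));
       return static_cast<unsigned>(Sections.size() - 1);
     }

     // Records the new target address only; no bytes are rewritten here.
     // Returns false if no section was loaded at LocalAddress.
     bool mapSectionAddress(uint64_t LocalAddress, uint64_t TargetAddress) {
       for (ToySection &S : Sections) {
         if (S.LocalAddress == LocalAddress) {
           S.TargetAddress = TargetAddress;
           return true;
         }
       }
       return false;
     }

     const ToySection &section(unsigned ID) const {
       assert(ID < Sections.size() && "unknown section ID");
       return Sections[ID];
     }

   private:
     std::vector<ToySection> Sections;
   };

Calling mapSectionAddress several times for different sections is cheap in
this model, because the cost of patching code is paid once, later, when the
relocations are applied against the final addresses.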

When the client is finished remapping section addresses, it will call
MCJIT::finalizeObject to complete the remapping process.

Final Preparations
==================

When MCJIT::finalizeObject is called, MCJIT calls
RuntimeDyld::resolveRelocations. This function will attempt to locate any
external symbols and then apply all relocations for the object.

External symbols are resolved by calling the memory manager's
getPointerToNamedFunction method. The memory manager will return the
address of the requested symbol in the target address space. (Note, this
may not be a valid pointer in the host process.) RuntimeDyld will then
iterate through the list of relocations it has stored which are associated
with this symbol and invoke the resolveRelocation method which, through a
format-specific implementation, will apply the relocation to the loaded
section memory.

Next, RuntimeDyld::resolveRelocations iterates through the list of
sections and, for each section, iterates through the list of saved
relocations that reference that section, calling resolveRelocation for
each entry in this list. The relocation list here is a list of
relocations for which the symbol associated with the relocation is located
in the section associated with the list. Each of these relocations will
have a target location at which the relocation will be applied that is
likely located in a different section.
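
The two passes described above, external symbols first and then the
per-section lists, can be sketched as a toy linker. This is a model under
simplifying assumptions, not the RuntimeDyld code: ToySection, ToyRelocation
and ToyLinker are hypothetical names, the lookup callback stands in for the
memory manager's symbol lookup, and every relocation is treated as an
absolute 64-bit little-endian address, whereas real object formats define
many relocation kinds.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>
   #include <functional>
   #include <map>
   #include <string>
   #include <vector>

   struct ToySection {
     std::vector<uint8_t> Bytes; // loaded section contents
     uint64_t TargetAddress;     // final address after any remapping
   };

   struct ToyRelocation {
     unsigned TargetSectionID; // section whose bytes will be patched
     uint64_t Offset;          // offset of the patch site in that section
     int64_t Addend;           // added to the referenced base address
   };

   class ToyLinker {
   public:
     std::vector<ToySection> Sections;
     // Keyed by the section that contains the referenced symbol.
     std::map<unsigned, std::vector<ToyRelocation> > SectionRelocations;
     // Keyed by the name of a symbol not defined in the object.
     std::map<std::string, std::vector<ToyRelocation> >
         ExternalSymbolRelocations;

     void resolveRelocations(
         const std::function<uint64_t(const std::string &)> &LookupExternal) {
       // Pass 1: look up each external symbol and patch its references.
       for (const auto &Entry : ExternalSymbolRelocations) {
         const uint64_t Address = LookupExternal(Entry.first);
         for (const ToyRelocation &R : Entry.second)
           resolveRelocation(R, Address);
       }
       // Pass 2: patch references to symbols that live in each section,
       // using that section's final target address as the base.
       for (const auto &Entry : SectionRelocations) {
         assert(Entry.first < Sections.size() && "unknown section ID");
         const uint64_t Base = Sections[Entry.first].TargetAddress;
         for (const ToyRelocation &R : Entry.second)
           resolveRelocation(R, Base);
       }
     }

     // Reads back the 64-bit little-endian value stored at a patch site.
     uint64_t readPatched(unsigned SectionID, uint64_t Offset) const {
       const std::vector<uint8_t> &Bytes = Sections[SectionID].Bytes;
       assert(Offset + 8 <= Bytes.size() && "read out of range");
       uint64_t Value = 0;
       for (unsigned I = 0; I < 8; ++I)
         Value |= static_cast<uint64_t>(Bytes[Offset + I]) << (8 * I);
       return Value;
     }

   private:
     // Writes Base + Addend into the target section at the patch site.
     void resolveRelocation(const ToyRelocation &R, uint64_t Base) {
       std::vector<uint8_t> &Bytes = Sections[R.TargetSectionID].Bytes;
       assert(R.Offset + 8 <= Bytes.size() && "patch site out of range");
       const uint64_t Value = Base + static_cast<uint64_t>(R.Addend);
       for (unsigned I = 0; I < 8; ++I)
         Bytes[R.Offset + I] = static_cast<uint8_t>(Value >> (8 * I));
     }
   };

Because the section base is read only when resolveRelocations runs, changing
a section's TargetAddress beforehand is enough to make every patched value
reflect the remapped location.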

.. image:: MCJIT-resolve-relocations.png

Once relocations have been applied as described above, MCJIT calls
RuntimeDyld::getEHFrameSection and, if a non-zero result is returned,
passes the section data to the memory manager's registerEHFrames method.
This allows the memory manager to call any desired target-specific
functions, such as registering the EH frame information with a debugger.

Finally, MCJIT calls the memory manager's finalizeMemory method. In this
method, the memory manager will invalidate the target code cache, if
necessary, and apply final permissions to the memory pages it has
allocated for code and data memory.