============================================================
Extending LLVM: Adding instructions, intrinsics, types, etc.
============================================================

Introduction and Warning
========================

During the course of using LLVM, you may wish to customize it for your research
project or for experimentation. At this point, you may realize that you need to
add something to LLVM, whether it be a new fundamental type, a new intrinsic
function, or a whole new instruction.

When you come to this realization, stop and think. Do you really need to extend
LLVM? Is it a new fundamental capability that LLVM does not support in its
current incarnation, or can it be synthesized from existing LLVM elements? If
you are not sure, ask on the `LLVM-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ list. The reason is that
extending LLVM gets involved: you need to update all the different passes
that you intend to use with your extension, and there are *many* LLVM analyses
and transformations, so it may be quite a bit of work.

Adding an `intrinsic function`_ is far easier than adding an
instruction, and is transparent to optimization passes. If your added
functionality can be expressed as a function call, an intrinsic function is the
method of choice for LLVM extension.

Before you invest a significant amount of effort into a non-trivial extension,
**ask on the list** if what you are looking to do can be done with
existing infrastructure, or if someone else is already working on
it. You will save yourself a lot of time and effort by doing so.

.. _intrinsic function:

Adding a new intrinsic function
===============================

Adding a new intrinsic function to LLVM is much easier than adding a new
instruction. Almost all extensions to LLVM should start as an intrinsic
function and then be turned into an instruction if warranted.

#. ``llvm/docs/LangRef.rst``:

   Document the intrinsic. Decide whether it is code generator specific and
   what the restrictions are. Talk to other people about it so that you are
   sure it's a good idea.

#. ``llvm/include/llvm/IR/Intrinsics*.td``:

   Add an entry for your intrinsic. Describe its memory access characteristics
   for optimization (this controls whether it will be DCE'd, CSE'd, etc). Note
   that any intrinsic using one of the ``llvm_any*_ty`` types for an argument or
   return type will be deemed by ``tblgen`` as overloaded and the corresponding
   suffix will be required on the intrinsic's name.

#. ``llvm/lib/Analysis/ConstantFolding.cpp``:

   If it is possible to constant fold your intrinsic, add support for it in the
   ``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions.

#. ``llvm/test/*``:

   Add test cases for your intrinsic to the test suite.
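
The constant-folding step can be pictured with a small standalone model. The
sketch below is not LLVM's API: ``IntrinsicID``, ``canConstantFold``, and
``foldCall`` are hypothetical stand-ins for the roles played by
``canConstantFoldCallTo`` and ``ConstantFoldCall``, and they work on plain
32-bit integers rather than on LLVM constants.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical intrinsic IDs, invented for this sketch.
enum class IntrinsicID { Bswap32, Ctpop32, ReadCycleCounter };

// Cheap check on the callee alone: can a call to it ever be folded?
bool canConstantFold(IntrinsicID id) {
  switch (id) {
  case IntrinsicID::Bswap32:
  case IntrinsicID::Ctpop32:
    return true;
  case IntrinsicID::ReadCycleCounter:
    return false; // the result depends on run-time state
  }
  return false;
}

// Evaluates the call when every argument is a known constant.
// Returns false (leaving `result` untouched) when no folding is possible.
bool foldCall(IntrinsicID id, const std::vector<std::uint32_t> &args,
              std::uint32_t &result) {
  if (!canConstantFold(id) || args.size() != 1)
    return false;
  std::uint32_t x = args[0];
  switch (id) {
  case IntrinsicID::Bswap32:
    // Reverse the four bytes of x.
    result = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
             ((x << 8) & 0x00FF0000u) | (x << 24);
    return true;
  case IntrinsicID::Ctpop32: {
    // Count set bits by clearing the lowest one until none remain.
    std::uint32_t n = 0;
    for (; x != 0; x &= x - 1)
      ++n;
    result = n;
    return true;
  }
  default:
    return false;
  }
}
```

Keeping the "can this fold at all" query separate from the evaluation lets a
caller reject unfoldable callees before it gathers constant arguments.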

Once the intrinsic has been added to the system, you must add code generator
support for it. Generally you must do the following steps:

Add support to the .td file for the target(s) of your choice in
``lib/Target/*/*.td``.

This is usually a matter of adding a pattern to the .td file that matches the
intrinsic, though it may also require adding the instructions you want to
generate. There are lots of examples in the PowerPC and X86 backends
to follow.

Adding a new SelectionDAG node
==============================

As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than
adding a new instruction. New nodes are often added to help represent
instructions common to many targets. These nodes often map to an LLVM
instruction (add, sub) or intrinsic (byteswap, population count). In other
cases, new nodes have been added to allow many targets to perform a common task
(converting between floating point and integer representation) or capture more
complicated behavior in a single node (rotate).

#. ``include/llvm/CodeGen/ISDOpcodes.h``:

   Add an enum value for the new SelectionDAG node.

#. ``lib/CodeGen/SelectionDAG/SelectionDAG.cpp``:

   Add code to print the node to ``getOperationName``. If your new node can be
   evaluated at compile time when given constant arguments (such as an add of a
   constant with another constant), find the ``getNode`` method that takes the
   appropriate number of arguments, and add a case for your node to the switch
   statement that performs constant folding for nodes that take the same number
   of arguments as your new node.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add code to `legalize, promote, and expand
   <CodeGenerator.html#selectiondag_legalize>`_ the node as necessary. At a
   minimum, you will need to add a case statement for your node in
   ``LegalizeOp`` which calls ``LegalizeOp`` on the node's operands, and returns
   a new node if any of the operands changed as a result of being legalized. It
   is likely that not all targets supported by the SelectionDAG framework will
   natively support the new node. In this case, you must also add code in your
   node's case statement in ``LegalizeOp`` to Expand your node into simpler,
   legal operations. The case for ``ISD::UREM`` for expanding a remainder into
   a divide, multiply, and a subtract is a good example.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   If targets may support the new node being added only at certain sizes, you
   will also need to add code to your node's case statement in ``LegalizeOp``
   to Promote your node's operands to a larger size, and perform the correct
   operation. You will also need to add code to ``PromoteOp`` to do this.
   For a good example, see ``ISD::BSWAP``, which promotes its operand to
   a wider size, performs the byteswap, and then shifts the correct bytes right
   to emulate the narrower byteswap in the wider type.

#. ``lib/CodeGen/SelectionDAG/LegalizeDAG.cpp``:

   Add a case for your node in ``ExpandOp`` to teach the legalizer how to
   perform the action represented by the new node on a value that has been split
   into high and low halves. This case will be used to support your node with a
   64-bit operand on a 32-bit target.

#. ``lib/CodeGen/SelectionDAG/DAGCombiner.cpp``:

   If your node can be combined with itself, or other existing nodes in a
   peephole-like fashion, add a visit function for it, and call that function
   from ``visit``. There are several good examples for simple combines you can
   do; ``visitFABS`` and ``visitSRL`` are good starting places.

#. ``lib/Target/PowerPC/PPCISelLowering.cpp``:

   Each target has an implementation of the ``TargetLowering`` class, usually in
   its own file (although some targets include it in the same file as the
   DAGToDAGISel). The default behavior for a target is to assume that your new
   node is legal for all types that are legal for that target. If this target
   does not natively support your node, then tell the target to either Promote
   it (if it is supported at a larger type) or Expand it. This will cause the
   code you wrote in ``LegalizeOp`` above to decompose your new node into other
   legal nodes for this target.

#. ``include/llvm/Target/TargetSelectionDAG.td``:

   Most current targets supported by LLVM generate code using the DAGToDAG
   method, where SelectionDAG nodes are pattern matched to target-specific
   nodes, which represent individual instructions. In order for the targets to
   match an instruction to your new node, you must add a def for that node to
   the list in this file, with the appropriate type constraints. Look at
   ``add``, ``bswap``, and ``fadd`` for examples.

#. ``lib/Target/PowerPC/PPCInstrInfo.td``:

   Each target has a tablegen file that describes the target's instruction set.
   For targets that use the DAGToDAG instruction selection framework, add a
   pattern for your new node that uses one or more target nodes. Documentation
   for this is a bit sparse right now, but there are several decent examples.
   See the patterns for ``rotl`` in ``PPCInstrInfo.td``.

#. TODO: document complex patterns.

#. ``llvm/test/CodeGen/*``:

   Add test cases for your new node to the test suite.
   ``llvm/test/CodeGen/X86/bswap.ll`` is a good example.
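
The three legalization strategies described in the ``LegalizeDAG.cpp`` steps
above (Expand, Promote, and splitting a value into halves) can be illustrated
with ordinary integer arithmetic. The sketch below is a standalone model, not
SelectionDAG code: every function and type name in it is hypothetical, and the
64-bit add is chosen only as a simple example of an operation on split halves.

```cpp
#include <cassert>
#include <cstdint>

// Expand: with no native remainder, compute a % b as a - (a / b) * b,
// using only a divide, a multiply, and a subtract.
std::uint32_t expandURem(std::uint32_t a, std::uint32_t b) {
  assert(b != 0 && "remainder by zero is undefined");
  std::uint32_t quotient = a / b;
  return a - quotient * b;
}

// Stand-in for the byte swap the target supports natively (32-bit only).
std::uint32_t bswap32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Promote: widen the 16-bit operand, swap in the wider type, then shift
// the interesting bytes back down to emulate the narrow swap.
std::uint16_t promoteBSwap16(std::uint16_t x) {
  std::uint32_t wide = x;                 // zero-extend to 32 bits
  std::uint32_t swapped = bswap32(wide);  // swapped bytes land in the top half
  return static_cast<std::uint16_t>(swapped >> 16);
}

// A 64-bit value split into 32-bit low and high halves.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Split halves: add the low words, detect the carry, add it to the high words.
Halves expandAdd64(Halves a, Halves b) {
  Halves r;
  r.lo = a.lo + b.lo;                          // wraps modulo 2^32
  std::uint32_t carry = r.lo < a.lo ? 1u : 0u; // wrapped iff sum < operand
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

In each case the result must match what a native operation would produce, which
is also what the test cases for a new node should check.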

Adding a new instruction
========================

.. warning::

  Adding instructions changes the bitcode format, and it will take some effort
  to maintain compatibility with the previous version. Only add an instruction
  if it is absolutely necessary.

#. ``llvm/include/llvm/IR/Instruction.def``:

   add a number for your instruction and an enum name

#. ``llvm/include/llvm/IR/Instructions.h``:

   add a definition for the class that will represent your instruction

#. ``llvm/include/llvm/IR/InstVisitor.h``:

   add a prototype for a visitor to your new instruction type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add a new token to parse your instruction from an assembly text file

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add the grammar on how your instruction can be read and what it will
   construct as a result

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   add a case for your instruction and how it will be parsed from bitcode

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   add a case for your instruction and how it will be written to bitcode

#. ``llvm/lib/IR/Instruction.cpp``:

   add a case for how your instruction will be printed out to assembly

#. ``llvm/lib/IR/Instructions.cpp``:

   implement the class you defined in ``llvm/include/llvm/IR/Instructions.h``

#. Test your instruction

#. ``llvm/lib/Target/*``:

   add support for your instruction to code generators, or add a lowering pass.

#. ``llvm/test/*``:

   add your test cases to the test suite.

Also, you need to implement (or modify) any analyses or passes that you want to
understand this new instruction.
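
To see why existing passes need attention, consider a standalone model of an
opcode list, a name printer, and a visitor. This is a sketch under stated
assumptions rather than LLVM code: ``Opcode``, ``MyNewOp``, ``Visitor``, and
``ArithCounter`` are hypothetical names that only imitate the roles of
``Instruction.def``, the opcode-name printer, and ``InstVisitor``.

```cpp
#include <string>

// Hypothetical opcode numbering; MyNewOp is the newly added instruction.
enum Opcode { Add = 1, Sub = 2, MyNewOp = 3 };

// Maps an opcode to the mnemonic used when printing textual assembly.
std::string getOpcodeName(Opcode op) {
  switch (op) {
  case Add:     return "add";
  case Sub:     return "sub";
  case MyNewOp: return "mynewop";
  }
  return "<invalid>";
}

struct Inst {
  Opcode op;
};

// CRTP visitor: every opcode has a hook that falls back to visitInst unless
// the derived class overrides it.
template <typename Derived> struct Visitor {
  void visit(const Inst &i) {
    Derived &self = static_cast<Derived &>(*this);
    switch (i.op) {
    case Add:     self.visitAdd(i); break;
    case Sub:     self.visitSub(i); break;
    case MyNewOp: self.visitMyNewOp(i); break;
    }
  }
  void visitAdd(const Inst &i) { static_cast<Derived &>(*this).visitInst(i); }
  void visitSub(const Inst &i) { static_cast<Derived &>(*this).visitInst(i); }
  void visitMyNewOp(const Inst &i) {
    static_cast<Derived &>(*this).visitInst(i);
  }
  void visitInst(const Inst &) {}
};

// An analysis written before MyNewOp existed. It still compiles, but it
// treats the new instruction as "other" until someone adds visitMyNewOp.
struct ArithCounter : Visitor<ArithCounter> {
  int arith = 0;
  int other = 0;
  void visitAdd(const Inst &) { ++arith; }
  void visitSub(const Inst &) { ++arith; }
  void visitInst(const Inst &) { ++other; }
};
```

In this model, nothing forces ``ArithCounter`` to handle ``MyNewOp``, so each
pass has to be reviewed by hand to decide whether the generic fallback is
correct for it.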

Adding a new type
=================

.. warning::

  Adding new types changes the bitcode format, and will break compatibility with
  currently-existing LLVM installations. Only add new types if it is absolutely
  necessary.

Adding a fundamental type
-------------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add enum for the new type; add static ``Type*`` for this type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:

   add mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*``

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add the new type to enum ``LLVMTypeKind`` and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   add ability to parse in the type from text assembly

#. ``llvm/lib/AsmParser/LLParser.cpp``:

   add a token for that type

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``static void WriteTypeTable(const ValueEnumerator &VE,
   BitstreamWriter &Stream)`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``bool BitcodeReader::ParseTypeTable()`` to read your data type

#. ``include/llvm/Bitcode/LLVMBitCodes.h``:

   add enum ``TypeCodes`` for the new type
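
The compatibility warning above follows from how a type table is serialized.
The sketch below is a standalone model, not the real bitcode format: the
``TypeID`` and ``TypeCode`` enums, their numeric values, and both functions
are hypothetical. It shows a writer and reader that must agree on a code for
every type, and a reader that rejects any code it does not know.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory type IDs; MyNewTy is the newly added type.
enum class TypeID { Void, Float, Double, MyNewTy };

// Hypothetical on-disk codes. Existing values are never renumbered;
// a new type gets a fresh, previously unused value.
enum TypeCode : std::uint8_t {
  CodeVoid = 2,
  CodeFloat = 3,
  CodeDouble = 4,
  CodeMyNewTy = 40
};

// Writer side: emit one code per type, in order.
std::vector<std::uint8_t> writeTypeTable(const std::vector<TypeID> &types) {
  std::vector<std::uint8_t> out;
  for (TypeID id : types) {
    switch (id) {
    case TypeID::Void:    out.push_back(CodeVoid); break;
    case TypeID::Float:   out.push_back(CodeFloat); break;
    case TypeID::Double:  out.push_back(CodeDouble); break;
    case TypeID::MyNewTy: out.push_back(CodeMyNewTy); break;
    }
  }
  return out;
}

// Reader side: rebuild the type list, failing on any unknown code. A reader
// built before MyNewTy existed would lack the CodeMyNewTy case and fail here.
bool readTypeTable(const std::vector<std::uint8_t> &in,
                   std::vector<TypeID> &types) {
  types.clear();
  for (std::uint8_t code : in) {
    switch (code) {
    case CodeVoid:    types.push_back(TypeID::Void); break;
    case CodeFloat:   types.push_back(TypeID::Float); break;
    case CodeDouble:  types.push_back(TypeID::Double); break;
    case CodeMyNewTy: types.push_back(TypeID::MyNewTy); break;
    default:
      return false; // unknown type code: the file cannot be read
    }
  }
  return true;
}
```

Because the two switch statements have to stay in sync, the writer, the
reader, and the list of type codes are normally changed together.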

Adding a derived type
---------------------

#. ``llvm/include/llvm/IR/Type.h``:

   add enum for the new type; add a forward declaration of the type also

#. ``llvm/include/llvm/IR/DerivedTypes.h``:

   add a new class to represent the new type in the class hierarchy; add a
   forward declaration to the TypeMap value type

#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:

   add support for derived type, notably ``enum TypeID`` and ``is``, ``get``
   methods.

#. ``llvm/include/llvm-c/Core.h`` and ``llvm/lib/IR/Core.cpp``:

   add the new type to enum ``LLVMTypeKind`` and modify
   ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type

#. ``llvm/lib/AsmParser/LLLexer.cpp``:

   modify ``lltok::Kind LLLexer::LexIdentifier()`` to add ability to
   parse in the type from text assembly

#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:

   modify ``static void WriteTypeTable(const ValueEnumerator &VE,
   BitstreamWriter &Stream)`` to serialize your type

#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:

   modify ``bool BitcodeReader::ParseTypeTable()`` to read your data type

#. ``include/llvm/Bitcode/LLVMBitCodes.h``:

   add enum ``TypeCodes`` for the new type

#. ``llvm/lib/IR/AsmWriter.cpp``:

   modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)``
   to output the new derived type
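
The ``is`` and ``get`` methods and the printing hook mentioned in the steps
above can be modeled in a few lines. This is a standalone sketch, not LLVM's
class hierarchy: ``Context``, ``MyDerivedType``, its ``<my N x T>`` syntax,
and all method names are hypothetical. It assumes that a ``get`` method
returns one shared object per distinct set of parameters, so two types are
equal exactly when their pointers are equal.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

enum class TypeID { Integer, MyDerived };

struct Type {
  TypeID id;
  explicit Type(TypeID id) : id(id) {}
  virtual ~Type() = default;
  // The "is" query for the new derived type.
  bool isMyDerivedTy() const { return id == TypeID::MyDerived; }
  // Textual form, standing in for the type printer.
  virtual std::string print() const = 0;
};

struct IntegerType : Type {
  unsigned bits;
  explicit IntegerType(unsigned bits) : Type(TypeID::Integer), bits(bits) {}
  std::string print() const override { return "i" + std::to_string(bits); }
};

// Hypothetical derived type: a count of elements of some other type.
struct MyDerivedType : Type {
  const Type *element;
  unsigned count;
  MyDerivedType(const Type *element, unsigned count)
      : Type(TypeID::MyDerived), element(element), count(count) {}
  std::string print() const override {
    return "<my " + std::to_string(count) + " x " + element->print() + ">";
  }
};

// Owns every type and hands out one shared instance per distinct key.
class Context {
  std::map<unsigned, std::unique_ptr<IntegerType>> ints;
  std::map<const Type *, std::map<unsigned, std::unique_ptr<MyDerivedType>>>
      derived;

public:
  const IntegerType *getInt(unsigned bits) {
    std::unique_ptr<IntegerType> &slot = ints[bits];
    if (!slot)
      slot.reset(new IntegerType(bits));
    return slot.get();
  }

  // The "get" method: look up the existing instance or create it once.
  const MyDerivedType *getMyDerived(const Type *element, unsigned count) {
    assert(element && "element type must not be null");
    std::unique_ptr<MyDerivedType> &slot = derived[element][count];
    if (!slot)
      slot.reset(new MyDerivedType(element, count));
    return slot.get();
  }
};
```

With this scheme, asking twice for the same element type and count yields the
same pointer, while a different count yields a distinct type.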