========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out. This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen much and use emacs or vim, you can find an emacs
"TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:

The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``. It is not installed on the
system (or wherever your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To do anything useful, one of the `backends`_ must be selected. These backends
are selectable on the command line (type '``llvm-tblgen -help``' for a list).
For example, to get a list of all of the definitions that subclass a particular
type (which can be useful for building up an enum list of these records), use
the ``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records. There is also a general
backend which outputs all the records as a JSON data structure, enabled using
the ``-dump-json`` option.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way. You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.
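
As a rough illustration of the scripting route, the following Python sketch
lists all definitions derived from a given class. It does not run
``llvm-tblgen``; ``SAMPLE_DUMP`` is a tiny hand-written stand-in for real
``-dump-json`` output, and the ``!instanceof`` and ``!superclasses`` keys are
assumed to follow the layout described in the BackEnds document. The helper
name ``defs_of_class`` is hypothetical.

.. code-block:: python

  import json

  # Hand-written stand-in for `llvm-tblgen -dump-json` output (assumption:
  # real dumps are far larger and contain every field of every record).
  SAMPLE_DUMP = """
  {
    "!tablegen_json_version": 1,
    "!instanceof": {
      "Register": ["EBX", "EAX"],
      "Instruction": ["ADD32rr"]
    },
    "EAX": {"!name": "EAX", "!superclasses": ["Register"], "!anonymous": false},
    "EBX": {"!name": "EBX", "!superclasses": ["Register"], "!anonymous": false},
    "ADD32rr": {
      "!name": "ADD32rr",
      "!superclasses": ["Instruction", "X86Inst", "I"],
      "!anonymous": false,
      "isCommutable": 1
    }
  }
  """


  def defs_of_class(dump, class_name):
      """Return the sorted names of all definitions derived from class_name."""
      return sorted(dump.get("!instanceof", {}).get(class_name, []))


  dump = json.loads(SAMPLE_DUMP)
  for name in defs_of_class(dump, "Register"):
      # Each definition is itself a top-level key in the dump.
      print(name, dump[name]["!superclasses"])

With a real dump, the same lookup would play the role that ``-print-enums
-class=Register`` plays above, but with access to every field of each record.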

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions. This is a good way to see
what the various definitions expand to fully. Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record: that the instruction is part of the "X86"
namespace, the pattern that tells the code generator how to select it, that it
is a two-address instruction, that it has a particular encoding, and so on. The
contents and semantics of the information in the record are specific to the
needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.
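
The factoring can be pictured with a loose analogy in Python. This is only a
sketch of the idea (class-supplied defaults plus a few per-record overrides
yield a fully expanded record); it is not TableGen syntax and not how
``llvm-tblgen`` is implemented. ``CLASS_DEFAULTS`` is a hypothetical subset of
the fields shown in the dump above, and ``make_record`` is a made-up helper.

.. code-block:: python

  import copy

  # Hypothetical subset of the defaults that a class such as ``I`` supplies.
  CLASS_DEFAULTS = {
      "Namespace": "X86",
      "Defs": [],
      "isCommutable": 0,
      "isConvertibleToThreeAddress": 0,
      "mayLoad": 0,
      "mayStore": 0,
  }


  def make_record(name, defaults, **overrides):
      """Build an expanded record: class defaults plus per-record overrides."""
      unknown = sorted(set(overrides) - set(defaults))
      if unknown:
          raise KeyError(f"{name}: unknown field(s) {unknown}")
      record = {"name": name}
      record.update(copy.deepcopy(defaults))  # values inherited from the class
      record.update(overrides)                # values set the way ``let`` would
      return record


  add32rr = make_record(
      "ADD32rr",
      CLASS_DEFAULTS,
      Defs=["EFLAGS"],
      isCommutable=1,
      isConvertibleToThreeAddress=1,
  )
  for field, value in add32rr.items():
      print(f"{field} = {value}")

Only three fields are spelled out for ``ADD32rr``; everything else comes from
the shared defaults, which is the duplication that TableGen classes remove.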

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts like ``multiclass``, ``foreach``, ``let``, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values. Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one. Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which takes a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing those arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
  def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;

  def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Language Introduction <LangIntro>` for more general
information on using the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end. The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves. The real power
of TableGen, however, lies in parsing the source files into an internal
representation from which a back-end can generate anything you want.

TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding in)
or use in pre-processing, via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.
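
The pre-processed style can be sketched as follows. This Python snippet is only
an illustration of the shape of such output, not an actual TableGen back-end
(those are written in C++ inside TableGen, or as scripts over the JSON dump).
The macro name ``HANDLE_INSTRUCTION`` and the helper ``emit_xmacro_include``
are hypothetical.

.. code-block:: python

  def emit_xmacro_include(macro, names):
      """Return include-file text with one MACRO(name) line per record."""
      lines = [
          f"#ifndef {macro}",
          f'#error "define {macro} before including this file"',
          "#endif",
          "",
      ]
      lines += [f"{macro}({name})" for name in names]
      # Undefine the macro so each include site must define it afresh.
      lines += ["", f"#undef {macro}"]
      return "\n".join(lines) + "\n"


  text = emit_xmacro_include("HANDLE_INSTRUCTION",
                             ["ADD32rr", "ADD32rm", "ADD32ri"])
  print(text, end="")

A C or C++ consumer could then define ``HANDLE_INSTRUCTION(NAME)`` as ``NAME,``
inside an ``enum`` before including the file, and as ``#NAME,`` inside a string
array before including it a second time, so that one generated list serves both
contexts.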

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give the basic concepts virtually any
meaning via custom-made back-ends, which can pervert the original design and
make it very hard for newcomers to understand the evil TableGen file.

There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules. Others are suggesting we should move to fewer,
more powerful DSLs designed with specific purposes, or even re-using existing
DSLs.

Either way, this is a discussion that will likely span several years, if not
decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.