| 1 | ===================================== |
| 2 | The PDB DBI (Debug Info) Stream |
| 3 | ===================================== |
| 4 | |
| 5 | .. contents:: |
| 6 | :local: |
| 7 | |
| 8 | .. _dbi_intro: |
| 9 | |
| 10 | Introduction |
| 11 | ============ |
| 12 | |
| 13 | The PDB DBI Stream (Index 3) is one of the largest and most important streams |
| 14 | in a PDB file. It contains information about how the program was compiled |
| 15 | (e.g. compilation flags), the compilands (e.g. object files) that |
| 16 | were linked together to form the program, the source files that were used |
| 17 | to build the program, and references to other streams that contain more |
| 18 | detailed information about each compiland, such as the CodeView symbol records |
| 19 | contained within each compiland and the source and line information for |
| 20 | functions and other symbols within each compiland. |
| 21 | |
| 22 | |
| 23 | .. _dbi_header: |
| 24 | |
| 25 | Stream Header |
| 26 | ============= |
| 27 | At offset 0 of the DBI Stream is a header with the following layout: |
| 28 | |
| 29 | |
| 30 | .. code-block:: c++ |
| 31 | |
| 32 | struct DbiStreamHeader { |
| 33 | int32_t VersionSignature; |
| 34 | uint32_t VersionHeader; |
| 35 | uint32_t Age; |
| 36 | uint16_t GlobalStreamIndex; |
| 37 | uint16_t BuildNumber; |
| 38 | uint16_t PublicStreamIndex; |
| 39 | uint16_t PdbDllVersion; |
| 40 | uint16_t SymRecordStream; |
| 41 | uint16_t PdbDllRbld; |
| 42 | int32_t ModInfoSize; |
| 43 | int32_t SectionContributionSize; |
| 44 | int32_t SectionMapSize; |
| 45 | int32_t SourceInfoSize; |
| 46 | int32_t TypeServerSize; |
| 47 | uint32_t MFCTypeServerIndex; |
| 48 | int32_t OptionalDbgHeaderSize; |
| 49 | int32_t ECSubstreamSize; |
| 50 | uint16_t Flags; |
| 51 | uint16_t Machine; |
| 52 | uint32_t Padding; |
| 53 | }; |
| 54 | |
| 55 | - **VersionSignature** - Unknown meaning. Appears to always be ``-1``. |
| 56 | |
| 57 | - **VersionHeader** - A value from the following enum. |
| 58 | |
| 59 | .. code-block:: c++ |
| 60 | |
| 61 | enum class DbiStreamVersion : uint32_t { |
| 62 | VC41 = 930803, |
| 63 | V50 = 19960307, |
| 64 | V60 = 19970606, |
| 65 | V70 = 19990903, |
| 66 | V110 = 20091201 |
| 67 | }; |
| 68 | |
| 69 | Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be |
| 70 | ``V70``, and it is not clear what the other values are for. |
| 71 | |
| 72 | - **Age** - The number of times the PDB has been written. Equal to the same |
| 73 | field from the :ref:`PDB Stream header <pdb_stream_header>`. |
| 74 | |
| 75 | - **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`, |
| 76 | which contains CodeView symbol records for all global symbols. Actual records |
| 77 | are stored in the symbol record stream, and are referenced from this stream. |
| 78 | |
| 79 | - **BuildNumber** - A bitfield containing values representing the major and minor |
| 80 | version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the |
| 81 | program, with the following layout: |
| 82 | |
| 83 | .. code-block:: c++ |
| 84 | |
| 85 | uint16_t MinorVersion : 8; |
| 86 | uint16_t MajorVersion : 7; |
| 87 | uint16_t NewVersionFormat : 1; |
| 88 | |
| 89 | For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``. |
| 90 | If it is ``false``, the layout above does not apply and the reader should consult |
| 91 | the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for |
| 92 | further guidance. |
| 93 | |
| 94 | - **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`, |
| 95 | which contains CodeView symbol records for all public symbols. Actual records |
| 96 | are stored in the symbol record stream, and are referenced from this stream. |
| 97 | |
| 98 | - **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this |
| 99 | PDB. Note that this does not apply to LLVM, as LLVM does not use ``mspdbXXXX.dll``. |
| 100 | |
| 101 | - **SymRecordStream** - The stream containing all CodeView symbol records used |
| 102 | by the program. This is used for deduplication, so that many different |
| 103 | compilands can refer to the same symbols without having to include the full record |
| 104 | content inside of each module stream. |
| 105 | |
| 106 | - **PdbDllRbld** - Unknown. |
| 107 | |
| 108 | - **MFCTypeServerIndex** - The index of the MFC type server in the :ref:`dbi_type_server_substream`. |
| 109 | |
| 110 | - **Flags** - A bitfield with the following layout, containing various |
| 111 | information about how the program was built: |
| 112 | |
| 113 | .. code-block:: c++ |
| 114 | |
| 115 | uint16_t WasIncrementallyLinked : 1; |
| 116 | uint16_t ArePrivateSymbolsStripped : 1; |
| 117 | uint16_t HasConflictingTypes : 1; |
| 118 | uint16_t Reserved : 13; |
| 119 | |
| 120 | The only one of these that is not self-explanatory is ``HasConflictingTypes``. |
| 121 | Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``. |
| 122 | If it is passed to ``link.exe``, this field will be set. Otherwise it will |
| 123 | not be set. It is unclear what this flag does, although it seems to have |
| 124 | subtle implications on the algorithm used to look up type records. |
| 125 | |
| 126 | - **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__ |
| 127 | enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86). |
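
The two bitfields described above (``BuildNumber`` and ``Flags``) can be decoded with plain shifts and masks. The following is a minimal sketch, not LLVM's implementation. It assumes that bitfield members are allocated starting from the least significant bit (as MSVC does) and that the 16-bit values have already been read in host byte order. The names ``DbiBuildNumber``, ``DbiFlags``, ``decodeBuildNumber`` and ``decodeFlags`` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Decoded form of the 16-bit BuildNumber field (new version format only).
struct DbiBuildNumber {
  uint8_t Minor;   // bits 0-7
  uint8_t Major;   // bits 8-14
  bool NewFormat;  // bit 15
};

// Assumption: bitfields are packed from the least significant bit upward.
inline DbiBuildNumber decodeBuildNumber(uint16_t Raw) {
  return {static_cast<uint8_t>(Raw & 0xFF),
          static_cast<uint8_t>((Raw >> 8) & 0x7F),
          (Raw & 0x8000) != 0};
}

// Decoded form of the 16-bit Flags field; the 13 reserved bits are dropped.
struct DbiFlags {
  bool WasIncrementallyLinked;     // bit 0
  bool ArePrivateSymbolsStripped;  // bit 1
  bool HasConflictingTypes;        // bit 2
};

inline DbiFlags decodeFlags(uint16_t Raw) {
  return {(Raw & 0x1) != 0, (Raw & 0x2) != 0, (Raw & 0x4) != 0};
}
```

Under these assumptions, a raw ``BuildNumber`` of ``0x8C00`` decodes to version 12.0 with ``NewVersionFormat`` set.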
| 128 | |
| 129 | Immediately after the fixed-size DBI Stream header are ``7`` variable-length |
| 130 | `substreams`. The following ``7`` fields of the DBI Stream header specify the |
| 131 | number of bytes of the corresponding substream. Each substream's contents will |
| 132 | be described in detail :ref:`below <dbi_substreams>`. The length of the entire |
| 133 | DBI Stream should equal ``64`` (the length of the header above) plus the value |
| 134 | of each of the following ``7`` fields. |
| 135 | |
| 136 | - **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`. |
| 137 | |
| 138 | - **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`. |
| 139 | |
| 140 | - **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`. |
| 141 | |
| 142 | - **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`. |
| 143 | |
| 144 | - **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`. |
| 145 | |
| 146 | - **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`. |
| 147 | |
| 148 | - **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`. |
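
Because each substream starts where the previous one ends, the starting offsets follow directly from the seven size fields. The sketch below lays the substreams out in the order in which this document describes them (note that the EC substream precedes the optional debug header, unlike the order of the header fields) and checks the total against the stream length. ``DbiSubstreamSizes``, ``DbiSubstreamOffsets`` and ``computeDbiSubstreamOffsets`` are hypothetical names, and rejecting negative sizes is an added sanity check rather than documented behavior.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kDbiHeaderSize = 64;  // size of DbiStreamHeader

// The seven size fields, in header order.
struct DbiSubstreamSizes {
  int32_t ModInfoSize;
  int32_t SectionContributionSize;
  int32_t SectionMapSize;
  int32_t SourceInfoSize;
  int32_t TypeServerSize;
  int32_t OptionalDbgHeaderSize;
  int32_t ECSubstreamSize;
};

// Byte offsets of each substream from the start of the DBI stream.
struct DbiSubstreamOffsets {
  uint64_t ModInfo, SectionContribution, SectionMap, SourceInfo;
  uint64_t TypeServer, EC, OptionalDbgHeader, End;
};

// Returns true if all sizes are non-negative and the header plus all
// substreams exactly fill a stream of StreamLength bytes.
inline bool computeDbiSubstreamOffsets(const DbiSubstreamSizes &S,
                                       uint64_t StreamLength,
                                       DbiSubstreamOffsets &Out) {
  // On-disk order: the EC substream comes before the optional debug header.
  const int32_t Sizes[7] = {S.ModInfoSize,    S.SectionContributionSize,
                            S.SectionMapSize, S.SourceInfoSize,
                            S.TypeServerSize, S.ECSubstreamSize,
                            S.OptionalDbgHeaderSize};
  uint64_t *Starts[7] = {&Out.ModInfo,    &Out.SectionContribution,
                         &Out.SectionMap, &Out.SourceInfo,
                         &Out.TypeServer, &Out.EC,
                         &Out.OptionalDbgHeader};
  uint64_t Pos = kDbiHeaderSize;
  for (int I = 0; I < 7; ++I) {
    if (Sizes[I] < 0)
      return false;
    *Starts[I] = Pos;
    Pos += static_cast<uint64_t>(Sizes[I]);
  }
  Out.End = Pos;
  return Pos == StreamLength;
}
```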
| 149 | |
| 150 | .. _dbi_substreams: |
| 151 | |
| 152 | Substreams |
| 153 | ========== |
| 154 | |
| 155 | .. _dbi_mod_info_substream: |
| 156 | |
| 157 | Module Info Substream |
| 158 | ^^^^^^^^^^^^^^^^^^^^^ |
| 159 | |
| 160 | Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`. The |
| 161 | module info substream is an array of variable-length records, each one |
| 162 | describing a single module (e.g. object file) linked into the program. Each |
| 163 | record is a ``ModInfo`` structure, which embeds a ``SectionContribEntry``: |
| 164 | |
| 165 | .. code-block:: c++ |
| 166 | |
| 167 | struct SectionContribEntry { |
| 168 | uint16_t Section; |
| 169 | char Padding1[2]; |
| 170 | int32_t Offset; |
| 171 | int32_t Size; |
| 172 | uint32_t Characteristics; |
| 173 | uint16_t ModuleIndex; |
| 174 | char Padding2[2]; |
| 175 | uint32_t DataCrc; |
| 176 | uint32_t RelocCrc; |
| 177 | }; |
| 178 | |
| 179 | While most of these are self-explanatory, the ``Characteristics`` field |
| 180 | warrants some elaboration. It corresponds to the ``Characteristics`` |
| 181 | field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__ |
| 182 | structure. |
| 183 | |
| 184 | .. code-block:: c++ |
| 185 | |
| 186 | struct ModInfo { |
| 187 | uint32_t Unused1; |
| 188 | SectionContribEntry SectionContr; |
| 189 | uint16_t Flags; |
| 190 | uint16_t ModuleSymStream; |
| 191 | uint32_t SymByteSize; |
| 192 | uint32_t C11ByteSize; |
| 193 | uint32_t C13ByteSize; |
| 194 | uint16_t SourceFileCount; |
| 195 | char Padding[2]; |
| 196 | uint32_t Unused2; |
| 197 | uint32_t SourceFileNameIndex; |
| 198 | uint32_t PdbFilePathNameIndex; |
| 199 | char ModuleName[]; |
| 200 | char ObjFileName[]; |
| 201 | }; |
| 202 | |
| 203 | - **SectionContr** - Describes the properties of the section in the final binary |
| 204 | that contains the code and data from this module. |
| 205 | |
| 206 | - **Flags** - A bitfield with the following format: |
| 207 | |
| 208 | .. code-block:: c++ |
| 209 | |
| 210 | uint16_t Dirty : 1; // ``true`` if this ModInfo has been written since reading the PDB. |
| 211 | uint16_t EC : 1; // ``true`` if EC information is present for this module. It is unknown what EC actually is. |
| 212 | uint16_t Unused : 6; |
| 213 | uint16_t TSM : 8; // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM. |
| 214 | |
| 215 | |
| 216 | - **ModuleSymStream** - The index of the stream that contains symbol information |
| 217 | for this module. This includes CodeView symbol information as well as source |
| 218 | and line information. |
| 219 | |
| 220 | - **SymByteSize** - The number of bytes of data from the stream identified by |
| 221 | ``ModuleSymStream`` that represent CodeView symbol records. |
| 222 | |
| 223 | - **C11ByteSize** - The number of bytes of data from the stream identified by |
| 224 | ``ModuleSymStream`` that represent C11-style CodeView line information. |
| 225 | |
| 226 | - **C13ByteSize** - The number of bytes of data from the stream identified by |
| 227 | ``ModuleSymStream`` that represent C13-style CodeView line information. At |
| 228 | most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero. |
| 229 | |
| 230 | - **SourceFileCount** - The number of source files that contributed to this |
| 231 | module during compilation. |
| 232 | |
| 233 | - **SourceFileNameIndex** - The offset in the names buffer of the primary |
| 234 | translation unit used to build this module. All PDB files observed to date |
| 235 | always have this value equal to 0. |
| 236 | |
| 237 | - **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file |
| 238 | containing this module's symbol information. This has only been observed |
| 239 | to be non-zero for the special ``* Linker *`` module. |
| 240 | |
| 241 | - **ModuleName** - The module name. This is usually either a full path to an |
| 242 | object file (either directly passed to ``link.exe`` or from an archive) or |
| 243 | a string of the form ``Import:<dll name>``. |
| 244 | |
| 245 | - **ObjFileName** - The object file name. In the case of a module that is |
| 246 | passed directly to ``link.exe``, this is the same as **ModuleName**. |
| 247 | In the case of a module that comes from an archive, this is usually the full |
| 248 | path to the archive. |
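
Since ``ModuleName`` and ``ObjFileName`` are variable-length, a reader has to walk the records one at a time. The fixed part of ``ModInfo`` as laid out above is 64 bytes (4 + 28 for the embedded ``SectionContribEntry`` + 32 for the remaining fields), after which the two null-terminated strings follow. The sketch below extracts both names and computes the record size. Rounding the record size up to a multiple of 4 bytes is an assumption that is not stated in the text above, and ``ModInfoNames`` and ``readModInfoNames`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

constexpr size_t kModInfoFixedSize = 64;  // bytes before ModuleName

struct ModInfoNames {
  std::string ModuleName;
  std::string ObjFileName;
  size_t RecordSize;  // distance to the next record (see assumption below)
};

// Reads the two trailing strings of a ModInfo record starting at Rec.
// Avail is the number of bytes left in the substream from Rec onward.
inline bool readModInfoNames(const uint8_t *Rec, size_t Avail,
                             ModInfoNames &Out) {
  if (Avail <= kModInfoFixedSize)
    return false;
  const char *P = reinterpret_cast<const char *>(Rec) + kModInfoFixedSize;
  size_t Left = Avail - kModInfoFixedSize;

  const void *Nul = std::memchr(P, 0, Left);
  if (!Nul)
    return false;
  size_t Len1 = static_cast<size_t>(static_cast<const char *>(Nul) - P);
  Out.ModuleName.assign(P, Len1);
  P += Len1 + 1;
  Left -= Len1 + 1;

  Nul = Left ? std::memchr(P, 0, Left) : nullptr;
  if (!Nul)
    return false;
  size_t Len2 = static_cast<size_t>(static_cast<const char *>(Nul) - P);
  Out.ObjFileName.assign(P, Len2);

  size_t End = kModInfoFixedSize + Len1 + 1 + Len2 + 1;
  // Assumption: records are padded to a 4-byte boundary.
  Out.RecordSize = (End + 3) & ~static_cast<size_t>(3);
  return true;
}
```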
| 249 | |
| 250 | .. _dbi_sec_contr_substream: |
| 251 | |
| 252 | Section Contribution Substream |
| 253 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 254 | Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends, |
| 255 | and consumes ``Header->SectionContributionSize`` bytes. This substream begins |
| 256 | with a single ``uint32_t`` which will be one of the following values: |
| 257 | |
| 258 | .. code-block:: c++ |
| 259 | |
| 260 | enum class SectionContrSubstreamVersion : uint32_t { |
| 261 | Ver60 = 0xeffe0000 + 19970605, |
| 262 | V2 = 0xeffe0000 + 20140516 |
| 263 | }; |
| 264 | |
| 265 | ``Ver60`` is the only value which has been observed in a PDB so far. Following |
| 266 | this ``4`` byte field is an array of fixed-length structures. If the version |
| 267 | is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the |
| 268 | version is ``V2``, it is an array of ``SectionContribEntry2`` structures, |
| 269 | defined as follows: |
| 270 | |
| 271 | .. code-block:: c++ |
| 272 | |
| 273 | struct SectionContribEntry2 { |
| 274 | SectionContribEntry SC; |
| 275 | uint32_t ISectCoff; |
| 276 | }; |
| 277 | |
| 278 | The purpose of the second field is not well understood. |
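
Because the entries are fixed-length, the entry count follows from the substream size and the version. ``SectionContribEntry`` as defined earlier is 28 bytes, and ``SectionContribEntry2`` adds one ``uint32_t`` for a total of 32. A rough sketch, in which ``sectionContribEntrySize`` and ``countSectionContribs`` are hypothetical helper names and an unknown version is simply rejected:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class SectionContrSubstreamVersion : uint32_t {
  Ver60 = 0xeffe0000 + 19970605,
  V2 = 0xeffe0000 + 20140516
};

// Size in bytes of one array entry for the given version, or 0 if unknown.
inline size_t sectionContribEntrySize(uint32_t RawVersion) {
  switch (static_cast<SectionContrSubstreamVersion>(RawVersion)) {
  case SectionContrSubstreamVersion::Ver60:
    return 28;  // sizeof(SectionContribEntry)
  case SectionContrSubstreamVersion::V2:
    return 32;  // sizeof(SectionContribEntry2)
  }
  return 0;
}

// Computes the number of entries that follow the 4-byte version field.
// Fails if the version is unknown or the payload is not a whole number
// of entries.
inline bool countSectionContribs(size_t SubstreamSize, uint32_t RawVersion,
                                 size_t &Count) {
  size_t EntrySize = sectionContribEntrySize(RawVersion);
  if (EntrySize == 0 || SubstreamSize < 4)
    return false;
  size_t Payload = SubstreamSize - 4;
  if (Payload % EntrySize != 0)
    return false;
  Count = Payload / EntrySize;
  return true;
}
```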
| 279 | |
| 280 | |
| 281 | .. _dbi_section_map_substream: |
| 282 | |
| 283 | Section Map Substream |
| 284 | ^^^^^^^^^^^^^^^^^^^^^ |
| 285 | Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends, |
| 286 | and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4`` |
| 287 | byte header followed by an array of fixed-length records. The header and records |
| 288 | have the following layout: |
| 289 | |
| 290 | .. code-block:: c++ |
| 291 | |
| 292 | struct SectionMapHeader { |
| 293 | uint16_t Count; // Number of segment descriptors |
| 294 | uint16_t LogCount; // Number of logical segment descriptors |
| 295 | }; |
| 296 | |
| 297 | struct SectionMapEntry { |
| 298 | uint16_t Flags; // See the SectionMapEntryFlags enum below. |
| 299 | uint16_t Ovl; // Logical overlay number |
| 300 | uint16_t Group; // Group index into descriptor array. |
| 301 | uint16_t Frame; |
| 302 | uint16_t SectionName; // Byte index of segment / group name in string table, or 0xFFFF. |
| 303 | uint16_t ClassName; // Byte index of class in string table, or 0xFFFF. |
| 304 | uint32_t Offset; // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group. |
| 305 | uint32_t SectionLength; // Byte count of the segment or group. |
| 306 | }; |
| 307 | |
| 308 | enum class SectionMapEntryFlags : uint16_t { |
| 309 | Read = 1 << 0, // Segment is readable. |
| 310 | Write = 1 << 1, // Segment is writable. |
| 311 | Execute = 1 << 2, // Segment is executable. |
| 312 | AddressIs32Bit = 1 << 3, // Descriptor describes a 32-bit linear address. |
| 313 | IsSelector = 1 << 8, // Frame represents a selector. |
| 314 | IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address. |
| 315 | IsGroup = 1 << 10 // If set, descriptor represents a group. |
| 316 | }; |
| 317 | |
| 318 | Many of these fields are not well understood, so will not be discussed further. |
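
Even without a full understanding of every field, the access bits of ``SectionMapEntryFlags`` can be decoded, and the substream size can be sanity-checked. The two ``uint16_t`` fields of ``SectionMapHeader`` occupy 4 bytes and each ``SectionMapEntry`` occupies 20 bytes. The sketch below assumes that ``Count`` is the number of entries that follow the header. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr size_t kSectionMapHeaderSize = 4;  // two uint16_t fields
constexpr size_t kSectionMapEntrySize = 20;  // six uint16_t + two uint32_t

// Renders the Read/Write/Execute bits as an "rwx"-style string.
inline std::string sectionMapPermissions(uint16_t Flags) {
  std::string S = "---";
  if (Flags & (1u << 0))
    S[0] = 'r';
  if (Flags & (1u << 1))
    S[1] = 'w';
  if (Flags & (1u << 2))
    S[2] = 'x';
  return S;
}

// Assumption: the header is followed by exactly Count entries.
inline size_t expectedSectionMapSize(uint16_t Count) {
  return kSectionMapHeaderSize +
         static_cast<size_t>(Count) * kSectionMapEntrySize;
}
```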
| 319 | |
| 320 | .. _dbi_file_info_substream: |
| 321 | |
| 322 | File Info Substream |
| 323 | ^^^^^^^^^^^^^^^^^^^ |
| 324 | Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends, |
| 325 | and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping |
| 326 | from module to the source files that contribute to that module. Since multiple |
| 327 | modules can use the same source file (for example, a header file), this substream |
| 328 | uses a string table to store each unique file name only once, and then has each |
| 329 | module use offsets into the string table rather than embedding the string's value |
| 330 | directly. The format of this substream is as follows: |
| 331 | |
| 332 | .. code-block:: c++ |
| 333 | |
| 334 | struct FileInfoSubstream { |
| 335 | uint16_t NumModules; |
| 336 | uint16_t NumSourceFiles; |
| 337 | |
| 338 | uint16_t ModIndices[NumModules]; |
| 339 | uint16_t ModFileCounts[NumModules]; |
| 340 | uint32_t FileNameOffsets[NumSourceFiles]; |
| 341 | char NamesBuffer[][NumSourceFiles]; |
| 342 | }; |
| 343 | |
| 344 | **NumModules** - The number of modules for which source file information is |
| 345 | contained within this substream. Should match the corresponding value from the |
| 346 | :ref:`dbi_header`. |
| 347 | |
| 348 | **NumSourceFiles** - In theory this is supposed to contain the number of source |
| 349 | files for which this substream contains information. But a ``16``-bit field |
| 350 | cannot represent more than 64K source files in a program. In early versions |
| 351 | of the file format, this limit seems to have applied. To support larger |
| 352 | programs, this field is now simply ignored, and the real count is computed |
| 353 | dynamically by summing the values of the ``ModFileCounts`` array (discussed |
| 354 | below). In short, this value should be |
| 355 | ignored. |
| 356 | |
| 357 | **ModIndices** - This array is present, but does not appear to be useful. |
| 358 | |
| 359 | **ModFileCounts** - An array of ``NumModules`` integers, each one containing |
| 360 | the number of source files which contribute to the module at the specified index. |
| 361 | While each individual module is limited to 64K contributing source files, the |
| 362 | union of all modules' source files may be greater than 64K. The real number of |
| 363 | source files is thus computed by summing this array. Note that summing this array |
| 364 | does not give the number of `unique` source files, only the total number of source |
| 365 | file contributions to modules. |
| 366 | |
| 367 | **FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles** |
| 368 | here refers to the 32-bit value obtained from summing **ModFileCounts**), where |
| 369 | each integer is an offset into **NamesBuffer** pointing to a null-terminated string. |
| 370 | |
| 371 | **NamesBuffer** - An array of null-terminated strings containing the actual source |
| 372 | file names. |
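
Putting the fields above together, the per-module file lists can be recovered as follows. This is a sketch under stated assumptions rather than a reference parser. It assumes little-endian fields, ignores ``NumSourceFiles`` and ``ModIndices`` as recommended above, and assumes that each module's entries in ``FileNameOffsets`` are stored consecutively in module order. ``parseFileInfo`` and the ``readU16``/``readU32`` helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static uint16_t readU16(const uint8_t *P) {
  return static_cast<uint16_t>(P[0] | (P[1] << 8));
}

static uint32_t readU32(const uint8_t *P) {
  return static_cast<uint32_t>(P[0]) | (static_cast<uint32_t>(P[1]) << 8) |
         (static_cast<uint32_t>(P[2]) << 16) |
         (static_cast<uint32_t>(P[3]) << 24);
}

// Fills Out[M] with the source file names of module M.
inline bool parseFileInfo(const uint8_t *Data, size_t Size,
                          std::vector<std::vector<std::string>> &Out) {
  if (Size < 4)
    return false;
  size_t NumModules = readU16(Data);
  // Data + 2 holds NumSourceFiles, which is deliberately ignored.
  size_t CountsOff = 4 + 2 * NumModules;  // skips ModIndices
  size_t OffsetsOff = CountsOff + 2 * NumModules;
  if (OffsetsOff > Size)
    return false;

  // The real number of file entries is the sum of ModFileCounts.
  size_t Total = 0;
  for (size_t M = 0; M < NumModules; ++M)
    Total += readU16(Data + CountsOff + 2 * M);
  size_t NamesOff = OffsetsOff + 4 * Total;
  if (NamesOff > Size)
    return false;
  size_t NamesSize = Size - NamesOff;

  Out.assign(NumModules, std::vector<std::string>());
  size_t Next = 0;  // index of the next unread FileNameOffsets entry
  for (size_t M = 0; M < NumModules; ++M) {
    size_t Count = readU16(Data + CountsOff + 2 * M);
    for (size_t K = 0; K < Count; ++K, ++Next) {
      uint32_t Off = readU32(Data + OffsetsOff + 4 * Next);
      if (Off >= NamesSize)
        return false;
      const char *S = reinterpret_cast<const char *>(Data) + NamesOff + Off;
      const void *Nul = std::memchr(S, 0, NamesSize - Off);
      if (!Nul)
        return false;
      Out[M].emplace_back(S, static_cast<const char *>(Nul) - S);
    }
  }
  return true;
}
```

Two modules that include the same header end up with the same offset in ``FileNameOffsets``, which is how the substream avoids storing a name twice.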
| 373 | |
| 374 | .. _dbi_type_server_substream: |
| 375 | |
| 376 | Type Server Substream |
| 377 | ^^^^^^^^^^^^^^^^^^^^^ |
| 378 | Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends, |
| 379 | and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout |
| 380 | of this substream is understood, although it is assumed to be related somehow to the |
| 381 | usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further. |
| 382 | |
| 383 | .. _dbi_ec_substream: |
| 384 | |
| 385 | EC Substream |
| 386 | ^^^^^^^^^^^^ |
| 387 | Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends, |
| 388 | and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout |
| 389 | of this substream is understood, and it will not be discussed further. |
| 390 | |
| 391 | .. _dbi_optional_dbg_stream: |
| 392 | |
| 393 | Optional Debug Header Stream |
| 394 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 395 | Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and |
| 396 | consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of |
| 397 | stream indices (i.e. ``uint16_t`` values), each of which identifies a stream |
| 398 | in the larger MSF file that contains some additional debug information. |
| 399 | Each position of this array has a special meaning, allowing one to determine |
| 400 | what kind of debug information is at the referenced stream. ``11`` indices |
| 401 | are currently understood, although it's possible there may be more. The |
| 402 | layout of each stream generally corresponds exactly to a particular type |
| 403 | of debug data directory from the PE/COFF file. The format of these fields |
| 404 | can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__. |
| 405 | |
| 406 | **FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a |
| 407 | debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. |
| 408 | |
| 409 | **Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream |
| 410 | is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``. |
| 411 | |
| 412 | **Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a |
| 413 | debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``. |
| 414 | |
| 415 | **Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream |
| 416 | is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This |
| 417 | is used for mapping addresses between instrumented and uninstrumented code. |
| 418 | |
| 419 | **Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream |
| 420 | is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This |
| 421 | is used for mapping addresses between instrumented and uninstrumented code. |
| 422 | |
| 423 | **Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from |
| 424 | the original executable. |
| 425 | |
| 426 | **Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not |
| 427 | understood, but it is assumed to be a mapping from ``CLR Token`` to |
| 428 | ``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__ |
| 429 | for more information. |
| 430 | |
| 431 | **Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the |
| 432 | executable. |
| 433 | |
| 434 | **Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata`` |
| 435 | section from the executable, but that would make it identical to |
| 436 | ``DbgStreamArray[1]``. The difference between these two indices is not well |
| 437 | understood. |
| 438 | |
| 439 | **New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a |
| 440 | debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this |
| 441 | differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have |
| 442 | used the "new" format rather than the "old" format. |
| 443 | |
| 444 | **Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar |
| 445 | to ``DbgStreamArray[5]``, but has not been observed in practice. |
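
The positions listed above can be captured in an enumeration and used to look up a stream index in the raw substream bytes. The following is a sketch only. It assumes little-endian ``uint16_t`` entries and treats ``0xFFFF`` as meaning that no stream is present, which is an assumption not stated in the text above. ``DbgHeaderSlot``, ``kAbsentStreamIndex`` and ``getDbgStreamIndex`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Positions within DbgStreamArray, in the order described above.
enum class DbgHeaderSlot : size_t {
  FPO = 0,
  Exception = 1,
  Fixup = 2,
  OmapToSrc = 3,
  OmapFromSrc = 4,
  SectionHeader = 5,
  TokenRidMap = 6,
  Xdata = 7,
  Pdata = 8,
  NewFPO = 9,
  OriginalSectionHeader = 10
};

// Assumption: this value marks a slot that refers to no stream.
constexpr uint16_t kAbsentStreamIndex = 0xFFFF;

// Looks up the stream index stored at Slot. Returns false if the array is
// too short to contain the slot or the slot is marked absent.
inline bool getDbgStreamIndex(const uint8_t *Data, size_t Size,
                              DbgHeaderSlot Slot, uint16_t &Index) {
  size_t Pos = static_cast<size_t>(Slot) * 2;
  if (Pos + 2 > Size)
    return false;
  uint16_t V = static_cast<uint16_t>(Data[Pos] | (Data[Pos + 1] << 8));
  if (V == kAbsentStreamIndex)
    return false;
  Index = V;
  return true;
}
```

Checking the array length before each lookup matters because the substream may contain fewer than ``11`` entries.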