=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file. It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were used to link together the program, the source files which were used
to build the program, as well as references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.


.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:

.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;        // Offset 0
    uint32_t VersionHeader;          // Offset 4
    uint32_t Age;                    // Offset 8
    uint16_t GlobalStreamIndex;      // Offset 12
    uint16_t BuildNumber;            // Offset 14
    uint16_t PublicStreamIndex;      // Offset 16
    uint16_t PdbDllVersion;          // Offset 18
    uint16_t SymRecordStream;        // Offset 20
    uint16_t PdbDllRbld;             // Offset 22
    int32_t ModInfoSize;             // Offset 24
    int32_t SectionContributionSize; // Offset 28
    int32_t SectionMapSize;          // Offset 32
    int32_t SourceInfoSize;          // Offset 36
    int32_t TypeServerSize;          // Offset 40
    uint32_t MFCTypeServerIndex;     // Offset 44
    int32_t OptionalDbgHeaderSize;   // Offset 48
    int32_t ECSubstreamSize;         // Offset 52
    uint16_t Flags;                  // Offset 56
    uint16_t Machine;                // Offset 58
    uint32_t Padding;                // Offset 60 (header is 64 bytes total)
  };

- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written. Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
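Because the bit order of C++ bitfields is implementation-defined, a reader may prefer to decode ``BuildNumber`` with explicit masks and shifts. The following is a minimal sketch that assumes ``MinorVersion`` occupies the least significant bits (declaration order, as laid out by common little-endian compilers); ``ToolchainVersion`` and ``decodeBuildNumber`` are hypothetical names, not part of LLVM or any other existing API.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical result type; not part of LLVM or any other existing API.
  struct ToolchainVersion {
    unsigned Major;
    unsigned Minor;
    bool NewVersionFormat;
  };

  // Decodes the DBI header BuildNumber field with masks and shifts.
  // ASSUMPTION: MinorVersion is bits 0-7, MajorVersion is bits 8-14 and
  // NewVersionFormat is bit 15 (declaration order, least significant first).
  ToolchainVersion decodeBuildNumber(uint16_t BuildNumber) {
    ToolchainVersion V;
    V.Minor = BuildNumber & 0xFFu;
    V.Major = (BuildNumber >> 8) & 0x7Fu;
    V.NewVersionFormat = ((BuildNumber >> 15) & 0x1u) != 0;
    return V;
  }

Under that assumption, a ``BuildNumber`` of ``0x8C00`` decodes to version ``12.0`` with ``NewVersionFormat`` set.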

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB. Note this obviously does not apply to LLVM, as LLVM does not use ``mspdb.dll``.

- **SymRecordStream** - The index of the stream containing all CodeView symbol
  records used by the program. This is used for deduplication, so that many
  different compilands can refer to the same symbols without having to include
  the full record content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - As its name suggests, an index rather than a length.
  It is believed to identify the MFC type server within the
  :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set. Otherwise it will
not be set. It is unclear what this flag does, although it seems to have
subtle implications on the algorithm used to look up type records.
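As with ``BuildNumber``, the ``Flags`` field can be decoded portably with masks. This minimal sketch assumes the bits are assigned in declaration order starting from bit ``0``; ``DbiBuildFlags`` and ``decodeDbiFlags`` are hypothetical names.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical decoded form of the DBI header Flags field.
  struct DbiBuildFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  // ASSUMPTION: WasIncrementallyLinked is bit 0, ArePrivateSymbolsStripped
  // is bit 1 and HasConflictingTypes is bit 2. The 13 reserved bits are ignored.
  DbiBuildFlags decodeDbiFlags(uint16_t Flags) {
    DbiBuildFlags F;
    F.WasIncrementallyLinked = (Flags & 0x1u) != 0;
    F.ArePrivateSymbolsStripped = (Flags & 0x2u) != 0;
    F.HasConflictingTypes = (Flags & 0x4u) != 0;
    return F;
  }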

- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86), which
  are the same values as the PE/COFF ``IMAGE_FILE_MACHINE_AMD64`` and
  ``IMAGE_FILE_MACHINE_I386`` constants.

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`. The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream. Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`. The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields. Note that the header does not list these sizes
in stream order: the :ref:`dbi_ec_substream` precedes the
:ref:`dbi_optional_dbg_stream` in the stream, even though
``OptionalDbgHeaderSize`` comes before ``ECSubstreamSize`` in the header.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
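The size fields are enough to locate every substream. The sketch below assumes the whole DBI stream has already been read into a byte vector and that the substreams appear in the order given by the substream sections that follow (module info, section contributions, section map, file info, type server, EC, optional debug header). The byte offsets of the size fields come from the header layout above; ``SubstreamOffsets`` and ``locateSubstreams`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Reads a little-endian 32-bit value at Off (PDB data is little-endian).
  static int32_t readLE32(const std::vector<uint8_t> &Buf, size_t Off) {
    uint32_t V = uint32_t(Buf[Off]) | (uint32_t(Buf[Off + 1]) << 8) |
                 (uint32_t(Buf[Off + 2]) << 16) | (uint32_t(Buf[Off + 3]) << 24);
    return static_cast<int32_t>(V);
  }

  // Hypothetical type: start offset of each substream within the DBI stream.
  struct SubstreamOffsets {
    size_t ModInfo, SecContr, SecMap, FileInfo, TypeServer, EC, OptionalDbgHeader;
  };

  std::optional<SubstreamOffsets> locateSubstreams(const std::vector<uint8_t> &Stream) {
    const size_t HeaderSize = 64;
    if (Stream.size() < HeaderSize)
      return std::nullopt;
    // Size fields listed in stream order, so ECSubstreamSize (offset 52)
    // is read before OptionalDbgHeaderSize (offset 48).
    const int32_t Sizes[7] = {
        readLE32(Stream, 24), // ModInfoSize
        readLE32(Stream, 28), // SectionContributionSize
        readLE32(Stream, 32), // SectionMapSize
        readLE32(Stream, 36), // SourceInfoSize
        readLE32(Stream, 40), // TypeServerSize
        readLE32(Stream, 52), // ECSubstreamSize
        readLE32(Stream, 48), // OptionalDbgHeaderSize
    };
    size_t Starts[7];
    size_t Cursor = HeaderSize;
    for (int I = 0; I < 7; ++I) {
      if (Sizes[I] < 0)
        return std::nullopt; // a negative size means the header is malformed
      Starts[I] = Cursor;
      Cursor += static_cast<size_t>(Sizes[I]);
    }
    // The stream should be exactly the header plus all seven substreams.
    if (Cursor != Stream.size())
      return std::nullopt;
    return SubstreamOffsets{Starts[0], Starts[1], Starts[2], Starts[3],
                            Starts[4], Starts[5], Starts[6]};
  }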

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`. The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program. Each
record is a ``ModInfo`` structure, which embeds a ``SectionContribEntry``
with the following layout:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these are self-explanatory, the ``Characteristics`` field
warrants some elaboration. It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.

The full ``ModInfo`` record has the following layout:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];  // Null-terminated string
    char ObjFileName[]; // Null-terminated string
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;  // ``true`` if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;     // ``true`` if EC information is present for this module. It is unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;    // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.

- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module. This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information. At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module. All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information. This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name. This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name. In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
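Walking the module info substream means reading the fixed-size portion of each ``ModInfo`` record (``64`` bytes, adding up the field sizes above) followed by its two null-terminated strings. The sketch below makes one assumption beyond what is described here: each record is padded to a ``4``-byte boundary, which is what LLVM's reader does. ``ModuleSummary`` and ``parseModInfo`` are hypothetical names, and only a few fields are extracted.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical, trimmed-down view of one ModInfo record.
  struct ModuleSummary {
    uint16_t ModuleSymStream;
    uint16_t SourceFileCount;
    std::string ModuleName;
    std::string ObjFileName;
  };

  static uint16_t readLE16(const std::vector<uint8_t> &B, size_t Off) {
    return static_cast<uint16_t>(B[Off] | (B[Off + 1] << 8));
  }

  // Reads a null-terminated string at Off and advances Off past the
  // terminator. Fails if no terminator occurs before End.
  static bool readCString(const std::vector<uint8_t> &B, size_t &Off, size_t End,
                          std::string &Out) {
    size_t Start = Off;
    while (Off < End && B[Off] != 0)
      ++Off;
    if (Off == End)
      return false;
    Out.assign(B.begin() + Start, B.begin() + Off);
    ++Off; // skip the terminator
    return true;
  }

  // Parses the module info substream occupying B[Begin, Begin + Size).
  // ASSUMPTION: each record is padded to a multiple of 4 bytes.
  bool parseModInfo(const std::vector<uint8_t> &B, size_t Begin, size_t Size,
                    std::vector<ModuleSummary> &Out) {
    const size_t FixedSize = 64; // bytes of ModInfo before ModuleName
    const size_t End = Begin + Size;
    if (End > B.size())
      return false;
    size_t Off = Begin;
    while (Off < End) {
      const size_t RecordStart = Off;
      if (End - Off < FixedSize)
        return false;
      ModuleSummary M;
      M.ModuleSymStream = readLE16(B, Off + 34); // offset of ModuleSymStream
      M.SourceFileCount = readLE16(B, Off + 48); // offset of SourceFileCount
      Off += FixedSize;
      if (!readCString(B, Off, End, M.ModuleName) ||
          !readCString(B, Off, End, M.ObjFileName))
        return false;
      const size_t Len = Off - RecordStart;
      Off = RecordStart + ((Len + 3) & ~size_t(3)); // round up to 4 bytes
      Out.push_back(M);
    }
    return Off == End;
  }

Given the bytes of the DBI stream, the substream starts at offset ``64`` and its size is ``Header->ModInfoSize``.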

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far. Following
this ``4``-byte field is an array of fixed-length structures. If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.
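Since both entry types have a fixed size, the number of entries follows directly from ``Header->SectionContributionSize``. The sketch below reuses the struct layouts above and checks their sizes with ``static_assert``, which assumes a typical ABI where ``int32_t`` is 4-byte aligned; ``countSectionContribs`` and the ``kVer60``/``kV2`` constants are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>

  // Layouts copied from above; no implicit padding on common ABIs.
  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };
  static_assert(sizeof(SectionContribEntry) == 28, "unexpected padding");

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };
  static_assert(sizeof(SectionContribEntry2) == 32, "unexpected padding");

  constexpr uint32_t kVer60 = 0xeffe0000u + 19970605u;
  constexpr uint32_t kV2 = 0xeffe0000u + 20140516u;

  // Returns how many entries follow the 4-byte version field, or nullopt if
  // the version is unrecognized or the size is not a whole number of entries.
  std::optional<size_t> countSectionContribs(uint32_t Version, size_t SubstreamSize) {
    size_t EntrySize;
    if (Version == kVer60)
      EntrySize = sizeof(SectionContribEntry);
    else if (Version == kV2)
      EntrySize = sizeof(SectionContribEntry2);
    else
      return std::nullopt;
    if (SubstreamSize < 4 || (SubstreamSize - 4) % EntrySize != 0)
      return std::nullopt;
    return (SubstreamSize - 4) / EntrySize;
  }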


.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``-byte
header followed by an array of fixed-length records. The header and records
have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, and so will not be discussed further.
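Because every ``SectionMapEntry`` is ``20`` bytes (six ``uint16_t`` fields followed by two ``uint32_t`` fields), the entries can be read at a fixed stride after the header. This sketch assumes that ``Count`` gives the number of entries following the header, and it decodes only a few flag bits plus ``SectionLength``; ``MappedSection`` and ``parseSectionMap`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  static uint16_t readLE16(const std::vector<uint8_t> &B, size_t Off) {
    return static_cast<uint16_t>(B[Off] | (B[Off + 1] << 8));
  }
  static uint32_t readLE32(const std::vector<uint8_t> &B, size_t Off) {
    return uint32_t(B[Off]) | (uint32_t(B[Off + 1]) << 8) |
           (uint32_t(B[Off + 2]) << 16) | (uint32_t(B[Off + 3]) << 24);
  }

  // Hypothetical summary of one SectionMapEntry.
  struct MappedSection {
    bool Readable, Writable, Executable, IsGroup;
    uint32_t Length;
  };

  // Parses a section map substream held in B.
  // ASSUMPTION: SectionMapHeader::Count is the number of entries.
  bool parseSectionMap(const std::vector<uint8_t> &B, std::vector<MappedSection> &Out) {
    const size_t HeaderSize = 4, EntrySize = 20;
    if (B.size() < HeaderSize)
      return false;
    const uint16_t Count = readLE16(B, 0); // LogCount at offset 2 is unused here
    if (B.size() < HeaderSize + size_t(Count) * EntrySize)
      return false;
    for (size_t I = 0; I < Count; ++I) {
      const size_t Off = HeaderSize + I * EntrySize;
      const uint16_t Flags = readLE16(B, Off);
      MappedSection S;
      S.Readable = (Flags & (1u << 0)) != 0;
      S.Writable = (Flags & (1u << 1)) != 0;
      S.Executable = (Flags & (1u << 2)) != 0;
      S.IsGroup = (Flags & (1u << 10)) != 0;
      S.Length = readLE32(B, Off + 16); // SectionLength
      Out.push_back(S);
    }
    return true;
  }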

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
from module to the source files that contribute to that module. Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly. The format of this substream is as follows:

.. code-block:: c++

  // Pseudo-C++: the array lengths are only known at runtime.
  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles]; // See NumSourceFiles below.
    char NamesBuffer[][NumSourceFiles];       // Null-terminated file names.
  };

**NumModules** - The number of modules for which source file information is
contained within this substream. Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information. But that would present a
problem in that the width of this field being ``16``-bits would prevent one from
having more than 64K source files in a program. In early versions of the file
format, this seems to have been the case. In order to support more than this, this
field is simply ignored, and the count is computed dynamically by summing up the
values of the ``ModFileCounts`` array (discussed below). In short, this value
should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K. The real number of
source files is thus computed by summing this array. Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
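Putting these pieces together, the per-module file lists can be recovered by summing ``ModFileCounts`` to find where **FileNameOffsets** ends and **NamesBuffer** begins, then handing out consecutive **FileNameOffsets** entries to each module in turn. This is a minimal sketch that assumes each module's entries are stored contiguously and in module order (i.e. it ignores **ModIndices**); ``parseFileInfo`` is a hypothetical name.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  static uint16_t readLE16(const std::vector<uint8_t> &B, size_t Off) {
    return static_cast<uint16_t>(B[Off] | (B[Off + 1] << 8));
  }
  static uint32_t readLE32(const std::vector<uint8_t> &B, size_t Off) {
    return uint32_t(B[Off]) | (uint32_t(B[Off + 1]) << 8) |
           (uint32_t(B[Off + 2]) << 16) | (uint32_t(B[Off + 3]) << 24);
  }

  // Fills FilesPerModule[M] with the source file names of module M.
  // ASSUMPTION: each module's FileNameOffsets entries are contiguous and
  // stored in module order, so ModIndices can be ignored.
  bool parseFileInfo(const std::vector<uint8_t> &B,
                     std::vector<std::vector<std::string>> &FilesPerModule) {
    if (B.size() < 4)
      return false;
    const size_t NumModules = readLE16(B, 0);
    // NumSourceFiles (offset 2) is deliberately ignored.
    const size_t CountsBegin = 4 + NumModules * 2; // skip ModIndices
    const size_t OffsetsBegin = CountsBegin + NumModules * 2;
    if (B.size() < OffsetsBegin)
      return false;

    // The real number of FileNameOffsets entries is the sum of ModFileCounts.
    size_t TotalFiles = 0;
    for (size_t M = 0; M < NumModules; ++M)
      TotalFiles += readLE16(B, CountsBegin + M * 2);
    const size_t NamesBegin = OffsetsBegin + TotalFiles * 4;
    if (B.size() < NamesBegin)
      return false;

    FilesPerModule.assign(NumModules, std::vector<std::string>());
    size_t FileIndex = 0;
    for (size_t M = 0; M < NumModules; ++M) {
      const uint16_t Count = readLE16(B, CountsBegin + M * 2);
      for (uint16_t I = 0; I < Count; ++I, ++FileIndex) {
        const size_t NameBegin = NamesBegin + readLE32(B, OffsetsBegin + FileIndex * 4);
        size_t NameEnd = NameBegin;
        while (NameEnd < B.size() && B[NameEnd] != 0)
          ++NameEnd;
        if (NameEnd >= B.size())
          return false; // offset out of range or missing terminator
        FilesPerModule[M].emplace_back(B.begin() + NameBegin, B.begin() + NameEnd);
      }
    }
    return true;
  }

Two modules that share a header file simply store the same offset, so the name appears in both lists while being stored only once in **NamesBuffer**.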

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
in the larger MSF file that contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream. ``11`` indices
are currently understood, although it's possible there may be more. The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file. The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``. The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
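A reader can treat this substream as a small table indexed by the positions listed above. In the sketch below, the ``DbgHeaderType`` enum and ``lookupDbgStream`` are hypothetical names, and using ``0xFFFF`` to mark an absent stream is an assumption (it is the invalid stream index LLVM uses) rather than something described in this document.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Hypothetical names for the DbgStreamArray positions described above.
  enum class DbgHeaderType : size_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10,
  };

  // ASSUMPTION: 0xFFFF marks "no stream" (LLVM's invalid stream index).
  constexpr uint16_t kInvalidStreamIndex = 0xFFFF;

  // B holds the optional debug header substream: an array of little-endian
  // uint16_t stream indices. Returns the referenced stream index, if any.
  std::optional<uint16_t> lookupDbgStream(const std::vector<uint8_t> &B,
                                          DbgHeaderType Type) {
    const size_t Pos = static_cast<size_t>(Type);
    if (Pos >= B.size() / 2)
      return std::nullopt; // array too short to contain this position
    const uint16_t Index =
        static_cast<uint16_t>(B[Pos * 2] | (B[Pos * 2 + 1] << 8));
    if (Index == kInvalidStreamIndex)
      return std::nullopt;
    return Index;
  }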