=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information consumed by debuggers and other tools. Because officially
supported APIs exist on Windows for querying debug information from PDBs
without requiring the user to understand the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. For
Clang to generate programs that interoperate with these tools, we must generate
PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. We
therefore need to understand the PDB file format at the byte level so that we
can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

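As a small illustration of the note above, the following Python sketch decodes
a ``uint16_t`` and a ``uint64_t`` from raw bytes with the standard ``struct``
module. The byte values are invented for the example and are not taken from a
real PDB file.

.. code-block:: python

   import struct

   # Hypothetical raw bytes as they might appear on disk: a uint16_t
   # followed immediately by a uint64_t.
   raw = bytes([0x34, 0x12,
                0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01])

   # "<" selects little endian with no padding; "H" is an unsigned 16-bit
   # integer and "Q" is an unsigned 64-bit integer.
   u16, u64 = struct.unpack_from("<HQ", raw, 0)

   print(hex(u16))  # 0x1234
   print(hex(u64))  # 0x102030405060708

The least significant byte comes first, so the byte sequence ``34 12`` reads as
``0x1234``.
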
.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is a miniature "file system within a file". It contains multiple
streams (aka files) which can represent arbitrary data. These streams are
divided into blocks that are not necessarily laid out contiguously within the
file (that is, a stream may be fragmented). Additionally, the MSF contains a
stream directory (akin to a file system's MFT) which describes how the streams
(files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

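The "file system within a file" idea can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name
``read_stream``, the 4-byte block size, and the block numbers are all
hypothetical, and a real reader would obtain the block size, block list, and
stream length from the MSF file itself (see :doc:`MsfFile`).

.. code-block:: python

   from typing import Sequence


   def read_stream(file_bytes: bytes, block_size: int,
                   block_indices: Sequence[int], stream_length: int) -> bytes:
       """Gather a stream's blocks in order and trim to the stream's length."""
       data = bytearray()
       for index in block_indices:
           start = index * block_size
           data += file_bytes[start:start + block_size]
       # The final block may be only partially used by the stream.
       return bytes(data[:stream_length])


   # A toy five-block "file" with a hypothetical 4-byte block size.
   toy_file = (b"HDR."                # block 0: not part of the stream
               b"o, w"                # block 1
               b"hell"                # block 2
               b"!\x00\x00\x00"       # block 3: partially used
               b"orld")               # block 4

   # The stream occupies blocks 2, 1, 4, 3, in that order (fragmented).
   stream = read_stream(toy_file, 4, [2, 1, 4, 3], 13)
   print(stream)  # b'hello, world!'

The block list plays the role of a directory entry: it is the only thing that
tells the reader in which order the scattered blocks belong together.
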
.. _streams:

Streams
-------
The PDB format contains a number of streams that describe a program's types,
symbols, source files, and compilands (e.g. object files), along with other
information about how the program was compiled, such as the specific toolchain
used. Additional streams contain hash tables that debuggers and other tools use
for fast lookup of records and types by name. The streams contained in a PDB
file are summarized as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream, and how it can be used to quickly look up records
   by name.

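The fixed stream indices from the summary table can be written down as
constants. The sketch below is illustrative only: the names ``FixedStream`` and
``stream_index`` are hypothetical, and the named stream map shown is made-up
data standing in for the map stored in the PDB Stream.

.. code-block:: python

   from enum import IntEnum


   class FixedStream(IntEnum):
       """Streams that always live at the same index, per the summary table."""
       OLD_DIRECTORY = 0
       PDB = 1
       TPI = 2
       DBI = 3
       IPI = 4


   def stream_index(name, named_streams):
       """Resolve a fixed stream name, else fall back to the named stream map."""
       if name in FixedStream.__members__:
           return int(FixedStream[name])
       return named_streams[name]


   # Made-up named stream map; real indices vary from file to file.
   example_named_streams = {"/names": 13, "/LinkInfo": 5}

   print(stream_index("DBI", example_named_streams))     # 3
   print(stream_index("/names", example_named_streams))  # 13

Streams such as the module, public, and global streams have neither a fixed
index nor a name; per the table, their indices are stored in the DBI Stream.
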
CodeView
========
CodeView is a third format that comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.