Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 1 | =====================================
|
| 2 | The MSF File Format
|
| 3 | =====================================
|
| 4 |
|
| 5 | .. contents::
|
| 6 | :local:
|
| 7 |
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 8 | .. _msf_layout:
|
| 9 |
|
| 10 | File Layout
|
| 11 | ===========
|
| 12 |
|
| 13 | The MSF file format consists of the following components:
|
| 14 |
|
| 15 | 1. :ref:`msf_superblock`
|
| 16 | 2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
|
| 17 | 3. Data
|
| 18 |
|
| 19 | Each component is stored as an indexed block, the length of which is specified
|
| 20 | in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
|
| 21 | following pattern (sometimes referred to as an "interval"):
|
| 22 |
|
| 23 | 1. 1 block of data
|
| 24 | 2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
|
| 25 | 3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
|
| 26 | 4. ``SuperBlock::BlockSize - 3`` blocks of data
|
| 27 |
|
| 28 | In the first interval, the first data block is used to store
|
| 29 | :ref:`msf_superblock`.
|
| 30 |
|
| 31 | The following diagram demonstrates the general layout of the file (\| denotes
|
| 32 | the end of an interval, and is for visualization purposes only):
|
| 33 |
|
| 34 | +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
|
| 35 | | Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
|
| 36 | +=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
|
| 37 | | Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... |
|
| 38 | +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
|
| 39 |
|
| 40 | The file may end after any block, including immediately after a FPM1.
|
| 41 |
|
| 42 | .. note::
|
| 43 | LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
|
| 44 | variant), so the rest of this document will assume a block size of 4096.
|
| 45 |
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 46 | .. _msf_superblock:
|
| 47 |
|
| 48 | The Superblock
|
| 49 | ==============
|
| 50 | At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
|
| 51 | follows:
|
| 52 |
|
| 53 | .. code-block:: c++
|
| 54 |
|
| 55 | struct SuperBlock {
|
| 56 | char FileMagic[sizeof(Magic)];
|
| 57 | ulittle32_t BlockSize;
|
| 58 | ulittle32_t FreeBlockMapBlock;
|
| 59 | ulittle32_t NumBlocks;
|
| 60 | ulittle32_t NumDirectoryBytes;
|
| 61 | ulittle32_t Unknown;
|
| 62 | ulittle32_t BlockMapAddr;
|
| 63 | };
|
| 64 |
|
| 65 | - **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
|
| 66 | followed by the bytes ``1A 44 53 00 00 00``.
|
| 67 | - **BlockSize** - The block size of the internal file system. Valid values are
|
| 68 | 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
|
| 69 | depending on the block sizes. For the purposes of LLVM, we handle only block
|
| 70 | sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
|
| 71 | - **FreeBlockMapBlock** - The index of a block within the file, at which begins
|
| 72 | a bitfield representing the set of all blocks within the file which are "free"
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 73 | (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
|
| 74 | more information.
|
| 75 | **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 76 | - **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
|
| 77 | should equal the size of the file on disk.
|
| 78 | - **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
|
| 79 | directory contains information about each stream's size and the set of blocks
|
| 80 | that it occupies. It will be described in more detail later.
|
| 81 | - **BlockMapAddr** - The index of a block within the MSF file. At this block is
|
| 82 | an array of ``ulittle32_t``'s listing the blocks that the stream directory
|
| 83 | resides on. For large MSF files, the stream directory (which describes the
|
| 84 | block layout of each stream) may not fit entirely on a single block. As a
|
| 85 | result, this extra layer of indirection is introduced, whereby this block
|
| 86 | contains the list of blocks that the stream directory occupies, and the stream
|
| 87 | directory itself can be stitched together accordingly. The number of
|
| 88 | ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 89 |
|
| 90 | .. _msf_freeblockmap:
|
| 91 |
|
| 92 | The Free Block Map
|
| 93 | ==================
|
| 94 |
|
| 95 | The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
|
| 96 | series of blocks which contains a bit flag for every block in the file. The
|
| 97 | flag will be set to 0 if the block is in use, and 1 if the block is unused.
|
| 98 |
|
| 99 | Each file contains two FPMs, one of which is active at any given time. This
|
| 100 | feature is designed to support incremental and atomic updates of the underlying
|
| 101 | MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
|
| 102 | write your new modified bitfield to FPM2, and vice versa. Only when you commit
|
| 103 | the file to disk do you need to swap the value in the SuperBlock to point to
|
| 104 | the new ``FreeBlockMapBlock``.
|
| 105 |
|
| 106 | The Free Block Maps are stored as a series of single blocks thoughout the file
|
| 107 | at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
|
| 108 | bytes, it contains 8 times as many bits as an interval has blocks. This means
|
| 109 | that the first block of each FPM refers to the first 8 intervals of the file
|
| 110 | (the first 32768 blocks), the second block of each FPM refers to the next 8
|
| 111 | blocks, and so on. This results in far more FPM blocks being present than are
|
| 112 | required, but in order to maintain backwards compatibility the format must stay
|
| 113 | this way.
|
| 114 |
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 115 | The Stream Directory
|
| 116 | ====================
|
| 117 | The Stream Directory is the root of all access to the other streams in an MSF
|
| 118 | file. Beginning at byte 0 of the stream directory is the following structure:
|
| 119 |
|
| 120 | .. code-block:: c++
|
| 121 |
|
| 122 | struct StreamDirectory {
|
| 123 | ulittle32_t NumStreams;
|
| 124 | ulittle32_t StreamSizes[NumStreams];
|
| 125 | ulittle32_t StreamBlocks[NumStreams][];
|
| 126 | };
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 127 |
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 128 | And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
|
| 129 | Note that each of the last two arrays is of variable length, and in particular
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 130 | that the second array is jagged.
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 131 |
|
| 132 | **Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
|
| 133 | streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
|
| 134 |
|
| 135 | Stream 0: ceil(1000 / 4096) = 1 block
|
| 136 |
|
| 137 | Stream 1: ceil(8000 / 4096) = 2 blocks
|
| 138 |
|
| 139 | Stream 2: ceil(16000 / 4096) = 4 blocks
|
| 140 |
|
| 141 | Stream 3: ceil(9000 / 4096) = 3 blocks
|
| 142 |
|
| 143 | In total, 10 blocks are used. Let's see what the stream directory might look
|
| 144 | like:
|
| 145 |
|
| 146 | .. code-block:: c++
|
| 147 |
|
| 148 | struct StreamDirectory {
|
| 149 | ulittle32_t NumStreams = 4;
|
| 150 | ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
|
| 151 | ulittle32_t StreamBlocks[][] = {
|
| 152 | {4},
|
| 153 | {5, 6},
|
| 154 | {11, 9, 7, 8},
|
| 155 | {10, 15, 12}
|
| 156 | };
|
| 157 | };
|
Zachary Turner | 0b5677c | 2018-01-12 21:42:39 +0000 | [diff] [blame] | 158 |
|
Zachary Turner | ab792ca | 2016-11-10 19:24:21 +0000 | [diff] [blame] | 159 | In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
|
| 160 | would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
|
| 161 | ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
|
| 162 |
|
| 163 | Note also that the streams are discontiguous, and that part of stream 3 is in the
|
| 164 | middle of part of stream 2. You cannot assume anything about the layout of the
|
| 165 | blocks!
|
| 166 |
|
| 167 | Alignment and Block Boundaries
|
| 168 | ==============================
|
| 169 | As may be clear by now, it is possible for a single field (whether it be a high
|
| 170 | level record, a long string field, or even a single ``uint16``) to begin and
|
| 171 | end in separate blocks. For example, if the block size is 4096 bytes, and a
|
| 172 | ``uint16`` field begins at the last byte of the current block, then it would
|
| 173 | need to end on the first byte of the next block. Since blocks are not
|
| 174 | necessarily contiguously laid out in the file, this means that both the consumer
|
| 175 | and the producer of an MSF file must be prepared to split data apart
|
| 176 | accordingly. In the aforementioned example, the high byte of the ``uint16``
|
| 177 | would be written to the last byte of block N, and the low byte would be written
|
| 178 | to the first byte of block N+1, which could be tens of thousands of bytes later
|
| 179 | (or even earlier!) in the file, depending on what the stream directory says.
|