The Android Open Source Project | f6c3871 | 2009-03-03 19:28:47 -0800 | [diff] [blame] | 1 | <html> |
| 2 | <head> |
| 3 | <title>Dalvik Optimization and Verification</title> |
| 4 | </head> |
| 5 | |
| 6 | <body> |
| 7 | <h1>Dalvik Optimization and Verification With <i>dexopt</i></h1> |
| 8 | |
| 9 | <p> |
| 10 | The Dalvik virtual machine was designed specifically for the Android |
| 11 | mobile platform. The target systems have little RAM, store data on slow |
| 12 | internal flash memory, and generally have the performance characteristics |
| 13 | of decade-old desktop systems. They also run Linux, which provides |
| 14 | virtual memory, processes and threads, and UID-based security mechanisms. |
| 15 | <p> |
| 16 | The features and limitations caused us to focus on certain goals: |
| 17 | |
| 18 | <ul> |
| 19 | <li>Class data, notably bytecode, must be shared between multiple |
| 20 | processes to minimize total system memory usage. |
| 21 | <li>The overhead in launching a new app must be minimized to keep |
| 22 | the device responsive. |
| 23 | <li>Storing class data in individual files results in a lot of |
| 24 | redundancy, especially with respect to strings. To conserve disk |
| 25 | space we need to factor this out. |
| 26 | <li>Parsing class data fields adds unnecessary overhead during |
| 27 | class loading. Accessing data values (e.g. integers and strings) |
| 28 | directly as C types is better. |
| 29 | <li>Bytecode verification is necessary, but slow, so we want to verify |
| 30 | as much as possible outside app execution. |
| 31 | <li>Bytecode optimization (quickened instructions, method pruning) is |
| 32 | important for speed and battery life. |
| 33 | <li>For security reasons, processes may not edit shared code. |
| 34 | </ul> |
| 35 | |
| 36 | <p> |
| 37 | The typical VM implementation uncompresses individual classes from a |
| 38 | compressed archive and stores them on the heap. This implies a separate |
| 39 | copy of each class in every process, and slows application startup because |
| 40 | the code must be uncompressed (or at least read off disk in many small |
| 41 | pieces). On the other hand, having the bytecode on the local heap makes |
| 42 | it easy to rewrite instructions on first use, facilitating a number of |
| 43 | different optimizations. |
| 44 | <p> |
| 45 | The goals led us to make some fundamental decisions: |
| 46 | |
| 47 | <ul> |
| 48 | <li>Multiple classes are aggregated into a single "DEX" file. |
| 49 | <li>DEX files are mapped read-only and shared between processes. |
| 50 | <li>Byte ordering and word alignment are adjusted to suit the local |
| 51 | system. |
| 52 | <li>Bytecode verification is mandatory for all classes, but we want |
| 53 | to "pre-verify" whatever we can. |
| 54 | <li>Optimizations that require rewriting bytecode must be done ahead |
| 55 | of time. |
| 56 | </ul> |
| 57 | |
| 58 | <p> |
| 59 | The consequences of these decisions are explained in the following sections. |
| 60 | |
| 61 | |
| 62 | <h2>VM Operation</h2> |
| 63 | |
| 64 | <p> |
| 65 | Application code is delivered to the system in a <code>.jar</code> |
| 66 | or <code>.apk</code> file. These are really just <code>.zip</code> |
| 67 | archives with some meta-data files added. The Dalvik DEX data file |
| 68 | is always called <code>classes.dex</code>. |
| 69 | <p> |
| 70 | The bytecode cannot be memory-mapped and executed directly from the zip |
| 71 | file, because the data is compressed and the start of the file is not |
| 72 | guaranteed to be word-aligned. These problems could be addressed by |
| 73 | storing <code>classes.dex</code> without compression and padding out the zip |
| 74 | file, but that would increase the size of the package sent across the |
| 75 | data network. |
| 76 | <p> |
| 77 | We need to extract <code>classes.dex</code> from the zip archive before |
| 78 | we can use it. While we have the file available, we might as well perform |
| 79 | some of the other actions (realignment, optimization, verification) described |
| 80 | earlier. This raises a new question however: who is responsible for doing |
| 81 | this, and where do we keep the output? |
| 82 | |
| 83 | <h3>Preparation</h3> |
| 84 | |
| 85 | <p> |
| 86 | There are at least three different ways to create a "prepared" DEX file, |
| 87 | sometimes known as "ODEX" (for Optimized DEX): |
| 88 | <ol> |
| 89 | <li>The VM does it "just in time". The output goes into a special |
| 90 | <code>dalvik-cache</code> directory. This works on the desktop and |
| 91 | engineering-only device builds where the permissions on the |
| 92 | <code>dalvik-cache</code> directory are not restricted. On production |
| 93 | devices, this is not allowed. |
| 94 | <li>The system installer does it when an application is first added. |
| 95 | It has the privileges required to write to <code>dalvik-cache</code>. |
| 96 | <li>The build system does it ahead of time. The relevant <code>jar</code> |
| 97 | / <code>apk</code> files are present, but the <code>classes.dex</code> |
| 98 | is stripped out. The optimized DEX is stored next to the original |
| 99 | zip archive, not in <code>dalvik-cache</code>, and is part of the |
| 100 | system image. |
| 101 | </ol> |
| 102 | <p> |
| 103 | The <code>dalvik-cache</code> directory is more accurately |
| 104 | <code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have |
| 105 | names derived from the full path of the source DEX. On the device the |
| 106 | directory is owned by <code>system</code> / <code>system</code> |
| 107 | and has 0771 permissions, and the optimized DEX files stored there are |
| 108 | owned by <code>system</code> and the |
| 109 | application's group, with 0644 permissions. DRM-locked applications will |
| 110 | use 640 permissions to prevent other user applications from examining them. |
| 111 | The bottom line is that you can read your own DEX file and those of most |
| 112 | other applications, but you cannot create, modify, or remove them. |
| 113 | <p> |
| 114 | Preparation of the DEX file for the "just in time" and "system installer" |
| 115 | approaches proceeds in three steps: |
| 116 | <p> |
| 117 | First, the dalvik-cache file is created. This must be done in a process |
| 118 | with appropriate privileges, so for the "system installer" case this is |
| 119 | done within <code>installd</code>, which runs as root. |
| 120 | <p> |
| 121 | Second, the <code>classes.dex</code> entry is extracted from the the zip |
| 122 | archive. A small amount of space is left at the start of the file for |
| 123 | the ODEX header. |
| 124 | <p> |
| 125 | Third, the file is memory-mapped for easy access and tweaked for use on |
| 126 | the current system. This includes byte-swapping and structure realigning, |
| 127 | but no meaningful changes to the DEX file. We also do some basic |
| 128 | structure checks, such as ensuring that file offsets and data indices |
| 129 | fall within valid ranges. |
| 130 | <p> |
| 131 | The build system uses a hairy process that involves starting the |
| 132 | emulator, forcing just-in-time optimization of all relevant DEX files, |
| 133 | and then extracting the results from <code>dalvik-cache</code>. The |
| 134 | reasons for doing this, rather than using a tool that runs on the desktop, |
| 135 | will become more apparent when the optimizations are explained. |
| 136 | <p> |
| 137 | Once the code is byte-swapped and aligned, we're ready to go. We append |
| 138 | some pre-computed data, fill in the ODEX header at the start of the file, |
| 139 | and start executing. (The header is filled in last, so that we don't |
| 140 | try to use a partial file.) If we're interested in verification and |
| 141 | optimization, however, we need to insert a step after the initial prep. |
| 142 | |
| 143 | <h3>dexopt</h3> |
| 144 | |
| 145 | <p> |
| 146 | We want to verify and optimize all of the classes in the DEX file. The |
| 147 | easiest and safest way to do this is to load all of the classes into |
| 148 | the VM and run through them. Anything that fails to load is simply not |
| 149 | verified or optimized. Unfortunately, this can cause allocation of some |
| 150 | resources that are difficult to release (e.g. loading of native shared |
| 151 | libraries), so we don't want to do it in the same virtual machine that |
| 152 | we're running applications in. |
| 153 | <p> |
| 154 | The solution is to invoke a program called <code>dexopt</code>, which |
| 155 | is really just a back door into the VM. It performs an abbreviated VM |
| 156 | initialization, loads zero or more DEX files from the bootstrap class |
| 157 | path, and then sets about verifying and optimizing whatever it can from |
| 158 | the target DEX. On completion, the process exits, freeing all resources. |
| 159 | <p> |
| 160 | It is possible for multiple VMs to want the same DEX file at the same |
| 161 | time. File locking is used to ensure that dexopt is only run once. |
| 162 | |
| 163 | |
| 164 | <h2>Verification</h2> |
| 165 | |
| 166 | <p> |
| 167 | The bytecode verification process involves scanning through the instructions |
| 168 | in every method in every class in a DEX file. The goal is to identify |
| 169 | illegal instruction sequences so that we don't have to check for them at |
| 170 | run time. Many of the computations involved are also necessary for "exact" |
| 171 | garbage collection. See |
| 172 | <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more |
| 173 | information. |
| 174 | <p> |
| 175 | For performance reasons, the optimizer (described in the next section) |
| 176 | assumes that the verifier has run successfully, and makes some potentially |
| 177 | unsafe assumptions. By default, Dalvik insists upon verifying all classes, |
| 178 | and only optimizes classes that have been verified. If you want to |
| 179 | disable the verifier, you can use command-line flags to do so. See also |
| 180 | <a href="embedded-vm-control.html"> Controlling the Embedded VM</a> |
| 181 | for instructions on controlling these |
| 182 | features within the Android application framework. |
| 183 | <p> |
| 184 | Reporting of verification failures is a tricky issue. For example, |
| 185 | calling a package-scope method on a class in a different package is |
| 186 | illegal and will be caught by the verifier. We don't necessarily want |
| 187 | to report it during verification though -- we actually want to throw |
| 188 | an exception when the method call is attempted. Checking the access |
| 189 | flags on every method call is expensive though. The |
| 190 | <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document |
| 191 | addresses this issue. |
| 192 | <p> |
| 193 | Classes that have been verified successfully have a flag set in the ODEX. |
| 194 | They will not be re-verified when loaded. The Linux access permissions |
| 195 | are expected to prevent tampering; if you can get around those, installing |
| 196 | faulty bytecode is far from the easiest line of attack. The ODEX file has |
| 197 | a 32-bit checksum, but that's chiefly present as a quick check for |
| 198 | corrupted data. |
| 199 | |
| 200 | |
| 201 | <h2>Optimization</h2> |
| 202 | |
| 203 | <p> |
| 204 | Virtual machine interpreters typically perform certain optimizations the |
| 205 | first time a piece of code is used. Constant pool references are replaced |
| 206 | with pointers to internal data structures, operations that always succeed |
| 207 | or always work a certain way are replaced with simpler forms. Some of |
| 208 | these require information only available at runtime, others can be inferred |
| 209 | statically when certain assumptions are made. |
| 210 | <p> |
| 211 | The Dalvik optimizer does the following: |
| 212 | <ul> |
| 213 | <li>For virtual method calls, replace the method index with a |
| 214 | vtable index. |
| 215 | <li>For instance field get/put, replace the field index with |
| 216 | a byte offset. Also, merge the boolean / byte / char / short |
| 217 | variants into a single 32-bit form (less code in the interpreter |
| 218 | means more room in the CPU I-cache). |
| 219 | <li>Replace a handful of high-volume calls, like String.length(), |
| 220 | with "inline" replacements. This skips the usual method call |
| 221 | overhead, directly switching from the interpreter to a native |
| 222 | implementation. |
| 223 | <li>Prune empty methods. The simplest example is |
| 224 | <code>Object.<init></code>, which does nothing, but must be |
| 225 | called whenever any object is allocated. The instruction is |
| 226 | replaced with a new version that acts as a no-op unless a debugger |
| 227 | is attached. |
| 228 | <li>Append pre-computed data. For example, the VM wants to have a |
| 229 | hash table for lookups on class name. Instead of computing this |
| 230 | when the DEX file is loaded, we can compute it now, saving heap |
| 231 | space and computation time in every VM where the DEX is loaded. |
| 232 | </ul> |
| 233 | |
| 234 | <p> |
| 235 | All of the instruction modifications involve replacing the opcode with |
| 236 | one not defined by the Dalvik specification. This allows us to freely |
| 237 | mix optimized and unoptimized instructions. The set of optimized |
| 238 | instructions, and their exact representation, is tied closely to the VM |
| 239 | version. |
| 240 | <p> |
| 241 | Most of the optimizations are obvious "wins". The use of raw indices |
| 242 | and offsets not only allows us to execute more quickly, we can also |
| 243 | skip the initial symbolic resolution. Pre-computation eats up |
| 244 | disk space, and so must be done in moderation. |
| 245 | <p> |
| 246 | There are a couple of potential sources of trouble with these |
| 247 | optimizations. First, vtable indices and byte offsets are subject to |
| 248 | change if the VM is updated. Second, if a superclass is in a different |
| 249 | DEX, and that other DEX is updated, we need to ensure that our optimized |
| 250 | indices and offsets are updated as well. A similar but more subtle |
| 251 | problem emerges when user-defined class loaders are employed: the class |
| 252 | we actually call may not be the one we expected to call. |
| 253 | <p>These problems are addressed with dependency lists and some limitations |
| 254 | on what can be optimized. |
| 255 | |
| 256 | |
| 257 | <h2>Dependencies and Limitations</h2> |
| 258 | |
| 259 | <p> |
| 260 | The optimized DEX file includes a list of dependencies on other DEX files, |
| 261 | plus the CRC-32 and modification date from the originating |
| 262 | <code>classes.dex</code> zip file entry. The dependency list includes the |
| 263 | full path to the <code>dalvik-cache</code> file, and the file's SHA-1 |
| 264 | signature. The timestamps of files on the device are unreliable and |
| 265 | not used. The dependency area also includes the VM version number. |
| 266 | <p> |
| 267 | An optimized DEX is dependent upon all of the DEX files in the bootstrap |
| 268 | class path. DEX files that are part of the bootstrap class path depend |
| 269 | upon the DEX files that appeared earlier. To ensure that nothing outside |
| 270 | the dependent DEX files is available, <code>dexopt</code> only loads the |
| 271 | bootstrap classes. References to classes in other DEX files fail, which |
| 272 | causes class loading and/or verification to fail, and classes with |
| 273 | external dependencies are simply not optimized. |
| 274 | <p> |
| 275 | This means that splitting code out into many separate DEX files has a |
| 276 | disadvantage: virtual method calls and instance field lookups between |
| 277 | non-boot DEX files can't be optimized. Because verification is pass/fail |
| 278 | with class granularity, no method in a class that has any reliance on |
| 279 | classes in external DEX files can be optimized. This may be a bit |
| 280 | heavy-handed, but it's the only way to guarantee that nothing breaks |
| 281 | when individual pieces are updated. |
| 282 | <p> |
| 283 | Another negative consequence: any change to a bootstrap DEX will result |
| 284 | in rejection of all optimized DEX files. This makes it hard to keep |
| 285 | system updates small. |
| 286 | <p> |
| 287 | Despite our caution, there is still a possibility that a class in a DEX |
| 288 | file loaded by a user-defined class loader could ask for a bootstrap class |
| 289 | (say, String) and be given a different class with the same name. If a |
| 290 | class in the DEX file being processed has the same name as a class in the |
| 291 | bootstrap DEX files, the class will be flagged as ambiguous and references |
| 292 | to it will not be resolved during verification / optimization. The class |
| 293 | linking code in the VM does additional checks to plug another hole; |
| 294 | see the verbose description in the VM sources for details (vm/oo/Class.c). |
| 295 | <p> |
| 296 | If one of the dependencies is updated, we need to re-verify and |
| 297 | re-optimize the DEX file. If we can do a just-in-time <code>dexopt</code> |
| 298 | invocation, this is easy. If we have to rely on the installer daemon, or |
| 299 | the DEX was shipped only in ODEX, then the VM has to reject the DEX. |
| 300 | <p> |
| 301 | The output of <code>dexopt</code> is byte-swapped and struct-aligned |
| 302 | for the host, and contains indices and offsets that are highly VM-specific |
| 303 | (both version-wise and platform-wise). For this reason it's tricky to |
| 304 | write a version of <code>dexopt</code> that runs on the desktop but |
| 305 | generates output suitable for a particular device. The safest way to |
| 306 | invoke it is on the target device, or on an emulator for that device. |
| 307 | |
| 308 | |
| 309 | <h2>Generated DEX</h2> |
| 310 | |
| 311 | <p> |
| 312 | Some languages and frameworks rely on the ability to generate bytecode |
| 313 | and execute it. The rather heavy <code>dexopt</code> verification and |
| 314 | optimization model doesn't work well with that. |
| 315 | <p> |
| 316 | We intend to support this in a future release, but the exact method is |
| 317 | to be determined. We may allow individual classes to be added or whole |
| 318 | DEX files; may allow Java bytecode or Dalvik bytecode in instructions; |
| 319 | may perform the usual set of optimizations, or use a separate interpreter |
| 320 | that performs on-first-use optimizations directly on the bytecode (which |
| 321 | won't be mapped read-only, since it's locally defined). |
| 322 | |
| 323 | <address>Copyright © 2008 The Android Open Source Project</address> |
| 324 | |
| 325 | </body> |
| 326 | </html> |