Numerous fixes to enable PromoteRegs, though it's still broken.

- Fixed ThrowNullPointerFromCode launchpad to load the array length
  directly into the necessary arg reg without clobbering the array
  pointer, since that value may be live afterwards.

- genArrayPut now uses a temporary reg for byte stores if the source reg
  is >= 4, since 32-bit x86 can only byte-address regs 0-3.

- Fixed the order in which core regs are spilled and unspilled.

- Correctly emit instructions when base == rBP and disp == 0, since the
  no-displacement ModRM encoding for rBP instead means disp32 with no
  base reg.

- Added checks to the compiler to ensure that byte opcodes aren't used
  on registers that can't be byte accessed.

- Fixed generation of a number of ops which use byte opcodes, including
  floating point comparison, int-to-byte, and and-int/lit16.

- Added rBP, rSI, and rDI to spill registers for the x86 jni compiler.

- Various fixes and additions to the x86 disassembler.
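
For reference, the corrected spill order can be sketched as a standalone
function. This is only an illustration, not code from this change:
CoreSpillSlots and its parameters are hypothetical names, and it assumes
numCoreSpills also counts the fake return address slot, which would then
sit at the top of the frame.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: returns (register, frame offset) pairs in the order
// the spill/unspill loops visit them. `mask` is the core spill mask with
// the fake return address register already cleared.
std::vector<std::pair<int, int>> CoreSpillSlots(uint32_t mask, int frame_size,
                                                int num_core_spills) {
  // Assumption: the spill area occupies the top of the frame.
  assert(frame_size >= 4 * num_core_spills);
  std::vector<std::pair<int, int>> slots;
  // Start at the bottom of the spill area and walk upwards, so the
  // lowest-numbered register gets the lowest offset.
  int offset = frame_size - 4 * num_core_spills;
  for (int reg = 0; mask != 0; mask >>= 1, reg++) {
    if (mask & 0x1) {
      slots.emplace_back(reg, offset);
      offset += 4;
    }
  }
  return slots;
}
```

With a 32-byte frame, four core spills, and mask bits 5-7 set (rBP, rSI,
rDI), this places the regs at offsets 16, 20, and 24, leaving offset 28
for the return address under the assumption above.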

Change-Id: I365fe7dec5cc64d181248fd58e90789f100b45e7
diff --git a/src/compiler/codegen/x86/ArchFactory.cc b/src/compiler/codegen/x86/ArchFactory.cc
index 1620044..001a93d 100644
--- a/src/compiler/codegen/x86/ArchFactory.cc
+++ b/src/compiler/codegen/x86/ArchFactory.cc
@@ -128,11 +128,11 @@
   }
   // Spill mask not including fake return address register
   uint32_t mask = cUnit->coreSpillMask & ~(1 << rRET);
-  int offset = cUnit->frameSize - 4;
+  int offset = cUnit->frameSize - (4 * cUnit->numCoreSpills);
   for (int reg = 0; mask; mask >>= 1, reg++) {
     if (mask & 0x1) {
-      offset -= 4;
       storeWordDisp(cUnit, rSP, offset, reg);
+      offset += 4;
     }
   }
 }
@@ -143,11 +143,11 @@
   }
   // Spill mask not including fake return address register
   uint32_t mask = cUnit->coreSpillMask & ~(1 << rRET);
-  int offset = cUnit->frameSize - 4;
+  int offset = cUnit->frameSize - (4 * cUnit->numCoreSpills);
   for (int reg = 0; mask; mask >>= 1, reg++) {
     if (mask & 0x1) {
-      offset -= 4;
       loadWordDisp(cUnit, rSP, offset, reg);
+      offset += 4;
     }
   }
 }