Optimization fixes

This change contains two primary fixes.  First, the save/restore mechanism
for FP callee saves was broken if there were any holes in the save mask.
The ARM load/store multiple instructions for floating point use a
start + count mechanism, rather than the bit-mask mechanism used for
core registers, so a mask with holes cannot be encoded directly.
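
As an illustration of the start + count constraint, the sketch below
widens a spill mask with holes into one contiguous range.  The names
FpSaveRange and CoverFpSpillMask are hypothetical and are not taken from
this patch; the actual fix may handle holes differently.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of the patch: a contiguous run of
// FP registers described the way a start + count encoding needs it.
struct FpSaveRange {
    int start;  // index of the first register to save
    int count;  // number of consecutive registers to save
};

// Returns the smallest contiguous range covering every set bit in mask.
// Registers that fall inside a hole are included as well, which costs
// some stack space but keeps the range expressible as start + count.
inline FpSaveRange CoverFpSpillMask(uint32_t mask) {
    if (mask == 0) {
        return {0, 0};  // nothing to save
    }
    int low = 0;
    while (((mask >> low) & 1u) == 0) {
        ++low;  // skip clear bits below the lowest saved register
    }
    int high = 31;
    while (((mask >> high) & 1u) == 0) {
        --high;  // skip clear bits above the highest saved register
    }
    return {low, high - low + 1};
}
```

For example, a mask of 0xD (registers 0, 2 and 3) has a hole at
register 1, so the covering range starts at 0 and spans 4 registers.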

The second fix corrects a problem introduced by the recent enhancements
to loading floating point literals.  The load->copy optimization mechanism
for literal loads used the value of the loaded literal to identify
redundant loads.  However, it used only the first 32 bits of the
literal, which worked previously because 64-bit literal loads
were treated as a pair of 32-bit loads.  The fix is to use the
label of the literal rather than the value in the aliasInfo, which
works for all sizes.
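
The collision can be shown with a minimal sketch.  The Literal type and
both key functions are hypothetical stand-ins, not the compiler's own
data structures, and the sketch assumes "first 32 bits" means the low
word of the value.

```cpp
#include <cstdint>

// Hypothetical stand-in for a literal pool entry.
struct Literal {
    uint64_t value;  // raw bits of the literal
};

// Value-based key that keeps only the low 32 bits (assumed to be what
// "first 32 bits" refers to).  Distinct 64-bit literals can collide.
inline uint32_t KeyByLowWord(const Literal& lit) {
    return static_cast<uint32_t>(lit.value);
}

// Label-based key: the identity of the pool entry itself, which is
// unique regardless of the literal's size.
inline const Literal* KeyByLabel(const Literal& lit) {
    return &lit;
}

// Bit patterns of the doubles 1.0 and 2.0.  Both have a low word of
// zero, so a low-word key cannot tell them apart.
inline bool LowWordKeysCollide() {
    Literal one{0x3FF0000000000000ULL};
    Literal two{0x4000000000000000ULL};
    return KeyByLowWord(one) == KeyByLowWord(two);
}

// The same two entries always get distinct label-based keys.
inline bool LabelKeysDiffer() {
    Literal one{0x3FF0000000000000ULL};
    Literal two{0x4000000000000000ULL};
    return KeyByLabel(one) != KeyByLabel(two);
}
```

With a low-word key, a load of 2.0 following a load of 1.0 would look
redundant and be turned into a copy; with a label key it would not.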

Change-Id: Ic4779adf73b2c7d80059a988b0ecdef39921a81f
diff --git a/src/compiler/Ralloc.cc b/src/compiler/Ralloc.cc
index e7844b6..aaf9b97 100644
--- a/src/compiler/Ralloc.cc
+++ b/src/compiler/Ralloc.cc
@@ -233,7 +233,7 @@
 
     cUnit->coreSpillMask = 0;
     cUnit->fpSpillMask = 0;
-    cUnit->numSpills = 0;
+    cUnit->numCoreSpills = 0;
 
     oatDoPromotion(cUnit);
 
@@ -247,9 +247,10 @@
     cUnit->numRegs = cUnit->method->NumRegisters() - cUnit->numIns;
     cUnit->numOuts = cUnit->method->NumOuts();
     cUnit->numPadding = (STACK_ALIGN_WORDS -
-        (cUnit->numSpills + cUnit->numRegs +
+        (cUnit->numCoreSpills + cUnit->numFPSpills + cUnit->numRegs +
          cUnit->numOuts + 2)) & (STACK_ALIGN_WORDS-1);
-    cUnit->frameSize = (cUnit->numSpills + cUnit->numRegs + cUnit->numOuts +
+    cUnit->frameSize = (cUnit->numCoreSpills + cUnit->numFPSpills +
+                        cUnit->numRegs + cUnit->numOuts +
                         cUnit->numPadding + 2) * 4;
     cUnit->insOffset = cUnit->frameSize + 4;
     cUnit->regsOffset = (cUnit->numOuts + cUnit->numPadding + 1) * 4;