[optimizing] Improve x86 parallel moves/swaps

Add a new constructor to ScratchRegisterScope that supplies a register
if one is free, but does not spill a register to force one.  Use this
to generate alternate code that doesn't need a temporary register, as
the spill/restore of a register generates extra instructions that
aren't necessary on x86.
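
A minimal sketch of the idea, under stated assumptions: ScratchPool,
TryGetFree and Move32MemToMem are hypothetical names, not ART's
ScratchRegisterScope API, and the "code" is emitted as text.  The query
returns kNoReg instead of spilling, and the caller then picks a
register-free push/pop sequence for a 32-bit stack-slot move.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the x86 core registers (not ART's Register enum).
enum Reg { EAX, ECX, EDX, EBX, kNumRegs, kNoReg = -1 };
const char* const kRegNames[kNumRegs] = {"eax", "ecx", "edx", "ebx"};

// Hypothetical scratch allocator with a "free register or nothing" query.
// in_use[r] is true when register r holds a live value during the move.
class ScratchPool {
 public:
  explicit ScratchPool(std::vector<bool> in_use) : in_use_(std::move(in_use)) {}

  // Returns a free register, or kNoReg if every register is live.
  // Unlike a spilling scope, it never emits push/pop to free one up.
  int TryGetFree() const {
    for (int r = 0; r < kNumRegs; ++r) {
      if (!in_use_[r]) return r;
    }
    return kNoReg;
  }

 private:
  std::vector<bool> in_use_;
};

// Emits a 32-bit move from [esp + src] to [esp + dst] as text.
std::vector<std::string> Move32MemToMem(const ScratchPool& pool, int dst,
                                        int src) {
  std::vector<std::string> code;
  int temp = pool.TryGetFree();
  if (temp != kNoReg) {
    std::string t = kRegNames[temp];
    code.push_back("mov " + t + ", [esp + " + std::to_string(src) + "]");
    code.push_back("mov [esp + " + std::to_string(dst) + "], " + t);
  } else {
    // No free register: go through the stack instead of spilling one.
    // push lowers ESP by 4, but pop with an ESP-based destination computes
    // its address after ESP is restored, so dst needs no adjustment.
    code.push_back("push [esp + " + std::to_string(src) + "]");
    code.push_back("pop [esp + " + std::to_string(dst) + "]");
  }
  return code;
}
```

With every register live this yields two instructions, where spilling
a temporary would need push, mov, mov, pop.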

Here is the benefit for a 32-bit memory-to-memory exchange with no
free registers (old code marked '<', new code marked '>'):
<        50    	       push eax
<        53    	       push ebx
<  8B44244C    	       mov eax, [esp + 76]
<  8B5C246C    	       mov ebx, [esp + 108]
<  8944246C    	       mov [esp + 108], eax
<  895C244C    	       mov [esp + 76], ebx
<        5B    	       pop ebx
<        58    	       pop eax
---
>  FF742444    	       push [esp + 68]
>  FF742468    	       push [esp + 104]
>  8F44244C    	       pop [esp + 72]
>  8F442468    	       pop [esp + 100]
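
The displacements in the new sequence follow from the x86 rule that
PUSH computes an ESP-based source address before decrementing ESP,
while POP computes an ESP-based destination address after incrementing
it.  The toy model below (Machine and SwapViaStack are hypothetical
names) replays push [esp + 68], push [esp + 104], pop [esp + 72],
pop [esp + 100] on a map of stack dwords to check that the two slots
end up swapped and ESP is restored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of 32-bit x86 stack memory: address -> dword.
struct Machine {
  std::uint32_t esp = 0;
  std::map<std::uint32_t, std::uint32_t> mem;

  // push dword [esp + disp]: source address uses ESP before the decrement.
  void PushMem(std::uint32_t disp) {
    std::uint32_t value = mem[esp + disp];
    esp -= 4;
    mem[esp] = value;
  }

  // pop dword [esp + disp]: destination address uses ESP after the increment.
  void PopMem(std::uint32_t disp) {
    std::uint32_t value = mem[esp];
    esp += 4;
    mem[esp + disp] = value;
  }
};

// Register-free swap of [esp + mem1] and [esp + mem2].  The second push
// and the first pop run with ESP one word lower, hence their +4.
void SwapViaStack(Machine& m, std::uint32_t mem1, std::uint32_t mem2) {
  m.PushMem(mem1);      // push [esp + mem1]
  m.PushMem(mem2 + 4);  // push [esp + mem2 + 4]
  m.PopMem(mem1 + 4);   // pop [esp + mem1 + 4], receives the old mem2 value
  m.PopMem(mem2);       // pop [esp + mem2], receives the old mem1 value
}
```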

Avoid using the xchg instruction, as it is slow on smaller processors.
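
For a register-to-register exchange, one xchg-free approach is three
movs through a free register, falling back to the stack as the
temporary when none is free.  This is a sketch under that assumption
only: SwapRegs is a hypothetical helper that emits text, not the body
of the Exchange(Register, Register) added by this change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical xchg-free swap of registers r1 and r2.
// scratch names a free register, or is empty if none is available.
std::vector<std::string> SwapRegs(const std::string& r1, const std::string& r2,
                                  const std::string& scratch) {
  if (!scratch.empty()) {
    // Classic three-move swap through the free register.
    return {"mov " + scratch + ", " + r1, "mov " + r1 + ", " + r2,
            "mov " + r2 + ", " + scratch};
  }
  // No free register: park r1 on the stack instead of spilling another.
  return {"push " + r1, "mov " + r1 + ", " + r2, "pop " + r2};
}
```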

Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
diff --git a/compiler/optimizing/code_generator_x86.h b/compiler/optimizing/code_generator_x86.h
index e6e7fb7..b2420e4 100644
--- a/compiler/optimizing/code_generator_x86.h
+++ b/compiler/optimizing/code_generator_x86.h
@@ -106,6 +106,7 @@
   X86Assembler* GetAssembler() const;
 
  private:
+  void Exchange(Register reg1, Register reg2);
   void Exchange(Register reg, int mem);
   void Exchange(int mem1, int mem2);
   void Exchange32(XmmRegister reg, int mem);