Diff - 7b3e4f99b25c31048a33a08688557b133ad345ab^! - platform_art

commit	7b3e4f99b25c31048a33a08688557b133ad345ab
author	Mark Mendell <mark.p.mendell@intel.com>	Thu Nov 19 14:08:40 2015 -0500
committer	Mark Mendell <mark.p.mendell@intel.com>	Tue Dec 15 15:48:39 2015 -0500
tree	446ce2d9b4684120c35fad9c097ea2f760f0797c
parent	089ff4886aa9b5e7cec04d2ef5cdeb9d68e5dc43

X86: Use locked add rather than mfence

Java semantics for memory ordering can be satisfied using
  lock addl $0,0(SP)
rather than mfence.  The locked add synchronizes the memory caches, but
doesn't affect device memory.

Timing on a microbenchmark with an mfence or lock addl $0,0(SP) in a loop
of 600000000 iterations:
time ./mfence
real    0m5.411s
user    0m5.408s
sys     0m0.000s

time ./locked_add
real    0m3.552s
user    0m3.550s
sys     0m0.000s
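
The source of the ./mfence and ./locked_add binaries is not part of this
commit. A minimal sketch of what such a loop might look like follows, assuming
GCC or Clang inline assembly on x86. The names FenceMfence, FenceLockedAdd and
TimeLoop are hypothetical, and non-x86 builds fall back to a portable fence so
the sketch still compiles.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: full fence via mfence on x86, portable fence elsewhere.
static inline void FenceMfence() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("mfence" ::: "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical helper: locked add of zero to the top of the stack.
// Adding 0 leaves the stack contents unchanged; only the lock prefix matters.
static inline void FenceLockedAdd() {
#if defined(__x86_64__)
  __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
  __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Runs `fence` `iterations` times and returns the elapsed wall time in seconds.
template <typename Fence>
double TimeLoop(Fence fence, std::uint64_t iterations) {
  assert(iterations > 0 && "need at least one iteration");
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    fence();
  }
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

Calling TimeLoop(FenceMfence, 600000000) and TimeLoop(FenceLockedAdd, 600000000)
would mirror the iteration count quoted above. The absolute numbers depend on
the CPU, so the timings in this message should not be expected to reproduce
exactly.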

Implement this as an instruction-set-feature lock_add.  This is off by
default (uses mfence), and enabled for atom & silvermont variants.
Generation of mfence can be forced by a parameter to MemoryFence.

Change-Id: I5cb4fded61f4cbbd7b7db42a1b6902e43e458911
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
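
The selection logic described in the message (lock_add off by default, on for
atom & silvermont, with a parameter to force mfence) can be sketched in
isolation as below. This is an illustration only, not the ART implementation:
X86Features, FeaturesForVariant, SelectFence and FenceAssembly are hypothetical
names, and the variant strings are assumed from the wording of the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the x86 instruction-set-features object.
struct X86Features {
  bool lock_add = false;  // off by default, so mfence is used
};

// Assumed variant names; the message only says "atom & silvermont variants".
X86Features FeaturesForVariant(const std::string& variant) {
  X86Features features;
  features.lock_add = (variant == "atom" || variant == "silvermont");
  return features;
}

enum class FenceKind { kMfence, kLockedAdd };

// Picks the fence to emit; force_mfence overrides the lock_add feature.
FenceKind SelectFence(const X86Features& features, bool force_mfence = false) {
  if (features.lock_add && !force_mfence) {
    return FenceKind::kLockedAdd;
  }
  return FenceKind::kMfence;
}

// Returns the assembly text for the chosen fence, as written in the message.
std::string FenceAssembly(FenceKind kind) {
  switch (kind) {
    case FenceKind::kLockedAdd:
      return "lock addl $0,0(SP)";
    case FenceKind::kMfence:
      return "mfence";
  }
  assert(false && "unhandled FenceKind");
  return "";
}
```

Keeping the override as a defaulted parameter lets existing call sites, such as
the volatile-store path in the hunk below, call the fence helper with no
arguments, while a caller that needs the stronger instruction can still
request it.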

diff --git a/compiler/optimizing/intrinsics_x86_64.cc b/compiler/optimizing/intrinsics_x86_64.cc
index ac9b245..d519034 100644
--- a/compiler/optimizing/intrinsics_x86_64.cc
+++ b/compiler/optimizing/intrinsics_x86_64.cc

@@ -2059,7 +2059,7 @@
   }
 
   if (is_volatile) {
-    __ mfence();
+    codegen->MemoryFence();
   }
 
   if (type == Primitive::kPrimNot) {