optimizing: Build HConstructorFence for HNewArray/HNewInstance nodes

Also fixes:
* LSE, code_sinking: keep optimizing new-instance wherever they did so before
* Various tests to expect constructor fences after new-instance

Sidenote: new-instance String does not get a ConstructorFence; the
special StringFactory calls are assumed to be self-fencing.

Metric changes on go/lem:
* CodeSize -0.262% in ART-Compile (ARMv8)
* RunTime -0.747% for all (linux-armv8)

(No changes expected on x86, where constructor fences are no-ops.)

The RunTime regression is temporary until the art_quick_alloc_*
entrypoints have their DMBs removed in a follow-up CL.
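
For readers unfamiliar with constructor fences, the ordering they provide
can be sketched in plain C++. This is only an analogy, not ART code: Obj,
kClassTag, g_published, AllocateAndPublish and ReadWhenPublished are
hypothetical names, and modelling the fence as a release fence is an
assumption of the sketch. The reader-side acquire fence is needed by the
C++ memory model and is not something this change adds.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a freshly allocated object; not an ART type.
constexpr int kClassTag = 0x1234;

struct Obj {
  int class_tag;  // written by the "allocator"
  int field;      // default-initialized to zero
};

// Shared slot through which the object becomes visible to other threads.
std::atomic<Obj*> g_published{nullptr};

// Allocate, default-initialize, fence, then publish the reference.
Obj* AllocateAndPublish() {
  Obj* obj = new Obj{kClassTag, 0};  // initialization stores
  // Analogue of the constructor fence (assumed here to behave like a
  // release fence): the stores above are ordered before the publishing
  // store below.
  std::atomic_thread_fence(std::memory_order_release);
  g_published.store(obj, std::memory_order_relaxed);
  return obj;
}

// Spin until the object is published, then read its header word.
int ReadWhenPublished() {
  Obj* obj = nullptr;
  while ((obj = g_published.load(std::memory_order_relaxed)) == nullptr) {
    std::this_thread::yield();
  }
  // Pairs with the release fence in AllocateAndPublish().
  std::atomic_thread_fence(std::memory_order_acquire);
  return obj->class_tag;
}
```

Without the fence in AllocateAndPublish(), a weakly ordered CPU could make
the reference visible before the initialization stores, so a reader might
observe an uninitialized header.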

Test: art/test.py
Bug: 36656456
Change-Id: I6a936a6e51c623e1c6b5b22eee5c3c72bebbed35
diff --git a/compiler/optimizing/instruction_builder.h b/compiler/optimizing/instruction_builder.h
index 2fb5c7b..e968760 100644
--- a/compiler/optimizing/instruction_builder.h
+++ b/compiler/optimizing/instruction_builder.h
@@ -194,12 +194,12 @@
                               uint32_t register_index);
 
   // Builds a new array node and the instructions that fill it.
-  void BuildFilledNewArray(uint32_t dex_pc,
-                           dex::TypeIndex type_index,
-                           uint32_t number_of_vreg_arguments,
-                           bool is_range,
-                           uint32_t* args,
-                           uint32_t register_index);
+  HNewArray* BuildFilledNewArray(uint32_t dex_pc,
+                                 dex::TypeIndex type_index,
+                                 uint32_t number_of_vreg_arguments,
+                                 bool is_range,
+                                 uint32_t* args,
+                                 uint32_t register_index);
 
   void BuildFillArrayData(const Instruction& instruction, uint32_t dex_pc);
 
@@ -288,7 +288,11 @@
       REQUIRES_SHARED(Locks::mutator_lock_);
 
   // Build a HNewInstance instruction.
-  bool BuildNewInstance(dex::TypeIndex type_index, uint32_t dex_pc);
+  HNewInstance* BuildNewInstance(dex::TypeIndex type_index, uint32_t dex_pc);
+
+  // Build a HConstructorFence for HNewInstance and HNewArray instructions. This ensures the
+  // happens-before ordering for default-initialization of the object referred to by `allocation`.
+  void BuildConstructorFenceForAllocation(HInstruction* allocation);
 
   // Return whether the compiler can assume `cls` is initialized.
   bool IsInitialized(Handle<mirror::Class> cls) const