Merge changes I2ce3479d,Ibb56664d

* changes:
  more optimizations...
  refactor code to improve neon code