commit | da7ddd8477dc802c8736c7ab860fc09f33689ce9 | |
---|---|---|
author | Tobias Grosser <grosser@google.com> | Thu Jul 11 17:58:47 2013 -0700 |
committer | Tobias Grosser <grosser@google.com> | Mon Jul 15 14:07:20 2013 -0700 |
tree | aa2adccfb8659aeef55ae83ef2f1164699c4cb30 | |
parent | 574854bcb2eb25a85b9b52faf2fb3e743fa7aa14 | |
Simplify code of convolve3x3

Instead of first doing all multiplications and then adding the results in a tree manner, we now repeatedly perform a load/multiply/add pattern. Both with and without tuning for A15, this yields a 5% performance increase for N10. This commit also exposes more instructions that can be transformed into fused multiply-adds.

Change-Id: I1215d75da236e6b2d6b6aa48b3ab35606cdba7b8
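For illustration only, the two evaluation orders described in the message can be sketched in plain scalar C. This is not the code touched by the commit: the function names `dot9_tree` and `dot9_chain`, the flat 9-element arrays, and the exact shape of the addition tree are assumptions made for the sketch, and the real kernel may well operate on vector types instead of single floats.

```c
#include <stddef.h>

/* Hypothetical "before" shape: compute all nine products first,
 * then combine them pairwise in a tree of additions. */
float dot9_tree(const float px[9], const float coeff[9]) {
    float p[9];
    for (size_t i = 0; i < 9; ++i) {
        p[i] = px[i] * coeff[i];           /* all multiplications up front */
    }
    float s01 = p[0] + p[1];
    float s23 = p[2] + p[3];
    float s45 = p[4] + p[5];
    float s67 = p[6] + p[7];
    return ((s01 + s23) + (s45 + s67)) + p[8];
}

/* Hypothetical "after" shape: one running sum, updated by a
 * load/multiply/add step per tap. Each step has the form
 * sum = sum + a * b, which a compiler may be able to turn into a
 * fused multiply-add, depending on target and flags. */
float dot9_chain(const float px[9], const float coeff[9]) {
    float sum = px[0] * coeff[0];
    for (size_t i = 1; i < 9; ++i) {
        sum += px[i] * coeff[i];           /* load, multiply, add */
    }
    return sum;
}
```

Both functions compute the same weighted sum of nine taps. Because floating-point addition is not associative, the two orders can differ in the last bits for general inputs, and a fused multiply-add rounds once instead of twice, so small numerical differences between the variants are to be expected.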