Use nmad primitive in inverse and transcendental functions.

This should generate smaller (and faster!) code by eliminating
explicit neg instructions. It is particularly effective on ARM.
On Intel, the compiler already appeared to replace `vfmadd231ps`
with `vfmsub231ps` in some places, which captures a similar
code-size saving, but the ARM compiler does not seem to attempt
anything similar.
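
For illustration only, here is a scalar sketch of the kind of rewrite
this change makes. The helper names, argument order, and the
Newton-Raphson step are assumptions chosen for the example, not code
copied from the patch; the real primitives operate on SIMD vectors.

```cpp
// Hypothetical scalar stand-ins for the vector primitives.
// Assumed semantics: mad(f, m, a) = f*m + a, nmad(f, m, a) = a - f*m.
static inline float mad(float f, float m, float a)  { return f * m + a; }
static inline float nmad(float f, float m, float a) { return a - f * m; }

// One Newton-Raphson refinement of a reciprocal estimate x ~= 1/d:
//     x' = x * (2 - d*x)

// Before: the negation is a separate operation feeding mad, so the
// compiler may emit an explicit neg instruction ahead of the
// fused multiply-add.
static inline float refine_rcp_with_neg(float d, float x) {
    return x * mad(-d, x, 2.0f);
}

// After: nmad expresses "a - f*m" directly, which can map onto a
// single fused negated multiply-add where the target provides one.
static inline float refine_rcp_with_nmad(float d, float x) {
    return x * nmad(d, x, 2.0f);
}
```

Both forms compute the same value; the second simply gives the
compiler a shape that needs no separate negation.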

Change-Id: I35253de4b159ae991ade610b28727d9ee6f21a1f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/818516
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: John Stiles <johnstiles@google.com>
Auto-Submit: John Stiles <johnstiles@google.com>
1 file changed