Add mulhi back to SkVx
This is a speculative fix for the perf regressions in
https://perf.skia.org/e/?begin=1653081848&end=1653327714&keys=Xeba7495751e68faa6011e93cee1ca5d7&xbaroffset=60700
When moving from SkNx to SkVx, I removed the mulHi operation that used
the _mm_mulhi_epu16 intrinsic on SSE because Godbolt showed that clang
could detect mull(x,y)>>16 and turn it into a mulhi.
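For reference, mulhi on u16 lanes returns the high 16 bits of each
32-bit product, which is exactly what mull(x,y)>>16 computes. The
sketch below compares the two forms. It assumes plain uint16_t arrays
in place of skvx::Vec, and mulhi_ref, mulhi8 and check_mulhi8 are
hypothetical names, not SkVx API.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_mulhi_epu16 (pmulhuw)
#endif

// Scalar reference: widen to 32 bits, multiply, keep the high 16 bits.
// This is the mull(x,y) >> 16 shape that clang can pattern-match.
inline uint16_t mulhi_ref(uint16_t x, uint16_t y) {
    return static_cast<uint16_t>((static_cast<uint32_t>(x) * y) >> 16);
}

// Eight u16 lanes at once. On SSE2 targets this is the single pmulhuw
// instruction behind _mm_mulhi_epu16; elsewhere it uses the reference.
inline void mulhi8(const uint16_t x[8], const uint16_t y[8], uint16_t out[8]) {
#if defined(__SSE2__)
    const __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    const __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_mulhi_epu16(vx, vy));
#else
    for (int i = 0; i < 8; ++i) out[i] = mulhi_ref(x[i], y[i]);
#endif
}

// Asserts that the 8-lane path and the scalar reference agree per lane.
inline void check_mulhi8(const uint16_t x[8], const uint16_t y[8]) {
    uint16_t out[8];
    mulhi8(x, y, out);
    for (int i = 0; i < 8; ++i) assert(out[i] == mulhi_ref(x[i], y[i]));
}
```

Both forms give identical results; the question in this change is only
which instructions the compiler emits for the shift form when AVX2 is
enabled.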
However, the perf regression is limited to our AVX2 bots, and
further testing on Godbolt shows that, in that configuration, clang
uses a 256-bit AVX2 instruction instead of _mm_mulhi_epu16, so we may
hit a mixing penalty there. I saw Chromium regressions too, so
Chromium may also be building with AVX2 enabled.
Unfortunately, I could not reproduce the perf regression locally,
either with default flags or when forcing AVX2, and nanobench results
were very noisy. Since it's not much new code, I'm proposing we land
this and then monitor perf and Chromium's alerts to see if they improve.
I also cleaned up mull to use a single definition with if constexpr
branches, like the other functions in SkVx.
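A single definition with if constexpr typically means one template over
the lane count N that either calls a native-width kernel, pads a narrow
vector up to that width, or splits a wide vector in half and recurses.
The sketch below shows that shape for a widening u16 x u16 -> u32 mull.
It is not the code from this change: std::array stands in for
skvx::Vec, and kNativeLanes and mull_native are hypothetical (the
"native" kernel is a scalar loop here). Requires C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for skvx::Vec<N,T>: a fixed-size array of lanes.
template <size_t N, typename T>
using Vec = std::array<T, N>;

// Hypothetical native width: the lane count one SIMD instruction handles.
constexpr size_t kNativeLanes = 8;

// Stand-in for a native-width widening multiply (one instruction on real
// hardware); a plain scalar loop here so the sketch runs anywhere.
inline Vec<kNativeLanes, uint32_t> mull_native(const Vec<kNativeLanes, uint16_t>& x,
                                               const Vec<kNativeLanes, uint16_t>& y) {
    Vec<kNativeLanes, uint32_t> r{};
    for (size_t i = 0; i < kNativeLanes; ++i) r[i] = uint32_t(x[i]) * y[i];
    return r;
}

// One definition for every lane count; if constexpr selects the branch at
// compile time, so only the chosen branch is instantiated for a given N.
template <size_t N>
Vec<N, uint32_t> mull(const Vec<N, uint16_t>& x, const Vec<N, uint16_t>& y) {
    static_assert(N > 0 && (N & (N - 1)) == 0, "power-of-two lane counts only");
    if constexpr (N == kNativeLanes) {
        return mull_native(x, y);
    } else if constexpr (N < kNativeLanes) {
        // Too narrow: zero-pad up to the native width, keep the low N lanes.
        Vec<kNativeLanes, uint16_t> px{}, py{};
        for (size_t i = 0; i < N; ++i) { px[i] = x[i]; py[i] = y[i]; }
        Vec<kNativeLanes, uint32_t> full = mull_native(px, py);
        Vec<N, uint32_t> r{};
        for (size_t i = 0; i < N; ++i) r[i] = full[i];
        return r;
    } else {
        // Too wide: split into low/high halves and recurse on each.
        constexpr size_t H = N / 2;
        Vec<H, uint16_t> xlo{}, xhi{}, ylo{}, yhi{};
        for (size_t i = 0; i < H; ++i) {
            xlo[i] = x[i]; xhi[i] = x[H + i];
            ylo[i] = y[i]; yhi[i] = y[H + i];
        }
        Vec<H, uint32_t> lo = mull(xlo, ylo);
        Vec<H, uint32_t> hi = mull(xhi, yhi);
        Vec<N, uint32_t> r{};
        for (size_t i = 0; i < H; ++i) { r[i] = lo[i]; r[H + i] = hi[i]; }
        return r;
    }
}
```

Keeping the dispatch in one template means a new lane count needs no new
overload; the compiler discards the branches that do not apply.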
Bug: skia:13346
Change-Id: Ic54bdeb361bfd8dc259bf12634b3b33bb70d4b60
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/561222
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Michael Ludwig <michaelludwig@google.com>