use fourth arg in store128

store128() has been lowered into SkVM Ops strangely (two interlocking
64-bit stores) only because of SkVM's limit of three arguments per Op.
With four arguments we can lower store128() in a straightforward way.
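
For illustration only, here is a scalar sketch of the two lowerings. It is
not SkVM code: kLanes, Reg, store128_half, store128 and the two lower_*
helpers are hypothetical names, and it assumes store128 writes each lane's
four 32-bit values to that lane's own 16-byte slot.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical model: a "register" holds one 32-bit value per lane.
constexpr int kLanes = 4;
using Reg = std::array<uint32_t, kLanes>;

// Three-value-argument style: write one 64-bit half (two 32-bit values)
// of every lane's 128-bit slot. half is 0 or 1.
void store128_half(uint32_t* dst, const Reg& a, const Reg& b, int half) {
    for (int i = 0; i < kLanes; i++) {
        dst[4*i + 2*half + 0] = a[i];
        dst[4*i + 2*half + 1] = b[i];
    }
}

// Four-value-argument style: write each lane's whole 128-bit slot at once.
void store128(uint32_t* dst, const Reg& x, const Reg& y,
                             const Reg& z, const Reg& w) {
    for (int i = 0; i < kLanes; i++) {
        dst[4*i + 0] = x[i];
        dst[4*i + 1] = y[i];
        dst[4*i + 2] = z[i];
        dst[4*i + 3] = w[i];
    }
}

// Old shape: two interlocking 64-bit stores cover every slot between them.
std::vector<uint32_t> lower_with_halves(const Reg& x, const Reg& y,
                                        const Reg& z, const Reg& w) {
    std::vector<uint32_t> dst(4 * kLanes);
    store128_half(dst.data(), x, y, 0);
    store128_half(dst.data(), z, w, 1);
    return dst;
}

// New shape: a single store carrying all four values.
std::vector<uint32_t> lower_with_four_args(const Reg& x, const Reg& y,
                                           const Reg& z, const Reg& w) {
    std::vector<uint32_t> dst(4 * kLanes);
    store128(dst.data(), x, y, z, w);
    return dst;
}
```

Both helpers fill memory identically in this model; the difference is only
that the second needs one operation where the first needs two.
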

Perhaps surprisingly, I've left the implementations of store128 fairly
naive: they still use narrower stores than they could now that all this
data is together in one place. I do want to follow up here, less because
the speed of store128 is important than because getting the tools in
place for idiomatic store128 implementations will lead us down a path
with great knock-on effects for more interesting features.

We'll need four adjacent temporary registers to use the ARM-idiomatic
st2.4s/st4.4s approaches for store64/store128, and the idiomatic x86
implementations need multiple temporary registers too. Once we're able
to manage multiple adjacent registers as a unit, we'll be able to
stretch the idea to things like load64/load128 returning 2 or 4
registers' worth of data from a single Op. The ultimate goal is
Half-is-fp16 mode, where we'll be able to fill one register with 16-bit
float/int/mask data and spread any 32-bit data across a register pair.

Change-Id: Ieb20d8b7d00e9d806cb27fd30ebfd50ae9317da7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/355936
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed