SSE 4.1 SrcOver blits: color32, blitmask.
This is mainly warmup for an AVX2 version.
The machine I'm typing this on just doesn't support AVX2.
This strategy should translate easily down to SSSE3 and SSE2.
Xfermode_SrcOver: 2.73ms -> 2.62ms (0.96x) (That's Color32.)
Xfermode_SrcOver_aa: 3.48ms -> 3.09ms (0.89x) (That's BlitMask_D32_A8.)
AA text blits (text_16_AA_{88,FF,WT,BK}) show speedups in the range of 5 to 20%.
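
For reference, here is a scalar sketch of what the two benchmarked routines compute, assuming premultiplied 8-bit colors. The struct layout and the names PMColor, color32_ref and blitmask_a8_ref are hypothetical, for illustration only; this is not the SSE 4.1 code itself.

```cpp
#include <cstddef>
#include <cstdint>

// Rounded divide by 255, written as (x+127)/255.
static inline uint8_t div255(uint32_t x) { return uint8_t((x + 127) / 255); }

// Hypothetical premultiplied pixel; the real code works on packed 32-bit pixels.
struct PMColor { uint8_t r, g, b, a; };

// SrcOver: d' = s + d*(255 - sa)/255, per channel.
// With valid premultiplied input (channel <= alpha) the sum cannot exceed 255.
static inline PMColor srcover(PMColor s, PMColor d) {
    uint32_t inv = 255u - s.a;
    return { uint8_t(s.r + div255(d.r * inv)), uint8_t(s.g + div255(d.g * inv)),
             uint8_t(s.b + div255(d.b * inv)), uint8_t(s.a + div255(d.a * inv)) };
}

// Scale every channel of a premultiplied color by 8-bit coverage.
static inline PMColor scale(PMColor c, uint8_t aa) {
    return { div255(uint32_t(c.r) * aa), div255(uint32_t(c.g) * aa),
             div255(uint32_t(c.b) * aa), div255(uint32_t(c.a) * aa) };
}

// Color32-style blit: one constant color blended over a row of pixels.
void color32_ref(PMColor* dst, size_t n, PMColor color) {
    for (size_t i = 0; i < n; i++) dst[i] = srcover(color, dst[i]);
}

// BlitMask_D32_A8-style blit: the color is first scaled by per-pixel
// coverage from an A8 mask, then blended with SrcOver.
void blitmask_a8_ref(PMColor* dst, const uint8_t* mask, size_t n, PMColor color) {
    for (size_t i = 0; i < n; i++) dst[i] = srcover(scale(color, mask[i]), dst[i]);
}
```

A SIMD version does the same arithmetic on several pixels at once, so a scalar reference like this is handy for checking its output.
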
Unlike previous versions of this code, every div255() now computes exactly (x+127)/255, i.e. x/255 rounded to nearest.
This won't fix any major bugs, but it does correct our rounding bias in the middle of the range.
There will be many image diffs, all minor.
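
To make the rounding claim concrete, here is a minimal scalar sketch. div255_exact is the formula quoted above. div255_shift is a well-known shift-only form that gives the same result for products of two bytes; it is shown as an assumption about how such a divide can be done cheaply, not as what this change does. div255_trunc is a cruder approximation included only for contrast, and count_mismatches is a hypothetical helper.

```cpp
#include <cstdint>

// Exact rounded divide by 255: (x+127)/255.
// Because 255 is odd there are no ties, so this is x/255 rounded to nearest.
static inline uint32_t div255_exact(uint32_t x) { return (x + 127) / 255; }

// Shift-only form (assumption: illustrative, not taken from this change).
// Agrees with div255_exact for every x in [0, 255*255].
static inline uint32_t div255_shift(uint32_t x) {
    uint32_t t = x + 128;
    return (t + (t >> 8)) >> 8;
}

// Cheaper approximation for contrast: divide by 256 and truncate.
// It never rounds up, so results are biased low.
static inline uint32_t div255_trunc(uint32_t x) { return x >> 8; }

// Count the inputs in [0, 255*255] where f disagrees with div255_exact.
template <typename Fn>
static int count_mismatches(Fn f) {
    int bad = 0;
    for (uint32_t x = 0; x <= 255u * 255u; x++) {
        if (f(x) != div255_exact(x)) bad++;
    }
    return bad;
}
```

Running count_mismatches over the whole input range is a cheap way to check any candidate div255() against the exact formula before swapping it into a blit.
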
I've punted for now on pmaddubsw for lerping. I do intend to try that,
but I want this (relatively simple) code as my basis for comparison.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1526883004
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1526883004
1 file changed