[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.

The new instructions were added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330296 91177308-0d34-0410-b5e6-96231b3b80d8
6 files changed