[AMDGPU] Do not rely only on BB number when finding the loop bottom

We should also check that the "bottom" basic block of a loop is a successor of the "header" basic block; otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem in Wolfenstein 2, which was caused by one missing vector-memory wait.

Differential Revision: https://reviews.llvm.org/D43831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330337 91177308-0d34-0410-b5e6-96231b3b80d8
2 files changed