Port the more accurate (and slightly faster) floating point IDCT implementation from jpeg-8a and later. New research revealed that the SSE/SSE2 floating point IDCT implementation was actually more accurate than the jpeg-6b implementation, not less, which is why its mathematical results have always differed from those of the jpeg-6b implementation. This patch brings the accuracy of the C code in line with that of the SSE/SSE2 code.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1288 632fc199-4ca6-4c93-a231-07263d6284db