AMD’s Heterogeneous Computing with Trinity
It’s not all about just CPU or GPU performance, though—or at least
that’s what we’ve been hearing from various parties for a while now. The
real question is how a platform performs as a whole. There are some
tasks where pure CPU performance is what really matters, and there are
other tasks where the parallel nature of GPUs pays serious dividends.
AMD (and NVIDIA) has been pushing for more applications to make use of
the GPU for tasks where it can provide a lot of number-crunching
prowess.
With Trinity, AMD provided us with a selection of applications that now
leverage—to varying degrees—AMD’s App Acceleration, OpenCL, OpenGL, or
other tools. For some of these applications, we don’t have any good way
of measuring performance across a wide selection of hardware, and for
some of those where benchmarks are possible I’ve run out of time to try
to put anything concrete together. I don’t want to skip this section
entirely, so what follows is a list of the applications, how they
benefit from heterogeneous compute, and some general impressions of each
one. We also have graphs for a few of the applications where
performance seemed to matter the most.
Adobe Flash 11.2—The latest version of Flash continues
to add GPU acceleration features, and now there are 3D hooks in
addition to the video offload acceleration we first saw with Flash 10.x.
There’s not too much of note here, as NVIDIA and Intel also support the
latest features of Flash 11.2. Flash works fine on Trinity, but the
same goes for Ivy Bridge and various NVIDIA GPUs. If you haven’t seen the
Epic Citadel demo for iOS or Android, there’s now a Flash-based version
of the same demo that will run in your browser. (Warning: even on a
decent connection, the demo can take 10-15 minutes to download all of
its textures and other data!) Epic Citadel looks just as nice as it did
on iOS, but now we need some actual games to take advantage of the
tools. Then perhaps we can start looking into benchmarks of browser
games or something…
Adobe Photoshop CS6—Photoshop started to take
advantage of GPU acceleration back with the CS4 release, using OpenGL to
improve performance on certain filters and features. With CS6, Adobe
has begun using OpenCL. Fundamentally, I’m not sure how big of a change
this represents, but quite a few functions in Photoshop are now
supposed to be faster or better with an OpenCL-compatible graphics
card. There are also two new features that leverage OpenCL: one is Iris
Blur, which allows you to mimic depth of field using Photoshop instead
of your camera, and the other is Liquify. Unfortunately, I’m by no means
a Photoshop expert, so I’m not sure how much these features really help
“power users”. I did try benchmarking general Photoshop CS6
performance using the Photoshop Retouch benchmark with and without GPU
acceleration enabled; unfortunately, most of the filters in that action
script don’t appear to benefit from GPU acceleration, as the scores I
got were essentially unchanged with or without GPU/OpenCL enabled.
Overall, I’ll take the GPU acceleration, but most of what I do in
Photoshop doesn’t appear to benefit from it; if you’re interested, you
can read more about AMD’s work with Adobe.
GNU Image Manipulation Program (GIMP)—Going along with
Photoshop CS6, AMD provided a special preview build of GIMP 2.8. GIMP
is sort of the poor man’s Photoshop, as it’s completely free. At
present, there are 19 filters that use OpenCL to speed up processing,
and over the coming months, as the release version of GIMP moves the new
engine into production, there will undoubtedly be more additions. For
now, only about five of the filters are things I would probably use
(e.g. noise reduction, maybe a light blur). I tested several of these,
and the OpenCL path is sometimes an order of magnitude faster than doing
the work on just the CPU. The problem is that GIMP also doesn’t appear
to be especially well threaded in many of these tasks, which puts
multicore CPUs at a disadvantage. My biggest complaint isn’t even about
performance, though; sadly, I just find the GIMP UI and general
usability to be really bad compared to Photoshop. I’ve tried several
times over the years to use GIMP instead of Photoshop, but I’ve never
felt comfortable with the tool. If, on the other hand, you prefer GIMP,
hopefully you’ll see a healthy performance boost once the current GEGL
menu gets integrated into the main program.
ArcSoft MediaConverter 7.5—MediaConverter should be a familiar name by
now if you’ve been following our reviews, as it’s one of the showcase
titles for Intel’s Quick Sync transcoding. When we reviewed Ivy Bridge
last month, we found that on Llano, at least, the version of
MediaConverter we had ran slower on the GPU than on the CPU; with
Trinity, on the other hand, enabling GPU acceleration makes the
transcode roughly 1.6X faster than the CPU alone (98 seconds with GPU
acceleration vs. 154 seconds on the CPU). That’s a good performance
increase, but it pales next to Intel: dual-core Sandy Bridge took 127
seconds transcoding on the CPU and only 28 seconds with Quick Sync, a
roughly 4.5X improvement. Quad-core Ivy Bridge was just as impressive,
going from 68 seconds on the CPU down to 16 seconds with Quick Sync
(4.25X). We’ve been hoping to see something more from AMD’s new Video
Codec Engine (VCE), first announced over six months ago with the Radeon
HD 7970, but unless it improves substantially, it looks like Intel’s
Quick Sync will remain the fastest transcoding tool for now.
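To make the MediaConverter comparison concrete, here is a short Python sketch that recomputes each speedup as CPU time divided by accelerated time, using only the timings quoted above. The helper name `speedup` and the dictionary labels are our own illustrative choices, not anything from ArcSoft or Intel.

```python
# Speedups implied by the MediaConverter 7.5 times quoted above (seconds).
# Timings come from the text; the helper and labels are illustrative.

def speedup(cpu_seconds: float, accel_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the CPU run."""
    return cpu_seconds / accel_seconds

mediaconverter = {
    "Trinity (CPU vs. GPU)": (154, 98),
    "Dual-core Sandy Bridge (CPU vs. Quick Sync)": (127, 28),
    "Quad-core Ivy Bridge (CPU vs. Quick Sync)": (68, 16),
}

for platform, (cpu, accel) in mediaconverter.items():
    # Prints 1.57X, 4.54X and 4.25X respectively.
    print(f"{platform}: {speedup(cpu, accel):.2f}X")
```

Dividing times this way yields a "how many times faster" figure; expressed as time saved instead, Trinity's GPU run takes about 36% less time than its CPU run.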
CyberLink MediaEspresso 6.5—This tool is very similar
to MediaConverter, and Trinity’s results are likewise better this time
around. We measured the GPU-assisted encode time at 74 seconds compared
to 135 seconds on the CPU alone. That 74-second transcode time actually
makes Trinity potentially faster than CPU-based transcoding on dual-core
Sandy Bridge, but again Quick Sync (25 seconds on SNB, 12 seconds on IVB)
remains the fastest way to transcode. Considering that both of these
tools are apparently using VCE, I have to admit I’m disappointed; with
VCE I was expecting performance similar to what Intel gets with Quick
Sync: four or five times faster than CPU-based encoding on the same chip.
That Trinity isn’t quite twice as fast with VCE is unfortunate; even
though it’s a decent improvement, Intel is in a completely different
category of performance. We’ll have to wait and see if anything more
develops with VCE.
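The MediaEspresso numbers can be checked with a few lines of Python. This minimal sketch takes the measured times at face value (single runs, no variance modeled) and assumes, as the text does, that the assisted path is VCE; the variable names are hypothetical.

```python
# MediaEspresso 6.5 times quoted above, in seconds (single measurements).
trinity_cpu = 135
trinity_vce = 74  # assumed to be the VCE-assisted path, per the text
quick_sync = {"Sandy Bridge": 25, "Ivy Bridge": 12}

# Trinity's own gain from hardware assist: just under 2X.
trinity_gain = trinity_cpu / trinity_vce
print(f"Trinity VCE vs. CPU: {trinity_gain:.2f}X")  # 1.82X

# How many times longer Trinity's assisted encode takes than Quick Sync.
for chip, seconds in quick_sync.items():
    # Prints 2.96X for Sandy Bridge and 6.17X for Ivy Bridge.
    print(f"Trinity VCE vs. {chip} Quick Sync: {trinity_vce / seconds:.2f}X the time")
```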
WinZip 16.5—This final application is one that I can
see being very useful, assuming we see similar advancements in other
compression utilities. WinZip 16.5 now supports OpenCL to improve
compression times. We tested by compressing the entire Cinebench 11.5
directory with and without OpenCL enabled, and we also compared the
results with 7-Zip. On Trinity, performance improved by about 20%, which
is decent; Llano sees an even larger 28% improvement. Meanwhile, Sandy
Bridge using CPU-based compression is about as fast as Trinity with
OpenCL, and Ivy Bridge is faster still, but a 20% increase for “free”
is nothing to scoff at. Unfortunately for WinZip, 7-Zip compressed the
same directory to 95MB (vs. 108MB for WinZip) in roughly the same time
as non-OpenCL WinZip, and 7-Zip is completely free and doesn’t nag you
to buy it. While WinZip 16.5 is a good proof of concept, what will
really help AMD is if the other compression utilities (7-Zip, WinRAR,
etc.) start using OpenCL or other tools to improve performance.
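The WinZip figures lend themselves to a quick sanity check. This Python sketch computes how much smaller the 7-Zip archive is, and what a 20% gain means in wall-clock terms under one stated assumption: that "20% faster" refers to throughput (work per second) rather than to time saved. That reading is ours, not something WinZip specifies.

```python
# Archive sizes from the Cinebench 11.5 directory test quoted above (MB).
winzip_mb = 108
sevenzip_mb = 95

# Fraction of the WinZip archive size that 7-Zip saves.
savings = (winzip_mb - sevenzip_mb) / winzip_mb
print(f"7-Zip archive is {savings:.1%} smaller")  # 12.0% smaller

# ASSUMPTION: "20% faster" means 20% higher throughput, so the
# compression time drops to 1/1.2 of the CPU-only time (about 17% less).
time_fraction = 1 / 1.20
print(f"Time with OpenCL: {time_fraction:.1%} of the CPU-only time")  # 83.3%
```

If the 20% instead described time saved, the OpenCL run would take 80% of the CPU-only time; the difference is small here but grows with larger speedups.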
The majority of the applications continue to focus on video and image
manipulation, likely because those are areas where the parallel nature
of GPUs can be readily utilized. WinZip, on the other hand, points to
other potential uses for GPGPU and heterogeneous compute. We’d love to
see even more adoption of OpenCL and similar
tools, but the stark reality is that coming up with new and useful ways
of doing this is difficult—if it were easy, everyone would do it! The
good news is that giving the creative people of the world more tools
with which to work can only help, and we’ll just have to wait and see
what else comes out.
There’s another interesting sidebar worth mentioning here. OpenCL is an
open standard, and the latest Intel drivers actually install an OpenCL
driver on Ivy Bridge and Sandy Bridge. Not surprisingly, not all
implementations are created equal, so even with Intel’s drivers we
couldn’t enable OpenCL in Photoshop or WinZip. GIMP, on the other hand,
apparently worked okay with OpenCL on Intel; we measured a 5X performance
improvement in the Noise Reduction filter on Ivy Bridge. With both
platforms using OpenCL, Trinity came in slightly faster than Ivy Bridge,
while without OpenCL, Ivy Bridge was nearly twice as fast as Trinity.