Nvidia Speaks On Performance Issue
Nvidia has said that most of the outlandish performance figures touted by GPGPU vendors were down to poorly optimized original CPU code rather than the sheer brute-force computing power of GPUs.
Both AMD and Nvidia have used real-world code examples and projects to promote the performance of their GPGPU accelerators for years, but it now seems that some of the eye-popping figures, including speed-ups of 100x or 200x, were not down solely to the computing power of GPUs. Sumit Gupta, general manager of Nvidia's Tesla business, said such figures generally resulted from starting with unoptimized CPU code.
During Intel's Xeon Phi pre-launch press conference call, the firm cast doubt on some of the orders-of-magnitude speed-up claims that had been bandied about for years. Now Gupta has told The INQUIRER that while those large speed-ups did happen, they were possible only because the original code was poorly optimized, so the bar was set very low.
Gupta said, "Most of the time when you saw the 100x, 200x and larger numbers those came from universities. Nvidia may have taken university work and shown it and it has a 100x on it, but really most of those gains came from academic work. Typically we find when you investigate why someone got 100x [speed up] is because they didn't have good CPU code to begin with. When you investigate why they didn't have good CPU code you find that typically they are domain scientists, not computer science guys: biologists, chemists, physicists, and they wrote some C code and it wasn't good on the CPU. It turns out most of those people find it easier to code in CUDA C or CUDA Fortran than they do to use MPI or Pthreads to go to multi-core CPUs, so CUDA programming for a GPU is easier than multi-core CPU programming."
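Gupta's point comes down to simple arithmetic: a speed-up is a ratio of runtimes, so it depends as much on the baseline as on the accelerator. The minimal Python sketch below illustrates this; the runtimes are hypothetical numbers invented for illustration, not measurements from Nvidia or any benchmark, and `speedup` is a helper defined here rather than part of any library.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speed-up factor = baseline runtime / accelerated runtime."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / accelerated_seconds


# Hypothetical runtimes in seconds, chosen only to illustrate the effect.
naive_cpu = 1000.0  # single-threaded, unoptimized C code
tuned_cpu = 50.0    # same algorithm, vectorized and multi-threaded
gpu = 10.0          # GPU port of the same algorithm

print(f"GPU vs naive CPU: {speedup(naive_cpu, gpu):.0f}x")        # 100x
print(f"GPU vs tuned CPU: {speedup(tuned_cpu, gpu):.0f}x")        # 5x
print(f"CPU tuning alone: {speedup(naive_cpu, tuned_cpu):.0f}x")  # 20x
```

Because the factors multiply (20x from CPU tuning times 5x from the GPU gives the headline 100x), a figure measured against naive code can hide the fact that most of the gain was available on the CPU all along.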
ARM Seeing Growth
ARM and Vivante have achieved significant market share gains in the system-on-chip (SoC) GPU market while Imagination and Qualcomm have seen their market shares fall.
ARM has aggressively pushed its Mali GPU designs for the last two years, while Vivante has ridden the surge in Chinese tablet sales, and together these factors have let both firms increase their market shares. Analyst firm Jon Peddie Research said that ARM and Vivante took 12.9 percent and 9.8 percent of the SoC GPU market, respectively, in the first half of 2012, while market leaders Imagination and Qualcomm both lost share.
ARM more than doubled its market share compared with the same period a year earlier, and Vivante did even better, almost quadrupling its share. Both firms gained ground in a fast-growing market: Jon Peddie Research said the SoC GPU market grew by 91.3 percent, so the share losses at Qualcomm and Imagination suggest they are having a harder time winning new business. Jon Peddie told The INQUIRER that new vendors are entering the market, typically offering lower prices to win customers.
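Share figures are easier to interpret alongside market growth. The Python sketch below assumes that the reported 91.3 percent growth refers to unit volume (the report as quoted does not say), and the 25 percent share drop is a hypothetical case, not a figure for any named vendor. It shows that a vendor's shipments scale by its share ratio times the market growth ratio, so a vendor can lose share and still ship more units.

```python
# Assumption: the reported 91.3 percent market growth refers to unit volume.
MARKET_GROWTH = 1.913


def unit_growth(share_ratio: float, market_growth: float = MARKET_GROWTH) -> float:
    """Factor by which a vendor's shipments change when its share is
    multiplied by share_ratio and the whole market by market_growth."""
    return share_ratio * market_growth


def implied_prior_share(current_share: float, share_ratio: float) -> float:
    """Share a year earlier, given the current share and the share growth ratio."""
    return current_share / share_ratio


# ARM "more than doubled" to 12.9 percent, so its prior share was below 12.9 / 2.
print(f"ARM prior share below {implied_prior_share(12.9, 2.0):.2f}%")
# Vivante "almost quadrupled" to 9.8 percent, so its prior share was a bit above 9.8 / 4.
print(f"Vivante prior share about {implied_prior_share(9.8, 4.0):.2f}%")
# Under the unit-volume assumption, ARM's shipments grew by more than 2 x 1.913.
print(f"ARM shipment growth above {unit_growth(2.0):.2f}x")
# Hypothetical vendor whose share fell by a quarter still ships more units.
print(f"Shipments after a 25% share loss: {unit_growth(0.75):.2f}x")
```

The last line is the point: in a market that nearly doubled, losing share means capturing less of the new business rather than necessarily shipping fewer chips.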
Nvidia's SoC GPU operations accounted for 2.5 percent of the total smartphone and tablet market, which is impressive given that the firm doesn't license its GPU designs to other chipmakers. Nvidia could see its market share increase if Microsoft's Surface tablet, which uses Nvidia's Tegra 3 chip, sells well.
Will ARM Get OpenCL Certification?
ARM has submitted its Mali-T604 GPU for OpenCL certification.
ARM's Mali GPUs have so far shied away from GPGPU support. However, since smartphones and tablets are not expected to keep adding processor cores indefinitely, calls for OpenCL support in its GPUs have grown louder, as GPU compute offers another route to more performance. Now ARM has submitted its Mali-T604 GPU to the Khronos Group for full profile OpenCL certification.
The Khronos Group oversees the development of OpenCL, a framework for general-purpose computing on GPUs and other processors that a number of firms, including AMD, Nvidia and Intel, support on their latest GPUs. However, until now no OpenCL-certified GPU has been used in smartphones, although firms such as Zii Labs also claim OpenCL support for their chips.
ARM said, "Building on a scalable multicore, multi-pipeline architecture design, the Mali-T600 Series GPU includes a number of advanced features. In particular, native scalar and vector operations for OpenCL's integer and floating point data types (including 64-bit); support for static and dynamic compilation; hardware accelerated image and sampler data types; fast atomic operations and compliance to IEEE 754-2008 precision requirements."