Can DirectX 12 Give Mobile A Boost?
Microsoft announced DirectX 12 just a few days ago and for the first time Redmond’s API is relevant beyond the PC space. Some DirectX 12 tech will end up in phones and of course Windows tablets.
Qualcomm likes the idea, along with Nvidia. Qualcomm has published a blog post on the potential impact of DirectX 12 on the mobile industry, and the takeaway is very positive indeed.
DirectX 12 equals less overhead, more battery life
Qualcomm says it has worked closely with Microsoft to optimise “Windows mobile operating systems” and make the most of Adreno graphics. The chipmaker points out that current Snapdragon chipsets already support DirectX 9.3 and DirectX 11. However, the transition to DirectX 12 will make a huge difference.
“DirectX 12 will turbocharge gaming on Snapdragon enabled devices in many ways. Just a few years ago, our Snapdragon processors featured one CPU core, now most Snapdragon processors offer four. The new libraries and API’s in DirectX 12 make more efficient use of these multiple cores to deliver better performance,” Qualcomm said.
DirectX 12 will also allow the GPU to be used more efficiently, delivering superior performance per watt.
“That means games will look better and deliver longer gameplay on a single charge,” Qualcomm’s gaming and graphics director Jim Merrick added.
What about eye candy?
Any improvement in efficiency also tends to have a positive effect on overall quality. Developers can get more out of existing hardware and will have more resources at their disposal, simple as that.
Qualcomm also points out that DirectX 12 is the first version to launch on Microsoft’s mobile operating systems at the same time as its desktop and console counterparts.
The company believes this underscores the growing shift towards mobile gaming and the consumer demand behind it. A simultaneous launch should also make it easier to port desktop and console games to mobile platforms.
Of course, this does not mean that we’ll be able to play Titanfall on a Nokia Lumia, or that similarly demanding titles can be ported. However, it will speed up development and allow developers and publishers to reuse assets from console and PC games. Since Windows Phone isn’t exactly the biggest mobile platform out there, this could prove very helpful and attract more developers to the platform.
AMD, Intel & nVidia Go OpenGL
AMD, Intel and Nvidia teamed up to tout the advantages of the OpenGL multi-platform application programming interface (API) at this year’s Game Developers Conference (GDC).
Sharing a stage at the event in San Francisco, the three major chip designers explained how, with a little tuning, OpenGL can offer developers between seven and 15 times better performance, compared with the more widely recognised increases of around 1.3 times.
AMD software development manager Graham Sellers, Intel graphics software engineer Tim Foley, and Nvidia’s Cass Everitt (OpenGL engineer) and John McDonald (senior software engineer) demonstrated their OpenGL techniques on real-world devices to show that they are suitable for use across multiple platforms.
During the presentation, Intel’s Foley talked up three techniques that can help OpenGL increase performance and reduce driver overhead: persistent-mapped buffers for faster streaming of dynamic geometry, MultiDrawIndirect (MDI) for faster submission of many draw calls, and packing 2D textures into arrays so that texture changes no longer break batches.
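To give a flavour of the first of those techniques, here is a minimal sketch of a persistently mapped vertex buffer. It assumes a current OpenGL 4.4 context with an extension loader such as GLEW already initialised, and the function and variable names are ours, not the presenters’.

```cpp
#include <GL/glew.h>   // assumes glewInit() has already run on a GL 4.4 context

// Create a vertex buffer that stays mapped for its whole lifetime, so dynamic
// geometry can be streamed into it every frame without map/unmap calls.
void* create_persistent_vertex_buffer(GLuint* buffer, GLsizeiptr size)
{
    const GLbitfield flags = GL_MAP_WRITE_BIT
                           | GL_MAP_PERSISTENT_BIT   // keep the mapping valid while the GPU draws
                           | GL_MAP_COHERENT_BIT;    // CPU writes become visible without explicit flushes

    glGenBuffers(1, buffer);
    glBindBuffer(GL_ARRAY_BUFFER, *buffer);

    // Immutable storage is required for persistent mapping (GL 4.4 / ARB_buffer_storage).
    glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);

    // Map once at start-up and keep writing through this pointer every frame.
    // Real streaming code would partition the buffer and use glFenceSync /
    // glClientWaitSync so the CPU never overwrites data the GPU is still reading.
    return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
}
```

The other two techniques work in the same spirit: glMultiDrawElementsIndirect submits a whole batch of draw calls from a single GPU-resident buffer, and glTexStorage3D with GL_TEXTURE_2D_ARRAY lets many textures live behind one binding so switching between them no longer breaks a batch.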
They also mentioned that with proper implementations of these high-level OpenGL techniques, driver overhead could be reduced to almost zero, something Nvidia’s software engineers have claimed is impossible with Direct3D and only possible with OpenGL.
Nvidia’s VP of game content and technology, Ashu Rege, posted his account of the joint GDC session on the Nvidia blog.
“The techniques presented apply to all major vendors and are suitable for use across multiple platforms,” Rege wrote.
“OpenGL can cut through the driver overhead that has been a frustrating reality for game developers since the beginning of the PC game industry. On desktop systems, driver overhead can decrease frame rate. On mobile devices, however, driver overhead is even more insidious, robbing both battery life and frame rate.”
The talk was entitled Approaching Zero Driver Overhead.
At the same show, Microsoft also unveiled the latest version of its graphics API, DirectX 12, with Direct3D 12 promising more efficient gaming.
Showing off the new DirectX 12 API during a demo of the Xbox One racing game Forza 5 running on a PC with an Nvidia GeForce Titan Black graphics card, Microsoft said DirectX 12 gives applications the ability to directly manage resources and perform their own synchronisation. As a result, developers of advanced applications can exercise closer control over the GPU and build games that run more efficiently.
nVidia Outs CUDA 6
Nvidia has made the release candidate of CUDA 6, the latest version of its GPU programming language, available for developers to download for free.
The release arrives with several new features and improvements to make parallel programming “better, faster and easier” for developers creating next generation scientific, engineering, enterprise and other applications.
Nvidia has aggressively promoted its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Available now, the CUDA 6 Release Candidate brings a major new feature, unified memory, which lets CUDA applications access both CPU and GPU memory without manually copying data from one to the other.
“This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications,” Nvidia said in a blog post on Thursday.
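For illustration, here is a minimal sketch of what that looks like in practice, assuming the CUDA 6 toolkit; the kernel and variable names below are ours, not Nvidia’s. The same pointer returned by cudaMallocManaged is written by the CPU, processed by a GPU kernel and read back on the CPU, with no cudaMemcpy anywhere.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: double every element in place.
__global__ void scale(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1024;
    float* data;

    // One allocation, visible to both host and device -- no explicit copies.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = float(i);   // written by the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n);         // read and written by the GPU
    cudaDeviceSynchronize();                          // wait before the CPU touches it again

    printf("data[10] = %f\n", data[10]);              // read back on the CPU, still no copy
    cudaFree(data);
    return 0;
}
```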
There’s also the addition of “drop-in libraries”, which Nvidia said will accelerate applications by up to eight times.
“The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent,” the chip designer added.
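To illustrate what “drop-in” means, the sketch below (our own example, with arbitrary matrix sizes) calls the standard Fortran-style dgemm_ interface and contains no CUDA code at all; moving it onto the GPU is a matter of linking or preloading the GPU-accelerated BLAS instead of the CPU one, with no source changes.

```cpp
// Plain BLAS client code: no CUDA calls anywhere. Whether it runs on the CPU
// or the GPU depends only on which BLAS library it is linked (or preloaded) against.
#include <cstdio>
#include <vector>

// Standard Fortran-style BLAS entry point (column-major, arguments passed by pointer).
extern "C" void dgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const double* alpha, const double* a, const int* lda,
                       const double* b, const int* ldb,
                       const double* beta, double* c, const int* ldc);

int main()
{
    const int n = 512;
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
    const double alpha = 1.0, beta = 0.0;

    // C = alpha * A * B + beta * C
    dgemm_("N", "N", &n, &n, &n, &alpha, a.data(), &n, b.data(), &n, &beta, c.data(), &n);

    printf("c[0] = %f\n", c[0]);   // 1024.0 for these all-ones / all-twos inputs
    return 0;
}
```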
Multi-GPU scaling has also been added in CUDA 6, with redesigned BLAS and FFT GPU libraries that automatically scale performance across up to eight GPUs in a single node. Nvidia said this provides over nine teraflops of double-precision performance per node and supports workloads of up to 512GB, larger than anything supported before.
“In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides,” Nvidia said.
The previous CUDA 5.5 release candidate was issued last June and added support for ARM-based processors.
Aside from ARM support, Nvidia also improved Hyper-Q support in CUDA 5.5, which allowed developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.
Is AMD Worried?
AMD’s Mantle has been a hot topic for quite some time and, despite its delayed birth, it has finally arrived and delivered performance gains in Battlefield 4. Microsoft is not sleeping either; it has its own answer to Mantle, which we mentioned here.
Oddly enough, we have heard some industry people calling it DirectX 12 or DirectX Next, but it looks like Microsoft is finally getting ready to unveil the next generation of DirectX. From what we hear, it will fix some of the driver overhead problems addressed by Mantle, which is a good thing for the whole industry and, of course, for gamers.
AMD got back to us, officially stating: “AMD would like you to know that it supports and celebrates a direction for game development that is aligned with AMD’s vision of lower-level, ‘closer to the metal’ graphics APIs for PC gaming. While industry experts expect this to take some time, developers can immediately leverage efficient API design using Mantle.”
AMD also told us that we can expect some information about this at the Game Developers Conference, which starts on March 17th, less than two weeks from now.
We have a feeling that Microsoft is finally ready to talk about DirectX Next, DirectX 11.X, DirectX 12 or whatever it ends up calling it, and we would not be surprised to see Nvidia’s 20nm Maxwell chips support this API, as well as future GPUs from AMD, possibly also 20nm parts.
nVidia Pays Up
Nvidia has agreed to compensate any Canadian who had the misfortune to buy certain laptops made by Apple, Compaq, Dell, HP or Sony between November 2005 and February 2010. Apparently these models contained a dodgy graphics chip that went unfixed for five years.
Under a settlement approved by the court, Nvidia will pay $1,900,000 into a fund for anyone who bought a faulty card. The settlement agreement provides partial cash reimbursement of the purchase price, and claims must be submitted by February 25, 2014. You will know if your Nvidia card was faulty because your machine would show distorted or scrambled video, or no video at all even when the computer is on. Other symptoms include random characters, lines or garbled images (a bit like watching one of the Twilight series), intermittent video problems, and a failure to detect the wireless adaptor or wireless networks.
The amount of compensation will be determined by the Claims Administrator, who will apply a compensation grid and settlement administration guidelines. Cash compensation will also be provided for total loss of use, based on the age of the computer; for temporary loss of use, having regard to its nature and duration; and as reimbursement for out-of-pocket expenses caused by Qualifying Symptoms to an Affected Computer.
AMD’s Richland Shows Up
Kaveri is coming in a few months, but before it ships AMD will apparently spice up the Richland line-up with a few low-power parts.
CPU World has come across an interesting listing which points to two new 45W chips, the A8-6500T and the A10-6700T. Both are quad-core parts with 4MB of cache. The A8-6500T is clocked at 2.1GHz and can hit 3.1GHz on Turbo, while the A10-6700T has a 2.5GHz base clock and maxes out at 3.5GHz.
The prices are $108 and $155 for the A8 and A10 respectively, which doesn’t sound too bad although they are still significantly pricier than regular FM2 parts.
AMD’s Kaveri Coming In Q4
AMD really needs to make up its mind and figure out how it interprets its own roadmaps. A few weeks ago the company said desktop Kaveri parts should hit the channel in mid-February 2014. The original plan called for a launch in late 2013, but AMD insists the chip was not delayed.
Now, though, it has told Computerbase.de that the first desktop chips will indeed appear in late 2013 rather than in 2014, while mobile chips will be showcased at CES 2014 and will launch in late Q1 or early Q2 2014.
As we reported earlier, the first FM2+ boards are already showing up on the market, but at this point it’s hard to say when Kaveri desktop APUs will actually be available. The most logical explanation is that they will be announced sometime in Q4, with retail availability coming some two months later.
Kaveri is a much bigger deal than Richland, which was basically Trinity done right. Kaveri is based on new Steamroller cores, packs GCN graphics and is built on a 28nm process. It is expected to deliver a significant IPC boost over Piledriver-based chips, but we don’t have any exact numbers to report.
nVidia Launching New Cards
We weren’t expecting this and it is just a rumour, but reports are emerging that Nvidia is readying two new cards for the winter season. AMD, of course, is launching new cards four weeks from now, so it is possible that Nvidia will try to counter them.
The big question is with what?
VideoCardz claims one of the cards is an Ultra, possibly the GTX Titan Ultra, while the second one is a dual-GPU job, the GeForce GTX 790. The Ultra is supposedly GK110-based, but with its full 2880 CUDA cores unlocked, a bit more than the 2688 on the Titan.
The GTX 790 is said to feature two GK110 GPUs, but Nvidia will probably have to clip their wings to get a reasonable TDP.
We’re not entirely sure this is legit. It is plausible, but that doesn’t make it true. It would be good for Nvidia’s image, especially if the revamped GK110 products manage to steal the performance crown from AMD’s new Radeons. However, with such specs, they would end up quite pricey and Nvidia wouldn’t sell that many of them – most enthusiasts would probably be better off waiting for Maxwell.
nVidia’s CUDA 5.5 Available
Nvidia has made its CUDA 5.5 release candidate, which supports ARM-based processors, available for download.
Nvidia has been aggressively pushing its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Now the firm has announced the availability of a CUDA 5.5 release candidate, the first version of the language to support ARM-based processors.
Aside from ARM support, Nvidia has improved Hyper-Q support, which now allows developers to prioritise MPI workloads. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.
Ian Buck, GM of GPU computing software at Nvidia, said: “Since developers started using CUDA in 2006, successive generations of better, exponentially faster CUDA GPUs have dramatically boosted the performance of applications on x86-based systems. With support for ARM, the new CUDA release gives developers tremendous flexibility to quickly and easily add GPU acceleration to applications on the broadest range of next-generation HPC platforms.”
Nvidia’s support for ARM processors in CUDA 5.5 is an indication that it will release CUDA-enabled Tegra processors in the near future. However, outside of the firm’s own Tegra chips, CUDA support on ARM is largely useless, as almost all other chip designers have chosen OpenCL as the programming language for their GPUs.
Nvidia did not say when it will release CUDA 5.5, but in the meantime the firm’s release candidate supports Windows, Mac OS X and just about every major Linux distribution.
Are CUDA Applications Limited?
Acceleware said at Nvidia’s GPU Technology Conference (GTC) today that most algorithms that run on GPGPUs are bound by GPU memory size.
Acceleware is partly funded by Nvidia to provide developer training for CUDA, helping to sell the language to those who are used to traditional C and C++ programming. The firm said that most CUDA algorithms are now limited by GPU local memory size rather than by GPU computational performance.
Both AMD and Nvidia offer general purpose GPU (GPGPU) accelerator parts that deliver significantly faster computational processing than traditional CPUs. However, they carry only 6GB to 8GB of local memory, which constrains the size of the dataset the GPU can process. While developers can push more data in from system main memory, the latency cost negates the raw performance benefit of the GPU.
Kelly Goss, training program manager at Acceleware, said that “most algorithms are memory bound rather than GPU bound” and “maximising memory usage is key” to optimising GPGPU performance.
She further said that developers need to understand and take advantage of the memory hierarchy of Nvidia’s Kepler GPUs, and look for ways to reduce the number of memory accesses in every line of GPU code.
The point Goss was making is that GPU computation is cheap in terms of clock cycles compared with the time it takes to fetch data from local memory, let alone to load GPU memory from system main memory.
Goss, talking to a room full of developers, proceeded to outline some of the performance characteristics of the memory hierarchy in Nvidia’s Kepler GPU architecture, showing the level of detail that CUDA programmers need to pay attention to if they want to extract the full performance potential from Nvidia’s GPGPU computing architecture.
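To make that concrete, here is a minimal sketch (our own, not taken from Goss’s slides) of the kind of optimisation being described: a simple three-point stencil kernel that stages each block’s working set in on-chip shared memory, so every input value is fetched from the GPU’s DRAM once instead of three times.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK  256
#define RADIUS 1

// Three-point moving average. Each block stages BLOCK elements plus a halo
// in shared memory, then every thread reads its three inputs from on-chip
// memory rather than issuing three separate global-memory loads.
__global__ void stencil1d(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;   // global index
    int lid = threadIdx.x + RADIUS;                    // index into the shared tile

    // Stage this block's working set; out-of-range elements are zero-padded.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int lo = gid - RADIUS;
        int hi = gid + BLOCK;
        tile[lid - RADIUS] = (lo >= 0) ? in[lo] : 0.0f;
        tile[lid + BLOCK]  = (hi <  n) ? in[hi] : 0.0f;
    }
    __syncthreads();

    // These three reads hit shared memory, not DRAM.
    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));   // unified memory keeps the host code short
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    stencil1d<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[12345] = %f\n", out[12345]);      // expect 12345.0 for this input
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

For a memory-bound kernel like this, cutting redundant DRAM traffic matters far more than adding arithmetic, which is exactly the trade-off Goss was highlighting.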
Given Goss’s observation that algorithms running on Nvidia’s GPGPUs are often constrained by local memory size rather than by the GPU itself, the firm might want to look at simplifying the tiers of memory involved and increasing the amount of GPU local memory so that CUDA software developers can process larger datasets.