Are CUDA Applications Limited?
Acceleware said at Nvidia’s GPU Technology Conference (GTC) today that most algorithms that run on GPGPUs are bound by GPU memory size.
Acceleware is partly funded by Nvidia to provide developer training for CUDA to help sell the language to those that are used to traditional C and C++ programming. The firm said that most CUDA algorithms are now limited by GPU local memory size rather than GPU computational performance.
Both AMD and Nvidia provide general purpose GPU (GPGPU) accelerator parts that offer significantly faster computational processing than traditional CPUs. However, they have only between 6GB and 8GB of local memory, which constrains the size of the dataset the GPU can process. While developers can push more data from system main memory, the latency cost negates the raw performance benefit of the GPU.
Kelly Goss, training program manager at Acceleware, said that “most algorithms are memory bound rather than GPU bound” and “maximising memory usage is key” to optimising GPGPU performance.
She further said that developers need to understand and take advantage of the memory hierarchy of Nvidia’s Kepler GPU and look at ways of reducing the number of memory accesses for every line of GPU computing.
The point Goss was making is that GPU computation is cheap in terms of clock cycles compared with the time it takes to fetch data from local memory, let alone to load GPU memory from system main memory.
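To put rough numbers on that point, here is a minimal roofline-style sketch in Python. It is not taken from Goss's talk. The peak throughput and bandwidth figures are hypothetical, chosen only for illustration, and are not Kepler specifications. The function names are likewise made up for this example.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: throughput is capped either by the compute
    peak or by how fast memory can feed the arithmetic units."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def is_memory_bound(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """True when memory traffic, not arithmetic, sets the ceiling."""
    return bandwidth_gb_s * flops_per_byte < peak_gflops


# Hypothetical device figures (assumptions, not vendor specifications).
PEAK_GFLOPS = 3000.0      # single-precision compute peak
BANDWIDTH_GB_S = 200.0    # GPU local memory bandwidth

# Vector add c[i] = a[i] + b[i] on 32-bit floats:
# one floating point operation for 12 bytes moved (two loads, one store).
VECTOR_ADD_INTENSITY = 1.0 / 12.0

# Intensity (flops per byte) at which the kernel stops being memory bound.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GB_S

vector_add_gflops = attainable_gflops(
    PEAK_GFLOPS, BANDWIDTH_GB_S, VECTOR_ADD_INTENSITY
)

print(f"Ridge point: {ridge_point:.1f} flops/byte")
print(f"Vector add ceiling: {vector_add_gflops:.2f} GFLOPS")
print("Memory bound:", is_memory_bound(
    PEAK_GFLOPS, BANDWIDTH_GB_S, VECTOR_ADD_INTENSITY))
```

Under these assumed figures, a simple vector add can use only a small fraction of the compute peak, which is why cutting memory accesses per operation matters more than trimming arithmetic.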
Goss, talking to a room full of developers, proceeded to outline some of the performance characteristics of the memory hierarchy in Nvidia’s Kepler GPU architecture, showing the level of detail that CUDA programmers need to pay attention to if they want to extract the full performance potential from Nvidia’s GPGPU computing architecture.
Given Goss’s observation that algorithms running on Nvidia’s GPGPUs are often constrained by local memory size rather than by the GPU itself, the firm might want to look at simplifying the tiers of memory involved and increasing the amount of GPU local memory so that CUDA software developers can process larger datasets.
AMD’s Richland Coming In June
Richland is set to replace AMD’s Virgo platform, powered by Trinity processors, and this change will happen in June 2013, most likely coinciding with Computex 2013.
AMD has just launched the first batch of Richland mobile APUs and we still have to see some notebook designs hitting the market. We wrote about mobile Richland APUs.
As of late last year, desktop Richland was set to launch in June 2013. The fastest part is the A10 6800K, clocked at 4.1GHz and 4.4GHz with Turbo. It also features Radeon HD 8670D graphics that run at 844 MHz. It comes unlocked, ready to replace the current AMD A10 5800K. In Europe, the A10 5800K currently sells for €112, while in the US the same CPU sells for $129.00 (boxed).
The alpha dog A10 6800K is followed by A10 6700, A8 6600K (Unlocked) and A8 6500. AMD has a mix of 100W and 65W quad-core Richland desktop SKUs. There will be a single A6 6400K (Unlocked) SKU and the A4 6300, both dual-cores with 65W TDP.
Production-ready samples were churned out in late January, while volume production is scheduled for late March 2013. The announcement was always scheduled for June 2013, and Richland will last through most of 2013, until Kaveri with 28nm Steamroller comes on line.
Intel’s Pentium Getting Updated
Intel is going to update its desktop Pentium family with several slightly faster Ivy Bridge-based processors.
According to CPU World, the chips should hit the shops in the second quarter of 2013, which is a quarter after January’s refresh of budget desktop families and one quarter before the launch of Haswell. The new chips have the original titles of Pentium G2030, G2030T, G2120T and G2140. They will have two cores, but lack Hyper-Threading technology, and can run two threads before getting all confused.
Both the G2000 and G2100 series CPUs support only basic features, like Intel 64 and Virtualization. They do integrate HD graphics clocked at 650 MHz and a dual-channel memory controller that supports DDR3-1333 on the G2030 and the G2030T, and up to DDR3-1600 on the G2120T and the G2140.
Pentium G2030T and G2120T are low-power models that replace the G2020T and G2100T and are clocked 100 MHz higher, at 2.6GHz and 2.7GHz respectively. However, they still fit into a 35 Watt thermal envelope. Pentium G2030 and G2140 mainstream microprocessors will be faster than the “T” SKUs, and they will have a 57 per cent higher TDP. Intel expects these to replace the G2020 and G2130 SKUs. The G2030 will run at 3 GHz. The G2140 will operate at 3.3 GHz. No word on prices yet.
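As a quick check on the TDP comparison above, the short Python sketch below works out what a 57 per cent increase over the 35 Watt envelope implies. The result is derived from the two figures quoted in the text and is not a number the article itself states. The function name is hypothetical.

```python
# Thermal envelope quoted above for the low-power "T" models.
T_SKU_TDP_WATTS = 35.0
# "57 per cent higher", as quoted above for the mainstream models.
QUOTED_INCREASE = 0.57


def implied_tdp(base_watts, fractional_increase):
    """Return the TDP implied by a fractional increase over a base TDP."""
    return base_watts * (1.0 + fractional_increase)


mainstream_tdp = implied_tdp(T_SKU_TDP_WATTS, QUOTED_INCREASE)
print(f"Implied mainstream TDP: {mainstream_tdp:.2f} W")
```

The implied figure rounds to about 55 Watts for the mainstream parts.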
Ivy Bridge E Delayed Until Fall
Ivy Bridge E, Intel’s ultra-high end chip that is set to replace the Core i7 3970X, has been delayed. It doesn’t look like it was anything major. Our sources tell us that the decision was made by Intel server guys who did not want to launch this chip in Q3 as originally intended.
Since Q3 starts in July, a relatively slow month for IT, the normal time to launch products is late August or September, but so far there is no confirmation that this will happen.
Sandy Bridge E, or Core i7 3960X, was launched in Q4 2011, November 14th to be precise. This can give you a clue on when to expect the successor.
Originally Ivy Bridge E was supposed to launch in Q3, one quarter after the launch of quad-core Desktop Haswell processors. Ivy Bridge E works in X79 motherboards but we do expect that a few key motherboard vendors will have their newer versions ready for the launch of the new $999 flagship processor.
If Intel continues at this pace, it will take quite a while before we see Haswell E in action.
AMD Goes Richland
There have been more than enough leaks dealing with Richland, AMD’s successor to the Trinity powered Virgo platform, and we even had a chance to see some leaks regarding its successor, codenamed Kaveri. As you may already know, Richland is planned to last through 2013 and it is clear that this is a very important chip for AMD.
Based on the Piledriver architecture and built using 32nm technology, Richland will feature an integrated GPU that will be upgraded to Radeon HD 8000 series, a generation ahead of Trinity. As you know, there have been a lot of leaks regarding the Richland parts and the quad-core A10-6800K with Radeon HD 8670D graphics is expected to pack quite a punch. Best of all, Richland will still use the same FM2 socket.
According to our sources, the NDA will be lifted on the 12th of March, 8am EST, and we are sure that we will see at least a couple of reviews as well as some additional info regarding the price and the availability date.
Will Intel’s Haswell Debut With Bugs?
According to a report over at Hardware.info, which managed to get its hands on an internal Intel document, Intel’s Haswell platform might have a problem with its USB 3.0 host controller.
Although it is not as serious as the Cougar Point SATA 3Gbps bug, the USB 3.0 controller on the Haswell platform will have issues with the S3 sleep mode and devices that are connected via a USB 3.0 port. Apparently, when waking from S3 sleep, applications that are accessing data from, for example, a USB 3.0 storage device might freeze and force the user to reopen them manually.
Thankfully, the bug will be more of a nuisance than a problem, as any loss of data is excluded. Intel does not plan to delay the launch and it is still scheduled for mid-2013, according to an Intel representative’s comment for Hardware.info. Intel is apparently still researching what other consequences this issue could possibly have and plans to resolve the problem in a future CPU stepping.
Intel Takes A Shot At ARM
ARM chips practically rule the mobile chip market, but Intel is trying to carve out a foothold with its new x86 chips, with relatively little success.
Intel claims its parts can outperform ARM chips in benchmarks and its manufacturing process lead should help it deliver faster and smaller chips. However, in spite of Intel’s claims, few vendors seem interested in its mobile chips.
Speaking to CNN, Intel mobile chief Mike Bell stressed that Intel has the software and systems competence to be the most successful player on the market. He pointed out that Intel can develop software to get the most out of its hardware and that Intel single core chips outperform multicore ARM designs.
“It’s a question of whether you’d rather have a jet engine or two propellers,” said Bell.
Granted, Bell has to tout the company line, but his engine comparison works both ways. Crop dusters and ultralight planes don’t need jet engines, or two piston engines for that matter. That is what really matters and Intel knows it. Not everyone needs a turbojet or turbofan, and not everyone needs an Intel core, especially not in mid- to low-end devices.
Intel believes its next generation 22nm mobile parts, with integrated LTE, will allow it to score some tablet and smartphone partners in late 2013 or 2014. However, Intel will have nothing to take on new A15 class ARM chips this year.
Quantum Computing Making Strides
Researchers at the University of Innsbruck in Austria have managed to transfer quantum information from an atom to a photon, which is being seen as a breakthrough in the making of quantum computers.
According to Humans Invent the breakthrough allows quantum computers to exchange data at the speed of light along optical fibres. Lead researcher on the project Tracy Northup said that the method allows the mapping of quantum information faithfully from an ion onto a photon.
Northup’s team used an “ion trap” to produce a single photon from a trapped calcium ion with its quantum state intact using mirrors and lasers. No potential cats were injured in the experiment. The move enables boffins to start to play with thousands of quantum bits rather than just a dozen or so. This means that they can get a computer to do specific tasks like factoring large numbers or a database search, faster.
AMD Debuts R5000
AMD has released its Firepro R5000 graphics card that has video over IP capabilities.
AMD typically promotes its workstation class Firepro cards using CAD/CAM software. This time, however, the company is relying on remote viewing as the big selling point for its latest workstation graphics card. AMD’s Firepro R5000 has a GPU that uses its Graphics Core Next (GCN) architecture and Teradici PC video over IP technology to send graphics output over the network.
AMD used its Pitcairn GPU coupled to 2GB of GDDR5 memory in the Firepro R5000. However, it isn’t AMD’s GPU that is the big selling point of the Firepro R5000 but rather Teradici’s Tera2240 chip, which encrypts display output before sending it out on the network while supporting up to 60fps (frames per second).
AMD’s Firepro R5000 is intended to be used in render farms, with each final image being sent over an IP network to the end host, and the firm claims that the technology can be used in education, financial and media environments.
The Firepro R5000 is a single slot graphics card that has two mini Displayport outputs that can drive two 2560×1600 displays. It can also drive a further four remote displays at 1920×1200 resolution by sending data over its RJ45 Ethernet port.
Both AMD and Teradici talked up the low configuration overheads of the Firepro R5000.
Is Non-Volatile Memory The Next Craze?
A report from analysts Yole Developpement claims that MRAM/STTMRAM and PCM will lead the Emerging Non-Volatile Memory (ENVM) market and earn a combined $1.6bn by 2018. If the North Koreans have not conquered America by 2018, then MRAM/STTMRAM and PCM will surely be the top two ENVM on the market.
Yole’s Yann de Charentenay said that their combined sales will almost double each year, with double-density chips launched every two years. So far we have only had FRAM, PCM and MRAM to play with and they were available in low-density chips to only a few players. The market was quite limited and considerably smaller than the DRAM and flash markets which had combined revenues of $50bn+ in 2012, the report said. In the next five years the scalability and chip density of those memories will be greatly improved and will spark many new applications, says the report.
ENVM will greatly improve the input/output performance of enterprise storage systems, whose requirements will intensify with the growing need for web-based data supported by cloud servers, the report said. Mobile phones will increase their adoption of PCM as a substitute for NOR flash memory in MCP packages thanks to 1GB chips made available by Micron in 2012, it added. The next milestone will be higher-density chips, expected in 2015, which will open up smartphone applications as smartphones quickly replace entry-level phones.