ARM To Focus On 64-bit SoC
ARM announced its first 64-bit cores a while ago and SoC makers have already rolled out several 64-bit designs. However, apart from Apple, nobody has consumer-oriented 64-bit ARM devices on the market just yet. They are slowly starting to show up, and ARM says the transition to 64-bit parts is accelerating. The first wave of 64-bit ARM parts, though, is not going after the high-end market.
Is 64-bit support on entry-level SoCs just a gimmick?
This trend raises a rather obvious question: are low-end ARMv8 parts just a marketing gimmick, or do they really offer a significant performance gain? There is no straight answer at this point. It will depend on Google, on the chipmakers themselves and on phone makers.
Qualcomm announced its first 64-bit part late last year. The Snapdragon 410 won’t turn many heads. It is going after $150 phones and it is based on Cortex A53 cores. It also has LTE, which makes it rather interesting.
MediaTek is taking a similar approach. Its quad-core MT6732 and octa-core MT6752 parts are Cortex A53 designs, too. Both sport LTE connectivity.
Qualcomm and MediaTek appear to be going after the same market – $100 to $150 phones with LTE and quad-core 64-bit stickers on the box. Marketers should like the idea, as they’re getting a few good buzzwords for entry-level gear.
However, we still don’t know much about their real-world performance. Don’t expect anything spectacular. The Cortex A53 is basically the 64-bit successor to the frugal Cortex A7. The A53 has a bit more cache and 40-bit physical addresses, and it ends up a bit faster than the A7, but not by much. ARM says the A7 delivers 1.9 DMIPS/MHz per core, while the A53 churns out 2.3 DMIPS/MHz. That puts it in the ballpark of the good old Cortex A9. The first consumer-oriented quad-core Cortex A9 part was Nvidia’s Tegra 3, so in theory a quad-core Cortex A53 could be as fast as a Tegra 3 clock-for-clock, but at 28nm we should see somewhat higher clocks, along with better graphics.
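To put the two DMIPS/MHz figures in perspective, here is a minimal Python sketch of the arithmetic. The 1.9 and 2.3 values are the ARM figures quoted above; the 1500 MHz clock, the four-core count and the function names are assumptions chosen purely for illustration. The aggregate number also assumes perfect scaling across cores, which real workloads rarely achieve, and Dhrystone itself is only a synthetic benchmark.

```python
# DMIPS/MHz per core, as quoted from ARM in the text above.
A7_DMIPS_PER_MHZ = 1.9
A53_DMIPS_PER_MHZ = 2.3

# Assumptions for illustration only: not a measured or announced configuration.
ASSUMED_CLOCK_MHZ = 1500.0
ASSUMED_CORES = 4


def per_clock_gain(old: float, new: float) -> float:
    """Fractional per-clock improvement of `new` over `old`."""
    return new / old - 1.0


def aggregate_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int) -> float:
    """Theoretical peak DMIPS, assuming perfect scaling across all cores."""
    return dmips_per_mhz * clock_mhz * cores


gain = per_clock_gain(A7_DMIPS_PER_MHZ, A53_DMIPS_PER_MHZ)
a7_total = aggregate_dmips(A7_DMIPS_PER_MHZ, ASSUMED_CLOCK_MHZ, ASSUMED_CORES)
a53_total = aggregate_dmips(A53_DMIPS_PER_MHZ, ASSUMED_CLOCK_MHZ, ASSUMED_CORES)

print(f"A53 over A7, per clock: {gain:.1%}")
print(f"Hypothetical quad-core A7:  {a7_total:,.0f} DMIPS")
print(f"Hypothetical quad-core A53: {a53_total:,.0f} DMIPS")
```

On ARM's own numbers the per-clock gain works out to roughly 21 percent, which fits the "a bit faster, but not by much" description.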
That’s not bad for $100 to $150 devices. LTE support is just the icing on the cake. Keep in mind that the Cortex A7 is ARM’s most efficient 32-bit core, hence we expect nothing less from the Cortex A53.
The Cortex A57 conundrum
Speaking to CNET’s Brooke Crothers, ARM executive vice president of corporate strategy Tom Lantzsch said the company was surprised by strong demand for 64-bit designs.
“Certainly, we’ve had big uptick in demand for mobile 64-bit products. We’ve seen this with our [Cortex] A53, a high-performance 64-bit mobile processor,” Lantzsch told CNET.
He said ARM has been surprised by the pace of 64-bit adoption, with mobile parts coming from Qualcomm, MediaTek and Marvell. He said he hopes to see 64-bit phones by Christmas, although we suspect the first entry-level products will appear much sooner.
Lantzsch points out that even 32-bit code will run more efficiently on 64-bit ARMv8 parts. As software support improves, the performance gains will become more evident.
But where does this leave the Cortex A57? It is supposed to replace the Cortex A15, which had a few teething problems. Like the A15, it is a relatively big core. The A15 was simply too big and impractical on the 32nm node. On 28nm it’s better, but not perfect. It is still a huge core and its market success has been limited.
As a result, it’s highly unlikely that we will see any 28nm Cortex A57 parts. Qualcomm’s upcoming Snapdragon 810 is the first consumer-oriented A57 SoC. It is a 20nm design and it is coming later this year, just in time for Christmas, as ARM puts it. However, although the Snapdragon 810 will be ready by the end of the year, the first phones based on the new chip are expected to ship in early 2015.
While we will be able to buy 64-bit Android (and possibly Windows Phone) devices before Christmas, most if not all of them will be based on the A53. That’s not necessarily a bad thing. Consumers won’t have to spend $500 to get a 64-bit ARM device, so the user base could start growing long before high-end parts start shipping, thus forcing developers and Google to speed up 64-bit development.
If rumors are to be believed, Google is doing just that and it is not shying away from small 64-bit cores. The search giant is reportedly developing a $100 Nexus phone for emerging markets. It is said to be based on MediaTek’s MT6732 clocked at 1.5GHz. Sounds interesting, provided the rumor turns out to be true.
Nvidia Speaks On Performance Issue
Nvidia has said that most of the outlandish performance increase figures touted by GPGPU vendors were down to poor original code rather than the sheer brute-force computing power provided by GPUs.
Both AMD and Nvidia have been using real-world code examples and projects to promote the performance of their respective GPGPU accelerators for years, but now it seems some of the eye-popping figures, including speed-ups of 100x or 200x, were not down to just the computing power of GPGPUs. Sumit Gupta, GM of Nvidia’s Tesla business, said that such figures were generally down to starting with unoptimized CPU code.
During Intel’s Xeon Phi pre-launch press conference call, the firm cast doubt on some of the orders-of-magnitude speed-up claims that had been bandied about for years. Gupta has now told The INQUIRER that while those large speed-ups did happen, they were possible because the code was poorly optimized to begin with, which set the bar very low.
Gupta said, “Most of the time when you saw the 100x, 200x and larger numbers those came from universities. Nvidia may have taken university work and shown it and it has an 100x on it, but really most of those gains came from academic work. Typically we find when you investigate why someone got 100x [speed up] is because they didn’t have good CPU code to begin with. When you investigate why they didn’t have good CPU code you find that typically they are domain scientists not computer science guys – biologists, chemists, physics – and they wrote some C code and it wasn’t good on the CPU. It turns out most of those people find it easier to code in CUDA C or CUDA Fortran than they do to use MPI or Pthreads to go to multi-core CPUs, so CUDA programming for a GPU is easier than multi-core CPU programming.”
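Gupta's point about the baseline can be illustrated with a few lines of Python. This is only a sketch: the three timings below are hypothetical values picked to make the arithmetic obvious, not measurements from Nvidia, Intel or any university project, and the names are invented for the example.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speed-up factor of the accelerated run relative to the baseline run."""
    return baseline_seconds / accelerated_seconds


# Hypothetical run times for the same job (illustrative only, not measured).
naive_cpu_seconds = 200.0   # unoptimized, single-threaded C code
tuned_cpu_seconds = 10.0    # the same algorithm, vectorized and multi-threaded
gpu_seconds = 2.0           # CUDA port running on a GPU

headline = speedup(naive_cpu_seconds, gpu_seconds)
like_for_like = speedup(tuned_cpu_seconds, gpu_seconds)

print(f"GPU vs unoptimized CPU code: {headline:.0f}x")
print(f"GPU vs optimized CPU code:   {like_for_like:.0f}x")
```

The GPU run time is identical in both comparisons. Only the choice of CPU baseline changes, and that alone moves the headline figure by more than an order of magnitude.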