nVidia Releases CUDA
Nvidia has released CUDA, its platform that lets developers run their code on GPUs, to server vendors in order to get 64-bit ARM cores into the high performance computing (HPC) market.
The firm said today that ARM64 server processors, which are designed for microservers and web servers because of their energy efficiency, can now process HPC workloads when paired with GPU accelerators using the Nvidia CUDA 6.5 parallel programming framework, which supports 64-bit ARM processors.
“Nvidia’s GPUs provide ARM64 server vendors with the muscle to tackle HPC workloads, enabling them to build high-performance systems that maximise the ARM architecture’s power efficiency and system configurability,” the firm said.
The first GPU-accelerated ARM64 software development servers will be available in July from Cirrascale and E4 Computer Engineering, with production systems expected to ship later this year. The Eurotech Group also plans to ship production systems later this year.
Cirrascale’s system will be the RM1905D, a high density two-in-one 1U server with two Tesla K20 GPU accelerators, which the firm claims provides high performance and low total cost of ownership for private cloud, public cloud, HPC and enterprise applications.
E4’s EK003 is a production-ready, low-power 3U dual-motherboard server appliance with two Tesla K20 GPU accelerators designed for seismic, signal and image processing, video analytics, track analysis, web applications and MapReduce processing.
Eurotech’s system is an “ultra-high density”, energy efficient and modular Aurora HPC server configuration, based on proprietary Brick Technology and featuring direct hot liquid cooling.
Featuring Applied Micro X-Gene ARM64 CPUs and Nvidia Tesla K20 GPU accelerators, the new ARM64 servers will provide customers with an expanded range of efficient, high-performance computing options to drive compute-intensive HPC and enterprise data centre workloads, Nvidia said.
Nvidia added, “Users will immediately be able to take advantage of hundreds of existing CUDA-accelerated scientific and engineering HPC applications by simply recompiling them to ARM64 systems.”
ARM said that it is working with Nvidia to “explore how we can unite GPU acceleration with novel technologies” and drive “new levels of scientific discovery and innovation”.
nVidia Outs CUDA 6
Nvidia has made the CUDA 6 Release Candidate, the latest version of its GPU programming language, available for developers to download for free.
The release arrives with several new features and improvements to make parallel programming “better, faster and easier” for developers creating next generation scientific, engineering, enterprise and other applications.
Nvidia has aggressively promoted its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Available now, the CUDA 6 Release Candidate introduces unified memory, which lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other.
“This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications,” Nvidia said in a blog post on Thursday.
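To give a sense of what that looks like in practice, here is a minimal sketch assuming CUDA 6 or later and a GPU that supports managed memory. It uses the runtime call `cudaMallocManaged` to allocate one buffer that the CPU fills, the GPU scales and the CPU reads back, with no `cudaMemcpy` in between. The kernel `scale` and the helper `scale_with_unified_memory` are hypothetical names chosen for illustration, not anything from Nvidia's announcement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiply every element of x by a, one thread per element.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical helper: fills a managed buffer with 1.0 on the CPU, scales it
// on the GPU, and returns how many elements differ from the expected value a.
// Returns -1 if a CUDA call fails.
int scale_with_unified_memory(int n, float a) {
    float *x = NULL;

    // One allocation visible to both CPU and GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1;

    for (int i = 0; i < n; ++i) x[i] = 1.0f;        // written by the CPU

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(x, a, n);             // updated by the GPU

    // Wait for the kernel to finish before the CPU touches the data again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)                      // read back by the CPU
        if (x[i] != a) ++mismatches;

    cudaFree(x);
    return mismatches;
}

int main() {
    int mismatches = scale_with_unified_memory(1 << 20, 2.0f);
    if (mismatches < 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The explicit `cudaDeviceSynchronize()` matters here: kernel launches are asynchronous, so the CPU should not read the managed buffer until the GPU has finished with it.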
There’s also the addition of “drop-in libraries”, which Nvidia said will accelerate applications by up to eight times.
“The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent,” the chip designer added.
Multi-GPU scaling has also been added in CUDA 6, introducing redesigned BLAS and FFT GPU libraries that automatically scale performance across up to eight GPUs in a single node. Nvidia said this provides over nine teraflops of double-precision performance per node and supports workloads of up to 512GB in size, larger than it has supported before.
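As a rough illustration of multi-GPU BLAS, the sketch below uses the cuBLAS-XT host API (`cublasXt.h`), on the assumption that this is the redesigned BLAS library Nvidia is referring to; the article itself does not name it. The matrices live in ordinary host memory and the library spreads the work over the selected devices. The helper `dgemm_on_all_gpus` is a hypothetical name, and the sketch simply selects every GPU the runtime reports rather than assuming eight.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublasXt.h>

// Hypothetical helper: multiplies two n-by-n matrices of ones across all
// visible GPUs and returns the first element of the result (which should
// equal n). Returns -1.0 if any call fails.
double dgemm_on_all_gpus(size_t n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 1) return -1.0;

    // Device IDs 0..count-1: use every GPU the runtime can see.
    std::vector<int> devices(count);
    for (int i = 0; i < count; ++i) devices[i] = i;

    cublasXtHandle_t handle;
    if (cublasXtCreate(&handle) != CUBLAS_STATUS_SUCCESS) return -1.0;
    if (cublasXtDeviceSelect(handle, count, devices.data()) != CUBLAS_STATUS_SUCCESS) {
        cublasXtDestroy(handle);
        return -1.0;
    }

    // Plain host allocations; no explicit device buffers or copies.
    std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);
    const double alpha = 1.0, beta = 0.0;

    // C = alpha * A * B + beta * C, column-major, leading dimension n.
    cublasStatus_t status = cublasXtDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                          n, n, n, &alpha,
                                          A.data(), n, B.data(), n,
                                          &beta, C.data(), n);
    cublasXtDestroy(handle);
    return status == CUBLAS_STATUS_SUCCESS ? C[0] : -1.0;
}

int main() {
    const size_t n = 2048;
    double first = dgemm_on_all_gpus(n);
    if (first < 0.0) {
        fprintf(stderr, "cuBLAS-XT call failed\n");
        return 1;
    }
    printf("C[0] = %.1f, expected %.1f\n", first, (double)n);
    return first == (double)n ? 0 : 1;
}
```

A program like this would typically be built with `nvcc` and linked against cuBLAS (`-lcublas`). The calling code stays the same whether one GPU or several are present.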
“In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides,” Nvidia said.
The previous CUDA 5.5 Release Candidate was issued last June, and added support for ARM-based processors.
Aside from ARM support, Nvidia also improved Hyper-Q support in CUDA 5.5, which allowed developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.
nVidia’s CUDA 5.5 Available
Nvidia has made its CUDA 5.5 release candidate, which supports ARM-based processors, available for download.
Nvidia has been aggressively pushing its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Now the firm has announced the availability of a CUDA 5.5 release candidate, the first version of the language that supports ARM-based processors.
Aside from ARM support, Nvidia has improved Hyper-Q support and now allows developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.
Ian Buck, GM of GPU Computing Software at Nvidia, said, “Since developers started using CUDA in 2006, successive generations of better, exponentially faster CUDA GPUs have dramatically boosted the performance of applications on x86-based systems. With support for ARM, the new CUDA release gives developers tremendous flexibility to quickly and easily add GPU acceleration to applications on the broadest range of next-generation HPC platforms.”
Nvidia’s support for ARM processors in CUDA 5.5 is an indication that it will release CUDA-enabled Tegra processors in the near future. However, outside of the firm’s own Tegra processors, CUDA support is largely useless, as almost all other chip designers have chosen OpenCL as the programming language for their GPUs.
Nvidia did not say when it will release CUDA 5.5, but in the meantime the firm’s release candidate supports Windows, Mac OS X and just about every major Linux distribution.