nVidia Outs CUDA 6
Nvidia has made the latest GPU programming language CUDA 6 Release Candidate available for developers to download for free.
The release arrives with several new features and improvements to make parallel programming “better, faster and easier” for developers creating next generation scientific, engineering, enterprise and other applications.
Nvidia has aggressively promoted its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Available now, the CUDA 6 Release Candidate brings a major new update in unified memory access, which lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other.
“This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications,” Nvidia said in a blog post on Thursday.
There’s also the addition of “drop-in libraries”, which Nvidia said will accelerate applications by up to eight times.
“The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent,” the chip designer added.
Multi-GPU Scaling has also been added to the CUDA 6 programming language, introducing re-designed BLAS and FFT GPU libraries that automatically scale performance across up to eight GPUs in a single node. Nvidia said this provides over nine teraflops of double-precision performance per node, supporting larger workloads of up to 512GB in size, more than it’s supported before.
“In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides,” Nvidia said.
The previous CUDA 5.5 Release Candidate was issued last June, and added support for ARM based processors.
Aside from ARM support, Nvidia also improved Hyper-Q support in CUDA 5.5, which allowed developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.