High Performance and Distributed Computing
Multi-core processors have been all the rage for several years now. Octa-core processors rule the roost at the time of writing this blog. Ever stopped to wonder why we keep adding more cores? And did you know that today's GPUs can have up to 3000 cores?! So why not throw away the CPU and plug in another GPU?
What can you do with millions of independent processors, many of them hand-held and equipped with fancy sensors (think mobile phones, or just regular PCs)? Distributed computing!
But High Performance Computing (HPC) was around long before graphics cards were even a proper thing. We also talk about the vanilla flavours of HPC, like using FORTRAN and C/C++ with OpenMP.
This page answers these questions and is a guide to GPU computing and other forms of high performance computing.
To answer the question about the CPU vs the GPU: the reason we now have multiple cores on our CPUs is largely that clock speeds stopped scaling, which is often loosely described as a breakdown of Moore's law. As transistors shrank and clock speeds rose, chips packed more and more heat into the same area. So rather than trying to make one super-packed and powerful core, engineers found that they could get better overall performance and lower costs by simply using several simpler, less power-hungry cores in tandem. But then is the GPU like 3000 CPUs put together? Nope. Not every core is made equal. GPU and CPU cores are designed for completely different things. CPU cores are built to minimise latency, finishing each individual task as quickly as possible, rather than to maximise throughput. It is the exact opposite with GPUs, which have massive throughput but can suffer from severe latency in memory access and thread synchronisation.
So what are GPUs better at doing than CPUs? Parallel operations, often by an order of magnitude or more. In fact that is what GPUs have always been used for: performing thousands of floating point operations simultaneously. This is essential for high-throughput graphical applications like games and image/video editing software. They achieve this high parallelism through a completely different chip architecture, with each core working in parallel. Combine that with multithreading, which can be thought of as dividing each core into hundreds or thousands of virtual cores, and you have a device that, for workloads that parallelise well, can cut down compute times by factors of 100 or even more. So what takes months on a CPU can be done in days or hours on a GPU.
So are GPUs the new silver bullet of computing? Nope. GPUs follow the SIMD (Single Instruction, Multiple Data) architecture: the same instruction is applied to many pieces of data at once. The caveat is that all the threads run the same code, and typically the GPU has to wait for all of them to finish. So what happens when you want only one copy of the code to run, i.e. when you have sequential commands? Yeah, that's where CPUs are the boss.
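To make the "same code, different data" idea concrete, here is a minimal sketch in plain C++ rather than real GPU code. The names saxpy_kernel and launch_saxpy are made up for illustration, and the OpenMP directive is only a stand-in for a GPU launch: it spreads the loop over CPU threads when OpenMP is enabled and is ignored otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The "kernel" (hypothetical name): the one piece of code that every
// logical thread runs. Thread i touches only element i, so no thread
// depends on the result of another.
inline void saxpy_kernel(std::size_t i, float a,
                         const std::vector<float>& x,
                         std::vector<float>& y) {
    y[i] = a * x[i] + y[i];
}

// The "launch" (hypothetical name): one logical thread per element.
// With OpenMP enabled the iterations are shared among CPU threads;
// without it the loop simply runs sequentially with the same result.
void launch_saxpy(float a, const std::vector<float>& x,
                  std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        saxpy_kernel(static_cast<std::size_t>(i), a, x, y);
    }
    // The function returns only after every iteration has finished,
    // which mirrors having to wait for all the threads to complete.
}
```

Calling launch_saxpy(2.0f, x, y) updates every element of y in place. On an actual GPU the loop disappears and the hardware supplies the index i to each thread, but the shape of the kernel stays much the same.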
So the idea is: The CPU calls the GPU to perform the computations that can be 'parallelised' and then control goes back to the CPU. Repeat.
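How much this hand-off helps depends on how much of the program can actually be parallelised. A standard way to estimate it is Amdahl's law, which this page does not otherwise cover. The sketch below assumes a hypothetical helper named amdahl_speedup and treats the GPU as simply making the parallel part of the run time some factor faster.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the original run
// time is made s times faster and the remaining (1 - p) stays sequential.
// amdahl_speedup is an illustrative name, not part of any library.
double amdahl_speedup(double parallel_fraction, double parallel_speedup) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(parallel_speedup > 0.0);
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / parallel_speedup);
}
```

For example, if 90% of the run time can be offloaded and the GPU makes that part 100 times faster, amdahl_speedup(0.9, 100.0) gives an overall speedup of a little over 9, and no GPU can push it past 10. The sequential part left on the CPU sets the ceiling.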
So why the hype? GPGPU.
General Purpose GPU computing, or GPGPU, is about using the SIMD paradigm to do general computations. Because, graphical or not, the basic mathematical operations are the same. Multi-particle simulations like the N-body problem or CFD benefit greatly from GPGPU. But what has really created the greatest hype is their use in training Deep Neural Networks. It can be argued that DNNs have become so powerful and received so much attention simply because their training can be parallelised, and we now have the hardware to do GPGPU. Vendors like Nvidia are now building 'GPUs' specifically to do Deep Learning.
Here are the best tools to start with -
- CUDA - is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). It is restricted to NVIDIA GPUs.
- Open Computing Language (OpenCL) - is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators.
- OpenMP (Open Multi-Processing) - is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran on most platforms. The benefit of OpenMP is that it doesn't require major modification of your source code (unlike the others in this list): you annotate existing loops and code regions with compiler directives, and they are activated by simply adding a flag at compile time (for example -fopenmp with GCC).
- Message Passing Interface (MPI) - is a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computing architectures. It is independent of programming language and is useful for large clusters of computational units.
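As a small illustration of how little OpenMP asks of your source code, here is a sketch of a dot product in C++. The function name dot is just for illustration. The only OpenMP-specific line is the directive above the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors (illustrative helper).
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    // Split the iterations among threads. reduction(+:sum) gives each
    // thread a private partial sum and adds them together at the end,
    // so the threads never write to the same variable at once.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

Compiled with an OpenMP flag such as -fopenmp (GCC and Clang), the loop runs on several threads. Compiled without it, the directive is ignored and the same code runs sequentially. Because the partial sums are combined in a different order, floating point results can differ in the last few digits between the two builds.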
CUDA or OpenCL?
Go with CUDA if you have a CUDA-enabled Nvidia GPU, and especially if you are on Windows. On Linux, give CUDA a shot, but it has a lot of weird requirements and known issues. That being said, CUDA has a lot of great libraries and they are only growing.
This is perhaps the quickest and most comprehensible introduction to CUDA programming in C/C++. CUDA is presently supported on C/C++ and FORTRAN. However, C/C++ is highly recommended over FORTRAN due to larger libraries and the fact that, at the time of writing, the only CUDA FORTRAN compiler out there is proprietary and costs a lot.
OpenCL, however, is more general and runs on most GPU architectures. But its strength is also its weakness: it pays a slight toll in performance because, unlike CUDA, it is not optimised for any one architecture.
- Programming Massively Parallel Processors by Kirk and Hwu - The most comprehensive and highly recommended.
- CUDA by Example by Sanders and Kandrot
- The CUDA C Programming Guide by NVIDIA - Also very comprehensive.
There are a lot of things you can do with this newfound power. You can save yourself hours when running high-demand simulations and training DNNs. But to be fair, you don't really need to write CUDA yourself for Deep Learning anymore. Modern deep learning libraries like TensorFlow (GPU version) use CUDA internally to train deep learning models.
IITB has been declared a GPU Centre of Excellence by NVIDIA. More info about research here. There are lots of professors in every department who need HPC in their work. Let's Go!
The idea of parallelism is also put to use in supercomputers and computing clusters. There, MPI is the most widely used standard and can run on just about anything, from a cluster of mobile phones to a supercomputer. OpenMP is often used alongside it for the shared-memory parallelism within each machine.