At a glance
- Grid computing involves aggregating processing capacity, storage and other resources to achieve a task that would be beyond the resources of a given institution.
- The electricity grid is used by some as a model for grid computing, but processor power 'on tap' is a long way from reality for complex projects.
- Volunteer computing projects rely on individuals donating spare 'processing cycles' on their PCs to solve problems in areas such as medicine, astrophysics and climatology.
- Large academic research projects, such as those planned for the Large Hadron Collider (LHC), rely on grid computing to analyse their vast data sets.
- Issues around security and mutual trust when 'donating' capacity are balanced by advantages of higher resource utilisation and contributing to worthwhile projects.
- Educational networks could be used to advance volunteer projects or for in-house processing tasks, but security issues may limit the former while the latter may be more effectively achieved through 'cloud' services.
Getting on the grid
The broad definition of grid computing, otherwise known as utility computing, entails the notion of making computing power as available as the national grid - some strategists foresee a time when you will be able to plug a terminal into a 'wall socket' and get all the computing power you need. This view simplifies the current state of computing to 'pure' processor power, analogous to electricity, without reference to all the complexities of differing processor architectures, storage requirements, peripheral interactions and a host of other factors. In many respects cloud computing (see TechNews 11/08) offers these facilities by providing computing power and storage via the internet; the user does not know where those servers are located, but can lease the capacity required.
'Grid computing', in more common use and as discussed in this article, refers to a form of distributed computing whereby users can access spare capacity on other people's resources to deal with tasks that would take far too long on in-house hardware. Provision is enabled by a complex web of co-operative pacts and predefined service level agreements (SLAs) that are a far cry from the 'plug in, use now and get billed after' vision of utility computing. As defined on Wikipedia:
Grid computing (or the use of computational grids) is the combination of computer resources from multiple administrative domains applied to a common task.
This definition indicates one of the key features of current computing grids: heterogeneity. There are many computing platforms and a whole mass of communities, research projects and nascent standards, only some of which will be covered in this article.
Grid computing is most suited to scalable, massively parallel computing tasks. These applications can generally handle out-of-order processing, with algorithms that deal with late or missing results, and rely on 'message passing' protocols to control execution by allocating tasks, sharing progress and transferring completed data to the appropriate point. Such tasks include searching very large data sets, video rendering, climate simulations, genome analysis, processing particle physics data and drug research. Some projects involve 'volunteer computing', where people grant access for applications to run on spare processor capacity while their computer is idle. One of the most widely known examples is the SETI@home project, which searches for signals attributable to intelligent sources among the radio background 'noise' of the universe. Some projects allow the greatest contributors to propose their own tasks to be run on the virtual, networked processor.
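The out-of-order processing described above can be illustrated with a minimal Python sketch. It is not taken from any grid project: the function names, the simulated delays and the timeout value are all assumptions made for illustration. Work units are submitted together, results are collected in whatever order they finish, and any unit that has not reported by the deadline is simply recorded as missing so that it could be reissued.

```python
import concurrent.futures as cf
import time


def process_work_unit(unit_id):
    """Hypothetical task: higher-numbered units happen to finish sooner."""
    time.sleep(0.01 * (5 - unit_id))
    return unit_id, unit_id * unit_id


def run_units(unit_ids, timeout_s=1.0):
    """Collect results as they arrive; report units that never arrived in time."""
    unit_ids = list(unit_ids)
    results = {}
    missing = set(unit_ids)
    with cf.ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(process_work_unit, u) for u in unit_ids]
        try:
            # as_completed yields futures in completion order, not submission order
            for future in cf.as_completed(futures, timeout=timeout_s):
                unit_id, value = future.result()
                results[unit_id] = value
                missing.discard(unit_id)
        except cf.TimeoutError:
            pass  # late units remain in `missing` and could be reissued
    return results, missing


if __name__ == "__main__":
    done, late = run_units(range(5))
    print("completed:", done)
    print("missing:", late)
```

A real grid application would exchange these messages between machines rather than threads, but the control flow (allocate, collect in any order, tolerate gaps) is the same idea.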
Many large academic research projects also use grid computing, taking advantage of facilities in partner organisations to process data during idle time, perhaps at night or between in-house applications.
Educause has a helpful article, '7 things you should know about Grid Computing', and the Worldwide LHC Computing Grid (WLCG) has published 'Grid computing in five minutes'.
The structure of the grid
The grid is inherently heterogeneous, a loose collection of processors, storage, specialised hardware (for example electron microscopes and particle accelerators) and network infrastructure. For each task, appropriate hardware has to be discovered, processor time booked, network capacity scheduled (especially where large data sets are involved) and collation of results organised. Although this can be achieved on a peer-to-peer basis (in which no one machine has overall control), it is generally arranged as a client-server structure. 'Middleware', such as the Globus toolkit or the University of California, Berkeley's BOINC software (both of which are open source), is often utilised to manage the applications and resources required to achieve a particular outcome.
The complexities of managing grid applications are offset by significant advantages, including:
- access to resources beyond those available within a given institution
- optimisation of spare capacity
- flexibility to scale and reconfigure available resources
- avoidance of single points of failure in the computing infrastructure used
- data replication across a number of facilities
- provision of 'virtual' resources in-house, so that experienced researchers are less tempted to go to institutions elsewhere.
Academic institutions have created partnership groups for sharing resources, notably GridPP (for particle physics tasks in the UK), the EU's EGEE science network and the UK's National Grid Service (NGS), while directories like the GridGuide provide international contacts. The Open Grid Forum (OGF) has been behind a number of substantive projects, especially developing standards for the protocols required to deliver and manage grid applications.
Volunteer computing
Volunteer projects are the simplest structure of grid computing: a server provides an application for users to download and a series of 'work units' to be processed during the processor's idle time; each work unit is independent, so results can be returned in any order. However, the researchers running the application do not know whether the user or client PC will produce accurate, authentic results, so tasks are generally randomly duplicated between users, with results compared to ensure validity. The owner of the client PC has to manage the installation and patching of the client application, while trusting that the application provider is doing the work purported, that no malware is being delivered and that the application will not interfere with the operation of the computer. Networks of PCs in schools and colleges could contribute huge numbers of spare processing cycles to these projects, but management overheads and security concerns often deter system managers from volunteering their resources.
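The duplicate-and-compare approach to validating results can be sketched in a few lines of Python. This is an illustration only, not the validator used by BOINC or any named project: the `quorum` parameter, the report format and the function names are assumptions. Each work unit is accepted only if enough independent clients return the same answer; otherwise it is queued to be sent out again.

```python
from collections import Counter


def validate(replica_results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None."""
    if not replica_results:
        return None
    value, count = Counter(replica_results).most_common(1)[0]
    return value if count >= quorum else None


def collate(reports, quorum=2):
    """Group (unit_id, client_id, result) reports and decide what to accept.

    Results must be hashable so that identical answers can be counted.
    """
    by_unit = {}
    for unit_id, _client_id, result in reports:
        by_unit.setdefault(unit_id, []).append(result)

    accepted, reissue = {}, []
    for unit_id, results in by_unit.items():
        agreed = validate(results, quorum)
        if agreed is None:
            reissue.append(unit_id)  # no agreement yet: send to more clients
        else:
            accepted[unit_id] = agreed
    return accepted, sorted(reissue)


if __name__ == "__main__":
    reports = [
        (1, "client-a", 42), (1, "client-b", 42),  # two clients agree
        (2, "client-a", 7), (2, "client-c", 9),    # clients disagree
        (3, "client-b", 5),                        # only one result so far
    ]
    print(collate(reports))
```

Exact matching suits integer or categorical results; floating-point simulations would presumably need a tolerance-based comparison instead.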
Applications include research into disease, medicines, climate change, astronomy and particle physics.
GridRepublic and the World Community Grid allow users to select the projects they wish to contribute to, while Intel is promoting its volunteer projects through Facebook. Many projects, such as the protein folding simulation Folding@home, now support processing using games consoles and the parallel instruction pipelines found on graphics processors. (See 'GPU computing' in TechNews 09/08.)
Research networks
Collaborative networks of academic researchers can assume that the infrastructure is trusted, diminishing the problems faced by public volunteer projects. However, the actual tasks are often far more complex, involving very large data sets and a much greater range of hardware, from desktop PCs through to supercomputers.
The Large Hadron Collider (LHC) will be reliant on massive grid computing capabilities to process the data that it is expected to produce. The WLCG has 11 Tier 1 and 140 Tier 2 data centres that will distribute the 15 million gigabytes (15 petabytes) of data created each year. The primary fibre optic network links will run at 10Gbps, allowing data transfers of several gigabytes per second through clustered channels.
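To put these figures in proportion, the short Python calculation below converts the annual data volume into an average rate and compares it with one 10Gbps link. It assumes decimal units (1 petabyte = 10^15 bytes, consistent with '15 million gigabytes') and a 365-day year, and it ignores protocol overheads, peaks in output and the copying of data between tiers, all of which would raise the real requirement.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def average_rate_bytes_per_s(bytes_per_year):
    """Average sustained rate needed to move a year's data in a year."""
    return bytes_per_year / SECONDS_PER_YEAR


def link_capacity_bytes_per_s(gigabits_per_s):
    """Raw capacity of a link in bytes per second (8 bits per byte)."""
    return gigabits_per_s * 1e9 / 8


if __name__ == "__main__":
    yearly_bytes = 15e15                        # 15 petabytes, decimal units
    average = average_rate_bytes_per_s(yearly_bytes)
    link = link_capacity_bytes_per_s(10)        # one 10Gbps link
    print(f"average rate: {average / 1e9:.2f} GB/s")
    print(f"one link:     {link / 1e9:.2f} GB/s")
    print(f"share of one link: {average / link:.0%}")
```

On these assumptions the yearly average is a little under half a gigabyte per second, against 1.25 gigabytes per second for a single link, which is consistent with several links being clustered to reach rates of several gigabytes per second.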
Computing facilities at this scale represent a considerable investment, so the prioritisation, scheduling and job control are critical to effective use. A number of projects and protocols (in the widest sense) are focussed on this issue. For example:
- GridFTP is an extension to the standard internet file transfer protocol (FTP), allowing much larger blocks of data to be simultaneously and securely transmitted across multiple channels, as well as providing the facility for just part of a single, extremely large file to be downloaded.
- GridCOMP and Phosphorus are assembling frameworks and services to facilitate higher level project management.
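The two GridFTP ideas mentioned above, sending data over several channels at once and fetching only part of a very large file, can be illustrated with a small Python sketch. It does not implement or call GridFTP: the in-memory `source` stands in for a remote file, and `split_range` and `fetch_partial` are hypothetical helper names. The requested byte range is divided into contiguous stripes, each stripe is fetched by its own worker, and the pieces are reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(offset, length, channels):
    """Split [offset, offset + length) into at most `channels` (start, size) stripes."""
    if length < 0 or channels < 1:
        raise ValueError("length must be >= 0 and channels >= 1")
    base, extra = divmod(length, channels)
    stripes, start = [], offset
    for i in range(channels):
        size = base + (1 if i < extra else 0)  # spread the remainder over early stripes
        if size == 0:
            break
        stripes.append((start, size))
        start += size
    return stripes


def fetch_partial(source, offset, length, channels=4):
    """Fetch part of `source` (a bytes object standing in for a remote file)."""
    stripes = split_range(offset, length, channels)
    if not stripes:
        return b""
    with ThreadPoolExecutor(max_workers=len(stripes)) as pool:
        # map() returns chunks in stripe order, whatever order the workers finish in
        chunks = pool.map(lambda s: source[s[0]:s[0] + s[1]], stripes)
        return b"".join(chunks)


if __name__ == "__main__":
    remote_file = bytes(range(256))
    print(split_range(100, 10, 4))
    print(fetch_partial(remote_file, 100, 10))
```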
Commercial opportunities
Large companies, including those involved in pharmaceuticals, aerospace simulations, data mining for market research and prospecting, also have immense processing requirements, but the data they handle can be commercially sensitive. SLAs must be legally watertight, covering issues like security, intellectual property and data protection (especially where personal information is held).
The function of GridEcon is to create an online auctioning, scheduling and management system for computational capacity. The EU's SIMDAT project had a wider remit, investigating all elements of grid computing, from protocol standardisation through to systems that allow companies to readily create virtual organisations through which they can define projects involving establishing, administering and securely taking down distributed processing and storage capacity.
The grid and the cloud
Many applications already run in the 'cloud', leasing facilities such as Amazon Web Services (AWS) or Microsoft's soon-to-be-launched Windows Azure Platform. Although these may use a distributed computing model to provide the services, they have a single point of accountability through the provider's SLA. The grid computing applications outlined in this article are far more complex, but they can provide computing power for 'free', or at a substantially reduced price, for academic researchers, while ensuring near full utilisation of expensive computing resources. The grid remains more informal in structure, collaborative in development and altruistic in nature, although it is becoming more formalised as the environment matures and the scale of individual projects increases, especially as commercial entities begin to adopt these approaches.
Educational establishments could consider donating spare computing cycles to advance areas of research considered to be for the good of humanity, although they need to factor in the management overheads that deployment is likely to incur and consider whether it will add significantly to energy consumption. Middleware, such as BOINC, could be deployed across a large institution to manage in-house processing tasks, or capacity could be leased from one of the cloud providers. However, access to massively scalable grid computing resources is likely to remain the province of research organisations based in higher education and industry.