The Remote Computation System (RCS)

P. Arbenz, W. Gander and M. Oettli,
Institute of Scientific Computing

Keywords: high performance computing, distributed computing

Funding: This project was supported by the Swiss National Science Foundation.


Abstract

Wide area computer networks have become a basic part of today's computing infrastructure. These networks connect a variety of machines, from workstations to supercomputers, and together they represent an enormous computing resource. Furthermore, software for solving problems in numerical linear algebra on high performance computers is readily available today.

However, accessing and using these computers and their software is often complicated. A major obstacle for inexperienced users who want to exploit such high performance computers is that they have to deal with machine-dependent, low-level details.

The goal of this project is to make high performance computing accessible to scientists and engineers without requiring extensive training in parallel computing, and to allow them to use the resources best suited for each phase of a computation. Although the emphasis is on algorithms for solving problems in numerical linear algebra, the concepts presented here apply to high performance algorithms in general.

This goal is to be achieved with a remote computation system (RCS), which provides an easy-to-use mechanism for using computational resources remotely. The user's view of the RCS is that of an ordinary software library: the user calls RCS library routines (e.g., to solve a system of linear equations) from a program running on a workstation. In contrast to common libraries, the problem is not necessarily solved on the local workstation; instead, it is dynamically assigned to an arbitrary machine in a given pool of computers so as to minimize the response time. Because RCS routines are called asynchronously, RCS supports distributed applications with several solvers running concurrently on different computer platforms.
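The asynchronous calling pattern can be illustrated with a small Python sketch. It is hypothetical and is not the RCS interface: the names `rcs_solve` and `solve_dense` are invented for illustration, and a local thread pool stands in for the pool of remote computers. What it shows is that the call returns immediately with a handle, so several solves can be in flight at once and the program collects the results later.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Choose the row with the largest entry in column k as pivot row.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Hypothetical stand-in for the pool of remote machines.
_pool = ThreadPoolExecutor(max_workers=4)


def rcs_solve(A, b) -> Future:
    """Hypothetical asynchronous library call: it returns at once with a
    Future; in RCS the work would run on a machine chosen by the server."""
    return _pool.submit(solve_dense, A, b)


if __name__ == "__main__":
    # Both systems are submitted before either result is requested.
    f1 = rcs_solve([[2, 1], [1, 3]], [3, 5])
    f2 = rcs_solve([[4, 0], [0, 2]], [8, 6])
    print(f1.result(), f2.result())
```

Errors raised by the solver, such as a singular matrix, surface only when `result()` or `exception()` is called on the returned `Future`, which mirrors the decoupling between issuing a request and collecting its answer.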

The Remote Computation System consists of two components: a library of interface routines and the run time system. The underlying computational software can be any existing scientific package, such as LAPACK. Before running an RCS application, the user must first start the RCS run time system. RCS is a single-user system, but a user may run several RCS applications concurrently. The components of the RCS run time system are shown in the figure below.

[Figure: system overview]

The server is the core of the RCS. Its task is to accept requests from the user's applications and to start an appropriate solver on a host in the pool. If the user does not specify the remote host, the server selects the solver-host pair that minimizes the response time. Such a selection process has not been done before in the context of a numerical library. To make an optimal choice, the server needs information about
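The selection of a solver-host pair can be sketched as a search over all admissible pairs for the smallest estimated response time. The exact parameters the RCS server uses are not listed here, so the cost model below (transfer time plus compute time, with the host's speed shared by its current load) is purely an assumption, as are the names `Host`, `Solver`, `estimate_response_time`, `select_pair` and all numbers in the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Host:
    name: str
    mflops: float          # static: peak speed (assumed to come from a config file)
    bandwidth_mb_s: float  # static: bandwidth between workstation and host
    load: float = 0.0      # dynamic: e.g. a load average reported by a monitor


@dataclass
class Solver:
    name: str
    hosts: Tuple[str, ...]         # hosts on which this solver is installed
    flops: Callable[[int], float]  # operation count for problem size n


def lu_flops(n: int) -> float:
    """Approximate operation count of a dense LU-based solve of order n."""
    return 2.0 * n**3 / 3.0


def estimate_response_time(solver: Solver, host: Host, n: int, data_mb: float) -> float:
    """Hypothetical model: data transfer time plus compute time, where the
    host's speed is shared between its current load and the new job."""
    transfer_s = data_mb / host.bandwidth_mb_s
    effective_mflops = host.mflops / (1.0 + host.load)
    compute_s = solver.flops(n) / (effective_mflops * 1e6)
    return transfer_s + compute_s


def select_pair(solvers: Iterable[Solver], hosts: Iterable[Host],
                n: int, data_mb: float) -> Optional[Tuple[str, str, float]]:
    """Return (solver, host, estimated seconds) with the smallest estimate,
    or None if no solver is installed on any host in the pool."""
    by_name = {h.name: h for h in hosts}
    best: Optional[Tuple[str, str, float]] = None
    for s in solvers:
        for hname in s.hosts:
            h = by_name.get(hname)
            if h is None:  # solver installed on a host outside the pool
                continue
            t = estimate_response_time(s, h, n, data_mb)
            if best is None or t < best[2]:
                best = (s.name, h.name, t)
    return best


if __name__ == "__main__":
    # Illustrative numbers only: 1000 x 1000 doubles is about 8 MB.
    hosts = [Host("ws1", mflops=50, bandwidth_mb_s=100),
             Host("sc1", mflops=1000, bandwidth_mb_s=1, load=1.0)]
    solvers = [Solver("lu_local", ("ws1",), lu_flops),
               Solver("lu_vector", ("sc1",), lu_flops)]
    print(select_pair(solvers, hosts, n=1000, data_mb=8.0))
```

With these made-up figures the loaded but faster host still wins (about 9.3 s versus 13.4 s), even though moving the matrix over the slow link dominates its estimate. A smaller problem would tip the choice back to the local workstation, which is why the decision has to be made per request.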

A daemon called the monitor runs on each host and periodically measures the dynamic parameters. All other information is static and is read from a configuration file at startup.
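The split between static and dynamic information can be sketched as follows. This is a hypothetical illustration: the configuration format, the option names `mflops` and `bandwidth_mb_s`, and the `Monitor` class are invented here. On Unix a real measurement could be something like `os.getloadavg()[0]`; the example injects fake readings instead so that it runs anywhere.

```python
import configparser
import time
from typing import Callable, Dict

# Hypothetical static configuration, read once when the system starts.
CONFIG_TEXT = """
[ws1]
mflops = 50
bandwidth_mb_s = 100

[sc1]
mflops = 1000
bandwidth_mb_s = 1
"""


def read_static_config(text: str) -> Dict[str, Dict[str, float]]:
    """Parse per-host static parameters into {host: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {host: {key: float(val) for key, val in parser[host].items()}
            for host in parser.sections()}


class Monitor:
    """Hypothetical per-host daemon: samples a dynamic parameter at a fixed
    interval and keeps the latest value for the server to query."""

    def __init__(self, host: str, measure: Callable[[], float], interval_s: float):
        self.host = host
        self.measure = measure
        self.interval_s = interval_s
        self.latest = float("nan")  # no measurement taken yet
        self.samples = 0

    def run(self, rounds: int) -> None:
        # A real daemon would loop forever; a bounded loop keeps the sketch testable.
        for i in range(rounds):
            self.latest = self.measure()
            self.samples += 1
            if i < rounds - 1:
                time.sleep(self.interval_s)


if __name__ == "__main__":
    static = read_static_config(CONFIG_TEXT)
    readings = iter([0.5, 1.0, 2.0])  # fake load values in place of a real probe
    mon = Monitor("sc1", lambda: next(readings), interval_s=0.01)
    mon.run(rounds=3)
    print(static["sc1"]["mflops"], mon.latest)  # 1000.0 2.0
```

Keeping only the latest sample keeps the monitor cheap; a server could equally be given a smoothed value, but that choice is not specified here.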



Contact

We would like to hear your suggestions, ideas, opinions and comments on this project. Feel free to send an email.

Peter Arbenz
Institut für Wissenschaftliches Rechnen
ETH Zentrum IFW C 25.2
8092 Zürich, Switzerland
Phone: +41-1-632 74 32
FAX: +41-1-632 11 72
Email: arbenz@inf.ethz.ch

Last update: 25 April 1996