Keywords: high performance computing, distributed computing
Funding: This project was supported by the Swiss National Science Foundation.
However, access to these computers and use of their software are often complicated. A major obstacle for inexperienced users who want to exploit such high-performance computers is that they must deal with machine-dependent, low-level details.
The goal of this project is to make high-performance computing accessible to scientists and engineers without requiring extensive training in parallel computing, and to let them use the resources best suited to each phase of a computation. Although the emphasis is on algorithms for solving problems in numerical linear algebra, the concepts presented here apply to high-performance algorithms in general.
This goal is to be achieved with a remote computation system (RCS), which provides an easy-to-use mechanism for using computational resources remotely. To the user, the RCS looks like an ordinary software library: the user calls RCS library routines (e.g. to solve a system of linear equations) from a program running on a workstation. Unlike a conventional library, however, the RCS does not necessarily solve the problem on the local workstation; instead, it dynamically assigns the problem to a machine in a given pool of computers so as to minimize the response time. Because RCS routines are called asynchronously, distributed applications can have several solvers running concurrently on different computer platforms.
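A minimal sketch of the calling pattern described above, in Python. All names (`RemoteSolverLibrary`, `solve_async`, `gauss_solve`) are hypothetical and are not the RCS interface. A local thread pool stands in for the pool of remote machines, and a small Gaussian-elimination routine stands in for a package such as LAPACK. Host selection is not modeled here. The sketch only shows the asynchronous style: each call returns a future immediately, so several solves can proceed concurrently while the application keeps working.

```python
from concurrent.futures import ThreadPoolExecutor


def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Stand-in for a call into a numerical package; not an RCS routine.
    """
    n = len(a)
    # Augmented matrix built from copies so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class RemoteSolverLibrary:
    """Hypothetical library facade whose calls return immediately with a Future.

    Worker threads simulate the remote machines; a real system would ship
    each request to a host chosen by a server.
    """

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def solve_async(self, a, b):
        # Submit the request and return at once; the caller is not blocked.
        return self._pool.submit(gauss_solve, a, b)

    def shutdown(self):
        # Wait for outstanding requests, then release the workers.
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    lib = RemoteSolverLibrary(workers=2)
    # Two independent systems are handed off without waiting for either.
    f1 = lib.solve_async([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
    f2 = lib.solve_async([[4.0, 0.0], [0.0, 2.0]], [8.0, 6.0])
    # ... the application could do other local work here ...
    print(f1.result(), f2.result())  # block only when the results are needed
    lib.shutdown()
```

The application blocks only when it calls `result()`, so the two solves overlap in time.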
The Remote Computation System consists of two components: a library
of interface routines and the run time system. The underlying
computational software can be any existing scientific package such
as LAPACK.
Before running an RCS application, the user must first start up
the RCS run time system. RCS is a single-user system, but a user
may run several RCS applications concurrently.
The components of the RCS run time system are shown in
the figure below.
The server is the core of the RCS. Its task is to accept requests from
the user's application and to start an appropriate solver on a host in
the pool. If the user does not specify the remote host, the server
selects the solver-host pair that minimizes the response time. Such a
selection process has not previously been implemented in the context
of a numerical library. In order to make an optimal choice, the server
needs information about
Peter Arbenz
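The server's choice of a solver-host pair can be illustrated with a simple cost model. The description of the information the server needs is cut off above, so every input here (sustained Mflop/s, network bandwidth and latency, expected queue wait) is an assumption. The formula, queue wait plus data transfer plus roughly (2/3)n³ flops for an LU-based dense solve, is a hypothetical stand-in and not the RCS selection algorithm. `Candidate`, `estimated_response_time` and `select_pair` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One solver-host pair; every field is a hypothetical estimate."""
    solver: str
    host: str
    mflops: float          # sustained speed of this solver on this host (Mflop/s)
    bandwidth_mb_s: float  # network bandwidth to the host (MB/s); inf for local
    latency_s: float       # per-message latency (s)
    queue_wait_s: float    # expected wait before the job starts (s)


def estimated_response_time(c, n):
    """Assumed response-time model for solving a dense n x n system."""
    flops = (2.0 / 3.0) * n ** 3                 # leading term of LU factorization
    compute_s = flops / (c.mflops * 1e6)
    # Send A and b, receive x, all in double precision (8 bytes per entry).
    megabytes = 8.0 * (n * n + 2 * n) / 1e6
    transfer_s = 2 * c.latency_s + megabytes / c.bandwidth_mb_s
    return c.queue_wait_s + transfer_s + compute_s


def select_pair(candidates, n):
    """Return the candidate with the smallest estimated response time."""
    return min(candidates, key=lambda c: estimated_response_time(c, n))


if __name__ == "__main__":
    # Illustrative numbers only: a slow local workstation and a fast, busy remote server.
    local = Candidate("lapack", "workstation", 50.0, float("inf"), 0.0, 0.0)
    remote = Candidate("lapack", "server", 2000.0, 10.0, 0.01, 2.0)
    for n in (100, 4000):
        best = select_pair([local, remote], n)
        print(n, best.host, round(estimated_response_time(best, n), 3))
```

With these made-up numbers, a 100 × 100 system stays on the workstation because the remote queue wait and transfer dominate. A 4000 × 4000 system goes to the server, where the faster computation outweighs those overheads.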