New Problems in Software Complexity
Abstract
Complexity is a major problem in modern simulation software. Users always
want more features, thus making both the implementation and the mastering
of software constantly more difficult. Language designers have estimated
that a good computer language should have fewer than 100 keywords. If it has
more, the compiler becomes large and clumsy; the limited acceptance of
PL/I illustrates the fate of languages with large, clumsy compilers. The
user's manual for a language should be less than 100 pages long to allow
the average user to master all its features. Clearly, these are serious
limitations for simulation software, since the number of required keywords
is dictated by the complexity of the underlying tasks rather than by
the wishes of the language designer.
How can we overcome this problem? I believe one of the keys lies in
separating data management from the language itself. A simulation system
should have a data base management system (DBMS) adapted to the specific
needs of simulation. Standridge has described such a DBMS. Programs that
perform different parts of a system analysis may then be implemented
independently; they can communicate through the data base. This approach
has many advantages; let me begin with some that are unique to simulation:
- Ability to combine outputs from several runs (sometimes called an
overplot). Many simulation languages, such as CSMP-III, offer this
feature. However, it becomes natural when all results are saved in a
data base. A separate postprocessor can then readily retrieve and
display data, regardless of which specific run created it. The
postprocessor communicates with the simulation program only through
the data base.
- Ability to combine simulated and experimental results. All that is
necessary is a real-time data acquisition program that stores its
results in the data base. Although this combination would greatly
simplify model validation, it does not exist in any current simulation
language of which I am aware.
- Dynamic loading of tables. The simulationist need not enter tabular
data directly into the simulation program (e.g., using a CSMP FUNCTION
statement). Instead, the tables are simply stored in the data base.
This data may be user generated, generated by other simulation runs, or
even generated by real measurements. For example, this feature would
greatly simplify the solution of the finite-time Riccati differential
equation: the Riccati equation must be integrated backward in time, and
the system equations must then be integrated forward using the stored
backward solution.
- Statistical analysis of noisy data. One often would like to analyze
stochastic results statistically. It is natural to store them in the
data base and let an independent statistical analysis program (e.g.,
SAS or SPSS) perform the analysis. The simulation analyst then need
not duplicate the functions of well established packages.
- Range analysis. With stochastic models, one often wishes to display
the range of the results. Managers generally prefer this representation,
since it allows them to see trends and confidence limits that are
difficult to express otherwise. Again, a postprocessor can perform this kind of
analysis through the data base without involving the simulation
language at all.
- Actual storage of models, parameter values, experimental frames, and
other information related to the simulation project. Modeling projects,
like other projects, have their own internal data base; this may include
partial models, models with different levels of detail or different
orientations, validation tests, and base cases. Usually the analyst
keeps this information in a file cabinet or in a stack of old runs.
Logically, nowadays, like personnel files or accounting records, it
belongs in a data base system. Oren and Zeigler have discussed the
automation of model management.
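The common thread in the advantages above is that every program reads and writes the shared data base rather than talking to the others directly. A minimal modern sketch of that architecture follows; the schema, table, and function names are my own invention, and Python's built-in sqlite3 module merely stands in for a simulation-specific DBMS such as the one Standridge describes. Two independent "programs" (a simulation and a measurement acquisition step) store trajectories under run identifiers, and a separate postprocessor retrieves and overlays them without knowing which program produced which run:

```python
import sqlite3

def store_run(db, run_id, trajectory):
    """Store one run's (time, value) pairs in the shared data base.
    The producing program needs no knowledge of any consumer."""
    db.executemany(
        "INSERT INTO results (run_id, t, value) VALUES (?, ?, ?)",
        [(run_id, t, v) for t, v in trajectory],
    )
    db.commit()

def overplot(db, run_ids):
    """Postprocessor: retrieve several runs through the data base alone,
    regardless of which specific run or program created the data."""
    curves = {}
    for run_id in run_ids:
        curves[run_id] = db.execute(
            "SELECT t, value FROM results WHERE run_id = ? ORDER BY t",
            (run_id,),
        ).fetchall()
    return curves

# Shared data base (in memory here; a file in practice).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (run_id TEXT, t REAL, value REAL)")

# A simulation run and a real measurement communicate only via the data base.
store_run(db, "simulated", [(0.0, 1.0), (1.0, 0.5)])
store_run(db, "measured",  [(0.0, 1.1), (1.0, 0.45)])

curves = overplot(db, ["simulated", "measured"])
```

The same `results` table would equally serve dynamic table loading or an external statistics package: each is just another independent reader or writer of the data base.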
Another advantage of this approach is that the independent modules can have
separate manuals. A manager may then, for example, study only the
postprocessor's manual, since he or she will use the computer only to display
data produced by other people's programs. The approach also allows a natural
division of tasks, rather than requiring people to write programs that must
communicate with each other directly.
Last modified: January 12, 2006 -- © François Cellier