Our performance group at Digital is often asked to compare Alpha
and VAX systems in terms of VUPs. We are unable to do so, for reasons
described in this article. But other comparisons are possible, and
may be of even greater interest.
What Are VUPs?
VUPs stands for VAX Units of Performance, with the VAX 11/780 = 1.0.
The term was used in at least two senses during the 1980s. Loosely
speaking, one can take any benchmark, divide its time on a VAX 11/780
system by its time on the system being rated, and call the result a
VUP rating for that particular benchmark.
A stricter interpretation of the term was the Geometric Mean of
the ratios of the execution times of 99 specific CPU-intensive benchmarks,
including 71 Fortran benchmarks, 4 COBOL benchmarks, and 24 Lisp
benchmarks. For example, using the strict interpretation, the VAX
6000 Model 410 achieved 6.64 VUPs.
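The two senses of the term can be sketched in a few lines of Python. This is only an illustration: the benchmark names and timings below are invented, and are not taken from the 99-benchmark suite.

```python
from math import prod


def vup_ratio(time_on_780: float, time_on_system: float) -> float:
    """Loose sense: how many times faster than the VAX 11/780 one benchmark ran."""
    return time_on_780 / time_on_system


def geometric_mean(ratios: list[float]) -> float:
    """Strict sense: the geometric mean of the per-benchmark ratios."""
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical timings in seconds: (VAX 11/780, system being rated).
timings = {
    "fortran_example": (120.0, 20.0),
    "cobol_example": (90.0, 15.0),
    "lisp_example": (300.0, 37.5),
}

ratios = [vup_ratio(t780, tsys) for t780, tsys in timings.values()]
rating = geometric_mean(ratios)

for name, ratio in zip(timings, ratios):
    print(f"{name:16s} {ratio:5.2f}x the VAX 11/780")
print(f"overall rating   {rating:5.2f}")
```

A geometric mean is a natural fit for averaging ratios because the overall rating does not depend on which system is chosen as the 1.0 reference.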
VUP ratings were useful both outside of Digital and within Digital.
Customers used them to evaluate CPU upgrades, and VAX CPU designers
used both the overall ratings and the detailed benchmark-by-benchmark
performance ratings to evaluate system design tradeoffs and drive
performance improvements.
VUP ratings were extremely useful at the time that they were invented,
but they also had some limitations:
1. Details
Digital generally published the overall VUP ratings for systems,
but did not usually publish details of each of the individual
benchmarks. For example, on a single-CPU VAX 8800 system, 32 double-precision
Fortran programs ranged from 3.35x to 7.06x faster than the VAX
11/780, but this level of detail was not usually published.
2. Run rules
The performance groups responsible for running the benchmarks
understood the rules of the benchmarks, but the rules were not
generally published to other groups or outside the corporation.
It was not always easy to find a definitive source of answers
to questions such as "Can the benchmarks be re-compiled as
new compilers come along? Are source code improvements allowed?"
3. SMP
The 99 benchmarks were originally defined to measure single-stream
compute-intensive processing. As SMP (Symmetric Multi-Processor)
systems were added, the metric was extended for multi-stream and
parallel processing.
4. Competition
VUPs could not be used to compare hardware from different vendors,
because the suite was not made generally available to other vendors,
and Digital did not invest the time to run it on other vendors'
systems.
5. I/O
VUP ratings were CPU-centric, especially when referring to the
99 compute-intensive benchmarks. VUP ratings did not provide very
much information to users who were primarily concerned about disk
I/O, displaying images on graphics devices, or processing of complete
interactive transactions.
6. 11/780
Comparing performance to the VAX 11/780 was of great interest
as the first few successor models were introduced, but became
less interesting as the years passed.
For all these reasons many performance engineers felt that VUPs
were an oversimplified way of rating systems.
But Marketing loved it, because customers loved it. No matter how
often Engineering went to DECUS and said "Performance Is More
Complicated Than A Simple Matter of VUPs", customer after customer
insisted on knowing the VUPs rating for every new VAX system. It
had the virtue of simplicity and it was easy (or seemed easy) to
understand.
Engineering agreed to Marketing's request to continue rating VAX
systems by their number of VUPs throughout the VAX lifetime. Once
the tradition was established, it could not be broken. But with
Alpha we took a stand, and you will not find any official Digital
ratings of Alpha processors in terms of VUPs. Instead, Alpha processors
have been rated in terms of industry standard benchmarks, for example:
- TPC-A, TPC-C, and TPC-D from the Transaction Processing Performance
Council, http://www.tpc.org
- SPECweb96, LADDIS, GPC, and the SPEC CPU benchmarks, all from
the Standard Performance Evaluation Corporation, http://www.specbench.org
- SYSmark32 from BAPco, the Business Application Performance Corporation,
http://www.bapco.com
- The AIM suites, from AIM Technology, http://www.aim.com
- Linpack, from the University of Tennessee, http://netlib2.cs.utk.edu/performance
- STREAM, currently hosted at the University of Virginia, http://www.cs.virginia.edu/stream
These industry standard benchmarks (hereafter referred to as ISBs)
address the technical concerns listed above:
1. Details
ISBs require full disclosure of both detailed benchmark results
and overall metrics. If you care primarily about the performance
of weather prediction programs, you can look at the specific program
in the SPEC CPU95 suite which does that operation. If EXCEL is
the only thing you care about, you can look up the appropriate
column in the BAPco results. GPC breaks results down first by
wireframe and surface and then by individual benchmarks.
2. Run rules
ISB run rules are available to the public and are written with
great detail. If you visit the SPEC web site to download the CPU95
run rules, you'll be downloading a 22KB file; at the TPC site,
the TPC-C run rules are 1.2MB.
Having a definitive source of run rules is important because
performance engineers (and customers) want to know that measurements
can be compared to each other even though they may be done by
different people in different places at different times.
When a question comes up, such as "Q. Can the benchmarks
be re-compiled?", it is actually not so important WHAT the
answer is, provided that everyone uses the SAME answer when running
that particular ISB. For example, SPEC CPU95 says "Yes"
to this question, but the compiler must be disclosed and must
be generally available. BAPco says "No", preferring
shrink-wrap applications.
Even questions such as whether or not code modifications are
allowed can reasonably differ from one ISB to another. SPEC CPU95
does not allow source code changes, but Linpack 1000 does. The
former therefore indicates the performance of portable code, and
the latter indicates the best performance that can be achieved
by code that fully exploits a particular system.
3. SMP
Metrics are well-defined for the evaluation of multi-processor
systems. For example, TPC-C measures whole-system throughput,
and SPEC CPU95 has metrics for both single-CPU performance and
SMP throughput. The SPECrate metrics are descended from the methods
developed as VUPs were extended to SMP systems.
4. Competition
Comparisons are easily done across vendors. As of February 1997
the BAPco web site compares results from 17 vendors, the TPC web
site also has 17, and the SPEC site has CPU95 results from 16
vendors.
5. I/O
ISBs measure more than just CPU performance. You can still get
a measure of raw CPU power, for example with SPECint95. But SPECfp95
brings in the main memory system; TPC adds disks and networks;
and GPC concentrates on graphics devices.
6. 11/780
The ISBs do not center on the now 20-year-old VAX 11/780. For
example, TPC-C uses transactions per minute; SPECweb96 uses operations
per second; and SPEC CPU95 compares performance to the Sun SPARCstation
10/40, introduced in 1993.
So we in Engineering can breathe a sigh of relief. That obsolete
VUP metric is dead. Right? Nobody cares about VUPs anymore, right?
So What? I Still Want to Know!
Digital performance groups may not use the term VUPs anymore, but others
still ask. An AltaVista search of Digital's Intranet and of external
web sites turned up 150 references during 1996 alone. Customers
who are considering upgrades to VAX systems want to know how those
systems compare with Alpha systems. In January 1997, even a Digital
Engineering Vice President asked a similar question - she wanted
comparison statistics between the oldest CMOS VAX systems and the
newest Alpha processors.
This author agrees that VUPs are obsolete. But perhaps interesting
historical comparisons can be made by using industry standard benchmarks.
The ISB with the best historical information in the VAX line is
the CPU suite from SPEC. The suite evolved over time:
- The SPEC Newsletter Volume 1, Issue 1 (Fall 1989) introduced
the "SPEC Benchmark Suite Release 1.0", now generally
referred to as SPECmark89. The suite included 10 benchmarks, two
of which were drawn from the VUPs set of 99 referenced at the
beginning of this article. Like VUPs, the SPECmark89 used the
VAX 11/780 as the basis of comparison, with the VAX 11/780 = 1.0
SPECmark89.
- In early 1992 SPEC introduced two suites, SPECint92 and SPECfp92,
as replacements for the SPECmark89 suite. The new suites added
additional types of benchmarks and retired the term "SPECmark",
because it was felt that integer processing and floating point
processing are fundamentally different workloads (SPEC Newsletter
Volume 4 Issue 1, March 1992).
- In August 1995 SPEC updated the CPU benchmark by introducing
SPEC CPU95 (http://www.specbench.org/osg/cpu95/press.html).
The new suite contains substantially larger problems than the
older suite, and intentionally exercises main memory performance
as well as CPU and cache performance.
So can we get a valid historical comparison by running one of these
three suites on both the oldest and the newest systems? Unfortunately
the answer is still "no". It would not be practical to
run the newest suite on the oldest hardware, because it would take
months to get a valid run and because our group gave away our vintage
system - "the" 1.0 SPECmark89 VAX 11/780 - to a museum
over two years ago.
Running the oldest suite on the newest hardware is perhaps technically
possible, but would not be considered valid both because SPEC has
retired the older benchmarks and because the run time would be so
short that the work of each benchmark would be overshadowed by its
benchmark setup and reporting activity.
Perhaps a different approach will help.
Table 1: SPEC History by CMOS Generation
Generation (year) | Feature size (microns) | Sample system          | SPECmark89 | SPECfp92 | SPECint92 | SPECfp95 | SPECint95
CMOS1 (1986)      | 2.0                    | VAX 6000-210           | 2.9        |          |           |          |
CMOS2 (1988)      | 1.5                    | VAX 6000-410           | 7.1        |          |           |          |
CMOS3 (1990)      | 1.0                    | VAX 6000-510           | 13.2       |          |           |          |
CMOS4 (1992)      | 0.7                    | VAX 7000-610           | 46.6       | 45.0     |           |          |
CMOS4 (1992)      | 0.7                    | DEC 7000-610           | 175.6      | 176.0    | 103.1     |          |
CMOS5 (1994)      | 0.5                    | AlphaServer 8400 5/300 |            | 512.9    | 341.4     | 12.4     | 7.43
CMOS6 (1996)      | 0.35                   | EV6 (estimated)        |            |          |           | 50e      | 30e
In Table 1, the first two columns represent the Complementary Metal
Oxide Semiconductor (CMOS) generation number at Digital Semiconductor,
its approximate date of introduction, and the smallest feature size.
Sample systems using the CMOS processes are listed, but are not
necessarily from the same dates as when the process itself was introduced.
For example, the MicroVAX 3500 used CMOS1 and was introduced in
1987, but the VAX 6000-210 did not arrive until 1988.
The CMOS6 process was introduced during 1996 and Digital 21164
chips are now fabricated using that process. At the fall 1996 Microprocessor
Forum, Digital described the Alpha 21264, a new chip that will fully
exploit the CMOS6 process, with performance of SPECfp95 50e (e=estimated)
and SPECint95 30e. The Alpha 21264 is informally known as EV6. For
details, see http://www.digital.com/info/semiconductor/a264up1/index.html.
Notice that each row of Table 1 uses at least one well-defined
metric that is also used by the previous row. For example, the DEC
7000-610 is about 4 times faster than the VAX 7000-610 (176/45),
and the AlphaServer 8400 5/300 is about 3 times faster than
the DEC 7000-610 (512.9/176). So a rough history of performance
can be obtained by multiplying out each of these ratios.
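The multiplication can be sketched as follows. The values are read from Table 1; which column holds which metric is an assumption of this sketch, and the choice to follow the floating point metrics is made here only because the examples above (176/45 and 512.9/176) use them.

```python
# Each step compares two systems on a metric that both rows of Table 1 report.
# Tuple layout: (step, metric assumed for this sketch, older value, newer value).
steps = [
    ("VAX 6000-210 -> VAX 7000-610", "SPECmark89", 2.9, 46.6),
    ("VAX 7000-610 -> DEC 7000-610", "SPECfp92", 45.0, 176.0),
    ("DEC 7000-610 -> AlphaServer 8400 5/300", "SPECfp92", 176.0, 512.9),
    ("AlphaServer 8400 5/300 -> EV6 (estimated)", "SPECfp95", 12.4, 50.0),
]

relative = 1.0  # the VAX 6000-210 is taken as 1.0
history = []
for label, metric, older, newer in steps:
    ratio = newer / older
    relative *= ratio
    history.append((label, ratio, relative))
    print(f"{label:43s} {metric:10s} x{ratio:5.2f}   cumulative x{relative:7.1f}")
```

Following the integer columns where they exist gives somewhat different ratios (for example 341.4/103.1 rather than 512.9/176), which is one reason the result is only a rough history and not a conversion between suites.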
For a graph that multiplies out the relative performance, click on the
bar graph thumbnail, Figure 1. This graph is intended to give a
rough idea of relative performance. It is in no way implied that
SPEC95 can be converted into SPEC92, or SPEC92 into SPEC89!
Notice that it is very hard to distinguish among the early systems
on a linear scale. A log scale brings more into focus: click on the
line graph thumbnail, Figure 2.
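The effect of the log scale can be seen even with a crude text chart. The sketch below is not the code behind the figures; it simply draws the SPECmark89 values read from Table 1 (assuming that is what the first numeric column holds) on both scales, with a hypothetical chart width of 50 characters.

```python
import math

# SPECmark89 values as read from Table 1 (column assignment is assumed).
systems = {
    "VAX 6000-210": 2.9,
    "VAX 6000-410": 7.1,
    "VAX 6000-510": 13.2,
    "VAX 7000-610": 46.6,
    "DEC 7000-610": 175.6,
}

WIDTH = 50  # hypothetical chart width in characters
top = max(systems.values())


def linear_bar(value: float) -> int:
    """Bar length proportional to the value itself."""
    return round(WIDTH * value / top)


def log_bar(value: float) -> int:
    """Bar length proportional to log10 of the value (all values exceed 1)."""
    return round(WIDTH * math.log10(value) / math.log10(top))


print("Linear scale")
for name, value in systems.items():
    print(f"  {name:13s} {'#' * linear_bar(value)}")

print("Log scale")
for name, value in systems.items():
    print(f"  {name:13s} {'#' * log_bar(value)}")
```

On the linear chart the three VAX 6000 systems occupy only a few characters each, while on the log chart equal ratios of performance take up equal lengths, so the early systems become distinguishable.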
Figure 2 draws lines to connect similar processors, but does not
connect the VAX line to the Alpha line because the systems are so
different. Even though both the VAX 7000-610 and the DEC 7000-610
were built using the same CMOS4 technology, the same system bus,
and the same memory system, the Alpha gained a 4x performance advantage
through its aggressive application of RISC technology.
Summary
So, how many VUPs is that Alpha in the window? Unfortunately, there is no precise
answer. But using industry standard benchmarks, one can estimate
that contemporary Alpha systems are on the order of 1000 times faster
than the earliest VAX systems.
At the time of publication, the author was a member of the CSD
Performance Group at Digital Equipment Corporation. He contributed
to the performance improvement of compilers, memory systems, editors,
4GLs, and windowing systems.