Status:
Active hardware manufacturer.
Overview of Organization:
Meiko Scientific was founded in 1985 after
Inmos management recommended a delay in the introduction of the transputer. The
transputer development team headed by Miles Chesney resigned from Inmos and
formed Meiko to exploit the new parallel processing technology. Nine weeks later
they demonstrated a transputer system at the SIGGRAPH graphics trade show in San
Francisco in July 1985. The Computing Surface, which developed from this system, became commercially available in 1986.
Meiko currently has more than 100 employees in its Bristol Research and
Development centre, and offices and distributors throughout Europe, Asia and the
United States.
Important customers and end users for Meiko include Shell Exploration and Production; British Aerospace; Chemical Design; the Defence Research Agency, Malvern; Applied Geophysical Software; and the UK Atomic Energy Authority.
Meiko have produced a range of products from workstation add-on boards to
large stand-alone parallel systems with hundreds of processors.
Platforms Documented:
The Meiko Computing Surface; the Meiko Concerto; the Meiko CS2.
Contact Address:
Meiko
Reservoir Place
1601 Trapelo Road
Waltham MA 02154, USA.
Tel (617) 890 7676
Fax (617) 890 5042
or
Meiko
650 Aztec West
Almondsbury
Bristol BS12 4SD, UK.
Tel 0454 616171
Fax 0454 618188
See Also:
Meiko's own WWW server.
The Meiko Computing Surface
Overview of Platform:
The original Computing Surface from Meiko was
built from a large number of transputer processing nodes. These nodes could be
connected together to form logical processing domains to give each user a
dedicated resource from the pool of nodes in the whole machine. The original Computing Surface was built solely from transputers, but now more exotic "flavours" are available, and both SPARC and i860 nodes may be intermixed to form a heterogeneous system; see the description of the Concerto below.
The Computing Surface was first known as just the "CS" but is now sometimes
referred to as Meiko's CS1 in contradistinction to their CS2 product. The
operating system for the early CS models was a UNIX-like subset known as MeikOS.
Meiko quickly realised the advantages of using a pre-existing operating system
with which their customer base would be more familiar, and provided a version of
SunOS. Later models of the CS also provided a Sun-compatible host or front-end processor. A number of boards were provided which allowed a CS to be mounted in a Sun workstation, and latterly the CS could be supplied as a stand-alone machine with a SPARC node as an internal "log-on" module.
The early CS was difficult to debug, but later models were provided with tdb, a transputer debugger, and pdb, a more general parallel debugger.
Compute Hardware:
The CS was purchased as a cabinet and power supply together with a kit of boards and software modules that could be configured to customer requirements. Each board had a transputer at its heart, and the original MK009
compute board and MK015 graphics board each had T414 integer transputers as the
node CPUs. When the T800 series of floating point transputers became available
from Inmos, Meiko supplied a range of compute, mass storage, interface and
graphics boards.
Interconnect / Communications System:
The early CS models had a topology that was configured with twisted wire pairs in the back of the cabinet, and the machine could be reconfigured manually in less than an hour. Later models of the CS had a switchable backplane that allowed the topology to be reconfigured in software. This was a very valuable feature at a time when switching technology was still primitive compared with the present, and when obtaining the best performance from an application made it vital to use the optimum topology to maximise data locality. The Meiko CS proved an invaluable tool for the experimental investigation of different topologies. In addition to the four nearest-neighbour links on each transputer node, the Computing Surface Network (CSN) communications architecture allowed point-to-point communication within the heterogeneous node system.
Memory System:
Memory in the CS1 was entirely distributed - both
physically and logically. Early T4 transputer nodes came with 256kBytes of
memory, but later T8 compute nodes typically had between 1 and 8 MBytes of
memory. The machine was programmed as a real-memory machine: there was no virtual memory, so much of the art of programming a CS lay in knowing how to minimise code and data size.
Benchmarks / Compute and data transfer performance:
The compute performance of the early T4 transputers was quite low, typically less than 1 MIPS per node. The T8 nodes turned out to be very well balanced indeed and typically yielded around 1 MFLOPS per node.
The transputer links were designed to support a communications rate of 5, 10 or 20 Mbits per second, depending on the model. In principle each link could operate in both directions at once, doubling the peak speed. In practice, for application-level programming, the effective rate of communications per node was around 1 MByte per second.
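As a rough worked illustration of these figures (a simple unit conversion of the rates quoted above, taking the fastest 20 Mbit/s links and ignoring protocol and software overheads):

    20 Mbits/s per link, one direction   = 2.5 MBytes/s
    20 Mbits/s per link, both directions = 5 MBytes/s peak
    achieved at application level        = about 1 MByte/s per node

so the application-level figure corresponds to only a fraction of the nominal peak of even a single link.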
The fast I/O boards on the CS1 allowed transfer rates of up to 80 MBytes/second.
Operating System Software and Environment:
The CS was originally
programmed using Occam and Occam channels as the only available communications
mechanism between processors. This was quickly seen to present software
engineering difficulties, with a limited amount of Occam software being
available worldwide. Meiko rapidly provided C and Fortran compilers, and these could communicate in a message passing paradigm, initially using Meiko's own CS Tools package. CS Tools provides a port-based set of explicit message passing primitives and is in many ways superior to some of the other message passing systems used worldwide. Unfortunately, CS Tools is not available on any platforms other than Meiko's, and so is likely to go the way of all small-market proprietary software. PARMACS was also available on the later CS models.
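To give a flavour of this port-based, explicit message passing style, the self-contained sketch below mimics the pattern of registering a named port, looking it up from another process, and sending and receiving a buffer. The port_* names and the toy in-process "mailbox" transport are purely illustrative stand-ins and are not the actual CS Tools (CSN) calls, which have different names and signatures and move data between processors.

    /* A minimal, self-contained sketch of the port-based message passing
     * style described above.  The port_* names and the single in-process
     * mailbox are illustrative stand-ins only; the real CS Tools (CSN)
     * primitives differ in name and signature. */

    #include <stdio.h>
    #include <string.h>

    #define MAX_MSG 4096

    static struct {                      /* toy transport: one named mailbox */
        char name[32];
        char data[MAX_MSG];
        int  len;
    } mailbox;

    typedef int port_t;                  /* port handle; always 0 in this toy */

    static port_t port_create(const char *name)   /* register a named port */
    {
        strncpy(mailbox.name, name, sizeof mailbox.name - 1);
        return 0;
    }

    static port_t port_lookup(const char *name)   /* locate a named port */
    {
        return strcmp(mailbox.name, name) == 0 ? 0 : -1;
    }

    static int port_send(port_t dst, const void *buf, int len)
    {
        (void)dst;
        memcpy(mailbox.data, buf, len);
        mailbox.len = len;
        return len;
    }

    static int port_receive(port_t src, void *buf, int maxlen)
    {
        int n;
        (void)src;
        n = (mailbox.len < maxlen) ? mailbox.len : maxlen;
        memcpy(buf, mailbox.data, n);
        return n;
    }

    int main(void)
    {
        double results[4] = { 1.0, 2.0, 3.0, 4.0 };
        double received[4];

        /* a "master" registers a port; a "worker" looks it up and sends data */
        port_t master = port_create("master");
        port_t dest   = port_lookup("master");

        port_send(dest, results, sizeof results);
        port_receive(master, received, sizeof received);

        printf("received %.1f %.1f %.1f %.1f\n",
               received[0], received[1], received[2], received[3]);
        return 0;
    }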
Networkability/ I/O System / Integrability / Reliability /
Scalability:
Special I/O nodes were available as separate boards to be
integrated into the system. Two examples are the Mass Store Element (MK021) and
the Data Port Element (MK040). The Mass Store Element provided a memory-mapped SCSI interface at 3 MBytes/s, which could be connected to a 100 MByte or 500 MByte Winchester disk or a 1-2 GByte laser disk. The Data Port Element provided an I/O link capable of a data transfer rate of 80 MBytes/s.
The early CS, with its non-compatible operating system and binary Occam source files, was difficult to integrate into a general-purpose computing environment. VAX systems were used in an early attempt to map the files on the Meiko system onto a filing system and environment that users would find more familiar. Later models of the CS with the SunOS operating system were, however, well integrated and could cross-mount file systems with user workstations.
One of the largest Computing Surfaces built from T800s was the Edinburgh Concurrent Supercomputer, which had over 450 T800 transputer nodes. Although this machine was nearly always partitioned by users into about a dozen domains, experiments done at Edinburgh did prove that applications could be made to run on the full machine. Scaling experiments inevitably showed that the four nearest-neighbour links between transputer nodes were not sufficient for many algorithms and applications. Nevertheless, for many then state-of-the-art application problem sizes, 16, 64 or 128 transputers proved a very well balanced compute resource.
Notable Applications / Customers / Market Sectors:
Meiko had a number of
noteworthy CS installations worldwide. As noted above, the largest CS1 installation was at the Edinburgh Parallel Computing Centre in Scotland. A
number of universities still operate CS1 hardware, and it was a particularly
good platform in terms of being able to buy hardware incrementally and still
have a useful machine at each stage. The UK Atomic Energy Authority used Meiko
CS hardware for computational fluid dynamics and engineering simulations.
Another major application area for Meiko has been on-line transaction processing (OLTP), and special-purpose Meiko Oracle server machines have been configured.
Overall Comments:
The Meiko CS1 was effectively the pioneering platform
for MIMD and SPMD computing in Europe. Many of the ideas and methods seen in
current commercial platforms owe their origins to the early CS and those who
worked on it. Perhaps its most noteworthy feature was its balance between compute and communications.
Meiko Concerto
Overview of Platform:
In response to increased compute performance on
competitor machines, Meiko designed an enhancement to their Computing Surface
that involved making hybrid nodes, using two T800 transputers and an Intel i860
chip. This machine was perhaps not marketed very well and went under a variety
of different names. It is usually referred to as the Concerto (reflecting that the transputers and i860s acted in concert), but is also sometimes just called Meiko's i860 CS.
This machine proved interesting since it was one of the earliest to incorporate vector nodes in a distributed-memory MIMD machine. Although Meiko
successfully increased the compute performance of the hardware, this machine was
let down by immature compiler technology and a poor balance between compute and
communications performance.
Compute Hardware:
The Concerto had two Inmos T8 transputers and one
Intel i860 chip on each node. These three chips communicated via a shared-memory bus system.
Interconnect / Communications System:
The Concerto nodes used the four links on each T8 transputer to form an eight-link hybrid node. A software-switchable backplane allowed these nodes to be configured at will in any desired topology.
Memory System:
The memory shared between the three chips on each node
was accessible to the user as a real distributed memory system, and typically
nodes had 8, 16 or 32 MBytes of memory. There was no virtual memory system, so
it was important to configure the machine with enough memory for the
application.
Benchmarks / Compute and data transfer performance:
The i860 chip is
notionally capable of in excess of 60MFLOPS, and indeed hand-crafted assembler
coded application have achieved in excess of 50MFLOPS per node in highly
vectorizable sections. More typically, well written vector Fortran applications
achieved between 7 and 12 MFLOPS per node. This is partially due to there being
insufficient bandwidth between nodes to keep the i860 busy, but also because of
the difficulties in building a compiler smart enough to make good use of the
many facilities on the i860 chip.
The communications system was only slightly better than that of the single-T8 CS, with around a few MBytes per second per link achievable with careful programming.
Operating System Software and Environment:
The Concerto ran the SunOS
operating system, and the native message passing system was the Meiko CS Tools package, a port-based system, much the same as on the later models of the CS.
Networkability/ I/O System / Integrability / Reliability /
Scalability:
I/O on the Concerto was implemented through the T8 link communications structure, so that each node could only access the filing system through an effective bottleneck of one T8 link. This proved something of a
handicap for general applications codes, although applications rewritten
specially for this machine were able to make use of the large real memory system
and achieve superlinear speedups in some cases by avoiding the paging costs of
virtual memory. The first model of the Concerto suffered from teething troubles
in the board design. This machine was slightly ahead of its time, and the design
pushed board integration technology close to its limits. This problem was solved
in the later Concerto models by modifications to the board layout.
Considering the Edinburgh machine was run 24 hours a day, 7 days a week for
nearly two years on two demanding applications, it proved remarkably reliable.
Notable Applications / Customers / Market Sectors:
The most notable
installation of this machine was the Grand Challenge Machine installed at the Edinburgh Parallel Computing Centre, for the QCD and Car-Parrinello simulation
projects. This machine had 64 of the hybrid nodes and was run as a dedicated
resource for those two application codes. Latterly, as the teething troubles
were overcome, EPCC was successful in porting industrial applications to this
machine, including computational fluid dynamics simulations.
Several other sites still have smaller machines in the form of one or two
boards as accelerators to CS1 machines, or as in-Sun boards used for development
purposes.
Incremental scaling is possible by adding boards to the system, although it
appears that 64 nodes is an effective practical limit due to the filing system
communications bottleneck.
Overall Comments:
This machine pioneered some interesting ideas in terms
of hardware balance. It is probably most noteworthy for indicating the role of vector computing in the new high-performance platforms. (Physically) distributed memory systems are probably the only way a machine can be made scalable, and vector technology is relegated to the node level of a parallel machine, where it can fulfil a valuable role if well implemented.
Meiko CS2
Overview of Platform:
The Meiko CS2, like the CS1, has a modular hardware and software configuration, and therefore a system can be chosen according to customer needs and budget.
Compute Hardware:
Each compute node in a CS2 is based on a superscalar SPARC microprocessor, with an optional attached Fujitsu vector processing unit. A specific installation will be some mix of scalar and vector nodes. There are also two variants of the scalar nodes, one optimised for I/O-intensive applications and one for computationally intensive applications.
Interconnect / Communications System:
Nodes communicate via a multi-stage, multi-level switch which, unlike the nearest-neighbour transputer links of the CS1, offers low-latency anywhere-to-anywhere connectivity.
Memory System:
Each node will typically have between 64 and 128 MBytes of memory, and through the use of a Direct Memory Access (DMA) facility between nodes the system provides a support mechanism for virtual global memory.
Benchmarks / Compute and data transfer performance:
Meiko peak performance figures are 40 MFLOPS (double precision) per scalar node, and early application results suggest that sustained figures in excess of half of this may be achievable.
The vector nodes are reported by Meiko as capable of 200MFLOPS in double
precision for peak performance.
Meiko figures of 100 MBytes per second bidirectionally between nodes are probably realistic, and the architecture components are designed to be able ultimately to sustain 800 MBytes per second.
The error-corrected memory system is organised into 16 independent banks with an aggregate bandwidth of 3.2 GBytes per second.
Operating System Software and Environment:
The CS2 uses a multiple-instance Solaris UNIX operating system to provide multi-user access.
Unlike the CS1 and Concerto platforms, the CS2 supports arbitrary user logins on
the nodes.
Meiko envisage the CS2 being used as a multiple-UNIX machine, for explicit message passing programs, and for data parallel programming. The CS Tools port-based message passing environment is provided, as are the PVM and PARMACS portable message passing systems, and Meiko also provide an Intel NX/2 look-alike message passing interface.
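Of the portable message passing systems listed, PVM is perhaps the most widely known, and the sketch below shows the general shape of an explicit message passing program of the kind envisaged for the CS2. It is a generic PVM 3 master/worker fragment rather than Meiko-specific code; the task name "worker", the message tag and the number of workers are arbitrary, and it assumes the binary is installed under that task name where PVM can find it.

    /* Generic PVM 3 master/worker sketch; not Meiko-specific.  Assumes this
     * binary is installed under the task name "worker" in PVM's search path. */

    #include <stdio.h>
    #include "pvm3.h"

    #define NWORK      4      /* number of worker tasks to spawn (arbitrary) */
    #define TAG_RESULT 1      /* arbitrary message tag */

    int main(void)
    {
        int mytid  = pvm_mytid();    /* enrol this process in PVM */
        int parent = pvm_parent();   /* PvmNoParent if we are the master */

        if (parent == PvmNoParent) {
            int    tids[NWORK];
            double part, total = 0.0;
            int    i;

            /* master: spawn the workers and collect one partial result each */
            pvm_spawn("worker", (char **)0, PvmTaskDefault, "", NWORK, tids);

            for (i = 0; i < NWORK; i++) {
                pvm_recv(-1, TAG_RESULT);        /* any sender, our tag */
                pvm_upkdouble(&part, 1, 1);
                total += part;
            }
            printf("total = %f\n", total);
        } else {
            /* worker: compute a partial result and send it back explicitly */
            double part = (double)(mytid % 100); /* stand-in for real work */

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(&part, 1, 1);
            pvm_send(parent, TAG_RESULT);
        }

        pvm_exit();                  /* leave PVM cleanly */
        return 0;
    }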
Meiko provide array-extended Fortran compilers, which are supported by native CS Tools communications calls.
A toolset including performance analysis tools and a multi-process debugger is supplied.
Networkability/ I/O System / Integrability / Reliability /
Scalability:
The scalar nodes optimised for I/O have peripheral interfaces for Ethernet, two SCSI-2 disk controllers and three SBus slots. Nodes optimised for computational performance do not have this direct peripheral I/O capability.
The attached I/O devices have facilities for striped, mirrored and RAID
filestores to give a total storage capacity of over 4TBytes using commodity
disks.
Networking is provided by multiple Ethernet and FDDI connections running standard protocols such as TCP/IP, FTP, Telnet and NFS, together with UNIX streams and sockets. Multiple HiPPI connections for framestores and mass storage devices are
also provided.
Notable Applications / Customers / Market Sectors:
Notable customers at present include the Lawrence Livermore National Laboratory (USA), which have ordered a large vector node system, and CERFACS (France) and CERN (Switzerland), which have both ordered hybrid scalar/vector node systems. Southampton University already have an early model of the CS2.
Meiko are clearly targeting their existing market in database systems and ORACLE users with the I/O-optimised scalar nodes. Other important market sectors
include computational electromagnetics, computational fluid dynamics, molecular
dynamics and engineering simulations.
Overall Comments:
This machine has some interesting hardware
developments, specifically the switching system and direct memory access
mechanism.
hawick@npac.syr.edu
saleh@npac.syr.edu