ECSSLSkeleton

Category: Supercomputing / Parallel Computing
Development tool: Visual C++
File size: 516KB
Downloads: 79
Upload date: 2007-06-29 16:56:50
Uploader: Edgard
Description: This is a very good, lightweight parallel programming library. It uses MPI as its underlying driver and relies only on simple MPI API calls, so it is compatible with the various MPI implementations. The distinguishing feature of this class library is that it implements both data parallelism and task parallelism.

File list:
MPIBench (0, 2002-03-26)
MPIBench\CVS (0, 2002-03-26)
MPIBench\CVS\Root (51, 2002-03-26)
MPIBench\CVS\Repository (19, 2002-03-26)
MPIBench\CVS\Entries (665, 2002-03-26)
MPIBench\Makefile.in (764, 2002-03-20)
MPIBench\Makefile (860, 2002-03-26)
MPIBench\alltestsSun (631, 2002-03-20)
MPIBench\alltestsSiemens (631, 2002-03-20)
MPIBench\bitonic.cpp (2222, 2002-03-20)
MPIBench\fft.cpp (2111, 2002-03-20)
MPIBench\gauss.cpp (1215, 2002-03-20)
MPIBench\gauss2.cpp (1215, 2002-03-20)
MPIBench\matmult.cpp (1876, 2002-03-20)
MPIBench\matmult.result (58, 2002-03-20)
MPIBench\myheader.h (5534, 2002-03-20)
MPIBench\samplesort.cpp (4341, 2002-03-20)
MPIBench\shortest.cpp (2020, 2002-03-20)
Makefile (1956, 2002-10-01)
Skeleton.h (97055, 2002-10-01)
all2alltest.cpp (1143, 2002-03-20)
alltestsSiemens (631, 2002-03-20)
alltestsSun (726, 2002-03-20)
autoconf (0, 2002-03-26)
autoconf\CVS (0, 2002-03-26)
autoconf\CVS\Root (51, 2002-03-26)
autoconf\CVS\Repository (19, 2002-03-26)
autoconf\CVS\Entries (505, 2002-03-26)
autoconf\Makefile.in (1666, 2002-03-20)
autoconf\Makefile (1762, 2002-03-26)
autoconf\config.guess (25887, 2002-03-20)
autoconf\config.log (562, 2002-03-26)
autoconf\config.status (5578, 2002-03-26)
autoconf\config.sub (25239, 2002-03-20)
autoconf\configure (70316, 2002-03-20)
autoconf\configure.in (20686, 2002-03-20)
autoconf\conftest.c (30, 2002-03-26)
autoconf\install-sh (4773, 2002-03-20)
... ...

********************************************************
*                                                      *
*     Draft Skeleton Library Version 0.1, Oct. 2002    *
*                                                      *
********************************************************

The (early) draft skeleton library assumes a running installation of MPI, see:
http://www-unix.mcs.anl.gov/mpi/

For the moment, the "library" consists of the header file Skeleton.h. The tar-file containing this Readme file also contains a couple of simple example programs, among them data parallel, task parallel, and mixed data and task parallel ones. A sequence of independent skeleton computations is also possible (see example sequencetest.cpp). Moreover, there are a few kernels of parallel applications, namely:

- matrix multiplication (using Gentleman's algorithm; matmult.cpp)
- all pairs of shortest paths in a graph (shortest.cpp)
- Gaussian elimination (gauss.cpp and gauss2.cpp)
- FFT (fft.cpp)
- bitonic sort (bitonic.cpp)
- samplesort (samplesort.cpp)

Basic ideas are:

- two tier model:
  * a parallel computation consists of a sequence of independent task parallel computations. Each task parallel computation may nest task parallel skeletons arbitrarily (e.g. using pipe and farm).
  * an atomic (i.e. non-nested) task parallel computation can contain data parallelism. (If there are just task parallel or just data parallel computations, this reduces to a one tier model.)
- task parallel computations proceed in two steps:
  * 1) a process topology is generated by nesting constructors for the corresponding skeletons appropriately
  * 2) the outermost skeleton is started (which in turn starts all nested skeletons automatically)
- a data parallel computation uses one or more distributed data structures (like distributed arrays or matrices) and applies data parallel skeletons to them.
- sequential (i.e. non-data-parallel) computations within an atomic task parallel computation are replicated by all processors participating in the (typically data parallel) computation. Thus, an atomic task parallel computation can be seen as a sequential computation in which the operations on distributed data structures happen to have parallel implementations. The latter is not visible to the user.

A computation using skeletons has to call InitSkeletons(int argc, char **argv) first, and it has to finish with TerminateSkeletons() (see the sketch below). All skeletons can take C(++) functions or partial applications as arguments.
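As a rough illustration, the following minimal sketch shows the shape such a computation takes. Only InitSkeletons and TerminateSkeletons are documented in this Readme; everything in between is a placeholder.

    // Minimal structure of a skeleton program (hedged sketch).
    #include "Skeleton.h"

    int main(int argc, char **argv) {
      InitSkeletons(argc, argv);   // has to be called first (sets up MPI underneath)

      // ... build distributed data structures and/or nest task parallel skeleton
      //     constructors into a process topology here, then run the computation ...

      TerminateSkeletons();        // has to be the final skeleton-related call
      return 0;
    }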
The following skeletons are (currently) provided:

1) data parallel skeletons:
===========================

Data parallel skeletons are provided as methods of the classes for the distributed data structures. Currently, two distributed data structures are provided, namely DistributedArray and DistributedMatrix.

a) skeletons for distributed arrays
===================================

A distributed array consists of a sequence of equally sized partitions. Each processor collaborating in a data parallel computation gets exactly one partition and is responsible for all computations corresponding to the elements of this partition. A processor cannot directly access the elements of other partitions. However, there are communication skeletons like permutePartition and broadcastPartition which allow a processor to replace its own partition with another one. The elements of the new, replacing partition inherit the indexes of the replaced one. Moreover, there are computation skeletons like fold and scan which combine the elements of all partitions.

DistributedArray: creates a DistributedArray; one of the constructors corresponds to scatter (i.e. it transforms a local array into a distributed one).

map: creates a new DistributedArray by applying an argument function to each element of an existing DistributedArray. (variants: mapIndex, mapInPlace, mapIndexInPlace, mapPartitionInPlace)

zipWith: combines corresponding elements (w.r.t. their index) of two (equally sized) DistributedArrays using a binary function. (variants: zipWithIndex, zipWithInPlace, zipWithIndexInPlace)

gather: transforms a distributed array into an ordinary (local) array.

fold: folds the elements of a DistributedArray into a single value by repeatedly applying an associative and commutative binary function to them.

broadcastPartition: broadcasts a partition of a DistributedArray to all other partitions of it.

broadcast: broadcasts an array element to all other elements of the corresponding array.

permute: permutes the elements of a DistributedArray, which have been assigned to the different processors, according to a bijective argument function. (variant: permutePartition)

scan: replaces each element a[i] of a DistributedArray by the result of folding a[0] ... a[i] using the associative and commutative binary argument function of scan.

allToAll: each collaborating processor sends a block of elements to every other processor; the beginnings of all blocks are specified by a DistributedArray of index arrays. This DistributedArray contains one element per processor. The received blocks are concatenated without gaps in arbitrary order. If a processor receives fewer elements than its local partition of the DA can accommodate, the remaining elements are filled with a user-specified (dummy) value. Different partitions may contain different numbers of dummy values. allToAll fails if a processor receives more elements than it can accommodate.

copy: copies a distributed array (by using a copy constructor).

copyWithGap: copies each partition of a distributed array to a partition of another distributed array of possibly different size. Missing elements are filled with a user-specified dummy value; surplus elements are omitted.

show: shows a distributed array on standard output. (alternative: <<)

Moreover, there are methods for accessing and modifying properties of a distributed array:

getFirst: delivers the index of the first element of the local partition.
getSize: gives the total number of elements of the distributed array.
getLocalSize: computes the size of the local partition of the DA.
isLocal: checks whether the given index corresponds to an element in the local partition of the DA.
setLocal: sets an element of the local partition to a new value; the given index is not a global one but an offset within the local partition.
getLocal: delivers the value of an element of the local partition; the given index is an offset within the local partition.
set: sets an element of the local partition to a new value; the given index is a global one referring to the DA as a whole.
get: delivers the value of an element of the local partition; the given index is a global one referring to the DA as a whole.

A small data parallel sketch is given below.
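The sketch is hedged: the skeleton names (the DistributedArray constructor, mapIndexInPlace, fold, show) appear in the list above, but the template parameter, the size-plus-initializer form of the constructor, and the (index, value) argument order of mapIndexInPlace are assumptions that have to be checked against Skeleton.h and the example programs.

    #include <iostream>
    #include "Skeleton.h"

    int init(int i)          { return i; }       // assumed initializer: element i gets value i
    int square(int i, int x) { return x * x; }   // assumed (index, old value) -> new value
    int add(int a, int b)    { return a + b; }   // associative and commutative, as required by fold

    int main(int argc, char **argv) {
      InitSkeletons(argc, argv);

      DistributedArray<int> a(16, init);   // scatter-style construction (assumed signature)
      a.mapIndexInPlace(square);           // data parallel update of every element
      int sum = a.fold(add);               // combine the elements of all partitions

      // sequential code inside an atomic computation is replicated by all
      // collaborating processors (see the basic ideas above)
      std::cout << "sum = " << sum << std::endl;

      TerminateSkeletons();
      return 0;
    }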
b) skeletons for distributed matrices
=====================================

Distributed matrices are analogous to distributed arrays, but two-dimensional. A distributed matrix consists of a matrix of equally sized partitions. Each processor collaborating in a data parallel computation gets exactly one partition and is responsible for all computations corresponding to the elements of this partition. When constructing a distributed matrix, the numbers of partitions in a row and in a column (of the matrix of partitions) are fixed. The partitions are assigned to the collaborating processors row by row (from top to bottom) and from left to right. A processor cannot directly access the elements of other partitions. However, there are communication skeletons like permutePartition and broadcastPartition which allow a processor to replace its own partition with another one. The elements of the new, replacing partition inherit the indexes of the replaced one. Moreover, there are computation skeletons like fold which combine the elements of all partitions.

DistributedMatrix: creates a DistributedMatrix; one of the constructors corresponds to scatter (i.e. it transforms a local matrix into a distributed one).

map: creates a new DistributedMatrix by applying an argument function to each element of an existing DistributedMatrix. (variants: mapIndex, mapInPlace, mapIndexInPlace, mapPartitionInPlace)

zipWith: combines corresponding elements (w.r.t. their index) of two (equally sized) DistributedMatrices using a binary function. (variants: zipWithIndex, zipWithInPlace, zipWithIndexInPlace)

gather: transforms a distributed matrix into an ordinary (local) matrix.

fold: folds the elements of a DistributedMatrix into a single value by repeatedly applying an associative and commutative binary function to them.

broadcastPartition: broadcasts a partition of a DistributedMatrix to all other partitions of it.

broadcast: broadcasts an element of a distributed matrix to all other elements of the corresponding matrix.

permutePartition: permutes the partitions of a DistributedMatrix, which have been assigned to the different processors, according to argument functions specifying the new row and column of each partition, respectively. (variant: permute for single elements is still missing)

rotateRows: rotates the partitions of a DistributedMatrix cyclically in horizontal direction. The number of hops (steps) can be the same for all rows, or it can depend on an argument function telling for each row how many rotation steps are needed. Negative numbers of hops correspond to cyclic rotations to the left, positive numbers correspond to rotations to the right.

rotateCols: analogous to rotateRows, but the rotation is in vertical direction.

scan: (still missing)

allToAll: (still missing)

copy: see copy constructor.

show: shows a distributed matrix on standard output. (alternative: <<)

Moreover, there are methods for accessing and modifying properties of a distributed matrix:

getFirstRow: delivers the index of the first row of the local partition.
getFirstCol: delivers the index of the first column of the local partition.
getRows: gives the total number of rows of the distributed matrix.
getCols: gives the total number of columns of the distributed matrix.
getLocalRows: computes the number of rows of the local partition of the DM.
getLocalCols: computes the number of columns of the local partition of the DM.
getBlocksInRow: delivers the number of blocks in a row of the DM.
getBlocksInCol: delivers the number of blocks in a column of the DM.
isLocal: checks whether the given index corresponds to an element in the local partition of the DM.
setLocal: sets an element of the local partition to a new value; the given indexes are not global but offsets within the local partition.
getLocal: delivers the value of an element of the local partition; the given indexes are offsets within the local partition.
set: sets an element of the local partition to a new value; the given indexes are global ones referring to the DM as a whole.
get: delivers the value of an element of the local partition; the given indexes are global ones referring to the DM as a whole. (variants: getLocalGlobal, getGlobalLocal: one index is an offset within the local partition, the other one is global.)

A sketch using the rotation skeletons is given below.
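Again, this is only a hedged sketch, in the spirit of the Gentleman-style matrix multiplication in matmult.cpp: DistributedMatrix, rotateRows, and rotateCols appear in the list above, but the constructor arguments (global size, number of partition blocks per row and column, initializer function) and the constant-offset overload of rotateCols are assumptions that have to be checked against Skeleton.h and matmult.cpp.

    #include "Skeleton.h"

    double initA(int i, int j) { return i * 10.0 + j; }  // assumed (row, column) initializer
    int    skew(int row)       { return -row; }          // row i rotates i steps to the left

    int main(int argc, char **argv) {
      InitSkeletons(argc, argv);

      // 8 x 8 matrix distributed over a 2 x 2 grid of partitions (assumed signature)
      DistributedMatrix<double> A(8, 8, 2, 2, initA);

      A.rotateRows(skew);   // per-row rotation distance given by an argument function
      A.rotateCols(1);      // the same number of hops for every column (assumed overload)
      A.show();             // print the distributed matrix

      TerminateSkeletons();
      return 0;
    }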
2) task parallel skeletons:
===========================

Atomic: takes a sequence of inputs and transforms it into a sequence of output values by applying a unary function to each of the inputs. It may use data parallelism internally.

Filter: more general than Atomic; for each input, an arbitrary number of output values can be produced (including 0). Inputs are consumed using the auxiliary operation get, outputs are produced using the operation put.

Initial: variant of Atomic; produces outputs without consuming input values.

Final: variant of Atomic; consumes inputs without producing outputs.

Pipe: establishes a pipeline of arbitrary task parallel skeletons.

Farm: takes a sequence of inputs and forwards each of them to one of several workers. The outputs produced by the workers are merged non-deterministically and propagated. The workers can be arbitrary task parallel skeletons. The farm performs load balancing in the sense that an idle worker gets the next task.

Loop: takes a sequence of inputs and forwards them to an arbitrary task parallel body skeleton. Output values produced by the body are propagated to the next skeleton or given back to the body, depending on some condition.

Par: takes a sequence of inputs and forwards each of them to ALL workers. The outputs produced by the workers are merged non-deterministically and propagated. The workers can be arbitrary task parallel skeletons. (In the current implementation, Par is not a base skeleton but is implemented using Pipe, Filter, and Farm. Thus, its handling is slightly different.)

A sketch of a pipeline containing a farm is given below.
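This last sketch is the most speculative: the skeleton names (Initial, Atomic, Farm, Pipe, Final) are taken from the list above, but the template parameters, the constructor arguments (worker skeleton, number of workers or processors), and the name of the call that starts the outermost skeleton are assumptions derived from the two-step description above; check them against Skeleton.h and the task parallel examples.

    #include <iostream>
    #include "Skeleton.h"

    int  produce()      { static int n = 0; return n++; }   // generates the input stream
                                                             // (stream termination not handled here)
    int  work(int x)    { return x * x; }                    // unary function applied by each worker
    void consume(int x) { std::cout << x << std::endl; }     // sink for the merged results

    int main(int argc, char **argv) {
      InitSkeletons(argc, argv);

      // 1) build the process topology by nesting skeleton constructors
      Initial<int>    source(produce);       // produces outputs without consuming inputs
      Atomic<int,int> worker(work, 1);       // assumed: one processor per worker
      Farm<int,int>   farm(worker, 4);       // assumed: four load-balanced copies of the worker
      Final<int>      sink(consume);         // consumes inputs without producing outputs
      Pipe            pipe(source, farm, sink);

      // 2) start the outermost skeleton; it starts all nested skeletons automatically
      pipe.start();                          // assumed name of the start call

      TerminateSkeletons();
      return 0;
    }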
Running the Examples
====================

If you want to try out the examples, simply adapt the path names in the Makefile, type "make all", and then execute "alltestsSun" or "alltestsSiemens", depending on whether you are using a cluster of Sun (or Linux) workstations or a Siemens hpcline, respectively. For other machines, this shell script has to be adapted to the local conventions for starting MPI.

If you are only interested in the runtimes (and not in the results) of the computations, change the variable OUTPUT in the Makefile to 0. Depending on the number of processors and the speed of your machine, you should also adapt the number of processors and the problem size in the shell script "alltestsSun" (or "alltestsSiemens").

Please send comments and suggestions to: kuchen@uni-muenster.de

Herbert Kuchen, Feb. 19, 2002
