== The Project ==

The existing solutions for programming parallel architectures range from parallelizing compilers to distributed concurrent programming offered by libraries such as MPI. For shared-memory or multi-core machines, thread-based libraries are widely used. Intermediate approaches propose a more structured form of parallelism: the parallelism is exposed to the programmer to a lesser extent, but she can still specify the parallel aspects of the algorithm to be implemented. These intermediate approaches thus give more control over parallelism than automatic parallelization, but are less complex than message passing or thread-based libraries.

Algorithmic skeletons are one of these approaches. An algorithmic skeleton is a higher-order function that captures the pattern of a parallel algorithm, such as a pipeline or a parallel reduction. Often the sequential semantics of a skeleton is quite simple and corresponds to the usual semantics of similar higher-order functions in functional programming languages. The user of a skeleton library just has to compose some of the skeletons to write her parallel application. In skeletal parallelism, data structures are mostly considered globally, for the whole parallel machine, even in the case of distributed memory machines. This eases the writing and reading of parallel programs compared to the Single Program Multiple Data (SPMD) paradigm, in which data structures can only be described locally to a process. The development of SPMD or threaded programs for shared memory machines is also difficult because they may contain non-determinism and deadlocks; this is confirmed by the high complexity of the related verification problems.

When one is designing a parallel program, parallel performance is of course important. It is thus very interesting for the programmer to rely on a simple yet realistic parallel cost model such as BSP (Bulk Synchronous Parallelism) or CGM (Coarse Grained Model). The BSP model targets all general purpose parallel architectures, even though the abstract BSP computer is a distributed memory machine. Its execution model separates synchronization from communication and requires both to be collective operations. It offers a simple and accurate cost model (in this context, cost means an estimate of the parallel execution time), making it possible to predict performance in a realistic and portable way. The theory of proofs of BSP programs is also close in complexity to the sequential case. The BSP model has been used successfully for a broad variety of problems: scientific computation, genetic algorithms, genetic programming, neural networks, parallel databases, constraint solvers, etc.

OSL provides a set of data parallel skeletons that follow the bulk synchronous parallel ([http://www.bsp-worldwide.org BSP]) model of parallel computation. OSL is a C++ library currently implemented on top of MPI; it uses meta-programming techniques to achieve good efficiency. Our goal is thus to provide an easy-to-use library, for a widely used programming language, that allows simple reasoning about parallel performance based on a simple and portable cost model.