One of the major outcomes of EPEEC is its set of software releases, which are kept up to date on the dedicated EPEEC GitHub. In this repository, EPEEC experts have released the final versions of the EPEEC programming environment, including all planned features, as described below:
The Parallelware Analyzer provides suggestions that help programmers parallelize their applications. The tool supports the proposed workflow of detecting defects that affect parallelization and providing recommendations and opportunities. It offers the programmer automatic program transformations and annotations with OpenACC and OpenMP for improved vectorization and parallelization.
The OmpSs-2 programming model now includes extensions supporting OpenACC and OpenMP-target tasks, as well as the possibility to exploit tasking inside accelerators.
- BWAP (Bandwidth-Aware Page Placement) is a novel bandwidth-aware page placement approach for memory-intensive applications on NUMA systems. In contrast to the uniform interleaving policy offered by Linux, BWAP takes the asymmetric bandwidths of every node into account to determine and enforce an optimized, application-specific weighted interleaving.
- Ambix performs dynamic page placement for hybrid multi-threaded architectures. It extends general placement mechanisms in order to consider an architecture that integrates Intel Optane™ Persistent Memory. Ambix works with any platform where Intel Optane™ is configured in “App Direct Mode”.
- ecoHMEM (Software Ecosystem for Heterogeneous Memory Management) is a software framework for automatic data placement in heterogeneous memory systems. It performs automatic data distribution at object allocation granularity to improve performance and enable more energy-efficient memory configurations in architectures incorporating several software-manageable memory tiers, such as systems equipped with Intel Optane Persistent Memory. It is currently composed of Extrae, HMem Advisor, and FlexMalloc. ecoHMEM was first publicly released by EPEEC.
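The weighted interleaving idea behind BWAP can be illustrated with a minimal sketch. It assumes, purely for illustration, that the weights are simply proportional to each node's bandwidth; BWAP itself determines application-specific weights, and the function names and bandwidth figures here are hypothetical and not part of any EPEEC release.

```python
from typing import Dict, List


def interleave_weights(bandwidth_gbs: Dict[int, float]) -> Dict[int, float]:
    """Normalize per-node bandwidths into interleaving weights that sum to 1.

    Assumption: weight proportional to bandwidth (a simplification of BWAP).
    """
    total = sum(bandwidth_gbs.values())
    if total <= 0:
        raise ValueError("total bandwidth must be positive")
    return {node: bw / total for node, bw in bandwidth_gbs.items()}


def place_pages(num_pages: int, weights: Dict[int, float]) -> List[int]:
    """Assign each page to a NUMA node with a smooth weighted round-robin."""
    credit = {node: 0.0 for node in weights}
    placement: List[int] = []
    for _ in range(num_pages):
        # Every node earns credit in proportion to its weight.
        for node, weight in weights.items():
            credit[node] += weight
        # The node with the most accumulated credit receives the page.
        chosen = max(credit, key=credit.get)
        credit[chosen] -= 1.0
        placement.append(chosen)
    return placement


if __name__ == "__main__":
    # Hypothetical two-node system: node 0 has three times the bandwidth of node 1.
    weights = interleave_weights({0: 30.0, 1: 10.0})
    print(place_pages(8, weights))
```

With these hypothetical bandwidths, three out of every four pages land on node 0, whereas uniform interleaving would split the pages evenly between the two nodes.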
GPI, OmpSs@ArgoDSM and BSC performance analysis tool implementations
The distributed programming model GPI has been enhanced with two libraries: a compression library called Comprex and a library of scalable collectives. Comprex compresses data that is about to be communicated and has been combined with the all_reduce collective. It has been evaluated using large models with TensorFlow. The collectives are implemented as eventually consistent collectives: they give up a globally consistent view of the properties and proceed with computation once a certain percentage of the data has arrived, rather than waiting for the full amount. A second approach, stale synchronous parallelization (SSP) with a parameter server, has also been pursued. The parameter server has been implemented as a library based on GASPI communication primitives and has been tested using BPMF.
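The behaviour of an eventually consistent collective can be illustrated with a small single-process sketch. It is not the GPI or GASPI API: the function name, the list-based "arrivals", and the threshold parameter are assumptions made for illustration only. The reduction sums contributions in arrival order and returns as soon as the requested fraction of ranks has contributed.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def partial_reduce_sum(
    arrivals: Iterable[Sequence[float]],
    num_ranks: int,
    fraction: float,
) -> Tuple[List[float], int]:
    """Sum vectors in arrival order, stopping once `fraction` of ranks is seen.

    Returns the partial sum and the number of contributions that were used.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    needed = math.ceil(fraction * num_ranks)
    total: List[float] = []
    used = 0
    for vec in arrivals:
        if used == 0:
            total = list(vec)
        else:
            total = [a + b for a, b in zip(total, vec)]
        used += 1
        if used >= needed:
            # Enough data has arrived: proceed without waiting for the rest.
            break
    if used < needed:
        raise RuntimeError("not enough contributions arrived")
    return total, used


if __name__ == "__main__":
    # Hypothetical contributions from four ranks, listed in arrival order.
    arrivals = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(partial_reduce_sum(arrivals, num_ranks=4, fraction=0.75))
```

Setting `fraction=1.0` in this sketch recovers an ordinary, fully consistent reduction; lower values trade exactness for not having to wait on the slowest ranks.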
OmpSs@ArgoDSM has implemented mechanisms that preserve coherence when running OmpSs tasks that are independent of each other. New features, such as memory mapping that allows better performance, have been implemented and evaluated with synthetic benchmarks.
The BSC performance analysis tools help domain developers understand the performance behaviour and identify the current bottlenecks of the applications. Extrae has been extended to support GASPI applications and tested with the RTM application. Extrae has also been extended to instrument OpenACC calls.