Self-Consistent MPI Performance Guidelines
Message passing using the Message-Passing Interface (MPI) is at present the most widely adopted framework for programming parallel applications for distributed memory and clustered parallel systems. For reasons of (universal) implementability, the MPI standard does not state any specific performance guarantees, but users expect MPI implementations to deliver good and consistent performance in the sense of efficient utilization of the underlying parallel (communication) system. For performance portability reasons, users also naturally desire communication optimizations performed on one parallel platform with one MPI implementation to be preserved when switching to another MPI implementation on another platform. We address the problem of ensuring performance consistency and portability by formulating performance guidelines and conditions that are desirable for good MPI implementations to fulfill. Instead of prescribing a specific performance model (which may be realistic on some systems, under some MPI protocol and algorithm assumptions, etc.), we formulate these guidelines by relating the performance of various aspects of the semantically strongly interrelated MPI standard to each other. Common-sense expectations, for instance, suggest that no MPI function should perform worse than a combination of other MPI functions that implement the same functionality, no specialized function should perform worse than a more general function that can implement the same functionality, no function with weak semantic guarantees should perform worse than a similar function with stronger semantics, and so on. Such guidelines may enable implementers to provide higher quality MPI implementations, minimize performance surprises, and eliminate the need for users to make special, nonportable optimizations by hand. We introduce and semiformalize the concept of self-consistent performance guidelines for MPI, and provide a (nonexhaustive) set of such guidelines in a form that could be automat- - ically verified by benchmarks and experiment management tools. We present experimental results that show cases where guidelines are not satisfied in common MPI implementations, thereby indicating room for improvement in today's MPI implementations.