Closed
Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
master branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone; then compiled from source
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
4d07260d9f79bb7f328b1fc9107b45e683cf2c4e 3rd-party/openpmix (v1.1.3-3319-g4d07260d)
9ac0b7ecee2c97c357bf6751fdaab7a10e62df14 3rd-party/prrte (psrvr-v2.0.0rc1-4133-g9ac0b7ecee)
Please describe the system on which you are running
- Operating system/version: Alinux2
- Computer hardware: Intel
- Network type: EFA
Details of the problem
Starting from around 12/20/2021, the default process allocation policy seems to have changed for the master branch.
To give an example, say I want to run 8 MPI processes on 2 nodes, and I use the following command:
mpirun -n 8 --map-by ppr:4:node --machinefile ../../2instances my_application
Previously, the processes with MPI ranks 0 to 3 were placed on the 1st node, and the processes with MPI ranks 4 to 7 were placed on the 2nd node.
Now, the allocation has changed to a round-robin style: ranks 0, 2, 4, 6 are on the 1st node, and ranks 1, 3, 5, 7 are on the 2nd node.
To reproduce, I wrote the following test program:
#include <mpi.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    char hostname[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Report which node this rank was placed on. */
    gethostname(hostname, sizeof(hostname));
    printf("rank: %d hostname: %s\n", rank, hostname);

    MPI_Finalize();
    return 0;
}
and got the following output:
rank: 0 hostname: c5n-st-c5n18xlarge-1
rank: 1 hostname: c5n-st-c5n18xlarge-2
rank: 2 hostname: c5n-st-c5n18xlarge-1
rank: 3 hostname: c5n-st-c5n18xlarge-2
rank: 4 hostname: c5n-st-c5n18xlarge-1
rank: 5 hostname: c5n-st-c5n18xlarge-2
rank: 6 hostname: c5n-st-c5n18xlarge-1
rank: 7 hostname: c5n-st-c5n18xlarge-2
My questions are:
- Is this change intentional? My concern is that such a change could have a profound impact on applications that rely on consecutive ranks sharing a node.
- What is the easiest way to make mpirun go back to the old behavior?