
Default process allocation policy changed for master branch since 12/20/2021 #9850

@wzamazon

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

master branch

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone; then compiled from source

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 4d07260d9f79bb7f328b1fc9107b45e683cf2c4e 3rd-party/openpmix (v1.1.3-3319-g4d07260d)
 9ac0b7ecee2c97c357bf6751fdaab7a10e62df14 3rd-party/prrte (psrvr-v2.0.0rc1-4133-g9ac0b7ecee)

Please describe the system on which you are running

  • Operating system/version: Alinux2
  • Computer hardware: Intel
  • Network type: EFA

Details of the problem

Starting from around 12/20/2021, the default process allocation policy seems to have changed for the master branch.

To give an example, say I want to run 8 MPI processes on 2 nodes, and I use the following command:

    mpirun -n 8 --map-by ppr:4:node --machinefile ../../2instances my_application

Previously, the processes with MPI ranks 0 to 3 were placed on the 1st node, and the processes with MPI ranks 4 to 7 were placed on the 2nd node.

Now the allocation has changed to a round-robin style: ranks 0, 2, 4, 6 are on the 1st node, and ranks 1, 3, 5, 7 are on the 2nd node.

To reproduce, I wrote the following test program:

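#include <mpi.h>
#include <unistd.h>
#include <stdio.h>

/* Print the host on which each MPI rank is running. */
int main(int argc, char **argv)
{
	int rank, size;
	char hostname[256];

	MPI_Init(&argc, &argv);
	MPI_Comm_size(MPI_COMM_WORLD, &size);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	gethostname(hostname, sizeof(hostname));
	/* gethostname() may not null-terminate a truncated name. */
	hostname[sizeof(hostname) - 1] = '\0';

	printf("rank: %d hostname: %s\n", rank, hostname);

	MPI_Finalize();
	return 0;
}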

and got the following output:

rank: 0 hostname: c5n-st-c5n18xlarge-1
rank: 1 hostname: c5n-st-c5n18xlarge-2
rank: 2 hostname: c5n-st-c5n18xlarge-1
rank: 3 hostname: c5n-st-c5n18xlarge-2
rank: 4 hostname: c5n-st-c5n18xlarge-1
rank: 5 hostname: c5n-st-c5n18xlarge-2
rank: 6 hostname: c5n-st-c5n18xlarge-1
rank: 7 hostname: c5n-st-c5n18xlarge-2

My questions are:

  1. Is this change intentional? My concern is that such a change will have a profound impact on applications.
  2. What is the easiest way to make mpirun go back to the old behavior?
