@jsquyres jsquyres commented May 2, 2016

Signed-off-by: Jeff Squyres [email protected]

Merge this if open-mpi/ompi-release#1121 is merged.

jsquyres commented May 2, 2016

@rhc54 Getting some Mellanox Jenkins failures, which I think are about binding (the failures have nothing to do with this PR, which only adds a bullet to NEWS regarding datatypes):

07:59:50 + for hca_dev in '$(ibstat -l)'
07:59:50 ++ cat /sys/class/infiniband/mlx4_0/device/numa_node
07:59:50 + var=0
07:59:50 + export TEST_CLOSEST_NUMA=0
07:59:50 + TEST_CLOSEST_NUMA=0
07:59:50 + /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace-2/ompi_install1/bin/mpirun -np 8 --map-by dist -mca rmaps_dist_device mlx4_0 -x TEST_CLOSEST_NUMA -x TEST_PHYS_ID_COUNT -x TEST_CORE_ID_COUNT /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace-2/jenkins_scripts/jenkins/ompi/mindist_test
07:59:52 
07:59:52 Success rank - 6: only one NUMA is scheduled.
07:59:52 
07:59:52 Success rank - 2: only one NUMA is scheduled.
07:59:52 
07:59:52 Success rank - 0: only one NUMA is scheduled.
07:59:52 
07:59:52 Success rank - 4: only one NUMA is scheduled.
07:59:52 
07:59:52 Error rank - 5: scheduled on wrong NUMA node - 1, should be 0
07:59:52 
07:59:52 Error rank - 7: scheduled on wrong NUMA node - 1, should be 0
07:59:52 
07:59:52 Error rank - 3: scheduled on wrong NUMA node - 1, should be 0
07:59:52 
07:59:52 Error rank - 1: scheduled on wrong NUMA node - 1, should be 0
07:59:53 -------------------------------------------------------
07:59:53 Primary job  terminated normally, but 1 process returned
07:59:53 a non-zero exit code. Per user-direction, the job has been aborted.
07:59:53 -------------------------------------------------------
07:59:53 --------------------------------------------------------------------------
07:59:53 mpirun detected that one or more processes exited with non-zero status, thus causing
07:59:53 the job to be terminated. The first process to do so was:
07:59:53 
07:59:53   Process name: [[35509,1],1]
07:59:53   Exit code:    1
07:59:53 --------------------------------------------------------------------------

jsquyres commented May 2, 2016

@rhc54 The same Mellanox Jenkins issues are showing up on #1612 -- I can't tell if this is a bad test in Mellanox Jenkins, or if this is a real issue on master (but unrelated to these PRs).

@ibm-ompi

Test passed.

@jsquyres
Member Author

This was pulled into v1.10 and will come to master when the v1.10.2 bullets are manually brought back over. Closing this PR.

@jsquyres jsquyres closed this May 28, 2016
@jsquyres jsquyres deleted the pr/commit-if-1121-is-merged-into-1.10.3 branch May 28, 2016 13:21