MPI result differs between Slurm and the command line
asked 08 November 2019

I ran into a problem when running an MPI program under Slurm.

a1 is my executable. It works fine when I simply run mpiexec -np 4 ./a1.

But it does not run correctly when I launch it under Slurm; it looks like it stops somewhere in the middle.

This is the output when using mpiexec -np 4 ./a1, and it is correct:

Processor1 will send and receive with processor0
Processor3 will send and receive with processor0
Processor0 will send and receive with processor1
Processor0 finished send and receive with processor1
Processor1 finished send and receive with processor0
Processor2 will send and receive with processor0
Processor1 will send and receive with processor2
Processor2 finished send and receive with processor0
Processor0 will send and receive with processor2
Processor0 finished send and receive with processor2
Processor0 will send and receive with processor3
Processor0 finished send and receive with processor3
Processor3 finished send and receive with processor0
Processor1 finished send and receive with processor2
Processor2 will send and receive with processor1
Processor2 finished send and receive with processor1
Processor0: I am very good, I save the hash in range 0 to 65
p: 4
Tp: 8.61754
Processor1 will send and receive with processor3
Processor3 will send and receive with processor1
Processor3 finished send and receive with processor1
Processor1 finished send and receive with processor3
Processor2 will send and receive with processor3
Processor1: I am very good, I save the hash in range 65 to 130
Processor2 finished send and receive with processor3
Processor3 will send and receive with processor2
Processor3 finished send and receive with processor2
Processor3: I am very good, I save the hash in range 195 to 260
Processor2: I am very good, I save the hash in range 130 to 195

And this is the output under Slurm; it does not return the full result the way the command-line run does:

Processor0 will send and receive with processor1
Processor2 will send and receive with processor0
Processor3 will send and receive with processor0
Processor1 will send and receive with processor0
Processor0 finished send and receive with processor1
Processor1 finished send and receive with processor0
Processor0 will send and receive with processor2
Processor0 finished send and receive with processor2
Processor2 finished send and receive with processor0
Processor1 will send and receive with processor2
Processor0 will send and receive with processor3
Processor2 will send and receive with processor1
Processor2 finished send and receive with processor1
Processor2 will send and receive with processor3
Processor1 finished send and receive with processor2
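
When a job stalls like this, a few quick checks can tell whether the ranks are hung or the launcher failed. This is only a sketch: <jobid> stands in for the Slurm job id, and the .stderr file name follows the #SBATCH --error setting in the script below.

squeue -j <jobid>       # is the job still RUNNING, i.e. hung rather than crashed?
cat <jobid>.stderr      # launcher/runtime errors, per --error=%j.stderr in the script below
scancel <jobid>         # cancel the hung job once it has been inspected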

This is my slurm.sh file. I think I made some mistake that causes the result to differ from the command-line run, but I am not sure what it is...

#!/bin/bash

####### select partition (check CCR documentation)
#SBATCH --partition=general-compute --qos=general-compute

####### set memory that nodes provide (check CCR documentation, e.g., 32GB)
#SBATCH --mem=64000

####### make sure no other jobs are assigned to your nodes
#SBATCH --exclusive

####### further customizations
#SBATCH --job-name="a1"
#SBATCH --output=%j.stdout
#SBATCH --error=%j.stderr
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00

mpiexec -np 4 ./a1
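
For reference, this is roughly how the job is submitted and checked (a sketch, assuming the script is saved as slurm.sh; the output file names follow the --output/--error directives above):

sbatch slurm.sh          # Slurm replies "Submitted batch job <jobid>"
squeue -u $USER          # watch the job state (PENDING/RUNNING)
cat <jobid>.stdout       # program output, per --output=%j.stdout above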

1 Answer

answered 08 November 2019

Coming back to answer my own question: I made a silly mistake and was using the wrong slurm.sh for my MPI code. The correct slurm.sh is:

#!/bin/bash

####### select partition (check CCR documentation)
#SBATCH --partition=general-compute --qos=general-compute

####### set memory that nodes provide (check CCR documentation, e.g., 32GB)
#SBATCH --mem=32000

####### make sure no other jobs are assigned to your nodes
#SBATCH --exclusive

####### further customizations
#SBATCH --job-name="a1"
#SBATCH --output=%j.stdout
#SBATCH --error=%j.stderr
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12
#SBATCH --time=01:00:00

####### check modules to see which version of MPI is available
####### and use appropriate module if needed
module load intel-mpi/2018.3
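####### srun launches the ranks through Slurm's PMI, so point Intel MPI at Slurm's PMI library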
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

srun ./a1
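
After resubmitting with this script, a quick sanity check is to count the final per-rank lines in the job's stdout (a sketch; <jobid> is a placeholder, and note that with --nodes=4 --ntasks-per-node=12 srun starts one rank per allocated task, i.e. 48 here rather than the 4 used on the command line):

grep -c "I am very good" <jobid>.stdout   # one line per rank that reached the end; should equal the rank count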

I'm so silly, which is why I use Konan as my nickname... I hope I can become smarter.

...