Code and goals
I have a Fortran MPI code, elast3d_mpi.f, that needs to be compiled on both Windows and Linux.
Expected behavior
On Linux, the code is compiled with
mpif90 -o elast3d_mpi elast3d_mpi.f
The program can then be run in parallel with mpirun:
mpirun -n 2 elast3d_mpi
The terminal output shows that 2 processors are running the job, as expected:
There are 2 processors running this job.
Rank# 0 d1 = 1 d2 = 64
Rank# 1 d1 = 65 d2 = 128
...
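For context, the d1/d2 ranges in this output look like an even block split of 128 indices across the ranks. A minimal shell sketch of that arithmetic follows. It assumes the total (128) divides evenly by the number of processes; the function name block_range is hypothetical and the code is not taken from elast3d_mpi.f.

```shell
#!/bin/sh
# Hypothetical illustration of an even block split; not the code in elast3d_mpi.f.
# Assumption: the total number of indices is divisible by the number of ranks.
block_range() {
    n=$1        # total number of indices (128 in the output above)
    nprocs=$2   # number of MPI processes
    rank=$3     # zero-based rank
    chunk=$((n / nprocs))
    d1=$((rank * chunk + 1))      # first index owned by this rank
    d2=$(((rank + 1) * chunk))    # last index owned by this rank
    echo "Rank# $rank d1 = $d1 d2 = $d2"
}

block_range 128 2 0
block_range 128 2 1
```

With 128 indices and 2 ranks this gives the same two ranges as the Linux run.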
If the program is run without mpirun on Linux, it runs without errors and without parallel processing.
Problem
To compile it on Windows, I use the Cygwin environment, with these packages installed:
Package Version Status
_autorebase 001007-1 OK
alternatives 1.3.30c-10 OK
base-cygwin 3.8-1 OK
base-files 4.3-2 OK
bash 4.4.12-3 OK
binutils 2.29-1 OK
bzip2 1.0.8-1 OK
ca-certificates 2.32-1 OK
coreutils 8.26-2 OK
crypto-policies 20190218-1 OK
cygutils 1.4.16-2 OK
cygwin 3.0.7-1 OK
cygwin-debuginfo 3.0.7-1 OK
cygwin-devel 3.0.7-1 OK
dash 0.5.9.1-1 OK
diffutils 3.5-2 OK
editrights 1.03-1 OK
file 5.32-1 OK
findutils 4.6.0-1 OK
gawk 5.0.1-1 OK
gcc-core 7.4.0-1 OK
gcc-fortran 7.4.0-1 OK
getent 2.18.90-4 OK
grep 3.0-2 OK
groff 1.22.4-1 OK
gzip 1.8-1 OK
hostname 3.13-1 OK
info 6.7-1 OK
ipc-utils 1.0-2 OK
less 530-1 OK
libargp 20110921-3 OK
libatomic1 7.4.0-1 OK
libattr1 2.4.48-2 OK
libblkid1 2.33.1-1 OK
libbz2_1 1.0.8-1 OK
libcrypt0 2.1-1 OK
libfdisk1 2.33.1-1 OK
libffi6 3.2.1-2 OK
libgc1 8.0.4-1 OK
libgcc1 7.4.0-1 OK
libgdbm4 1.13-1 OK
libgfortran3 6.4.0-5 OK
libgfortran4 7.4.0-1 OK
libgmp10 6.1.2-1 OK
libgomp1 7.4.0-1 OK
libguile17 1.8.8-3 OK
libguile2.0_22 2.0.14-3 OK
libiconv 1.14-3 OK
libiconv2 1.14-3 OK
libintl8 0.19.8.1-2 OK
libisl15 0.16.1-1 OK
libltdl7 2.4.6-7 OK
liblzma5 5.2.4-1 OK
libmpc3 1.1.0-1 OK
libmpfr6 4.0.2-1 OK
libncursesw10 6.1-1.20190727 OK
libopenmpi-devel 3.1.3-1 OK
libopenmpi12 1.10.7-1 OK
libopenmpi40 3.1.3-1 OK
libopenmpicxx1 1.10.4-1 OK
libopenmpifh12 1.10.7-1 OK
libopenmpifh40 3.1.3-1 OK
libopenmpiusef08_40 3.1.3-1 OK
libopenmpiusetkr40 3.1.3-1 OK
libp11-kit0 0.23.15-1 OK
libpcre1 8.43-1 OK
libpipeline1 1.5.1-1 OK
libpkgconf3 1.6.0-1 OK
libpopt-common 1.16-2 OK
libpopt0 1.16-2 OK
libquadmath0 7.4.0-1 OK
libreadline7 7.0.3-3 OK
libsigsegv2 2.10-2 OK
libsmartcols1 2.33.1-1 OK
libssl1.1 1.1.1d-1 OK
libstdc++6 7.4.0-1 OK
libtasn1_6 4.14-1 OK
libunistring2 0.9.10-1 OK
libuuid1 2.33.1-1 OK
login 1.13-1 OK
make 4.2.1-2 OK
man-db 2.7.6.1-1 OK
mintty 3.0.6-1 OK
ncurses 6.1-1.20190727 OK
openmpi 3.1.3-1 OK
openmpi-debuginfo 3.1.1-2 OK
openssl 1.1.1d-1 OK
p11-kit 0.23.15-1 OK
p11-kit-trust 0.23.15-1 OK
pkg-config 1.6.0-1 OK
pkgconf 1.6.0-1 OK
rebase 4.4.4-1 OK
run 1.3.4-2 OK
sed 4.4-1 OK
tar 1.29-1 OK
terminfo 6.1-1.20190727 OK
terminfo-extra 6.1-1.20190727 OK
tzcode 2019c-1 OK
tzdata 2019c-1 OK
util-linux 2.33.1-1 OK
vim-minimal 8.1.1772-1 OK
w32api-headers 5.0.4-1 OK
w32api-runtime 5.0.4-1 OK
which 2.20-2 OK
windows-default-manifest 6.4-1 OK
xz 5.2.4-1 OK
zlib0 1.2.11-1 OK
On Windows (7) the program is compiled the same way, but from the Cygwin terminal:
mpif90 -o elast3d_mpi.exe elast3d_mpi.f
1 - When I try to run it with mpirun in the Cygwin terminal, I get the following error:
$ mpirun -n 2 elast3d_mpi.exe
-----------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
-----------------------------------------------------------------
[gauss:00824] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00824] *** Process received signal ***
[gauss:00824] Signal: Segmentation fault (11)
[gauss:00824] Signal code: Address not mapped (23)
[gauss:00824] Failing at address: 0x0
Unable to print stack trace!
[gauss:00824] *** End of error message ***
2 - When I run it with Cygwin's orterun from the cmd terminal, I get this error:
C:\Users\io\Documents\elast-mpi>orterun.exe -np 2 elast3d_mpi
------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
------------------------------------------------------------------
[gauss:00827] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00827] *** Process received signal ***
[gauss:00827] Signal: Segmentation fault (11)
[gauss:00827] Signal code: Address not mapped (23)
[gauss:00827] Failing at address: 0x0
Unable to print stack trace!
[gauss:00827] *** End of error message ***
1 [main] orterun 827 cygwin_exception::open_stackdumpfile: Dumping stack trace to orterun.exe.stackdump
3 - When I run the program on Windows without orterun.exe, it prints the following error:
C:\Users\io\Documents\elast-mpi>elast3d_mpi
---------------------------------------------------------------------
Sorry! You were supposed to get help about:
agent-not-found
from the file:
help-plm-rsh.txt
But I couldn't find that topic in the file. Sorry!
---------------------------------------------------------------------
[gauss:00833] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/plm/rsh/plm_rsh_component.c(327)
[gauss:00833] *** Process received signal ***
[gauss:00833] Signal: Segmentation fault (11)
[gauss:00833] Signal code: Address not mapped (23)
[gauss:00833] Failing at address: 0x0
Unable to print stack trace!
[gauss:00833] *** End of error message ***
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 532
[gauss:00832] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.3-1.x86_64/src/openmpi-3.1.3/orte/mca/ess/singleton/ess_singleton_module.c at line 166
--------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
---------------------------------------------------------------------
---------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[gauss:00832] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Observations and questions
- If the program is run without mpirun on Linux, it runs without errors and without parallel processing.
- Running the program on Windows without orterun.exe produces errors.
- This looks like an environment problem.
- Is this the best way to compile this program on Windows?
- Can I compile the same MPI Fortran code on both Windows and Linux?
- What can I try so that the compiled program runs on Windows?
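One check that follows from the environment observation: all three logs mention agent-not-found from help-plm-rsh.txt and fail in plm_rsh_component.c, and openssh does not appear in the package list above. The sketch below tests whether a remote-launch agent is on the PATH of the Cygwin shell. It assumes that Open MPI's rsh launcher looks for ssh and then rsh by default (my understanding of the plm_rsh_agent parameter, not something the logs confirm); the function name find_agent is hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch, not a confirmed fix.
# Assumption: the rsh launcher searches PATH for "ssh", then "rsh".
find_agent() {
    # Print the first candidate found on PATH; return 1 if none is found.
    for agent in "$@"; do
        if command -v "$agent" >/dev/null 2>&1; then
            echo "$agent"
            return 0
        fi
    done
    return 1
}

if found=$(find_agent ssh rsh); then
    echo "launch agent found: $found"
else
    echo "no launch agent (ssh or rsh) on PATH"
fi
```

If this reports that no agent is found, a missing ssh client in the Cygwin installation would be one candidate explanation for the error, though I have not verified that.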