Multithread MPI / SMP problem - Fortran runtime error
I'm running exciting-10 on Linux Lubuntu 15.
The make.inc file:
F90 = gfortran
F77 = $(F90)
F90_OPTS = -O3 -march=native -fopenmp -DUSEOMP -ffree-line-length-0
F77_OPTS = $(F90_OPTS)
CPP_ON_OPT = -cpp -DXS -DISO -DTETRA -DLIBXC
LIB_ARP = libarpack.a
# uncomment this line in case you want to use external LAPACK/BLAS library
export USE_SYS_LAPACK=true
LIB_LPK = -L./ -llapack -lblas
LIB_FFT = fftlib.a
LIB_BZINT = libbzint.a
LIBS = $(LIB_ARP) $(LIB_LPK) $(LIB_FFT) $(LIB_BZINT) -L/usr/lib/gcc/x86_64-linux-gnu/5/ -lgomp
F90_DEBUGOPTS = -g -fbounds-check -fbacktrace -Wall -ffree-line-length-0
F77_DEBUGOPTS = $(F90_DEBUGOPTS)
#Ignore if you don't have MPI or smplibs
MPIF90 = mpif90
MPIF90_OPTS = $(F90_OPTS) $(CPP_ON_OPT) -DMPI -DMPIRHO -DMPISEC
F77MT = $(F77)
F90MT = $(F90)
SMP_LIBS = $(LIBS)
SMPF90_OPTS = -fopenmp $(F90_OPTS)
SMPF77_OPTS = $(SMPF90_OPTS)
BUILDMPI=true
BUILDSMP=true
The program compiles without any errors.
I'm running the following input file (on one thread it takes around 20 hours and finishes without any error):
<input>
  <title>Zincite Raman RPA</title>
  <structure speciespath="/home/andrei/Downloads/exciting/species">
    <crystal scale="1.889725989">
      <basevect>3.2493999004 0.0000000000 0.0000000000</basevect>
      <basevect> -1.6246999502 2.8140628608 0.0000000000</basevect>
      <basevect>-0.0000000000 0.0000000000 5.2038002014</basevect>
    </crystal>
    <species speciesfile="Zn.xml">
      <atom coord=" 0.333333343 0.666666687 0.000000000" />
      <atom coord="0.666666627 0.333333313 0.500000000" />
    </species>
    <species speciesfile="O.xml">
      <atom coord="0.333333343 0.666666687 0.382099986" />
      <atom coord="0.666666627 0.333333313 0.882099986" />
    </species>
  </structure>
  <groundstate do="fromscratch" ngridk="4 4 4" rgkmax="5.0" xctype="LDA_PW"/>
  <properties>
    <raman getphonon="fromscratch" mode="4" nstep="5" displ="0.01" degree="2" elaser="1.16" elaserunit="eV" temp="298.15" broad="3.0">
      <energywindow intv="0.0 0.005" points="4000" />
    </raman>
  </properties>
  <xs xstype="TDDFT" ngridk="8 8 8" ngridq="8 8 8" vkloff="0.093 0.279 0.461" rgkmax="5.0" nempty="30" scissor="0.0312" broad="0.00367" dfoffdiag="true" tevout="true">
    <energywindow intv="0.0 1.0" points="1000" />
    <tddft fxctype="RPA"/>
    <qpointset>
      <qpoint> 0.0 0.0 0.0 </qpoint>
    </qpointset>
  </xs>
</input>
When I try to run this with Open MPI (e.g. $ mpirun -n 4 excitingmpi) or with the SMP build, I get these errors after 3-5 minutes:
Waiting for other process to write:getevecfv:EVECFV.OUT
At line 131 of file ../../src/getevecfv.f90 (unit = 70, file = 'EVECFV.OUT')
Fortran runtime error: Non-existing record number
Waiting for other process to write:getevecfv:EVECFV.OUT
At line 131 of file ../../src/getevecfv.f90 (unit = 70, file = 'EVECFV.OUT')
Fortran runtime error: Non-existing record number
After this error appears, the computation runs on only 1 thread.
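For reference, the kind of failure the message describes can be reproduced outside exciting. The sketch below is Python, not the exciting source, and it rests on an assumption: that EVECFV.OUT is a file of fixed-length records and that one process asks for a record another process has not written yet. The record layout (four doubles) and the names write_record, read_record and demo are hypothetical, chosen only for illustration.

```python
import os
import struct
import tempfile

# Hypothetical record layout: four little-endian doubles per record.
RECORD_FORMAT = "<4d"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def write_record(path, recno, values):
    """Write one fixed-length record at the 1-based position recno."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek((recno - 1) * RECORD_SIZE)
        f.write(struct.pack(RECORD_FORMAT, *values))


def read_record(path, recno):
    """Read record recno; fail if the file does not contain it yet."""
    with open(path, "rb") as f:
        f.seek((recno - 1) * RECORD_SIZE)
        data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:
        # The file is shorter than the requested record: nobody wrote it yet.
        raise IndexError(f"non-existing record number {recno}")
    return struct.unpack(RECORD_FORMAT, data)


def demo():
    """Return (error text before record 3 exists, record 3 once written)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        write_record(path, 1, (1.0, 2.0, 3.0, 4.0))
        try:
            read_record(path, 3)          # reader runs ahead of the writer
            early = None
        except IndexError as exc:
            early = str(exc)
        write_record(path, 3, (9.0, 8.0, 7.0, 6.0))
        return early, read_record(path, 3)


if __name__ == "__main__":
    print(demo())
```

If that assumption holds, the read itself is fine and the problem is one of ordering: the reading process gets to the record before the writing process has put it in the file.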
I need some help please.
Andrei