Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.

I am a beginner with Unix and MPI. I have a runPR.sh script, shown below:

 #!/bin/sh
 DIR=/directory/buildagain/bin/Project
 FILELIST=$1

 while read FILE
 do
     echo "Processing ${FILE}..."
     ./makeInp.sh "${FILE}" "${FILE}" > "INP/${FILE}.inp"
     "${DIR}" -PR "INP/${FILE}.inp"
 done < "${FILELIST}"

For the serial program, I build by typing make in /directory/buildagain and then run ./runPR.sh values.txt (values.txt just contains the single line Chain).

EDIT: Here is a small portion of my code.

 int main( int argc, char *argv[] )
 {
      MPI_Status status;
      MPI_Init(&argc, &argv);
      if( strcmp(argv[1], "-PR") == 0 )
           runPR(argc-2, &argv[2]);
      return 0;
 }

 int runPR(int argc, char* argv[])
 {
      cout<<"run here"<<endl;

      int mynode, totalnodes;
      int sum, startval, endval, accum;
      int master = 0;

      MPI_Comm_size(MPI_COMM_WORLD, &totalnodes); // get totalnodes
      MPI_Comm_rank(MPI_COMM_WORLD, &mynode);     // get mynode

      PROpt opt;
      Solve* ps = new Solve();
      cout<<"here1"<<endl;

      cout<<"total nodes "<<totalnodes<<endl;
      for(int j = 0; j < totalnodes-1; j = j+1){

           cout<<"processor"<<mynode<<"  received from "<<j<<endl;

           ps->getFile(&opt, argv[0]);
      }
 }

By typing mpirun -np 4 ../directory/buildagain/bin/Project -PR INP/Chain.inp, I see run here, here1, and total nodes 1 printed 4 times. But I don't see cout<<"processor"<<mynode<<"  received from "<<j<<endl; printed out, and I would expect total nodes to show 4, not 1. Also, the program just stops. Why is this?

I'm pretty sure that mpirun needs an actual executable that is linked against the MPI library. A shell script won't work. Which implementation and version of MPI are you running? Can you show me your line of code with MPI_Init, please? –  Otheus May 11 at 22:51
The executable is at /directory/buildagain/bin/Project. The runPR.sh calls the executable there. I am using openmpi/1.6 –  user4352158 May 12 at 1:39
I made some changes to the OP in that I posted a small code sample. –  user4352158 May 12 at 1:48
I cannot reproduce this with gcc 5.1 and OpenMPI 1.8.4. I get the expected behavior based on your code example. How many CPU cores do you have? Can you compile and run simple MPI programs as expected? I won't address the issues with your code logic, as that is off-topic for this site. –  casey May 12 at 14:08
I'm using gcc/4.7 and openmpi/1.6. Again, I have no problems with a helloworld program and an MWE of my actual code. However, my actual code shows only 1 node for cout<<"This node="<<mynode<<endl; –  user4352158 May 12 at 17:20

1 Answer

After you reported getting output like

total nodes=1

and

This node=0 

printed out 4 times, I concluded you are running mpirun -np 4 script-name.sh. This happens because mpirun launches 4 copies of a shell script, and a shell script doesn't understand MPI communication semantics.

If you do somehow launch mpirun on a script, then remember: (1) the script runs in the local "head" node's environment, not the remote one; (2) the script must exec your program as its last and final breath; and (3) when the program runs, it does so in the environment of possibly another node -- possibly without access to the files you had on the head node.

So the script should look like this:

PROG="$1"; shift
OPT="$1"; shift
for FILE in "$@"
do
     echo "Processing ${FILE}..."
     ./makeInp.sh "${FILE}" "${FILE}" > "INP/${FILE}.inp"
done
exec "$PROG" "$OPT" "$@"

Within PROG, you'll have to index argv to correspond to the current node/thread. (Do check that you haven't exceeded argc, or you'll get a NULL-pointer violation.) I don't think there's another/better way.

Typing mpirun -np 4 ../directory/buildagain/bin/Project -PR INP/Chain.inp with your changes to the script, I get the same output as before. –  user4352158 May 12 at 23:25
So without the script at all? That's not good. Too much is dependent on your environment. Did you tell me at one point that mpirun -np 4 helloworld actually ran 4 threads? If not, maybe your system simply isn't configured for more than one node. (Even then, it would invoke 4 instances on one node...) I'm stumped. –  Otheus May 12 at 23:45
Also, I noticed a problem with the modified script. I edited the answer to reflect it. –  Otheus May 12 at 23:45
Something is wrong with his other code or his environment. I couldn't reproduce it with his code reduced to an MCVE (taking out calls he didn't provide code for). –  casey May 13 at 22:31
It's a mystery to @casey and myself how mpirun will launch your program with exactly 1 node when told to use 4. Either it's your "environment", or there is some mysterious problem within Solve. Eliminate the possibility that it is related to your code by removing the new Solve() and ps->getFile calls and trying again. If the MPI size is still only 1, the problem is (almost certainly) your environment. –  Otheus May 17 at 8:55
