
I have a program which is designed to be highly parallelizable. I suspect that some processors are finishing this Python script sooner than other processors, which would explain behavior I observe upstream of this code. Is it possible that this code allows some MPI processes to finish sooner than others?

from mpi4py import MPI
import numpy as np
import pandas as pd

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nam = 'lcoe.coe'
csize = 10000
with open(dacout) as f:
    for i,l in enumerate(f):
        pass
numlines = i
dakchunks = pd.read_csv(dacout,  skiprows=0, chunksize = csize, sep='there_are_no_seperators')
linespassed = 0
vals = {}
for dchunk in dakchunks:
    for line in dchunk.values:
        linespassed += 1
        if linespassed < 49 or linespassed > numlines - 50: continue
        else:
            split_line = ''.join(str(s) for s in line).split()
        if len(split_line) == 2:
            if split_line[0] == 'nan' or split_line[0] == '-nan':
                continue
            if split_line[1] != nam:
                continue
            if split_line[1] not in vals:
                try:
                    vals[split_line[1]] = [float(split_line[0])]
                except ValueError:
                    continue
            else:
                vals[split_line[1]].append(float(split_line[0]))
# Calculate mean and x s.t. Percentile_x(coe_dat)<threshold_coe
self.coe_vals = sorted(vals[nam])
self.mean_coe = np.mean(self.coe_vals)
self.p90 = np.percentile(self.coe_vals, 90)
self.p95 = np.percentile(self.coe_vals, 95)

count_vals = 0.00
for i in self.coe_vals:
    count_vals += 1
    if i > coe_threshold: 
        break
self.perc = 100 * (count_vals/len(self.coe_vals))
if rank == 0:
    print(self.rp, self.rd, self.hh, self.mean_coe, file=logf)
    print(self.rp, self.rd, self.hh, self.mean_coe, self.p90, self.perc)
  • Compound statements are against PEP 8: "Compound statements (multiple statements on the same line) are generally discouraged." Commented Apr 7, 2016 at 20:00
  • Like all parallel programming, there are no guarantees about the order between processes/threads. If you need them all to finish at the same time, you need some sort of wait (a barrier) at the end. If you have questions about how to do that, you should ask on Stack Overflow. Commented Apr 7, 2016 at 20:22
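The "wait at the end" the comment mentions is exposed in mpi4py as comm.Barrier(), which blocks each rank until every rank has reached it. A minimal sketch, assuming mpi4py (the per-rank work here is hypothetical, and the try/except lets the sketch degrade to a single rank when mpi4py is not installed):

```python
import time

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
except ImportError:
    # mpi4py not available: behave as a single rank so the sketch still runs
    comm, rank, size = None, 0, 1

# Hypothetical per-rank work: ranks finish at different times.
time.sleep(0.01 * rank)

# Without this barrier, fast ranks would run ahead into whatever comes next.
if comm is not None:
    comm.Barrier()

if rank == 0:
    print("all %d rank(s) reached the barrier" % size)
```

Launched with e.g. `mpiexec -n 4 python script.py`, no rank proceeds past the Barrier() call until all four have reached it.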

1 Answer

I have a program which is designed to be highly parallelizable.

I can't see that; all I can see is a broken module-level program (it references self outside of any class).

As for improvements to your code: there are quite a few ways to improve the for-loop that builds vals:

  • Put it in a function.
  • Use a collections.defaultdict(list) for vals; this should result in something like:

    if split_line[1] != nam: 
        continue
    vals[split_line[1]].append(float(split_line[0]))
    
  • This should make you realize that you only ever add to vals[nam], so you can change it to just a list.

  • Slice dchunk.values. Note that your loop keeps the middle of the file, skipping the first 48 lines and the last 50, so the slice should do the same:

    for line in dchunk.values[48:numlines - 50]:
        split_line = ''.join(str(s) for s in line).split()

    (This assumes everything arrives in a single chunk.) Alternately, if this doesn't work, use a comprehension with enumerate:

    for line in (l for i, l in enumerate(dchunk.values) if 48 <= i < numlines - 50):
        split_line = ''.join(str(s) for s in line).split()
    
  • Merge all the if statements into one.

    if (len(split_line) != 2
            or split_line[0] in ('nan', '-nan')
            or split_line[1] != nam):
        continue
    
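Put together, the collection loop might be sketched as a function (the function name and the sample lines below are hypothetical, just to show the shape):

```python
def collect_coe_values(lines, nam='lcoe.coe'):
    """Collect floats from lines of the form '<value> <name>', keeping only
    well-formed, non-nan entries whose name matches nam."""
    vals = []  # only vals[nam] was ever used, so a plain list suffices
    for raw in lines:
        split_line = str(raw).split()
        if (len(split_line) != 2
                or split_line[0] in ('nan', '-nan')
                or split_line[1] != nam):
            continue
        vals.append(float(split_line[0]))
    return vals

print(collect_coe_values(['0.5 lcoe.coe', 'nan lcoe.coe', 'junk', '0.7 lcoe.coe']))
# [0.5, 0.7]
```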

You should change some of your variables to 'constants' using the UPPER_SNAKE_CASE convention, e.g. NAM and CSIZE.

And you can collapse the count_vals loop into a single line with itertools.takewhile. Note that takewhile returns an iterator, so len won't work on it directly, and since coe_vals is sorted ascending the predicate should hold while values are at or below the threshold:

count_vals = sum(1 for _ in takewhile(lambda i: i <= coe_threshold, coe_vals))
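A worked sketch with hypothetical numbers (takewhile yields an iterator, so count it with sum rather than len):

```python
from itertools import takewhile

coe_vals = [0.40, 0.45, 0.50, 0.62, 0.80]   # already sorted, as in the post
coe_threshold = 0.55

# Count the leading run of values at or below the threshold.
count_vals = sum(1 for _ in takewhile(lambda i: i <= coe_threshold, coe_vals))
perc = 100 * count_vals / len(coe_vals)
print(count_vals, perc)  # 3 60.0
```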

You can change the way you get numlines to sum(1 for _ in f), which counts the lines without materializing a list the way len(list(f)) would. (Also note that your numlines = i is one less than the actual line count, since enumerate starts at 0.)
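A quick sketch of counting lines, using io.StringIO to stand in for the open file:

```python
import io

f = io.StringIO("line 1\nline 2\nline 3\n")   # stands in for open(dacout)
numlines = sum(1 for _ in f)   # lazy count; len(list(f)) also works but builds a list
print(numlines)  # 3
```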

