I parsed my simulation output using Pandas groupby(), and I am having some difficulty plotting the data the way I want. Here is the Pandas output (abridged for simplicity) that I am trying to plot:

                 Avg-del   Min-del    Max-del Avg-retx  Min-retx    Max-retx
Prob Producers 
0.3  1           8.060291  0.587227  26.709371  42.931779  5.130041  136.216642  
     5           8.330889  0.371387  54.468836  43.166326  3.340193  275.932170  
     10          1.012147  0.161975   4.320447   6.336965  2.026241   19.177802  
0.5  1           8.039639  0.776463  26.053635  43.160880  5.798276  133.090358  
     5           4.729875  0.289472  26.717824  25.732373  2.909811  135.289244  
     10          1.043738  0.160671   4.353993   6.461914  2.015735   19.595393

My y-axis is delay and my x-axis is the number of producers. I want one set of error bars for probability p=0.3 and another for p=0.5. My Python script is the following:

import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.expand_frame_repr', False)

outputFile = 'averages.txt'

# Average every column for each (Prob, Producers) pair
data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()

print("Writing to output file: " + outputFile)
# The with block closes the file even if the write fails
with open(outputFile, 'w') as f_out:
    f_out.write(str(result))

*** Update from James ***
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
    ax = plt.gca()
    ax.legend()
    ax.set_xticks(r.index)
    ax.set_ylabel('Latency (s)')
    ax.set_xlabel('Number of producer nodes')

plt.show()

Now I have 4 sliced arrays, one for each probability. How do I slice them again by delay (del) and retx, and plot error bars based on the avg, min, and max values?

1 Answer

Ok, there is a lot going on here. First, it is plotting 6 lines. When your code calls

plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3')
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')

it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as an x and a y, but as 3 separate series of y-values (one per column, with 3 points each). Since no x values are given, it fills in the indices 0, 1, 2. Because of the transpose, each column is one row of your table, so the first plot call is plotting the Avg-del, Min-del and Max-del values of each producer count:

x = [0,1,2]; y = [8.060291, 0.587227, 26.709371]
x = [0,1,2]; y = [8.330889, 0.371387, 54.468836]
x = [0,1,2]; y = [1.012147, 0.161975, 4.320447]
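
A minimal sketch for checking this yourself: plt.plot returns one Line2D object per series, and each one reports the x and y data it will draw. The array below copies the p=0.3 delay columns from the question; the name `block` is only illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line for an interactive window
import matplotlib.pyplot as plt

# Rows: Producers = 1, 5, 10. Columns: Avg-del, Min-del, Max-del (p=0.3).
block = np.array([
    [8.060291, 0.587227, 26.709371],
    [8.330889, 0.371387, 54.468836],
    [1.012147, 0.161975, 4.320447],
])

# Same call shape as in the question: a 2-D array and no explicit x values
lines = plt.plot(np.transpose(block))

print(len(lines), "lines drawn")
for line in lines:
    # get_xdata/get_ydata return exactly what matplotlib will draw
    print("x =", list(line.get_xdata()), "y =", list(line.get_ydata()))
```

Printing the data this way makes it easy to see both the default x values and which numbers ended up grouped into each line.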

Based on your x-label, I think you want the values to be x = [1,5,10]. Try this to see if it gets the plot you want.

# iterate over the first dataframe index
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
    ax = plt.gca()
    ax.legend()
    ax.set_xticks(r.index)
    ax.set_ylabel('Latency (s)')
    ax.set_xlabel('Number of producer nodes')  
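
The loop above draws each column as its own line. For the error bars asked about in the question, one possible approach is ax.errorbar with an asymmetric yerr: the distance from the average down to the minimum and up to the maximum. The sketch below is an assumption about the desired plot, not part of the original script. It rebuilds `result` by hand from the delay values shown in the question, and the output filename is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line for an interactive window
import matplotlib.pyplot as plt

# Delay columns copied from the question's groupby output (illustration only)
rows = [
    (0.3, 1, 8.060291, 0.587227, 26.709371),
    (0.3, 5, 8.330889, 0.371387, 54.468836),
    (0.3, 10, 1.012147, 0.161975, 4.320447),
    (0.5, 1, 8.039639, 0.776463, 26.053635),
    (0.5, 5, 4.729875, 0.289472, 26.717824),
    (0.5, 10, 1.043738, 0.160671, 4.353993),
]
cols = ["Prob", "Producers", "Avg-del", "Min-del", "Max-del"]
result = pd.DataFrame(rows, columns=cols).set_index(["Prob", "Producers"])

fig, ax = plt.subplots()
for prob in result.index.levels[0]:
    r = result.loc[prob]  # rows for one probability, indexed by Producers
    avg = r["Avg-del"].values
    # Row 0: distance below the average, row 1: distance above it
    yerr = np.array([avg - r["Min-del"].values, r["Max-del"].values - avg])
    ax.errorbar(r.index.values, avg, yerr=yerr, marker="o", capsize=3,
                label="p={}".format(prob))

ax.set_xticks(result.index.levels[1])
ax.set_xlabel("Number of producer nodes")
ax.set_ylabel("Latency (s)")
ax.legend()
fig.savefig("delay_errorbars.png")  # hypothetical filename
```

The same pattern should work for the retx columns by swapping in Avg-retx, Min-retx and Max-retx. For an interactive window, drop the Agg line and call plt.show() instead of savefig.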

1 Comment

Hi James, thanks for your reply. I see that r holds the results for one Prob value at a time. One question remains, though: since my data set has more columns, how do I slice r? I will update the question based on your code. Thanks.
