Plot 2D array with Pandas, Matplotlib, and NumPy

Question

I parsed the output of my simulations with Pandas groupby(), but I am having some difficulty plotting the data the way I want. Here is the Pandas output file (abridged for simplicity) that I'm trying to plot:

                 Avg-del   Min-del    Max-del Avg-retx  Min-retx    Max-retx
Prob Producers 
0.3  1           8.060291  0.587227  26.709371  42.931779  5.130041  136.216642  
     5           8.330889  0.371387  54.468836  43.166326  3.340193  275.932170  
     10          1.012147  0.161975   4.320447   6.336965  2.026241   19.177802  
0.5  1           8.039639  0.776463  26.053635  43.160880  5.798276  133.090358  
     5           4.729875  0.289472  26.717824  25.732373  2.909811  135.289244  
     10          1.043738  0.160671   4.353993   6.461914  2.015735   19.595393

My y-axis is delay and my x-axis is the number of producers. I want one set of error bars for probability p=0.3 and another for p=0.5. My Python script is the following:

import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.expand_frame_repr', False)

outputFile = 'averages.txt'

# Read the simulation output named on the command line and average every
# column per (Prob, Producers) pair
data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()

# Save the averaged table as text
print("Writing to output file: " + outputFile)
with open(outputFile, 'w') as f_out:
    f_out.write(str(result))
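Grouping on two keys gives result a two-level MultiIndex (Prob, Producers), which is what lets result.loc[...] pull out one probability at a time in the update below. A minimal sketch with a tiny made-up frame: the numbers are hypothetical, only the column names come from the question.

```python
import pandas as pd

# Hypothetical raw simulation rows (made-up numbers): two runs per (Prob, Producers) pair
data = pd.DataFrame({
    "Prob":      [0.3, 0.3, 0.3, 0.3, 0.5, 0.5],
    "Producers": [1, 1, 5, 5, 1, 1],
    "Avg-del":   [8.0, 9.0, 8.0, 8.5, 7.0, 9.0],
})

# Averaging per group turns the two grouping keys into a MultiIndex
result = data.groupby(["Prob", "Producers"]).mean()

print(result.index.names)            # the two index levels: Prob, Producers
print(list(result.index.levels[0]))  # the distinct Prob values
print(result.loc[0.3])               # one probability's rows, indexed by Producers only
```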

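*** Update, using the code from James's answer ***

# Plot one set of lines per probability, with Producers on the x-axis
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]  # rows for this probability, indexed by Producers
    lines = plt.plot(r.index, r.values)  # one line per column, x = producer counts
    for col, line in zip(r.columns, lines):
        line.set_label(str(prob_index) + " " + col)

ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')

plt.show()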

Now I have 4 sliced arrays, one for each probability. How do I split each of them again into the delay (del) and retx columns, and plot error bars from the Avg, Min, and Max values?
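One way to get error bars from the Avg/Min/Max columns is Matplotlib's errorbar with an asymmetric yerr: a pair of arrays giving the distance below and above each average. A minimal sketch, assuming the column names shown in the table above; the helper asymmetric_errors and the side-by-side subplot layout are hypothetical choices for illustration, not something from the question.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to plot interactively
import matplotlib.pyplot as plt

# The averaged table from the question, rebuilt by hand so the sketch is self-contained
index = pd.MultiIndex.from_product([[0.3, 0.5], [1, 5, 10]], names=["Prob", "Producers"])
result = pd.DataFrame(
    [[8.060291, 0.587227, 26.709371, 42.931779, 5.130041, 136.216642],
     [8.330889, 0.371387, 54.468836, 43.166326, 3.340193, 275.932170],
     [1.012147, 0.161975, 4.320447, 6.336965, 2.026241, 19.177802],
     [8.039639, 0.776463, 26.053635, 43.160880, 5.798276, 133.090358],
     [4.729875, 0.289472, 26.717824, 25.732373, 2.909811, 135.289244],
     [1.043738, 0.160671, 4.353993, 6.461914, 2.015735, 19.595393]],
    index=index,
    columns=["Avg-del", "Min-del", "Max-del", "Avg-retx", "Min-retx", "Max-retx"],
)


def asymmetric_errors(r, metric):
    """Hypothetical helper: return (avg, [lower, upper]) for a suffix such as 'del' or 'retx'."""
    avg = r["Avg-" + metric]
    lower = (avg - r["Min-" + metric]).values  # distance from the average down to the minimum
    upper = (r["Max-" + metric] - avg).values  # distance from the average up to the maximum
    return avg, [lower, upper]


fig, (ax_del, ax_retx) = plt.subplots(1, 2, figsize=(10, 4))
for prob in result.index.get_level_values("Prob").unique():
    r = result.loc[prob]  # rows for one probability, indexed by Producers
    for ax, metric in ((ax_del, "del"), (ax_retx, "retx")):
        avg, yerr = asymmetric_errors(r, metric)
        ax.errorbar(r.index, avg, yerr=yerr, marker="o", capsize=4, label="p=%s" % prob)

for ax, ylabel in ((ax_del, "Latency (s)"), (ax_retx, "retx")):
    ax.set_xticks(r.index)
    ax.set_xlabel("Number of producer nodes")
    ax.set_ylabel(ylabel)
    ax.legend()

fig.tight_layout()
# call plt.show() here when running interactively (without the Agg line above)
```

Passing yerr as two arrays is what makes the bars asymmetric; a single array would draw the same distance above and below each point.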

James · Accepted Answer · 2016-09-08 01:37:08Z


OK, there is a lot going on here. First, your code is plotting 6 lines. When it calls

plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3')
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')

it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as x and y, but as 3 separate series of y-values, one per column (3 points each). Because np.transpose turns each row of your table into a column, each series is one row: the Avg-del, Min-del, and Max-del for a single producer count. For the x values, it imputes 0, 1, 2. In other words, for the first plot call it is plotting the data:

x = [0,1,2]; y = [8.060291, 0.587227, 26.709371]
x = [0,1,2]; y = [8.330889, 0.371387, 54.468836]
x = [0,1,2]; y = [1.012147, 0.161975, 4.320447]
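A quick way to confirm this behaviour is to plot the transposed p=0.3 block and read back each line's data. A sketch using the delay numbers from the question's table; the Agg backend is an assumption added only so it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.pyplot as plt

# Avg-del, Min-del, Max-del for p=0.3; rows are 1, 5 and 10 producers (from the table)
block = np.array([
    [8.060291, 0.587227, 26.709371],
    [8.330889, 0.371387, 54.468836],
    [1.012147, 0.161975, 4.320447],
])

fig, ax = plt.subplots()
lines = ax.plot(np.transpose(block))  # 2D input: one line per column

for line in lines:
    # no x was passed, so matplotlib filled in 0, 1, 2
    print(np.asarray(line.get_xdata()).tolist(), np.asarray(line.get_ydata()).tolist())
```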

Based on your x-label, I think you want the values to be x = [1,5,10]. Try this to see if it gets the plot you want.

# iterate over the first level of the dataframe index (the Prob values)
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]  # sub-frame indexed by Producers
    lines = plt.plot(r.index, r.values)  # pass x explicitly so it is [1, 5, 10]
    for col, line in zip(r.columns, lines):
        line.set_label(str(prob_index) + " " + col)
    ax = plt.gca()
    ax.legend()
    ax.set_xticks(r.index)
    ax.set_ylabel('Latency (s)')
    ax.set_xlabel('Number of producer nodes')

plt.show()


1 Comment

user3261338 Over a year ago

Hi James, thanks for your reply. I see that r holds the results for each Prob, indexed by Producers. Good. One question remains, though: since my data set has more columns, how do I slice r? I will update the question based on your code. Thanks
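One way to slice r by metric is to split the column names on the hyphen, so the columns become two levels (statistic, metric), and then use DataFrame.xs to pull out all delay or all retx columns at once. A sketch, assuming every column is named stat-metric with a single hyphen as in the table; the level names stat and metric are hypothetical.

```python
import pandas as pd

# One probability's slice, rebuilt from the p=0.3 rows of the question's table
r = pd.DataFrame(
    [[8.060291, 0.587227, 26.709371, 42.931779, 5.130041, 136.216642],
     [8.330889, 0.371387, 54.468836, 43.166326, 3.340193, 275.932170],
     [1.012147, 0.161975, 4.320447, 6.336965, 2.026241, 19.177802]],
    index=pd.Index([1, 5, 10], name="Producers"),
    columns=["Avg-del", "Min-del", "Max-del", "Avg-retx", "Min-retx", "Max-retx"],
)

# "Avg-del" -> ("Avg", "del"): the columns become a two-level MultiIndex
r.columns = r.columns.str.split("-", expand=True).set_names(["stat", "metric"])

delay = r.xs("del", axis=1, level="metric")  # columns: Avg, Min, Max
retx = r.xs("retx", axis=1, level="metric")
print(delay)
```

Each resulting frame has the same Avg/Min/Max columns, so the same plotting code can be reused for delay and retx.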
