I am trying to plot some data so I can visualise it, but I am having some trouble. I have two values. One is discrete (0 or 1) and is called label. The other is a continuous value anywhere between 0 and 1. I wish to create a histogram where the X axis has several bars, for example one for every 0.25 of data, so four bars: the first covering 0-0.25, the second 0.25-0.5, the third 0.5-0.75 and the fourth 0.75-1.
The y axis will then be split up by whether label is a 1 or a 0, so we end up with a graph like this :
If there are any effective, intelligent ways to split up my data (rather than just having four bars hardcoded for these values) I would be interested in those too, though that probably warrants another question. I will post it when I have the code for this running.
I have both values stored in numpy arrays as follows, but am unsure how to plot a graph like this:
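As a rough illustration of what data-driven binning could look like, here is a minimal sketch that compares the four hardcoded bins with edges chosen by NumPy's `np.histogram_bin_edges` and `bins="auto"`. The random `x` is a hypothetical stand-in for the real variable values, not my actual data.

```python
import numpy as np

# Hypothetical stand-in for the continuous variable (values in [0, 1)).
rng = np.random.default_rng(0)
x = rng.random(200)

# Hardcoded version: four equal-width bins with edges 0, 0.25, 0.5, 0.75, 1.
fixed_edges = np.linspace(0, 1, 5)

# Data-driven version: let NumPy pick the number of equal-width bins.
auto_edges = np.histogram_bin_edges(x, bins="auto", range=(0, 1))

# Count how many values fall into each automatically chosen bin.
counts, _ = np.histogram(x, bins=auto_edges)
print(len(auto_edges) - 1, "bins chosen automatically")
```

Passing `range=(0, 1)` pins the outer edges to the known limits of the variable, so only the number of bins depends on the data.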
import numpy as np
import matplotlib.pyplot as plt
import pylab as P
variable_values = trainData.get_vector('variable')  # returns one-dimensional numpy array of vals
label_values = trainData.get_vector('label')
x = variable_values[variable_values != '?'].astype(float)  # removing void vals
y = label_values[variable_values != '?'].astype(float)  # same mask keeps x and y aligned
fig = plt.figure()
plt.title("Feature breakdown histogram")
plt.xlabel("Variable")
plt.xlim(0, 1)
plt.ylabel("Label")
plt.ylim(0, 1)
xvals = np.arange(0, 1, .02)  # steps of 0.02; np.linspace takes a point count, not a step
plt.show()
The matplotlib tutorial shows the following code to roughly achieve what I want, but I can't really understand how it works (LINK) :
P.figure()
n, bins, patches = P.hist(x, 10, normed=1, histtype='bar', stacked=True)
P.show()
Any help is greatly appreciated. Thank you.
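For reference, here is a minimal sketch of how the `hist` call could be adapted to the plot described above. It assumes synthetic data in place of `trainData`, and the names `rng`, `bins`, `fig` and `ax` are illustrative only. The idea is that `hist` accepts a list of arrays, one per label, and stacks them when `stacked=True`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use a GUI window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: x is continuous in [0, 1), y is a 0/1 label.
rng = np.random.default_rng(0)
x = rng.random(500)
y = (rng.random(500) < x).astype(int)

# Four bins: 0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1.
bins = np.linspace(0, 1, 5)

fig, ax = plt.subplots()
# One array per label; stacked=True draws label 1 on top of label 0.
counts, edges, patches = ax.hist(
    [x[y == 0], x[y == 1]],
    bins=bins,
    stacked=True,
    label=["label 0", "label 1"],
)
ax.set_title("Feature breakdown histogram")
ax.set_xlabel("Variable")
ax.set_ylabel("Count")
ax.legend()
# Call plt.show() or fig.savefig("hist.png") to view the result.
```

Here `hist` does the binning itself, so the label array is only used to split `x` into two groups before plotting.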
Edit :
I am now getting the error :
AssertionError: incompatible sizes: argument 'height' must be length 5 or scalar
I have printed my two numpy arrays and they are of equal length, one is discrete, the other continuous. Here is the code I am running :
x = variable_values[variable_values != '?'].astype(float)
y = label_values[label_values != '?'].astype(float)
print x #printing numpy arrays of equal size, x is continuous, y is discrete. Both of type float now.
print y
N = 5
ind = np.arange(N) # the x locations for the groups
width = 0.45 # the width of the bars: can also be len(x) sequence
p1 = plt.bar(ind, y, width, color='r') #error occurs here
p2 = plt.bar(ind, x, width, color='y',
bottom=x)
plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind+width/2., ('G1', 'G2', 'G3', 'G4', 'G5') )
plt.yticks(np.arange(0,81,10))
plt.legend( (p1[0], p2[0]), ('Men', 'Women') )
plt.show()
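The AssertionError arises because `plt.bar` expects one height per bar (here `N = 5`), while `y` holds one value per data point. A minimal sketch of one way around this, assuming synthetic data in place of my arrays, is to count the values per bin for each label with `np.histogram` first and then stack the counts:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use a GUI window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the real x (continuous) and y (0/1).
rng = np.random.default_rng(1)
x = rng.random(300)
y = rng.integers(0, 2, size=300)

N = 5  # number of bars
edges = np.linspace(0, 1, N + 1)

# One count per bin for each label, so each array has exactly N entries.
counts0, _ = np.histogram(x[y == 0], bins=edges)
counts1, _ = np.histogram(x[y == 1], bins=edges)

ind = np.arange(N)  # the x locations for the bars
width = 0.45
fig, ax = plt.subplots()
p0 = ax.bar(ind, counts0, width, color="r", label="label 0")
# bottom=counts0 stacks the label-1 counts on top of the label-0 counts.
p1 = ax.bar(ind, counts1, width, bottom=counts0, color="y", label="label 1")
ax.set_xticks(ind)
ax.set_xticklabels([f"{lo:.1f}-{hi:.1f}" for lo, hi in zip(edges[:-1], edges[1:])])
ax.set_ylabel("Count")
ax.legend()
# Call plt.show() or fig.savefig("bars.png") to view the result.
```

The heights passed to `bar` now have length `N`, which is what the error message asks for.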
x = mu + sigma*P.randn(1000,3) in the link you gave? This is used to make the three stacked bars. – Aris F. Feb 13 '14 at 19:14
N variable, which is the number of bars in the histogram. Either write a 4, or use len(x). – logc Feb 13 '14 at 21:11