
I have a large data set in the format x,y,value1,value2,.... Each value# is the value of that variable at position (x, y). The data is read from a CSV file, with the (x, y) points in semi-random order. The points do not lie on a rectilinear grid, and there are on the order of millions of them.

What I would like to do is create an image of the value# variable.

Is there a built-in mechanism for doing this? If not, how do I build a 2D array of the value# data with the correct ordering?

I'm not 100% sure what you want to do. In order to save a plot you create, you use savefig(). Check out this answer - stackoverflow.com/questions/9622163/…. Also, check out the docs on scatterplots - matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter. And as I always suggest, brush up on how to formulate a good, coherent question - stackoverflow.com/help/how-to-ask. –  Austin A Feb 19 at 18:34
I think what you need is an interpolation. Have a look at docs.scipy.org/doc/scipy/reference/generated/… –  imaluengo Feb 19 at 18:37
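A minimal sketch of the interpolation idea from the comment above. The link in the comment is truncated, so the choice of scipy.interpolate.griddata is an assumption, as are the helper name scattered_to_image, the grid sizes, and the synthetic test data. It resamples scattered (x, y, value) points onto a regular grid that imshow can display.

```python
import numpy as np
from scipy.interpolate import griddata


def scattered_to_image(x, y, values, nx=200, ny=200, method="linear"):
    """Interpolate scattered points onto a regular ny-by-nx grid.

    Hypothetical helper for illustration. Returns the 2D image and the
    (xmin, xmax, ymin, ymax) extent of the grid.
    """
    xi = np.linspace(x.min(), x.max(), nx)
    yi = np.linspace(y.min(), y.max(), ny)
    grid_x, grid_y = np.meshgrid(xi, yi)  # both have shape (ny, nx)
    # Grid nodes outside the convex hull of the input points come back as NaN
    # when method is "linear".
    image = griddata((x, y), values, (grid_x, grid_y), method=method)
    extent = (xi[0], xi[-1], yi[0], yi[-1])
    return image, extent


# Synthetic stand-in for the CSV columns (assumed data, not from the question).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = rng.uniform(0.0, 1.0, 2000)
values = x + 2.0 * y

image, extent = scattered_to_image(x, y, values, nx=50, ny=40)
```

The result could then be shown with plt.imshow(image, origin="lower", extent=extent). With millions of points the triangulation behind griddata may be slow, so trying it on a subsample first seems prudent.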

1 Answer

Does each (x, y) pair appear only once? Are all your value# columns of equal length? If so, this will be a lot easier for you. As far as I know, there is no simple way to tell imshow to do this directly, but hopefully someone else here knows more about this than I do.

You will probably need to restructure the data. I think imshow needs your data shaped as a 2D grid (here, y rows by x columns) with your value# entries as the cell values. If you want to work with large datasets, I would learn as much as you can about Python's pandas package. Like R, it lets you create data frames.

Here is an example that uses pandas. There's probably a much more graceful way to go about this, but you should get the point.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(columns=['x','y','data_value'])
df['x'] = [1,2,1,2]
df['y'] = [1,1,2,2]
df['data_value'] = [1,2,3,4]

print(df) # so you see what's going on

df2 = pd.DataFrame(columns=df['x'].unique(), index = df['y'].unique())

print(df2) # so you see what's going on

# making x columns and y rows
for i in df2.index:
    for j in df2.columns:
        # .ix has been removed from pandas; .loc does the same label-based lookup
        df2.loc[i, j] = (df[(df['y']==i) & (df['x']==j)]['data_value']).values[0]

print(df2)

Oh, and to plot this (df2 holds generic Python objects, which imshow rejects, so convert to float first):

plt.imshow(np.array(df2.astype(float)))
plt.show()
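As a possibly more compact alternative to the nested loops, the same reshaping can be sketched with DataFrame.pivot. This assumes, as asked at the top of the answer, that each (x, y) pair occurs exactly once; the helper name to_grid is hypothetical.

```python
import numpy as np
import pandas as pd


def to_grid(df, value_col):
    """Reshape long-format x, y, value rows into a 2D float array.

    Rows follow sorted y and columns follow sorted x. Combinations of
    (x, y) that are absent from df become NaN.
    """
    table = df.pivot(index="y", columns="x", values=value_col)
    return table.sort_index().sort_index(axis=1).to_numpy(dtype=float)


df = pd.DataFrame({
    "x": [1, 2, 1, 2],
    "y": [1, 1, 2, 2],
    "data_value": [1, 2, 3, 4],
})

grid = to_grid(df, "data_value")
print(grid)
```

The array can be passed straight to plt.imshow(grid). If an (x, y) pair is repeated, pivot raises a ValueError; DataFrame.pivot_table with an aggregation such as aggfunc="mean" is one way to handle that case.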
I think this will work. I will try it out later tonight. –  JMD Feb 22 at 0:14
