Stack Overflow is a community of 4.7 million programmers, just like you, helping each other.

Join them; it only takes a minute:

Sign up
Join the Stack Overflow community to:
  1. Ask programming questions
  2. Answer and help your peers
  3. Get recognized for your expertise

I have a file containing 3 columns, where the first two are coordinates (x,y) and the third is a value (z) corresponding to that position. Here's a short example:

x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13

I want to create a 2D array of values from the third row based on their x,y coordinates in the file. I read in each column as an individual array, and I created grids of x values and y values using numpy.meshgrid, like this:

x = [[0 1 2]    and   y = [[0 0 0]
     [0 1 2]               [1 1 1]
     [0 1 2]]              [2 2 2]]

but I'm new to Python and don't know how to produce a third grid of z values that looks like this:

z = [[Nan 15 Nan]
     [14  16  18]
     [17  Nan 13]]

Replacing Nan with 0 would be fine, too; my main problem is creating the 2D array in the first place. Thanks in advance for your help!

share|improve this question
    
I couldn't understand what is the expected functionality here. – Sait Jun 10 '15 at 19:05

Assuming the x and y values in your file directly correspond to indices (as they do in your example), you can do something similar to this:

import numpy as np

x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]

z_array = np.nan * np.empty((3,3))
z_array[y, x] = z

print z_array

Which yields:

[[ nan  15.  nan]
 [ 14.  16.  18.]
 [ 17.  nan  13.]]

For large arrays, this will be much faster than the explicit loop over the coordinates.


Dealing with non-uniform x & y input

If you have regularly sampled x & y points, then you can convert them to grid indices by subtracting the "corner" of your grid (i.e. x0 and y0), dividing by the cell spacing, and casting as ints. You can then use the method above or in any of the other answers.

As a general example:

i = ((y - y0) / dy).astype(int)
j = ((x - x0) / dx).astype(int)

grid[i,j] = z

However, there are a couple of tricks you can use if your data is not regularly spaced.

Let's say that we have the following data:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 10))

fig, ax = plt.subplots()
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

enter image description here

That we want to put into a regular 10x10 grid:

enter image description here

We can actually use/abuse np.histogram2d for this. Instead of counts, we'll have it add the value of each point that falls into a cell. It's easiest to do this through specifying weights=z, normed=False.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 10))

# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, normed=False)
zi = np.ma.masked_equal(zi, 0)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()

enter image description here

However, if we have a large number of points, some bins will have more than one point. The weights argument to np.histogram simply adds the values. That's probably not what you want in this case. Nonetheless, we can get the mean of the points that fall in each cell by dividing by the counts.

So, for example, let's say we have 50 points:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1977)
x, y, z = np.random.random((3, 50))

# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, normed=False)
counts, _, _ = np.histogram2d(y, x, bins=(10,10))

zi = zi / counts
zi = np.ma.masked_invalid(zi)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()

enter image description here

With very large numbers of points, this exact method will become slow (and can be sped up easily), but it's sufficient for anything less than ~1e6 points.

share|improve this answer
    
Unfortunately my x and y values don't actually correspond to indices. I suppose that was a poor choice of example! Instead they're a bit random, like 27 31 14 for my first row in the file. – Aero Jun 10 '15 at 19:40
    
If x and y are not indices how is the 2d array of z values defined? – Dave Jun 10 '15 at 22:26
    
@Dave - You have to define the extents and cellsize of the grid externally. In this example, it's the min/max of the data with a 10x10 grid in between. However, this type of "binning" data is very often used for later discrete analysis. In that case, you'll bin the data onto some pre-existing grid or bin it onto the grid that you'll be using for the rest of the analysis. – Joe Kington Jun 11 '15 at 12:07
    
@JoeKington: excellent answer, I liked the histogram method. Is there a similar trick in case of n-dimensional data? – Mannaggia Aug 11 '15 at 21:33
1  
@Mannaggia - You can use the same trick with np.histogramdd for the n-dimensional case, as far as I know. Be aware of the earlier caveat about speed, though. It may be slower with high-dimensional datasets. – Joe Kington Aug 11 '15 at 21:53

You could try something like:

import numpy as np

x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]

arr = np.zeros((3,3))
yx = zip(y,x)

for i, coord in enumerate(yx):
    arr[coord] = z[i]

print arr
>>> [[  0.  15.   0.]
     [ 14.  16.  18.]
     [ 17.   0.  13.]]
share|improve this answer
2  
arr[coord[0], coord[1]] is a better way of indexing a 2d array. – hpaulj Jun 10 '15 at 19:25
    
Or even arr[coord] as with @Dave's answer – kezzos Jun 10 '15 at 19:28

Kezzos beat me to it but I had a similar approach,

x = np.array([0,0,1,1,2,2])
y = np.array([1,2,0,1,1,2])
z = np.array([14,17,15,16,18,13])
Z = np.zeros((3,3))
for i,j in enumerate(zip(x,y)): 
    Z[j] = z[i]

Z[np.where(Z==0)] = np.nan
share|improve this answer

If you have scipy installed, you could take advantage of its sparse matrix module. Get the values from the text file with genfromtxt, and plug those 'columns' directly into a sparse matrix creator.

In [545]: txt=b"""x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13
"""

In [546]: xyz=np.genfromtxt(txt.splitlines(),names=True,dtype=int)

In [547]: sparse.coo_matrix((xyz['z'],(xyz['y'],xyz['x']))).A     
Out[547]: 
array([[ 0, 15,  0],
       [14, 16, 18],
       [17,  0, 13]])

But Joe's z_array=np.zeros((3,3),int); z_array[xyz['y'],xyz['x']]=xyz['z'] is considerably faster.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.