I have a 2D Python array, from which I would like to remove certain columns, but I don't know how many I would like to remove until the code runs.

I want to loop over the columns in the original array, and if the sum of the values in any one column is above a certain value, I want to remove the whole column.

I started to do this the following way:

for i in range(original_number_of_columns):
    if sum(original_array[:,i]) < certain_value:
        new_array[:,new_index] = original_array[:,i]
        new_index+=1

But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.

I have got around it before by firstly looping over the columns to find out how many I will lose, then defining the new_array, and then lastly running the loop above - but obviously there will be much more efficient ways to do such things!

Thank you.

You might be able to just collapse the original array, but you probably need to work backwards, removing the farthest columns first. – Jiminion Jul 22 '13 at 16:27
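A minimal sketch of the in-place removal idea from the comment above, assuming NumPy is in use. The sample data and the value of certain_value are made up for illustration. np.delete takes all the column indices at once, so there is no need to work backwards by hand:

```python
import numpy as np

# Hypothetical sample data and threshold, for illustration only.
original_array = np.array([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])
certain_value = 15

# Indices of the columns whose sum reaches the threshold.
cols_to_drop = np.flatnonzero(original_array.sum(axis=0) >= certain_value)

# np.delete returns a new array with those columns removed in one call;
# original_array itself is left unchanged.
new_array = np.delete(original_array, cols_to_drop, axis=1)
print(new_array)
```

This keeps the same columns as the loop in the question (those with sum < certain_value), and the size of new_array never has to be known in advance.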

3 Answers

You can use the following:

import numpy as np

a = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
)

# Keep only the columns (axis 1) whose sum is greater than 15.
print(a.compress(a.sum(0) > 15, 1))

[[3]
 [6]
 [9]]

without numpy

my_2d_table = [[...],[...],...]
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = list(map(list, zip(*only_cols_that_sum_lt_x)))  # list() is needed on Python 3, where map is lazy

with numpy

import numpy as np

a = np.array(my_2d_table)
a[:, np.sum(a, 0) < some_threshold]
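Since the table above is elided, here is a runnable sketch of both variants on made-up data. The contents of my_2d_table and the value of some_threshold are assumptions for illustration, not taken from the answer:

```python
import numpy as np

# Hypothetical sample table and threshold, for illustration only.
my_2d_table = [[1, 2, 3],
               [1, -3, 2],
               [4, 5, 7]]
some_threshold = 5

# Without numpy: transpose with zip, filter the columns, transpose back.
kept_cols = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = [list(row) for row in zip(*kept_cols)]

# With numpy: index the columns with a boolean mask built from the column sums.
a = np.array(my_2d_table)
filtered = a[:, a.sum(axis=0) < some_threshold]

print(new_table)
print(filtered)
```

One difference to be aware of: if no column passes the test, the pure-Python version gives an empty list and loses the row count, while the numpy version gives an array of shape (3, 0).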
    
The question was tagged with numpy, so there's no need for a non-numpy solution. Also, I believe a.sum(0) looks nicer than np.sum(a,0), but that's just me. Regardless, nice usage of advanced indexing, I forgot you could use boolean arrays for that too. – JAB Jul 22 '13 at 16:39
    
meh ... I like np.sum because it's more explicit ... I would probably actually use np.sum(a, axis=0) – Joran Beasley Jul 22 '13 at 16:42

I suggest using numpy.compress.

>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1,  2,  3],
       [ 1, -3,  2],
       [ 4,  5,  7]])
>>> a.sum(axis=0)  # sums each column
array([ 6,  4, 12])
>>> a.sum(0) < 5
array([ False, True,  False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1)  # keeps only the columns whose entry in the condition array is True
array([[ 2],
       [-3],
       [ 5]])
