Python - Remove a row from numpy array?

Question

Hi all, what I want should be really simple for somebody here. I want to remove a row from a numpy array in a loop, like:

for i in range(len(self.Finalweight)):
    if self.Finalweight[i] >= self.cutoffOutliers:
        "remove line[i] from self.wData"

I'm trying to remove outliers from a dataset. The full code of the method is:

def calculate_Outliers(self):
    def calcWeight(Value):
        pFinal = abs(Value - self.pMed)/ self.pDev_abs_Med
        gradFinal = abs(gradient(Value) - self.gradMed) / self.gradDev_abs_Med
        return pFinal * gradFinal

    self.pMed = median(self.wData[:,self.yColum-1])
    self.pDev_abs_Med = median(abs(self.wData[:,self.yColum-1] - self.pMed))
    self.gradMed = median(gradient(self.wData[:,self.yColum-1]))
    self.gradDev_abs_Med = median(abs(gradient(self.wData[:,self.yColum-1]) - self.gradMed))    
    self.workingData= self.wData[calcWeight(self.wData)<self.cutoffOutliers]

    self.xData = self.workingData[:,self.xColum-1]
    self.yData = self.workingData[:,self.yColum-1]

I'm getting the following error:

File "bin/dmtools", line 201, in plot_gride
    self.calculate_Outliers()
File "bin/dmtools", line 188, in calculate_Outliers
    self.workingData= self.wData[calcWeight(self.wData)>self.cutoffOutliers]
ValueError: too many indices for array

If you are removing a lot of elements, it would probably be faster to just create a new array and fill it with the values that pass the cutoff. Each call to remove with a numpy array will force a lot of value swaps to move all of the elements after the deleted index down. Also, your loop won't work if you are removing elements.
– GWW, Commented Feb 18, 2011 at 19:13
@Canesin: OK, do the logic construction in a loop (if really necessary, but most probably it could also be vectorized) and then in one step (as I suggested) construct a new array based on your conditions.
– eat, Commented Feb 18, 2011 at 19:44
@Canesin: So it seems that your for i in range(... construction could be replaced with the simple statement `self.wData= self.wData[self.Finalweight>= self.cutoffOutliers]`. Right? Another observation: if your calculation variables are temporary in nature, there is no need to treat them as instance variables. Thanks
– eat, Commented Feb 18, 2011 at 19:56
@eat: self.wData= self.wData[self.Finalweight>= self.cutoffOutliers] gives [], but I'm sure many rows in wData satisfy the condition.
– canesin, Commented Feb 18, 2011 at 20:08
@Canesin: How can you be sure if you haven't yet implemented your for i in range(... construct? Anyway, these are my general guidelines on how to avoid deleting rows, columns, or elements of a numpy array in loops. I haven't considered your actual self.Finalweight calculation logic at all. Please clarify if my suggested way to handle 'deleting items' from a numpy array is in no way applicable to you. Thanks
– eat, Commented Feb 18, 2011 at 20:20
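
The one-step approach suggested in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the data, the names `w_data` and `cutoff`, and the simplified weight (deviation from the median only, without the gradient term from the question) are all hypothetical. The key point is that the weight is computed on a single 1D column, so the boolean mask has one entry per row. A mask with more dimensions than the array it indexes is one plausible cause of a "too many indices" error.

```python
import numpy as np

# Hypothetical data: column 0 is x, column 1 is y; y = 50.0 is the outlier.
w_data = np.array([
    [0.0, 1.0],
    [1.0, 1.2],
    [2.0, 50.0],
    [3.0, 0.9],
    [4.0, 1.1],
])
cutoff = 5.0  # illustrative threshold, not taken from the question

y = w_data[:, 1]                     # 1D column, one value per row
med = np.median(y)                   # median of the column
mad = np.median(np.abs(y - med))     # median absolute deviation
weight = np.abs(y - med) / mad       # simplified weight, one per row

keep = weight < cutoff               # 1D boolean mask over the rows
filtered = w_data[keep]              # new array holding only the kept rows

# Same result with np.delete, which also returns a new array
outlier_rows = np.where(~keep)[0]
deleted = np.delete(w_data, outlier_rows, axis=0)
```

Both `filtered` and `deleted` are new arrays; `w_data` itself is left unchanged, so no loop over row indices is needed.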

Sven Marnach · Accepted Answer · 2011-02-18 23:01:09Z

2

There is actually a tool in NumPy specifically made to mask out outliers and invalid data points: masked arrays. Example from the linked page:

import numpy

x = numpy.array([1, 2, 3, -1, 5])
mx = numpy.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
print(mx.mean())  # the masked value -1 is excluded from the mean

prints

2.75
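
As a hedged extension of the example above, the mask can also be built from a condition rather than written out by hand, and masked rows of a 2D array can be dropped afterwards. The sentinel value -1 and the small `data` array are assumptions made for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3, -1, 5])

# Build the mask from a condition instead of listing it by hand
mx = np.ma.masked_where(x == -1, x)

mean_masked = mx.mean()            # masked entries are ignored
remaining = mx.compressed()        # plain ndarray of the unmasked values
mean_indexed = x[x != -1].mean()   # same mean via boolean indexing

# Hypothetical 2D case: drop every row that contains a masked value
data = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 3.0]])
masked = np.ma.masked_where(data == -1.0, data)
rows_kept = np.ma.compress_rows(masked)
```

Masking keeps the original array shape and only hides values, which may be preferable when the positions of the outliers still matter. Compressing produces a smaller plain array.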

edited Feb 18, 2011 at 23:01

answered Feb 18, 2011 at 22:44

You could also do it simply using x[x!=-1].mean().
– Developer, Commented Nov 25, 2012 at 10:33
