Hi all, what I want should be really simple for somebody here. I want to remove a row from a NumPy array in a loop, like:

for i in range(len(self.Finalweight)):
    if self.Finalweight[i] >= self.cutoffOutliers:
        pass  # remove row i from self.wData

I'm trying to remove outliers from a dataset. The full code of the method is:

def calculate_Outliers(self):
    def calcWeight(Value):
        pFinal = abs(Value - self.pMed)/ self.pDev_abs_Med
        gradFinal = abs(gradient(Value) - self.gradMed) / self.gradDev_abs_Med
        return pFinal * gradFinal

    self.pMed = median(self.wData[:,self.yColum-1])
    self.pDev_abs_Med = median(abs(self.wData[:,self.yColum-1] - self.pMed))
    self.gradMed = median(gradient(self.wData[:,self.yColum-1]))
    self.gradDev_abs_Med = median(abs(gradient(self.wData[:,self.yColum-1]) - self.gradMed))    
    self.workingData= self.wData[calcWeight(self.wData)<self.cutoffOutliers]

    self.xData = self.workingData[:,self.xColum-1]
    self.yData = self.workingData[:,self.yColum-1]

I'm getting the following error:

  File "bin/dmtools", line 201, in plot_gride
    self.calculate_Outliers()
  File "bin/dmtools", line 188, in calculate_Outliers
    self.workingData= self.wData[calcWeight(self.wData)>self.cutoffOutliers]
ValueError: too many indices for array
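
One possible reading of this error, offered as a guess from the code shown rather than a confirmed diagnosis: calcWeight is handed the whole 2-D self.wData, and numpy.gradient on a 2-D array returns one gradient array per axis, so the resulting weights can end up with more dimensions than the array being indexed. The sketch below uses made-up data and a hypothetical column layout to show the shapes involved, and that a mask computed from a single column has one entry per row.

```python
import numpy as np

# Made-up data: column 0 is x, column 1 is y (hypothetical layout).
data = np.array([[0.0, 1.0], [1.0, 2.1], [2.0, 2.9], [3.0, 50.0], [4.0, 5.2]])

# On a 2-D array, np.gradient returns one array per axis.
grads_2d = np.gradient(data)
print(len(grads_2d))

# On a single column it returns a single 1-D array.
y = data[:, 1]
grad_y = np.gradient(y)
print(grad_y.shape)

# A 1-D boolean mask with one entry per row selects whole rows.
# The criterion here is illustrative only, not the weight from the question.
mask = np.abs(y - np.median(y)) < 10.0
kept = data[mask]
print(kept.shape)
```

If this is indeed the cause, passing the y column (self.wData[:,self.yColum-1]) to calcWeight instead of the full array would give a 1-D mask that matches the rows of self.wData.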

  • If you are removing a lot of elements, it would probably be faster to just create a new array and fill it with the values that pass the cutoff. Each removal from a NumPy array forces all of the elements after the deleted index to be moved down. Also, your loop won't work if you are removing elements, since the indices shift after each removal. Commented Feb 18, 2011 at 19:13
  • @Canesin: OK, do the logic construction in a loop (if really necessary, but most probably it could also be vectorized) and then in one step (as I suggested) construct a new array based on your conditions. Commented Feb 18, 2011 at 19:44
  • @Canesin: So it seems that your for i in range(... construction could be replaced with the simple statement self.wData= self.wData[self.Finalweight>= self.cutoffOutliers]. Right? Another observation: if your calculation variables are temporary in nature, there is no need to treat them as instance variables. Thanks Commented Feb 18, 2011 at 19:56
  • @eat: self.wData= self.wData[self.Finalweight>= self.cutoffOutliers] gives [], but I'm sure many rows in wData satisfy the condition. Commented Feb 18, 2011 at 20:08
  • @Canesin: How can you be sure, if you haven't yet implemented your for i in range(... construct? Anyway, these are my general guidelines for how to avoid deleting rows, columns, or elements of a NumPy array in loops. I haven't considered your actual self.Finalweight calculation logic at all. Please clarify if my suggested way to handle 'deleting items' from a NumPy array is in no way applicable to you. Thanks Commented Feb 18, 2011 at 20:20
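
The approach discussed in these comments, building a new array in one step instead of deleting rows inside a loop, might look roughly like the sketch below. The array contents, the weights, and the cutoff are made-up stand-ins for self.wData, self.Finalweight, and self.cutoffOutliers. Note that a mask written with >= keeps the rows at or above the cutoff, so dropping outliers needs the opposite comparison.

```python
import numpy as np

w_data = np.arange(12.0).reshape(4, 3)          # hypothetical stand-in for self.wData
final_weight = np.array([0.5, 3.0, 1.0, 7.5])   # hypothetical per-row weights
cutoff = 2.0                                    # hypothetical cutoff

# Boolean mask: True for the rows to keep (weight below the cutoff).
keep = final_weight < cutoff
filtered = w_data[keep]

# Same result with np.delete, given the indices of the rows to drop.
rows_to_drop = np.flatnonzero(final_weight >= cutoff)
deleted = np.delete(w_data, rows_to_drop, axis=0)

print(filtered)
print(np.array_equal(filtered, deleted))
```

Both forms return a new array and leave w_data unchanged, so the result has to be assigned back if the original name should point at the filtered data.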

1 Answer

There is actually a tool in NumPy specifically made to mask out outliers and invalid data points: masked arrays. Example from the linked page:

import numpy

x = numpy.array([1, 2, 3, -1, 5])
# A mask value of 1 marks the fourth element (-1) as invalid.
mx = numpy.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
print(mx.mean())

prints

2.75
  • You could also do it simply using x[x!=-1].mean(). Commented Nov 25, 2012 at 10:33
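
As a small sketch comparing the two suggestions above, the mask can also be built from a condition rather than typed by hand. The sentinel value -1 is taken from the example; treating it as the invalid marker is an assumption made for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, -1, 5])

# Masked array: the original length is kept, the -1 entry is hidden.
mx = np.ma.masked_where(x == -1, x)
masked_mean = mx.mean()

# Boolean indexing: a shorter plain array without the -1 entry.
indexed_mean = x[x != -1].mean()

print(masked_mean, indexed_mean)
print(mx.compressed())   # unmasked values as a plain ndarray
```

The masked array preserves the positions of the remaining values, which can matter when other columns must stay aligned with them; boolean indexing is simpler when only the surviving values are needed.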
