Suppose I have a list called mask composed of the integers 1, 2, 3, ..., n, where n differs from case to case. Let me take n = 3 for demonstration:

mask = RandomInteger[{1, 3}, 1000000];

and another list

list = RandomReal[{0, 1}, 1000000];

I want to pick the elements of list whose corresponding entry in mask is not equal to 1.

Pick[list, mask, _?(# != 1 &)]; // Timing

This takes 1.125 sec

But if I already know that mask is composed only of 1, 2 and 3, then this

Pick[list, mask, 2 | 3]; // Timing

is faster; it takes 0.25 sec.

But the problem is that I am not sure what is in mask, so this is not general.

So the question is: is there a more efficient way than the _?(# != 1 &) pattern? And why is it slower than the pattern 2|3?

As for why 2|3 is faster than _?(# != 1 &): it's because the latter involves evaluating Mathematica code (evaluating the pure function) for each test. The former doesn't. – Szabolcs 20 hours ago
    
I tried Pick[list, mask, Except[1]] but it fails because the Except matches the whole list. Pick[list, mask, Except[1, _Integer]] works and is the same speed as 2 | 3. – 2012rcampion 14 hours ago
    
@Szabolcs I don't understand. 2|3 doesn't evaluate for each test? Then how can it know which one to pick? – matheorem 13 hours ago
    
@2012rcampion good observation! Thank you! But in this particular case, Szabolcs's method is faster. – matheorem 8 hours ago
    
@matheorem Think about how Pick might be implemented in C. For _?(# != 1 &) you'd need a callback to the main evaluator (i.e. run Mathematica code) for each test. For 2|3 you don't. You just need to test for equality between 2 (or 3) and the given list element, and this test doesn't involve running Mathematica code. It can be done entirely in C. – Szabolcs 7 hours ago

1 Answer

Since version 8 (I think), Pick has been optimized for the case where the pattern is a single element (i.e. 1 or 2, but not 1|2) and the inputs are packed arrays.

If you need performance, make sure that you hit this special case. Use vectorized arithmetic operations to transform the lists into a suitable form.

Pick[list, mask, _?(# != 1 &)]; // AbsoluteTiming
(* {0.547087, Null} *)

Pick[list, Unitize[mask - 1], 1]; // AbsoluteTiming
(* {0.019021, Null} *)
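
To see why the second form selects the same elements, here is a small sketch on a hand-made five-element mask. The names smallMask, smallList and selector are made up for illustration and are not part of the question. Unitize maps 0 to 0 and every other number to 1, so Unitize[mask - 1] is 1 exactly where mask differs from 1, and Pick can then match against the single element 1.

```mathematica
(* hypothetical toy data, small enough to check by hand *)
smallMask = {1, 2, 3, 1, 2};
smallList = {0.1, 0.2, 0.3, 0.4, 0.5};

(* 0 where the mask equals 1, 1 everywhere else *)
selector = Unitize[smallMask - 1]
(* {0, 1, 1, 0, 1} *)

(* keep the entries of smallList where selector is 1 *)
Pick[smallList, selector, 1]
(* {0.2, 0.3, 0.5} *)
```

The toy lists here are typed in by hand and so are not packed; the speed-up reported above presumably depends on the inputs being packed arrays, as RandomInteger and RandomReal produce.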

My BoolEval package tries to automate this process for more complicated cases, at the cost of only a little performance.

<< BoolEval`

BoolPick[list, mask != 1]; // AbsoluteTiming
(* {0.029157, Null} *)
    
Thank you so much. What do you mean by vectorized arithmetic operations, and why are they so fast? Besides Unitize, are there other useful vectorized arithmetic operations? – matheorem 13 hours ago
    
@matheorem The rule of thumb is that arithmetic on packed arrays of machine numbers tends to be fast. That's because it can be implemented very efficiently in terms of SIMD instructions, it's easy to parallelize (think adding two arrays) and packed array storage is very efficient. – Szabolcs 6 hours ago
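
As a rough illustration of that rule of thumb (a sketch, not part of the original exchange), one can check that the data is packed with Developer`PackedArrayQ and build the 0/1 selector with other listable functions. UnitStep is used below as one alternative to Unitize; it relies on the assumption from the question that mask contains only integers from 1 upward, so "not equal to 1" is the same as "at least 2".

```mathematica
mask = RandomInteger[{1, 3}, 1000000];
list = RandomReal[{0, 1}, 1000000];

(* check whether the inputs are stored as packed arrays *)
Developer`PackedArrayQ[mask]
Developer`PackedArrayQ[list]

(* check whether listable arithmetic on the mask stays packed *)
Developer`PackedArrayQ[Unitize[mask - 1]]

(* alternative selector: UnitStep[x] is 1 for x >= 0, so this is 1 where mask >= 2 *)
viaUnitStep = Pick[list, UnitStep[mask - 2], 1];
viaUnitize = Pick[list, Unitize[mask - 1], 1];

(* both selectors should pick the same elements *)
viaUnitStep === viaUnitize
```

If the mask could contain values below 1, the UnitStep form would no longer be equivalent and the Unitize form is the safer choice.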
