Take the 2-minute tour ×
Mathematica Stack Exchange is a question and answer site for users of Mathematica. It's 100% free, no registration required.

Here is a code I wrote that is supposed to give the length of list after some splitting and regrouping. But it is inefficient. Is there a way to rewrite in more professional and efficient way? I guess the procedural loop should be replaced with some functional command. Not sure how though, so as not to overload the memory.

NN=10^9;      
 ll = Range[12];
        k = 0; n = 7;
        AbsoluteTiming@(For[i = 1, i <= NN, i++,

            k += Length[
               Split[Sort[RandomSample[ll, n], Less], #1 - #2 == -1 &]];

            ];)
        k
share|improve this question
2  
What is this code trying to calculate? –  MarcoB 19 hours ago
    
Welcome to Mathematica.SE! I hope you will become a regular contributor. To get started, 1) take the introductory Tour now, 2) when you see good questions and answers, vote them up by clicking the gray triangles, because the credibility of the system is based on the reputation gained by users sharing their knowledge, 3) remember to accept the answer, if any, that solves your problem, by clicking the checkmark sign, and 4) give help too, by answering questions in your areas of expertise. –  bbgodfrey 19 hours ago
1  
@bbgodfrey this a question out of ignorance. Do you have an automated or other way to place these welcome messages (which are very very useful)...I am sorry to be ignorant of this –  ubpdqn 18 hours ago
    
@ubpdqn If I recall correctly, there is a standard welcome message, but I use the one above, because it covers more and (I believe) in a better order. When I happen to see a new member question, I copy this message from my computer and paste it into a comment. I know of no automated way to do this. Too bad. –  bbgodfrey 11 hours ago
    
@bbgodfrey thank you for the information...will try to be as responsible myself in future:) –  ubpdqn 11 hours ago

3 Answers 3

The immediate improvement you wanted was the following, but of course it uses large amounts of memory:

NN = 10^9;
Total@Table[Length[Split[
 Sort[RandomSample[Range[12], 7], Less], #1 - #2 == -1 &]], {i, 1,
 NN}] // AbsoluteTiming

That is, using Table and Total (but it consumes all the memory of my 16GB RAM machine).

However, there are only 792 different subsets of length 7 drawn from Range[1,12]. Therefore, I can give you the exact mean of the estimator you have constructed:

NN = 10^9;
subs = Length@Split[#, #1 - #2 == -1 &] & /@ Subsets[Range[12], {7}];
Mean[subs]*NN

This takes basically no memory. Sorry that this doesn't really answer the general question, but it answers the specific.

share|improve this answer
2  
Probably Subsets[Range[12], {7}] intead of Subsets[Range[12], 7] –  belisarius 19 hours ago
    
I assumed simulation the goal, if not, your latter is the way to do it barring OP detailing the goal and it lends itself to direct probabilistic calculations. +1 –  ciao 18 hours ago
    
Thanks, belisarius - fixed the typo. –  Patrick Stevens 18 hours ago
1  
FWIW in this case the Table result, although it looks more satisfactory is only very marginally faster than the For loop (avoiding memory issues tested with smaller NN like 10^6 ). Ultimately it seems if you want to brute force crunch it out the procedural loop is the way to go. –  george2079 9 hours ago

A quick benchmark (loungebook caveats apply) of the three fastest posts so far for getting the direct result, over list lengths 8-22, taking Ceiling[length/2] at a time:

enter image description here

As can be seen, manipulating subsets to get to the answer, while both vastly faster than the OP, quickly gets untenable (and not much beyond this range becomes impossible - you simply won't have enough RAM/Disk to store the intermediate results).

The speed of the fastest is such that for larger problems, one can use its intermediate results to do an actual complete simulation faster than using the slower two to calculate things, and using it to directly calculate the mean is instantaneous.

Original post follows

Total[RandomChoice[1 + ((Tr@Unitize@Subtract[Differences@#, 1]) & /@ 
                         Subsets[ll, {n}]),NN]]

Accomplishes same thing, over 1000X faster on the loungbook...

But I remain curious - what exactly is the calculating? To what end?

In any case, if you just want the end result (and simulation is not the goal, which I assumed was), Patrick's second answer is the way to go...

It can be sped up quite a bit for large cases:

NN + NN Mean[(Tr@Unitize@Subtract[Differences@#, 1]) & /@ Subsets[ll, {n}]]

And... it appears after cursory investigation, this can be done nearly instantly with nil memory and no need to create subsets. If the end result is what you're after and that's of interest (i.e., this is not about simulation), comment, I'll investigate/polish and post when I've time (it's quite late, I'm off for a bit).

Update:

As alluded to, here's an instantaneous, for all practical purposes, solution to getting the end result. I've assumed the OP is after the total of lengths of realizations specified, using a source list that is a contiguous range of integers (actual range matters not), so given the length of the list (end of range for a {1,2,...} list), size of samples, and number of samples, this returns the value k would have on average (so using 1 for number of samples simply gives the average length.)

This does not suffer from the arrow to the knee of solutions so far: it does not need to create nor parse subsets, so it can handle arbitrary cases, e.g., a Range@100 list with n of 30 would take (assuming one could have sufficient RAM, which you can't) much much much longer than the age of the universe to complete with my second solution, an adaptation of Patrick's that's already 6X or so faster than his on larger cases. This completes in below timer resolution on a loungebook:

get[l_, n_, s_] := 
 With[{j = If[n <= Ceiling[l/2], n, l - n + 1]},
      (s Tr[#*Range@Length@#]/Tr@#) &@(Binomial[l - n + 1,
            Range[1, j]]*Binomial[n - 1, Range[0, j - 1]])];

get[12, 7, 1*^9] // Timing

(* {0., 3500000000} *)

N.b.: If it is a simulation you're after, perhaps to get the distribution of results, this can obviously be adapted to that purpose with negligible impact on performance (basically the overhead to grab samples)...

share|improve this answer
    
beautiful and insightful +1 of course...I had a slight variant with Unitize and Differences but efficiency always eludes me...:) –  ubpdqn 18 hours ago

Exploiting @ciao efficient way of counting the length of splits ( I have upvoted his answer) allows exact calculation of expectation of split length for this setup. The 792 (Binomial[12,7]) cases make it tractable.

tally = Tally[
   6 - ((Tr@Unitize@Subtract[Differences@#, 1]) & /@ 
      Subsets[Range[12], {7}])];
tot = Total@tally[[All, 2]];
prob = ProbabilityDistribution[
  Piecewise[{#2/tot, u == #1} & @@@ tally], {u, 1, 6, 1}]
Expectation[z, z \[Distributed] prob]

yielding 7/2 which the simulations approach.

Noting there can not be length of split >6 given 7 elements and if all 6 elements chosen split seventh chosen must be within 1 of an element. The 6 cases of 6 split:

{{{1, 2}, {4}, {6}, {8}, {10}, {12}}, {{1}, {3, 
   4}, {6}, {8}, {10}, {12}}, {{1}, {3}, {5, 
   6}, {8}, {10}, {12}}, {{1}, {3}, {5}, {7, 
   8}, {10}, {12}}, {{1}, {3}, {5}, {7}, {9, 
   10}, {12}}, {{1}, {3}, {5}, {7}, {9}, {11, 12}}} 
share|improve this answer
    
Use of probability functions makes for smiles, +1. Of course, if the OP is after more than just this particular case, all of our solutions go bonkers (range 100 taken 30 at a time, anyone?). I'm on to something for those, assuming OP comments/updates post with more info. –  ciao 16 hours ago

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.