Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I want to optimize my Apriori algorithm for speed.

from itertools import combinations
import pandas as pd
import numpy as np

trans=pd.read_table('output.txt', header=None,index_col=0)

def apriori(trans, support=0.01, minlen=1):
    ts=pd.get_dummies(trans.unstack().dropna()).groupby(level=1).sum()

    collen, rowlen  =ts.shape

    #-------------Max leng (not used)
    #tssum=ts.sum(axis=1)
    #maxlen=int(tssum.loc[tssum.idxmax()])


    pattern = []
    for cnum in range(minlen, rowlen+1):
        for cols in combinations(ts, cnum):
            patsup = ts[list(cols)].all(axis=1).sum()
            patsup=float(patsup)/collen
            pattern.append([",".join(cols), patsup])
    sdf = pd.DataFrame(pattern, columns=["Pattern", "Support"])
    results=sdf[sdf.Support >= support]

    return results

When you input a dataframe of transactions:

>>> trans
        1  2    3    4
0                     
11      a  b    c  NaN
666     a  d    e  NaN
10101   b  c    d  NaN
1010    a  b    c    d
414147  b  c  NaN  NaN
10101   a  b    d  NaN
1242    d  e  NaN  NaN
101     a  b    c  NaN
411     c  d    e  NaN
444     a  b    c  NaN

[10 rows x 4 columns]

It will yield:

Ap=apriori(trans)
print Ap
>>> 
   Pattern  Support
0        a      0.6
1        b      0.7
2        c      0.7
3        d      0.6
4        e      0.3
5      a,b      0.5
6      a,c      0.4
7      a,d      0.3
8      a,e      0.1
9      b,c      0.6
10     b,d      0.3
12     c,d      0.3
13     c,e      0.1
14     d,e      0.3
15   a,b,c      0.4
16   a,b,d      0.2
18   a,c,d      0.1
20   a,d,e      0.1
21   b,c,d      0.2
24   c,d,e      0.1

I want to know if this can be optimized further so that it can run faster on large datasets. I also want to know if there a way to use purely pandas without combinations from itertools.

share|improve this question
    
You may want to check out pyFIM for an alternative approach: borgelt.net/pyfim.html –  Chris Farrow Apr 17 at 19:42
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged or ask your own question.