Python: how to find values in a dataframe without loop?

Question

I have two dataframes net and M.

net =
        i  j   d
    0   5  3   3 
    1   2  0   2
    2   3  2   1 
    3   4  5   2   
    4   0  1   3
    5   0  3   4


M =
    0    1    2    3    4    5
0   0    3    2    4    1    5 
1   3    0    2    0    3    3 
2   2    2    0    1    1    4 
3   4    0    1    0    3    3     
4   1    3    1    3    0    2
5   5    3    4    3    2    0

I want to find in M the same values of net['d'], choose randomly a cell in M and create a new dataframe containing the coordinate of that cell. For instance

net['d'][0] = 3

so in M I find:

M[0][1]
M[1][0]
M[1][4]
M[1][5]
...

Finally net1 would be something like that

   net1 =
       i1  j1   d1
    0   1   5    3 
    1   5   4    2
    2   2   3    1 
    3   1   2    2   
    4   1   5    3
    5   3   0    4

This what I am doing:

I1 = []
J1 = []
for i in net.index:
    tmp = net['d'][i]
    ds = np.where( M == tmp)
    size = len(ds[0])
    ind = randint(size) ## find two random locations with distance ds
    h = ds[0][ind]
    w = ds[1][ind]
    I1.append(h)
    J1.append(w)
net1 = pd.DataFrame()
net1['i1'] = I1
net1['j1'] = J1
net1['d1'] = net['d']

I am wondering which is the best way to avoid that loop

Ted Petrou · Accepted Answer · 2016-12-09 00:44:31Z

You can stack the columns of M and then just sample it with replacement

net = pd.DataFrame({'i':[5,2,3,4,0,0], 
                    'j':[3,0,2,5,1,3], 
                    'd':[3,2,1,2,3,4]})

M = pd.DataFrame({0:[0,3,2,4,1,5], 
                  1:[3,0,2,0,3,3], 
                  2:[2,2,0,1,1,4],
                  3:[4,0,1,0,3,3],
                  4:[1,3,1,3,0,2],
                  5:[5,3,4,3,2,0]})

def random_net(net, M):
    # make long table and randomize order of rows and rename columns
    net1 = M.stack().reset_index()
    net1.columns =['i1', 'j1', 'd1']

    # get size of each group for random mapping
    net1_id_length = net1.groupby('d1').size()

    # add id column to uniquely identify row in net
    net_copy = net.copy()

    # first map gets size of each group and second gets random integer
    net_copy['id'] = net_copy['d'].map(net1_id_length).map(np.random.randint)
    net1['id'] = net1.groupby('d1').cumcount()

    # make for easy lookup
    net_copy = net_copy.set_index(['d', 'id'])
    net1 = net1.set_index(['d1', 'id'])

    # choose from net1 only those from original net
    return net1.reindex(net_copy.index).reset_index('d').reset_index(drop=True).rename(columns={'d':'d1'})

random_net(net, M)

output

   d1  i1  j1
0   3   5   1
1   2   0   2
2   1   3   2
3   2   1   2
4   3   3   5
5   4   0   3

Timings on 6 million rows

n = 1000000
net = pd.DataFrame({'i':[5,2,3,4,0,0] * n, 
                    'j':[3,0,2,5,1,3] * n, 
                    'd':[3,2,1,2,3,4] * n})

M = pd.DataFrame({0:[0,3,2,4,1,5], 
                  1:[3,0,2,0,3,3], 
                  2:[2,2,0,1,1,4],
                  3:[4,0,1,0,3,3],
                  4:[1,3,1,3,0,2],
                  5:[5,3,4,3,2,0]})

%timeit random_net(net, M)

1 loop, best of 3: 13.7 s per loop

But I would like to get only the values of net['d'] in the same order, so that net['d'] = net['d1'] — emax, Commented Dec 8, 2016 at 20:39
Check the function now. It will give you a random new net the same size as the original in the same order. It's actually not completely correct still - hold on.... — Ted Petrou, Commented Dec 8, 2016 at 20:57
Ok now its good to use. It now does a cross join from the original net to the net formed by stacking table M. Then it chooses randomly 1 new i1, j1 combination for each original value in net['d']. Before it was only choosing the value which is a mistake when there are duplicates in net['d'] — Ted Petrou, Commented Dec 8, 2016 at 21:23
I received an error whene I try to merge the dataframe. nets_merged = temp_net.merge(net1, left_on='d', right_on='d1') I get MemoryError: — emax, Commented Dec 8, 2016 at 21:49

Collectives™ on Stack Overflow

Python: how to find values in a dataframe without loop?

1 Answer 1

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related