1

I have two dataframes net and M.

net =
        i  j   d
    0   5  3   3 
    1   2  0   2
    2   3  2   1 
    3   4  5   2   
    4   0  1   3
    5   0  3   4


M =
    0    1    2    3    4    5
0   0    3    2    4    1    5 
1   3    0    2    0    3    3 
2   2    2    0    1    1    4 
3   4    0    1    0    3    3     
4   1    3    1    3    0    2
5   5    3    4    3    2    0

I want to find in M the same values of net['d'], choose randomly a cell in M and create a new dataframe containing the coordinate of that cell. For instance

net['d'][0] = 3  

so in M I find:

M[0][1]
M[1][0]
M[1][4]
M[1][5]
...

Finally net1 would be something like that

   net1 =
       i1  j1   d1
    0   1   5    3 
    1   5   4    2
    2   2   3    1 
    3   1   2    2   
    4   1   5    3
    5   3   0    4

This what I am doing:

I1 = []
J1 = []
for i in net.index:
    tmp = net['d'][i]
    ds = np.where( M == tmp)
    size = len(ds[0])
    ind = randint(size) ## find two random locations with distance ds
    h = ds[0][ind]
    w = ds[1][ind]
    I1.append(h)
    J1.append(w)
net1 = pd.DataFrame()
net1['i1'] = I1
net1['j1'] = J1
net1['d1'] = net['d']

I am wondering which is the best way to avoid that loop

1 Answer 1

0

You can stack the columns of M and then just sample it with replacement

net = pd.DataFrame({'i':[5,2,3,4,0,0], 
                    'j':[3,0,2,5,1,3], 
                    'd':[3,2,1,2,3,4]})

M = pd.DataFrame({0:[0,3,2,4,1,5], 
                  1:[3,0,2,0,3,3], 
                  2:[2,2,0,1,1,4],
                  3:[4,0,1,0,3,3],
                  4:[1,3,1,3,0,2],
                  5:[5,3,4,3,2,0]})

def random_net(net, M):
    # make long table and randomize order of rows and rename columns
    net1 = M.stack().reset_index()
    net1.columns =['i1', 'j1', 'd1']

    # get size of each group for random mapping
    net1_id_length = net1.groupby('d1').size()

    # add id column to uniquely identify row in net
    net_copy = net.copy()

    # first map gets size of each group and second gets random integer
    net_copy['id'] = net_copy['d'].map(net1_id_length).map(np.random.randint)
    net1['id'] = net1.groupby('d1').cumcount()

    # make for easy lookup
    net_copy = net_copy.set_index(['d', 'id'])
    net1 = net1.set_index(['d1', 'id'])

    # choose from net1 only those from original net
    return net1.reindex(net_copy.index).reset_index('d').reset_index(drop=True).rename(columns={'d':'d1'})

random_net(net, M)

output

   d1  i1  j1
0   3   5   1
1   2   0   2
2   1   3   2
3   2   1   2
4   3   3   5
5   4   0   3

Timings on 6 million rows

n = 1000000
net = pd.DataFrame({'i':[5,2,3,4,0,0] * n, 
                    'j':[3,0,2,5,1,3] * n, 
                    'd':[3,2,1,2,3,4] * n})

M = pd.DataFrame({0:[0,3,2,4,1,5], 
                  1:[3,0,2,0,3,3], 
                  2:[2,2,0,1,1,4],
                  3:[4,0,1,0,3,3],
                  4:[1,3,1,3,0,2],
                  5:[5,3,4,3,2,0]})

%timeit random_net(net, M)

1 loop, best of 3: 13.7 s per loop

7
  • But I would like to get only the values of net['d'] in the same order, so that net['d'] = net['d1']
    – emax
    Commented Dec 8, 2016 at 20:39
  • Check the function now. It will give you a random new net the same size as the original in the same order. It's actually not completely correct still - hold on....
    – Ted Petrou
    Commented Dec 8, 2016 at 20:57
  • That's perfect!
    – emax
    Commented Dec 8, 2016 at 21:03
  • Ok now its good to use. It now does a cross join from the original net to the net formed by stacking table M. Then it chooses randomly 1 new i1, j1 combination for each original value in net['d']. Before it was only choosing the value which is a mistake when there are duplicates in net['d']
    – Ted Petrou
    Commented Dec 8, 2016 at 21:23
  • I received an error whene I try to merge the dataframe. nets_merged = temp_net.merge(net1, left_on='d', right_on='d1') I get MemoryError:
    – emax
    Commented Dec 8, 2016 at 21:49

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.