2

what could be the most efficient way to generate a mxn binary array matrix which is contrained to the sum per column is equal 0 or 1 ?

somethin like this

[[0,0,1,0,0],
[1,1,0,0,0],
[0,0,0,0,1]
[0,0,0,0,0]]

m and n are going to be fixed, but n is larger than 500000 so iteration methods could take a long time until an appropied matrix is found.

1
  • What are the expectations in the row axis? Single value of 1, no values of 1, etc.? Commented Dec 14, 2016 at 20:52

4 Answers 4

5

You are selecting a random subset of the columns, and then for each column, a random row. Here's one way to do that using numpy. The binomial distribution is used to select which columns get a 1. Change the second argument of numpy.random.binomial to tweak the density of columns with a 1.

In [156]: m = 5

In [157]: n = 12

In [158]: a = np.zeros((m, n), dtype=int)

In [159]: cols = np.random.binomial(1, 0.7, size=n)

In [160]: a[np.random.randint(0, m, size=cols.sum()), np.nonzero(cols)[0]] = 1

In [161]: a
Out[161]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]])

If you want a 1 in each column, here's a fairly concise method:

In [103]: m = 5

In [104]: n = 12

In [105]: a = (np.random.randint(0, m, size=n) == np.arange(m).reshape(-1, 1)).astype(int)

In [106]: a
Out[106]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
       [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

np.random.randint(0, m, size=n) is the random selection of row indices, one for each column. np.arange(m).reshape(-1, 1) is the sequence [0, 1, ..., m-1] stored in an array with shape (m, 1). When this is compared to the random values, broadcasting applies, so an boolean array with shape (m, n) is created. Just convert that to integers, and you have your result.

Sign up to request clarification or add additional context in comments.

Comments

1

You can select which column gets a value of 1:

a = numpy.zeros((ysize, xsize))
a[numpy.arange(ysize), numpy.random.choice(numpy.arange(xsize), ysize, replace=False)] = 1

Comments

1
  • First, generate a zero-filled matrix using a list comprehension and list multiplication (even faster) to generate each row. This is the step where you have to be fast. Transposing/copying data would hinder performance.
  • Then loop and choose a number. Between 0 and number of rows * 2 (so it sometimes goes off limits, leaving a 0-filled column). If it's in range, put a 1 there (you could change by (3*nb_rows)//2 to increase the number of ones. This step is fast: memory is already allocated, just mark cells.

code:

import random

m=7
n=5

matrix = [[0] * m for _ in range(n)]
print(matrix)
for i in range(m):
    a = random.randint(0,(3*n)//2)  # 66% chances to get a 1 somewhere
    if a < n:
        matrix[a][i] = 1

print(matrix)

a result:

[[0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
[[0, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0]]

Comments

1

You can use the standard library and a few clever list comprehensions to do this. First, we create the columns, with one in each being 1 or 0 so we satisfy the sum constraint. Then we flip the columns into rows to get the result.

from random import choice, randint

def generate_matrix(m, n):
    # Generate the columns
    columns = []
    for x in range(n):
        column = [0 for y in range(m)]
        column[randint(0, m - 1)] = choice([0, 1])
        columns.append(column)
    # Rotate the columns into rows
    rows = [
        [c[x] for c in columns]
        for x in range(m)
    ]
    return rows

m, n = 5, 4
matrix = generate_matrix(m, n)

Example output:

[0, 1, 0, 0]
[0, 0, 0, 0]
[0, 0, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 0]

2 Comments

but the problem with this formulation is that it is constraining the matrix with a mandatory column equal to 1, it is avoiding the posibility that a column sum could be equal to 0, so in the line column[random.randint(0, n - 1)] = 1 #could be write as: column[random.randint(0, n - 1)] = random.randint(0,1)
Updated. It now chooses 0 or 1.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.