Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

A covering array is a N x k array in which each element is a from a set of v symbols, and for every t columns, every possible set of v^t choices of the symbols appears at least once. The covering array number (CAN) is the smallest N for which a covering array exists, given k, v, and t. A list of known CANs is available here.

I want to parse all of the known CANs from these pages and to find how "efficient" they are - what I mean by this is the ratio of the known N compared to v^t.

I developed Python code that access every covering array page, and parses the tables. I then sort the list of covering arrays by this ratio, and plot it using matplotlib.pyplot (using a log-scale for the y axis).

from urllib.request import urlopen
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

# covering array object
class CAElement:
    def __init__(self, N, t, k, v):
        self.N = N
        self.t = t
        self.k = k
        self.v = v
    def set_ratio(self, ratio):
        self.ratio = ratio
    def __str__(self):
        return "CA(N=%d; t=%d, k=%d, v=%d) -> %f" % (self.N, self.t, self.k, self.v, self.ratio)

CAArray = []

# iterate over v, t in the known table ranges
for v in range(2, 26):
    for t in range(2, 7):
        # download the webpage and find the elements
        url = "http://www.public.asu.edu/~ccolbou/src/tabby/%d-%d-ca.html" % (t, v)
        response = urlopen(url)
        soup = BeautifulSoup(response)
        tables = soup.findChildren('table')
        table = tables[0]
        rows = table.findChildren('tr')

        # iterate over all rows in the one table
        for row in rows:
            cells = row.findChildren('td') # has all of the table's elements
            elements = []
            for cell in cells:
                value = cell.string
                if value is not None and value != "k" and value != "N" and value != "Source":
                        elements.append(value)
            if len(elements) >= 2:
                kParsed = int(elements[0])
                NParsed = int(elements[1])
                element = CAElement(NParsed, t, kParsed, v)
                ratio = element.N / pow(element.v, element.t)
                element.set_ratio(ratio)
                CAArray.append(element)

# sort by N/(v^t)
CAArray.sort(key=lambda x: (x.ratio, x.N, x.v, x.t, x.k), reverse=True)

# print each element (in sorted order)
for element in CAArray:
    print(element)

# plotting - using log scale for y axis
# each point is colored according to t (i.e., the "strength" of the CA)
xs = range(0, len(CAArray))
ys = [y.ratio for y in CAArray]
colors = {2:"red", 3:"blue", 4:"green", 5:"yellow", 6:"orange"}
plt.scatter(xs, ys, c=[colors[x.t] for x in CAArray])
plt.axis([min(xs), max(xs), min(ys), max(ys)])
plt.yscale('log')
plt.show()

This code does exactly what I want it to do. However, there are some problems:

  • Accessing the webpages is somewhat slow, and could be faster.
  • The code does not seem very Pythonic (and by making it so could make iterating over the array much faster).

Any suggestions are welcome!

share|improve this question

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged or ask your own question.