A covering array is a N
x k
array in which each element is a from a set of v
symbols, and for every t
columns, every possible set of v^t
choices of the symbols appears at least once. The covering array number (CAN) is the smallest N
for which a covering array exists, given k
, v
, and t
. A list of known CANs is available here.
I want to parse all of the known CANs from these pages and to find how "efficient" they are - what I mean by this is the ratio of the known N
compared to v^t
.
I developed Python code that access every covering array page, and parses the tables. I then sort the list of covering arrays by this ratio, and plot it using matplotlib.pyplot
(using a log-scale for the y axis).
from urllib.request import urlopen
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
# covering array object
class CAElement:
def __init__(self, N, t, k, v):
self.N = N
self.t = t
self.k = k
self.v = v
def set_ratio(self, ratio):
self.ratio = ratio
def __str__(self):
return "CA(N=%d; t=%d, k=%d, v=%d) -> %f" % (self.N, self.t, self.k, self.v, self.ratio)
CAArray = []
# iterate over v, t in the known table ranges
for v in range(2, 26):
for t in range(2, 7):
# download the webpage and find the elements
url = "http://www.public.asu.edu/~ccolbou/src/tabby/%d-%d-ca.html" % (t, v)
response = urlopen(url)
soup = BeautifulSoup(response)
tables = soup.findChildren('table')
table = tables[0]
rows = table.findChildren('tr')
# iterate over all rows in the one table
for row in rows:
cells = row.findChildren('td') # has all of the table's elements
elements = []
for cell in cells:
value = cell.string
if value is not None and value != "k" and value != "N" and value != "Source":
elements.append(value)
if len(elements) >= 2:
kParsed = int(elements[0])
NParsed = int(elements[1])
element = CAElement(NParsed, t, kParsed, v)
ratio = element.N / pow(element.v, element.t)
element.set_ratio(ratio)
CAArray.append(element)
# sort by N/(v^t)
CAArray.sort(key=lambda x: (x.ratio, x.N, x.v, x.t, x.k), reverse=True)
# print each element (in sorted order)
for element in CAArray:
print(element)
# plotting - using log scale for y axis
# each point is colored according to t (i.e., the "strength" of the CA)
xs = range(0, len(CAArray))
ys = [y.ratio for y in CAArray]
colors = {2:"red", 3:"blue", 4:"green", 5:"yellow", 6:"orange"}
plt.scatter(xs, ys, c=[colors[x.t] for x in CAArray])
plt.axis([min(xs), max(xs), min(ys), max(ys)])
plt.yscale('log')
plt.show()
This code does exactly what I want it to do. However, there are some problems:
- Accessing the webpages is somewhat slow, and could be faster.
- The code does not seem very Pythonic (and by making it so could make iterating over the array much faster).
Any suggestions are welcome!