class BencodeObject:
Putting object inside Class names is generally not a good idea, because it doesn't tell me anything.
def __init__(self,filename):
with open(filename,'rb') as diskbytes:
self.loads(diskbytes)
def _isdictionary(self,bits): return(bits == b"d")
There is no need for the parens around the expression. They just clutter things
def _isnumber(self,bits): return(bits > b"/" and bits < b":")
That's a confusing way to express this. Instead use bits >= b'0' and bits <= '9'
def _iscolon(self,bits): return(bits == b":")
def _isint(self,bits): return(bits == b"i")
def _islist(self,bits): return(bits == b"l")
def _isend(self,bits): return(bits == b"e")
def _isnull(self,bits): return(bits == b"")
Its not clear how useful these functions are. You call them once, and calling them isn't shorter then just checking the string for equality in the first place. Also the pythons style guide recommends using underscores_to_seperate_words. As it is I keep reading isend as i send.
def loads(self,bcbuffer):
tokens = []
null = b""
numbuffer = null # a buffer for storing number bytes
I'm not sure a constant for an empty string is neccessary. If you insist I suggest it should be a global constant.
last = []
Last suggests, well, last. But this is more of a stack. I think it could have a better name
while True:
atom = bcbuffer.read(1)
if self._isdictionary(atom): last.append(atom)
Mixing between on-the-same-line, and on-the-next lines within a string of if-elifs looks ugly and makes it harder to read. I'm generally not a fan of putting the body on the same line as the if expression. But it really doesn't work when you mix both styles.
elif self._isnumber(atom):
numbuffer += atom
elif self._iscolon(atom):
tokens.append(bcbuffer.read(int(numbuffer)))
numbuffer = null
elif self._isint(atom): last.append(atom)
elif self._islist(atom): last.append(atom)
You last.append(atom)
a lot. Consider or
to combine the conditions
elif self._isend(atom):
char = last.pop()
if self._isint(char):
tokens.append(int(numbuffer))
numbuffer = null
else: break
You don't distinguish between finding invalid characters and the end of the file. In both case you just quit. I'd recommend throwing an exception if you find an invalid character.
self.bytestrings = tokens
But the tokens also include numbers. So storing them in bytestrings is kinda odd.
self.trackerURL = str(self.bytestrings[1],"utf-8")
Your loop follows the common pattern of building data structures over several iterations of the loop and then consuming them. Don't do that. It ends up being hard to follow the logic of the function because I've got to figure out what happens over several iterations of the loop. Instead, each iteration of the loop should handle a logical unit.
Here is how I'd approach the problem:
def read_number(bcbuffer, terminator):
read_byte = lambda: bcbuffer.read(1)
bytes_iter = iter(read_byte, terminator)
text = ''.join(bytes_iter)
return int(text)
def parse_tokens(self, bcbuffer):
tokens = []
while True:
atom = bcbuffer.read()
if atom == b'i':
tokens.append( read_number(bcbuffer, b'e') )
elif atom >= b'0' and atom <= b'9':
length = read_number(bcbuffer, b':')
tokens.append( bcbuffer.read(length) )
elif atom in b"lde":
pass # ignore the structure which we don't care about
elif atom == b'':
break
else:
raise IOError('Did not expect %s' % atom)
return tokens
Notice that I don't have a collection of variables to store partial results. Because each iteration of the loop handles a single "token" I don't need to store data other then my result across the iterations. It is also much clearer, I think, what is going on in the parsing.
I've not tested this code in anyway, use at own risk. I've ignored the possiblity of invalid bencodes. read_number will end up in an infinite loop if it cannot find the terminator. non-digits before the terminator will cause int to throw a ValueError. Various methods to handle those exist. I'd probably wrap file.read() to throw an exception when trying to read past the end of the file.