I haven't done much Python programming, but I really like the language, so I've been using it for side projects. The downside is that I don't get the chance to have peers review my code and offer suggestions.
I've been working on a project that scrapes GitHub looking for security vulnerabilities. I've put all of the functions that interact with GitHub's API into a separate file in the project.
import requests
import re
import base64
import os


def getRepos(since=0):
    url = 'http://api.github.com/repositories'
    data = """{
    since: %s
    }""" % since
    response = requests.get(url, data=data)
    if response.status_code == 403:
        print "Problem making request!", response.status_code
        print response.headers
    matches = re.match(r'<.+?>', response.headers['Link'])
    next = matches.group(0)[1:-1]
    return response.json(), next


def getRepo(url):
    response = requests.get(url)
    return response.json()


def getReadMe(url):
    url = url + "/readme"
    response = requests.get(url)
    return response.json()


# todo: return array of all commits so we can examine each one
def getRepoSHA(url):
    # /repos/:owner/:repo/commits
    commits = requests.get(url + "/commits").json()
    return commits[0]['sha']


def getFileContent(item):
    ignoreExtensions = ['jpg']
    filename, extension = os.path.splitext(item['path'])
    if extension in ignoreExtensions:
        return []
    content = requests.get(item['url']).json()
    lines = content['content'].split('\n')
    lines = map(base64.b64decode, lines)
    print 'path', item['path']
    print 'lines', "".join(lines[:5])
    return "".join(lines)


def getRepoContents(url, sha):
    # /repos/:owner/:repo/git/trees/:sha?recursive=1
    url = url + ('/git/trees/%s?recursive=1' % sha)
    # print 'url', url
    response = requests.get(url)
    return response.json()
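As an aside on the Link-header handling in getRepos: re.match only inspects the start of the string, and matches.group(0) raises AttributeError when nothing matched (for example when the header is absent). Below is a minimal standalone sketch of a parser for headers of GitHub's `<url>; rel="next", <url>; rel="last"` shape; note that requests already exposes the same information as response.links, so this helper is illustrative rather than necessary.

```python
def parse_link_header(header):
    """Parse a Link header into a rel -> url dict.

    Returns an empty dict for a missing or empty header
    instead of raising.
    """
    links = {}
    if not header:
        return links
    for part in header.split(','):
        try:
            url_part, rel_part = part.split(';', 1)
        except ValueError:
            # malformed fragment without a rel; skip it
            continue
        url = url_part.strip().lstrip('<').rstrip('>')
        rel = rel_part.strip().replace('rel=', '').strip('"')
        links[rel] = url
    return links
```

With this, the end of getRepos could become `parse_link_header(response.headers.get('Link')).get('next')`, which degrades gracefully instead of crashing.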
The code is run from here:
import github
import json


def processRepoContents(repoContents):
    # for each entry in the repo
    for tree in repoContents['tree']:
        contentType = tree['type']
        print 'contentType --- ', contentType
        # if type is "blob" get the content
        if contentType == 'blob':
            github.getFileContent(tree)
            print '***blob***'
        elif contentType == 'tree':
            # if type is "tree" get the subtree
            print '***tree***'


if __name__ == '__main__':
    repos, next = github.getRepos()
    for repo in repos[0:10]:
        # repoJson = github.getRepo(repo['url'])
        sha = github.getRepoSHA(repo['url'])
        repoJson = github.getRepoContents(repo['url'], sha)
        processRepoContents(repoJson)
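One thing worth noticing in the driver: getRepos returns a `next` URL, but the main block never follows it, so only the first page of repositories is ever seen. A sketch of how the pagination could be driven by that URL is below; `fetch` is a hypothetical callable with getRepos' shape (takes a URL, returns `(repos, next_url_or_None)`), injected so the loop itself stays testable without touching the network.

```python
def iter_repos(fetch, start_url='https://api.github.com/repositories',
               limit=None):
    """Yield repositories one by one, following each page's next URL.

    fetch: callable taking a URL and returning (repos, next_url);
           next_url of None (or '') ends the iteration.
    limit: optional cap on the total number of repos yielded.
    """
    url = start_url
    seen = 0
    while url:
        repos, url = fetch(url)
        for repo in repos:
            yield repo
            seen += 1
            if limit is not None and seen >= limit:
                return
```

The main block could then do `for repo in iter_repos(fetch, limit=10): ...` and the slicing and manual `next` handling disappear.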
I was hoping to get some feedback on whether I'm doing anything that would be considered bad practice.
Also, I have all of these functions in a file called github.py
and I import it with import github
wherever I need it. I'm assuming it would not make sense to wrap these functions in a class, since there is never any state for the class to keep track of that the functions would require. Does this reasoning make sense, or should I wrap these functions in a class?
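For what it's worth, there is one piece of state that often tips this decision: a shared requests.Session carrying default headers and an auth token (GitHub rate-limits unauthenticated requests to 60 per hour). A minimal sketch of that shape follows; the class name, the token parameter, and the session-injection idea are all mine, not part of the original code, and the injected session only needs a `headers` dict plus a `get` method.

```python
class GitHubClient(object):
    """Thin wrapper whose only job is holding shared request state:
    a session with default headers and an optional auth token.
    """
    BASE = 'https://api.github.com'

    def __init__(self, token=None, session=None):
        if session is None:
            # deferred import so the class is usable with a stub session
            import requests
            session = requests.Session()
        self.session = session
        self.session.headers['Accept'] = 'application/vnd.github.v3+json'
        if token:
            self.session.headers['Authorization'] = 'token %s' % token

    def get_json(self, path):
        # accepts either a full URL or a path like '/repositories'
        url = path if path.startswith('http') else self.BASE + path
        return self.session.get(url).json()
```

With this, getRepo, getReadMe, and friends become one-line methods sharing the same authenticated session instead of opening a fresh connection per call.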
If anyone is interested, they can see all of the code in the repo here. I would love to get feedback on the rest of the code (there isn't much more), but I didn't think it would all fit in this question.