Proxy evasion browser in Python

Question

This code works. It is designed to retrieve website data directly, parse it, and open it as normal. I will develop it further later. Right now I want to know if there is any way to improve efficiency, such as not using a file etc. I would also like to know if my code could be made more pythonic (this would include PEP standards), or incidences where concatenation would be acceptable to readability. Here is the code:

import urllib.request
from tkinter import *
import webbrowser
import os

proxy_handler = urllib.request.ProxyHandler(proxies=None)
opener = urllib.request.build_opener(proxy_handler)


def navigate(query):
    response = opener.open(query)
    html = response.read()
    return html

def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines

while True:
    url = input("Path: ")
    dump = navigate(url)

    content = parse(dump)
    with open('cache.html', 'w') as f:
        for line in content:
            f.write(line)

    webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html')

Josay · Accepted Answer · 2014-02-18 16:18:37Z

Your code does seem to follow PEP 8 which is a good thing. Also, the logic is splitted in different small functions which is even better.

If you want to make your code portable, you probably shouldn't hardcode / in your path but use os.sep or even better os.path.join instead.

In order not to call write multiple times, you could use writelines.

Now, from a functional point of view, it might be a good idea to create temporary files with tempfile for instance. If you don't want to use it, it might be better to use a hash of the original url or of the content and use it in the name of the new file.

Thank you, I never would have thought of that stuff. Could you explain more to a n00b why using tempfile would be advantageous to my programs functionality? — Alex Thornton, Feb 18 at 17:39

xv435 · Answer 2 · 2014-02-18 16:24:46Z

up vote 2 down vote

1. Make sure your variable names are both short and to the point, using a parameter called junk is pointless. Also, this could cause conflicts if you ever write other code and import this.

def parse(junk):
    ugly = str(junk)[2:-1]
    lines = ugly.split('\\n')
    return lines

2. urllib is outdated. Unless there is a specific reason you need to use urllib, you should use urllib2. Urllib2 is faster and more simple.

import urllib.request

3. This part looks a bit messy: webbrowser.open_new_tab(os.path.dirname(os.path.abspath(__file__))+'/cache.html') Perhaps separate into multiple lines for readability.

edited Feb 18 at 16:24

answered Feb 18 at 16:18

xv435
1186

Thanks alot, but I get an importerror for urllib2 running python 3.3.2 – Alex Thornton Feb 18 at 17:37

Okay. Maybe ask on StackOverflow about that. – xv435 Feb 18 at 17:38

1

urllib is outdated in Python 2, but in Python 3, urllib was removed and urllib2 was renamed to urllib.request. As such, import urllib.request should not be a problem. – icktoofay Feb 19 at 1:52

Okay. Thanks for the info. – xv435 Feb 20 at 1:58

add comment

asked	4 months ago
viewed	71 times
active	4 months ago

current community

your communities

more stack exchange communities

Proxy evasion browser in Python

2 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged python file-system http or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Proxy evasion browser in Python

2 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python file-system http or ask your own question.

Related

Hot Network Questions