Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am writing a Python script that creates a customized, efficient adblocking hosts file. Basically it checks the user's DNS cache and sees if any entries there are listed in the popular hosts files. If so, it adds those entries to the user's hosts file. The purpose is to keep the hosts file small and avoid adverse effects on browsing speed. I was looking for any input on my overall approach, any ways to make it faster, and any stylistic revisions/suggestions.

import os
import urllib.request
import subprocess
import re
import datetime

# Create a list of ad domains from hosts files found online
def get_ad_dmns(src_urls):
    dmns = set()
    for src in src_urls:
        entries = [line.decode('utf-8') for line in list(urllib.request.urlopen(src))]
        for entry in entries:
            # If hosts file entry is a valid block rule, add domain to list
            if entry.startswith(('0.0.0.0', '127.0.0.1')):
                dmns.add(entry.split()[1])
    return dmns

# Create a list of domains found in the user's DNS cache
def get_dns_dmns():
    dns_cache = subprocess.check_output('ipconfig /displaydns').decode('utf-8')
    # Regex pattern to match domains in the DNS cache
    pattern = '\n\s+(\S+)\r\n\s+-'
    dmns = re.findall(pattern, dns_cache)
    return dmns

# Create a list of domains currently in the user's hosts file
def get_cur_dmns(hosts_dir):
    os.chdir(hosts_dir)
    dmns = set()
    hosts_file = open('hosts', 'r')
    for entry in hosts_file:
        if entry.startswith(('0.0.0.0', '127.0.0.1')):
            dmns.add(entry.split()[1])
    hosts_file.close()
    return dmns

# Write new domains to the hosts file   
def write_hosts_file(dmns, hosts_dir):
    os.chdir(hosts_dir)
    hosts_file = open('hosts', 'a')
    hosts_file.write('\n# Updated: {}\n'.format(datetime.datetime.now()))
    for dmn in dmns:
        hosts_file.write('0.0.0.0 {}\n'.format(dmn))
    hosts_file.close()

def main():
    hosts_dir = 'C:/Windows/System32/drivers/etc'
    srcs = ['http://winhelp2002.mvps.org/hosts.txt',
            'http://someonewhocares.org/hosts/zero/hosts']
    ad_dmns = get_ad_dmns(srcs)
    dns_dmns = get_dns_dmns()
    cur_dmns = get_cur_dmns(hosts_dir)
    dmns_to_add = [dmn for dmn in dns_dmns if dmn in ad_dmns and dmn not in cur_dmns]
    if dmns_to_add:
        print ('Adding {} domain(s)...'.format(len(dmns_to_add)))
        write_hosts_file(dmns_to_add, hosts_dir)

if __name__ == '__main__':
    main()
share|improve this question
2  
This code is a lot more fun to read if you interpret "dmns" as "damnations" instead of "domains". –  200_success Jul 12 at 16:35
    
You're adding only domains that are in the machine's DNS cache to the blocklist. Isn't the point of these blocklists to prevent undesirable sites from being accessed, before the act, not after? –  200_success Jul 12 at 16:47
    
Yeah, this obviously isn't the most robust approach to adblocking. You have to visit the site first, encounter ads, then run the script, and then reload. Also, hosts-file adblocking isn't the most reliable anyway. Nothing really compares to Firefox + ABP. But I was just trying to develop an adblock solution that, in the end, is the most efficient and just covers the sites I use the most (which are less than 1% of the filters in a typical ABP list or online hosts file). –  amt528 Jul 12 at 16:51

1 Answer 1

up vote 6 down vote accepted

Overall, this is pretty good, but all code can be better! A few suggestions:

  • There aren’t any docstrings telling me what a function does, or what it returns. With a small codebase like this, it’s fairly easy for me to just read the code, but it’s a good habit to get into.

  • I’m nitpicking, but module imports should really be alphabetically ordered. It’s not so much of a problem for short scripts like this, but it’s really useful in large codebases.

  • In the get_ad_dmns() function, you create the entries list by iterating over urllib.request.urlopen(src), then immediately iterate over the list. Could you skip creating the list? i.e.

    for line in list(urllib.request.urlopen(src)):
        entry = line.decode('utf-8')
        # rest of the code here
    

    I think that might be a little simpler and more memory efficient.

  • Rather than using dmns as an abbreviation for domains, just use the full name. Characters are cheap, and it makes your code easier to read.

  • In get_cur_dmns(), rather than using open(myfile) ... close(myfile), the more idiomatic construction is:

    with open('hosts'() as hosts_file:
        for entry in hosts_file:
            # do stuff with the entry
    

    Ditto for write_hosts_file().

  • In your main() function, the dmns_to_add list comprehension is just a little hard to read. I’d suggest adding an extra line break for the not in cur_dmns line to make it easier to read.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.