Returning a tuple utilizing this script [closed]

Question

The script below is taken from this site. It doesn't currently work, but I have made it work on my own computer (not currently accessible) by changing what BeautifulSoup looks for.

The script is intended to print info to the console. What I really want, however, is to use it to return a tuple (self.tomatometer, self.audience) (see the function def _process(self)).

What I want to do is pass this script a list of movie titles (in a for loop) and have it return the self.tomatometer and self.audience variables to the caller.

I managed to do this by adding return (self.tomatometer, self.audience) at the end of def _process(self). However, it doesn't seem recommended, and it feels convoluted:

Let's say I call this script convrt.py. This is what I've done:

import convrt
# this is what I'm doing, it's working, but seems weird.
convrt.RottenTomatoesRating("Movie Title Here")._process()
# returns a (self.tomatometer, self.audience) tuple

PyCharm is warning me that I'm accessing a private method of a class. I know there isn't really anything private, but I still think this might not be the best way to have a tuple returned from using this script?

The original script:

#!/usr/bin/env python
# RottenTomatoesRating
# Laszlo Szathmary, 2011 ([email protected])

from BeautifulSoup import BeautifulSoup
import sys
import re
import urllib
import urlparse

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15'

class RottenTomatoesRating:
    # title of the movie
    title = None
    # RT URL of the movie
    url = None
    # RT tomatometer rating of the movie
    tomatometer = None
    # RT audience rating of the movie
    audience = None
    # Did we find a result?
    found = False

    # for fetching webpages
    myopener = MyOpener()
    # Should we search and take the first hit?
    search = True

    # constant
    BASE_URL = 'http://www.rottentomatoes.com'
    SEARCH_URL = '%s/search/full_search.php?search=' % BASE_URL

    def __init__(self, title, search=True):
        self.title = title
        self.search = search
        self._process()

    def _search_movie(self):
        movie_url = ""

        url = self.SEARCH_URL + self.title
        page = self.myopener.open(url)
        result = re.search(r'(/m/.*)', page.geturl())
        if result:
            # if we are redirected
            movie_url = result.group(1)
        else:
            # if we get a search list
            soup = BeautifulSoup(page.read())
            ul = soup.find('ul', {'id' : 'movie_results_ul'})
            if ul:
                div = ul.find('div', {'class' : 'media_block_content'})
                if div:
                    movie_url = div.find('a', href=True)['href']

        return urlparse.urljoin( self.BASE_URL, movie_url )

    def _process(self):
        if not self.search:
            movie = '_'.join(self.title.split())

            url = "%s/m/%s" % (self.BASE_URL, movie)
            soup = BeautifulSoup(self.myopener.open(url).read())
            if soup.find('title').contents[0] == "Page Not Found":
                url = self._search_movie()
        else:
            url = self._search_movie()

        try:
            self.url = url
            soup = BeautifulSoup( self.myopener.open(url).read() )
            self.title = soup.find('meta', {'property' : 'og:title'})['content']
            if self.title: self.found = True

            self.tomatometer = soup.find('span', {'id' : 'all-critics-meter'}).contents[0]
            self.audience = soup.find('span', {'class' : 'meter popcorn numeric '}).contents[0]

            if self.tomatometer.isdigit():
                self.tomatometer += "%"
            if self.audience.isdigit():
                self.audience += "%"
        except:
            pass

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage: %s 'Movie title'" % (sys.argv[0])
    else:
        rt = RottenTomatoesRating(sys.argv[1])
        if rt.found:
            print rt.url
            print rt.title
            print rt.tomatometer
            print rt.audience

Why do you think return (self.tomatometer, self.audience) is convoluted? — SuperBiasedMan, Nov 18 '15 at 16:25
Not that specifically, but the fact that I'm accessing a private method of a class in a script that is actually supposed to print stuff to the console. Assuming I remove the "_" in _process(), would this be a nice, pythonic implementation/modification? — zerohedge, Nov 18 '15 at 16:27
I see now, I wrote an answer suggesting a different approach. Also try to write a title that summarizes what your code does, not what you want to get out of a review. For examples of good titles, check out Best of Code Review 2014 - Best Question Title Category You may also want to read How to get the best value out of Code Review - Asking Questions. — SuperBiasedMan, Nov 18 '15 at 16:51
If I understand you correctly, this is not your script, and you are asking about changes you've made to it? But you've posted the original code (not your code), and are still wondering why your changes (which we can't see) don't work? All in all, I'm sorry to say that this is then off-topic for Code Review as I understand the guidelines. — holroy, Nov 18 '15 at 18:15
@holroy you would've understood better if you had read my OP (my adjustments do work, and I detailed what they were). In any case, thank you for the helpful comment after the question has already been answered! — zerohedge, Nov 18 '15 at 18:18

SuperBiasedMan · Accepted Answer · 2015-11-18 17:05:41Z

I'll answer your question first. Yes, it is backwards to return the value from _process. You made _process private because it's part of __init__. It's just part of creating your object. Instead of returning values from that, I think you should create a __str__ method. This is a magic method which returns a string value so when you call:

print rt

it will print out the string returned from this method. This is used by classes to create human readable outputs of the data in their class, which is exactly what you need.

Yours could look like this:

def __str__(self):
    if self.url:
        return '\n'.join([self.url, self.title, self.tomatometer, self.audience])
    else:
        return "Couldn't find data for " + self.title

This is largely the same as your approach, splitting up the attributes one per line using str.join (which takes a single iterable, hence the list). Note that I replaced your found with a direct test of self.url; see more about that below. I also made a more useful response for when no URL is found, so the user can be sure that something happened rather than getting nothing printed.
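As a minimal sketch of how this behaves, here is a self-contained version that needs no network access. The class DemoRating and its sample values are made up for illustration and are not part of the original script; the code is written to run under both Python 2 and Python 3.

```python
class DemoRating(object):
    """Hypothetical stand-in for RottenTomatoesRating; values are passed in, not scraped."""

    def __init__(self, title, url=None, tomatometer=None, audience=None):
        self.title = title
        self.url = url
        self.tomatometer = tomatometer
        self.audience = audience

    def __str__(self):
        if self.url:
            # str.join accepts exactly one iterable, hence the list.
            return '\n'.join([self.url, self.title, self.tomatometer, self.audience])
        return "Couldn't find data for " + self.title


# Made-up sample data, assumed only for this example.
found = DemoRating("Some Movie", url="http://example.com/m/some_movie",
                   tomatometer="90%", audience="85%")
missing = DemoRating("Unknown Movie")

print(found)    # four lines: url, title, tomatometer, audience
print(missing)  # the fallback message
```

Note that the join would still fail if url were set but one of the ratings stayed None, so a real implementation might want to substitute a placeholder string for missing values.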

Now here's how the class usage would look:

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage: %s 'Movie title'" % (sys.argv[0])
    else:
        print RottenTomatoesRating(sys.argv[1])

Of course you can store it and print rt too if you want.

To get the values directly rather than printing them, it's even easier. You can access any attribute externally with the dot syntax. So you could do this:

rt = RottenTomatoesRating("Movie Title")
result = (rt.tomatometer, rt.audience)
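For the use case in the question (passing a list of titles in a loop and collecting the tuples), the same attribute access could be wrapped in a small helper. This is only a sketch: StubRating, its sample data, and collect_ratings are hypothetical names introduced here so the example runs without touching the network. With the real class you would pass RottenTomatoesRating as rating_cls instead.

```python
class StubRating(object):
    """Hypothetical stand-in for RottenTomatoesRating (no scraping)."""

    # Made-up sample data used only for this illustration.
    _SAMPLE = {"Movie A": ("90%", "85%"), "Movie B": ("40%", "55%")}

    def __init__(self, title):
        self.title = title
        # Attributes stay None when nothing is found, as in the original class.
        self.tomatometer, self.audience = self._SAMPLE.get(title, (None, None))


def collect_ratings(titles, rating_cls=StubRating):
    """Return a list of (tomatometer, audience) tuples, one per title."""
    results = []
    for title in titles:
        rt = rating_cls(title)
        # Only public attributes are read; no private method is called.
        results.append((rt.tomatometer, rt.audience))
    return results


ratings = collect_ratings(["Movie A", "Movie B", "Movie C"])
print(ratings)
```

Because the caller only reads public attributes, nothing private is touched and the PyCharm warning does not come up.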

General notes

You don't need to comment all the different attributes of your class. They're pretty well named, so it's easy to know what they mean. Though I'd leave out found and directly test the relevant attribute on the instance instead (I used self.url in __str__ above; found itself is only ever set from title). It's more direct than keeping a separate flag.

You also have myopener = MyOpener() in the class definition. This makes it a class attribute instead of one specific to an instance of your class. That just means every RottenTomatoesRating object will have the same myopener. If that's what you want, make it clear that it's a class attribute by separating it from the initialised values.
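A small sketch of the difference, using made-up class names (Opener, Shared, and PerInstance are hypothetical and only stand in for MyOpener and RottenTomatoesRating):

```python
class Opener(object):
    """Hypothetical stand-in for MyOpener."""


class Shared(object):
    # Class attribute: created once, when the class body is executed.
    opener = Opener()


class PerInstance(object):
    def __init__(self):
        # Instance attribute: a fresh Opener for every object.
        self.opener = Opener()


a, b = Shared(), Shared()
c, d = PerInstance(), PerInstance()

print(a.opener is b.opener)  # True: both refer to the one class-level object
print(c.opener is d.opener)  # False: each instance built its own
```

Sharing one opener is usually harmless as long as it carries no per-movie state, which is an assumption worth checking for the real opener class.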

Speaking of which, do you realise that you don't need to initialise all the attributes outside of __init__? Defining title and search within __init__ is all you need to do for them. No matter what, __init__ will run, so they will always have values. Now, if you want the other attributes to at least have None as their value, then it is good to keep them, but if you only had them there because you thought it was necessary, then they can be removed. The difference is that if you remove url from there and it never gets set, you wouldn't be able to access rating.url, as it would raise an AttributeError.
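To illustrate that last point, here is a short sketch with two made-up classes (WithDefault and WithoutDefault are hypothetical names, not part of the script):

```python
class WithDefault(object):
    url = None  # class-level default, visible through every instance

    def __init__(self, title):
        self.title = title


class WithoutDefault(object):
    def __init__(self, title):
        self.title = title  # url is never assigned anywhere


print(WithDefault("Some Movie").url)  # None

try:
    WithoutDefault("Some Movie").url
except AttributeError as exc:
    print("AttributeError: %s" % exc)

# getattr with a default avoids the exception when an attribute may be absent.
fallback = getattr(WithoutDefault("Some Movie"), "url", None)
```

An alternative with the same effect is to assign self.url = None inside __init__, which keeps the defaults next to the other instance attributes.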

Thank you very much for taking the time to answer this and educate me. As I said in my OP, this script was actually written by the blog's author, not by me. I've only adjusted it slightly to get what I need from it (a tuple of (self.tomatometer, self.audience)). I do not need anything printed, only this tuple returned when I call this script from another .py file. Since I'm not sure that was clear from my OP (sorry!), I'm not sure __str__ is what I want here, since I don't need to print anything or run this file as an executable. — zerohedge, Nov 18 '15 at 16:55
@zerohedge Ah, I see I misunderstood you. I'll add a note about that. — SuperBiasedMan, Nov 18 '15 at 16:56
Thank you! Just note that the code quoted in my OP does not currently work because BS is targeting the wrong elements. In my modified code, BS targets the correct ones, and a return is added at the end of _process(). Hence my question about its efficiency. — zerohedge, Nov 18 '15 at 17:05
I've seen your edit, thank you very much! I thought about doing it this way, but then it means that I have to delete the last block of code to avoid printing to the console when this is done? And one more question: is one way (mine vs yours) faster than the other? — zerohedge, Nov 18 '15 at 17:07
