Iterating based on list of strings in Python

Question

I have a list of YouTube videoIDs in a text file. The code below is meant to loop through them and fetch the comment feed for each video. Can anyone spot the looping error(s) that I must have made but cannot find?

# Set the videoID list
f = open('video_ids.txt', 'r')
videoID_list = f.read().splitlines()
f.close()

# Cycle through videoID list getting comments via the YouTube API
for video_id in videoID_list:
#Define the comments generator
def comments_generator(yt_service, video_id):
    comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = yt_service.GetYouTubeVideoCommentFeed(next_link.href)

        for comment in comments_generator(yt_service, video_id):

            # About the video
            video_title = entry.media.title.text
            video_date = entry.published.text

            # About comments
            author_name = comment.author[0].name.text
            raw_text = comment.content.text 
            comment_date = comment.published.text

            # Keep only alphanumeric characters and spaces in the comment text
            text = re.sub(r'\W+', ' ', raw_text)

            # Write to a file ('a' means append) - Comment text is set to lowercase [.lower()]
            f = open('video_comments.tsv', 'a')
            f.write("{}\t{}\t{}\t{}\t{}\t{}\t\r".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower()))

            # Also print results on screen - Comment text is set to lowercase [.lower()]
    print("{}\t{}\t{}\t{}\t{}\t{}\t\r".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower()))

What's the problem? What is the expected input and output? Do you get a Traceback? The more info you give the easier it is to help, and the more people will be interested in trying. — Gareth Webber, Apr 22 '13 at 10:16
I am trying to get the code to fetch the comments for all videoIDs in the video_ids.txt file. But the code halts after having fetched comments only for the first video in the txt. Hope that clarifies things. — textnet, Apr 22 '13 at 11:11

mekegi · Answer 1 · 2013-04-22 11:16:20Z

After fixing some bugs in your code:

import gdata.youtube
import gdata.youtube.service
import re

yt_service = gdata.youtube.service.YouTubeService()

# Set the videoID list
f = open('video_ids.txt', 'r')
videoID_list = f.read().splitlines()
f.close()

#Define the comments generator
def comments_generator(yt_service, video_id):
  comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
  while comment_feed is not None:
    for comment in comment_feed.entry:
      yield comment
    next_link = comment_feed.GetNextLink()
    if next_link is None:
      comment_feed = None
    else:
      comment_feed = yt_service.GetYouTubeVideoCommentFeed(next_link.href)

# Open the output file once, before the loops ('a' means append)
f = open('video_comments.tsv', 'a')

# Cycle through videoID list getting comments via the YouTube API
for video_id in videoID_list:

  # About the video (fetched once per video, not once per comment)
  video_entry = yt_service.GetYouTubeVideoEntry(video_id=video_id)
  video_title = video_entry.title.text
  video_date = video_entry.published.text

  for comment in comments_generator(yt_service, video_id):

    # About comments
    author_name = comment.author[0].name.text
    raw_text = comment.content.text
    comment_date = comment.published.text

    # Keep only alphanumeric characters and spaces in the comment text
    text = re.sub(r'\W+', ' ', raw_text)

    # Build one tab-separated row - Comment text is set to lowercase [.lower()]
    row = "{}\t{}\t{}\t{}\t{}\t{}".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower())

    # Write the row to the file, one row per line
    f.write(row + "\n")

    # Also print results on screen
    print(row)

# Close the file only after all videos have been processed
f.close()

edited Apr 22 '13 at 11:16

answered Apr 22 '13 at 10:32


Thanks mekegi! However I am trying to get the code to fetch the comments for all videoIDs in the video_ids.txt file. The version you suggested still stops after having fetched comments for the first video in the text file. Any ideas? – textnet Apr 22 '13 at 11:10

Calling f = open('video_comments.tsv', 'a') inside the "for" loop is a bad idea. Open the file before the loops and close it after them. I edited the code, try again. – mekegi Apr 22 '13 at 11:18
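
To illustrate the point in the comment above, here is a minimal sketch of the open-once pattern, using a `with` block so the file is closed even if an error interrupts the loops. The function name `write_rows`, the sample data, and the temporary output path are hypothetical and only there to make the example self-contained; they are not part of the original code.

```python
import os
import tempfile

def write_rows(path, rows_by_video):
    """Append one tab-separated line per row; the file is opened only once."""
    with open(path, 'a') as out:              # opened before the loops
        for video_id, rows in rows_by_video.items():
            for row in rows:
                out.write("\t".join([video_id] + list(row)) + "\n")
    # Leaving the with block closes the file, after both loops are done

# Hypothetical sample data: video ID -> list of (author, text) rows
sample = {
    "xxxxx": [("alice", "first")],
    "yyyyy": [("bob", "second"), ("carol", "third")],
}
demo_path = os.path.join(tempfile.mkdtemp(), "video_comments.tsv")
write_rows(demo_path, sample)
```

Compared with calling open() on every iteration, this creates a single file handle and releases it reliably.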

Thanks for helping out! But sorry. Still only gets comments for first video. To me it seems reasonable that 'for video_id in videoID_list' would read each videoID in that textfile, and then repeat the routine for getting all comments for all of them, appending to the outfile. But no success... – textnet Apr 22 '13 at 15:44

In video_ids.txt, every video_id needs to be followed by a newline, for example "xxxxx\nyyyyy". – mekegi Apr 23 '13 at 7:16
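
A quick way to check the file format mentioned above is to look at what the parsing step actually yields. The sketch below is only an illustration: `parse_video_ids` is a hypothetical helper, not something from the original code, and it additionally strips whitespace and drops blank lines, which the original `f.read().splitlines()` does not do.

```python
def parse_video_ids(text):
    """Return the non-empty, whitespace-stripped lines of text."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# One ID per line: two IDs come back
print(parse_video_ids("xxxxx\nyyyyy\n"))

# IDs separated by a space instead of a newline: one bogus ID comes back,
# so the outer loop would only run once
print(parse_video_ids("xxxxx yyyyy"))

# Windows line endings, a blank line, and a trailing space are tolerated
print(parse_video_ids("xxxxx\r\n\r\nyyyyy "))
```

If the list printed for the real file has only one element, the loop over videoID_list can only ever process one video.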

I got it to work now. The problem somehow seemed to be with getting both "video_entry." and "comment." data from the API. If choosing only one type, the loop will work as desired. – textnet Apr 23 '13 at 11:17
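
The thread does not establish why combining the two kinds of API data stopped the loop. One possibility, and it is only an assumption, is that a request raised an error for one video and ended the run. The sketch below shows a way to keep the loop going in that case: video data is fetched once per video, and a failure on one video is recorded rather than aborting the others. `collect_rows`, `fake_entry`, `fake_comments`, and `FAKE_COMMENTS` are hypothetical stand-ins for the real API calls, used so the example runs on its own.

```python
def collect_rows(video_ids, fetch_entry, fetch_comments):
    """Pair each comment with its video's title, skipping videos that fail.

    fetch_entry and fetch_comments are hypothetical callables standing in
    for the API requests.
    """
    rows, failed = [], []
    for video_id in video_ids:
        try:
            title = fetch_entry(video_id)              # once per video
            for comment in fetch_comments(video_id):
                rows.append((video_id, title, comment))
        except Exception as exc:                       # record and continue
            failed.append((video_id, str(exc)))
    return rows, failed

# Stand-in data and fetchers, for illustration only
FAKE_COMMENTS = {"xxxxx": ["nice"], "zzzzz": ["great", "thanks"]}

def fake_entry(video_id):
    if video_id not in FAKE_COMMENTS:
        raise KeyError("no such video: " + video_id)
    return "Title of " + video_id

def fake_comments(video_id):
    return iter(FAKE_COMMENTS[video_id])

rows, failed = collect_rows(["xxxxx", "yyyyy", "zzzzz"], fake_entry, fake_comments)
print(len(rows), [video_id for video_id, _ in failed])
```

Printing the `failed` list after a real run would show which video, if any, caused the loop to stop early.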


asked	1 year ago
viewed	61 times
active	1 year ago
