With a list of YouTube videoIDs in a text file, the code below aims to loop through these while getting the comment feeds from all these videos. Could anyone spot the looping error(s) I must have made, but cannot find?

# Set the videoID list
f = open('video_ids.txt', 'r')
videoID_list = f.read().splitlines()
f.close()

# Cycle through videoID list getting comments via the YouTube API
for video_id in videoID_list:
#Define the comments generator
def comments_generator(yt_service, video_id):
    comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = yt_service.GetYouTubeVideoCommentFeed(next_link.href)

        for comment in comments_generator(yt_service, video_id):

            # About the video
            video_title = entry.media.title.text
            video_date = entry.published.text

            # About comments
            author_name = comment.author[0].name.text
            raw_text = comment.content.text 
            comment_date = comment.published.text

            # Keep only alphanumeric characters and spaces in the comment text
            text = re.sub(r'\W+', ' ', raw_text)

            # Write to a file ('a' means append) - Comment text is set to lowercase [.lower()]
            f = open('video_comments.tsv', 'a')
            f.write("{}\t{}\t{}\t{}\t{}\t{}\t\r".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower()))

            # Also print results on screen - Comment text is set to lowercase [.lower()]
    print("{}\t{}\t{}\t{}\t{}\t{}\t\r".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower()))
What's the problem? What is the expected input and output? Do you get a Traceback? The more info you give the easier it is to help, and the more people will be interested in trying. –  Gareth Webber Apr 22 '13 at 10:16
    
I am trying to get the code to fetch the comments for all videoIDs in the video_ids.txt file. But the code halts after having fetched comments only for the first video in the txt. Hope that clarifies things. –  textnet Apr 22 '13 at 11:11

1 Answer

After fixing some bugs in your code:

import gdata.youtube
import gdata.youtube.service
import re

yt_service = gdata.youtube.service.YouTubeService()

# Set the videoID list
f = open('video_ids.txt', 'r')
videoID_list = f.read().splitlines()
f.close()

#Define the comments generator
def comments_generator(yt_service, video_id):
  comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
  while comment_feed is not None:
    for comment in comment_feed.entry:
      yield comment
    next_link = comment_feed.GetNextLink()
    if next_link is None:
      comment_feed = None
    else:
      comment_feed = yt_service.GetYouTubeVideoCommentFeed(next_link.href)

# Open the output file once, before the loops ('a' means append)
f = open('video_comments.tsv', 'a')

# Cycle through videoID list getting comments via the YouTube API
for video_id in videoID_list:

  # About the video (fetched once per video, not once per comment)
  video_entry = yt_service.GetYouTubeVideoEntry(video_id=video_id)
  video_title = video_entry.title.text
  video_date = video_entry.published.text

  for comment in comments_generator(yt_service, video_id):

    # About comments
    author_name = comment.author[0].name.text
    raw_text = comment.content.text
    comment_date = comment.published.text

    # Replace each run of non-word characters in the comment text with a single space
    text = re.sub(r'\W+', ' ', raw_text)

    # Build the output row - Comment text is set to lowercase [.lower()]
    row = "{}\t{}\t{}\t{}\t{}\t{}\t\r".format(video_title, video_date[:10], comment_date[:10], comment_date[11:19], author_name, text.lower())

    # Write to the file
    f.write(row)

    # Also print results on screen
    print(row)

# Close the file after all videos have been processed
f.close()
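
The row-building step can be pulled out into a small helper so it can be checked without calling the API. This is a minimal sketch and not part of the original answer: `clean_text` and `format_row` are hypothetical names, treating a missing comment body (`None`) as an empty string is an assumption, and the row is returned without a line terminator. Note that `\W` keeps underscores as well as letters and digits.

```python
import re


def clean_text(raw_text):
    """Collapse runs of non-word characters into single spaces and lowercase."""
    if raw_text is None:  # assumption: a comment may come back without a body
        return ""
    return re.sub(r"\W+", " ", raw_text).strip().lower()


def format_row(video_title, video_date, comment_date, author_name, raw_text):
    """Build one tab-separated row.

    Timestamps are assumed to look like 2013-04-22T10:16:00.000Z.
    """
    fields = [
        video_title,
        video_date[:10],      # date part of the video timestamp
        comment_date[:10],    # date part of the comment timestamp
        comment_date[11:19],  # time part of the comment timestamp
        author_name,
        clean_text(raw_text),
    ]
    return "\t".join(fields)
```

Inside the loop, the result of `format_row(...)` plus a line ending could then be passed to both `f.write` and `print`.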
    
Thanks mekegi! However I am trying to get the code to fetch the comments for all videoIDs in the video_ids.txt file. The version you suggested still stops after having fetched comments for the first video in the text file. Any ideas? –  textnet Apr 22 '13 at 11:10
    
Calling f = open('video_comments.tsv', 'a') inside the "for" loop is a bad idea. Open the file before the loops and close it after them. I edited the code, try again. –  mekegi Apr 22 '13 at 11:18
    
Thanks for helping out! But sorry. Still only gets comments for first video. To me it seems reasonable that 'for video_id in videoID_list' would read each videoID in that textfile, and then repeat the routine for getting all comments for all of them, appending to the outfile. But no success... –  textnet Apr 22 '13 at 15:44
    
In video_ids.txt, every video_id needs to be followed by a newline, for example "xxxxx\nyyyyy". –  mekegi Apr 23 '13 at 7:16
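
To rule out the input file as the cause, the ID list can be read defensively. This is only a sketch of that check: `parse_video_ids` is a hypothetical helper, and it assumes one ID per line with possible blank lines or stray whitespace.

```python
def parse_video_ids(lines):
    """Return the non-empty IDs from an iterable of lines, one ID per line."""
    ids = []
    for line in lines:
        video_id = line.strip()  # drops "\n", "\r\n" and stray spaces
        if video_id:             # skip blank lines
            ids.append(video_id)
    return ids
```

It could be called as `parse_video_ids(open('video_ids.txt'))`; printing the length of the result shows whether more than one ID was actually read.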
    
I got it to work now. The problem somehow seemed to be with getting both "video_entry." and "comment." data from the API. If choosing only one type, the loop will work as desired. –  textnet Apr 23 '13 at 11:17
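
The thread does not establish why combining the two kinds of request stopped the loop. One way to narrow it down is to isolate failures per video, so that an error on one ID is recorded instead of ending the whole run. The sketch below assumes two hypothetical callables, `fetch_entry` and `fetch_comments`, standing in for the API calls and returning plain dicts rather than gdata objects.

```python
def collect_rows(video_ids, fetch_entry, fetch_comments):
    """Gather (title, author, text) rows, recording failures per video.

    fetch_entry and fetch_comments are hypothetical stand-ins for the API
    calls; here they return plain dicts rather than gdata objects.
    """
    rows, failed = [], []
    for video_id in video_ids:
        try:
            entry = fetch_entry(video_id)  # one metadata request per video
            for comment in fetch_comments(video_id):
                rows.append((entry["title"], comment["author"], comment["text"]))
        except Exception as exc:  # broad on purpose: this is a diagnostic sketch
            failed.append((video_id, str(exc)))
    return rows, failed
```

Inspecting `failed` afterwards shows which video IDs raised an error and with what message, which is more informative than a loop that silently stops after the first video.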
