Tagged Questions
-7 votes · 0 answers · 61 views
Looking to build a web-scraper that searches Google for emails [closed]
I'm looking to build or use a preexisting web scraper to search Google for emails. For example, I have an Excel spreadsheet with a list of emails and I'm trying to find their owners or who they may be ...
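Once page text is in hand, pulling out the addresses is a regular-expression job. A minimal sketch (the pattern is a deliberate simplification, not a full RFC 5322 matcher, and the sample text is invented):

```python
import re

# Simplified email pattern -- good enough for scraped page text,
# not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return the unique email addresses found in a block of text."""
    return sorted(set(EMAIL_RE.findall(text)))

page = "Contact alice@example.com or bob@example.org for details."
print(extract_emails(page))  # ['alice@example.com', 'bob@example.org']
```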
0 votes · 0 answers · 20 views
Check whether a link redirects to a standard page using Python urllib2
I am trying to loop over a set of links and check which ones are valid and which redirect to some standard page
import urllib2
import csv
i=18509
yyy = ...
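One way to detect this, sketched with `urllib.request` (urllib2's Python 3 successor; the question itself uses Python 2): `urlopen` follows redirects, and `response.geturl()` reports the URL it ended up at, so comparing that to the URL you asked for reveals a redirect.

```python
from urllib.request import urlopen
from urllib.error import URLError

def was_redirected(requested_url, final_url):
    """Pure helper: True if the server sent us somewhere else."""
    return requested_url.rstrip('/') != final_url.rstrip('/')

def check_link(url, timeout=10):
    """Fetch url; return (ok, final_url). ok is False when the
    request redirected elsewhere or failed outright."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return (not was_redirected(url, resp.geturl()), resp.geturl())
    except URLError:
        return (False, None)
```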
1 vote · 1 answer · 18 views
Scrape data from a website that loads the next page when scrolled to the bottom, using Python and BeautifulSoup
If I need to scrape data from a website that loads the next page automatically when one scrolls to the bottom of the page (i.e. the page extends endlessly) using Python and BeautifulSoup, how can I do that? Is ...
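Infinite-scroll pages usually fetch each new chunk from a background XHR endpoint; you can find it in the browser's Network tab while scrolling, then request those pages directly instead of rendering JavaScript. A sketch of building the successive URLs (the endpoint and parameter names here are hypothetical, the real ones come from the Network tab):

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- substitute the URL the page's own
# JavaScript requests while you scroll.
BASE = "http://example.com/api/items"

def page_url(page, per_page=50):
    """Build the URL the page's JavaScript would request for one chunk."""
    return BASE + "?" + urlencode({"page": page, "per_page": per_page})

urls = [page_url(p) for p in range(1, 4)]
print(urls[0])  # http://example.com/api/items?page=1&per_page=50
```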
0 votes · 1 answer · 32 views
Problems with BeautifulSoup and lxml parser
I noticed a strange behavior when scraping some webpages using BeautifulSoup 4.1.0 and the lxml parser. The built-in html.parser didn't work for the webpage I was trying to scrape and I decided to use ...
0 votes · 1 answer · 29 views
Alter BeautifulSoup code to extract words
Here is some code I found online that gets prices (i.e. decimals) from websites. I need to alter this code so it returns a string rather than a decimal.
from bs4 import BeautifulSoup
import ...
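Code like this typically converts its matches with `float()`; since `re.findall` (and BeautifulSoup's `get_text`) already return strings, the change is simply to skip the conversion. A sketch, with the price pattern and sample text invented for illustration:

```python
import re

# Matches prices like 19.99 -- adjust to the page's actual format.
PRICE_RE = re.compile(r"\d+\.\d{2}")

def find_prices(text, as_float=False):
    """Return prices found in text -- as strings by default."""
    matches = PRICE_RE.findall(text)  # findall already yields strings
    return [float(m) for m in matches] if as_float else matches

sample = "Widget $19.99, Gadget $5.00"
print(find_prices(sample))                 # ['19.99', '5.00']
print(find_prices(sample, as_float=True))  # [19.99, 5.0]
```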
0 votes · 1 answer · 31 views
Web scraping using urllib (maybe also BeautifulSoup)
Website I'm scraping from: link
The tags I want to parse between: START - <p id="p-1">, FINISH - </p>
My code:
from urllib import urlopen
from bs4 import BeautifulSoup
import re
html = ...
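With BeautifulSoup this is `soup.find('p', id='p-1').get_text()`. A standard-library-only equivalent using `html.parser` (so it runs even without bs4 installed; the sample HTML is invented) might look like:

```python
from html.parser import HTMLParser

class ParaExtractor(HTMLParser):
    """Collect the text between <p id="p-1"> and its closing </p>."""
    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("id") == "p-1":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)

def text_of_p1(html):
    parser = ParaExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()

sample = '<div><p id="p-0">skip</p><p id="p-1">keep this</p></div>'
print(text_of_p1(sample))  # keep this
```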
0 votes · 1 answer · 36 views
Python scraper mechanize/javascript
I have to scrape all info for former US governors from this site. However, to read out the results and then follow the links, I need to access the different results pages, or, preferably, simply set ...
0 votes · 1 answer · 35 views
Remote server performance when scraping
I have described the following problem in other questions, but this time my question concerns server performance, so I decided to ask a new question.
I try to run the spider below. It only has to go ...
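To keep load on the remote server down, Scrapy offers the `DOWNLOAD_DELAY` setting and the AutoThrottle extension. The same idea hand-rolled, as a sketch: enforce a minimum interval between consecutive requests.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last = None

    def wait(self):
        now = time.monotonic()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.monotonic()

throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()   # the actual fetch would go here
elapsed = time.monotonic() - start  # >= ~0.2s: two enforced gaps
```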
0 votes · 1 answer · 35 views
Web scraping with Python, but values are empty
I'd like to grab values from this site: http://cdn.ime-co.ir/ with BeautifulSoup, but the values are empty when I try to import the tables. I suspect they are rendered with JavaScript, or something else I'm not aware of.
...
3 votes · 2 answers · 73 views
Speed up web scraper
I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but managed to write a spider that does the job. It is, however, really slow (it ...
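Scraping is usually I/O-bound, so the big win is overlapping downloads; in Scrapy that is the `CONCURRENT_REQUESTS` setting. The effect, illustrated with a `ThreadPoolExecutor` and a stand-in fetch function (sleeping instead of downloading, since no real network is assumed):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for a network fetch: sleeps instead of downloading."""
    time.sleep(0.05)
    return url.upper()

urls = [f"http://example.com/{i}" for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
parallel_time = time.monotonic() - start
# 8 overlapping 0.05s "fetches" finish in roughly 0.05s, not 0.4s
```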
2 votes · 1 answer · 32 views
Python urllib.urlopen(url) returns outdated data
I use Python's urllib library to check one webpage for updates every 5 seconds.
But after running the program for a few hours, it seems that urllib.urlopen(url) just returns outdated data. It usually ...
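Stale responses like this often come from an intermediate cache rather than the server itself. One common workaround, sketched below: append a throwaway timestamp parameter so the cached URL never matches, and send `Cache-Control: no-cache` as well.

```python
import time
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse
from urllib.request import Request

def cache_busted(url):
    """Append a throwaway timestamp parameter so caches can't match the URL."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("_", str(int(time.time() * 1000))))
    return urlunparse(parts._replace(query=urlencode(query)))

def fresh_request(url):
    """Request object that also asks caches not to serve a stored copy."""
    return Request(cache_busted(url),
                   headers={"Cache-Control": "no-cache",
                            "Pragma": "no-cache"})
```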
2 votes · 1 answer · 42 views
How do I use Scrapy to crawl within pages?
I am using Python and Scrapy for this question.
I am attempting to crawl webpage A, which contains a list of links to webpages B1, B2, B3, ... Each B page contains a link to another page, C1, C2, C3, ...
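The pattern is to parse A, follow each B link, then follow each B's link to its C; in Scrapy every hop is a `Request` with a callback. The chaining itself, reduced to a toy in-memory "site" (page contents invented for illustration):

```python
from html.parser import HTMLParser

# Toy site: page name -> html. A real spider would fetch over HTTP,
# yielding scrapy.Request(url, callback=...) for each hop.
SITE = {
    "A":  '<a href="B1">b1</a><a href="B2">b2</a>',
    "B1": '<a href="C1">c1</a>',
    "B2": '<a href="C2">c2</a>',
    "C1": "data one",
    "C2": "data two",
}

class Links(HTMLParser):
    """Collect href attributes from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs)["href"])

def links_of(page):
    parser = Links()
    parser.feed(SITE[page])
    return parser.hrefs

# A -> every B -> the single C each B links to
c_pages = [links_of(b)[0] for b in links_of("A")]
c_data = [SITE[c] for c in c_pages]
print(c_data)  # ['data one', 'data two']
```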
-1 votes · 2 answers · 21 views
Python urllib2 module error
Can somebody tell me what:
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
exactly means? Could not find it in the documentation.
Does it say that the URL is ...
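Errno -3 is `EAI_AGAIN`: the hostname lookup (DNS) failed temporarily, typically because the machine had no working network or resolver at that moment; it says nothing about the URL itself being wrong. A small sketch that classifies this case (the errno values are the glibc `getaddrinfo` codes):

```python
from urllib.error import URLError

def is_dns_failure(err):
    """True when a URLError was caused by a failed hostname lookup.

    Errno -3 (EAI_AGAIN, 'Temporary failure in name resolution') and
    -2 (EAI_NONAME, unknown host) come from the resolver, i.e. DNS --
    the URL itself may be perfectly fine.
    """
    reason = getattr(err, "reason", None)
    return isinstance(reason, OSError) and reason.errno in (-2, -3)

err = URLError(OSError(-3, "Temporary failure in name resolution"))
print(is_dns_failure(err))  # True
```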
0 votes · 1 answer · 31 views
Problems with character-encoding when webscraping with scrapy
I have a problem with the encoding of text I am scraping from a website. Specifically, the Danish letters æ, ø, and å are coming out wrong. I feel confident that the encoding of the webpage is ...
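This symptom usually means UTF-8 bytes were decoded as Latin-1 (check `response.encoding` in Scrapy against the page's declared charset). The mojibake, and its repair, in miniature; a round-trip sketch, not Scrapy-specific:

```python
# Danish text encoded as UTF-8 but decoded as Latin-1 produces the
# classic two-characters-per-letter mojibake; re-encoding with the
# wrong codec and decoding with the right one repairs it.
original = "æøå"
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)   # Ã¦Ã¸Ã¥
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)   # æøå
```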
0 votes · 1 answer · 57 views
Can't follow links when web-scraping
I realize that others have covered similar topics, but having read these posts, I still can't solve my problem.
I am using Scrapy to write a crawl spider that should scrape search results pages. One ...
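Following links from a results page comes down to extracting hrefs and resolving them against the page URL; Scrapy's `CrawlSpider` does this with a `LinkExtractor` and `Rule`s. A standard-library sketch of the same step (the sample page and URLs are invented):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather absolute hrefs -- roughly what a LinkExtractor does."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

page = '<a href="/page2">next</a> <a href="http://other.example/x">x</a>'
print(extract_links(page, "http://site.example/results"))
# ['http://site.example/page2', 'http://other.example/x']
```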