Tagged Questions
-7 votes · 0 answers · 61 views
Looking to build a web-scraper that searches Google for emails [closed]
I'm looking to build or use a preexisting web scraper to search Google for emails. For example, I have an Excel spreadsheet with a list of emails and I'm trying to find their owners or who they may be ...
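Once page text is in hand, pulling out the addresses is a regular-expression job. A minimal sketch (the pattern is a deliberate simplification, not a full RFC 5322 matcher, and the sample text is invented):

```python
import re

# Simplified email pattern -- good enough for scraped page text,
# not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return the unique email addresses found in a block of text."""
    return sorted(set(EMAIL_RE.findall(text)))

page = "Contact alice@example.com or bob@example.org for details."
print(extract_emails(page))  # ['alice@example.com', 'bob@example.org']
```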
0 votes · 0 answers · 20 views
Check whether a link redirects to a standard page using Python urllib2
I am trying to loop over a set of links and check which ones are valid and which redirect to some standard page
import urllib2
import csv
i=18509
yyy = ...
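One way to detect this, sketched with `urllib.request` (urllib2's Python 3 successor; the question itself uses Python 2): `urlopen` follows redirects, and `response.geturl()` reports the URL it ended up at, so comparing that to the URL you asked for reveals a redirect.

```python
from urllib.request import urlopen
from urllib.error import URLError

def was_redirected(requested_url, final_url):
    """Pure helper: True if the server sent us somewhere else."""
    return requested_url.rstrip('/') != final_url.rstrip('/')

def check_link(url, timeout=10):
    """Fetch url; return (ok, final_url). ok is False when the
    request redirected elsewhere or failed outright."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return (not was_redirected(url, resp.geturl()), resp.geturl())
    except URLError:
        return (False, None)
```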
1 vote · 1 answer · 18 views
Scrape data from a website that loads the next page when scrolled to the bottom, using Python and BeautifulSoup
If I need to scrape data from a website that loads the next page automatically when one scrolls to the bottom of the page (i.e. the page extends endlessly) using Python and BeautifulSoup, how can I do that? Is ...
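Infinite-scroll pages usually fetch each new chunk from a background XHR endpoint; you can find it in the browser's Network tab while scrolling, then request those pages directly instead of rendering JavaScript. A sketch of building the successive URLs (the endpoint and parameter names here are hypothetical, the real ones come from the Network tab):

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- substitute the URL the page's own
# JavaScript requests while you scroll.
BASE = "http://example.com/api/items"

def page_url(page, per_page=50):
    """Build the URL the page's JavaScript would request for one chunk."""
    return BASE + "?" + urlencode({"page": page, "per_page": per_page})

urls = [page_url(p) for p in range(1, 4)]
print(urls[0])  # http://example.com/api/items?page=1&per_page=50
```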
0 votes · 1 answer · 32 views
Problems with BeautifulSoup and lxml parser
I noticed a strange behavior when scraping some webpages using BeautifulSoup 4.1.0 and the lxml parser. The built-in html.parser didn't work for the webpage I was trying to scrape and I decided to use ...
0 votes · 1 answer · 29 views
Alter BeautifulSoup code to extract words
Here is some code I found online that gets prices (i.e. decimals) from websites. I need to alter this code so it returns a string rather than a decimal.
from bs4 import BeautifulSoup
import ...
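Code like this typically converts its matches with `float()`; since `re.findall` (and BeautifulSoup's `get_text`) already return strings, the change is simply to skip the conversion. A sketch, with the price pattern and sample text invented for illustration:

```python
import re

# Matches prices like 19.99 -- adjust to the page's actual format.
PRICE_RE = re.compile(r"\d+\.\d{2}")

def find_prices(text, as_float=False):
    """Return prices found in text -- as strings by default."""
    matches = PRICE_RE.findall(text)  # findall already yields strings
    return [float(m) for m in matches] if as_float else matches

sample = "Widget $19.99, Gadget $5.00"
print(find_prices(sample))                 # ['19.99', '5.00']
print(find_prices(sample, as_float=True))  # [19.99, 5.0]
```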
0 votes · 1 answer · 31 views
Web scraping using urllib (maybe also BeautifulSoup)
Website I'm scraping from: link
The tags I want to parse between: START - <p id="p-1">, FINISH - </p>
My code:
from urllib import urlopen
from bs4 import BeautifulSoup
import re
html = ...
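With BeautifulSoup this is `soup.find('p', id='p-1').get_text()`. A standard-library-only equivalent using `html.parser` (so it runs even without bs4 installed; the sample HTML is invented) might look like:

```python
from html.parser import HTMLParser

class ParaExtractor(HTMLParser):
    """Collect the text between <p id="p-1"> and its closing </p>."""
    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("id") == "p-1":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)

def text_of_p1(html):
    parser = ParaExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()

sample = '<div><p id="p-0">skip</p><p id="p-1">keep this</p></div>'
print(text_of_p1(sample))  # keep this
```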
0 votes · 1 answer · 36 views
Python scraper mechanize/javascript
I have to scrape all info for former US governors from this site. However, to read out the results and then follow the links, I need to access the different results pages, or, preferably, simply set ...
0 votes · 1 answer · 35 views
Remote server performance when scraping
I have described the following problem in other questions, but this time my question concerns server performance, so I decided to ask a new question.
I try to run the spider below. It only has to go ...
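To keep load on the remote server down, Scrapy offers the `DOWNLOAD_DELAY` setting and the AutoThrottle extension. The same idea hand-rolled, as a sketch: enforce a minimum interval between consecutive requests.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last = None

    def wait(self):
        now = time.monotonic()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.monotonic()

throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()   # the actual fetch would go here
elapsed = time.monotonic() - start  # >= ~0.2s: two enforced gaps
```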
0 votes · 1 answer · 35 views
Web scraping with Python, but values are empty
I'd like to grab values from this site: http://cdn.ime-co.ir/ with BeautifulSoup, but the values are empty when I try to import the tables. I suspect they are rendered with JavaScript, or something else I'm not aware of.
...
3 votes · 2 answers · 73 views
Speed up web scraper
I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but managed to write a spider that does the job. It is, however, really slow (it ...
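Scraping is usually I/O-bound, so the big win is overlapping downloads; in Scrapy that is the `CONCURRENT_REQUESTS` setting. The effect, illustrated with a `ThreadPoolExecutor` and a stand-in fetch function (sleeping instead of downloading, since no real network is assumed):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for a network fetch: sleeps instead of downloading."""
    time.sleep(0.05)
    return url.upper()

urls = [f"http://example.com/{i}" for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
parallel_time = time.monotonic() - start
# 8 overlapping 0.05s "fetches" finish in roughly 0.05s, not 0.4s
```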
2 votes · 1 answer · 32 views
Python urllib.urlopen(url) returns outdated data
I use Python's urllib library to check one webpage for updates every 5 seconds.
But after running the program for a few hours, it seems that urllib.urlopen(url) just returns outdated data. It usually ...
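Stale responses like this often come from an intermediate cache rather than the server itself. One common workaround, sketched below: append a throwaway timestamp parameter so the cached URL never matches, and send `Cache-Control: no-cache` as well.

```python
import time
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse
from urllib.request import Request

def cache_busted(url):
    """Append a throwaway timestamp parameter so caches can't match the URL."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("_", str(int(time.time() * 1000))))
    return urlunparse(parts._replace(query=urlencode(query)))

def fresh_request(url):
    """Request object that also asks caches not to serve a stored copy."""
    return Request(cache_busted(url),
                   headers={"Cache-Control": "no-cache",
                            "Pragma": "no-cache"})
```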
2 votes · 1 answer · 42 views
How do I use Scrapy to crawl within pages?
I am using Python and Scrapy for this question.
I am attempting to crawl webpage A, which contains a list of links to webpages B1, B2, B3, ... Each B page contains a link to another page, C1, C2, C3, ...
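The pattern is to parse A, follow each B link, then follow each B's link to its C; in Scrapy every hop is a `Request` with a callback. The chaining itself, reduced to a toy in-memory "site" (page contents invented for illustration):

```python
from html.parser import HTMLParser

# Toy site: page name -> html. A real spider would fetch over HTTP,
# yielding scrapy.Request(url, callback=...) for each hop.
SITE = {
    "A":  '<a href="B1">b1</a><a href="B2">b2</a>',
    "B1": '<a href="C1">c1</a>',
    "B2": '<a href="C2">c2</a>',
    "C1": "data one",
    "C2": "data two",
}

class Links(HTMLParser):
    """Collect href attributes from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs)["href"])

def links_of(page):
    parser = Links()
    parser.feed(SITE[page])
    return parser.hrefs

# A -> every B -> the single C each B links to
c_pages = [links_of(b)[0] for b in links_of("A")]
c_data = [SITE[c] for c in c_pages]
print(c_data)  # ['data one', 'data two']
```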
-1 votes · 2 answers · 21 views
Python urllib2 module error
Can somebody tell me what:
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
exactly means? Could not find it in the documentation.
Does it say that the URL is ...
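Errno -3 is `EAI_AGAIN`: the hostname lookup (DNS) failed temporarily, typically because the machine had no working network or resolver at that moment; it says nothing about the URL itself being wrong. A small sketch that classifies this case (the errno values are the glibc `getaddrinfo` codes):

```python
from urllib.error import URLError

def is_dns_failure(err):
    """True when a URLError was caused by a failed hostname lookup.

    Errno -3 (EAI_AGAIN, 'Temporary failure in name resolution') and
    -2 (EAI_NONAME, unknown host) come from the resolver, i.e. DNS --
    the URL itself may be perfectly fine.
    """
    reason = getattr(err, "reason", None)
    return isinstance(reason, OSError) and reason.errno in (-2, -3)

err = URLError(OSError(-3, "Temporary failure in name resolution"))
print(is_dns_failure(err))  # True
```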
0 votes · 1 answer · 31 views
Problems with character-encoding when webscraping with scrapy
I have a problem with the encoding of text I am scraping from a website. Specifically, the Danish letters æ, ø, and å are coming out wrong. I feel confident that the encoding of the webpage is ...
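This symptom usually means UTF-8 bytes were decoded as Latin-1 (check `response.encoding` in Scrapy against the page's declared charset). The mojibake, and its repair, in miniature; a round-trip sketch, not Scrapy-specific:

```python
# Danish text encoded as UTF-8 but decoded as Latin-1 produces the
# classic two-characters-per-letter mojibake; re-encoding with the
# wrong codec and decoding with the right one repairs it.
original = "æøå"
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)   # Ã¦Ã¸Ã¥
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)   # æøå
```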
0 votes · 1 answer · 57 views
Can't follow links when web-scraping
I realize that others have covered similar topics, but having read these posts, I still can't solve my problem.
I am using Scrapy to write a crawl spider that should scrape search results pages. One ...
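Following links from a results page comes down to extracting hrefs and resolving them against the page URL; Scrapy's `CrawlSpider` does this with a `LinkExtractor` and `Rule`s. A standard-library sketch of the same step (the sample page and URLs are invented):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather absolute hrefs -- roughly what a LinkExtractor does."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

page = '<a href="/page2">next</a> <a href="http://other.example/x">x</a>'
print(extract_links(page, "http://site.example/results"))
# ['http://site.example/page2', 'http://other.example/x']
```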