The web-scraping tag has no wiki summary.
2 votes · 1 answer · 79 views
Preventing crawler from interfering with user tracking
I'm scraping text from various webshops (no images/videos or other data). I'm no expert on user tracking, so I'd like to know if there's a way for me to write my crawler so it won't interfere with the ...
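The question is cut off, so what "interfere" means here is unclear. One relevant fact is that a crawler fetching pages with a plain HTTP client never executes JavaScript, so script-based trackers embedded in the page do not fire. Server-side logs still record every request. Sending a descriptive User-Agent may let a shop's analytics filter the crawler out, but only if its pipeline actually filters on that header. This is a minimal sketch; the bot name and info URL are placeholders, not values from the question.

```python
import urllib.request

# Placeholder bot name and info URL (assumptions; replace with your own).
# Including "bot" and a contact URL is a common convention for crawlers.
USER_AGENT = "ExampleTextBot/0.1 (+https://example.org/bot-info)"


def make_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the crawler.

    urllib only downloads the raw response. It never runs the page's
    JavaScript, so client-side tracking scripts are not executed.
    """
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


request = make_request("https://example.com/product/42")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
```

Pass the request to `urllib.request.urlopen` to perform the actual download.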
-2 votes · 4 answers · 333 views
Blocking IP address for web-scraping service
Background
Consider the following scenario:
Link. User provides a link to some poorly formatted website (e.g., creative commons content).
Scrape. Server downloads the content (web scrape), always ...
19 votes · 3 answers · 1k views
What will happen if I don't follow robots.txt while crawling?
I am new to web crawling and I am testing my crawlers. I have been running tests on various sites, and I forgot about the robots.txt file during those tests.
I just want to know what will happen if ...
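robots.txt is advisory. Nothing in the file technically stops a crawler that ignores it, so honoring it is the crawler's responsibility. The sketch below shows how to check the rules before fetching, using the standard library's `urllib.robotparser`. The robots.txt content and the `MyCrawler` agent name are made up. A real crawler would download each site's `/robots.txt` (for example with `set_url()` and `read()`) rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
agent = "MyCrawler"  # hypothetical user-agent name

print(parser.can_fetch(agent, "https://example.com/products/1"))  # True
print(parser.can_fetch(agent, "https://example.com/private/x"))   # False
print(parser.crawl_delay(agent))                                  # 5
```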
0 votes · 3 answers · 179 views
Where do financial firms obtain the stock data they analyze? [closed]
I've been thinking about the subject and have always wondered where firms obtain the data that is analyzed in applications. I've looked at Nasdaq's website, and they do not seem to have any ...
0 votes · 1 answer · 479 views
Improving performance for web scraping code
I have a website whose code scrapes other websites to get accurate data. The code works well, but there is a noticeable lag in performance because it first downloads the HTML ...
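This sketch assumes the lag comes from downloading pages one after another. The question is truncated, so that is a guess. If the time is mostly spent waiting on the network, the downloads can overlap in threads. The example uses the standard library's `ThreadPoolExecutor`. `fetch_all` and `max_workers=8` are illustrative choices. The fetcher is injectable so the logic can be tested without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page; the network wait is what the threads overlap."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetcher: Callable[[str], str] = fetch,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch several pages concurrently and map each URL to its HTML."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        pages = list(pool.map(fetcher, url_list))
    return dict(zip(url_list, pages))
```

Keep `max_workers` modest so the scraped sites are not flooded with parallel requests. If the same pages are requested repeatedly, caching them may help more than concurrency.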
0 votes · 2 answers · 205 views
How do I capture information from a website that doesn't provide an API?
Do you know any good tutorials, frameworks, or anything else that can help me write code that captures information from a website that doesn't have a public API, or hasn't been written in a RESTful way?
...
0 votes · 2 answers · 726 views
A good tool for browser automation/client-side Web scripting [closed]
I'm interested in adopting a tool/scripting language to automate some daily tasks connected with fighting forum spammers. A brief overview of these tasks: analyze new registrations and posts on a ...
2 votes · 5 answers · 600 views
Data Scraping - One application or multiple?
I have 30+ sources of data I scrape daily in various formats (XML, HTML, CSV). Over the last three years I've built 20 or so C# console applications that go out, download the data and re-format it into ...
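The question is framed around C#, but the structure is language-independent. The single-application option can be organized so that each source format plugs in as a parser that emits a common record shape. This minimal Python sketch assumes each source reduces to flat string fields. The `register`/`normalize` names and the XML layout are hypothetical. HTML is left out because page structure varies per site.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Record = Dict[str, str]

# Registry mapping a source format name to the function that parses it.
PARSERS: Dict[str, Callable[[str], List[Record]]] = {}


def register(fmt: str):
    """Decorator that adds a parser for one source format to PARSERS."""
    def decorator(func: Callable[[str], List[Record]]):
        PARSERS[fmt] = func
        return func
    return decorator


@register("csv")
def parse_csv(text: str) -> List[Record]:
    # Each CSV row becomes one record keyed by the header line.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register("xml")
def parse_xml(text: str) -> List[Record]:
    # Assumes <root><item><field>value</field>...</item>...</root>.
    root = ET.fromstring(text)
    return [{field.tag: (field.text or "") for field in item} for item in root]


def normalize(fmt: str, text: str) -> List[Record]:
    """Route raw downloaded text to the parser registered for its format."""
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return PARSERS[fmt](text)
```

With this structure, adding a source means adding one parser function rather than a new console application. Whether that beats separate applications also depends on deployment and failure isolation, which the sketch does not address.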
7 votes · 4 answers · 3k views
Patterns and practices for Web Scraping in .Net (C#)
I will be putting together an application to automate an external web site/application. In some instances I will need to navigate the site as a user would (some links I need to follow cannot be ...
43 votes · 7 answers · 2k views
How to be a good citizen when crawling web sites?
I'm going to be developing some functionality that will crawl various public web sites and process/aggregate the data on them. Nothing sinister like looking for e-mail addresses - in fact it's ...
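One common politeness measure is to enforce a minimum delay between requests to the same host. This minimal sketch assumes a fixed delay. The `PoliteThrottle` class and its 1-second default are hypothetical choices; a site's `Crawl-delay`, if it publishes one, could be used instead. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(
        self,
        min_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Block until url's host may be contacted again; return seconds waited."""
        host = urlsplit(url).netloc.lower()
        waited = 0.0
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request[host] = self._clock()
        return waited
```

Call `wait(url)` immediately before each download. Requests to different hosts do not delay each other.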
2 votes · 2 answers · 144 views
What tools can be used to get a reference like document.frames.item(0).document.innerhtml?
I want to scrape things from a web page. The way I want to do this is to extract the text of a DOM element (I believe this is the correct description). This means that I will need to retrieve the text of ...
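Outside a browser, the text of an element in static HTML can be extracted with the standard library's `html.parser`. This minimal sketch collects the text inside the element with a given `id`. `TextById` and `text_of_element` are hypothetical names. The sketch assumes the markup closes every non-void element. It does not run JavaScript or follow frames; a frame's content is usually loaded from its own `src` URL and would have to be fetched separately.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag, so they must not change the depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TextById(HTMLParser):
    """Collect the text inside the element whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_of_element(html: str, element_id: str) -> str:
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = '<div id="price"><span>12</span>.<b>99</b> EUR</div><div id="x">no</div>'
print(text_of_element(page, "price"))  # 12.99 EUR
```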
4 votes · 4 answers · 1k views
Which language is the most flexible for scraping websites?
I'm new to programming. I know a little Python and a little Objective-C, and I've been going through tutorials for each. Then it occurred to me that I need to know which language is more flexible (Python, ...