

2
votes
1answer
99 views

Preventing crawler from interfering with user tracking

I'm scraping text from various webshops (no images/videos or other data). I'm no expert on user tracking, so I'd like to know if there's a way for me to write my crawler so it won't interfere with the ...
-1
votes
4answers
364 views

Blocking IP address for web-scraping service

Background Consider the following scenario: Link. User provides a link to some poorly formatted website (e.g., creative commons content). Scrape. Server downloads the content (web scrape), always ...
19
votes
3answers
1k views

What will happen if I don't follow robots.txt while crawling?

I am new to web crawling and I am testing my crawlers. I have been running tests on various sites. I forgot about the robots.txt file during my tests. I just want to know what will happen if ...
0
votes
3answers
199 views

From where do financial firms obtain the stock data to analyze [closed]

I've been thinking about the subject, and I've always wondered where they obtain the data that is analyzed in applications. I've looked at Nasdaq's website and they do not seem to have any ...
1
vote
1answer
549 views

Improving performance for web scraping code

I have a website in which the code scrapes other websites to get accurate data. The code works well, but there is a noticeable lag in performance because the code first downloads the html ...
0
votes
2answers
214 views

How do I capture information from a website that doesn't provide an API?

Do you know any good tutorials, frameworks, or anything else that can help me write code that captures information from a website that doesn't have a public API, or hasn't been written in a RESTful way? ...
0
votes
2answers
891 views

A good tool for browser automation/client-side Web scripting [closed]

I'm interested in adopting a tool/scripting language to automate some daily tasks connected with fighting forum spammers. A brief overview of these tasks: analyze new registrations and posts on a ...
2
votes
5answers
620 views

Data Scraping - One application or multiple?

I have 30+ sources of data I scrape daily in various formats (XML, HTML, CSV). Over the last three years I've built 20 or so C# console applications that go out, download the data and re-format it into ...
8
votes
4answers
3k views

Patterns and practices for Web Scraping in .Net (C#)

I will be putting together an application to automate an external web site/application. In some instances I will need to navigate the site as a user would (some links I need to follow cannot be ...
2
votes
2answers
146 views

What tools can be used to get a reference like document.frames.item(0).document.innerhtml?

I want to scrape things from a web page. The way I want to do this is to extract the text of a DOM element (I believe this is the correct description). This means that I will need to retrieve the text of ...
43
votes
7answers
2k views

How to be a good citizen when crawling web sites?

I'm going to be developing some functionality that will crawl various public web sites and process/aggregate the data on them. Nothing sinister like looking for e-mail addresses - in fact it's ...
4
votes
4answers
1k views

Which language is the most flexible for scraping websites?

I'm new to programming. I know a little Python and a little Objective-C, and I've been going through tutorials for each. Then it occurred to me, I need to know which language is more flexible (Python, ...