A crawler, or web crawler, is a bot that automatically traverses World Wide Web sites. Crawlers are an important part of search engines.


0
votes
0 answers
3 views

Paid service for crawling an Ajax website

Is there any company that offers crawling of Ajax #hash websites and returns the resulting DOM as HTML? I do not want to generate the DOM snapshot myself.
-1
votes
0 answers
20 views
0
votes
0 answers
9 views

Custom BCS indexing connector with changelog incremental crawl is not working properly

I am writing a custom indexing connector using the changelog incremental crawl approach. I'm using the sample from http://msdn.microsoft.com/en-us/library/ff625800%28v=office.14%29.aspx and trying to change ...
3
votes
3 answers
37 views

What is the best way to download a <very large> number of pages from a list of URLs?

I have >100,000 URLs (different domains) in a list that I want to download and save in a database for further processing and tinkering. Would it be wise to use Scrapy instead of Python's ...
0
votes
1 answer
22 views

Crawling a .NET site with a subdomain

I am trying to crawl a .NET site with PHP cURL. The site that I am trying to crawl is http://waltham.patriotproperties.com and I am able to crawl it. But when I am trying to crawl internal ...
0
votes
1 answer
17 views

How to add XML Content to form in Symfony

I can't find a way to add "button" or "input" fields to a form with the crawler in Symfony for testing. I'm doing this: $crawler = $this->client->request('GET', ''); $document = new ...
0
votes
1 answer
34 views

Using Scrapy, how to crawl a page with checkboxes that have an onclick attribute?

I am using Scrapy to crawl some data from a webpage. The page has a form which contains multiple checkboxes and drop down menus, all of which need to be selected for the form to generate a data table. ...
1
vote
2 answers
31 views

Scrapy crawler not able to crawl data from multiple pages

I am trying to scrape the results of the following page: http://www.peekyou.com/work/autodesk/page=1 with page = 1, 2, 3, 4, and so on, depending on the results. So I am getting a php file to run the crawler run ...
0
votes
1 answer
16 views

Crawlers that work with infinite scroll pages

I'm looking for a crawler app that scans the JavaScript of the page for AJAX requests and looks for functions that execute AJAX calls, thus getting the whole content from beginning to end. I would ...
0
votes
2 answers
30 views

How to run my JS function after a button is clicked and the DOM has reloaded

I want to use JS to crawl a website, but the website uses Ajax to page the contents. At first, you can only crawl the first page's content. Then you must click a button (next page), the website uses ...
1
vote
2 answers
40 views

How to get the metadata of files stored in the Alfresco content management system?

Hi, I have stored some files (PDF, HTML, DOC) in the Alfresco CMS. My requirement is to classify these files using the "content-type" metadata field in the Alfresco CMS. Is it possible to do it by using only ...
0
votes
1 answer
24 views

Crawler server side

I have a question about a particular server-side functionality. I have a server (Linux) with a PostgreSQL database (server side developed in Python). I would like to create a system that at regular ...
0
votes
0 answers
41 views

C# web crawler class improvement [closed]

I've found this class and I'm planning to use it as a core for a simple web crawler. Here is the class: using System; using System.Collections.Generic; using System.Linq; using System.Text; using ...
1
vote
2 answers
23 views

Symfony 2 service error in unit testing

I am writing some functional tests with Symfony 2 and PHPUnit, but I have some trouble with a service. Let me explain. During my test run, I want to use some services used by the application. So I just set my ...
-1
votes
1 answer
58 views

Data scraping with Scrapy [closed]

I want to make a new betting tool, but I need a database of odds and results and can't find anything on the web. I found this site that has a great archive: OddsPortal. All I want to do is scrape the ...
