A computer program that accesses web pages for various purposes (to scrape content, to provide search engines with information about your site, etc.).

2
votes
1 answer
31 views

What does “all” mean in meta robots tag?

Several websites have the following meta tag aimed at search engines like Google, Bing, etc.: <meta name="robots" content="all" /> What does it do?
2
votes
0 answers
24 views

Which tools can help limit maximum page views per IP to restrict scrapers and bots?

I would like to prevent scrapers from grabbing all my content, while still allowing Google, Bing and other search engines. I am thinking of going with Fail2ban and limiting hits from an IP to around 1000 per day. ...
5
votes
2 answers
100 views

Which meta “robots” tag gets preference?

My wife works at a high school in Germany. I recently noticed that it's extremely hard to find that school's homepage using Google. I looked at the source code of the page and I believe I've found the ...
1
vote
1 answer
31 views

Different Blocked URLs Stats in Google Webmaster

I am seeing different blocked-URL stats in Google Webmaster. I blocked the entire site for a revamp and the implementation of the latest builds. Here are the different stats: under Health tab > Blocked URLs > 2,580 ...
1
vote
2 answers
45 views

Can Google show pages that were deleted six months ago and blocked using robots.txt?

If I deleted some pages on my website six months ago and blocked them using robots.txt, can those non-existent pages still show up on Google?
3
votes
2 answers
123 views

How can I prevent Googlebot from indexing web service URLs?

I have a problem with Googlebot and Bingbot accessing our web service. We have a search application created in ASPX that is integrated into a Drupal website. Users perform searches based on postal ...
5
votes
2 answers
349 views

Should I block bots from my site and why?

My logs are full of bot visitors, often from Eastern Europe and China. The bots are identified as Ahrefs, Seznam, LSSRocketCrawler, Yandex, Sogou and so on. Should I block these bots from my site and ...
0
votes
0 answers
131 views

Is there a way to make Alexa's ia_archiver slow down its crawling of my website?

Alexa's ia_archiver bot is the main contributor to the Internet Archive's "Wayback Machine" web collection, and there are advantages to having my website included in that collection. There are other ...
1
vote
0 answers
79 views

Methods for keeping no-good scrapers off the site? [duplicate]

Possible Duplicate: Tactics for dealing with misbehaving robots I want to keep no-good scrapers (a.k.a. bad bots) that steal content and consume bandwidth off my site. At the same time, I do ...
1
vote
1 answer
46 views

What dangers await if I block non-standard, non-major-USA search engine bots from my USA-only website?

I noticed tons of bandwidth being used by non-USA search engine bots, so I began blocking them in an effort to save bandwidth and CPU cycles for actual users and the search engines they come from ...
0
votes
1 answer
114 views

mod_rewrite and SEO friendliness

My website has an atypical structure and I'm not sure if this could create problems in the long run, especially for SEO positioning purposes. I have a single, large PHP script, and I use the Apache ...
4
votes
2 answers
165 views

Number of page requests by any bot in 5 seconds

I am writing a script that will block any bot that requests pages more than, say, X times within the past 5 seconds. I need to pick a value for X. Does anyone know approximate values I can use?
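
The blocking rule described in this question can be sketched as a sliding-window counter. This is only a minimal illustration, not the asker's script: the class name `SlidingWindowLimiter`, the `client_id` key (for example an IP address), and the `max_requests` parameter standing in for X are all hypothetical, and no particular value of X is recommended here.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Flags a client that makes more than max_requests within window_seconds.

    Hypothetical helper for illustration; max_requests plays the role of X.
    """

    def __init__(self, max_requests, window_seconds=5.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client id to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def should_block(self, client_id, now):
        """Record one request at time `now` (seconds) and report whether
        the client has exceeded the limit inside the current window."""
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests
```

In a real request handler, `now` would typically come from `time.monotonic()`, and X would be tuned against the site's own logs so that legitimate crawlers are not caught.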
0
votes
0 answers
62 views

Facebook crawling [closed]

Is an ordinary person allowed to crawl all the publicly visible links of Facebook for research purposes with the help of automated bots? And is there any API provided by ...
2
votes
1 answer
128 views

How to hide download file from bots? [duplicate]

Possible Duplicate: How to restrict the download of all files in a folder? I want to make a private file available for download but not use username/password protection. I want to put the ...
1
vote
1 answer
81 views

Set initial robots.txt restrictions so high, Googlebot won't even try to access the updated less strict file.

When I initially set up my blog, I made the restrictions in robots.txt far too strict. They were so strict that Googlebot can't even access the updated, less strict robots file. I've been at this problem for ...
