3
votes
2answers
53 views

Can the .htaccess file slow down a website to a crawl? If so, are there better ways to solve these problems with different rewrite rules and such?

Here is my .htaccess file: RewriteCond %{REQUEST_URI} ^/patients/billing/FAQ_billing\.html$ [OR] RewriteCond %{REQUEST_URI} ^/patients/billing/getintouch\.html$ RewriteRule ...
0
votes
0answers
15 views

SEO - Big revamp of my URL structure - suggestions, please [duplicate]

I have a big network of sites in many countries. There are 30-40 million indexed pages in 34 countries, which will hopefully be re-indexed and redirected with a 301 to the new URL structure, ...
0
votes
1answer
22 views

Google AdSense crawler unable to access your page [closed]

I have a problem which I have been trying to solve by Googling it, but I can't seem to find an answer. I have been using plain PHP and HTML on my site and everything was OK concerning my Google ...
0
votes
2answers
51 views

Googlebot ignores my new site [duplicate]

I have the following situation: I had a blog (made using Joomla) that within a few months was well indexed by Google. Because of a technical problem I deleted it and recreated it using WordPress ...
0
votes
3answers
115 views

Is this Google proxy a fake crawler: google-proxy-66-249-81-131.google.com?

Recently I discovered that some variants of a Google proxy visit my sites. I doubt these are legitimate Google crawlers because these crawlers are NOT always behind a proxy (like the hostname describes) and ...
3
votes
0answers
243 views

What is a good open source web crawler? [closed]

I'm looking for a good open source web crawler and I found these: DataparkSearch, GNU Wget, GRUB, Heritrix, ht://Dig, HTTrack, ICDL, mnoGoSearch, Nutch, Open Search Server, PHP-Crawler, tkWWW Robot, ...
1
vote
1answer
81 views

SEO preparation for upcoming new site

I booked a domain and I created an image saying that the site is soon to be launched. What SEO preparations should I make to avoid harming my SEO ranking? Because my page is blank, the image itself ...
22
votes
1answer
3k views

Bingbot request for trafficbasedsspsitemap.xml which does not exist

The logs for a website I manage show a request for a non-existent file by Bingbot. The details of the request are Path: /trafficbasedsspsitemap.xml Useragent: "Mozilla/5.0 (compatible; bingbot/2.0; ...
2
votes
1answer
55 views

Is there a spider / link checker that can start deep inside a login-protected site?

We use vendor hosted Blackboard for our distance education courses, but host course multimedia on our own servers. The multimedia server has been moved and the domain has changed. Blackboard DBAs have ...
1
vote
1answer
57 views

Crawling retail websites to use as templates

I'm setting up a commerce website. I really like the Overstock website, and I was wondering if I could simply crawl the website for the webpages, remove Overstock's logos, and use their website as a ...
3
votes
1answer
138 views

Preventing bot visits to website

Every time some user shares my website's address inside his/her tweets, the following bots come to my website: UnwindFetchor/1.0 (+http://www.gnip.com/) ShowyouBot (http://showyou.com/crawler) ...
2
votes
2answers
82 views

Preventing crawlers that don't honor robots.txt

I ran into a problem while writing robots.txt for my site ... Searching on Google, I found discussions about crawlers that honor robots.txt and crawlers that do not. How can I block the ones that do not? Can I do it ...
2
votes
1answer
86 views

How frequently does Googlebot fetch sitemaps? Does it depend on PageRank?

How frequently does Google fetch sitemaps? I am working on a high-traffic website that normally gets 30 new posts per minute. But currently it provides sitemaps which link to the newest 100 posts (3 ...
2
votes
1answer
25 views

If cPanel Indexing Manager sets a folder to “No Indexing”, can it be crawled by a web crawler?

People are able to view directories / folders on my site right now. So, they could go to mysite.com/images and see the full index. To prevent this, cPanel offers an option to set a directory / ...
0
votes
0answers
17 views

Is this Anti-Scraping technique viable with Crawl-Delay? [duplicate]

Possible Duplicate: How do spambots work? I want to prevent web scrapers from abusing 1,000,000 on my website. I'd like to do this by returning a "503 Service Unavailable" error code for ...
