
I have a Python script that pings 12 pages on someExampleSite.com every 3 minutes. It has been working for a couple of months, but today I started receiving 404 errors for 6 of the pages every time it runs.

So I tried going to those URLs on the PC that the script is running on, and they load fine in Chrome and Safari. I've also tried changing the user agent string the script is using, and that didn't change anything. I also tried removing the ['If-Modified-Since'] header, which didn't change anything either.

Why would the server be sending my script a 404 for these 6 pages when, on that same computer, I can load them in Chrome and Safari just fine? (I made sure to do a hard refresh in Chrome and Safari, and they still loaded.)

I'm using urllib2 to make the request.
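The script itself wasn't posted. For readers on Python 3, a minimal sketch of the kind of request described might look like the following. It uses `urllib.request`, the Python 3 successor to urllib2; the URL, user-agent string, and helper names are hypothetical.

```python
import urllib.request


def build_request(url, user_agent, if_modified_since=None):
    """Build a GET request with a custom User-Agent and optional If-Modified-Since.

    Hypothetical helper: the original script was not posted.
    """
    headers = {"User-Agent": user_agent}
    if if_modified_since is not None:
        # Only send the conditional header when a previous timestamp is known.
        headers["If-Modified-Since"] = if_modified_since
    return urllib.request.Request(url, headers=headers)


def fetch_status(req, timeout=10):
    """Send the request and return the HTTP status code.

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a 404 surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Because `urlopen` raises on a 404 instead of returning a response, a script written this way stops at the error unless it catches `HTTPError`.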

  • What server are you making the requests to? Is it possible that the server has a limit on how often you can make requests to prevent DDoS?
    – Scintillo
    Commented May 2, 2013 at 14:47
  • I thought about that, so instead of requesting all 12, I tried requesting just the 6 that return a 404, but I still got a 404 for all of them.
    – Cody C
    Commented May 2, 2013 at 14:49
  • If you haven't changed anything in your script, it must be that the server software has changed. Are you sure that you correctly escape the URLs? Also, is there any difference between the URLs that work and the ones that don't?
    – Scintillo
    Commented May 2, 2013 at 14:54
  • I can't detect any pattern in the URLs. I came to the same conclusion: the server must have changed. I guess my question is what it could be differentiating on. I know it's not my IP, because the pages work in Chrome and IE and also on another computer on the network. I know it also can't be the user agent, because I've tried changing that too.
    – Cody C
    Commented May 2, 2013 at 14:59
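On the escaping question raised in the comments, a minimal sketch of percent-encoding the path of a URL before requesting it, using the standard-library `urllib.parse`. The helper name and example URL are hypothetical, and nothing in the thread establishes that escaping caused these 404s.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url_path(url):
    """Percent-encode unsafe characters in the URL path (hypothetical helper).

    '/' is kept so path segments survive, and '%' is kept so that
    already-encoded sequences like %20 are not double-encoded.
    """
    parts = urlsplit(url)
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment))
```

For example, `escape_url_path("http://someExampleSite.com/my page/item.html?id=3")` gives `http://someExampleSite.com/my%20page/item.html?id=3`.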

2 Answers


There could be multiple reasons for this, such as the server rejecting your request because of missing headers, or throttling.

You could try recording the request headers Chrome sends (using HTTP Headers), then use the Python requests library to send a request with all of your browser's headers added. From there, you could change or remove headers one at a time to see what exactly is happening.
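A minimal sketch of that drop-one-header-at-a-time approach. It uses the standard-library `urllib.request` rather than requests so it has no third-party dependency; the helper names are hypothetical, and the header dict you pass in would be whatever Chrome actually sent.

```python
import urllib.error
import urllib.request


def status_for(url, headers, timeout=5):
    """Return the HTTP status code for a GET sent with these headers."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx; the status code is still available.
        return e.code


def headers_that_matter(url, browser_headers):
    """Remove one header at a time and report which removals change the status.

    Returns (baseline_status, [header names whose removal changed it]).
    """
    baseline = status_for(url, browser_headers)
    culprits = []
    for name in browser_headers:
        reduced = {k: v for k, v in browser_headers.items() if k != name}
        if status_for(url, reduced) != baseline:
            culprits.append(name)
    return baseline, culprits
```

Note that urllib fills in its own default `User-Agent` when that header is removed, so dropping it tests "Python's default agent" rather than "no agent at all".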


So I figured out what the problem was.

The website is returning an incorrect response code for these 6 pages: even though it sends a 404 status, it also sends the full web page. Chrome and Safari seem to ignore the status code and display the page anyway, whereas my script aborts on the 404 (urllib2 raises an HTTPError for it).
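To make this failure mode concrete: urllib in Python 3 (like urllib2 in Python 2) raises `HTTPError` for a 404, but the exception object also acts as the response, so the body the server sent can still be read. A minimal Python 3 sketch; `fetch_page` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return (status, body) even when the server answers with an error code.

    urllib raises HTTPError for 4xx/5xx responses, but the exception wraps
    the server's response, so the page content is still readable from it.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        return e.code, e.read()
```

A monitoring script could then, for example, check whether a 404 body still contains some expected page content before reporting the page as down. That is an assumption about how such a script might decide a page is "up", and a workaround for the server's behavior rather than a fix.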

  • Please submit this to The Daily WTF :D
    – Scintillo
    Commented May 2, 2013 at 15:57
  • Well, I was reading somewhere that this might be valid if the page has been removed or moved and global DNS records haven't been updated yet.
    – Cody C
    Commented May 2, 2013 at 16:07
