I'm trying to download an image from a URL. The process I wrote works for everyone except for ONE content provider that we're dealing with.

When I access their JPGs via Firefox, everything looks kosher (happy Passover, btw). However, when I use my process I either:

A) get a 404 or

B) in the debugger, when I set a breakpoint at the URL line (URL url = new URL(str);), then after the connection I DO get a file, but it's not a .jpg; it's some HTML that they're producing with generic links and stuff. I don't see a redirect code, though! It comes back as 200.

Here's my code...

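URL url = new URL(urlString);
URLConnection uc = url.openConnection();
String val = uc.getHeaderField(0);        // typically the status line, e.g. "HTTP/1.1 200 OK"
String contType = uc.getContentType();    // e.g. "image/jpeg" or "text/html"
System.out.println("FOUND OBJECT OF TYPE:" + contType);
InputStream is = null;
if(!val.contains("200")){
  //problem
}
else{
    is = uc.getInputStream();
}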

Has anyone seen anything of this nature? I'm thinking maybe it's some mime type issue, but that's just a total guess... I'm completely stumped.


4 Answers

if(!val.contains("200")) // ...

First of all, I would suggest you use the class HttpURLConnection, which provides the method getResponseCode().

Searching the whole data for some '200' implies

  1. performance issues, and
  2. inconsistency (binary files can contain some '200')
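
A minimal sketch of that suggestion, not a definitive implementation. The class name StatusCheck and the helper looksLikeImage are made up for illustration, and the cast assumes an http or https URL. It checks the numeric status code and also the content type, since the question mentions getting HTML back with a 200:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

public class StatusCheck {

    // Hypothetical helper: true only for a 200 response whose Content-Type is an image.
    static boolean looksLikeImage(int code, String contentType) {
        return code == HttpURLConnection.HTTP_OK
                && contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("image/");
    }

    // Opens the URL and returns the body stream, or throws if the response is not an image.
    static InputStream fetchImage(String urlString) throws IOException {
        // Assumption: urlString is http(s), otherwise this cast fails.
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        int code = conn.getResponseCode();      // numeric status, e.g. 200, 301, 404
        String type = conn.getContentType();    // e.g. "image/jpeg" or "text/html"
        System.out.println("Status " + code + ", content type " + type);
        if (!looksLikeImage(code, type)) {
            conn.disconnect();
            throw new IOException("Expected an image but got status " + code + ", type " + type);
        }
        return conn.getInputStream();
    }
}
```

Comparing the int from getResponseCode() avoids any string matching on the status line.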

Have you tried using Wireshark to see exactly what packets are going back and forth? This is often the fastest way to see what is different. That is:

  1. First run Wireshark while using Firefox to get the JPG, and then
  2. Run Wireshark while using your code to get it.

Then compare and contrast the packets in both directions and I almost guarantee that you'll see something different in the HTTP headers or some other part of the traffic that will explain the problem.


Maybe the site is just using some kind of protection to prevent others from hotlinking their images or to disallow mass downloads.

They usually check either the HTTP referrer (the Referer request header, which must point to their own domain) or the user agent (the User-Agent request header, which must look like a browser, not a download manager). Set both and try it again.
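
Setting both headers might look roughly like this. It is a sketch under assumptions: the class name BrowserLikeRequest is made up, and the referer and user-agent values are placeholders you would replace with whatever the provider actually expects:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserLikeRequest {

    // Prepares (but does not yet send) a request carrying browser-like headers.
    static HttpURLConnection open(String urlString, String referer, String userAgent)
            throws IOException {
        // Assumption: urlString is http(s), otherwise this cast fails.
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        // Request properties must be set before the connection is actually made.
        conn.setRequestProperty("Referer", referer);       // note the header's one-r spelling
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }
}
```

The request only goes out once you call something like getResponseCode() or getInputStream() on the returned connection, so the headers can still be adjusted until then.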


All good guesses, but the "right" answer reward, I think, has to go to ivan_pertrovich_ivanovich_harkovich_rostropovitch_o'neil because using HttpURLConnection I was able to see that, in fact, before getting a 404, I'm first getting a 301. So, now, it's just a matter of finding out from these people what they're expecting in the header, which would make them less inclined to redirect me.

thanks for the suggestion.
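
For anyone who wants to see the same thing, here is a rough sketch of how the redirect chain can be made visible. The class name RedirectProbe and its methods are made up for illustration. It turns off automatic redirect following so each hop's status code and Location header get printed:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectProbe {

    // True for the status codes that redirect the client to another location.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Follows redirects by hand, printing each hop, for at most maxHops requests.
    static void trace(String urlString, int maxHops) throws IOException {
        String current = urlString;
        for (int hop = 0; hop < maxHops; hop++) {
            // Assumption: every hop is an http(s) URL, otherwise this cast fails.
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false);   // report the 301 instead of following it
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            System.out.println(code + " " + current
                    + (location != null ? " -> " + location : ""));
            conn.disconnect();
            if (!isRedirect(code) || location == null) {
                return;                               // final response reached
            }
            // Location may be relative, so resolve it against the current URL.
            current = new URL(new URL(current), location).toString();
        }
    }
}
```

Calling RedirectProbe.trace(urlString, 5) prints one line per hop, which is how a 301 followed by a 404 would show up.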

Well, you guessed right, my real name is not Ivan Ivanovich Ivanoff, but you should know, there are really people who are named this way ;) (Though my first name is Ivan really)... The middle name in Russia is patronymic ( en.wikipedia.org/wiki/Patronymic ) – ivan_ivanovich_ivanoff Apr 10 at 20:49
I am well familiar with the nature of middle names in Russian (my own being Grigoryavich) but I didn't know it was known as "patronymic"... how can I grade your answer even MORE highly?! – Dr.Dredel Apr 10 at 22:03
