
Basically, a page generates some dynamic content, and I want to get that dynamic content, not just the static HTML. I haven't been able to do this with cURL. Help please.

that's impossible using just curl... –  dandavis Jun 12 at 22:30
 
You need to find out where to properly get it from. Likely, the js is either making an ajax call which you can curl to scrape the data or it's hardcoded in a js/html file that's loaded with the normal page load. –  Matt Berkowitz Jun 12 at 22:58

2 Answers

Accepted answer

You could try Selenium (http://seleniumhq.org), which drives a real browser and therefore runs the page's JavaScript.

 
This might be what I'm looking for. I'll try this. Thanks for the link. –  Sambrit Khan Jun 13 at 6:37

You can't with just cURL.

cURL will grab the specific raw (static) files from the site, but to get JavaScript-generated content, you would have to put that content into a browser-like environment that supports JavaScript and all the other host objects the script uses, so that the script can run.

Then once the script runs, you would have to access the DOM to grab whatever content you wanted from it.
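As a rough sketch of those two steps (let the script run, then read the DOM), here is how it might look with Selenium's Python bindings. The helper name `fetch_rendered_html`, the URL, and the `#content` selector are all made up for illustration, and the polling loop is a simple stand-in for Selenium's own wait helpers, so treat this as a starting point rather than the one right way to do it.

```python
import time


def fetch_rendered_html(driver, url, selector, timeout=10.0, poll=0.25):
    """Load `url` in a browser and return the DOM once `selector` matches.

    `driver` is expected to behave like a Selenium WebDriver: it needs
    get(), find_elements() and page_source. The helper name and the
    selector-based readiness check are assumptions made for this sketch.
    """
    driver.get(url)  # the browser fetches the page and runs its scripts
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR
        if driver.find_elements("css selector", selector):
            # page_source is the serialized DOM after scripts have run
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{selector!r} did not appear within {timeout}s")
        time.sleep(poll)


def demo():
    # Imported here so the helper above can be read and tested
    # without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # Hypothetical URL and selector; substitute the real page and an
        # element that only exists once the JavaScript has finished.
        print(fetch_rendered_html(driver, "http://example.com/", "#content"))
    finally:
        driver.quit()
```

Calling `demo()` needs the `selenium` package and a local Firefox install. Waiting for a specific element matters because the page load finishing does not mean the script has finished building its content.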

This is why most search engines don't index javascript-generated content. It's not easy.


If this is one specific site that you're trying to gather info from, you may want to look into exactly how the site gets the data itself and see whether you can get the data directly from that source. For example, is the data embedded in JS in the page (in which case you can just parse it out of that JS), is it obtained from an ajax call (in which case you can maybe just make that ajax call directly), or does it come from some other method?
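To make those two cases concrete, here is a minimal sketch using only the Python standard library. It assumes the embedded data is a valid JSON literal assigned to a variable in an inline script, and that the ajax endpoint returns JSON. The function names, the variable name `pageData`, and the request headers are illustrative guesses; use the browser's network tab to see what the real site does.

```python
import json
import re
import urllib.request


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var_name` inside the page source.

    Assumes the page contains something like `var pageData = {...};` and
    that the literal is strict JSON (double-quoted keys and strings).
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever script text follows it.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


def fetch_ajax_json(url, timeout=10):
    """Call the endpoint the page's script would call and decode the JSON.

    The headers imitate a typical ajax request; whether a given site
    needs them (or cookies, tokens, etc.) is something to check per site.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

For example, given a page whose source contains `var pageData = {"items": [1, 2, 3]};`, calling `extract_embedded_json(page_source, "pageData")` returns the dictionary with the `items` list. If the inline data is a JavaScript object literal that is not strict JSON, this approach will raise an error and a more forgiving parser is needed.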
