
What I'm trying to do:

  1. Get user input for Google Play App page.

    e.g. https://play.google.com/store/apps/details?id=jp.scn.android

  2. Scrape 100 reviews from Google Play App and organize them into an array.

My JavaScript function:

function test() {
    var urlAdd = document.getElementById('input').value;
    var urlEnglish = urlAdd + '&hl=en';

    var query = {
        url: urlEnglish,
        type: 'html',
        selector: '[class=review-body]',
        extract: 'text'
    };

    var request = 'http://example.noodlejs.com/?q=' +
        encodeURIComponent(JSON.stringify(query)) +
        '&callback=?';

    jQuery.getJSON(request, function (data) {
        document.getElementById('output').innerHTML = '<pre>' +
            JSON.stringify(data, null, 4) + '</pre>';
    });
}

Problems:

  1. This function feels bulky.

  2. It only returns 20 results. My guess is that this has something to do with how the Google Play DOM retrieves reviews.

Questions:

  1. How do I streamline/improve my scraper function?

  2. How do I get 100 results or more?

Comments:

My apologies if this has a simple solution. I just started learning JavaScript piecemeal in February.


1 Answer

Use scraping as a last resort. I suggest you look for an available API instead: it's more robust and easier to work with. One issue with scraping is that it fetches only the static HTML served at the URL; it can't fetch content that is loaded later via JavaScript. There are ways around that, but they add too much complexity.

If you want to hack around it, check how Google Play loads the rest of the reviews. It should be an AJAX call, which you can find in the network tab of your browser's developer tools. Inspect that URL and its response, then adapt the request to your needs.
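One way to structure the paging, once you have found that request, is sketched below. Treat it as a sketch under assumptions: `fetchPage` is a hypothetical function you would write yourself to replicate the AJAX call you see in the network tab (its URL and parameters are not known here, so nothing about them is assumed), and `collectReviews` is just an illustrative name. The collector keeps asking for the next page until it has enough reviews or a page comes back empty.

```javascript
// Collect up to `wanted` reviews by requesting one page at a time.
// fetchPage(pageNum, callback) is a hypothetical function supplied by the
// caller; it must call callback with an array of reviews for that page.
function collectReviews(fetchPage, wanted, done) {
    var reviews = [];

    function next(pageNum) {
        fetchPage(pageNum, function (pageReviews) {
            reviews = reviews.concat(pageReviews);

            // Stop when there is nothing more to load or we have enough.
            if (pageReviews.length === 0 || reviews.length >= wanted) {
                done(reviews.slice(0, wanted));
            } else {
                next(pageNum + 1);
            }
        });
    }

    next(0);
}
```

If each page holds 20 reviews, asking for 100 means `fetchPage` is called five times, for pages 0 through 4.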

As for your script: you're using jQuery, so use it all the way!

// Cache elements and keep configurable values here
var input = $('#input');
var output = $('#output');
var noodleUrl = 'http://example.noodlejs.com/?';

var query = {
    // This assumes the URL already has a query string.
    // Otherwise, appending '&hl=en' fails.
    url: input.val() + '&hl=en',
    type: 'html',
    selector: '[class=review-body]',
    extract: 'text'
};

// $.param URL-encodes the values. 'callback=?' is appended by hand because
// $.param would encode the '?' and jQuery would no longer treat it as JSONP.
var request = noodleUrl + $.param({
    q: JSON.stringify(query)
}) + '&callback=?';

$.getJSON(request, function (data) {
    output.html('<pre>' + JSON.stringify(data, null, 4) + '</pre>');
});
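The comment inside the query object points out that appending '&hl=en' breaks when the URL has no query string yet. A minimal sketch of one way to guard against that follows. `addLanguage` is a hypothetical helper name, and the sketch assumes the URL has no fragment and does not already carry an hl parameter.

```javascript
// Hypothetical helper: append the hl (language) parameter to a URL,
// using '?' when there is no query string yet and '&' otherwise.
function addLanguage(url, lang) {
    var separator = url.indexOf('?') === -1 ? '?' : '&';
    return url + separator + 'hl=' + encodeURIComponent(lang);
}
```

The query object could then use `url: addLanguage(input.val(), 'en')` in place of the string concatenation.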
    
Hi Joseph, I really need some guidance. I have tried searching everywhere for an API to use with Google Play and I have not found anything worthwhile. Any suggestions? If I cannot go the API route, where should I start with getting multiple reviews from Google Play using web scraping? (I did not find an AJAX call in the network tab, instead it looked like json, but I'm not sure) –  Coder 314 Apr 24 at 21:49
    
@Coder314 You can filter network requests. Select XHR in filters (Chrome dev tools), and you should see what's using XHR. You see JSON? That could be the data. –  Joseph the Dreamer Apr 24 at 21:51
    
Link to Plants VS Zombies App Details: play.google.com/store/apps/details?id=com.ea.game.pvz2_na How would you get the most reviews from this site? –  Coder 314 Apr 24 at 22:33
    
I know this is essentially the original question, but I need some evidence with a real example that says, "there is an API way" or "scraping two pages of Google Play is possible using X", keeping in mind that I want to use JavaScript. –  Coder 314 Apr 24 at 23:59
    
@Coder314 Load the page and open the dev tools networking tab. Clear the logs first. Go to the reviews section and click the arrow to the right. Then you'll start to see the network tab flood with requests. There'll be one named getReviews with a JSON response containing the reviews. That's the request you'll need to replicate. Note that there is a token parameter indicating that each request needs authorization. You need to find where the script got it in order to get the data. –  Joseph the Dreamer Apr 25 at 3:02