Web-scraping library

Here are two functions: the first makes a request for a given URL, and the second takes the response body (HTML) and loads it into the cheerio library:

scrapeListing.js

var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');
var Listing = require('./listingClass.js');
var url = 'http://www.google.com';

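// Requests the given URL and resolves with the response and its body.
var getResponseBody = (url) => {
    return new Promise((resolve, reject) => {
        request({
            headers: {
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0',
                'Content-Type': 'application/x-www-form-urlencoded'
            },
            method: 'GET',
            url: url
        },
        (err, response, body) => {
            if (err) {
                // Reject with the error itself rather than a wrapper object.
                reject(err);
            }
            else {
                // err is always null on this branch, so it is not passed on.
                resolve({response, body});
            }
        });
    });
};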


// Loads an HTML string into cheerio; throws if the body is missing or empty.
var getCheerioInstance = (html) => {
    if (html) {
        return cheerio.load(html);
    }
    else {
        throw new Error('no HTML body for cheerio to load');
    }
};


module.exports = {
   getResponseBody: getResponseBody,
   getCheerioInstance: getCheerioInstance
};
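
For context, here is a minimal sketch of how the two functions above might be chained by a caller. `loadPage`, `fakeFetch` and `fakeLoad` are hypothetical names that are not part of the code above; the two stand-ins only mimic the shapes of `getResponseBody` and `getCheerioInstance` so the sketch runs without network access or the `request` and `cheerio` packages.

```javascript
// Hypothetical helper: composes a fetcher and a loader that have the same
// shapes as getResponseBody and getCheerioInstance.
const loadPage = (url, fetchBody, loadHtml) =>
    fetchBody(url).then(({ body }) => loadHtml(body));

// Stand-in for getResponseBody: resolves with a canned response and body.
const fakeFetch = (url) =>
    Promise.resolve({ response: { statusCode: 200 }, body: '<h1>Hello</h1>' });

// Stand-in for getCheerioInstance: wraps the HTML instead of parsing it.
const fakeLoad = (html) => {
    if (!html) {
        throw new Error('no HTML body to load');
    }
    return { html };
};

loadPage('http://example.com', fakeFetch, fakeLoad)
    .then((page) => console.log(page.html))
    .catch((err) => console.error(err.message));
```

Because the loader runs inside `.then`, an exception it throws for an empty body turns into a rejection, so a single `.catch` covers both the request error and the missing-HTML error.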

Here is a class for listing objects, whose methods use cheerio to parse data from the HTML of the listing page.

Should the class only contain methods associated with parsing the HTML or should the class incorporate the above more general functions as methods? Or is it better still to make the above functions methods of a new class that gets the response for a URL?

listingClass.js

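module.exports = class Listing {
    constructor(url, location) {
        this.url = url;
        this.location = location;
    }

    // example method that retrieves data from html response
    // NOTE: this.html is not assigned anywhere in the code shown; it is
    // assumed to hold a cheerio instance by the time this method is called.
    getTitle() {
        var title = this.html('div.js-details-column>div').contents(":not(:empty)").first().text();
        this.title = title;
    }
};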
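
To make the first option in the question concrete, here is a rough sketch of a parse-only class that receives an already-loaded cheerio instance instead of fetching anything itself. This is one possible shape, not a recommendation: the third constructor parameter `$`, the stand-in `fake$`, and the placeholder URL and location are all assumptions made for illustration. The selector chain is copied unchanged from the question.

```javascript
// Parse-only variant: fetching and loading happen elsewhere, and the
// resulting query function is passed in.
class Listing {
    constructor(url, location, $) {
        this.url = url;
        this.location = location;
        this.$ = $; // assumed: the function returned by cheerio.load(html)
    }

    // Returns the title instead of storing it as a side effect.
    getTitle() {
        return this.$('div.js-details-column>div')
            .contents(':not(:empty)')
            .first()
            .text();
    }
}

// Stand-in for a cheerio instance so the sketch runs without dependencies.
// It ignores its arguments and always yields the same text.
const fake$ = (selector) => ({
    contents: () => ({
        first: () => ({
            text: () => 'Example title'
        })
    })
});

const listing = new Listing('http://example.com/listing/1', 'London', fake$);
console.log(listing.getTitle()); // prints "Example title"
```

With this shape the class can be exercised against a fixed HTML string or a stub, without making any HTTP request.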

asked Mar 14 at 0:19 by therewillbecode
edited Mar 14 at 1:08 by Jamal♦

Tags: javascript, oop, node.js, web-scraping, ecmascript-6