I'm learning how to apply the SOLID principles in PHP.

I want to create a simple content crawler/grabber for a few websites. The crawler fetches the content from a given website URL. Because every website has a different HTML tag structure, I created an interface:

<?php

interface BotInterface
{

    public function tembakURL();

    public function parseBody();
}

Then I implemented the interface in a concrete class. My plan is to create one class for every website. Here's an example:

<?php

class BotYogyes implements BotInterface
{

    protected $rest;
    protected $url;
    protected $body;

    public function __construct($url, \GuzzleHttp\Client $rest) {
        $this->rest = $rest;
        $this->url = $url;
    }

    public function tembakURL() {
        $response = $this->rest->get($this->url);
        $code = $response->getStatusCode();
        $body = $response->getBody();
        if ($code == 200) {
            $this->body = $body;
        } else {
            throw new Exception("Failed to fetch URL [$code]");
        }
    }

    public function parseBody() {
        $html = implode('', $this->unfoldLines($this->body));
        # title
        preg_match_all('/\<h1\>(.*)\<div\sclass="social\-share"\>/', $html, $header, PREG_PATTERN_ORDER);
        preg_match_all('/\<h1\>(.*)\<\/h1\>/', $header[0][0], $judul, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\sclass="address"\>(.*?)\s\(\<a/', $header[0][0], $alamat, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\sclass="meta\-description"\>(.*?)\<\/p>/', $header[0][0], $deskripsi, PREG_PATTERN_ORDER);
        # content
        preg_match_all('/\<div\sid="photo\-gallery\-mini"\>(.*)\<\!\-\-\sinsertRelatedArticle\(\)\s\-\-\>/', $html, $match, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\>(.*?)\<\/p\>/', $match[0][0], $paragraf, PREG_PATTERN_ORDER);
        # map
        preg_match_all('/\<p\sclass="address"\>GPS\sCoordinate:\s\<a[^\>]*\>(.*?)\<\/a\>\<\/p\>/', $html, $peta, PREG_PATTERN_ORDER);
        file_put_contents('/vagrant/tmp/anarky.txt', var_export(array($judul, $alamat, $deskripsi, $paragraf, $peta), true));
    }

    private function unfoldLines($content) {
        $data = array();
        $content = explode("\n", $content);
        for ($i = 0; $i < count($content); $i++) {
            $line = rtrim($content[$i]);
            // Use [0] for string offsets: the {0} syntax is deprecated in PHP 7.4 and removed in PHP 8.
            while (isset($content[$i + 1]) && strlen($content[$i + 1]) > 0 && ($content[$i + 1][0] == ' ' || $content[$i + 1][0] == "\t" )) {
                $line .= rtrim(substr($content[++$i], 1));
            }
            $data[] = $line;
        }
        return $data;
    }

}
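
To make the behaviour of unfoldLines() easier to follow, here is a minimal sketch that runs the same logic as a standalone function. The sample input is made up for illustration, and the function is copied out of the class only so that it can run on its own.

```php
<?php

// Standalone copy of unfoldLines() from the class above, using [0] for the
// string offset so that it also runs on PHP 8.
function unfoldLines($content) {
    $data = array();
    $content = explode("\n", $content);
    for ($i = 0; $i < count($content); $i++) {
        $line = rtrim($content[$i]);
        // A following line that starts with a space or tab is a continuation:
        // drop its first character, trim the right side, and append it.
        while (isset($content[$i + 1]) && strlen($content[$i + 1]) > 0
            && ($content[$i + 1][0] == ' ' || $content[$i + 1][0] == "\t")) {
            $line .= rtrim(substr($content[++$i], 1));
        }
        $data[] = $line;
    }
    return $data;
}

// Made-up input: the indented lines continue the line before them.
$sample = "<h1>Title\n </h1>\n<p>One\n\ttwo</p>";
$lines = unfoldLines($sample);
// $lines is array('<h1>Title</h1>', '<p>Onetwo</p>')
```

Note that only the first whitespace character of a continuation line is removed and no separator is inserted, so "One" and "two" end up joined as "Onetwo".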

This is how the classes will be used:

<?php
try {
    $rest = new \GuzzleHttp\Client();
    $sync = new BotYogyes('http://website.to/grab/content', $rest);
    $sync->tembakURL();
    $sync->parseBody();
} catch (Exception $e) {
    watchdog('bot_yogyes', $e->getMessage(), null, WATCHDOG_ERROR, null);
}
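
To illustrate the one-class-per-website plan, here is a rough sketch of how a second site might plug into the same interface. Everything except BotInterface is hypothetical: the BotExample class, the runBot() helper, the example URL, and the markup it parses are made up, and a plain callable stands in for the Guzzle client so the sketch runs without network access. Unlike BotYogyes, parseBody() here returns a value instead of writing a file, which is an assumption made only to keep the example short.

```php
<?php

// Same interface as in the question, repeated so the sketch runs on its own.
interface BotInterface
{
    public function tembakURL();

    public function parseBody();
}

// Hypothetical second site with its own tag structure.
class BotExample implements BotInterface
{
    protected $url;
    protected $fetcher;
    protected $body;

    // $fetcher is an assumed stand-in for the HTTP client: it takes a URL
    // and returns the response body as a string.
    public function __construct($url, callable $fetcher) {
        $this->url = $url;
        $this->fetcher = $fetcher;
    }

    public function tembakURL() {
        // Keep the raw HTML so parseBody() can work on it.
        $this->body = (string) call_user_func($this->fetcher, $this->url);
    }

    public function parseBody() {
        // This made-up site keeps its title in <h2 class="title">.
        if (preg_match('/<h2 class="title">(.*?)<\/h2>/', (string) $this->body, $m)) {
            return $m[1];
        }
        return null;
    }
}

// Hypothetical helper that depends only on BotInterface, so any site class
// can be passed in.
function runBot(BotInterface $bot) {
    $bot->tembakURL();
    return $bot->parseBody();
}

$bot = new BotExample('http://example.test/page', function ($url) {
    // Canned response instead of a real HTTP request.
    return '<h2 class="title">Hello</h2>';
});
$title = runBot($bot);
```

The calling code never mentions a concrete site class, which is the main thing the interface is meant to allow.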

These code blocks work properly, but am I doing it right?

The desire to improve code is implied for all questions on this site. Question titles should reflect the purpose of the code, not how you wish to have it reworked. See How to Ask. –  Jamal Jul 27 at 23:16
