I'm learning how to apply the SOLID principles in PHP.
I want to create a simple content crawler/grabber that fetches the content of a given website URL. Since every website has a different HTML tag structure, I created an interface.
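<?php
interface BotInterface
{
    // Fetch the page ("tembak URL" is Indonesian for "hit the URL").
    public function tembakURL();
    // Extract the wanted content from the fetched body.
    public function parseBody();
}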
Then I implemented the interface in a concrete class. My plan is to create one class per website. Here's an example.
<?php
class BotYogyes implements BotInterface
{
    protected $rest;
    protected $url;
    protected $body;

    public function __construct($url, \GuzzleHttp\Client $rest) {
        $this->rest = $rest;
        $this->url = $url;
    }
    public function tembakURL() {
        $response = $this->rest->get($this->url);
        $code = $response->getStatusCode();
        $body = $response->getBody();
        if ($code == 200) {
            $this->body = $body;
        } else {
            throw new Exception("Failed to fetch URL [$code]");
        }
    }
    public function parseBody() {
        $html = implode('', $this->unfoldLines($this->body));
        # title, address and description from the page header
        preg_match_all('/\<h1\>(.*)\<div\sclass="social\-share"\>/', $html, $header, PREG_PATTERN_ORDER);
        preg_match_all('/\<h1\>(.*)\<\/h1\>/', $header[0][0], $judul, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\sclass="address"\>(.*?)\s\(\<a/', $header[0][0], $alamat, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\sclass="meta\-description"\>(.*?)\<\/p>/', $header[0][0], $deskripsi, PREG_PATTERN_ORDER);
        # content paragraphs
        preg_match_all('/\<div\sid="photo\-gallery\-mini"\>(.*)\<\!\-\-\sinsertRelatedArticle\(\)\s\-\-\>/', $html, $match, PREG_PATTERN_ORDER);
        preg_match_all('/\<p\>(.*?)\<\/p\>/', $match[0][0], $paragraf, PREG_PATTERN_ORDER);
        # map coordinates
        preg_match_all('/\<p\sclass="address"\>GPS\sCoordinate:\s\<a[^\>]*\>(.*?)\<\/a\>\<\/p\>/', $html, $peta, PREG_PATTERN_ORDER);
        file_put_contents('/vagrant/tmp/anarky.txt', var_export(array($judul, $alamat, $deskripsi, $paragraf, $peta), true));
    }
    // Join continuation lines (those starting with a space or tab) onto the previous line.
    private function unfoldLines($content) {
        $data = array();
        $content = explode("\n", $content);
        for ($i = 0; $i < count($content); $i++) {
            $line = rtrim($content[$i]);
            // Square-bracket offsets: the {0} string-offset syntax was removed in PHP 8.
            while (isset($content[$i + 1]) && strlen($content[$i + 1]) > 0 && ($content[$i + 1][0] == ' ' || $content[$i + 1][0] == "\t")) {
                $line .= rtrim(substr($content[++$i], 1));
            }
            $data[] = $line;
        }
        return $data;
    }
}
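To make the one-class-per-website plan concrete, a second site would get its own class behind the same interface. What follows is only a rough sketch under several assumptions: `BotExample` and the markup it looks for are hypothetical, a plain callable stands in for the Guzzle client so the snippet runs without network access, `parseBody()` returns its result instead of writing a file, and PHP's `DOMDocument`/`DOMXPath` replace the regular expressions used above.

```php
<?php
// Same contract as BotInterface above, repeated so the sketch runs on its own.
interface BotInterface
{
    public function tembakURL();
    public function parseBody();
}

// Hypothetical second site class; its name and expected markup are assumptions.
class BotExample implements BotInterface
{
    protected $url;
    protected $fetch;
    protected $body = '';

    // $fetch stands in for the Guzzle client: any callable mapping a URL to HTML.
    public function __construct($url, callable $fetch)
    {
        $this->url = $url;
        $this->fetch = $fetch;
    }

    public function tembakURL()
    {
        $this->body = (string) call_user_func($this->fetch, $this->url);
    }

    public function parseBody()
    {
        // Collect libxml warnings internally; real-world HTML is rarely valid.
        libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($this->body);
        libxml_clear_errors();

        $xpath = new DOMXPath($dom);
        $title = $xpath->query('//h1')->item(0);

        $paragraphs = array();
        foreach ($xpath->query('//p') as $p) {
            $paragraphs[] = trim($p->textContent);
        }

        return array(
            'title' => $title ? trim($title->textContent) : null,
            'paragraphs' => $paragraphs,
        );
    }
}
```

Only the body of `parseBody()` would differ from site to site in this arrangement; the fetching code is the same in every class, which is part of what I'm unsure about.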
This is how the classes will be used:
<?php
try {
    $rest = new \GuzzleHttp\Client();
    $sync = new BotYogyes('http://website.to/grab/content', $rest);
    $sync->tembakURL();
    $sync->parseBody();
} catch (Exception $e) {
    watchdog('bot_yogyes', $e->getMessage(), null, WATCHDOG_ERROR, null);
}
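The reason for the interface is that the calling code should not care which site class it receives. As a sketch of that idea, assuming a hypothetical `runBot()` helper and a made-up `FakeBot` class that records calls instead of doing HTTP, the two steps could be wrapped like this:

```php
<?php
// Same contract as BotInterface above, repeated so the sketch runs on its own.
interface BotInterface
{
    public function tembakURL();
    public function parseBody();
}

// Made-up stand-in for a real site class; it records calls instead of fetching.
class FakeBot implements BotInterface
{
    public $calls = array();

    public function tembakURL()
    {
        $this->calls[] = 'tembakURL';
    }

    public function parseBody()
    {
        $this->calls[] = 'parseBody';
    }
}

// Hypothetical helper: fetch, then parse, for any BotInterface implementation.
// Returns true on success and false if the bot threw an exception.
function runBot(BotInterface $bot)
{
    try {
        $bot->tembakURL();
        $bot->parseBody();
        return true;
    } catch (Exception $e) {
        // In my setup this is where watchdog() would log the message.
        error_log($e->getMessage());
        return false;
    }
}
```

With something like this, `runBot(new BotYogyes($url, $rest))` and a call with any future site class would go through the same code path.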
The BotYogyes code works properly, but am I doing it right?