scrapy / scrapy

28.6k

Scrapy, a fast high-level web crawling & scraping framework for Python.

python scraping crawling framework crawler

Python Updated Aug 11, 2018 46 issues need help

binux / pyspider

11.8k

A powerful spider (web crawler) system in Python.

python crawler

Python Updated Aug 11, 2018

code4craft / webmagic

6.9k

A scalable web crawler framework for Java.

crawler java scraping framework

Java Updated Aug 9, 2018

codelucas / newspaper

6.6k

News, full-text, and article metadata extraction in Python 3. Advanced docs:

python news crawler crawling scraper news-aggregator

Python Updated Aug 8, 2018

gocolly / colly

5.4k

Elegant Scraper and Crawler Framework for Golang

golang scraper framework crawler scraping crawling spider go

Go Updated Aug 10, 2018 1 issue needs help

henrylee2cn / pholcus

4.3k

[Crawler for Golang] Pholcus is a distributed, high-concurrency, and powerful web crawler.

crawler spider multi-interface golang distributed-crawler high-concurrency-crawler fastest-crawler cross-platform-crawler

Go Updated Jul 12, 2018

bda-research / node-crawler

4k

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

crawler javascript spider extract-data cheerio jquery nodejs

JavaScript Updated Jul 27, 2018

jhao104 / proxy_pool

3.9k

Python crawler proxy IP pool (proxy pool)

crawler proxy proxypool spider ssdb flask schedule crawl

Python Updated Jul 10, 2018 2 issues need help

iawia002 / annie

3.8k

👾 Fast, simple and clean video downloader

downloader go crawler scraper video bilibili youtube youku iqiyi tumblr qq

Go Updated Aug 11, 2018 3 issues need help

rmax / scrapy-redis

3.4k

Redis-based components for Scrapy.

scrapy crawler distributed redis

Python Updated May 5, 2018

yujiosaka / headless-chrome-crawler

3.2k

Distributed crawler powered by Headless Chrome

headless-chrome puppeteer jquery crawler crawling scraper scraping chrome chromium promise

JavaScript Updated Aug 11, 2018

s0md3v / Photon

2.9k

Incredibly fast crawler which extracts urls, emails, files, website accounts and much more.

crawler spider python osint information-gathering

Python Updated Aug 11, 2018

BruceDone / awesome-crawler

2.9k

A collection of awesome web crawlers and spiders in different languages

web-crawler crawler web-scraper spider node-crawler scraper awesome

Updated Jul 17, 2018

gaojiuli / toapi

2.6k

Every web site provides APIs.

html json api python web spider crawler flask toapi

Python Updated Aug 4, 2018

SpiderClub / haipproxy

2.4k

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

high-availability scrapy ipproxy distributed redis crawler scheduler spider

Python Updated Jul 22, 2018

Chyroc / WechatSogou

2.3k

WeChat official account crawler interface based on Sogou WeChat search

wechat sogou python crawler pypi scrapy

Python Updated Jun 26, 2018

Arachni / arachni

2.1k

Web Application Security Scanner Framework

arachni dom ruby audit detection security-audit analysis modular javascript scanners web-application vulnerability-detection crawler scanner hack hacking penetration-testing xss sql-injection

Ruby Updated Aug 4, 2018 2 issues need help

PuerkitoBio / gocrawl

1.6k

Polite, slim and concurrent web crawler.

crawler robots-txt

Go Updated Apr 29, 2018

gaojiuli / gain

1.6k

Web crawling framework based on asyncio.

python crawler spider asyncio uvloop aiohttp

Python Updated Mar 19, 2018

imWildCat / scylla

1.6k

Intelligent proxy pool for Humans™

crawler python3 scylla proxy-pool python

Python Updated Aug 2, 2018 3 issues need help

dotnetcore / DotnetSpider

1.4k

DotnetSpider, a .NET Standard web crawling library similar to WebMagic and Scrapy. It is a lightweight, efficient and…

crawler dotnetcore cross-platform csharp distributed

C# Updated Aug 9, 2018

symfony / dom-crawler

1.3k

The DomCrawler component eases DOM navigation for HTML and XML documents.

php symfony component symfony-component crawler dom-crawler

PHP Updated Aug 9, 2018

hu17889 / go_spider

1.3k

[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can be ex…

spider crawler go schedule pipeline

Go Updated Nov 16, 2017

xtuhcy / gecco

1.2k

Easy-to-use, lightweight web crawler

gecco crawler jsoup fastjson java dynamic

Java Updated Jul 6, 2018

injetlee / Python

1.2k

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts

python crawler wechat excel

Python Updated Jul 29, 2018

felipecsl / wombat

1.1k

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

ruby scraper crawler dsl

Ruby Updated Aug 2, 2018

github / lightcrawler

1.1k

Crawl a website and run it through Google lighthouse

google-lighthouse chrome crawler

JavaScript Updated Feb 22, 2018

xianhu / PSpider

984

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

crawler spider python distributed proxies web-spider multi-threading

Python Updated Aug 2, 2018

LeonardoCardoso / SwiftLinkPreview

858

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

swift flow watchos tvos macos ios carthage cocoapods swift-package-manager crawler preview url website regular-expressions relevant-texts

Swift Updated Jun 25, 2018 5 issues need help

constverum / ProxyBroker

831

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

proxy anonymity privacy socks http-proxy crawler proxy-server anonymous proxy-checker proxy-list proxypool proxies

Python Updated Jun 23, 2018

crawler

Repositories 2,465
