crawling

Summary

Usage of HttpCompressionMiddleware needs to be reflected in Scrapy stats.
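As a rough illustration of what such stats might capture, the sketch below records compressed and decompressed body sizes for a gzip-encoded response. It uses only the Python standard library; the stat key names and the `record_decompression` helper are hypothetical and do not reflect Scrapy's actual API or stat names.

```python
import gzip
from collections import Counter

# Hypothetical stat keys, chosen for illustration only (not Scrapy's real names).
COMPRESSED_KEY = "httpcompression/response_bytes_compressed"
DECOMPRESSED_KEY = "httpcompression/response_bytes_decompressed"
COUNT_KEY = "httpcompression/response_count"


def record_decompression(stats: Counter, body: bytes) -> bytes:
    """Decompress a gzip body and record its size before and after in `stats`."""
    decoded = gzip.decompress(body)
    stats[COMPRESSED_KEY] += len(body)
    stats[DECOMPRESSED_KEY] += len(decoded)
    stats[COUNT_KEY] += 1
    return decoded


# A plain Counter stands in for a stats collector here.
stats = Counter()
payload = b"<html>" + b"a" * 10_000 + b"</html>"
decoded = record_decompression(stats, gzip.compress(payload))
```

Comparing the two byte counters would show how much larger responses become in memory after decompression, which is the kind of figure the request is after.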

Motivation

To estimate Scrapy's memory usage efficiency and prevent memory leaks like this, I will need to know:

number of request/response objects that can be active (can be achieved by using [trackref](https://docs.scrapy.org/en/latest/topi

Is your feature request related to a problem? Please describe.
Currently, there are services that protect websites from automation tools like ferret. Some of them send a 405 in response to the DOCUMENT function call, which makes a ferret script fail with an error even though a page is available (not the original page, but usually a page with a captcha).

Describe the solution you'd like
It

The main examples on the Apify SDK webpage, GitHub repo, and CLI templates should demonstrate how to manipulate the DOM and retrieve data from it.

Also add one example of scraping with Apify SDK + jQuery to https://sdk.apify.com/docs/examples/basiccrawler

Feedback from: https://medium.com/better-programming/do-i-need-python-scrapy-to-build-a-web-scraper-7cc7cac2081d

I lost an hour trying to make

We have different mixins in spidermon/contrib/monitors/mixins directory, but no documentation.

Here are 568 public repositories matching this topic...

scrapy / scrapy

gocolly / colly

codelucas / newspaper

yujiosaka / headless-chrome-crawler

MontFerret / ferret

apify / apify-js

apache / nutch

transitive-bullshit / awesome-puppeteer

iawia002 / Lulu

MorvanZhou / easy-scraping-tutorial

clemfromspace / scrapy-selenium

slotix / dataflowkit

essandess / isp-data-pollution

oltarasenko / crawly

zhuyingda / webster

scrapinghub / spidermon

DarkSand / Sasila

infinitbyte / gopa

stopstalk / stopstalk-deployment

rivermont / spidy

alephdata / memorious

antchfx / antch

forkonlp / N2H4

trandoshan-io / crawler

jvandenaardweg / linkedin-profile-scraper

dimkouv / massivedl

N0taN3rd / Squidwarc

google / corpuscrawler

NateScarlet / holiday-cn

mehmetozkaya / DotnetCrawler
